Home /Personnel /Assistant Professors /Jesús Arias Fisteus | |
Hashing and canonicalizing Notation 3 graphs
Jesús Arias Fisteus, Norberto Fernández García, Luis Sánchez Fernández, Carlos Delgado Kloos
Journal of Computer and System Sciences 76 (2010), pp. 663-685. doi:10.1016/j.jcss.2010.01.003
Abstract: This paper presents a hash and a canonicalization algorithm for Notation 3 (N3) and Resource Description Framework (RDF) graphs. The hash algorithm produces, given a graph, a hash value such that the same value would be obtained from any other equivalent graph. Contrary to previous related work, it is well--suited for graphs with blank nodes, variables and subgraphs. The canonicalization algorithm outputs a canonical serialization of a given graph (i.e. a canonical representative of the set of all the graphs that are equivalent to it). Potential applications of these algorithms include, among others, checking graphs for identity, computing differences between graphs and graph synchronization. The former could be specially useful for crawlers that gather RDF/N3 data from the Web, to avoid processing several times graphs that are equivalent. Both algorithms have been evaluated on a big dataset, with more than 29 million triples and several millions of subgraphs and variables.
Preprint: fisteus-hash-n3.pdf
NOTICE: this is the author's version of a work that was accepted for publication in Journal of Computer and System Sciences. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Journal of Computer and System Sciences (Volume 76, Issue 7, November 2010, Pages 663-685) with DOI: 10.1016/j.jcss.2010.01.003. When citing this work, please use the reference above.