Online Dictionary Matching for Streams of XML Documents - Theoretical Computer Science
Conference Papers Year : 2010

Online Dictionary Matching for Streams of XML Documents

Abstract

We consider the online multiple-pattern matching problem for streams of XML documents, when the patterns are expressed as linear XPath expressions containing child operators (/), descendant operators (//) and wildcards (*) but no predicates. For each document in the stream, the task is to determine all occurrences in the document of all the patterns. We present a general multiple-pattern-matching algorithm that is based on a backtracking deterministic finite automaton derived from the classic Aho-Corasick pattern-matching automaton. This automaton is of size linear in the sum of the sizes of the XPath patterns, and the worst-case time bound of the algorithm is better than the time bound of the simulation of linear-size nondeterministic automata. In addition to the worst-case-efficient general solution we present an algorithm with a simple backtracking mechanism that works extremely well for cases in which the backtracking stack remains low. Our experiments show that, when applied to filtering, this simple algorithm scales well as regards the number of patterns (or filters) and is competitive with YFilter, a widely accepted software for XML filtering.
Fichier principal
Vignette du fichier
03230155.pdf (134.88 Ko) Télécharger le fichier
Origin Files produced by the author(s)
Loading...

Dates and versions

hal-01054462 , version 1 (06-08-2014)

Licence

Identifiers

Cite

Panu Silvasti, Seppo Sippu, Eljas Soisalon-Soininen. Online Dictionary Matching for Streams of XML Documents. 6th IFIP TC 1/WG 2.2 International Conference on Theoretical Computer Science (TCS) / Held as Part of World Computer Congress (WCC), Sep 2010, Brisbane, Australia. pp.153-164, ⟨10.1007/978-3-642-15240-5_12⟩. ⟨hal-01054462⟩
85 View
118 Download

Altmetric

Share

More