The Ring: Worst-Case Optimal Graph Joins in Compressed Space
Gonzalo Navarro
A recent development in the efficient processing of database queries is that of worst-case optimal (wco) join algorithms, which process join queries in time proportional to the AGM bound: the maximum possible output size produced by a join query over a relational database of a certain size. Such algorithms can be strictly better than any traditional query plan using pairwise joins. Leapfrog-TrieJoin (LTJ) is a seminal join algorithm that can be wco if we index the attributes of the relations in every possible order. For the important case of labeled (or RDF) graphs, this means indexing all the 3! = 6 possible orders of the (subject,predicate,object) triples. This blowup in space prevents using LTJ in large graph databases.
In this talk I will show how regarding the triples as cyclic bidirectional strings and indexing them in a BWT-like form enables the wco LTJ algorithm within the space of the raw data, and even less. Our ring index, consisting of wavelet trees representing the BWT transformed set, gives the needed support for all the LTJ operations. I will show experimentally that the ring competes with state-of-the-art graph database managers while using a fraction of their space. I will also show that, on tables of d attributes, the ring requires indexing far less than d! orders, making wco
algorithms realistic for much wider tables.