Libart

I have found an ART implementation that works for strings: armon/libart: Adaptive Radix Trees implemented in C. Here are the benchmarking results, shown next to those of JudySL and the L2 Si trie I have used in similar posts. All three data structures are string-to-pointer maps.
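For context, here is roughly how a benchmark like this exercises libart. The function names (art_tree_init, art_insert, art_search, art_tree_destroy) are the API as declared in the repo's art.h, to the best of my reading; the rest is a minimal sketch, not the actual harness.

```c
#include <string.h>
#include "art.h"  /* from armon/libart */

int main(void) {
    art_tree t;
    art_tree_init(&t);

    /* libart keys are arbitrary byte strings; including the
       terminating NUL keeps keys prefix-free. */
    const char *key = "http://example.com/";
    art_insert(&t, (const unsigned char *)key, (int)strlen(key) + 1, (void *)1);

    /* art_search returns the stored value, or NULL when absent. */
    void *val = art_search(&t, (const unsigned char *)key, (int)strlen(key) + 1);

    art_tree_destroy(&t);
    return val == NULL;
}
```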

For a brief description of the data sets, see my earlier The New Indexers, Masstree post.

| Data Set | Item Count | Byte Size | Key Average Size (bytes) |
|----------|------------|-----------|--------------------------|
| PKD      | 9.158M     | 104MB     | 10.33                    |
| ENW      | 10M        | 212MB     | 20.23                    |
| URL      | 6.795M     | 978MB     | 143                      |

In the benchmark tables that follow, memory usage is given in MB.

| Data Set | Libart Insert | JudySL Insert | L2 Si Insert | Libart Lookup | JudySL Lookup | L2 Si Lookup | Libart Mem Usage | JudySL Mem Usage | L2 Si Mem Usage |
|----------|---------------|---------------|--------------|---------------|---------------|--------------|------------------|------------------|-----------------|
| PKD      | 8.7s          | 6.5s          | 5.83s        | 6.01s         | 5.1s          | 4.07s        | 703              | 409              | 186             |
| ENW      | 12.2s         | 9.6s          | 8s           | 10s           | 8.4s          | 7.4s         | 912              | 523              | 309             |
| URL      | 13.5s         | 12s           | 7.7s         | 11.2s         | 11s           | 6.91s        | 1514             | 412              | 248             |

I don’t like these results much. To be sure, I never expected ART to be very good. From a quick look at the libart code, my guess is that it does some sort of path compression. The implementation seems to follow the paper closely: you get the four node types with their suggested sizes, SSE2 searching of the Node16 key array, and so on.
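To illustrate the Node16 trick just mentioned, here is a minimal sketch of the SSE2 key search from the ART paper. The names (node16, find_child16) are mine, not libart's; the intrinsics are standard SSE2.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>
#include <stddef.h>

/* Hypothetical Node16: up to 16 key bytes plus parallel child pointers. */
typedef struct {
    uint8_t num_children;
    uint8_t keys[16];
    void   *children[16];
} node16;

/* Find the child matching byte c, or NULL. This is the scheme from the
   ART paper: compare c against all 16 key bytes at once, mask off the
   unused slots, and take the first match. */
static void *find_child16(const node16 *n, uint8_t c) {
    __m128i key = _mm_set1_epi8((char)c);
    __m128i cmp = _mm_cmpeq_epi8(key,
                      _mm_loadu_si128((const __m128i *)n->keys));
    int mask = _mm_movemask_epi8(cmp) & ((1 << n->num_children) - 1);
    if (mask == 0)
        return NULL;
    /* __builtin_ctz (GCC/Clang) gives the index of the first set bit. */
    return n->children[__builtin_ctz(mask)];
}
```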

(The numbers for L2 Si are not the same as those from the post quoted earlier, as I had rerun the benchmark.)

What libart does, and other ART implementations don't, is store both the key and the value. The memory figures given for it (for all three tries, in fact) include key storage. The JudySL and L2 Si tries store common prefixes only once, and can reconstitute the original keys from the pieces. If some of the memory usage figures fall below the raw data size, it's not because they lose the data; it's because they compress it. By contrast, libart allocates a leaf (value + key length + the actual key bytes) for each stored key. So, for example, out of the total 1.5GB taken for the URL set, about 1GB goes to keeping the original key (and value) data.
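For reference, libart's leaf looks roughly like the struct below; I am reproducing it from memory of art.h, so treat the exact layout as an approximation. The comment after it sketches where the ~1GB goes.

```c
#include <stdint.h>

/* Approximate layout of libart's leaf (see art.h in armon/libart):
   one allocation per stored key, holding the value pointer, the key
   length, and a full copy of the key bytes. */
typedef struct {
    void         *value;
    uint32_t      key_len;
    unsigned char key[];   /* flexible array member: the whole key */
} art_leaf;

/* Back-of-the-envelope for the URL set: 6.795M leaves times
   (8B value + 4B key_len + padding + ~144B key incl. NUL) is on the
   order of 1GB before any inner nodes are counted. */
```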

For a nicer comparison with pretty plots between the JudySL and L2 Si tries, see String Associative Tables, III and String Associative Tables, IV.

String Organizers

What’s a good data structure for storing strings? What’s a bad one?

It all depends on use, of course. And one selects a data structure accordingly.

But it’s fair to say good time and space characteristics are important. Especially when the data starts to get big.

So is Libart good or bad? Is JudySL good or bad? Are these the “tries” we are looking for? What is a “trie” anyhow? And should we look at alternatives?

It makes sense to look at a basic trie implementation — a simple, but not ridiculously naive one. An implementation that takes some time to code, but not much.

As for alternatives, tries are the choice when talking about sorted collections. We should not even consider hashes (because they're not sorted collections). But what if we did?

I have two more contenders, for which I'll show figures next to those of Libart.

| Data Set | Libart Insert | JudyHS Insert | L2 Sf Insert | Libart Lookup | JudyHS Lookup | L2 Sf Lookup | Libart Mem Usage | JudyHS Mem Usage | L2 Sf Mem Usage |
|----------|---------------|---------------|--------------|---------------|---------------|--------------|------------------|------------------|-----------------|
| PKD      | 8.7s          | 4.72s         | 6.47s        | 6.01s         | 3.6s          | 6.25s        | 703              | 383              | 438             |
| ENW      | 12.2s         | 5.1s          | 10.6s        | 10s           | 3.8s          | 11.13s       | 912              | 593              | 655             |
| URL      | 13.5s         | 5.06s         | 9.92s        | 11.2s         | 4.57s         | 11.17s       | 1514             | 1284             | 559             |

The L2 Sf trie is fairly basic, and most unimaginative. It uses a single type of tree node representation; nothing adaptive about it. The keys and pointers are stored in sorted arrays, which grow as data is added. The data structure is conservative in its memory usage.
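A node along those lines might look like the sketch below. This is my reading of the description, not the actual L2 Sf code: one node type, a sorted byte array searched with binary search, and a parallel array of child pointers.

```c
#include <stdint.h>

/* Hypothetical single-type trie node, in the spirit of the L2 Sf
   description: sorted key bytes with a parallel child array, both
   reallocated as entries are added. Not the actual L2 Sf code. */
typedef struct trie_node {
    uint16_t          count;     /* populated slots */
    uint16_t          capacity;  /* allocated slots */
    uint8_t          *keys;      /* sorted byte array */
    struct trie_node **children; /* children[i] follows keys[i] */
} trie_node;

/* Binary search for byte c; returns the slot index, or -1 if absent. */
static int find_slot(const trie_node *n, uint8_t c) {
    int lo = 0, hi = (int)n->count - 1;
    while (lo <= hi) {
        int mid = (lo + hi) / 2;
        if (n->keys[mid] == c) return mid;
        if (n->keys[mid] < c)  lo = mid + 1;
        else                   hi = mid - 1;
    }
    return -1;
}
```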

JudyHS is a hash built over other Judy data structures. It's not a trie. It's not ordered. It has very little functionality beyond adding and querying data. Hashes are good when you don't require the functions only a sorted collection can offer. Or are they? Hashes don't know what to do with strings; they are unable to exploit data redundancy for time and space gains.
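The whole JudyHS interface does fit in a few lines. A minimal sketch using the macros from Judy.h (JHSI for insert, JHSG for get, JHSFA to free the whole array), with error handling stripped for brevity:

```c
#include <string.h>
#include <Judy.h>

int main(void) {
    Pvoid_t table = (Pvoid_t)NULL;  /* an empty JudyHS array */
    PWord_t pvalue;
    char key[] = "http://example.com/";

    /* Insert: JHSI points pvalue at the value slot for the key,
       creating the slot if it doesn't exist yet. */
    JHSI(pvalue, table, key, strlen(key));
    *pvalue = 42;

    /* Lookup: JHSG points pvalue at the slot, or NULL if absent. */
    JHSG(pvalue, table, key, strlen(key));

    /* That, plus deletion, is about all the functionality there is. */
    Word_t bytes;
    JHSFA(bytes, table);  /* free the entire array */
    return pvalue == NULL;
}
```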

One of the main reasons I started looking at tries is exactly that. A trie can store a set of strings compactly. It can strip the common prefixes and retain only the remaining suffixes. Very useful when the prefixes are long. (Think of millions of URLs that all start with http://www.: the shared prefix need be stored only once.)
