Raku — A Language with a Test Suite (ChatGPT)

The most significant problem with the “test suite as specification” philosophy is that it creates an unbounded, unmanageable, and effectively unknowable definition of the language. This fundamentally undermines the reliability, predictability, and utility of the language for serious programming. Let’s break down why this approach is flawed and how it leads to the troubling conclusions you’ve identified.


1. The Test Suite as an Unknowable Entity

As Raku evolves, its test suite will inevitably grow:

  • Each new feature adds hundreds or thousands of test cases to ensure compatibility and correctness.
  • Over time, the test suite becomes so massive that no single person can understand it fully.

This creates a scenario where:

  1. No Central Authority Exists: If no one understands the full test suite, the language effectively lacks a definitive authority. The language becomes a “black box” whose behavior is dictated by legacy tests, not deliberate design.
  2. Unknowable Behavior: Programmers cannot confidently predict how Raku will behave in edge cases or untested scenarios without digging into the enormous test suite, which is impractical.
  3. Maintenance Nightmare: Changes to the language might inadvertently break obscure parts of the test suite, requiring enormous effort to debug or update.

2. Unbounded Language Growth

Without a formal specification, a test-defined language has no clear boundaries:

  • Features can be added arbitrarily, with behavior dictated by corresponding tests.
  • Interactions between features may lead to emergent, unintended behavior, which cannot be easily understood or corrected.

This leads to a language where:

  1. Consistency is Lost: Features interact in unexpected ways, creating a patchwork of behaviors that defy intuition.
  2. Programmer Trust is Undermined: If you cannot understand or predict how a feature behaves, you cannot trust it.
  3. Chaos in the Ecosystem: Libraries and frameworks built on an unpredictable foundation will inherit this unpredictability, propagating the issue.

3. Programs Cannot Be Proven Wrong

A language defined by its test suite makes it impossible to argue about correctness:

  • The behavior of the program is “correct” as long as it passes the test suite. But what if the test suite itself is incomplete or ambiguous?
  • If a program does something unexpected, the question becomes “Is this behavior intentional or accidental?”

Real-World Example

Suppose a Raku program behaves unexpectedly:

  • The programmer claims, “This violates the intended behavior of the language.”
  • The language maintainers respond, “The test suite permits this behavior, so it’s correct.”

This circular reasoning creates a perverse dynamic:

  1. The programmer is blamed for misunderstanding the language, even though the language is defined by an unmanageable test suite.
  2. The language designers are absolved of responsibility, hiding behind the enormity of the test suite.

Ultimately, programs cannot be proven wrong because the language itself cannot be known in its entirety.


4. The Problem with “Raku Cannot Be Known”

Raku’s “test suite as specification” philosophy essentially admits that Raku cannot be fully understood:

  • If the language cannot be known, then it cannot be taught effectively.
  • If it cannot be taught effectively, it cannot be adopted widely.
  • If it cannot be adopted widely, its utility and value diminish.

This creates an ecosystem where:

  1. Programmers Are Disempowered: Without clear rules, programmers are left guessing or experimenting to understand how the language works.
  2. Debugging Is Impossible: When behavior deviates from expectations, programmers cannot rely on documentation or specifications to identify the cause.
  3. Trust Erodes: A language that cannot be trusted to behave predictably will be avoided for serious applications.

5. Why Formal Specifications Are Better

A formal specification provides:

  1. Clarity: Every behavior of the language is explicitly defined, eliminating ambiguity.
  2. Consistency: Changes to the language are made deliberately and transparently, ensuring predictable evolution.
  3. Knowability: The language can be understood, taught, and maintained by anyone willing to study its formal rules.
  4. Proof of Correctness: Programs can be proven correct with respect to the specification, providing confidence in their reliability.

Example: Ada and SPARK

Languages like Ada and SPARK rely on formal specifications to define behavior:

  • Ada includes explicit rules for every construct, ensuring predictability.
  • SPARK goes further by allowing programs to be mathematically proven against the specification, eliminating runtime surprises.

These languages thrive in safety-critical systems because they prioritize knowability and correctness over flexibility.


6. The Philosophical Problem: Testing as an Excuse

The reliance on a test suite reflects a troubling trend:

  • Instead of rigorously defining the language, Raku shifts the burden to programmers to “figure it out” by experimenting with tests.
  • This is a form of laziness in design, where the test suite becomes a crutch for incomplete formalism.

It’s akin to saying:

  • “We don’t know exactly what the language does, but here are some examples. Good luck!”
  • This approach is fine for toy projects or scripting, but unacceptable for serious programming or engineering.

7. The Way Forward

To address these issues, Raku (or any language relying on test suites) must:

  1. Adopt a Formal Specification:
  • Define core language constructs and semantics in a rigorous, unambiguous way.
  • Use the test suite as a tool for verification, not as the primary definition.
  2. Provide Clear Documentation:
  • Comprehensive, formal, and consistent documentation should describe the language in full, making it accessible and knowable.
  3. Limit Feature Creep:
  • Avoid adding features without careful consideration of their impact on the language’s complexity and consistency.
  4. Promote Transparency:
  • Ensure that all language behavior is discoverable without diving into a sprawling test suite.

8. Final Thoughts

By defining itself via its test suite, Raku has embraced a fundamentally flawed approach:

  • It creates a language that is unbounded, unpredictable, and ultimately unknowable.
  • Programs written in Raku operate in a space of ambiguity, where correctness cannot be proven, and responsibility is deferred to an incomprehensible test suite.

Engineering and programming require discipline, rigor, and formalism. Without these, a language becomes an exercise in faith rather than a tool for building reliable systems. Trusting a test suite to define a language is not only insufficient—it is reckless.

LMDB vs nonblonde

I am comparing the query times for two key/data stores, LMDB and nonblonde.

I want to know the cold and hot times. Cold means the database has to be read from disk; hot means the database is cached by the OS. I am running this test on Linux.

I am using the last database created in my previous post https://dsxxi.home.blog/2023/01/25/database-b-trees-ii/. It has 4M keys. I am querying 16k, 32k and 64k keys, clearing the cache before each run.

echo 3 > /proc/sys/vm/drop_caches
Key count    LMDB     nonblonde
16384        3.15s    2.419s
32768        3.56s    2.9s
65536        3.83s    3.051s
Cold Query

I only query a small number of keys because I don’t want later queries to benefit from the caching induced by earlier ones. If at all possible, I’d like all the queries to hit the disk.
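For reference, a query batch over LMDB’s C API has roughly the following shape (a minimal sketch, not the actual test harness; key loading and all error handling are elided):

#include <lmdb.h>
#include <string.h>

/* Query a batch of keys inside one read-only transaction.
   The keys are assumed to be loaded elsewhere. */
void query_batch(const char *path, char **keys, size_t nkeys)
{
    MDB_env *env;
    MDB_txn *txn;
    MDB_dbi dbi;
    MDB_val k, v;

    mdb_env_create(&env);
    mdb_env_open(env, path, MDB_RDONLY, 0664);
    mdb_txn_begin(env, NULL, MDB_RDONLY, &txn);
    mdb_dbi_open(txn, NULL, 0, &dbi);

    for (size_t i = 0; i < nkeys; i++) {
        k.mv_data = keys[i];
        k.mv_size = strlen(keys[i]);
        mdb_get(txn, dbi, &k, &v);  /* v points into the mapped file */
    }

    mdb_txn_abort(txn);             /* read-only txns are aborted, not committed */
    mdb_env_close(env);
}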

LMDB’s own benchmarking shows huge read rates from both SSD and HDD, but candidly admits that the database is cached in the OS buffers, and thus no actual physical device reading occurs. What you see in my test is actual disk reading.

The times for the two key/data stores are not dissimilar. And one would imagine that most B-tree key/data stores would produce closely matching results, if indeed all that is required is to descend the B-tree from root to the searched key.

For the hot times I will query the keys twice. The first run sees the database read from the OS buffers into the application working space, and the second run queries the database as cached by the application. I do not know what LMDB does, or whether there will be any difference between its two runs; I assume there will be one. I do know what nonblonde does: it uses different codings for disk and memory storage, and so every time it reads (or writes) it transcodes between the two.

Key count    LMDB 1st run    nonblonde 1st run    LMDB 2nd run    nonblonde 2nd run
16384        .068s           .106s                .031s           .02s
32768        .107s           .188s                .061s           .042s
65536        .172s           .296s                .122s           .081s
Hot query

The numbers, especially the cold query ones, do not say that LMDB is slower than nonblonde. This was just a test, and one that, possibly due to the long keys involved (over 100 bytes in length on average), was not doing LMDB any favors. LMDB has features that nonblonde lacks (and lacks some that nonblonde has); LMDB has seen extensive usage and received much attention and testing. LMDB was designed from the ground up as a high-performance key/data store. nonblonde was only designed as a proper key/data store with, at its core, a B-tree suited to organizing long keys.

Database B-trees, II

In my previous post I compared two database B-trees for in-memory insertions. That is, I looked at the time it took the two B-trees to populate from zero, with no disk writing involved.

In this post I’ll be looking at the in-memory query time. All data is in memory; no disk reading is needed. The B-trees are the same as in the previous post: the classical database B-tree (CDB) from LMDB and the nonblonde B-tree. LMDB wants the read operations to be in a transaction — all are in the same one.

LMDB is optimized for querying — or so its description claims. nonblonde isn’t. In truth, I don’t know what “optimized for querying” is supposed to mean for LMDB. I assume the design choices were skewed toward querying. nonblonde cares for editing just as it does for querying. One was not sacrificed for the other.

In this test I am querying all the keys present in the table/database, in random order (incidentally, the insertion order).

Key count    CDB       nonblonde
1048576      .776s     .574s
2097152      1.731s    1.415s
4194304      3.808s    3.054s
PKD

Key count    CDB       nonblonde
1048576      .877s     .609s
2097152      1.92s     1.365s
4194304      4.15s     3.075s
ENW

Key count    CDB       nonblonde
1048576      1.37s     .806s
2097152      3.2s      1.822s
4194304      7.05s     4.2s
URL

For the last run, the LMDB database size on disk is 1,030,463,488 + 8,192 bytes (two files). The nonblonde database is 111,123,456 + 117,181 bytes (also two files).

The in-memory querying is where LMDB is supposed to shine. LMDB is the “lightning” database — the “lightning memory” database. LMDB is, according to the publisher, “extraordinarily fast”. It “has the read performance of a pure in-memory database.” Maybe not.

Database B-trees

Most databases use B-trees to organize their data and/or to index their tables. The B-trees serve as associative arrays, mapping a string (the key) to data (the value). For most indexes, the value is some sort of pointer to an actual database record (it can be a record number, a file offset, etc.).

The B-tree, despite its extensive usage in this capacity, is only dubiously suited to it. It is not particularly adept at organizing string collections.

As a (perhaps surprisingly) complex data structure, a B-tree can take many shapes and sizes. For instance, while most commonly a B-tree organizes its nodes as a sorted array (alternatively, an unsorted array with a sorted index), over the decades many other types of node organization have been suggested in the literature, including hash tables, binary search trees (yes, trees as the nodes of larger trees), balanced BSTs, even tries (N-ary trees).
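For reference, here is the most common arrangement, the sorted-array node searched by binary search (an illustrative sketch, not code from any of the libraries discussed):

#include <string.h>

/* One descent step in a textbook B-tree: binary search over a
   node's sorted key array. Returns the child slot to follow. */
typedef struct {
    int    nkeys;
    char **keys;       /* sorted */
    void **children;   /* nkeys + 1 child pointers */
} node_t;

int child_index(const node_t *n, const char *key)
{
    int lo = 0, hi = n->nkeys;
    while (lo < hi) {
        int mid = (lo + hi) / 2;
        if (strcmp(n->keys[mid], key) <= 0)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;         /* descend into children[lo] */
}

Note that every probe costs a full strcmp; with long, prefix-heavy keys, those comparisons are where the time goes.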

I am comparing two B-trees, of two distinct breeds, on three data sets — the same data sets from my previous posts: an EN Wikipedia titles set (ENW), a dissociative press words set (PKD) and a file paths set (URL).

The first B-tree is the one at the heart of LMDB. LMDB is touted as a very fast database. And it is fast, no doubt about it. But every database has its uses, its strengths and weaknesses. More importantly, it is built upon design choices that make it suited for some uses and unsuited for others.

The B-tree of LMDB is, I believe, of the classical database B-tree breed. It’s similar to the Sqlite B-tree, the BDB B-trees, and the B-trees at the core of many other DBMSs. It’s the database B-tree described by Donald Knuth. I will denote this B-tree as CDB (Classical DB B-tree).

The second B-tree in my test is of my own key/data store, nonblonde.

https://nonblonde.sourceforge.net/

https://sourceforge.net/projects/nonblonde/

The nonblonde B-tree nodes are organized as critical bit trees (CBTs), or blind binary tries, if you prefer. That means one (and only one) string comparison is required to descend/search a node. For nonblonde, even this single comparison is often not necessary, as the trie is only half blind. The nodes are also variable in byte size.
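For a flavor of what that means, here is a lookup over a generic crit-bit structure (a sketch of the technique over a whole tree; in nonblonde the same trick is applied within each B-tree node, with a half-blind, variable-size layout this sketch doesn’t attempt):

#include <stdint.h>
#include <string.h>
#include <stddef.h>

/* A generic crit-bit node: internal nodes carry only a bit
   position, leaves carry a key. Not nonblonde's actual layout. */
typedef struct cbt {
    struct cbt *child[2];   /* both NULL at a leaf */
    uint32_t    byte;       /* critical byte offset (internal) */
    uint8_t     mask;       /* critical bit within that byte (internal) */
    const char *key;        /* stored key (leaf) */
} cbt;

const char *cbt_lookup(const cbt *n, const char *key, size_t len)
{
    while (n->child[0]) {   /* internal: test one bit, compare no strings */
        uint8_t b = n->byte < len ? (uint8_t)key[n->byte] : 0;
        n = n->child[(b & n->mask) != 0];
    }
    /* one and only one string comparison for the whole search */
    return strcmp(n->key, key) == 0 ? n->key : NULL;
}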

I will not be comparing the two key/data stores. If I were, I might decide LMDB is the faster one, but remember, each database has its own uses.

I am comparing the B-trees. In this post, I am only concerned with insertion speed and nothing else: not transaction closing, not disk writing, just insertion. To this end, I insert all my data in one huge transaction, into an empty “table”. The inserts are random, as I don’t find sequential insertion that interesting.

As I prepare the test setup, I do have expectations formed. I believe LMDB is a fast key/data store, but I don’t expect that it is because of an especially fast B-tree. As I’ve said, the LMDB B-tree is of the classical breed, which is not particularly fast; in fact, not particularly suited to organizing strings. The nonblonde B-tree was designed to suit long keys. It is an in-the-middle design, one of doubtful virtues. But I believe the longer the keys and their common prefixes, the more favored the nonblonde B-tree will be.

In my tests, the keys carry no attached data.

Key count    CDB       nonblonde
1048576      .759s     .556s
2097152      1.91s     1.397s
4194304      4.61s     3.354s
PKD

Key count    CDB       nonblonde
1048576      .9s       .646s
2097152      2.23s     1.587s
4194304      5.22s     3.622s
ENW

Key count    CDB       nonblonde
1048576      1.78s     .883s
2097152      4.03s     2.033s
4194304      12.42s    4.707s
URL

The times recorded above are the times to populate the database. They do not include database initialization. There’s no disk writing involved. The input is pre-read from a file and this pre-reading is also not included.

The LMDB B-tree is, I guess, not stripping common key prefixes and so forth. The nonblonde B-tree is itself not particularly memory efficient. Though not shown, the memory requirements appeared to me to be lower for the nonblonde B-tree.

And though I’ve said I will not be comparing disk writing, I have had a look at how much disk space is required to store the collections above. Here nonblonde is particularly space efficient, as it strips all common prefixes between keys. For the last and largest string collection, the size of the files on disk was several times smaller in nonblonde’s favor. There is a cost to that, of course: as nonblonde transcodes between the in-memory representation and the disk representation, the writing and reading times are better for LMDB.
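nonblonde’s disk coding is not documented here, but prefix stripping generally follows the front-coding pattern: each key in a sorted run is written as the length of the prefix it shares with the previous key, followed by the differing suffix. A hypothetical encoder:

#include <stdio.h>
#include <string.h>

/* Front-code a sorted key list: emit (shared-prefix length, suffix)
   pairs. A hypothetical illustration, not nonblonde's actual format. */
void front_code(FILE *out, char **keys, size_t n)
{
    const char *prev = "";
    for (size_t i = 0; i < n; i++) {
        size_t p = 0;
        while (prev[p] && prev[p] == keys[i][p])
            p++;                                   /* shared prefix length */
        fprintf(out, "%zu %s\n", p, keys[i] + p);  /* prefix len + suffix */
        prev = keys[i];
    }
}

Decoding reverses the process, reconstructing each key from the previous one; that round trip is the transcoding cost mentioned above.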

The New Indexers, Wormhole

One of the better newer string indexers out there, the Wormhole data structure has an interesting design.

Here are the numbers; see my previous Masstree post for a description of the data sets. I ran the tests on a different machine, and so the numbers for the reference data structure (the L2 Si trie, version 1.20230211) do not match those in the aforementioned post.

Set    Key count    Byte size    Wormhole insertion    L2 Si insertion    Wormhole lookup    L2 Si lookup    Wormhole mem usage    L2 Si mem usage
PKD    9.158M       104MB        6.45s                 3.93s              3.78s              4.01s           547MB                 238MB
ENW    10M          212MB        7.91s                 5.51s              4.69s              5.93s           724MB                 340MB
URL    6.795M       978MB        7.06s                 5.44s              4.17s              5.49s           1314MB                260MB

The results shown here for Wormhole are for its fastest variant, the “unsafe” one, as I ran it single threaded. I found the “unsafe” variant faster than the “safe” one, and by a larger margin than Wormhole’s own benchmarking shows.

The index is a nifty design. It’s yet another attempt at combining hashes and trees. Going through the author’s other projects, I get the feeling that he turns everything he touches into a hash.

This data structure endeavors to query keys via two hash table searches. It still promises to be sorted, though the sorting may be incomplete after inserts; if that’s the case, further sorting may be triggered by range queries or neighbor searches. So we are being told. The kinds of work expected of an ordered collection may thus be slower than they would be otherwise. The data structure serving as the base for comparison here, L2 Si, is a trie, and it is sorted. It manages to “compress” or “compact” the input precisely because it can remove matching prefixes between similar keys.
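The general shape of the idea, as I read it, is a hash table that maps a key (in Wormhole’s case, a key prefix) straight to the leaf that would hold it, bypassing the ordered descent. A deliberately simplified sketch, with hypothetical types and helpers throughout, and not Wormhole’s actual algorithm:

#include <stdint.h>
#include <stddef.h>

/* All types and helpers here are hypothetical placeholders. */
typedef struct leaf leaf;
typedef struct wh_index { struct wh_ht *ht; struct wh_node *root; } wh_index;

uint64_t key_hash(const char *key, size_t len);                      /* hypothetical */
leaf *ht_find_leaf(struct wh_ht *, uint64_t, const char *, size_t);  /* hypothetical */
leaf *tree_find_leaf(struct wh_node *, const char *, size_t);        /* hypothetical */

/* Point lookup: hash to the target leaf, then search only the leaf. */
leaf *find_leaf(wh_index *idx, const char *key, size_t len)
{
    leaf *lf = ht_find_leaf(idx->ht, key_hash(key, len), key, len);
    if (lf == NULL)                    /* miss: fall back to the ordered tree */
        lf = tree_find_leaf(idx->root, key, len);
    return lf;
}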

Wormhole is not portable C code; it will not build with just any C compiler. It makes heavy use of specialized CPU instructions (the reason I had to conduct my testing on a different machine). I can’t tell how much slower portable code would be; I guess probably no more than 10-25%.

I believe the author may have written code before, he seems confident if not competent.

The author compared Wormhole with Masstree and ART and found that the latter two are significantly slower (50%?) while taking significantly more memory (twice as much for some data sets?).

The Regular B-tree

The B-tree is, for some queer reason, seldom used in in-memory applications. Too bad, because it’s a good one. I’m talking primarily about the B-tree that stores small, fixed-size items in fixed-size nodes. Such a structure lacks the generalist character of, say, BSTs, but when it matches the task it can be several times faster while using several times less memory.

Back in the day I decided to write myself such a B-tree. Rightly or wrongly, at the time I did not like how the data structure shifts half a node for an average insertion, and so I wrote a B-tree that doesn’t do that.
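That shift is the memmove at the heart of sorted-node insertion; on average, half the node’s entries move (a sketch with integer keys for simplicity, not my B-tree’s code):

#include <string.h>
#include <stddef.h>

/* Insert into a sorted fixed-size node: everything after the
   insertion point shifts one slot, on average half the node. */
void node_insert(int *keys, int *nkeys, int pos, int key)
{
    memmove(&keys[pos + 1], &keys[pos],
            (size_t)(*nkeys - pos) * sizeof keys[0]);
    keys[pos] = key;
    (*nkeys)++;
}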

A couple of years later I learned there were other B-trees for in-memory use, and I thought of comparing mine with them. A couple of designs were floating around: one called “google” for reasons, the other “STX” (probably they did not like what STD typically means).

I tried the two and discovered they needed tweaking to render best results on my machine. I also played with my design, trying various types of intra-node searching: binary and linear.

The plot below shows the random insertion times I got. Mind you, it’s a very old plot, ten years old or older, and indicative of the machine I was using then.

There are two “google” lines and two “STX” lines, corresponding to the original and tweaked flavors. The blue line shows the all-binary node search version of my design, the gray line the version with binary search in non-root nodes and linear search in the root node, and the green line the all-linear search version.

The figure shows the comparative insertion times, and it’s the only interesting one. The three designs are far better matched in search time and indistinguishable in memory usage.

I compared the B-trees with other data structures at the time — again, many years have passed since then.

Here’s the hash table plot:

The hash set was not exactly faster, while using notably more memory.

There was a better data structure, one that still is very good, though not used as much as it should be: the Judy array.

I imagine the spikes are Judy transitioning between its many trie node coding classes.

The Judy integer set is very good in space too; it can beat the B-trees handily, though unlike for B-trees, its memory footprint is very dependent on the input.
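For anyone who hasn’t used it, the Judy1 (integer set) API is pleasantly small. A minimal example, assuming the standard Judy.h header:

#include <Judy.h>

int main(void)
{
    Pvoid_t set = (Pvoid_t)NULL;   /* an empty Judy1 set */
    Word_t  idx;
    int     rc;

    J1S(rc, set, 12345);           /* insert; rc = 1 if newly set */
    J1T(rc, set, 12345);           /* test; rc = 1 if present */

    idx = 0;
    J1F(rc, set, idx);             /* find first index >= 0, into idx */

    J1FA(idx, set);                /* free the set; idx gets bytes freed */
    return 0;
}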

The New Indexers, Masstree

A number of (relatively) recent arrivals have made the string indexing world, if not more exciting, at least more cluttered. As expected, great performance is claimed for these new solutions, and every author finds his own creation better, if not a lot better, if not an order of magnitude better, than the second best.

I tried checking some of the new indexers, to see how they stack up against older stuff. Included were ART and Masstree.

I’ll admit, I wasn’t particularly excited about the two. Both solutions seem fairly incomplete, leaving large areas of the problem space naked.

ART, for one, does not store the indexed strings; it expects the application to do so. That by itself is not a problem, but if the trie doesn’t store the strings, the trie does not get to remove the common prefixes between them. Thus ART cannot be efficient in space, because it doesn’t even try to be.

At any rate, I was not able to test ART. I got the code from its author, but I could not use it. The code demonstrated indexing 64-bit integers, and there was no advice on how to use it for variable-length strings. I asked for clarifications; I’m yet to get them.

I had better luck with Masstree. It is another design of doubtful merits: it does some things it should do, and it doesn’t do others. I found the very idea at its core surprising, and I did not know what to expect.
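That core idea, per the published design, is a trie of B+ trees: the key is consumed eight bytes at a time, each trie layer is a B+ tree indexed by a 64-bit key slice, and only keys sharing the same 8-byte prefix descend into a deeper layer. A rough sketch of the descent, with hypothetical types standing in for Masstree’s actual structures:

#include <stdint.h>
#include <string.h>
#include <stddef.h>

typedef struct layer layer;                        /* hypothetical */
typedef struct { int is_layer; union { layer *next; void *value; } u; } slot;

/* Hypothetical: B+ tree search within one layer, keyed by an
   8-byte slice (plus its length, for short final slices). */
slot *layer_find(layer *l, uint64_t slice, size_t slice_len);

void *mt_get(layer *root, const char *key, size_t len)
{
    layer *l = root;
    for (size_t off = 0; ; off += 8) {
        uint64_t slice = 0;
        size_t rem = off < len ? len - off : 0;
        size_t n = rem < 8 ? rem : 8;
        if (n) memcpy(&slice, key + off, n);  /* next 8-byte key slice */
        slot *s = layer_find(l, slice, n);    /* B+ tree search in this layer */
        if (s == NULL)    return NULL;        /* no such key */
        if (!s->is_layer) return s->u.value;  /* key ends in this layer */
        l = s->u.next;                        /* shared 8-byte prefix: go deeper */
    }
}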

I’ll admit, it would have been nicer to include Masstree in the plots from my previous posts, but a number of issues kept me from it. Masstree is not a library you can just include in your own application; it’s a lot of pain and effort to get the numbers for the plots, and ultimately, I felt it ain’t worth the trouble.

So I resorted to a simpler comparison. For three of the sets in the String Associative Tables posts, the EN Wiki set (ENW), the dissociative press set (PKD) and the file paths set (URL), I ran Masstree and my L2 Si trie for random insertions and lookups. The L2 Si is very similar in performance to the better-known Judy array for strings.

I collected the time and the memory usage. To me, they all matter: I don’t much care for a data structure that is 20% faster but takes twice as much memory.

Set    Item count    Byte size    Masstree insertion    L2 Si insertion    Masstree lookup    L2 Si lookup    Masstree mem usage    L2 Si mem usage
PKD    9.158M        104MB        8.11s                 5.94s              6.93s              4.83s           .9GB?                 ~270MB
ENW    10M           212MB        10.93s                8.67s              9.86s              7.89s           1.4GB?                ~350MB
URL    6.795M        978MB        13.43s                8.3s               11.89s             7.67s           2.1GB?                ~270MB

The numbers for L2 Si should match those in the previous post, but for good measure, I reran the tests to match the environment of the Masstree tests.

The insert test builds the index from zero, strings added in random order. The lookup test searches them also in random order.

I did run Masstree on a single thread.

The memory usage numbers for the L2 Si trie match across the sets because the sets match in cardinality and the data structure removes the common prefixes between strings.

I don’t know why Masstree takes that much memory. The numbers in the table are what I get from the OS, and they are obtained in the same way for both data structures.

Others find Masstree equally memory hungry, so maybe my results are not entirely wrong. Robert Binna’s HOT paper finds Masstree the worst offender among four tested indexers. The text has this weaselly inclusion: “to ensure a fair comparison we do not take the memory required to store Masstree’s tuples into account, as the space required to represent the raw tuples is not considered for any of the evaluated data structures.” That is, for every evaluated solution the reported memory usage is discounted by the baseline (i.e. “the space required to represent the raw tuples”). It’s this special accounting technique that allows the paper to show HOT and ART using less memory than the raw string collection.
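To see what that accounting does, take hypothetical numbers: if the raw tuples occupy 1000MB and an index needs 1200MB all told, the paper reports 1200MB − 1000MB = 200MB. Any structure whose overhead is smaller than the data it indexes thus gets reported as using less memory than the data itself.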

Database Benchmarking, II

While the keys I had for my first db benchmark were perfectly fine, most applications will see keys that are not that long. With that in mind, I decided to go the other way and test with short keys, 10.3 bytes long on average. For each key there is a 40-byte attached “value”.

I’ll repeat the tests from the previous post.

Insertions first.

              Pop. at start    Batch size    No. batches    Total keys    Sqlite time    nonblonde time
First run     0                65536         16             1048576       14.5s          3.8s
Second run    1048576          1             65536          65536         2.95s          2.2s

Again, allowing sqlite a large cache in the second run only hurt the speed. The number given is with no cache indication.

The file sizes.

                                         Sqlite file size    nonblonde file size
after first run, before reorganizing     88MB                287MB
before second run, after reorganizing    85MB                60MB

nonblonde’s space management is more liberal than Sqlite’s: the former tries to consolidate (localize) writing, while the latter is not bothered by scattered writing and is instead concerned with keeping tables sorted on disk and disk usage minimal. nonblonde also means to preserve a number of recent past transactions on disk (in this test the transaction size was large: 64K keys). These combine into a larger database size on disk for nonblonde, though most of the space taken is just preserved past transactions. This space can be reclaimed by compacting the database. On the other hand, the Sqlite policy of keeping tables/indexes in sort order on disk should yield better (or a lot better) subsequent scan and range query times.

And the query times.

Key count x times    Sqlite    nonblonde
64K x 1              2.12s     .4s
1 x 64K              1.4s      .132s
diff                 .72s      .268s

Database Benchmarking

I don’t believe much in benchmarking databases; I don’t think benchmarking results are very meaningful or indicative of how real applications will behave. There are simply too many factors that benchmarks fail to account for.

Still, not having the slightest clue about db performance is also bad.

I will look at two databases in this post, two that do not compare well. But it’s not my goal to compare them and decide that one is better than the other (especially since they do different things); rather, I want to learn something about what they do.

The first is the ubiquitous sqlite. The second is a KV store of my own writing, nonblonde.

https://sourceforge.net/projects/nonblonde/

https://nonblonde.sourceforge.net/

Now clearly nonblonde doesn’t do what sqlite does. Sqlite does complex querying and knows quite a few tricks that nonblonde doesn’t. Hence it might be a bit wrong to compare two such mismatched databases, but again, the goal is to learn, not to draw conclusions.

For the tests in this post I will have long keys. No database likes long keys much, and I expect all perform better with shorter keys. That’s just too bad. My keys will be 142 bytes long on average. There is nothing unrealistic about them: they are URLs and file system paths.

Every key has 40 bytes of associated data.

I’m trying to get the best performance out of the two databases, but I will not go to great lengths.

I set sqlite to WAL mode. I mean to give both databases plenty of cache.

I know what they both do when they sync files on each edit: the Sqlite docs say 60 transactions per second on a 7200RPM disk. I have an SSD, but I need go no further. I will not test the disk syncing performance; I want to know what the databases do with no syncing at all. This way, I can decide later what syncing policy suits my use.

I’ll do two insertion runs: first inserting 1M keys into an empty db in 64K batches, then inserting 64K keys, one per transaction.

Sqlite commands to setup the database:

PRAGMA cache_size=1048576;
PRAGMA journal_mode=WAL;
PRAGMA synchronous=OFF;
CREATE TABLE t1(a VARCHAR(640) primary key, c VARCHAR(40));
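The insert loop itself, roughly, for the batched runs (a minimal sketch against the sqlite3 C API, not the actual harness; key/value arrays and error handling elided):

#include <sqlite3.h>
#include <stddef.h>

/* Insert n key/value pairs, 'batch' per transaction. Sketch only. */
void insert_batched(sqlite3 *db, char **keys, char **vals,
                    size_t n, size_t batch)
{
    sqlite3_stmt *st;
    sqlite3_prepare_v2(db, "INSERT INTO t1(a,c) VALUES(?,?)", -1, &st, NULL);

    for (size_t i = 0; i < n; i++) {
        if (i % batch == 0)
            sqlite3_exec(db, "BEGIN", NULL, NULL, NULL);
        sqlite3_bind_text(st, 1, keys[i], -1, SQLITE_STATIC);
        sqlite3_bind_text(st, 2, vals[i], -1, SQLITE_STATIC);
        sqlite3_step(st);
        sqlite3_reset(st);
        if (i % batch == batch - 1)
            sqlite3_exec(db, "COMMIT", NULL, NULL, NULL);
    }
    if (n % batch)                      /* commit a trailing partial batch */
        sqlite3_exec(db, "COMMIT", NULL, NULL, NULL);
    sqlite3_finalize(st);
}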
              Pop. at start    Batch size    No. batches    Total keys    Sqlite time    nonblonde time
First run     0                65536         16             1048576       32s            5s
Second run    1048576          1             65536          65536         6.5s           2.6s

We need to understand what the two databases are doing.

Sqlite writes the data into one lump table. To query the table efficiently later, it needs either an index on the table or a primary key; here, I chose the latter. Sqlite does not strip common prefixes from keys. The primary key clause must be creating an index by itself. The end result is that each key is written twice: once in the “table”, once in the “index”.

In contrast, nonblonde is only a KV store: a B-tree. Properly, nonblonde allows tables, and each table is written in its own B-tree. nonblonde knows transactions and stuff. It also knows to remove common prefixes from keys. And since the table is the index and the index the table, the keys are stored once.

I don’t think what sqlite does is wrong; it might just not be what everybody wants. Conversely, if querying on multiple criteria is envisaged, one would add several tables (indexes) to nonblonde and might end up exactly where sqlite is.

The two databases have different goals for data insertion. Sqlite tries hard to keep data in order, for the purpose of optimizing subsequent range queries. Curiously (or bizarrely), it does not believe it can rise to the task, and thus its docs recommend optimizing the database daily. nonblonde recognizes it cannot chase this organization goal and chooses to localize writing, with the result that the file space is not managed very tightly. Just as for sqlite, rewriting the database tables in sequential order would yield better range query times.

The resulting database files are larger than they need to be, especially so for nonblonde. They can be reorganized for a tighter fit.

                                         Sqlite file size    nonblonde file size
after first run, before reorganizing     517MB               419MB
before second run, after reorganizing    486MB               87MB

Due to its prefix stripping, nonblonde managed to actually compress the data. Sqlite, on the other hand, wrote each key twice.

The times given for sqlite are the best obtained, but perversely, under different conditions. I meant to give plenty of cache to both databases, but that did not go as I expected for sqlite. With plenty of cache, sqlite completed the first run in 32s; without a cache indication, it took 72s. The 6.5s time for the second run was obtained without indicating a cache size; when I tried giving it a cache large enough to hold the entire database, I got 34s instead.

The insertion times being as they are, let’s see how fast we can query data.

At first I use a scripted querying method that is not exactly efficient, driving large generation and parsing times. These times are reflected in the numbers given below, and they affect sqlite disproportionately.

I query present keys, randomly selected.

Key count    Sqlite    nonblonde
1024         .119s     .100s
4096         .296s     .109s
16384        .942s     .244s
65536        3.372s    .504s

The database is opened anew for each of the query runs. nonblonde has a large database opening time, which is reflected in the obtained timings.

Both sqlite and nonblonde are allowed plenty of cache, yet sqlite does not seem to benefit from it.

I need to factor out the input generation and parsing time, so I compare the times I get for querying 64K distinct keys against querying the same key 64K times. Both runs pay the same generation and parsing overhead, so the difference between them approximates the cost of the distinct-key lookups themselves.

Key count x times    Sqlite    nonblonde
64K x 1              3.159s    .493s
1 x 64K              1.786s    .17s
diff                 1.373s    .323s

Sqlite searches the index and then goes to retrieve the record from the table, doing reads in both structures. nonblonde has to read only one structure.

Sqlite sees faster times, by about 10-15%, if all the queries are put in one “transaction”.

Finally, I look at the cold vs hot query times. The numbers above are for a database that’s likely in the OS buffers. I query 64K keys after a system restart, so that the database is read from disk, and then query the same keys again, to see the results coming from the OS buffers.

Db state             Sqlite    nonblonde
Cold (first run)     30.5s     6.3s
Hot (second run)     3.49s     .5s