I’m working on constructing DITA key spaces for our content. My current implementation builds an XQuery map that contains the key space data as well as the XML data from which the key space was constructed, which can be substantial (1–2 megabytes of XML all told—key space construction effectively requires constructing a single “resolved” DITA map from a tree of map documents, which I’m then storing in my key space map). A given key space might have 100,000 individual name-to-element mappings in addition to the raw DITA content.
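To make the shape concrete, here is a simplified sketch of what one of these key space maps looks like—the scope name, key name, and keydef content are invented for illustration:

```xquery
(: Toy sketch of the key space structure; real key spaces have ~100,000 entries :)
let $keySpace := map {
  'keyscopes': map {
    'root': map {
      'keydefs': map {
        'product-name':
          <keydef keys="product-name">
            <topicmeta><keywords><keyword>Example Product</keyword></keywords></topicmeta>
          </keydef>
      }
    }
  }
}
return $keySpace?keyscopes?('root')?keydefs?('product-name')
```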
I tried saving these maps into a database using db:put-value(), which worked.
But when I went to fetch them back out, it took 30 seconds to retrieve one of these maps. My expectation was that retrieving a map would be quite fast.
I’m sure that it was silly to try to store all the raw markup in these maps, and I will change that aspect of my implementation to store the resolved maps separately and then use node IDs in the key space map.
Worst case, I can do what I have been doing with BaseX 9: convert the XQuery map to an XML representation and use that for retrieval. That will of course work fine, at the cost of more code and a few more milliseconds of ingestion processing.
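For reference, the XML fallback is roughly this—the element and attribute names, database name, and toy data are all invented for illustration:

```xquery
(: Sketch: serialize name-to-node-ID mappings as XML instead of storing the XQuery map :)
let $keydefsByName := map { 'product-name': (1, 42) }  (: key name → node IDs (toy data) :)
let $keyspaceXml :=
  <keyspace>{
    map:for-each($keydefsByName, function($name, $ids) {
      <keydef name="{$name}" ids="{$ids}"/>
    })
  }</keyspace>
return db:put('my-db', document { $keyspaceXml }, 'keyspace.xml')
```

Retrieval then becomes an ordinary indexed path query, e.g. `db:get('my-db', 'keyspace.xml')//keydef[@name = $keyName]`, which can take advantage of the attribute index.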
Before I spend any more time trying to optimize the storage and retrieval of these maps, I wanted to ask a few questions:
1. Is my expectation that map retrieval should generally be very fast correct, modulo not having gobs of raw XML in them?
2. Is there a more efficient way to query over maps that are stored in a database? My initial attempt was to simply pull the entire map from the database and then operate on it as I would any other map, i.e.:
let $keySpace as map(*) := getKeySpace($rootMap)  (: Pulls the map from the database :)
let $keydefs as element()* := $keySpace?keyscopes?($scopeKey)?keydefs?($keyName)
return $keydefs
3. Is there some other maps-in-a-database optimization technique I’m overlooking?
Thanks,
Eliot
_____________________________________________
Eliot Kimber
Sr Staff Content Engineer
O: 512 554 9368 | M: 512 554 9368
https://www.servicenow.com
LinkedIn: https://www.linkedin.com/company/servicenow | Twitter: https://twitter.com/servicenow | YouTube: https://www.youtube.com/user/servicenowinc | Facebook: https://www.facebook.com/servicenow
Hi Eliot,
When stored XQuery values are requested, they are always fully materialized in main memory. Depending on the size, that may take a while.
The following query can be used to create a map with 1 million entries and store it in a database. It takes around 1200 ms on my machine:
let $data := map:merge((1 to 1000000) ! map:entry(., string()))
return db:put-value('my-db', $data, 'map')
The result is a file called `map.basex` that’s locally stored in `data/my-db/values/`. With the following query, this file is parsed, and the map is re-created, which takes around 500 ms:
map:size(db:get-value('my-db', 'map'))
If you increase the number of entries, you’ll observe that the execution times increase more or less linearly.
If you need index-based access to database entries with a short startup time, classical databases may still be the best fit.
Cheers, Christian
I reworked my map to replace all the literal XML with node IDs, storing the XML data separately.
With that change, it takes 150ms to load a large map on my laptop.
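For the record, the indirection looks roughly like this—a sketch assuming BaseX 10’s db:node-id()/db:get-id(), with the database name, document path, and key name invented for illustration:

```xquery
(: Sketch of the node-ID indirection, assuming the resolved DITA map is stored
   in a database named 'dita' :)

(: Ingestion: store the database node ID instead of the element itself :)
let $keydef := db:get('dita', 'resolved-map.xml')//*[@keys = 'product-name']
let $entry := map { 'product-name': db:node-id($keydef) }

(: Retrieval: resolve the IDs back to the stored nodes on access :)
return $entry?('product-name') ! db:get-id('dita', .)
```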
I agree that storing the map as XML will almost certainly be fastest, which I’ll probably do, but this result is definitely good enough for my current application.
I think I probably have Java/Python data-structure brain damage: it just seems so much easier to build maps and then access them using the “?” operator than to put the same data into XML and take advantage of the attribute and token indexes to optimize retrieval.
But in the sequence of “make it work, make it right, make it fast”, this is a pretty good result.
Cheers,
E.
From: Christian Grün <christian.gruen@gmail.com>
Date: Saturday, July 1, 2023 at 1:53 AM
To: Eliot Kimber <eliot.kimber@servicenow.com>
Cc: basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] Performance of db:get-value() for large maps