I reworked my map to replace all the literal XML with node IDs, storing the XML data separately.
With that change, it takes 150 ms to load a large map on my laptop.
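For readers following along, here is a minimal sketch of what a node-ID-based map could look like. It is not the actual code from this application: the database names `content-db` and `my-db`, the value path `topic-ids`, and the `topic`/`@id` structure are all hypothetical. It assumes BaseX 10 or later, where `db:put-value` and `db:get-value` are available.

```xquery
(: Build step: map each key to a node ID rather than to the XML node itself.
   'content-db', 'my-db', 'topic-ids' and the topic/@id structure are
   hypothetical names used only for illustration. :)
let $ids := map:merge(
  for $topic in db:get('content-db')//topic[@id]
  return map:entry(string($topic/@id), db:node-id($topic))
)
return db:put-value('my-db', $ids, 'topic-ids')
```

A later query could then load the small map and resolve a node only when it is needed:

```xquery
(: Lookup step: 'topic-42' is a hypothetical key. :)
let $ids := db:get-value('my-db', 'topic-ids')
return db:get-id('content-db', $ids?('topic-42'))
```

The idea is that the stored value holds only strings and integers, so there is much less to materialize when the map is loaded, while the XML stays in a regular database.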
I agree that storing the map data as XML will almost certainly be fastest, and I’ll probably do that, but this result is definitely good enough for my current application.
I think I probably have Java/Python data-structure brain damage: it just seems much easier to build maps and access them with the “?” lookup operator than to put the same data into XML and take advantage of the attribute and token indexes to optimize retrieval.
But in the sequence of “make it work, make it right, make it fast”, this is a pretty good result.
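To make the trade-off concrete, here is a rough sketch of the two access styles. The map version reuses the `my-db` database and `map` path from the quoted example below; the XML version assumes a hypothetical database `entries-db` that holds `<entry key="...">` elements and has its attribute index enabled.

```xquery
(: Map style: the whole stored map is materialized first,
   then a single key is looked up with the "?" operator. :)
let $map := db:get-value('my-db', 'map')
return $map?(42)
```

```xquery
(: XML style: 'entries-db' and the entry/@key structure are hypothetical.
   With an attribute index in place, BaseX may be able to rewrite this
   comparison to an index lookup instead of scanning every entry. :)
db:get('entries-db')//entry[@key = '42']
```

The map version is simpler to write, but it pays the load cost up front on every query, whereas the XML version only touches the entries it needs when the index applies.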
Cheers,
E.
_____________________________________________
Eliot Kimber
Sr Staff Content Engineer
servicenow.com
From: Christian Grün <christian.gruen@gmail.com>
Date: Saturday, July 1, 2023 at 1:53 AM
To: Eliot Kimber <eliot.kimber@servicenow.com>
Cc: basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] Performance of db:get-value() for large maps
Hi Eliot,
When stored XQuery values are requested, they are always fully materialized in main memory. Depending on the size of the value, that may take a while.
The following query can be used to create a map with 1 million entries and store it in a database. It takes around 1200 ms on my machine:
    let $data := map:merge((1 to 1000000) ! map:entry(., string()))
    return db:put-value('my-db', $data, 'map')
The result is a file called `map.basex` that’s locally stored in `data/my-db/values/`. With the following query, this file is parsed, and the map is re-created, which takes around 500 ms:
    map:size(db:get-value('my-db', 'map'))
If you increase the number of entries, you’ll observe that the execution times increase more or less linearly.
If you need index-based access to database entries with a short startup time, classical databases may still be the best fit.
Cheers,
Christian