I’m measuring the specific db:token() lookup in order to isolate it from the effects of other processing.
These are page view records per document covering several different published versions of each document, so for a given path you would expect at most three or four results, as opposed to 1000s of results.
My implementation is quite naïve in that I’m just chunking the raw CSV data into a database and then hoping the token index will provide good lookup results, which has been my experience with other queries (lookup times of 0.02 seconds or better). That makes the 0.3 second time a bit anomalous and makes me suspect an error on my end.
This is in the context of a generic “enable processing of any CSV data” feature, rather than a dedicated “report on page views data” feature, where I would construct a more efficient index (e.g., node IDs to page view data or something).
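For reference, an isolated timing of this kind of lookup could look roughly like the sketch below. It is not my actual query: the attribute name `path` and the sample value are placeholders, and it assumes the looked-up values live in attributes, since db:token() returns attribute nodes from the token index.

```xquery
(: Hypothetical sketch: the attribute name 'path' and the sample value
   are placeholders, not the real data. :)
let $path := 'docs/example-topic.html'
return prof:time(
  (: count() forces the lookup to be fully evaluated inside the timed call :)
  count(db:token('_analytics', $path, 'path')),
  'db:token lookup: '
)
```

prof:time() reports the elapsed time to standard error (or the Info View in the GUI) and returns the count, so the result size can be checked against the expected three or four hits per path.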
Here are the settings for the analytics database, which holds the CSV XML data:
NAME         | _analytics
SIZE         | 257 MB
NODES        | 9793157
DOCUMENTS    | 11
BINARIES     | 0
VALUES       | 0
TIMESTAMP    | 2024-07-14T20:49:34.624Z
UPTODATE     | ✓

RESOURCEPROPERTIES
INPUTPATH    |
INPUTSIZE    | 0 b
INPUTDATE    | 2024-04-17T21:37:04.516Z

INDEXES
TEXTINDEX    | ✓
ATTRINDEX    | ✓
TOKENINDEX   | ✓
FTINDEX      | –
TEXTINCLUDE  |
ATTRINCLUDE  |
TOKENINCLUDE |
FTINCLUDE    |
LANGUAGE     | English
STEMMING     | –
CASESENS     | –
DIACRITICS   | –
STOPWORDS    |
UPDINDEX     | ✓
AUTOOPTIMIZE | –
MAXCATS      | 100
MAXLEN       | 255
SPLITSIZE    | 0

Thanks,
Eliot
_____________________________________________
Eliot Kimber
Sr Staff Content Engineer
Digital Content & Design
O: 512 554 9368
M: 512 554 9368