Dear all at BaseX,
I'm working on a big database (more than 100 GB) containing several million articles, each identified by a unique element.
I frequently iterate over the list of identifiers, and each time I have to compute distinct-values(//myidentifier) to build that list.
Would it help to set MAXCATS to 2,000,000 or 3,000,000 before creating the database?
I cannot try it, because text index generation fails with a heap overflow.
Would it be possible to extend the MAXCATS option so that an individual MAXCATS value can be set for a specific element or attribute?
Finally, is there a way to complete the text index generation (Xmx is already set to the maximum)?
Best regards,
Fabrice ETANCHAUD
Senior Software Engineer
edital
Berkenlaan 1
B-1831 Brussels, Belgium
+32 2 716 32 32 general
+32 2 716 32 20 fax
fetanchaud@edital.com
corsearch.com http://www.corsearch.com/ | edital.com http://www.edital.com/
Fabrice,
Sorry for keeping you waiting.
Yes, it might help to increase the value of MAXCATS. All resulting statistics will be stored in the database, even if the text and/or attribute indexes are turned off. As you already indicated, memory may run low.
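For illustration, the option could be passed when the database is created. This is only a sketch: it assumes a BaseX version in which db:create accepts an options map, and the input path "/path/to/articles" is a placeholder. The text and attribute indexes are switched off here, since the statistics are stored regardless.

```xquery
(: Sketch only: "/path/to/articles" is a hypothetical input path,
   and the map-based options syntax is assumed to be available. :)
db:create(
  "db",
  "/path/to/articles",
  (),
  map {
    "maxcats": 3000000,    (: upper limit for distinct values kept in the statistics :)
    "textindex": false(),  (: skip the text index that ran out of heap space :)
    "attrindex": false()   (: skip the attribute index as well :)
  }
)
```

Whether the statistics themselves fit into memory for several million distinct identifiers would still have to be tried out.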
If your data is rather static, one common alternative is to manually extract and generate index data in additional databases. In your particular case, this could look as follows:
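1. Create an "index" database.
2. Run queries to fill the index database, e.g.:

   (: Each value is wrapped in an element: a bare sequence of atomic
      values inside document { } would end up as one single text node.
      The element names "index" and "entry" are arbitrary. :)
   let $name := "myidentifier"
   let $data := document {
     element index {
       for $value in distinct-values(db:open("db")//*[name() = $name])
       return element entry { $value }
     }
   }
   return db:add("index", $data, $name)

3. Run queries based on both databases:

   let $db := db:open("db")
   let $index := db:open("index", "myidentifier")
   for $i in $index/index/entry
   return ...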
Regarding the memory constraints: you may also split up your data into several databases and access all of them in a single query:

   for $i in 1 to 100
   let $db := db:open("db" || $i)
   return ...
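As a rough sketch of how the identifier list could then be built across the split databases: the names "db1" to "db100" follow the example above and are assumptions, as is the element name myidentifier taken from your mail. Databases that do not exist are skipped.

```xquery
(: Sketch: collect the distinct identifiers from all partial databases.
   Database names "db1" ... "db100" are assumed, not given. :)
distinct-values(
  for $i in 1 to 100
  let $name := "db" || $i
  where db:exists($name)
  return db:open($name)//myidentifier
)
```

Each partial database is smaller, so building its indexes and statistics should need less memory than for the single 100 GB instance.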
Hope this helps,
Christian
___________________________
BaseX-Talk mailing list BaseX-Talk@mailman.uni-konstanz.de https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk