Ok, so I tried to split the file into 256MB chunks. Now I'm getting:
"chunk_3.xml" (Line 7988121): Too many distinct element names (limit: 32768).
Which is actually true :-( The document has weird element names, like <i1>, <i2> ... and so on, up to <i50000>
This may be related to the out of memory error, too.
Is there a way to raise this limit?
Thanks
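
One way to sidestep the element-name limit, sketched here under stated assumptions, is to rewrite the numbered elements before import so that <i1>, <i2>, ... all share a single element name and the number moves into an attribute. The shell functions below are a rough, text-based sketch and not a real XML parser. The names count_element_names and normalize_numbered_elements are hypothetical helpers, and the attribute name n is arbitrary. The regular expressions assume that the numbered names match i[0-9]+ exactly and that no such tags occur inside comments or CDATA sections.

```shell
#!/bin/sh

# Count the distinct element names in an XML file.
# Rough, text-based: only start tags are matched, and names that appear
# inside comments or CDATA sections are counted as well.
count_element_names() {
  grep -oE '<[A-Za-z_][A-Za-z0-9_.:-]*' "$1" | sort -u | wc -l | tr -d ' '
}

# Rewrite numbered elements such as <i123>...</i123> to
# <i n="123">...</i>, so that all of them share one element name.
# Assumption: the numbered names match i[0-9]+ exactly.
normalize_numbered_elements() {
  sed -E \
    -e 's#<i([0-9]+)([[:space:]/>]|$)#<i n="\1"\2#g' \
    -e 's#</i[0-9]+[[:space:]]*>#</i>#g' "$1"
}
```

Usage could look like: normalize_numbered_elements chunk_3.xml > chunk_3_fixed.xml. Queries that addressed i123 would then need to select i[@n = '123'] instead.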


On Thu, 27 Feb 2025 at 18:25, Csaba Fekete <feketecsaba@gmail.com> wrote:
Yeah, I get the same error using this command, too.
Thanks

On Thu, 27 Feb 2025 at 17:43, Christian Grün <christian.gruen@gmail.com> wrote:
Just some quick feedback: Does it work if you specify the input along with CREATE DB?

basex -c"CREATE DB taurus SPANYOLORSZÁG.xml"

You can also specify a directory as input.

Thanks,
Christian



On Thu, 27 Feb 2025 at 17:36, Csaba Fekete <feketecsaba@gmail.com> wrote:
Hi Christian
Sorry, I thought I was sending this to the mailing list. Thanks for answering anyway!
Now I'm trying with a smaller dataset and I am adding the documents one by one. I also upgraded BaseX to the latest version.
The largest document is 1151M in size and it can't be imported, even if I use attrindex and textindex.
The file is actually publicly available: http://taurusreisen.hu/partner/v2/SPANYOLORSZAG.zip
Here is my command and the output:
/opt/basex/bin/basex -Oattrindex=true -Otextindex=true -v -V -c"OPEN taurus; ADD ./SPANYOLORSZÁG.xml"
Database 'taurus' was opened in 18.21 ms.
Out of Main Memory.

I am thinking of solving the problem by splitting the file into several chunks, which will be CPU-demanding but could make it work.
Any ideas are welcome.
Thank you again, and a million thanks for BaseX! It is a fantastic tool.
Regards,
Csaba

On Thu, 27 Feb 2025 at 15:52, Christian Grün <christian.gruen@gmail.com> wrote:
Hi Csaba,

It’s difficult to give general advice; XML documents are just too different. In principle, a few GB or even MB can be sufficient to create databases for very large collections (10 GB and more), but sometimes namespaces are a showstopper. See [1] for some statistics.

What’s the total size of your XML documents? Can you create the database if you enable the text and attribute index?

Best,
Christian





On Tue, Feb 25, 2025 at 2:10 PM Csaba Fekete <feketecsaba@gmail.com> wrote:
Hi
I have a web server that runs BaseX 11.1. The server is a VPS with 18G of RAM.
I have a directory of documents of various sizes, ranging from a few kilobytes up to 2G.
I am trying to import these documents with the command
CREATE DB mydb /path/to/docs
With the default JVM max heap size (2GB) I get the error: Out of main memory
If I raise the max heap size to 4GB, I get the same error.
If I raise it to 8GB, the system becomes unresponsive.
How can I determine how much system memory I need to be able to carry out this task?
Thanks