[basex-talk] TSV inputs and OOM errors

On Thu, Apr 23, 2026 at 10:17 PM Bridger Dyson-Smith <bdysonsmith@gmail.com> wrote:

Hi all -

I'm trying to work with some files that are TSV (tab separated values) that are fairly small (510K) but are causing OOM errors, even when launching the BaseX GUI with 8GB of memory. I've attached an example. Parsing options are separator = tab, format = direct, parse first line as table header = true; no indexes or full-text are selected.

I recall this was an issue with v12.1, but I think I found an alternate representation of the data and didn't report the problem. Is this a bug, or just user/operator error?

I'm using v12.3. Thanks in advance for any suggestions!

Best,
Bridger

Hi all -

I can parse the files - adding them to a database is the next step. Sorry for the noise (user/operator error for the win!).

Best,
Bridger

PS

    let $tsv := file:read-text("/home/bridger/Downloads/achs_Global_AllTitles_2026-04-01.txt", "UTF-8", true())
    let $opts := { 'header': true(), 'separator': fn:char('\t') }
    return csv:parse($tsv, $opts)

On Fri, Apr 24, 2026 at 7:24 AM Christian Grün <cg@basex.org> wrote:

Hi Bridger,

Could you describe how you triggered the OOM error (even if you may have found a solution to parse the file in the meantime)?

Thanks in advance,
Christian

On Fri, Apr 24, 2026 at 8:57 AM Bridger Dyson-Smith <bdysonsmith@gmail.com> wrote:

Hi Christian -

Thanks for taking a look at this. I apologize for not being clearer: I am using the BaseX GUI. Database > New > Create Database. Input format = CSV; and CSV Parsing options are Separator = Tab, Format = direct, Parse first line as table header = True. The 'creating database' progress bar gets to the 'Finishing' stage and then the OOM is thrown.

Let me know if I can share any other information. Enjoy the weekend!

Best,
Bridger

On Friday, April 24, 2026 at 16:04 Bridger Dyson-Smith <bdysonsmith@gmail.com> wrote:

Christian, et al,

I am also stumbling a bit on one of the CSV parsing options - values for the field-delimiter (or separator) are expected to be a single character. In the context of an options map in a function (csv:parse, e.g.), `fn:char('\t')` evaluates and is usable. However, in the context of setting a value in a command script, there doesn't seem to be this relatively simple work-around. Is there some other approach? I haven't explored command scripts sufficiently....

Thanks for your thoughts and advice!

Best,
Bridger

PS command script attempt:

    SET CSVPARSER header=true,separator=char('\t')
    CREATE DB kbart-examples
    ADD /home/bridger/Downloads/some_kbart_file_maybe.txt

The above results in 'The value of "separator" is not a single character: "char('\t')";'.

PPS An attempt with the XML syntax:

    <commands>
      <set option='csvparser'>header=true,field-delimiter=fn:char('\t')</set>
      <create-db name='kbart'/>
      <add path="achs">/home/bridger/Downloads/some_kbart_file_maybe.txt</add>
    </commands>

throws the same error (The value of 'field-delimiter' is not a single character...).

On Tue, Apr 28, 2026 at 9:09 AM Christian Grün <cg@basex.org> wrote:

Thanks, Bridger,

I appreciate your feedback; it has resulted in two new issues [1,2]. The OOM error is caused by an innocent empty line in the CSV input. And I agree, it has become surprisingly cumbersome to specify delimiters on the command line (I won’t go into detail how it *can* be done – it should definitely be more obvious ;·). We’ll take care of it soon.

Best,
Christian

[1] https://github.com/BaseXdb/basex/issues/2653
[2] https://github.com/BaseXdb/basex/issues/2654
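
Because the empty line Christian describes can be hard to spot by eye, a quick check before import may help on versions that still have the bug. The following is a minimal sketch and not part of the original thread: the path is a placeholder, and it assumes that file:read-text-lines from the BaseX File Module returns an empty string for each empty line in the input.

```xquery
(: Hypothetical path: replace it with the TSV file to check. :)
let $path := "/path/to/input.txt"
(: "UTF-8" and true() (fallback for invalid characters) mirror the
   file:read-text call shown earlier in this thread. :)
for $line at $pos in file:read-text-lines($path, "UTF-8", true())
(: Report only truly empty lines; rows that consist of tabs alone are not reported. :)
where $line = ""
return $pos
```

Under that assumption, the query returns the 1-based positions of empty lines, or an empty sequence if there are none. The check is line-oriented, so an empty line inside a quoted multi-line field would be reported as well, and how a trailing newline at the end of the file is handled has not been checked here.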

Dear Christian -

Thank you for the discovery! I confess that an empty line never occurred to me - the different ways (LibreOffice, Excel, and command line utilities) I had checked the data did not make it obvious that empty lines would be hanging out (and sullying the import process).

I'll look forward to learning more about the delimiter options :D

Best,
Bridger

On Fri, May 1, 2026 at 3:37 AM Christian Grün <cg@basex.org> wrote:

Hi Bridger,

> Thank you for the discovery! I confess that an empty line never occurred to me

That’s completely understandable. I was surprised to discover that this bug existed at all.

A new stable snapshot is available [1]. The empty-line issue has been resolved, and you can now specify backslash-escape sequences such as \t for delimiters (or \\t, if your command-line interpreter unescapes the input before it is passed to BaseX).

Hope this helps,
Christian

[1] https://files.basex.org/releases/latest/
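
With the snapshot Christian mentions, the earlier command script attempt could presumably be written with a plain \t escape. The following is an untested sketch under that assumption. The database name and file path are taken from the attempt earlier in the thread. The SET PARSER line is an addition that was not in that attempt, included on the assumption that the input should be handled by the CSV parser.

```text
# Sketch only: assumes a snapshot that accepts backslash escapes for delimiters.
# SET PARSER csv is an addition to the original attempt (assumption: the file is to be parsed as CSV).
SET PARSER csv
SET CSVPARSER header=true,separator=\t
CREATE DB kbart-examples
ADD /home/bridger/Downloads/some_kbart_file_maybe.txt
```

If the commands are passed through a shell instead of being run from a script file, the backslash may need to be doubled (\\t), as the message above notes.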

Hi Christian -

On Fri, May 1, 2026 at 3:37 AM Christian Grün <cg@basex.org> wrote:

> Hi Bridger,
> Hope this helps, Christian

YES - thank you so much! Have a wonderful weekend.

Best,
Bridger

Participants (2): Bridger Dyson-Smith, Christian Grün