Hi all,
I was wondering whether command scripts are optimized before they are executed. I would think so, because when I made an error in, say, the sixth command of my script, it was caught before the first command was executed.
But more importantly for my application: it seems that combinations of "CLOSE; OPEN <same DB as just closed>" are skipped.
I created an XQuery to copy 680000 records from my XML DB to MySQL. These records are MODS, found inside record elements at XPath /file/record. The database is about 2.5 GB and eats even more of my main memory when I try to copy all records in one run. I found out that processing subsequences of 50000 to 10000 records speeds up the process significantly and that the server uses a lot less memory. For instance, when I start processing records from 400001, BaseX apparently doesn't keep the first 400000 in memory. But processing 50000 records at a time, shifting the begin parameter of the subsequence and continuing, eventually slows the process from somewhere between tens and more than 100 records per second to less than one per second, probably because memory consumption grows to all available memory (I assigned 3 GB to the JVM at startup of the server). Therefore I tried to free some memory by closing the database, reopening it, and then resuming processing. But I don't see the memory usage reduction that I hoped for.
From my XQuery mods2sql5.xq:
<namespace declarations>
declare variable $begin as xs:integer external := 1;
declare variable $length as xs:integer external := 50000;
<function declarations: prepare statement return sql:execute-prepared(...)>
(: main process :)
for $record at $rodb in subsequence(/file/record[mods:mods][@id != ""], $begin, $length)
<call functions to process records>
return <functions' output as sequence>
From my command script mods2sql.bxs:
SET QUERYINFO true
OPEN monly2
SET BINDINGS $begin=1,$length=50000
RUN /Users/ben/Documents/xquery/metadata_only/mods2sql5.xq
SET BINDINGS $begin=50001
RUN /Users/ben/Documents/xquery/metadata_only/mods2sql5.xq
CLOSE
OPEN monly2
SET BINDINGS $begin=100001
RUN /Users/ben/Documents/xquery/metadata_only/mods2sql5.xq
SET BINDINGS $begin=150001
RUN /Users/ben/Documents/xquery/metadata_only/mods2sql5.xq
SET BINDINGS $begin=200001
RUN /Users/ben/Documents/xquery/metadata_only/mods2sql5.xq
CLOSE
OPEN monly2
...
Are my observations about optimizations of command scripts correct? Is there any way to speed up my process?
Thanks for any advice!
Regards,
Ben
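A command script of this shape is repetitive enough that it can be generated rather than typed by hand. The following is only a minimal sketch in Python, not part of BaseX: the function name batch_commands and its parameters are hypothetical, and it assumes the commands and the SET BINDINGS format shown in the script above. It passes both $begin and $length on every run and leaves out the CLOSE/OPEN pairs.

```python
def batch_commands(total, length, db, query_path):
    """Return BaseX command-script lines that run the query once per batch.

    Hypothetical helper: `total` is the number of records, `length` the batch
    size, `db` the database name and `query_path` the XQuery file to RUN.
    """
    lines = ["SET QUERYINFO true", f"OPEN {db}"]
    # $begin is 1-based, matching the subsequence() call in the query.
    for begin in range(1, total + 1, length):
        lines.append(f"SET BINDINGS $begin={begin},$length={length}")
        lines.append(f"RUN {query_path}")
    return lines


if __name__ == "__main__":
    script = batch_commands(
        680000,
        50000,
        "monly2",
        "/Users/ben/Documents/xquery/metadata_only/mods2sql5.xq",
    )
    print("\n".join(script))
```

Redirecting the printed output into a .bxs file gives one SET BINDINGS/RUN pair per batch of 50000 records, starting at $begin=1.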
Hi Ben,
> I was wondering whether command scripts are optimized before they are executed. I would think so, because when I made an error in, say, the sixth command of my script, it was caught before the first command was executed.
In fact, all commands are parsed before any of them is executed, which is why the error was reported up front. No further optimizations take place, though.
> But more importantly for my application: it seems that combinations of "CLOSE; OPEN <same DB as just closed>" are skipped.
Both commands should be executed, but the result will indeed be the same: an opened instance of the previously opened DB.
Regarding the queries you were calling in your script, we’ll probably need to see them in their full beauty in order to give you more helpful hints. Do you perform any XQuery Update operations that may blow up main memory or similar stuff?
Christian