Hi, I would like to know how to copy a database from one server to another. Do I have to export the XML, transfer it, and load it on the other server, or is there another way to do it? Thanks for the clarification. Regards, Martin Lourduswamy
On Mon, May 21, 2018 at 6:00 AM, <basex-talk-request@mailman.uni-konstanz.de> wrote:
Today's Topics:
- Fwd: BaseX performance improvements (Christian Grün)
- Re: BaseX performance improvements (Christian Grün)
Message: 1 Date: Sun, 20 May 2018 18:51:03 +0200 From: Christian Grün christian.gruen@gmail.com To: Martin Lourduswamy martin.louis@gmail.com, BaseX basex-talk@mailman.uni-konstanz.de Subject: [basex-talk] Fwd: BaseX performance improvements Message-ID: <CAP94bnPHW=NqzXxY-cfubGSLzWLoiiPzj_WfNj6tWYZFg0DtJA@mail.gmail.com> Content-Type: text/plain; charset="UTF-8"
Hi Martin (cc to the list),
- each XML node => 500 bytes
- total size of XML DB => 10 million nodes (which will grow continuously as new files are added).
So one node is one document, right? I'm just asking because, in XML terminology, each XML document has nodes itself: element nodes, attribute nodes, text nodes, etc.
If you are working with millions of documents, it can be helpful to work with a "daily" database, which contains auxiliary references to documents that have been deleted or updated. If there is a time interval in which nobody accesses your database (e.g. each night), you can merge these databases and recreate your index structures. When you query your database, you can add a few more lines of XQuery: first check your daily database, and (if it doesn't contain the required document) look up the document in your fully indexed database.
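The lookup order described above can be sketched as follows. This is only an illustration of the idea, not BaseX code: the two dictionaries are hypothetical stand-ins for the daily database and the fully indexed database, and the names `lookup`, `merge` and `DELETED` are made up for this example.

```python
from typing import Dict, Optional

# Assumption: a value of None in the daily store marks a document that
# was deleted since the last merge.
DELETED = None


def lookup(path: str,
           daily: Dict[str, Optional[str]],
           main: Dict[str, str]) -> Optional[str]:
    """Return the current version of a document, checking the daily store first."""
    if path in daily:
        # The daily store has the latest state (updated or deleted).
        return daily[path]
    # Otherwise fall back to the fully indexed main store.
    return main.get(path)


def merge(daily: Dict[str, Optional[str]], main: Dict[str, str]) -> None:
    """Fold the daily changes into the main store, e.g. once per night."""
    for path, doc in daily.items():
        if doc is DELETED:
            main.pop(path, None)
        else:
            main[path] = doc
    # After merging, the daily store starts empty again.
    daily.clear()
```

In a real setup, the merge step would be followed by rebuilding the index structures of the main database.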
I have to do an update of the nodes as I need to replace them. I chose delete + insert instead of replace
Please note that a replace operation can be much faster: If the structure of the updated document is similar to the old document, the document can be replaced in place.
Also, I am thinking of indexing on the node id, as that is like the primary key on which I base my operations, any suggestions might also be helpful,
Here it may also be interesting to know if you will only address documents or arbitrary nodes of your document. In the first case, addressing the document path via db:open should be sufficient [1]. In the latter case, you could possibly use the existing function db:node-id [2].
Hope this helps, Christian
[1] http://docs.basex.org/wiki/Databases [2] http://docs.basex.org/wiki/Database_Module#db:node-id
Thanks again, Regards Martin Lourduswamy
On Sun, May 20, 2018 at 9:28 AM, Christian Grün <christian.gruen@gmail.com> wrote:
Hi Martin,
I am new to BaseX, I would like to speed up XQuery insert, delete and replace operations through options.
Welcome to the list. As there are numerous ways to do updates in BaseX, feel free to give us more information on your insert and delete operations. Do you work with large single documents or many small documents? How large is your database?
While I query BaseX through Perl and try to connect through the GUI, the Perl connection aborts. Is there a parameter for parallel connections? Please let me know.
I can't tell why your Perl connection is interrupted by opening the GUI (because they should be completely independent from each other). A step-by-step description of how you proceeded might be helpful.
Because there is no coupling between GUI and the client/server architecture, however, you should avoid running updates outside the GUI. Please check out [1] for more information.
Best, Christian
[1] http://docs.basex.org/wiki/Startup#Concurrent_Operations
Thanks, Regards Martin Lourduswamy
Message: 2 Date: Sun, 20 May 2018 18:56:45 +0200 From: Christian Grün christian.gruen@gmail.com To: Martin Lourduswamy martin.louis@gmail.com, BaseX basex-talk@mailman.uni-konstanz.de Subject: Re: [basex-talk] BaseX performance improvements Message-ID: <CAP94bnN6w7ZaaNu43AqdzrKYsaqaxKGSamRNP8zy3fmEEJZ6Hg@mail.gmail.com> Content-Type: text/plain; charset="UTF-8"
Java heap space at C:\Users\lourduswamym\oracle_scripts/BaseXClient.pm line 213.
How much memory have you assigned to BaseX [1]?
Actually, the error is in the server log. The insert operation was taking 1+ minutes due to a slow network, and then the error happened. I have been running this server instance continuously for 1 day with continuous inserts from a Perl script.
If you create a new database, you can specify a directory with the files that are to be initially added to your database. This might save you a lot of time.
So I think I should be careful to run only a single client process at a time, or make sure memory is enough to handle multiple client operations. Please let me know your thoughts,
If your documents are small, you shouldn't usually encounter any errors. But it might have to do with the large number of documents you are dealing with. One more option is to distribute your docs across multiple databases (you can access all of them via a single XQuery expression).
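One way to decide which database a document goes into is a stable hash of its path. The following is a minimal sketch under assumptions that are not from this thread: the database names (`docs0`, `docs1`, ...) and the function name `shard_name` are hypothetical, and the number of databases is fixed up front.

```python
import zlib


def shard_name(doc_path: str, shards: int, prefix: str = "docs") -> str:
    """Map a document path to one of `shards` database names."""
    if shards < 1:
        raise ValueError("shards must be at least 1")
    # crc32 is stable across runs and machines, unlike Python's built-in hash().
    index = zlib.crc32(doc_path.encode("utf-8")) % shards
    return f"{prefix}{index}"
```

The same function can be used by every client, so reads and updates for a given document always go to the same database. Changing the number of databases later would require redistributing the documents.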
Best, Christian
[1] http://docs.basex.org/wiki/Start_Scripts
Thanks again, Regards Martin Lourduswamy
On Sun, May 20, 2018 at 9:48 AM, Martin Lourduswamy <martin.louis@gmail.com> wrote:
Hi,
Thanks for the clarifications. My database is
DB Size:
- each XML node => 500 bytes
- total size of XML DB => 10 million nodes (which will grow continuously as new files are added).
- I will have multiple DBs of the same size or larger, as the data grows continuously
I have to do an update of the nodes as I need to replace them. I chose delete + insert instead of replace for each node through Perl, as this way I can make sure there are no duplicate nodes left in the database (just a precaution: even if some nodes get duplicated through some other process, I will be able to remove them with the delete).
DB Speed:
I will disable logging as it might speed up the database operations.
DB Architecture
1 server (Windows or Linux) running BaseX
Multiple client computers querying the same machine through the fn:doc(...) API to get the data through Perl, then doing a delete + insert from Perl
I would like to know if you think this might work for production, and I would welcome any suggestions for data retrieval and update. Any changes to the architecture that might help are welcome as well.
Also, I am thinking of indexing on the node id, as that is like the primary key on which I base my operations, any suggestions might also be helpful, Thanks again, Regards Martin Lourduswamy
End of BaseX-Talk Digest, Vol 101, Issue 37
Hi Martin,
Export/import is one option, backup/restore is another [1]. A third could be to copy the database directory (provided the servers have a more or less similar setup).
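The third option could look roughly like the sketch below. It assumes that each database lives in its own subdirectory of the BaseX data directory, that both data directories are reachable from the machine running the script (e.g. via a mounted share), and that no BaseX instance is writing to the database while it is copied. The function name `copy_database` and the paths passed to it are hypothetical.

```python
import shutil
from pathlib import Path


def copy_database(src_data_dir: Path, dst_data_dir: Path, name: str) -> Path:
    """Copy the directory of database `name` from one data directory to another."""
    source = src_data_dir / name
    target = dst_data_dir / name
    if not source.is_dir():
        raise FileNotFoundError(f"no database directory at {source}")
    if target.exists():
        # Refuse to overwrite an existing database on the target side.
        raise FileExistsError(f"target already exists: {target}")
    # Copy all files of the database directory recursively.
    shutil.copytree(source, target)
    return target
```

If the two servers run clearly different BaseX versions, export/import or backup/restore is probably the safer route.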
Cheers, Alex
[1] http://docs.basex.org/wiki/Backups (can also be done via DBA [2]) [2] http://docs.basex.org/wiki/DBA
On 21. May 2018, at 18:32, Martin Lourduswamy martin.louis@gmail.com wrote:
Hi, I would like to know how to copy a database from one server to another. Do I have to export the XML, transfer it, and load it on the other server, or is there another way to do it? Thanks for the clarification. Regards, Martin Lourduswamy