I have a Java process that continually scans for incoming .xml files deposited into a system folder on the OS. When it finds a file (say a.xml), it creates a DB for that file (call it DB-A) and loads the XML file into that database. Note: DB names are guaranteed to be unique when created, so there will never be two DB-A databases.

As this file is processed by a secondary process, other XML files will be added to DB-A, but all documents will relate only to the processing of a.xml (for example, various statistics).

This secondary, separate Java process scans the database instances that have been created with the XML file loaded. It runs some queries on that file and adds additional XML documents to that database. That is, only this process will add new documents to the database.

So my question is this:

1. With process 1 creating the database and inserting the original .xml file, is there a chance of database contention, or is this architecture pretty safe from contention? Note: both Java processes simply use the BaseX.jar file.
2. If I added a third Java process in the future that would a) only access existing documents in read-only mode and b) possibly add new documents to the database that no other process would read or update, would this be safe from contention?
Thanks in advance.
Hi Buddy on web,
For all your questions, I can probably give you a short answer: you will need to use the client/server architecture of BaseX if you want to concurrently read and update one database. For general information on our transaction management, you could have a look at our Wiki article [1].
Hope this helps, Christian
[1] http://docs.basex.org/wiki/Transaction_Management
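As a rough illustration of the point about concurrent reads and updates: the sketch below is generic Java, not BaseX code, and `SharedStore` and `ContentionSketch` are hypothetical names. It shows a read/write lock coordinating several writers and readers inside one JVM. Such a lock lives in the memory of a single process, so, as far as I understand it, two separate Java processes that each embed BaseX.jar would not share it, which is presumably why a single server process is recommended for concurrent access to one database.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical stand-in for one database; not a BaseX class. */
class SharedStore {
    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    private final List<String> documents = new ArrayList<>();

    /** Writers take the exclusive lock, so no reader sees a half-applied update. */
    void add(String document) {
        lock.writeLock().lock();
        try {
            documents.add(document);
        } finally {
            lock.writeLock().unlock();
        }
    }

    /** Readers take the shared lock and may run in parallel with each other. */
    int count() {
        lock.readLock().lock();
        try {
            return documents.size();
        } finally {
            lock.readLock().unlock();
        }
    }
}

class ContentionSketch {
    /** Starts several threads that add and count documents; returns the final count. */
    static int run(int writers, int docsPerWriter) {
        SharedStore store = new SharedStore();
        List<Thread> threads = new ArrayList<>();
        for (int w = 0; w < writers; w++) {
            final int id = w;
            Thread t = new Thread(() -> {
                for (int d = 0; d < docsPerWriter; d++) {
                    store.add("stats-" + id + "-" + d + ".xml");
                    store.count(); // interleave a read with every write
                }
            });
            threads.add(t);
            t.start();
        }
        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return store.count();
    }

    public static void main(String[] args) {
        // Every add is serialized by the write lock, so no document is lost.
        System.out.println(run(3, 100));
    }
}
```

The lock only helps callers that share the same `SharedStore` object. Independent processes would need an external coordinator, which is the role a database server plays.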
On Thu, Dec 3, 2015 at 9:18 PM, buddyonweb-software@yahoo.com wrote:
[...]
Hi,
Is there a way to return all matches when searching a large XML structure? For example, return the genomic keywords that matched anywhere in $study using the following query:
for $study in db:open('CTGov')/clinical_study
let $result := $study contains text {
  'genomics', 'genomic', 'transcriptome', 'exome',
  'whole genome', 'microarray', 'proteome', 'metabolome'
}
let score $score := $result
where $score >= 0.01
return $study/id_info/nct_id (: this is just the Study ID :)
Ideally it would include an indication of where in the tree the matches are (e.g., that ‘exome’ was found in $study/official_title and in $article/keywords).
This could presumably be done using regular expression matching (after serializing the tree into a text string) but it does not seem an elegant solution.
Thanks, Ron
Hi Ron,
You can use ft:mark and ft:extract to highlight matches in a full-text result [1].
Hope this helps, Christian
[1] http://docs.basex.org/wiki/Full-Text_Module#ft:mark
On Thu, Dec 10, 2015 at 4:33 PM, Ron Katriel rkatriel@mdsol.com wrote:
[...]
Thanks, Christian. The following works as expected (the output contains the matches with their surrounding context)
for $study in db:open('CTGov')/clinical_study
let $result := $study contains text {
  'genomics', 'genomic', 'transcriptome', 'exome',
  'whole genome', 'microarray', 'proteome', 'metabolome'
}
let score $score := $result
where $score >= 0.01
return ft:extract($study//*[text() contains text {
  'genomics', 'genomic', 'transcriptome', 'exome',
  'whole genome', 'microarray', 'proteome', 'metabolome'
}])
Is it possible to combine the two patterns (i.e., the selection criteria and the extraction in the return) into a single one?
Perhaps this is what ft:mark is supposed to do but I could not get it to work...
Best, Ron
On December 10, 2015 at 11:24:38 AM, Christian Grün (christian.gruen@gmail.com) wrote:
[...]
Hi Ron,
> Is it possible to combine the two patterns (i.e., the selection criteria and the extraction in the return) into a single one?
ft:extract works the same as ft:mark, but it additionally chops your results down to the relevant parts of the result.
Here are two ways to shorten your query:

(: Variant 1 :)
let $terms := ('genomics', 'genomic')
for $study in db:open('CTGov')/clinical_study//*
  [text() contains text { $terms }]
return ft:extract($study[text() contains text { $terms }])

(: Variant 2 :)
let $terms := ('genomics', 'genomic')
return ft:extract(db:open('CTGov')/clinical_study//*
  [text() contains text { $terms }])
Christian
On Thu, 2015-12-10 at 17:24 +0100, Christian Grün wrote:
> Hi Ron,
> You can use ft:mark and ft:extract to highlight matches in a full-text result [1].

And what happens if a full-text match crosses an element boundary? For example, a search for "blue socks" matching
<p>He wore <sc>dark blue</sc> socks that day.</p>
could not return
<p>He wore <sc>dark <match>blue</sc> socks</match> that day.</p>
(Yes, I should test it, sorry! But the docs should probably mention it; it was a big part of the XPath/XQuery Full Text design early on.)
Liam
Dear Liam,
I am afraid that the full-text index will not find "blue socks", because it does not cross text() node boundaries:
http://docs.basex.org/wiki/Full-Text#Mixed_Content
Best regards, Fabrice
-----Original Message----- From: basex-talk-bounces@mailman.uni-konstanz.de [mailto:basex-talk-bounces@mailman.uni-konstanz.de] On behalf of Liam R. E. Quin Sent: Thursday, December 10, 2015 21:37 To: Christian Grün christian.gruen@gmail.com; Ron Katriel rkatriel@mdsol.com Cc: basex-talk@mailman.uni-konstanz.de Subject: Re: [basex-talk] Returning text matches
[...]
-- Liam R. E. Quin liam@w3.org The World Wide Web Consortium (W3C)
> I am afraid that the full-text index will not find "blue socks", because it does not cross text() node boundaries:
Exactly. You’ll need to do something like:
(: "update ()" is used to transform the node into a "database node" (find more info in the Wiki) :)
for $xml in <xml>
  <p>He wore <sc>dark blue</sc> socks that day.</p>
</xml> update ()
where $xml contains text 'blue socks'
return ft:mark(
  $xml[.//text() contains text { 'blue', 'socks' }]
)
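To make the text-node boundary issue concrete outside of BaseX, here is a small sketch using the standard Java DOM API. The class and method names (`TextNodeBoundary`, `inSingleTextNode`, `inStringValue`) are made up for this illustration, and plain substring matching stands in for real full-text tokenization. It shows that no single text node of Liam's example contains "blue socks", while the concatenated string value of the `<p>` element does. That is the same split the query above relies on: test the phrase against the whole element, then mark the individual words in its text nodes.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class TextNodeBoundary {
    /** Parses an XML string into a DOM document. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** True if at least one single text node below the given node contains the phrase. */
    static boolean inSingleTextNode(Node node, String phrase) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            return node.getNodeValue().contains(phrase);
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            if (inSingleTextNode(children.item(i), phrase)) {
                return true;
            }
        }
        return false;
    }

    /** True if the concatenated text of all descendants contains the phrase. */
    static boolean inStringValue(Node node, String phrase) {
        return node.getTextContent().contains(phrase);
    }

    public static void main(String[] args) {
        Node p = parse("<p>He wore <sc>dark blue</sc> socks that day.</p>")
            .getDocumentElement();
        // The text nodes are "He wore ", "dark blue" and " socks that day."
        System.out.println(inSingleTextNode(p, "blue socks")); // false
        System.out.println(inStringValue(p, "blue socks"));    // true
    }
}
```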