I am having an issue with the "group by" operator.
If I use it inside another "group by", the nested "group by" does not work as expected in the following query: $propvals in the nested "group by" does not include the grouped values.
let $list :=
  for $p in $prods//RECORD
  let $cid := $p/PROP[@NAME eq "SubCategoryId"]/PVAL
  let $allprops := $p/PROPGROUP/PROP
  let $propgprs :=
    for $prop in $allprops
    let $propname := $prop/@NAME
    let $propvals := $prop/PVAL
    group by $propname
    order by $propname
    return <PROPVALS name="{$propname}">{$propvals}</PROPVALS>
  group by $cid
  order by $cid
  return <CID cidid="{$cid}">{$propgprs}</CID>
Thanks
On Fri, May 13, 2011 at 9:08 AM, Erol Akarsu eakarsu@gmail.com wrote:
Is there a way the BaseX server can send us JSON output instead of XML?
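(For what it's worth, a JSON string can also be built by hand in plain XQuery. The sketch below is only an illustration for records shaped like my PROP/PVAL data; the function name is made up, and it does not escape quotes or other special characters:)

declare function local:record-to-json($rec as element(RECORD)) as xs:string {
  (: naive hand-rolled serialization: one JSON object per record :)
  concat("{ ",
    string-join(
      for $p in $rec//PROP
      return concat('"', string($p/@NAME), '": "', string($p/PVAL[1]), '"'),
      ", "),
    " }")
};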
Thanks
On Wed, May 11, 2011 at 10:28 AM, Erol Akarsu eakarsu@gmail.com wrote:
I am filtering one big XML document (about 800 MB) that is already stored in a BaseX database.
When I write the filtered data to a file, I get an out-of-memory error.
I believe BaseX builds the whole filtered result document in memory before writing it, which is why it runs out of memory.
Can BaseX write results block by block?
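If not, one workaround might be to evaluate the query slice by slice from the client, so that each call only materializes a bounded number of records and the client appends each slice to the output file. A rough sketch (the database name, slice size, and filter are made up; $start would be bound externally per call):

declare variable $start as xs:integer external;
(: return at most 10000 matching RECORD elements per invocation :)
for $r in doc('mydb')//RECORD[position() = $start to ($start + 9999)]
where $r/PROP[@NAME eq "SubCategoryId"]/PVAL = "384"
return $r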
Thanks
On Wed, May 4, 2011 at 9:38 AM, Erol Akarsu eakarsu@gmail.com wrote:
I tried to delete some nodes with the "delete nodes .." XQuery Update command. It actually did what was requested.
But when I checked the database size, it was still the same as before. The deleted nodes contained a lot of data, and I would expect the database and its indexes to be adjusted accordingly.
Then I exported the database as an XML file and recreated the database from it. The recreated database has the real (smaller) size.
My question is: why does the database not adjust its size and indexes when nodes are deleted?
Thanks
Erol Akarsu
On Fri, Apr 29, 2011 at 9:23 AM, Erol Akarsu eakarsu@gmail.com wrote:
Hi,
Has any thought been given to clustering BaseX servers so that we can partition XML documents?
Here, I am only interested in global partitioning:
Let's say we have an XML document like the one below. I would like to partition the RECORDS content so that each host holds an equal number of RECORD elements; the results of the individual hosts would then be aggregated. Can we implement this simple clustering framework with BaseX?
<RECORDS>
  <RECORD> ..... </RECORD>
  <RECORD> ..... </RECORD>
</RECORDS>
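The host assignment itself is easy to express. As a sketch (the database name and host count are invented), a round-robin partitioning of RECORD elements could look like this, in the same let-then-group-by style used elsewhere in this thread:

let $hosts := 4
for $r at $i in doc('mydb')//RECORD
let $host := ($i - 1) mod $hosts  (: round-robin assignment :)
group by $host
order by $host
return <HOST id="{$host}" records="{count($r)}"/>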
On Mon, Apr 11, 2011 at 12:35 PM, Erol Akarsu eakarsu@gmail.com wrote:
Ok,
I was able to run full-text search with another XML database.
I am primarily interested in how BaseX will handle the big Wikipedia XML file.
Actually, creating the Wikipedia database is fine. But when I add full-text search indexes for it, it always throws an out-of-memory exception.
I have set -Xmx to 6 GB, which is still not enough to generate the indexes for Wikipedia.
Can you help me with how to generate the indexes on a machine that has 6 GB for the BaseX process?
Thanks
Erol Akarsu
On Sun, Apr 10, 2011 at 4:07 AM, Andreas Weiler <andreas.weiler@uni-konstanz.de> wrote:
Hi,
the following query could work for you:
declare namespace w="http://www.mediawiki.org/xml/export-0.5/";
for $i in doc("enwiki-latest-pages-articles")//w:sitename
return $i[. contains text "Wikipedia"]/..
-- Andreas
On 09.04.2011 at 20:43, Erol Akarsu wrote:
Hi
I am having difficulty running the full-text operators. This script gives the siteinfo element below:

declare namespace w="http://www.mediawiki.org/xml/export-0.5/";
let $d := fn:doc("enwiki-latest-pages-articles")
return ($d//w:siteinfo)[1]

But

return $d//w:siteinfo[w:sitename contains text 'Wikipedia']

does NOT give the same node. Why does the "contains text" full-text operator behave incorrectly? I remember it working fine. I just dropped and recreated the database and turned on all indexes. Can you help me? The query info is here:
Query: declare namespace w="http://www.mediawiki.org/xml/export-0.5/";
Compiling:
- pre-evaluating fn:doc("enwiki-latest-pages-articles")
- adding text() step
- optimizing descendant-or-self step(s)
- removing path with no index results
- pre-evaluating (())[1]
- binding static variable $res
- pre-evaluating fn:doc("enwiki-latest-pages-articles")
- binding static variable $d
- adding text() step
- optimizing descendant-or-self step(s)
- removing path with no index results
- simplifying flwor expression
Result: ()
Timing:
- Parsing: 0.46 ms
- Compiling: 0.42 ms
- Evaluating: 0.17 ms
- Printing: 0.1 ms
- Total Time: 1.15 ms
Query plan:
<sequence size="0"/>
<siteinfo xmlns="http://www.mediawiki.org/xml/export-0.5/"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <sitename>Wikipedia</sitename>
  <base>http://en.wikipedia.org/wiki/Main_Page</base>
  <generator>MediaWiki 1.17wmf1</generator>
  <case>first-letter</case>
  <namespaces>
    <namespace key="-2" case="first-letter">Media</namespace>
    <namespace key="-1" case="first-letter">Special</namespace>
    <namespace key="0" case="first-letter"/>
    <namespace key="1" case="first-letter">Talk</namespace>
    <namespace key="2" case="first-letter">User</namespace>
    <namespace key="3" case="first-letter">User talk</namespace>
    <namespace key="4" case="first-letter">Wikipedia</namespace>
    <namespace key="5" case="first-letter">Wikipedia talk</namespace>
    <namespace key="6" case="first-letter">File</namespace>
    <namespace key="7" case="first-letter">File talk</namespace>
    <namespace key="8" case="first-letter">MediaWiki</namespace>
    <namespace key="9" case="first-letter">MediaWiki talk</namespace>
    <namespace key="10" case="first-letter">Template</namespace>
    <namespace key="11" case="first-letter">Template talk</namespace>
    <namespace key="12" case="first-letter">Help</namespace>
    <namespace key="13" case="first-letter">Help talk</namespace>
    <namespace key="14" case="first-letter">Category</namespace>
    <namespace key="15" case="first-letter">Category talk</namespace>
    <namespace key="100" case="first-letter">Portal</namespace>
    <namespace key="101" case="first-letter">Portal talk</namespace>
    <namespace key="108" case="first-letter">Book</namespace>
    <namespace key="109" case="first-letter">Book talk</namespace>
  </namespaces>
</siteinfo>
On Mon, Apr 4, 2011 at 7:31 AM, Erol Akarsu eakarsu@gmail.com wrote:

I imported the Wikipedia XML into BaseX and tried to search it.

But searching it takes a long time. I tried to retrieve one element that is the first child of the whole document, and it took 52 seconds. I know the XML file is very big (31 GB). How can I optimize the search?

declare namespace w="http://www.mediawiki.org/xml/export-0.5/";
let $d := fn:doc("enwiki-latest-pages-articles")//w:siteinfo
return $d

Database info:

> open enwiki-latest-pages-articles
Database 'enwiki-latest-pages-articles' opened in 778.49 ms.

> info database
Database Properties
Name: enwiki-latest-pages-articles
Size: 23356 MB
Nodes: 228090153
Height: 6

Database Creation
Path: /mnt/hgfs/C/tmp/enwiki-latest-pages-articles.xml
Time Stamp: 03.04.2011 12:29:15
Input Size: 30025 MB
Encoding: UTF-8
Documents: 1
Whitespace Chopping: ON
Entity Parsing: OFF

Indexes
Up-to-date: true
Path Summary: ON
Text Index: ON
Attribute Index: ON
Full-Text Index: OFF

Timing info:

Query: declare namespace w="http://www.mediawiki.org/xml/export-0.5/";
Compiling:
- pre-evaluating fn:doc("enwiki-latest-pages-articles")
- optimizing descendant-or-self step(s)
- binding static variable $d
- removing variable $d
- simplifying flwor expression
Result: element siteinfo { ... }
Timing:
- Parsing: 1.4 ms
- Compiling: 52599.0 ms
- Evaluating: 0.28 ms
- Printing: 0.62 ms
- Total Time: 52601.32 ms
Query plan:
<DBNode name="enwiki-latest-pages-articles" pre="5"/>

Result of query:

<siteinfo xmlns="http://www.mediawiki.org/xml/export-0.5/"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <sitename>Wikipedia</sitename>
  <base>http://en.wikipedia.org/wiki/Main_Page</base>
  <generator>MediaWiki 1.17wmf1</generator>
  <case>first-letter</case>
  <namespaces>
    <namespace key="-2" case="first-letter">Media</namespace>
    <namespace key="-1" case="first-letter">Special</namespace>
    <namespace key="0" case="first-letter"/>
    <namespace key="1" case="first-letter">Talk</namespace>
    <namespace key="2" case="first-letter">User</namespace>
    <namespace key="3" case="first-letter">User talk</namespace>
    <namespace key="4" case="first-letter">Wikipedia</namespace>
    <namespace key="5" case="first-letter">Wikipedia talk</namespace>
    <namespace key="6" case="first-letter">File</namespace>
    <namespace key="7" case="first-letter">File talk</namespace>
    <namespace key="8" case="first-letter">MediaWiki</namespace>
    <namespace key="9" case="first-letter">MediaWiki talk</namespace>
    <namespace key="10" case="first-letter">Template</namespace>
    <namespace key="11" case="first-letter">Template talk</namespace>
    <namespace key="12" case="first-letter">Help</namespace>
    <namespace key="13" case="first-letter">Help talk</namespace>
    <namespace key="14" case="first-letter">Category</namespace>
    <namespace key="15" case="first-letter">Category talk</namespace>
    <namespace key="100" case="first-letter">Portal</namespace>
    <namespace key="101" case="first-letter">Portal talk</namespace>
    <namespace key="108" case="first-letter">Book</namespace>
    <namespace key="109" case="first-letter">Book talk</namespace>
  </namespaces>
</siteinfo>

Thanks

Erol Akarsu
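One thing that might be worth trying here (only a sketch, and it assumes the export's root element is w:mediawiki) is to replace the descendant step //w:siteinfo, which in the worst case scans the whole 228-million-node tree, with explicit child steps:

declare namespace w="http://www.mediawiki.org/xml/export-0.5/";
(: child steps only inspect the root element's direct children :)
doc("enwiki-latest-pages-articles")/w:mediawiki/w:siteinfo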
Hi Erol,
Thanks for your report. Could you (off-list) provide me with a small part of the source file for reproducing this issue?
Kind regards Michael
--
Kind regards,
Michael Seiferle
Michael,
Just run the previous XQuery script I sent over testxquery.xml. You will see it.
Thanks
Hi Erol,
I received your file and ran a minimally modified version (added: let $prod := doc('textxquery.xml')) of your test query.
The results looked fine to me, so I tested it with two other implementations; both produced results equal to BaseX's.
What is the expected output of your query? Maybe we can work it out the "other way round" :-)
Kind regards Michael
let $prod := doc('textxquery.xml')
for $p in $prod//RECORD
let $cid := $p/PROP[@NAME eq "SubCategoryId"]/PVAL
let $allprops := $p/PROPGROUP/PROP
let $propgprs :=
  for $prop in $allprops
  let $propname := $prop/@NAME
  let $propvals := $prop/PVAL
  group by $propname
  order by $propname
  return <PROPVALS name="{$propname}">{$propvals}</PROPVALS>
group by $cid
order by $cid
return <CID cidid="{$cid}">{$propgprs}</CID>
Thanks for looking at it.
As I sent in the email chain, you will see this in the output: even though we group (in the innermost group by) by the @NAME attribute, we are still getting duplicate values here, like "Spec" and "Features":
<CID cidid="384">
  <PROPVALS name="Features"/>
  <PROPVALS name="Model"/>
  <PROPVALS name="Spec"/>
  <PROPVALS name="Features"/>
  <PROPVALS name="Model"/>
  <PROPVALS name="Spec"/>
  <PROPVALS name="Features"/>
  <PROPVALS name="Manufacturer Warranty"/>
  <PROPVALS name="Model"/>
  <PROPVALS name="Spec"/>
</CID>
Hi Erol,
If you leave out the last group by, you can see what is actually happening:
( let $prod := doc('testxquery')
  for $p in $prod//RECORD
  let $cid := $p/PROP[@NAME eq "SubCategoryId"]/PVAL
  let $allprops := $p/PROPGROUP/PROP
  let $propgprs :=
    for $prop in $allprops
    let $propname := $prop/@NAME
    let $propvals := $prop/PVAL
    group by $propname
    order by $propname
    return <PROPVALS name="{$propname}">{$propvals}</PROPVALS>
  order by $cid
  return <CID cidid="{$cid}">{$propgprs}</CID>
)[./@cidid eq "384"]
The behavior you observe is correct for the data you provided.
To see what happened, have a look at the following XQuery:
( let $prod := doc('testxquery')
  for $p in $prod//RECORD
  let $cid := $p/PROP[@NAME eq "SubCategoryId"]/PVAL
  let $allprops := $p/PROPGROUP/PROP
  order by $cid
  return <CID cidid="{$cid}">{
    for $prop in $allprops
    let $propname := $prop/@NAME
    let $propvals := $prop/PVAL
    group by $propname
    order by $propname
    return <PROPVALS name="{$propname}">{$propvals}</PROPVALS>
  }</CID>
)[./@cidid eq "384"]
I removed the last group by and inlined $propgprs to make the point clear: for @cidid 384, the query returns 3 CID elements. If you then group by $cid, the contents of those 3 elements are concatenated into a single group, which is why you get the duplicates.
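If what you actually want is a single list of distinct PROPVALS per CID, one possible rewrite (just a sketch, not tested against your full data) is to group by $cid first and then run the inner grouping over all properties collected for the group:

let $prod := doc('testxquery')
for $p in $prod//RECORD
let $cid := $p/PROP[@NAME eq "SubCategoryId"]/PVAL
let $allprops := $p/PROPGROUP/PROP
group by $cid
order by $cid
return <CID cidid="{$cid}">{
  (: after "group by $cid", $allprops holds the PROPs of all records in
     the group, so grouping them by @NAME yields each name only once :)
  for $prop in $allprops
  let $propname := $prop/@NAME
  let $propvals := $prop/PVAL
  group by $propname
  order by $propname
  return <PROPVALS name="{$propname}">{$propvals}</PROPVALS>
}</CID>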
I hope this was concise enough to clear up the confusion.
Kind regards
Michael