I imported the Wikipedia XML dump into BaseX and tried to search it,
but searching takes a long time.
I searched for a single element that is the first child of the whole document, and it took 52 seconds. I know the XML file is very big (31 GB). How can I optimize the search?
declare namespace w="http://www.mediawiki.org/xml/export-0.5/";
let $d := fn:doc ("enwiki-latest-pages-articles")//w:siteinfo return $d
Database info:
open enwiki-latest-pages-articles
Database 'enwiki-latest-pages-articles' opened in 778.49 ms.
info database
Database Properties
  Name: enwiki-latest-pages-articles
  Size: 23356 MB
  Nodes: 228090153
  Height: 6
Database Creation
  Path: /mnt/hgfs/C/tmp/enwiki-latest-pages-articles.xml
  Time Stamp: 03.04.2011 12:29:15
  Input Size: 30025 MB
  Encoding: UTF-8
  Documents: 1
  Whitespace Chopping: ON
  Entity Parsing: OFF
Indexes
  Up-to-date: true
  Path Summary: ON
  Text Index: ON
  Attribute Index: ON
  Full-Text Index: OFF
Timing info:
Query: declare namespace w="http://www.mediawiki.org/xml/export-0.5/";
Compiling:
- pre-evaluating fn:doc("enwiki-latest-pages-articles")
- optimizing descendant-or-self step(s)
- binding static variable $d
- removing variable $d
- simplifying flwor expression
Result: element siteinfo { ... }
Timing:
- Parsing: 1.4 ms
- Compiling: 52599.0 ms
- Evaluating: 0.28 ms
- Printing: 0.62 ms
- Total Time: 52601.32 ms
Query plan:
<DBNode name="enwiki-latest-pages-articles" pre="5"/>
Result of query:
<siteinfo xmlns="http://www.mediawiki.org/xml/export-0.5/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <sitename>Wikipedia</sitename>
  <base>http://en.wikipedia.org/wiki/Main_Page</base>
  <generator>MediaWiki 1.17wmf1</generator>
  <case>first-letter</case>
  <namespaces>
    <namespace key="-2" case="first-letter">Media</namespace>
    <namespace key="-1" case="first-letter">Special</namespace>
    <namespace key="0" case="first-letter"/>
    <namespace key="1" case="first-letter">Talk</namespace>
    <namespace key="2" case="first-letter">User</namespace>
    <namespace key="3" case="first-letter">User talk</namespace>
    <namespace key="4" case="first-letter">Wikipedia</namespace>
    <namespace key="5" case="first-letter">Wikipedia talk</namespace>
    <namespace key="6" case="first-letter">File</namespace>
    <namespace key="7" case="first-letter">File talk</namespace>
    <namespace key="8" case="first-letter">MediaWiki</namespace>
    <namespace key="9" case="first-letter">MediaWiki talk</namespace>
    <namespace key="10" case="first-letter">Template</namespace>
    <namespace key="11" case="first-letter">Template talk</namespace>
    <namespace key="12" case="first-letter">Help</namespace>
    <namespace key="13" case="first-letter">Help talk</namespace>
    <namespace key="14" case="first-letter">Category</namespace>
    <namespace key="15" case="first-letter">Category talk</namespace>
    <namespace key="100" case="first-letter">Portal</namespace>
    <namespace key="101" case="first-letter">Portal talk</namespace>
    <namespace key="108" case="first-letter">Book</namespace>
    <namespace key="109" case="first-letter">Book talk</namespace>
  </namespaces>
</siteinfo>
Thanks
Erol Akarsu
Hi Erol,
you could try the following alternatives:
- use wildcards instead of explicit namespaces: fn:doc ("enwiki-latest-pages-articles")//*:siteinfo
- wrap the query with a position predicate: ( fn:doc ("enwiki-latest-pages-articles")//*:siteinfo ) [1]
Feel free to ask for more, Christian
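Since Erol describes siteinfo as the first child of the document's root element, a path built from child steps is another variant worth trying: it names the exact location instead of asking for all descendants. This is only a sketch against the database name used in this thread; whether it compiles faster in this BaseX version is not confirmed anywhere in the discussion.

```xquery
(: Sketch, assuming siteinfo is a direct child of the root element,
   as described in the original question.
   /*          selects the root element, whatever its name is
   /*:siteinfo selects its siteinfo child in any namespace :)
doc("enwiki-latest-pages-articles")/*/*:siteinfo
```

If siteinfo sits deeper in the tree than assumed, this query returns an empty sequence, and the descendant variants above remain the ones to use.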
On Mon, Apr 4, 2011 at 4:31 PM, Erol Akarsu eakarsu@gmail.com wrote: [...]
BaseX-Talk mailing list BaseX-Talk@mailman.uni-konstanz.de https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk
I ran a test of XQuery Update.
It took 170 seconds to insert a node into the Wikipedia XML database. Is there a faster way to do it?
insert node <d/> after (fn:doc ("enwiki-latest-pages-articles")//*:page[w:title contains text "AccessibleComputing"] ) [1]
Query: declare namespace w="http://www.mediawiki.org/xml/export-0.5/";
Compiling:
- pre-evaluating fn:doc("enwiki-latest-pages-articles")
- optimizing descendant-or-self step(s)
Result: insert node element { "d" } { () } into (document-node { "enwiki-latest-pages-articles.xml" }/descendant::*:page[w:title contains text "AccessibleComputing"])[position() = 1]
Timing:
- Parsing: 0.27 ms
- Compiling: 167.38 ms
- Evaluating: 170264.31 ms
- Printing: 45.12 ms
- Total Time: 170477.1 ms
Query plan:
<Insert>
  <IterPosFilter>
    <IterPath>
      <DBNode name="enwiki-latest-pages-articles"/>
      <IterStep axis="descendant" test="*:page">
        <FTContains>
          <AxisPath>
            <IterStep axis="child" test="w:title"/>
          </AxisPath>
          <FTWords>
            <Item value="AccessibleComputing" type="xs:string"/>
          </FTWords>
        </FTContains>
      </IterStep>
    </IterPath>
    <Pos min="1" max="1"/>
  </IterPosFilter>
  <CElem>
    <Item value="d" type="xs:QName"/>
  </CElem>
</Insert>
Hi Erol,
running updates on large documents like yours is problematic in general - so I can't be sure what the problem is. Inserting a single element should not be a problem - but currently we have no test results that confirm this for very large documents.
At the moment I can only try to eliminate bottlenecks on the query side.
First you could try to rewrite your query. You could, again, use wildcards instead of an explicit namespace declaration to access the title element:
...//*:page[*:title ...]
Second, is the full text index active?
If none of this helps, it may also be convenient to carry out updates on a second instance of your database and switch between the two to keep the application up and running ...
Please, let us know how it goes. Kind regards, Lukas
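Applied to the update from the original message, Lukas's first suggestion might look like the sketch below. The only change is that *:title replaces w:title, so no namespace declaration is needed; the thread does not report whether this alone improves the 170 seconds.

```xquery
(: Sketch of the rewritten update: wildcard prefixes on both the
   page and the title step, and a positional filter so that only
   the first matching page is used as the insertion target. :)
insert node <d/> after
  (doc("enwiki-latest-pages-articles")
    //*:page[*:title contains text "AccessibleComputing"])[1]
```

On the second question: the database info posted earlier lists Full-Text Index: OFF. In BaseX the index can be built with the CREATE INDEX FULLTEXT command, which may take a while on a database of this size.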
On Tue, Apr 5, 2011 at 4:20 PM, Erol Akarsu eakarsu@gmail.com wrote: [...]
Hi,
I am having difficulty running full-text operators. This script returns the siteinfo element shown below:
declare namespace w="http://www.mediawiki.org/xml/export-0.5/";
let $d := fn:doc ("enwiki-latest-pages-articles")
return ($d//w:siteinfo)[1]
But
return $d//w:siteinfo[w:sitename contains text 'Wikipedia']
does NOT return the same node. Why does the "contains text" full-text operator behave incorrectly? I remember it working fine. I just dropped and recreated the database and turned on all indexes. Can you help me? Query info is here:
Query: declare namespace w="http://www.mediawiki.org/xml/export-0.5/";
Compiling:
- pre-evaluating fn:doc("enwiki-latest-pages-articles")
- adding text() step
- optimizing descendant-or-self step(s)
- removing path with no index results
- pre-evaluating (())[1]
- binding static variable $res
- pre-evaluating fn:doc("enwiki-latest-pages-articles")
- binding static variable $d
- adding text() step
- optimizing descendant-or-self step(s)
- removing path with no index results
- simplifying flwor expression
Result: ()
Timing:
- Parsing: 0.46 ms
- Compiling: 0.42 ms
- Evaluating: 0.17 ms
- Printing: 0.1 ms
- Total Time: 1.15 ms
Query plan:
<sequence size="0"/>
<siteinfo xmlns="http://www.mediawiki.org/xml/export-0.5/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <sitename>Wikipedia</sitename>
  <base>http://en.wikipedia.org/wiki/Main_Page</base>
  <generator>MediaWiki 1.17wmf1</generator>
  <case>first-letter</case>
  <namespaces>
    <namespace key="-2" case="first-letter">Media</namespace>
    <namespace key="-1" case="first-letter">Special</namespace>
    <namespace key="0" case="first-letter"/>
    <namespace key="1" case="first-letter">Talk</namespace>
    <namespace key="2" case="first-letter">User</namespace>
    <namespace key="3" case="first-letter">User talk</namespace>
    <namespace key="4" case="first-letter">Wikipedia</namespace>
    <namespace key="5" case="first-letter">Wikipedia talk</namespace>
    <namespace key="6" case="first-letter">File</namespace>
    <namespace key="7" case="first-letter">File talk</namespace>
    <namespace key="8" case="first-letter">MediaWiki</namespace>
    <namespace key="9" case="first-letter">MediaWiki talk</namespace>
    <namespace key="10" case="first-letter">Template</namespace>
    <namespace key="11" case="first-letter">Template talk</namespace>
    <namespace key="12" case="first-letter">Help</namespace>
    <namespace key="13" case="first-letter">Help talk</namespace>
    <namespace key="14" case="first-letter">Category</namespace>
    <namespace key="15" case="first-letter">Category talk</namespace>
    <namespace key="100" case="first-letter">Portal</namespace>
    <namespace key="101" case="first-letter">Portal talk</namespace>
    <namespace key="108" case="first-letter">Book</namespace>
    <namespace key="109" case="first-letter">Book talk</namespace>
  </namespaces>
</siteinfo>
Hi,
the following query could work for you:
declare namespace w="http://www.mediawiki.org/xml/export-0.5/";
for $i in doc("enwiki-latest-pages-articles")//w:sitename
return $i[. contains text "Wikipedia"]/..
-- Andreas
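If an exact match on the site name is enough, a plain string comparison is a different technique that avoids the full-text operator altogether. The sketch below is not part of Andreas's suggestion, and it is not equivalent to "contains text": a general comparison is case-sensitive and matches the whole string value, while "contains text" works on tokens. Whether BaseX answers it from the text index is an assumption and is not confirmed in this thread.

```xquery
declare namespace w = "http://www.mediawiki.org/xml/export-0.5/";

(: Exact comparison instead of "contains text": select sitename
   elements whose string value is exactly "Wikipedia", then step
   up to the parent siteinfo element. :)
doc("enwiki-latest-pages-articles")//w:sitename[. = "Wikipedia"]/..
```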
On 09.04.2011 at 20:43, Erol Akarsu wrote: [...]
Dear all,
I am trying to dynamically create an element using a computed namespace constructor, and I always get the error [XPST0003] Expecting "}", found "m".
I am trying this example:
let $nsURI := "http://example.org/metric-system",
    $attrname := "metric:unit",
    $attrvalue := "meter"
return element {"altitude"} {
  namespace metric {$nsURI},
  attribute {$attrname} {$attrvalue},
  "10000"
}
(from http://www.xml.com/pub/a/2003/09/10/xquery.html)
My question is: are namespace constructors supported?
Thanks in advance
Isidro
--- avast! Antivirus: Outbound message clean. Virus Database (VPS): 110411-0, 11-04-2011 Tested on: 11-04-2011 13:32:58 avast! - copyright (c) 1988-2011 AVAST Software. http://www.avast.com
Hi Isidro,
computed namespace constructors are part of XQuery 3.0 [1] and not yet part of BaseX. I also fear that, at the moment, there is no workaround that enables you to create namespaces dynamically (XQST0022 [2]).
Sorry for not being able to help ...
Kind regards, Lukas
[1] http://www.w3.org/TR/xquery-30/#id-computed-namespaces
[2] http://www.w3.org/TR/xquery-30/#id-namespaces
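When the namespace URI is already known at the time the query is written, the static route remains available: declare the prefix in the prolog and use it in the computed attribute name. This does not create a namespace dynamically, so it is not a workaround for the case Lukas describes; it is only a sketch of the static alternative, reusing the names from Isidro's example.

```xquery
(: Assumption: the URI is fixed in advance, so it can be declared
   in the prolog instead of being computed at run time. :)
declare namespace metric = "http://example.org/metric-system";

let $attrvalue := "meter"
return element altitude {
  (: the metric prefix is resolved against the prolog declaration :)
  attribute metric:unit { $attrvalue },
  "10000"
}
```

The result should be an altitude element that carries the metric:unit attribute together with the matching namespace declaration.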
On Mon, Apr 11, 2011 at 2:32 PM, Isidro Vila Verde jvverde@gmail.com wrote: [...]
Thank you, Lukas, for your quick response.
Best regards
Isidro
On 11-04-2011 at 13:58, Lukas Kircher wrote: [...]