Christian,

Is there a BaseX extension for writing an XML document to a file on the physical file system?
It would be very helpful to run XQuery operations and write the result to a file directly from within a script.
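A minimal sketch of what this could look like, assuming a BaseX release that ships the EXPath-style File Module with the "file" prefix predeclared (its file:write function serializes items to a path). The output path /tmp/siteinfo.xml is a hypothetical placeholder:

```xquery
declare namespace w = "http://www.mediawiki.org/xml/export-0.5/";

(: Assumption: the "file" prefix is bound to the File Module by BaseX.
   /tmp/siteinfo.xml is a hypothetical output path; adjust as needed. :)
let $result := (doc("enwiki-latest-pages-articles")//w:siteinfo)[1]
return file:write("/tmp/siteinfo.xml", $result)

(: If XQuery Update is enabled instead, the standard updating function
   fn:put($node, $uri) stores a single node at the given URI. :)
```

Whether the module is available, and with exactly this signature, depends on the installed BaseX version, so the module documentation for that release is the thing to check first.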

Thanks

Erol Akarsu

On Mon, Apr 11, 2011 at 12:37 PM, Christian Grün <christian.gruen@gmail.com> wrote:
Erol,

> I have set -Xmx to 6GB, but that is still not enough to generate the
> indexes for Wikipedia.

you might want to try the wildcard index, which you can activate via
"set wildcards on", or the GUI (Database -> New -> Full-Text ->
Support Wildcards); it has better optimizations for limited main
memory.

Hope this helps,
Christian

>
> Can you help me generate the indexes on a machine that can allocate only
> 6GB to the BaseX process?
>
> Thanks
>
> Erol Akarsu
>
>
>
> On Sun, Apr 10, 2011 at 4:07 AM, Andreas Weiler
> <andreas.weiler@uni-konstanz.de> wrote:
>>
>> Hi,
>> the following query could work for you:
>> declare namespace w="http://www.mediawiki.org/xml/export-0.5/";
>> for $i in doc("enwiki-latest-pages-articles")//w:sitename
>> return $i[. contains text "Wikipedia"]/..
>> -- Andreas
>> On 09.04.2011 at 20:43, Erol Akarsu wrote:
>>
>> Hi
>>
>> I am having difficulty running full-text operators. The script below
>> returns the siteinfo element shown further down:
>> declare namespace w="http://www.mediawiki.org/xml/export-0.5/";
>>  let $d :=  fn:doc ("enwiki-latest-pages-articles")
>> return ($d//w:siteinfo)[1]
>>
>> But return $d//w:siteinfo[w:sitename contains text 'Wikipedia'] does NOT
>> return the same node.
>> Why does the "contains text" full-text operator behave incorrectly? I
>> remember it working fine before. I just dropped and recreated the
>> database with all indexes turned on. Can you help me?
>> Query info is here:
>>
>> Query: declare namespace w="http://www.mediawiki.org/xml/export-0.5/";
>> Compiling:
>> - pre-evaluating fn:doc("enwiki-latest-pages-articles")
>> - adding text() step
>> - optimizing descendant-or-self step(s)
>> - removing path with no index results
>> - pre-evaluating (())[1]
>> - binding static variable $res
>> - pre-evaluating fn:doc("enwiki-latest-pages-articles")
>> - binding static variable $d
>> - adding text() step
>> - optimizing descendant-or-self step(s)
>> - removing path with no index results
>> - simplifying flwor expression
>> Result: ()
>> Timing:
>>  - Parsing:  0.46 ms
>>  - Compiling:  0.42 ms
>>  - Evaluating:  0.17 ms
>>  - Printing:  0.1 ms
>>  - Total Time:  1.15 ms
>> Query plan:
>> <sequence size="0"/>
>>
>>
>>
>> <siteinfo xmlns="http://www.mediawiki.org/xml/export-0.5/"
>> xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
>>   <sitename>Wikipedia</sitename>
>>   <base>http://en.wikipedia.org/wiki/Main_Page</base>
>>   <generator>MediaWiki 1.17wmf1</generator>
>>   <case>first-letter</case>
>>   <namespaces>
>>     <namespace key="-2" case="first-letter">Media</namespace>
>>     <namespace key="-1" case="first-letter">Special</namespace>
>>     <namespace key="0" case="first-letter"/>
>>     <namespace key="1" case="first-letter">Talk</namespace>
>>     <namespace key="2" case="first-letter">User</namespace>
>>     <namespace key="3" case="first-letter">User talk</namespace>
>>     <namespace key="4" case="first-letter">Wikipedia</namespace>
>>     <namespace key="5" case="first-letter">Wikipedia talk</namespace>
>>     <namespace key="6" case="first-letter">File</namespace>
>>     <namespace key="7" case="first-letter">File talk</namespace>
>>     <namespace key="8" case="first-letter">MediaWiki</namespace>
>>     <namespace key="9" case="first-letter">MediaWiki talk</namespace>
>>     <namespace key="10" case="first-letter">Template</namespace>
>>     <namespace key="11" case="first-letter">Template talk</namespace>
>>     <namespace key="12" case="first-letter">Help</namespace>
>>     <namespace key="13" case="first-letter">Help talk</namespace>
>>     <namespace key="14" case="first-letter">Category</namespace>
>>     <namespace key="15" case="first-letter">Category talk</namespace>
>>     <namespace key="100" case="first-letter">Portal</namespace>
>>     <namespace key="101" case="first-letter">Portal talk</namespace>
>>     <namespace key="108" case="first-letter">Book</namespace>
>>     <namespace key="109" case="first-letter">Book talk</namespace>
>>   </namespaces>
>> </siteinfo>
>>
>> On Mon, Apr 4, 2011 at 7:31 AM, Erol Akarsu <eakarsu@gmail.com> wrote:
>>>
>>> I imported the Wikipedia XML into BaseX and tried to search it.
>>>
>>> But searching it takes a long time.
>>>
>>> I tried to retrieve one element that is the first child of the whole
>>> document, and it took 52 seconds.
>>> I know the XML file is very big (31GB). How can I optimize the search?
>>>
>>> declare namespace w="http://www.mediawiki.org/xml/export-0.5/";
>>>
>>> let $d := fn:doc ("enwiki-latest-pages-articles")//w:siteinfo
>>> return $d
>>>
>>> Database info:
>>>
>>> > open enwiki-latest-pages-articles
>>> Database 'enwiki-latest-pages-articles' opened in 778.49 ms.
>>> > info database
>>> Database Properties
>>>  Name: enwiki-latest-pages-articles
>>>  Size: 23356 MB
>>>  Nodes: 228090153
>>>  Height: 6
>>>
>>> Database Creation
>>>  Path: /mnt/hgfs/C/tmp/enwiki-latest-pages-articles.xml
>>>  Time Stamp: 03.04.2011 12:29:15
>>>  Input Size: 30025 MB
>>>  Encoding: UTF-8
>>>  Documents: 1
>>>  Whitespace Chopping: ON
>>>  Entity Parsing: OFF
>>>
>>> Indexes
>>>  Up-to-date: true
>>>  Path Summary: ON
>>>  Text Index: ON
>>>  Attribute Index: ON
>>>  Full-Text Index: OFF
>>> >
>>>
>>>
>>> Timing info:
>>>
>>> Query: declare namespace w="http://www.mediawiki.org/xml/export-0.5/";
>>> Compiling:
>>> - pre-evaluating fn:doc("enwiki-latest-pages-articles")
>>> - optimizing descendant-or-self step(s)
>>> - binding static variable $d
>>> - removing variable $d
>>> - simplifying flwor expression
>>> Result: element siteinfo { ... }
>>> Timing:
>>>  - Parsing:  1.4 ms
>>>  - Compiling:  52599.0 ms
>>>  - Evaluating:  0.28 ms
>>>  - Printing:  0.62 ms
>>>  - Total Time:  52601.32 ms
>>> Query plan:
>>> <DBNode name="enwiki-latest-pages-articles" pre="5"/>
>>>
>>>
>>>
>>> Result of query:
>>>
>>> <siteinfo xmlns="http://www.mediawiki.org/xml/export-0.5/"
>>> xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
>>>   <sitename>Wikipedia</sitename>
>>>   <base>http://en.wikipedia.org/wiki/Main_Page</base>
>>>   <generator>MediaWiki 1.17wmf1</generator>
>>>   <case>first-letter</case>
>>>   <namespaces>
>>>     <namespace key="-2" case="first-letter">Media</namespace>
>>>     <namespace key="-1" case="first-letter">Special</namespace>
>>>     <namespace key="0" case="first-letter"/>
>>>     <namespace key="1" case="first-letter">Talk</namespace>
>>>     <namespace key="2" case="first-letter">User</namespace>
>>>     <namespace key="3" case="first-letter">User talk</namespace>
>>>     <namespace key="4" case="first-letter">Wikipedia</namespace>
>>>     <namespace key="5" case="first-letter">Wikipedia talk</namespace>
>>>     <namespace key="6" case="first-letter">File</namespace>
>>>     <namespace key="7" case="first-letter">File talk</namespace>
>>>     <namespace key="8" case="first-letter">MediaWiki</namespace>
>>>     <namespace key="9" case="first-letter">MediaWiki talk</namespace>
>>>     <namespace key="10" case="first-letter">Template</namespace>
>>>     <namespace key="11" case="first-letter">Template talk</namespace>
>>>     <namespace key="12" case="first-letter">Help</namespace>
>>>     <namespace key="13" case="first-letter">Help talk</namespace>
>>>     <namespace key="14" case="first-letter">Category</namespace>
>>>     <namespace key="15" case="first-letter">Category talk</namespace>
>>>     <namespace key="100" case="first-letter">Portal</namespace>
>>>     <namespace key="101" case="first-letter">Portal talk</namespace>
>>>     <namespace key="108" case="first-letter">Book</namespace>
>>>     <namespace key="109" case="first-letter">Book talk</namespace>
>>>   </namespaces>
>>> </siteinfo>
>>>
>>>
>>>
>>> Thanks
>>>
>>> Erol Akarsu
>>>
>>
>> _______________________________________________
>> BaseX-Talk mailing list
>> BaseX-Talk@mailman.uni-konstanz.de
>> https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk
>>
>