Hi Erol,


Running updates on large documents like yours is problematic
in general, so I can't be sure what the problem is. Inserting
a single element should not be a problem, but we currently
have no test results that confirm this for very large documents.


At the moment I can only try to eliminate bottlenecks on the
query side.

First you could try to rewrite your query. You could, 
again, use wildcards instead of an explicit namespace 
declaration to access the title element:

...//*:page[*:title ...]
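Applied to the update you posted, that would look roughly like
this. It is only a sketch of your own query with w:title replaced
by *:title (so the namespace declaration is no longer needed);
I have not timed it on a database of your size:

```xquery
(: Same update as before, but the title is matched by local name only :)
insert node <d/> after
  (fn:doc("enwiki-latest-pages-articles")
     //*:page[*:title contains text "AccessibleComputing"])[1]
```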

Second, is the full-text index active?
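The database info from your first mail lists "Full-Text Index: OFF"
and "Text Index: ON". If that is still the case and an exact match
on the title is good enough, a variant like the following might
allow the text index to be used instead. This is an assumption on
my side, not something I have tested on your database, and note
that "=" compares the whole string exactly, whereas "contains text"
matches tokens:

```xquery
(: Exact string comparison on the title's text node instead of a
   full-text match; whether the text index is actually used should
   show up in the compilation info :)
insert node <d/> after
  (fn:doc("enwiki-latest-pages-articles")
     //*:page[*:title/text() = "AccessibleComputing"])[1]
```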


If none of this helps, it may also be convenient to carry out
updates on a second instance of your database and switch between
the two to keep the application up and running.

Please let us know how it goes.

Kind regards,
Lukas




On Tue, Apr 5, 2011 at 4:20 PM, Erol Akarsu <eakarsu@gmail.com> wrote:
I ran a test related to XQuery Update.

It took 170 sec to insert a node into the Wikipedia XML database. Is there a faster way of doing it?

insert node <d/> after (fn:doc ("enwiki-latest-pages-articles")//*:page[w:title contains text "AccessibleComputing"] ) [1]


Query: declare namespace w="http://www.mediawiki.org/xml/export-0.5/";
Compiling:
- pre-evaluating fn:doc("enwiki-latest-pages-articles")
- optimizing descendant-or-self step(s)
Result: insert node element { "d" } { () } into (document-node { "enwiki-latest-pages-articles.xml" }/descendant::*:page[w:title contains text "AccessibleComputing"])[position() = 1]
Timing:
 - Parsing:  0.27 ms
 - Compiling:  167.38 ms
 - Evaluating:  170264.31 ms
 - Printing:  45.12 ms
 - Total Time:  170477.1 ms
Query plan:
<Insert>
  <IterPosFilter>
    <IterPath>
      <DBNode name="enwiki-latest-pages-articles"/>
      <IterStep axis="descendant" test="*:page">
        <FTContains>
          <AxisPath>
            <IterStep axis="child" test="w:title"/>
          </AxisPath>
          <FTWords>
            <Item value="AccessibleComputing" type="xs:string"/>
          </FTWords>
        </FTContains>
      </IterStep>
    </IterPath>
    <Pos min="1" max="1"/>
  </IterPosFilter>
  <CElem>
    <Item value="d" type="xs:QName"/>
  </CElem>
</Insert>



On Mon, Apr 4, 2011 at 7:31 AM, Erol Akarsu <eakarsu@gmail.com> wrote:
I imported the Wikipedia XML into BaseX and tried to search it.

But searching it takes a long time.

I searched for one element that is the first child of the whole document, and it took 52 sec.
I know the XML file is very big (31 GB). How can I optimize the search?

declare namespace w="http://www.mediawiki.org/xml/export-0.5/";

let $d := fn:doc ("enwiki-latest-pages-articles")//w:siteinfo
return $d

Database info:

> open enwiki-latest-pages-articles
Database 'enwiki-latest-pages-articles' opened in 778.49 ms.
> info database
Database Properties
 Name: enwiki-latest-pages-articles
 Size: 23356 MB
 Nodes: 228090153
 Height: 6

Database Creation
 Path: /mnt/hgfs/C/tmp/enwiki-latest-pages-articles.xml
 Time Stamp: 03.04.2011 12:29:15
 Input Size: 30025 MB
 Encoding: UTF-8
 Documents: 1
 Whitespace Chopping: ON
 Entity Parsing: OFF

Indexes
 Up-to-date: true
 Path Summary: ON
 Text Index: ON
 Attribute Index: ON
 Full-Text Index: OFF
>


Timing info:

Query: declare namespace w="http://www.mediawiki.org/xml/export-0.5/";
Compiling:
- pre-evaluating fn:doc("enwiki-latest-pages-articles")
- optimizing descendant-or-self step(s)
- binding static variable $d
- removing variable $d
- simplifying flwor expression
Result: element siteinfo { ... }
Timing:
 - Parsing:  1.4 ms
 - Compiling:  52599.0 ms
 - Evaluating:  0.28 ms
 - Printing:  0.62 ms
 - Total Time:  52601.32 ms
Query plan:
<DBNode name="enwiki-latest-pages-articles" pre="5"/>



Result of query:

<siteinfo xmlns="http://www.mediawiki.org/xml/export-0.5/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <sitename>Wikipedia</sitename>
  <base>http://en.wikipedia.org/wiki/Main_Page</base>
  <generator>MediaWiki 1.17wmf1</generator>
  <case>first-letter</case>
  <namespaces>
    <namespace key="-2" case="first-letter">Media</namespace>
    <namespace key="-1" case="first-letter">Special</namespace>
    <namespace key="0" case="first-letter"/>
    <namespace key="1" case="first-letter">Talk</namespace>
    <namespace key="2" case="first-letter">User</namespace>
    <namespace key="3" case="first-letter">User talk</namespace>
    <namespace key="4" case="first-letter">Wikipedia</namespace>
    <namespace key="5" case="first-letter">Wikipedia talk</namespace>
    <namespace key="6" case="first-letter">File</namespace>
    <namespace key="7" case="first-letter">File talk</namespace>
    <namespace key="8" case="first-letter">MediaWiki</namespace>
    <namespace key="9" case="first-letter">MediaWiki talk</namespace>
    <namespace key="10" case="first-letter">Template</namespace>
    <namespace key="11" case="first-letter">Template talk</namespace>
    <namespace key="12" case="first-letter">Help</namespace>
    <namespace key="13" case="first-letter">Help talk</namespace>
    <namespace key="14" case="first-letter">Category</namespace>
    <namespace key="15" case="first-letter">Category talk</namespace>
    <namespace key="100" case="first-letter">Portal</namespace>
    <namespace key="101" case="first-letter">Portal talk</namespace>
    <namespace key="108" case="first-letter">Book</namespace>
    <namespace key="109" case="first-letter">Book talk</namespace>
  </namespaces>
</siteinfo>



Thanks

Erol Akarsu



_______________________________________________
BaseX-Talk mailing list
BaseX-Talk@mailman.uni-konstanz.de
https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk