Sorry for not using "Reply All" earlier.
Setting FTINDEXSPLITSIZE to 20000000 enabled the process to get a little further, if the meaning of each dot is the same.
FTINDEXSPLITSIZE at default:
..............................|..................................................................|..........................................................................|...............................................................................|..
FTINDEXSPLITSIZE = 20000000
.......|.......|........|.......|......|........|.............|.............|.............|.............|.............|.............|.............|.............|..............|.............|.............|.............|.............|.............|.............|............
If it's a matter of making the indexing process take longer, that's not a problem.
Thanks, Chuck
On Tue, Oct 20, 2015 at 1:27 PM, Chuck Bearden cfbearden@gmail.com wrote:
Thanks Christian, I'll try the FTINDEXSPLITSIZE option.
I'm also open to modifying the XML files if that would help. Because of limitations of the service from which we harvest them RESTfully, I have only 20 actual content elements in each file. If you think it would make a difference, I could consolidate them to have, say, 200 or 500 of the actual content elements per file, but I have no idea if that would change how the indexing falls out.
The files also have structures where some properties of each record are each represented by a URL, an ID value, and a string. I could XSLT the files to remove all but the string (human readable is better for our purposes) to make them less verbose.
BaseX is really super for doing data quality assessments of the XML, and if we could get full-text indexing working, it would speed things up considerably. Thanks to you & your team for all the work you've put in to the application!
All the best,
Chuck Bearden
On Tue, Oct 20, 2015 at 12:55 PM, Christian Grün christian.gruen@gmail.com wrote:
I see; it seems that the index creation is failing at the very final step, in which partial index structures, which are temporarily written to disk, are merged.
You could either increase Xmx even more (to 6 or 7G?) or, if this doesn't work, try assigning different values to the FTINDEXSPLITSIZE option [1] (start e.g. with 20000000).
Sorry for the trouble. Feel free to keep me updated; maybe we'll find a way to fix this.
Christian
[1] http://docs.basex.org/wiki/Options#FTINDEXSPLITSIZE
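To make the split-and-merge idea above concrete, here is a toy sketch in Python. It is not BaseX's implementation: build_partials, merge_partials and the split_size parameter are hypothetical names, and the in-memory lists merely stand in for the partial structures that are written to disk. It only shows that a smaller threshold yields more, smaller partial structures while the merged result stays the same; it does not model the memory needed by the final merge step.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def build_partials(tokens, split_size):
    """Collect (token, position) entries and cut a sorted partial index
    whenever split_size entries have accumulated (hypothetical threshold)."""
    partials, buffer = [], []
    for position, token in enumerate(tokens):
        buffer.append((token, position))
        if len(buffer) >= split_size:
            partials.append(sorted(buffer))  # stands in for a flush to disk
            buffer = []
    if buffer:
        partials.append(sorted(buffer))
    return partials


def merge_partials(partials):
    """K-way merge of the sorted partial lists into one inverted index."""
    merged = heapq.merge(*partials)
    return {token: [pos for _, pos in group]
            for token, group in groupby(merged, key=itemgetter(0))}


tokens = "meric smith meric jones smith meric".split()
few = build_partials(tokens, split_size=4)   # larger threshold, fewer partials
many = build_partials(tokens, split_size=2)  # smaller threshold, more partials
index = merge_partials(many)
print(len(few), len(many))
print(index)
```

In this sketch the threshold only bounds how many entries are buffered before a partial is cut; the final index is identical either way.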
On Tue, Oct 20, 2015 at 7:48 PM, Chuck Bearden cfbearden@gmail.com wrote:
Here's the stack trace:
=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=
create db pure_20151019 pure_20151019
Creating Database...
..;..;..;..;..;..;.;..;..;..;..;..;.;..;.;.;.;.....;.....;.....;......;.....;.....;.......;.;.;;.;.;;.;.;;.;.;;.;.;;.;.;;.;.................................................;..........................................................;..........................................................;..........................................................;..........................................................;..........................................................;...................................................
677584.62 ms (1435 MB)
Indexing Text...
...........................................................................................................................................................................................................................................................
98215794 operations, 178526.99 ms (1611 MB)
Indexing Attribute Values...
...........................................................................................................................................................................................................................................................
178304119 operations, 135613.26 ms (2005 MB)
Indexing Full-Text...
..............................|..................................................................|..........................................................................|...............................................................................|..
java.lang.OutOfMemoryError: Java heap space
        at org.basex.index.ft.FTList.next(FTList.java:93)
        at org.basex.index.ft.FTBuilder.merge(FTBuilder.java:236)
        at org.basex.index.ft.FTBuilder.write(FTBuilder.java:140)
        at org.basex.index.ft.FTBuilder.build(FTBuilder.java:85)
        at org.basex.index.ft.FTBuilder.build(FTBuilder.java:23)
        at org.basex.data.DiskData.createIndex(DiskData.java:187)
        at org.basex.core.cmd.CreateIndex.create(CreateIndex.java:103)
        at org.basex.core.cmd.CreateIndex.create(CreateIndex.java:91)
        at org.basex.core.cmd.CreateDB.run(CreateDB.java:104)
        at org.basex.core.Command.run(Command.java:398)
        at org.basex.core.Command.execute(Command.java:100)
        at org.basex.api.client.LocalSession.execute(LocalSession.java:132)
        at org.basex.api.client.Session.execute(Session.java:36)
        at org.basex.core.CLI.execute(CLI.java:103)
        at org.basex.core.CLI.execute(CLI.java:87)
        at org.basex.BaseX.console(BaseX.java:191)
        at org.basex.BaseX.<init>(BaseX.java:166)
        at org.basex.BaseX.main(BaseX.java:42)
org.basex.core.BaseXException: Out of Main Memory. You can try to:
- increase Java's heap size with the flag -Xmx<size>
- deactivate the text and attribute indexes.
        at org.basex.core.Command.execute(Command.java:101)
        at org.basex.api.client.LocalSession.execute(LocalSession.java:132)
        at org.basex.api.client.Session.execute(Session.java:36)
        at org.basex.core.CLI.execute(CLI.java:103)
        at org.basex.core.CLI.execute(CLI.java:87)
        at org.basex.BaseX.console(BaseX.java:191)
        at org.basex.BaseX.<init>(BaseX.java:166)
        at org.basex.BaseX.main(BaseX.java:42)
Out of Main Memory. You can try to:
- increase Java's heap size with the flag -Xmx<size>
- deactivate the text and attribute indexes.
d
=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=
Here's how the process looked in the output of 'ps -ef', in case that's relevant:
=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=
cfbeard+ 88769 88757 46 12:15 pts/7 00:00:24 java -cp
/home/cfbearden/opt/basex-8.3.0/BaseX.jar:/home/cfbearden/opt/basex-8.3.0/lib/basex-api-8.3.jar:/home/cfbearden/opt/basex-8.3.0/lib/basex-xqj-1.5.0.jar:/home/cfbearden/opt/basex-8.3.0/lib/commons-codec-1.4.jar:/home/cfbearden/opt/basex-8.3.0/lib/commons-fileupload-1.3.1.jar:/home/cfbearden/opt/basex-8.3.0/lib/commons-io-1.4.jar:/home/cfbearden/opt/basex-8.3.0/lib/igo-0.4.3.jar:/home/cfbearden/opt/basex-8.3.0/lib/jansi-1.11.jar:/home/cfbearden/opt/basex-8.3.0/lib/javax.servlet-3.0.0.v201112011016.jar:/home/cfbearden/opt/basex-8.3.0/lib/jdom-1.1.jar:/home/cfbearden/opt/basex-8.3.0/lib/jetty-continuation-8.1.17.v20150415.jar:/home/cfbearden/opt/basex-8.3.0/lib/jetty-http-8.1.17.v20150415.jar:/home/cfbearden/opt/basex-8.3.0/lib/jetty-io-8.1.17.v20150415.jar:/home/cfbearden/opt/basex-8.3.0/lib/jetty-security-8.1.17.v20150415.jar:/home/cfbearden/opt/basex-8.3.0/lib/jetty-server-8.1.17.v20150415.jar:/home/cfbearden/opt/basex-8.3.0/lib/jetty-servlet-8.1.17.v20150415.jar:/home/cfbearden/opt/basex-8.3.0/lib/jetty-util-8.1.17.v20150415.jar:/home/cfbearden/opt/basex-8.3.0/lib/jetty-webapp-8.1.17.v20150415.jar:/home/cfbearden/opt/basex-8.3.0/lib/jetty-xml-8.1.17.v20150415.jar:/home/cfbearden/opt/basex-8.3.0/lib/jing-20091111.jar:/home/cfbearden/opt/basex-8.3.0/lib/jline-2.13.jar:/home/cfbearden/opt/basex-8.3.0/lib/jts-1.13.jar:/home/cfbearden/opt/basex-8.3.0/lib/lucene-stemmers-3.4.0.jar:/home/cfbearden/opt/basex-8.3.0/lib/milton-api-1.8.1.4.jar:/home/cfbearden/opt/basex-8.3.0/lib/mime-util-2.1.3.jar:/home/cfbearden/opt/basex-8.3.0/lib/slf4j-api-1.7.12.jar:/home/cfbearden/opt/basex-8.3.0/lib/slf4j-simple-1.7.12.jar:/home/cfbearden/opt/basex-8.3.0/lib/tagsoup-1.2.1.jar:/home/cfbearden/opt/basex-8.3.0/lib/xmldb-api-1.0.jar:/home/cfbearden/opt/basex-8.3.0/lib/xml-resolver-1.2.jar:/home/cfbearden/opt/basex-8.3.0/lib/xqj2-0.2.0.jar:/home/cfbearden/opt/basex-8.3.0/lib/xqj-api-1.0.jar:
-Xmx4g org.basex.BaseX -d
On Tue, Oct 20, 2015 at 12:38 PM, Chuck Bearden cfbearden@gmail.com wrote:
It hasn't failed yet; I've gotten the progress indicators, along with the phases that have been completed:
Creating Database...
Indexing Text...
Indexing Attribute Values...
It's still working on "Indexing Full-Text...". I'll post whatever I get when it fails. Maybe it won't this time :)
Chuck
On Tue, Oct 20, 2015 at 12:33 PM, Christian Grün christian.gruen@gmail.com wrote:
> Creating Database...
> ..;..;..;..;..;..;.;..;..
Do you get any output after this line (I would have expected to see a stack trace, or at least an error message…)?
> Where 'pure_20151019' is both the name of the database and the subdirectory where all my XML files are.
>
> It could well be that I'm missing a crucial option; I'm still relatively new to BaseX. It's great stuff, though.
>
> Because of my employer's IT environment, I have to run my Linux workstation in a VMWare VM, though I doubt that that makes a difference.
>
> Thanks,
> Chuck
>
> On Tue, Oct 20, 2015 at 11:15 AM, Christian Grün christian.gruen@gmail.com wrote:
>> Hi Chuck,
>>
>> Usually, 4G is more than enough to create a full-text index for 16G of XML. Obviously, however, that's not the case for your input data. You could try to distribute your documents in multiple databases. As an alternative, we could have a look at your data and try to find out what's going wrong. You can also use the -d flag and send us the stack trace.
>>
>> Best,
>> Christian
>>
>> On Tue, Oct 20, 2015 at 4:19 PM, Chuck Bearden cfbearden@gmail.com wrote:
>>> Hi all,
>>>
>>> I have about 16G of XML data in about 52000 files, and I was hoping to build a full-text index over it. I've tried two approaches: enabling full-text indexing as I create the database and then loading the data, and creating the full-text index after loading the data. If I enable ADDCACHE and modify the basex shell script to use 4g of RAM instead of 512M, I have no problem loading the data. If I try to load with FTINDEX or create the index afterward, the process runs out of memory.
>>>
>>> I could believe that I'm overlooking some option that would make this possible, but I suspect I just have too much data. I welcome your thoughts & suggestions.
>>>
>>> All the best,
>>> Chuck Bearden
By giving 6G of RAM to the JVM I succeeded in building the full-text index, but it doesn't seem to be making any difference in query time.
I have a slightly older copy of the data that is probably a hundred or so records smaller than the one that is indexed for full text, and my query takes about 40s on each one, so the FTINDEX seems to make no difference. I'm not an old XQuery hand, so it's altogether possible that my queries are not optimal. I'll append my query below.
Using the GUI, I can see that the value of FTINDEX for this database is true, though when I open the database with the 'basex' command and use INFO, it shows the value 'false'.
Query:
=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=
xquery version "3.0";
declare namespace publication-template = "http://atira.dk/schemas/pure4/wsdl/template/abstractpublication/current";
declare namespace core = "http://atira.dk/schemas/pure4/model/core/current";
declare namespace xsi = "http://www.w3.org/2001/XMLSchema-instance";
declare namespace cur = "http://atira.dk/schemas/pure4/model/template/abstractpublication/current";
declare namespace extensions-core = "http://atira.dk/schemas/pure4/model/core/extensions/current";
declare namespace person-template = "http://atira.dk/schemas/pure4/model/template/abstractperson/current";
declare namespace externalperson-template = "http://atira.dk/schemas/pure4/model/template/abstractexternalperson/current";
declare namespace externalorganisation-template = "http://atira.dk/schemas/pure4/model/template/externalorganisation/current";
declare namespace organisation-template = "http://atira.dk/schemas/pure4/model/template/abstractorganisation/current";
declare namespace journal-template = "http://atira.dk/schemas/pure4/model/template/abstractjournal/current";
declare namespace cur1 = "http://atira.dk/schemas/pure4/model/template/abstractpublication/current";
for $pa in /publication-template:*/core:result/core:content/cur1:persons/person-template:personAssociation[person-template:externalperson]
where $pa/person-template:externalperson/externalperson-template:name/core:lastName contains text {'Meric'}
let $lname := $pa/person-template:name/core:lastName/text()
let $fname := $pa/person-template:name/core:firstName/text()
let $uuid := $pa/ancestor::core:content/@uuid/data()
return ($lname, $fname, $uuid)
=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=
All suggestions welcome, and thanks to Christian & John Mitchell for helping me so far.
Chuck
> By giving 6G of RAM to the JVM I succeeded in building the full-text index,
Good news!
> but it doesn't seem to be making any difference in query time.
Did you check the "query info" (either in the corresponding panel in the GUI, or by using -V)? If it doesn't show the info "applying full-text index", the index isn't utilized [1].
Christian
On Wed, Oct 21, 2015 at 3:13 AM, Christian Grün christian.gruen@gmail.com wrote:
>> By giving 6G of RAM to the JVM I succeeded in building the full-text index,
> Good news!
>> but it doesn't seem to be making any difference in query time.
> Did you check the "query info" (either in the corresponding panel in the GUI, or by using -V)? If it doesn't show the info "applying full-text index", the index isn't utilized [1].
Thanks for the suggestion. The full-text index is apparently not being used. I checked the query info using both the -V flag and in the GUI. Yet the GUI shows the full-text entry stats under Database -> Properties -> Full-Text (tab).
Is there anything else I should try by way of debugging?
Thanks, Chuck
> Christian
> Thanks for the suggestion. The full-text index is apparently not being used.
It is sometimes not obvious to the query optimizer how to rewrite a query to take full advantage of an index. You could try to start with a simple version of your query, see if the index is used, and enhance it step by step:
1. //*:lastName[text() contains text 'Meric']
2. declare namespace core="http://atira.dk/schemas/pure4/model/core/current"; //core:lastName[text() contains text 'Meric']
3. ...
Do the results returned by core:lastName contain other descendant elements [1]?
Christian
Okay, I figured it out. Evidently, I should have tested the evaluated value of 'text()' or '.' with the 'contains text' expression, not just the element itself. That's why both the simplified queries below use the full-text index. When I modify the WHERE clause of my query to read
where $pa/person-template:externalperson/externalperson-template:name/core:lastName/text() contains text {'Meric'}
it works lickety-split.
Thank you Christian for your help.
Chuck
On Wed, Oct 21, 2015 at 9:05 AM, Christian Grün christian.gruen@gmail.com wrote:
>> Thanks for the suggestion. The full-text index is apparently not being used.
> It is sometimes not obvious to the query optimizer how to rewrite a query to take full advantage of an index. You could try to start with a simple version of your query, see if the index is used, and enhance it step by step:
> //*:lastName[text() contains text 'Meric']
> declare namespace core="http://atira.dk/schemas/pure4/model/core/current"; //core:lastName[text() contains text 'Meric']
The full-text index is applied in both of the above queries. Oh wait, I get it.
> ...
> Do the results returned by core:lastName contain other descendant elements [1]?
core:lastName & core:firstName are just PCDATA.
> Christian
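A note on the descendant-elements question, offered as a hedged illustration rather than a description of BaseX internals: in the XPath data model, the string value of an element is the concatenation of all its descendant text nodes, whereas text() selects only the text nodes directly below it. The Python sketch below shows the difference using the standard library; the <b> child element is made up for the example, and string_value and child_text_nodes are hypothetical helper names.

```python
import xml.etree.ElementTree as ET


def string_value(element):
    """String value of an element: all descendant text concatenated."""
    return "".join(element.itertext())


def child_text_nodes(element):
    """Text that sits directly under the element, like text() in XPath."""
    nodes = [element.text] if element.text else []
    nodes += [child.tail for child in element if child.tail]
    return nodes


# PCDATA-only element, as described for core:lastName in this thread
leaf = ET.fromstring("<lastName>Meric</lastName>")
# Made-up mixed content, only to show where the two views diverge
mixed = ET.fromstring("<lastName>Mer<b>ic</b></lastName>")

print(string_value(leaf), child_text_nodes(leaf))
print(string_value(mixed), child_text_nodes(mixed))
```

For PCDATA-only elements such as core:lastName the two views agree, so testing the element and testing its text() should match the same records; per the messages above, only the text() form was rewritten to use the full-text index.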