slow search performance across multiple databases
Hello everyone,
First of all thank you for building such a great product!
Unfortunately I encounter extremely slow queries using a basex instance. My
xml files are distributed over 256 databases (00-ff)
with roughly 250MB each. All share a structure similar to [3]. I executed
OPTIMIZE ALL for all databases, see [4]. [5] shows the
info index output. The linux server which is running the basex instance
shows great amounts of io. I replicated the data to my local desktop
and see the same amount of high io usage across all database files. For
example: [data\81\tbl.basex, data\81\txt.basex, ..., etc]
Querying a single database is fast and the info output window shows the
following [2], but a query across all dbs just stalls and does not
return a result in a reasonable time.
Could somebody give me a hint how to speed things up?
regards, Björn
[1]
Example query:
```
declare namespace mets = "http://www.loc.gov/METS/";
declare namespace lmerFile = "http://www.ddb.de/LMERfile";
count(
for $db in db:list()
for $doc in db:open($db)
where
$doc/koala/mets:mets/mets:amdSec/mets:techMD/mets:mdWrap/mets:xmlData/lmerFi
le:format = 'urn:123'
return $doc)
```
[2]
```
Compiling:
- inline $db_0
- pre-evaluate db:open(database[,path]) to document-node() sequence:
db:open("fe") -> (db:open-pre("fe", 0), ...)
- rewrite cached filter to iter path: ((db:open-pre("fe", 0),
...))[(element(koala)/element(mets:mets)/element(mets:amdSec)/element(mets:t
... -> (db:open-pre("fe", 0),
...)/.[(element(koala)/element(mets:mets)/element(mets:amdSec)/element(mets:
t...
- rewrite where clause(s)
- simplify FLWOR expression: (db:open-pre("fe", 0),
...)/.[(element(koala)/element(mets:mets)/element(mets:amdSec)/element(mets:
t...
Optimized Query:
count((db:open-pre("fe", 0),
...)/.[(element(koala)/element(mets:mets)/element(mets:amdSec)/element(mets:
techMD)/element(mets:mdWrap)/element(mets:xmlData)/element(lmerFile:format)
= "urn:123")])
Query:
declare namespace mets = "http://www.loc.gov/METS/"; declare namespace
lmerFile = "http://www.ddb.de/LMERfile"; count( let $db := "fe" for $doc in
db:open($db) where
$doc/koala/mets:mets/mets:amdSec/mets:techMD/mets:mdWrap/mets:xmlData/lmerFi
le:format = 'urn:123' return $doc)
Result:
- Hit(s): 1 Item
- Updated: 0 Items
- Printed: 4 b
- Read Locking: (global)
- Write Locking: (none)
Timing:
- Parsing: 0.41 ms
- Compiling: 161.91 ms
- Evaluating: 2155.26 ms
- Printing: 0.86 ms
- Total Time: 2318.44 ms
Query Plan:
<QueryPlan compiled="true" updating="false">
<FnCount name="count" type="xs:integer" size="1">
<IterPath type="document-node()*" database="fe">
<DBNodeSeq type="document-node()+" size="29196" database="fe">
<DBNode pre="0" type="document-node()" size="1" database="fe"/>
<DBNode pre="245" type="document-node()" size="1" database="fe"/>
<DBNode pre="417" type="document-node()" size="1" database="fe"/>
<DBNode pre="679" type="document-node()" size="1" database="fe"/>
<DBNode pre="945" type="document-node()" size="1" database="fe"/>
</DBNodeSeq>
<IterStep axis="self" test="node()" type="node()?">
<CmpG op="=" type="xs:boolean" size="1">
<IterPath type="element()*">
<IterStep axis="child" test="element(koala)" type="element()*"/>
<IterStep axis="child" test="element(mets:mets)"
type="element()*"/>
<IterStep axis="child" test="element(mets:amdSec)"
type="element()*"/>
<IterStep axis="child" test="element(mets:techMD)"
type="element()*"/>
<IterStep axis="child" test="element(mets:mdWrap)"
type="element()*"/>
<IterStep axis="child" test="element(mets:xmlData)"
type="element()*"/>
<IterStep axis="child" test="element(lmerFile:format)"
type="element()*"/>
</IterPath>
<Str type="xs:string" size="1">urn:123</Str>
</CmpG>
</IterStep>
</IterPath>
</FnCount>
</QueryPlan>
```
[3]
<?xml version="1.0" encoding="UTF-8"?>
<koala>
<aip>
<parent_asset_id />
<asset_ingest_timestamp>2016-05-04
13-57-17-943000</asset_ingest_timestamp>
<external_asset_id>urn:nbn:de:xyz</external_asset_id>
<internal_asset_id>4e1f-82-45c5-37-1678e81f</internal_asset_id>
<aip_is_deleted>false</aip_is_deleted>
<aip_deletion_timestamp>2016-05-04
13-57-17-943000</aip_deletion_timestamp>
<storage_backend>hdd</storage_backend>
<aip_size>179810</aip_size>
<asset_version>1</asset_version>
</aip>
<mets OBJID="" TYPE="ko"
xsi:schemaLocation="http://www.loc.gov/METS/
http://www.loc.gov/standards/mets/mets.xsd http://www.ddb.de/LMERfile
http://files.dnb.de/standards/lmer/lmer-file.xsd
http://www.ddb.de/LMERobject
http://files.dnb.de/standards/lmer/lmer-object.xsd
http://www.ddb.de/LMERprocess
http://files.dnb.de/standards/lmer/lmer-process.xsd
http://purl.org/dc/elements/1.1/
http://dublincore.org/schemas/xmls/qdc/2003/04/02/dc.xsd"
xmlns="http://www.loc.gov/METS/" xmlns:xlink="http://www.w3.org/1999/xlink"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:lmerFile="http://www.ddb.de/LMERfile"
xmlns:lmerObject="http://www.ddb.de/LMERobject"
xmlns:lmerProcess="http://www.ddb.de/LMERprocess"
xmlns:dc="http://purl.org/dc/elements/1.1/">
<metsHdr CREATEDATE="2015-03-23T11:03:19.079+01:00">
<agent ROLE="ARCHIVIST" TYPE="ORGANIZATION">
<name>test</name>
<note>test</note>
</agent>
</metsHdr>
<amdSec ID="AmdSec-0001">
<techMD ID="TechMD-LMER-Object">
<mdWrap ID="TechMD-LMER-Object-MdWrap" MIMETYPE="text/xml"
MDTYPE="OTHER" OTHERMDTYPE="lmerObject" LABEL="LMERobject">
<xmlData>
<lmerObject:persistentIdentifier>urn:nbn:de:test</lmerObject:persistentIdent
ifier>
<lmerObject:transferChecksum CHECKSUMTYPE="xor of
sha1 file
checksums">ac38416b34a7b1b55875798819220c092a77f3b7</lmerObject:transferChec
ksum>
<lmerObject:masterCreationDate>2015-03-23T11:03:19.080+01:00</lmerObject:mas
terCreationDate>
<lmerObject:metadataCreationDate>2015-03-23T11:03:19.080+01:00</lmerObject:m
etadataCreationDate>
<lmerObject:metadataRecordCreator>test</lmerObject:metadataRecordCreator>
<lmerObject:numberOfFiles>1</lmerObject:numberOfFiles>
</xmlData>
</mdWrap>
</techMD>
<techMD ID="TechMD-File--0">
<mdWrap ID="TechMD-File--0-MDWRAP" MIMETYPE="text/xml"
MDTYPE="OTHER" OTHERMDTYPE="lmerFile">
<xmlData>
<lmerFile:format
REGISTRYNAME="DIAS">urn:123</lmerFile:format>
<lmerFile:xmlData MDTYPE="FITS">
<fits
xsi:schemaLocation="http://hul.harvard.edu/ois/xml/ns/fits/fits_output
http://files.dnb.de/standards/fits/fits_output_dnb.xsd" version="0.6.1"
timestamp="23.03.15 09:24"
xmlns="http://hul.harvard.edu/ois/xml/ns/fits/fits_output">
<identification status="SINGLE_RESULT">
<identity format="Portable Document
Format" mimetype="application/pdf" toolname="FITS" toolversion="0.6.1">
<tool toolname="Droid"
toolversion="3.0"/>
<tool toolname="Jhove"
toolversion="1.11"/>
<tool toolname="file utility"
toolversion="5.17"/>
<tool toolname="Exiftool"
toolversion="7.74"/>
<tool toolname="NLNZ Metadata
Extractor" toolversion="3.4GA"/>
<tool toolname="ffident"
toolversion="0.2"/>
<version toolname="Droid"
toolversion="3.0">1.6</version>
<externalIdentifier
toolname="Droid" toolversion="3.0" type="puid">fmt/20</externalIdentifier>
</identity>
</identification>
<fileinfo>
<size toolname="Jhove"
toolversion="1.11">77973</size>
<creatingApplicationName
toolname="Jhove" toolversion="1.11">Acrobat 9.1.1 /InDesign: pictwpstops
filter 1.0</creatingApplicationName>
<lastmodified toolname="Exiftool"
toolversion="7.74" status="SINGLE_RESULT">2015:03:23
09:24:57+01:00</lastmodified>
<created toolname="Exiftool"
toolversion="7.74" status="SINGLE_RESULT">2009:06:12
11:53:28+02:00</created>
<filepath toolname="OIS File
Information" toolversion="0.1"
status="SINGLE_RESULT">/data/var/tmp/tomcatP-6/./impservice/GU_42_2009-S23-2
5.pdf</filepath>
<filename toolname="OIS File
Information" toolversion="0.1"
status="SINGLE_RESULT">./impservice/GU_42_2009-S23-25.pdf</filename>
<md5checksum toolname="OIS File
Information" toolversion="0.1"
status="SINGLE_RESULT">a2a1ff970c0f82dfce7e208b35eb5783</md5checksum>
<fslastmodified toolname="OIS File
Information" toolversion="0.1"
status="SINGLE_RESULT">1427099097000</fslastmodified>
</fileinfo>
<filestatus>
<well-formed toolname="Jhove"
toolversion="1.11" status="SINGLE_RESULT">true</well-formed>
<valid toolname="Jhove"
toolversion="1.11" status="SINGLE_RESULT">true</valid>
<message toolname="Jhove"
toolversion="1.11" status="SINGLE_RESULT">Page information is not displayed;
to display remove param value of p from the config file</message>
<ingestLevel toolname="DidigO"
toolversion="0.9" status="SINGLE_RESULT">4</ingestLevel>
</filestatus>
<metadata>
<document>
<pageCount toolname="Exiftool"
toolversion="7.74" status="SINGLE_RESULT">3</pageCount>
<isTagged toolname="Jhove"
toolversion="1.11">no</isTagged>
<hasOutline toolname="Jhove"
toolversion="1.11">no</hasOutline>
<hasAnnotations toolname="Jhove"
toolversion="1.11" status="SINGLE_RESULT">no</hasAnnotations>
<isRightsManaged
toolname="Exiftool" toolversion="7.74"
status="SINGLE_RESULT">no</isRightsManaged>
<isProtected toolname="Exiftool"
toolversion="7.74">no</isProtected>
<isPdfA toolname="Exiftool"
toolversion="7.74" status="SINGLE_RESULT">no</isPdfA>
<hasForms toolname="NLNZ Metadata
Extractor" toolversion="3.4GA" status="SINGLE_RESULT">no</hasForms>
</document>
</metadata>
</fits>
</lmerFile:xmlData>
</xmlData>
</mdWrap>
</techMD>
</amdSec>
<fileSec>
<fileGrp ID="ASSET" ADMID="TechMD-LMER-Object">
<file ID="FILE-0" ADMID="TechMD-File--0"
CREATED="2015-08-13T11:03:19.006+01:00" SIZE="77973"
CHECKSUM="ac38416b34a7bc092a77f3b7" CHECKSUMTYPE="SHA-1"
MIMETYPE="application/pdf">
<FLocat LOCTYPE="URL" xlink:type="simple"
xlink:href="file://./content/big.pdf"/>
</file>
</fileGrp>
</fileSec>
<structMap TYPE="ASSET">
<div ORDER="1" LABEL="File list" TYPE="ASSET">
<fptr FILEID="FILE-0"/>
</div>
</structMap>
</mets>
</koala>
[4]
Example for db 00:
```
BaseX 9.0.2 [Standalone]
Try 'help' to get more information.
> open 00
Database '00' was opened in 2040.58 ms.
> info database
Database Properties
NAME: 00
SIZE: 269 MB
NODES: 8782619
DOCUMENTS: 29471
BINARIES: 0
TIMESTAMP: 2020-01-09T13:31:43.000Z
UPTODATE: true
Resource Properties
INPUTPATH:
INPUTSIZE: 0 b
INPUTDATE: 2020-01-09T06:37:10.793Z
Indexes
TEXTINDEX: true
ATTRINDEX: true
TOKENINDEX: false
FTINDEX: true
TEXTINCLUDE:
ATTRINCLUDE:
TOKENINCLUDE:
FTINCLUDE:
LANGUAGE: English
STEMMING: false
CASESENS: false
DIACRITICS: false
STOPWORDS:
UPDINDEX: true
AUTOOPTIMIZE: false
MAXCATS: 100
MAXLEN: 96
SPLITSIZE: 0
>
```
[5] Info Index Output:
> info index
Elements
- Structure: Hash
- Entries: 147
tool 185692x, leaf
message 162967x, strings, leaf
mdWrap 68088x
xmlData 68088x
techMD 68088x
version 52735x, 22 distinct strings, leaf
externalIdentifier 52319x, 28 distinct strings, leaf
lmerFile:format 38617x, 15 distinct strings, leaf
lmerFile:xmlData 38617x
fits 38617x
identification 38617x
identity 38617x
fileinfo 38617x
filename 38617x, strings, leaf
filepath 38617x, strings, leaf
size 38617x, integers [948, 344172844], leaf
md5checksum 38617x, strings, leaf
fslastmodified 38617x, integers [1.354674301E12,
1.578101482E12], leaf
filestatus 38617x
ingestLevel 38617x, 4 distinct integers [1, 4], leaf
metadata 38617x
file 38617x
FLocat 38617x, leaf
fptr 38617x, leaf
document 31004x
well-formed 30515x, 2 distinct strings, leaf
valid 30515x, 2 distinct strings, leaf
lmerObject:metadataCreationDate 29471x, strings, leaf
lmerObject:metadataRecordCreator 29471x, string, leaf
lmerObject:numberOfFiles 29471x, 61 distinct integers [1, 235],
leaf
parent_asset_id 29471x, string, leaf
fileSec 29471x
fileGrp 29471x
structMap 29471x
div 29471x
aip 29471x
koala 29471x
aip_is_deleted 29471x, leaf
aip_deletion_timestamp 29471x, leaf
storage_backend 29471x, string, leaf
aip_size 29471x, integers [863, 1683060591], leaf
asset_version 29471x, integer [1, 1], leaf
mets 29471x
metsHdr 29471x
agent 29471x
name 29471x, string, leaf
note 29471x, string, leaf
amdSec 29471x
internal_asset_id 29471x, strings, leaf
external_asset_id 29471x, strings, leaf
asset_ingest_timestamp 29471x, strings, leaf
lmerObject:persistentIdentifier 29471x, strings, leaf
lmerObject:transferChecksum 29471x, strings, leaf
lmerObject:masterCreationDate 29471x, strings, leaf
isTagged 29315x, 2 distinct strings, leaf
hasOutline 29315x, 2 distinct strings, leaf
lastmodified 28499x, strings, leaf
creatingApplicationName 28145x, strings, leaf
isProtected 27796x, 2 distinct strings, leaf
hasAnnotations 26362x, 2 distinct strings, leaf
isRightsManaged 25577x, 4 distinct strings, leaf
created 23912x, strings, leaf
isPdfA 23902x, 2 distinct strings, leaf
pageCount 23673x, integers [1, 3128], leaf
hasForms 22050x, 2 distinct strings, leaf
title 19639x, strings, leaf
graphicsCount 19028x, integers [1, 20435], leaf
author 12483x, strings, leaf
pdfAVersion 11292x, 5 distinct strings, leaf
language 8013x, 51 distinct strings, leaf
lmerFile:category 7366x, string, leaf
image 2118x
imageWidth 2115x, integers [200, 7462], leaf
bitsPerSample 2115x, 3 distinct strings, leaf
imageHeight 2115x, integers [304, 10713], leaf
YCbCrSubSampling 2084x, 4 distinct strings, leaf
xSamplingFrequency 2083x, 55 distinct doubles [1, 1200],
leaf
ySamplingFrequency 2083x, 55 distinct doubles [1, 1200],
leaf
samplingFrequencyUnit 2082x, 3 distinct strings, leaf
lightSource 1739x, 3 distinct strings, leaf
channelMode 1649x, 3 distinct strings, leaf
channels 1649x, 2 distinct integers [1, 2], leaf
bitRate 1649x, 3 distinct integers [80000,
320000], leaf
sampleRate 1649x, 2 distinct integers [44100,
48000], leaf
audio 1649x
text 1231x
charset 1002x, 6 distinct strings, leaf
markupBasis 940x, 3 distinct strings, leaf
markupBasisVersion 939x, double [1, 1], leaf
markupLanguage 735x, 2 distinct strings, leaf
iccProfileName 708x, 15 distinct strings, leaf
iccProfileVersion 708x, 6 distinct strings, leaf
compressionScheme 578x, 3 distinct strings, leaf
orientation 561x, 2 distinct strings, leaf
scanningSoftwareName 533x, 66 distinct strings, leaf
copyrightNote 481x, strings, leaf
colorSpace 328x, 3 distinct strings, leaf
samplesPerPixel 323x, 3 distinct integers [3, 5], leaf
exifVersion 155x, 4 distinct integers [210, 230],
leaf
byteOrder 71x, string, leaf
linebreak 58x, string, leaf
YCbCrPositioning 21x, 2 distinct integers [1, 2], leaf
apertureValue 16x, 11 distinct doubles [2.2, 13.4],
leaf
exposureBiasValue 16x, 2 distinct strings, leaf
shutterSpeedValue 15x, 12 distinct strings, leaf
flash 15x, 6 distinct strings, leaf
digitalCameraModelName 15x, 13 distinct strings, leaf
digitalCameraManufacturer 15x, 8 distinct strings, leaf
isoSpeedRating 15x, 7 distinct integers [32, 1600],
leaf
focalLength 14x, 12 distinct doubles [4.2, 137],
leaf
exposureTime 14x, 11 distinct strings, leaf
fNumber 14x, 9 distinct doubles [2.2, 6.3], leaf
meteringMode 12x, 4 distinct strings, leaf
maxApertureValue 10x, 5 distinct doubles [2.7, 4.9], leaf
captureDevice 9x, string, leaf
exposureProgram 8x, 4 distinct strings, leaf
sensingMethod 8x, string, leaf
inhibitorType 7x, 3 distinct strings, leaf
scannerManufacturer 5x, 4 distinct strings, leaf
scannerModelName 5x, 4 distinct strings, leaf
creatingos 3x, string, leaf
duration 2x, 2 distinct strings, leaf
milliseconds 2x, 2 distinct integers [2209067,
6200496], leaf
brightnessvalue 2x, 2 distinct doubles [3737910,
3737956.3692762186], leaf
subjectDistance 2x, 2 distinct strings, leaf
brightnessValue 2x, 2 distinct doubles [6.369276219,
9.768303187], leaf
cfaPattern 2x, 2 distinct strings, leaf
referenceBlackWhite 1x, string, leaf
exposureIndex 1x, integer [100, 100], leaf
usesStyleSheets 1x, string, leaf
characterCount 1x, integer [451, 451], leaf
wordCount 1x, integer [32, 32], leaf
paragraphCount 1x, integer [1, 1], leaf
gpsDateStamp 1x, string, leaf
gpsLatitudeRef 1x, string, leaf
gpsLatitude 1x, string, leaf
gpsLongitudeRef 1x, string, leaf
gpsLongitude 1x, string, leaf
gpsAltitudeRef 1x, string, leaf
gpsAltitude 1x, string, leaf
gpsTimeStamp 1x, string, leaf
gpsSpeedRef 1x, string, leaf
gpsSpeed 1x, integer [0, 4.9E-324], leaf
gpsImgDirectionRef 1x, string, leaf
gpsImgDirection 1x, double [352.1332378, 352.1332378],
leaf
gpsDestBearingRef 1x, string, leaf
gpsDestBearing 1x, double [352.1332378, 352.1332378],
leaf
Attributes
- Structure: Hash
- Entries: 31
toolversion 1175464x, 20 distinct strings, leaf
toolname 1175464x, 11 distinct strings, leaf
status 773429x, 2 distinct strings, leaf
ID 233735x, strings, leaf
TYPE 117884x, 3 distinct strings, leaf
MIMETYPE 106705x, 11 distinct strings, leaf
MDTYPE 106705x, 2 distinct strings, leaf
xsi:schemaLocation 68088x, strings, leaf
CHECKSUMTYPE 68088x, 2 distinct strings, leaf
OTHERMDTYPE 68088x, 2 distinct strings, leaf
ADMID 68088x, strings, leaf
LABEL 58942x, 2 distinct strings, leaf
type 52319x, string, leaf
CREATED 38617x, strings, leaf
SIZE 38617x, integers [948, 344172844], leaf
CHECKSUM 38617x, strings, leaf
LOCTYPE 38617x, string, leaf
xlink:type 38617x, string, leaf
xlink:href 38617x, strings, leaf
FILEID 38617x, strings, leaf
REGISTRYNAME 38617x, string, leaf
version 38617x, string, leaf
timestamp 38617x, strings, leaf
format 38617x, 14 distinct strings, leaf
mimetype 38617x, 11 distinct strings, leaf
ROLE 29471x, string, leaf
CREATEDATE 29471x, strings, leaf
PROFILE 29471x, string, leaf
OBJID 29471x, strings, leaf
RECORDSTATUS 29471x, string, leaf
ORDER 29471x, integer [1, 1], leaf
Text Index
- Structure: Sorted List
- Names:
- Size: 5866 kB
- Entries: 446427
no
157290x
1
69949x
true
52426x
4
31437x
....
<removed for privacy reasons>
....
Attribute Index
- Structure: Sorted List
- Names:
- Size: 8288 kB
- Entries: 242140
SINGLE_RESULT 738299x
Jhove 442253x
1.11 435034x
Exiftool 215455x
7.74 215455x
OIS File Information 156783x
0.1 156783x
Droid 129111x
3.0 129111x
ASSET 88413x
FITS 77234x
0.6.1 77234x
text/xml 70424x
NLNZ Metadata Extractor 68965x
3.4GA 68965x
OTHER 68088x
application/pdf 58972x
TechMD-LMER-Object 58942x
TechMD-File--0 58942x
FILE-0 58942x
puid 52319x
simple 38617x
lmerFile 38617x
URL 38617x
SHA-1 38617x
0.9 38617x
0.2 35663x
CONFLICT 35130x
...
00472a0cefc564ddeeaf01ef4eedefa648ae10a7 1x
00445341-1a94-45d1-96e0-bfbd285d0d9b 1x
0042e488-c19a-4824-9619-d396230236ae 1x
0041e4bb7a4ab62e00a9292e5f99254c260fab9d 1x
0040b05044d6242c7c746b82ff20ba793ae95e83 1x
0040782b-c857-495f-a7b0-83aa0e359452 1x
004004d7946daa934ac3313bbe719a19204ac982 1x
003df86b-035b-44fa-9c3e-a0dfdcf385b7 1x
003db1fa9bf4eb1cc721e2944d3fc4128e18691b 1x
003d1d59ef25783c8b0b611c485ab94101419dae 1x
003a6afc-3d12-4e0a-afd6-a0fa27babc75 1x
003a273f76a52e69167bd7663ca5ba2333a51e8e 1x
0038393fd7440e962ab44397700290f67708322b 1x
00381c23-1181-4cdb-a389-280cee83fa8c 1x
003564e47d4b24eb4e36a29d951b0ba77ed36a41 1x
003493e5e743a8cb209170cd3545ab47281bf1fc 1x
0032449e-1641-4d7b-8eb6-c0b6ce4bd73e 1x
00306c1e-67f1-4653-b1af-6c9cc8f466e3 1x
002ef043-303b-4372-8747-d424ad5d06ab 1x
002e19fe325f1a79f9c07929ed9649e5a88598e2 1x
002d9889ac5e27206fbe44bf204ddf77ec6d825a 1x
0028ee044ab6547ff0d2d0f5cfdefee2202653d0 1x
0028a8563f280fc2f6d2234c8bd5a374348bea4b 1x
002834455c3cc319a94b5801770f4f4a7bdb6d5f 1x
0024e7afb42e3466305c8688df8b26e9722596fe 1x
002275c7fb5dbb62291ea1db105f6ddf354cd67a 1x
002122dacfe6e2c268421f9f5fcdc708fa64ba28 1x
00200878d71f7184518266e6962d868c6c341523 1x
001fa27f1c3c2aec81f2257ac975e0d24a825333 1x
001f194b917c9312038b8f9cba4a46b5c007d1aa 1x
Token Index
- Not available
Full-Text Index
- Names:
- Size: 39 MB
- Entries: 386830
pdf 183610x
1 181572x
<removed for privacy reasons>
Path Index
doc(): 29471x
koala: 29471x
aip: 29471x
parent_asset_id: 29471x, leaf
text(): 25457x, 2 distinct strings, leaf
asset_ingest_timestamp: 29471x, leaf
text(): 29471x, strings, leaf
external_asset_id: 29471x, leaf
text(): 29471x, strings, leaf
internal_asset_id: 29471x, leaf
text(): 29471x, strings, leaf
aip_is_deleted: 29471x, leaf
aip_deletion_timestamp: 29471x, leaf
storage_backend: 29471x, leaf
text(): 29471x, string, leaf
aip_size: 29471x, leaf
text(): 29471x, integers [863, 1683060591], leaf
asset_version: 29471x, leaf
text(): 29471x, integer [1, 1], leaf
mets: 29471x
@OBJID: 29471x, strings, leaf
@TYPE: 29471x, string, leaf
@PROFILE: 29471x, string, leaf
@xsi:schemaLocation: 29471x, strings, leaf
metsHdr: 29471x
@CREATEDATE: 29471x, strings, leaf
@RECORDSTATUS: 29471x, string, leaf
agent: 29471x
@ROLE: 29471x, string, leaf
@TYPE: 29471x, string, leaf
name: 29471x, leaf
text(): 29471x, string, leaf
note: 29471x, leaf
text(): 29471x, string, leaf
amdSec: 29471x
@ID: 29471x, string, leaf
techMD: 68088x
@ID: 68088x, strings, leaf
mdWrap: 68088x
@ID: 68088x, strings, leaf
@MIMETYPE: 68088x, string, leaf
@MDTYPE: 68088x, string, leaf
@OTHERMDTYPE: 68088x, 2 distinct strings, leaf
@LABEL: 29471x, string, leaf
xmlData: 68088x
lmerObject:persistentIdentifier: 29471x, leaf
text(): 29471x, strings, leaf
lmerObject:transferChecksum: 29471x, leaf
@CHECKSUMTYPE: 29471x, string, leaf
text(): 29471x, strings, leaf
lmerObject:masterCreationDate: 29471x, leaf
text(): 29471x, strings, leaf
lmerObject:metadataCreationDate: 29471x, leaf
text(): 29471x, strings, leaf
lmerObject:metadataRecordCreator: 29471x, leaf
text(): 29471x, string, leaf
lmerObject:numberOfFiles: 29471x, leaf
text(): 29471x, 61 distinct integers [1, 235], leaf
lmerFile:format: 38617x, leaf
@REGISTRYNAME: 38617x, string, leaf
text(): 38617x, 15 distinct strings, leaf
lmerFile:category: 7366x, leaf
text(): 7366x, string, leaf
lmerFile:xmlData: 38617x
@MDTYPE: 38617x, string, leaf
fits: 38617x
@xsi:schemaLocation: 38617x, strings, leaf
@version: 38617x, string, leaf
@timestamp: 38617x, strings, leaf
identification: 38617x
@status: 38617x, string, leaf
identity: 38617x
@format: 38617x, 14 distinct strings, leaf
@mimetype: 38617x, 11 distinct strings, leaf
@toolname: 38617x, string, leaf
@toolversion: 38617x, string, leaf
tool: 185692x, leaf
@toolname: 185692x, 8 distinct strings, leaf
@toolversion: 185692x, 17 distinct strings, leaf
version: 52735x, leaf
@toolname: 52735x, 5 distinct strings, leaf
@toolversion: 52735x, 10 distinct strings, leaf
@status: 35130x, string, leaf
text(): 52735x, 22 distinct strings, leaf
externalIdentifier: 52319x, leaf
@toolname: 52319x, string, leaf
@toolversion: 52319x, double [3, 3], leaf
@type: 52319x, string, leaf
text(): 52319x, 28 distinct strings, leaf
fileinfo: 38617x
filename: 38617x, leaf
@toolname: 38617x, 2 distinct strings, leaf
@toolversion: 38617x, 4 distinct strings, leaf
text(): 38617x, strings, leaf
@status: 35296x, string, leaf
filepath: 38617x, leaf
@toolname: 38617x, 2 distinct strings, leaf
@toolversion: 38617x, 4 distinct strings, leaf
text(): 38617x, strings, leaf
@status: 35296x, string, leaf
size: 38617x, leaf
@toolname: 38617x, 2 distinct strings, leaf
@toolversion: 38617x, 3 distinct doubles [0.1, 1.5],
leaf
@status: 8957x, string, leaf
text(): 38617x, integers [948, 344172844], leaf
md5checksum: 38617x, leaf
@toolname: 38617x, string, leaf
@toolversion: 38617x, double [0.1, 0.1], leaf
@status: 38617x, string, leaf
text(): 38617x, strings, leaf
fslastmodified: 38617x, leaf
@toolname: 38617x, string, leaf
@toolversion: 38617x, double [0.1, 0.1], leaf
@status: 38617x, string, leaf
text(): 38617x, integers [1.354674301E12,
1.578101482E12], leaf
creatingApplicationName: 28145x, leaf
@toolname: 28145x, 3 distinct strings, leaf
@toolversion: 28145x, 4 distinct strings, leaf
text(): 28145x, strings, leaf
@status: 8983x, string, leaf
lastmodified: 28499x, leaf
@toolname: 28499x, string, leaf
@toolversion: 28499x, double [7.74, 7.74], leaf
@status: 28499x, string, leaf
text(): 28499x, strings, leaf
created: 23912x, leaf
@toolname: 23912x, 2 distinct strings, leaf
@toolversion: 23912x, 2 distinct strings, leaf
@status: 23463x, string, leaf
text(): 23912x, strings, leaf
inhibitorType: 7x, leaf
@toolname: 7x, string, leaf
@toolversion: 7x, double [7.74, 7.74], leaf
@status: 7x, string, leaf
text(): 7x, 3 distinct strings, leaf
copyrightNote: 481x, leaf
@toolname: 481x, string, leaf
@toolversion: 481x, double [7.74, 7.74], leaf
@status: 481x, string, leaf
text(): 481x, strings, leaf
creatingos: 3x, leaf
@toolname: 3x, string, leaf
@toolversion: 3x, 3 distinct doubles [5.09, 5.32],
leaf
@status: 3x, string, leaf
text(): 3x, string, leaf
filestatus: 38617x
well-formed: 30515x, leaf
@toolname: 30515x, 2 distinct strings, leaf
@toolversion: 30515x, 5 distinct strings, leaf
@status: 30515x, string, leaf
text(): 30515x, 2 distinct strings, leaf
valid: 30515x, leaf
@toolname: 30515x, 2 distinct strings, leaf
@toolversion: 30515x, 5 distinct strings, leaf
@status: 30515x, string, leaf
text(): 30515x, 2 distinct strings, leaf
ingestLevel: 38617x, leaf
@toolname: 38617x, string, leaf
@toolversion: 38617x, double [0.9, 0.9], leaf
@status: 38617x, string, leaf
text(): 38617x, 4 distinct integers [1, 4], leaf
message: 162967x, leaf
@toolname: 162967x, 2 distinct strings, leaf
@toolversion: 162967x, 5 distinct strings, leaf
@status: 162967x, string, leaf
text(): 162967x, strings, leaf
metadata: 38617x
document: 31004x
author: 12483x, leaf
@toolname: 12483x, 3 distinct strings, leaf
@toolversion: 12483x, 6 distinct strings, leaf
@status: 5555x, string, leaf
text(): 12467x, strings, leaf
language: 8013x, leaf
@toolname: 8013x, 3 distinct strings, leaf
@toolversion: 8013x, 6 distinct strings, leaf
@status: 3013x, string, leaf
text(): 8013x, 51 distinct strings, leaf
title: 19639x, leaf
@toolname: 19639x, 3 distinct strings, leaf
@toolversion: 19639x, 6 distinct strings, leaf
@status: 10692x, string, leaf
text(): 19402x, strings, leaf
isRightsManaged: 25577x, leaf
@toolname: 25577x, 2 distinct strings, leaf
@toolversion: 25577x, 4 distinct strings, leaf
@status: 25577x, string, leaf
text(): 25577x, 4 distinct strings, leaf
pageCount: 23673x, leaf
@toolname: 23673x, 2 distinct strings, leaf
@toolversion: 23673x, 2 distinct doubles [1.5,
7.74], leaf
@status: 23534x, string, leaf
text(): 23673x, integers [1, 3128], leaf
graphicsCount: 19028x, leaf
@toolname: 19028x, string, leaf
@toolversion: 19028x, 2 distinct doubles [1.11,
1.5], leaf
@status: 19028x, string, leaf
text(): 19028x, integers [1, 20435], leaf
isTagged: 29315x, leaf
@toolname: 29315x, 2 distinct strings, leaf
@toolversion: 29315x, 3 distinct strings, leaf
text(): 29315x, 2 distinct strings, leaf
@status: 10218x, string, leaf
hasOutline: 29315x, leaf
@toolname: 29315x, 2 distinct strings, leaf
@toolversion: 29315x, 3 distinct strings, leaf
text(): 29315x, 2 distinct strings, leaf
@status: 10218x, string, leaf
hasAnnotations: 26362x, leaf
@toolname: 26362x, string, leaf
@toolversion: 26362x, 2 distinct doubles [1.11,
1.5], leaf
@status: 26362x, string, leaf
text(): 26362x, 2 distinct strings, leaf
isProtected: 27796x, leaf
@toolname: 27796x, 2 distinct strings, leaf
@toolversion: 27796x, 2 distinct strings, leaf
text(): 27796x, 2 distinct strings, leaf
@status: 9640x, string, leaf
isPdfA: 23902x, leaf
@toolname: 23902x, string, leaf
@toolversion: 23902x, double [7.74, 7.74], leaf
@status: 23902x, string, leaf
text(): 23902x, 2 distinct strings, leaf
hasForms: 22050x, leaf
@toolname: 22050x, string, leaf
@toolversion: 22050x, string, leaf
@status: 22050x, string, leaf
text(): 22050x, 2 distinct strings, leaf
pdfAVersion: 11292x, leaf
@toolname: 11292x, string, leaf
@toolversion: 11292x, double [7.74, 7.74], leaf
@status: 11292x, string, leaf
text(): 11292x, 5 distinct strings, leaf
text: 1231x
charset: 1002x, leaf
@toolname: 1002x, 3 distinct strings, leaf
@toolversion: 1002x, 4 distinct strings, leaf
text(): 1002x, 6 distinct strings, leaf
@status: 204x, string, leaf
markupBasis: 940x, leaf
@toolname: 940x, 3 distinct strings, leaf
@toolversion: 940x, 4 distinct strings, leaf
text(): 940x, 3 distinct strings, leaf
@status: 191x, string, leaf
markupBasisVersion: 939x, leaf
@toolname: 939x, 2 distinct strings, leaf
@toolversion: 939x, 3 distinct strings, leaf
text(): 939x, double [1, 1], leaf
@status: 200x, string, leaf
markupLanguage: 735x, leaf
@toolname: 735x, string, leaf
@toolversion: 735x, double [0.2, 0.2], leaf
@status: 735x, string, leaf
text(): 735x, 2 distinct strings, leaf
linebreak: 58x, leaf
@toolname: 58x, string, leaf
@toolversion: 58x, double [1.11, 1.11], leaf
@status: 58x, string, leaf
text(): 58x, string, leaf
usesStyleSheets: 1x, leaf
@toolname: 1x, string, leaf
@toolversion: 1x, string, leaf
@status: 1x, string, leaf
text(): 1x, string, leaf
characterCount: 1x, leaf
@toolname: 1x, string, leaf
@toolversion: 1x, string, leaf
@status: 1x, string, leaf
text(): 1x, integer [451, 451], leaf
wordCount: 1x, leaf
@toolname: 1x, string, leaf
@toolversion: 1x, string, leaf
@status: 1x, string, leaf
text(): 1x, integer [32, 32], leaf
paragraphCount: 1x, leaf
@toolname: 1x, string, leaf
@toolversion: 1x, string, leaf
@status: 1x, string, leaf
text(): 1x, integer [1, 1], leaf
image: 2118x
imageWidth: 2115x, leaf
@toolname: 2115x, 3 distinct strings, leaf
@toolversion: 2115x, 3 distinct strings, leaf
text(): 2115x, integers [200, 7462], leaf
@status: 441x, string, leaf
imageHeight: 2115x, leaf
@toolname: 2115x, 3 distinct strings, leaf
@toolversion: 2115x, 3 distinct strings, leaf
text(): 2115x, integers [304, 10713], leaf
@status: 441x, string, leaf
iccProfileName: 708x, leaf
@toolname: 708x, string, leaf
@toolversion: 708x, double [7.74, 7.74], leaf
@status: 708x, string, leaf
text(): 708x, 15 distinct strings, leaf
YCbCrSubSampling: 2084x, leaf
@toolname: 2084x, string, leaf
@toolversion: 2084x, double [7.74, 7.74], leaf
@status: 2084x, string, leaf
text(): 2084x, 4 distinct strings, leaf
samplingFrequencyUnit: 2082x, leaf
@toolname: 2082x, 3 distinct strings, leaf
@toolversion: 2082x, 3 distinct strings, leaf
text(): 2082x, 3 distinct strings, leaf
@status: 394x, string, leaf
xSamplingFrequency: 2083x, leaf
@toolname: 2083x, 3 distinct strings, leaf
@toolversion: 2083x, 3 distinct strings, leaf
text(): 2083x, 55 distinct doubles [1, 1200], leaf
@status: 394x, string, leaf
ySamplingFrequency: 2083x, leaf
@toolname: 2083x, 3 distinct strings, leaf
@toolversion: 2083x, 3 distinct strings, leaf
text(): 2083x, 55 distinct doubles [1, 1200], leaf
@status: 394x, string, leaf
iccProfileVersion: 708x, leaf
@toolname: 708x, string, leaf
@toolversion: 708x, double [7.74, 7.74], leaf
@status: 708x, string, leaf
text(): 708x, 6 distinct strings, leaf
bitsPerSample: 2115x, leaf
@toolname: 2115x, 3 distinct strings, leaf
@toolversion: 2115x, 3 distinct strings, leaf
text(): 2115x, 3 distinct strings, leaf
@status: 441x, string, leaf
lightSource: 1739x, leaf
@toolname: 1739x, 2 distinct strings, leaf
@toolversion: 1739x, 2 distinct strings, leaf
@status: 1734x, string, leaf
text(): 1739x, 3 distinct strings, leaf
compressionScheme: 578x, leaf
@toolname: 578x, 2 distinct strings, leaf
@toolversion: 578x, 2 distinct doubles [1.5, 7.74],
leaf
@status: 577x, string, leaf
text(): 578x, 3 distinct strings, leaf
orientation: 561x, leaf
@toolname: 561x, 2 distinct strings, leaf
@toolversion: 561x, 2 distinct strings, leaf
text(): 561x, 2 distinct strings, leaf
@status: 76x, string, leaf
scanningSoftwareName: 533x, leaf
@toolname: 533x, string, leaf
@toolversion: 533x, double [7.74, 7.74], leaf
@status: 533x, string, leaf
text(): 533x, 66 distinct strings, leaf
colorSpace: 328x, leaf
@toolname: 328x, 2 distinct strings, leaf
@toolversion: 328x, 2 distinct doubles [1.5, 7.74],
leaf
@status: 327x, string, leaf
text(): 328x, 3 distinct strings, leaf
samplesPerPixel: 323x, leaf
@toolname: 323x, 2 distinct strings, leaf
@toolversion: 323x, 2 distinct doubles [1.5, 7.74],
leaf
@status: 322x, string, leaf
text(): 323x, 3 distinct integers [3, 5], leaf
YCbCrPositioning: 21x, leaf
@toolname: 21x, string, leaf
@toolversion: 21x, double [7.74, 7.74], leaf
text(): 21x, 2 distinct integers [1, 2], leaf
captureDevice: 9x, leaf
@toolname: 9x, string, leaf
@toolversion: 9x, double [7.74, 7.74], leaf
@status: 9x, string, leaf
text(): 9x, string, leaf
digitalCameraManufacturer: 15x, leaf
@toolname: 15x, string, leaf
@toolversion: 15x, double [7.74, 7.74], leaf
@status: 15x, string, leaf
text(): 15x, 8 distinct strings, leaf
digitalCameraModelName: 15x, leaf
@toolname: 15x, string, leaf
@toolversion: 15x, double [7.74, 7.74], leaf
@status: 15x, string, leaf
text(): 15x, 13 distinct strings, leaf
fNumber: 14x, leaf
@toolname: 14x, string, leaf
@toolversion: 14x, double [7.74, 7.74], leaf
text(): 14x, 9 distinct doubles [2.2, 6.3], leaf
exposureTime: 14x, leaf
@toolname: 14x, string, leaf
@toolversion: 14x, double [7.74, 7.74], leaf
text(): 14x, 11 distinct strings, leaf
isoSpeedRating: 15x, leaf
@toolname: 15x, string, leaf
@toolversion: 15x, double [7.74, 7.74], leaf
@status: 15x, string, leaf
text(): 15x, 7 distinct integers [32, 1600], leaf
exifVersion: 155x, leaf
@toolname: 155x, string, leaf
@toolversion: 155x, double [7.74, 7.74], leaf
text(): 155x, 4 distinct integers [210, 230], leaf
@status: 24x, string, leaf
shutterSpeedValue: 15x, leaf
@toolname: 15x, string, leaf
@toolversion: 15x, double [7.74, 7.74], leaf
@status: 15x, string, leaf
text(): 15x, 12 distinct strings, leaf
apertureValue: 16x, leaf
@toolname: 16x, string, leaf
@toolversion: 16x, double [7.74, 7.74], leaf
text(): 16x, 11 distinct doubles [2.2, 13.4], leaf
exposureBiasValue: 16x, leaf
@toolname: 16x, string, leaf
@toolversion: 16x, double [7.74, 7.74], leaf
@status: 16x, string, leaf
text(): 16x, 2 distinct strings, leaf
maxApertureValue: 10x, leaf
@toolname: 10x, string, leaf
@toolversion: 10x, double [7.74, 7.74], leaf
text(): 10x, 5 distinct doubles [2.7, 4.9], leaf
meteringMode: 12x, leaf
@toolname: 12x, string, leaf
@toolversion: 12x, double [7.74, 7.74], leaf
text(): 12x, 4 distinct strings, leaf
flash: 15x, leaf
@toolname: 15x, string, leaf
@toolversion: 15x, double [7.74, 7.74], leaf
text(): 15x, 6 distinct strings, leaf
focalLength: 14x, leaf
@toolname: 14x, string, leaf
@toolversion: 14x, double [7.74, 7.74], leaf
text(): 14x, 12 distinct doubles [4.2, 137], leaf
sensingMethod: 8x, leaf
@toolname: 8x, string, leaf
@toolversion: 8x, double [7.74, 7.74], leaf
text(): 8x, string, leaf
referenceBlackWhite: 1x, leaf
@toolname: 1x, string, leaf
@toolversion: 1x, double [7.74, 7.74], leaf
text(): 1x, string, leaf
scannerManufacturer: 5x, leaf
@toolname: 5x, string, leaf
@toolversion: 5x, double [7.74, 7.74], leaf
@status: 5x, string, leaf
text(): 5x, 4 distinct strings, leaf
scannerModelName: 5x, leaf
@toolname: 5x, string, leaf
@toolversion: 5x, double [7.74, 7.74], leaf
@status: 5x, string, leaf
text(): 5x, 4 distinct strings, leaf
exposureProgram: 8x, leaf
@toolname: 8x, string, leaf
@toolversion: 8x, double [7.74, 7.74], leaf
text(): 8x, 4 distinct strings, leaf
subjectDistance: 2x, leaf
@toolname: 2x, string, leaf
@toolversion: 2x, double [7.74, 7.74], leaf
text(): 2x, 2 distinct strings, leaf
cfaPattern: 2x, leaf
@toolname: 2x, string, leaf
@toolversion: 2x, double [7.74, 7.74], leaf
text(): 2x, 2 distinct strings, leaf
byteOrder: 71x, leaf
@toolname: 71x, string, leaf
@toolversion: 71x, double [1.5, 1.5], leaf
@status: 71x, string, leaf
text(): 71x, string, leaf
exposureIndex: 1x, leaf
@toolname: 1x, string, leaf
@toolversion: 1x, double [7.74, 7.74], leaf
text(): 1x, integer [100, 100], leaf
brightnessValue: 2x, leaf
@toolname: 2x, string, leaf
@toolversion: 2x, double [7.74, 7.74], leaf
@status: 2x, string, leaf
text(): 2x, 2 distinct doubles [6.369276219,
9.768303187], leaf
brightnessvalue: 2x, leaf
@toolname: 2x, string, leaf
@toolversion: 2x, string, leaf
@status: 2x, string, leaf
text(): 2x, 2 distinct doubles [3737910,
3737956.3692762186], leaf
gpsLatitudeRef: 1x, leaf
@toolname: 1x, string, leaf
@toolversion: 1x, double [7.74, 7.74], leaf
@status: 1x, string, leaf
text(): 1x, string, leaf
gpsLatitude: 1x, leaf
@toolname: 1x, string, leaf
@toolversion: 1x, double [7.74, 7.74], leaf
@status: 1x, string, leaf
text(): 1x, string, leaf
gpsLongitudeRef: 1x, leaf
@toolname: 1x, string, leaf
@toolversion: 1x, double [7.74, 7.74], leaf
@status: 1x, string, leaf
text(): 1x, string, leaf
gpsLongitude: 1x, leaf
@toolname: 1x, string, leaf
@toolversion: 1x, double [7.74, 7.74], leaf
@status: 1x, string, leaf
text(): 1x, string, leaf
gpsAltitudeRef: 1x, leaf
@toolname: 1x, string, leaf
@toolversion: 1x, double [7.74, 7.74], leaf
@status: 1x, string, leaf
text(): 1x, string, leaf
gpsAltitude: 1x, leaf
@toolname: 1x, string, leaf
@toolversion: 1x, double [7.74, 7.74], leaf
@status: 1x, string, leaf
text(): 1x, string, leaf
gpsTimeStamp: 1x, leaf
@toolname: 1x, string, leaf
@toolversion: 1x, double [7.74, 7.74], leaf
@status: 1x, string, leaf
text(): 1x, string, leaf
gpsSpeedRef: 1x, leaf
@toolname: 1x, string, leaf
@toolversion: 1x, double [7.74, 7.74], leaf
@status: 1x, string, leaf
text(): 1x, string, leaf
gpsSpeed: 1x, leaf
@toolname: 1x, string, leaf
@toolversion: 1x, double [7.74, 7.74], leaf
@status: 1x, string, leaf
text(): 1x, integer [0, 4.9E-324], leaf
gpsImgDirectionRef: 1x, leaf
@toolname: 1x, string, leaf
@toolversion: 1x, double [7.74, 7.74], leaf
@status: 1x, string, leaf
text(): 1x, string, leaf
gpsImgDirection: 1x, leaf
@toolname: 1x, string, leaf
@toolversion: 1x, double [7.74, 7.74], leaf
@status: 1x, string, leaf
text(): 1x, double [352.1332378, 352.1332378], leaf
gpsDestBearingRef: 1x, leaf
@toolname: 1x, string, leaf
@toolversion: 1x, double [7.74, 7.74], leaf
@status: 1x, string, leaf
text(): 1x, string, leaf
gpsDestBearing: 1x, leaf
@toolname: 1x, string, leaf
@toolversion: 1x, double [7.74, 7.74], leaf
@status: 1x, string, leaf
text(): 1x, double [352.1332378, 352.1332378], leaf
gpsDateStamp: 1x, leaf
@toolname: 1x, string, leaf
@toolversion: 1x, double [7.74, 7.74], leaf
@status: 1x, string, leaf
text(): 1x, string, leaf
audio: 1649x
bitRate: 1649x, leaf
@toolname: 1649x, string, leaf
@toolversion: 1649x, double [7.74, 7.74], leaf
@status: 1647x, string, leaf
text(): 1649x, 3 distinct integers [80000, 320000],
leaf
sampleRate: 1649x, leaf
@toolname: 1649x, string, leaf
@toolversion: 1649x, double [7.74, 7.74], leaf
@status: 1647x, string, leaf
text(): 1649x, 2 distinct integers [44100, 48000],
leaf
channels: 1649x, leaf
@toolname: 1649x, string, leaf
@toolversion: 1649x, double [7.74, 7.74], leaf
@status: 1647x, string, leaf
text(): 1649x, 2 distinct integers [1, 2], leaf
channelMode: 1649x, leaf
@toolname: 1649x, string, leaf
@toolversion: 1649x, double [7.74, 7.74], leaf
@status: 1649x, string, leaf
text(): 1649x, 3 distinct strings, leaf
duration: 2x, leaf
@toolname: 2x, string, leaf
@toolversion: 2x, string, leaf
@status: 2x, string, leaf
text(): 2x, 2 distinct strings, leaf
milliseconds: 2x, leaf
@toolname: 2x, string, leaf
@toolversion: 2x, string, leaf
@status: 2x, string, leaf
text(): 2x, 2 distinct integers [2209067, 6200496],
leaf
fileSec: 29471x
fileGrp: 29471x
@ID: 29471x, string, leaf
@ADMID: 29471x, string, leaf
file: 38617x
@ID: 38617x, strings, leaf
@ADMID: 38617x, strings, leaf
@CREATED: 38617x, strings, leaf
@SIZE: 38617x, integers [948, 344172844], leaf
@CHECKSUM: 38617x, strings, leaf
@CHECKSUMTYPE: 38617x, string, leaf
@MIMETYPE: 38617x, 11 distinct strings, leaf
FLocat: 38617x, leaf
@LOCTYPE: 38617x, string, leaf
@xlink:type: 38617x, string, leaf
@xlink:href: 38617x, strings, leaf
structMap: 29471x
@TYPE: 29471x, string, leaf
div: 29471x
@ORDER: 29471x, integer [1, 1], leaf
@LABEL: 29471x, string, leaf
@TYPE: 29471x, string, leaf
fptr: 38617x, leaf
@FILEID: 38617x, strings, leaf
Dear Björn, Our Article on Indexes may give you some hints [1]: • Via the 'enforceindex' pragma, you can help the optimizer by indicating that all of your database have up-to-date index structures. • With db:text, you can directly access the database indexes and rewrite the remaining steps of your query in reverse order: db:text($db, 'urn:123')/parent::lmerFile:format/... Hope this helps Christian [1] http://docs.basex.org/wiki/Indexes On Mon, Jan 13, 2020 at 11:56 AM Braunschweig, Björn <bjoern.braunschweig@gwdg.de> wrote:
Hello everyone,
First of all thank you for building such a great product!
Unfortunately I encounter extremely slow queries using a basex instance. My xml files are distributed over 256 databases (00-ff) with roughly 250MB each. All share a structure similar to [3]. I executed OPTIMIZE ALL for all databases, see [4]. [5] shows the info index output. The linux server which is running the basex instance shows great amounts of io. I replicated the data to my local desktop and see the same amount of high io usage across all database files. For example: [data\81\tbl.basex, data\81\txt.basex, ..., etc]
Querying a single database is fast and the info output window shows the following [2], but a query across all dbs just stalls and does not return a result in a reasonable time.
Could somebody give me a hint how to speed things up?
regards, Björn
Dear Christian, Thank you for your quick response. Could you tell me why the following two queries have a difference in behaviour so that the first one apparently uses the text index [1] and the second one [2] does not? Your second suggestion with db:text is nice but I cannot figure out how to efficiently extract other informations like something from the root element without crawling the whole database and performing a full table scan so to say. Regards, Björn [1] declare namespace lmerFile = "http://www.ddb.de/LMERfile"; count( (# db:enforceindex #) { let $dbs := ("00") for $db in $dbs for $doc in db:open($db) where $doc//lmerFile:format/text() = 'urn:diasid:fty:kopal:0000000000000000000000' return $doc }) Compiling: - merge steps: descendant::element(lmerFile:format) - rewrite for to let: for $db_1 in $dbs_0 - inline $dbs_0 - inline $db_1 - pre-evaluate db:open(database[,path]) to document-node() sequence: db:open("00") -> (db:open-pre("00", 0), ...) - apply text index for "urn:123" - rewrite cached filter to cached path: ((db:open-pre("00", 0), ...))[(descendant::element(lmerFile:format)/text() = "urn:123... -> db:text("00", "urn:123")/parent::element(lmerFile:format)/ancest... - rewrite where clause(s) - simplify FLWOR expression: db:text("00", "urn:123")/parent::element(lmerFile:format)/ancest... Optimized Query: count((# Q{http://basex.org/modules/db}enforceindex #) { db:text("00", "urn:123")/parent::element(lmerFile:format)/ancestor::node()[self::document-node()] }) Query: declare namespace lmerFile = "http://www.ddb.de/LMERfile"; count( (# db:enforceindex #) { let $dbs := ("00") for $db in $dbs for $doc in db:open($db) where $doc//lmerFile:format/text() = 'urn:123' return $doc }) Result: - Hit(s): 1 Item - Updated: 0 Items - Printed: 4 b - Read Locking: (global) - Write Locking: (none) Timing: - Parsing: 0.26 ms - Compiling: 55.69 ms - Evaluating: 60.66 ms - Printing: 0.37 ms - Total Time: 116.98 ms Query Plan: <QueryPlan compiled="true" updating="false"> <FnCount name="count" type="xs:integer" size="1"> <Extension type="document-node()*"> <DBPragma value=""> <QNm type="xs:QName" size="1">db:enforceindex</QNm> </DBPragma> <CachedPath type="document-node()*"> <ValueAccess index="text" type="text()*"> <IndexStaticDb type="item()*" database="00"/> <Str type="xs:string" size="1">urn:123</Str> </ValueAccess> <IterStep axis="parent" test="element(lmerFile:format)" type="element()?"/> <CachedStep axis="ancestor" test="node()" type="node()*"> <CachedPath type="item()*"> <IterStep axis="self" test="document-node()" type="document-node()?"/> </CachedPath> </CachedStep> </CachedPath> </Extension> </FnCount> </QueryPlan> [2] Compiling: - pre-evaluate expression list to xs:string sequence: ("00", "01") - merge steps: descendant::element(lmerFile:format) - inline $dbs_0 - rewrite where clause(s) Optimized Query: count((# Q{http://basex.org/modules/db}enforceindex #) { for $db_1 in ("00", "01") return (db:open($db_1))[(descendant::element(lmerFile:format)/text() = "urn:123")] }) Query: declare namespace lmerFile = "http://www.ddb.de/LMERfile"; count( (# db:enforceindex #) { let $dbs := ("00", "01") for $db in $dbs for $doc in db:open($db) where $doc//lmerFile:format/text() = 'urn:123' return $doc }) Result: - Hit(s): 1 Item - Updated: 0 Items - Printed: 4 b - Read Locking: (global) - Write Locking: (none) Timing: - Parsing: 0.27 ms - Compiling: 0.54 ms - Evaluating: 1778.21 ms - Printing: 0.31 ms - Total Time: 1779.33 ms Query Plan: <QueryPlan compiled="true" updating="false"> <FnCount name="count" type="xs:integer" size="1"> <Extension type="document-node()*"> <DBPragma value=""> <QNm type="xs:QName" size="1">db:enforceindex</QNm> </DBPragma> <GFLWOR type="document-node()*"> <For type="xs:string" size="1" name="$db" id="1"> <StrSeq type="xs:string+" size="2"> <Str type="xs:string" size="1">00</Str> <Str type="xs:string" size="1">01</Str> </StrSeq> </For> <IterFilter type="document-node()*"> <DbOpen name="db:open" type="document-node()*"> <VarRef type="xs:string" size="1" name="$db" id="1"/> </DbOpen> <CmpG op="=" type="xs:boolean" size="1"> <CachedPath type="text()*"> <IterStep axis="descendant" test="element(lmerFile:format)" type="element()*"/> <IterStep axis="child" test="text()" type="text()*"/> </CachedPath> <Str type="xs:string" size="1">urn:123</Str> </CmpG> </IterFilter> </GFLWOR> </Extension> </FnCount> </QueryPlan> -----Ursprüngliche Nachricht----- Von: Christian Grün <christian.gruen@gmail.com> Gesendet: Montag, 13. Januar 2020 13:07 An: Braunschweig, Björn <bjoern.braunschweig@gwdg.de> Cc: basex-talk@mailman.uni-konstanz.de Betreff: Re: [basex-talk] slow search performance across multiple databases Dear Björn, Our Article on Indexes may give you some hints [1]: • Via the 'enforceindex' pragma, you can help the optimizer by indicating that all of your database have up-to-date index structures. • With db:text, you can directly access the database indexes and rewrite the remaining steps of your query in reverse order: db:text($db, 'urn:123')/parent::lmerFile:format/... Hope this helps Christian [1] http://docs.basex.org/wiki/Indexes On Mon, Jan 13, 2020 at 11:56 AM Braunschweig, Björn <bjoern.braunschweig@gwdg.de> wrote:
Hello everyone,
First of all thank you for building such a great product!
Unfortunately I encounter extremely slow queries using a basex instance. My xml files are distributed over 256 databases (00-ff) with roughly 250MB each. All share a structure similar to [3]. I executed OPTIMIZE ALL for all databases, see [4]. [5] shows the info index output. The linux server which is running the basex instance shows great amounts of io. I replicated the data to my local desktop and see the same amount of high io usage across all database files. For example: [data\81\tbl.basex, data\81\txt.basex, ..., etc]
Querying a single database is fast and the info output window shows the following [2], but a query across all dbs just stalls and does not return a result in a reasonable time.
Could somebody give me a hint how to speed things up?
regards, Björn
Hi, I have trouble seeing the original second query. But if I got it correct then the problem is that to have BaseX automatically rewrite for indexing you have to supply the DB you want to search in as string literal. That is db:open("DB") and not let $db := "DB" db:open($db). Or for with multiple dbs for that matter. Maybe I am wrong and the enforceindex pragma solves that now. I have to admit my code predates the pragma and always uses string literals when accessing a database. Best regards Omar Am 14.01.2020 um 13:33 schrieb Braunschweig, Björn:
Dear Christian,
Thank you for your quick response.
Could you tell me why the following two queries have a difference in behaviour so that the first one apparently uses the text index [1] and the second one [2] does not?
Your second suggestion with db:text is nice but I cannot figure out how to efficiently extract other informations like something from the root element without crawling the whole database and performing a full table scan so to say.
Regards, Björn
[1] declare namespace lmerFile = "http://www.ddb.de/LMERfile"; count( (# db:enforceindex #) { let $dbs := ("00") for $db in $dbs for $doc in db:open($db) where $doc//lmerFile:format/text() = 'urn:diasid:fty:kopal:0000000000000000000000' return $doc })
Compiling: - merge steps: descendant::element(lmerFile:format) - rewrite for to let: for $db_1 in $dbs_0 - inline $dbs_0 - inline $db_1 - pre-evaluate db:open(database[,path]) to document-node() sequence: db:open("00") -> (db:open-pre("00", 0), ...) - apply text index for "urn:123" - rewrite cached filter to cached path: ((db:open-pre("00", 0), ...))[(descendant::element(lmerFile:format)/text() = "urn:123... -> db:text("00", "urn:123")/parent::element(lmerFile:format)/ancest... - rewrite where clause(s) - simplify FLWOR expression: db:text("00", "urn:123")/parent::element(lmerFile:format)/ancest... Optimized Query: count((# Q{http://basex.org/modules/db}enforceindex #) { db:text("00", "urn:123")/parent::element(lmerFile:format)/ancestor::node()[self::document-node()] }) Query: declare namespace lmerFile = "http://www.ddb.de/LMERfile"; count( (# db:enforceindex #) { let $dbs := ("00") for $db in $dbs for $doc in db:open($db) where $doc//lmerFile:format/text() = 'urn:123' return $doc }) Result: - Hit(s): 1 Item - Updated: 0 Items - Printed: 4 b - Read Locking: (global) - Write Locking: (none) Timing: - Parsing: 0.26 ms - Compiling: 55.69 ms - Evaluating: 60.66 ms - Printing: 0.37 ms - Total Time: 116.98 ms Query Plan: <QueryPlan compiled="true" updating="false"> <FnCount name="count" type="xs:integer" size="1"> <Extension type="document-node()*"> <DBPragma value=""> <QNm type="xs:QName" size="1">db:enforceindex</QNm> </DBPragma> <CachedPath type="document-node()*"> <ValueAccess index="text" type="text()*"> <IndexStaticDb type="item()*" database="00"/> <Str type="xs:string" size="1">urn:123</Str> </ValueAccess> <IterStep axis="parent" test="element(lmerFile:format)" type="element()?"/> <CachedStep axis="ancestor" test="node()" type="node()*"> <CachedPath type="item()*"> <IterStep axis="self" test="document-node()" type="document-node()?"/> </CachedPath> </CachedStep> </CachedPath> </Extension> </FnCount> </QueryPlan>
[2]
Compiling: - pre-evaluate expression list to xs:string sequence: ("00", "01") - merge steps: descendant::element(lmerFile:format) - inline $dbs_0 - rewrite where clause(s) Optimized Query: count((# Q{http://basex.org/modules/db}enforceindex #) { for $db_1 in ("00", "01") return (db:open($db_1))[(descendant::element(lmerFile:format)/text() = "urn:123")] }) Query: declare namespace lmerFile = "http://www.ddb.de/LMERfile"; count( (# db:enforceindex #) { let $dbs := ("00", "01") for $db in $dbs for $doc in db:open($db) where $doc//lmerFile:format/text() = 'urn:123' return $doc }) Result: - Hit(s): 1 Item - Updated: 0 Items - Printed: 4 b - Read Locking: (global) - Write Locking: (none) Timing: - Parsing: 0.27 ms - Compiling: 0.54 ms - Evaluating: 1778.21 ms - Printing: 0.31 ms - Total Time: 1779.33 ms Query Plan: <QueryPlan compiled="true" updating="false"> <FnCount name="count" type="xs:integer" size="1"> <Extension type="document-node()*"> <DBPragma value=""> <QNm type="xs:QName" size="1">db:enforceindex</QNm> </DBPragma> <GFLWOR type="document-node()*"> <For type="xs:string" size="1" name="$db" id="1"> <StrSeq type="xs:string+" size="2"> <Str type="xs:string" size="1">00</Str> <Str type="xs:string" size="1">01</Str> </StrSeq> </For> <IterFilter type="document-node()*"> <DbOpen name="db:open" type="document-node()*"> <VarRef type="xs:string" size="1" name="$db" id="1"/> </DbOpen> <CmpG op="=" type="xs:boolean" size="1"> <CachedPath type="text()*"> <IterStep axis="descendant" test="element(lmerFile:format)" type="element()*"/> <IterStep axis="child" test="text()" type="text()*"/> </CachedPath> <Str type="xs:string" size="1">urn:123</Str> </CmpG> </IterFilter> </GFLWOR> </Extension> </FnCount> </QueryPlan>
-----Ursprüngliche Nachricht----- Von: Christian Grün <christian.gruen@gmail.com> Gesendet: Montag, 13. Januar 2020 13:07 An: Braunschweig, Björn <bjoern.braunschweig@gwdg.de> Cc: basex-talk@mailman.uni-konstanz.de Betreff: Re: [basex-talk] slow search performance across multiple databases
Dear Björn,
Our Article on Indexes may give you some hints [1]:
• Via the 'enforceindex' pragma, you can help the optimizer by indicating that all of your database have up-to-date index structures. • With db:text, you can directly access the database indexes and rewrite the remaining steps of your query in reverse order:
db:text($db, 'urn:123')/parent::lmerFile:format/...
Hope this helps Christian
[1] http://docs.basex.org/wiki/Indexes
On Mon, Jan 13, 2020 at 11:56 AM Braunschweig, Björn <bjoern.braunschweig@gwdg.de> wrote:
Hello everyone,
First of all thank you for building such a great product!
Unfortunately I encounter extremely slow queries using a basex instance. My xml files are distributed over 256 databases (00-ff) with roughly 250MB each. All share a structure similar to [3]. I executed OPTIMIZE ALL for all databases, see [4]. [5] shows the info index output. The linux server which is running the basex instance shows great amounts of io. I replicated the data to my local desktop and see the same amount of high io usage across all database files. For example: [data\81\tbl.basex, data\81\txt.basex, ..., etc]
Querying a single database is fast and the info output window shows the following [2], but a query across all dbs just stalls and does not return a result in a reasonable time.
Could somebody give me a hint how to speed things up?
regards, Björn
Dear Omar, Sorry I forgot to include the second query and just copied the info output. Here are the two queries again: [1] Uses index declare namespace lmerFile = "http://www.ddb.de/LMERfile"; count( (# db:enforceindex #) { for $doc in db:open("00") where $doc//lmerFile:format/text() = 'urn:123' return $doc }) [2] Does not use index declare namespace lmerFile = "http://www.ddb.de/LMERfile"; count( (# db:enforceindex #) { let $dbs := ("00", "01") for $db in $dbs for $doc in db:open($db) where $doc//lmerFile:format/text() = 'urn:123' return $doc }) I have either a limited amount of databases I would like to access or all databases. Using anything but db:open with a string literal results in no indices are being used. Is there a solution to circumvent that problem? Regards, Björn -----Ursprüngliche Nachricht----- Von: BaseX-Talk <basex-talk-bounces@mailman.uni-konstanz.de> Im Auftrag von Omar Siam Gesendet: Dienstag, 14. Januar 2020 17:06 An: basex-talk@mailman.uni-konstanz.de Betreff: Re: [basex-talk] slow search performance across multiple databases Hi, I have trouble seeing the original second query. But if I got it correct then the problem is that to have BaseX automatically rewrite for indexing you have to supply the DB you want to search in as string literal. That is db:open("DB") and not let $db := "DB" db:open($db). Or for with multiple dbs for that matter. Maybe I am wrong and the enforceindex pragma solves that now. I have to admit my code predates the pragma and always uses string literals when accessing a database. Best regards Omar Am 14.01.2020 um 13:33 schrieb Braunschweig, Björn:
Dear Christian,
Thank you for your quick response.
Could you tell me why the following two queries have a difference in behaviour so that the first one apparently uses the text index [1] and the second one [2] does not?
Your second suggestion with db:text is nice but I cannot figure out how to efficiently extract other informations like something from the root element without crawling the whole database and performing a full table scan so to say.
Regards, Björn
[1] declare namespace lmerFile = "http://www.ddb.de/LMERfile"; count( (# db:enforceindex #) { let $dbs := ("00") for $db in $dbs for $doc in db:open($db) where $doc//lmerFile:format/text() = 'urn:diasid:fty:kopal:0000000000000000000000' return $doc })
Compiling: - merge steps: descendant::element(lmerFile:format) - rewrite for to let: for $db_1 in $dbs_0 - inline $dbs_0 - inline $db_1 - pre-evaluate db:open(database[,path]) to document-node() sequence: db:open("00") -> (db:open-pre("00", 0), ...) - apply text index for "urn:123" - rewrite cached filter to cached path: ((db:open-pre("00", 0), ...))[(descendant::element(lmerFile:format)/text() = "urn:123... -> db:text("00", "urn:123")/parent::element(lmerFile:format)/ancest... - rewrite where clause(s) - simplify FLWOR expression: db:text("00", "urn:123")/parent::element(lmerFile:format)/ancest... Optimized Query: count((# Q{http://basex.org/modules/db}enforceindex #) { db:text("00", "urn:123")/parent::element(lmerFile:format)/ancestor::node()[self::doc ument-node()] }) Query: declare namespace lmerFile = "http://www.ddb.de/LMERfile"; count( (# db:enforceindex #) { let $dbs := ("00") for $db in $dbs for $doc in db:open($db) where $doc//lmerFile:format/text() = 'urn:123' return $doc }) Result: - Hit(s): 1 Item - Updated: 0 Items - Printed: 4 b - Read Locking: (global) - Write Locking: (none) Timing: - Parsing: 0.26 ms - Compiling: 55.69 ms - Evaluating: 60.66 ms - Printing: 0.37 ms - Total Time: 116.98 ms Query Plan: <QueryPlan compiled="true" updating="false"> <FnCount name="count" type="xs:integer" size="1"> <Extension type="document-node()*"> <DBPragma value=""> <QNm type="xs:QName" size="1">db:enforceindex</QNm> </DBPragma> <CachedPath type="document-node()*"> <ValueAccess index="text" type="text()*"> <IndexStaticDb type="item()*" database="00"/> <Str type="xs:string" size="1">urn:123</Str> </ValueAccess> <IterStep axis="parent" test="element(lmerFile:format)" type="element()?"/> <CachedStep axis="ancestor" test="node()" type="node()*"> <CachedPath type="item()*"> <IterStep axis="self" test="document-node()" type="document-node()?"/> </CachedPath> </CachedStep> </CachedPath> </Extension> </FnCount> </QueryPlan>
[2]
Compiling: - pre-evaluate expression list to xs:string sequence: ("00", "01") - merge steps: descendant::element(lmerFile:format) - inline $dbs_0 - rewrite where clause(s) Optimized Query: count((# Q{http://basex.org/modules/db}enforceindex #) { for $db_1 in ("00", "01") return (db:open($db_1))[(descendant::element(lmerFile:format)/text() = "urn:123")] }) Query: declare namespace lmerFile = "http://www.ddb.de/LMERfile"; count( (# db:enforceindex #) { let $dbs := ("00", "01") for $db in $dbs for $doc in db:open($db) where $doc//lmerFile:format/text() = 'urn:123' return $doc }) Result: - Hit(s): 1 Item - Updated: 0 Items - Printed: 4 b - Read Locking: (global) - Write Locking: (none) Timing: - Parsing: 0.27 ms - Compiling: 0.54 ms - Evaluating: 1778.21 ms - Printing: 0.31 ms - Total Time: 1779.33 ms Query Plan: <QueryPlan compiled="true" updating="false"> <FnCount name="count" type="xs:integer" size="1"> <Extension type="document-node()*"> <DBPragma value=""> <QNm type="xs:QName" size="1">db:enforceindex</QNm> </DBPragma> <GFLWOR type="document-node()*"> <For type="xs:string" size="1" name="$db" id="1"> <StrSeq type="xs:string+" size="2"> <Str type="xs:string" size="1">00</Str> <Str type="xs:string" size="1">01</Str> </StrSeq> </For> <IterFilter type="document-node()*"> <DbOpen name="db:open" type="document-node()*"> <VarRef type="xs:string" size="1" name="$db" id="1"/> </DbOpen> <CmpG op="=" type="xs:boolean" size="1"> <CachedPath type="text()*"> <IterStep axis="descendant" test="element(lmerFile:format)" type="element()*"/> <IterStep axis="child" test="text()" type="text()*"/> </CachedPath> <Str type="xs:string" size="1">urn:123</Str> </CmpG> </IterFilter> </GFLWOR> </Extension> </FnCount> </QueryPlan>
-----Ursprüngliche Nachricht----- Von: Christian Grün <christian.gruen@gmail.com> Gesendet: Montag, 13. Januar 2020 13:07 An: Braunschweig, Björn <bjoern.braunschweig@gwdg.de> Cc: basex-talk@mailman.uni-konstanz.de Betreff: Re: [basex-talk] slow search performance across multiple databases
Dear Björn,
Our Article on Indexes may give you some hints [1]:
• Via the 'enforceindex' pragma, you can help the optimizer by indicating that all of your database have up-to-date index structures. • With db:text, you can directly access the database indexes and rewrite the remaining steps of your query in reverse order:
db:text($db, 'urn:123')/parent::lmerFile:format/...
Hope this helps Christian
[1] http://docs.basex.org/wiki/Indexes
On Mon, Jan 13, 2020 at 11:56 AM Braunschweig, Björn <bjoern.braunschweig@gwdg.de> wrote:
Hello everyone,
First of all thank you for building such a great product!
Unfortunately I encounter extremely slow queries using a basex instance. My xml files are distributed over 256 databases (00-ff) with roughly 250MB each. All share a structure similar to [3]. I executed OPTIMIZE ALL for all databases, see [4]. [5] shows the info index output. The linux server which is running the basex instance shows great amounts of io. I replicated the data to my local desktop and see the same amount of high io usage across all database files. For example: [data\81\tbl.basex, data\81\txt.basex, ..., etc]
Querying a single database is fast and the info output window shows the following [2], but a query across all dbs just stalls and does not return a result in a reasonable time.
Could somebody give me a hint how to speed things up?
regards, Björn
Hi Björn,
[2] Does not use index declare namespace lmerFile = "http://www.ddb.de/LMERfile"; count( (# db:enforceindex #) { let $dbs := ("00", "01") for $db in $dbs for $doc in db:open($db) where $doc//lmerFile:format/text() = 'urn:123' return $doc })
The index rewriting should take effect if you e.g. rewrite db:open($db)/* instead of db:open($db). The background: * The where clause will be attached to the for expression as predicate * As thefor expression is a simple function call, the result will be a filter expression: db:open($db)[.//lmerFile:format/text() = 'urn:123'] * If the for expression is a path expression, the where clause will be attached to the last predicate of this path. The result will be a path again. * Only path expressions are rewritten to index access, no filters. But thanks for your observation. We will check if we can implicitly rewrite your function call to a path expression if it’s embedded in an iteration like yours. Best, Christian
Von: BaseX-Talk <basex-talk-bounces@mailman.uni-konstanz.de> Im Auftrag von Omar Siam Gesendet: Dienstag, 14. Januar 2020 17:06 An: basex-talk@mailman.uni-konstanz.de Betreff: Re: [basex-talk] slow search performance across multiple databases
Hi,
I have trouble seeing the original second query.
But if I got it correct then the problem is that to have BaseX automatically rewrite for indexing you have to supply the DB you want to search in as string literal. That is db:open("DB") and not let $db := "DB" db:open($db). Or for with multiple dbs for that matter.
Maybe I am wrong and the enforceindex pragma solves that now. I have to admit my code predates the pragma and always uses string literals when accessing a database.
Best regards
Omar
Am 14.01.2020 um 13:33 schrieb Braunschweig, Björn:
Dear Christian,
Thank you for your quick response.
Could you tell me why the following two queries have a difference in behaviour so that the first one apparently uses the text index [1] and the second one [2] does not?
Your second suggestion with db:text is nice but I cannot figure out how to efficiently extract other informations like something from the root element without crawling the whole database and performing a full table scan so to say.
Regards, Björn
[1] declare namespace lmerFile = "http://www.ddb.de/LMERfile"; count( (# db:enforceindex #) { let $dbs := ("00") for $db in $dbs for $doc in db:open($db) where $doc//lmerFile:format/text() = 'urn:diasid:fty:kopal:0000000000000000000000' return $doc })
Compiling: - merge steps: descendant::element(lmerFile:format) - rewrite for to let: for $db_1 in $dbs_0 - inline $dbs_0 - inline $db_1 - pre-evaluate db:open(database[,path]) to document-node() sequence: db:open("00") -> (db:open-pre("00", 0), ...) - apply text index for "urn:123" - rewrite cached filter to cached path: ((db:open-pre("00", 0), ...))[(descendant::element(lmerFile:format)/text() = "urn:123... -> db:text("00", "urn:123")/parent::element(lmerFile:format)/ancest... - rewrite where clause(s) - simplify FLWOR expression: db:text("00", "urn:123")/parent::element(lmerFile:format)/ancest... Optimized Query: count((# Q{http://basex.org/modules/db}enforceindex #) { db:text("00", "urn:123")/parent::element(lmerFile:format)/ancestor::node()[self::doc ument-node()] }) Query: declare namespace lmerFile = "http://www.ddb.de/LMERfile"; count( (# db:enforceindex #) { let $dbs := ("00") for $db in $dbs for $doc in db:open($db) where $doc//lmerFile:format/text() = 'urn:123' return $doc }) Result: - Hit(s): 1 Item - Updated: 0 Items - Printed: 4 b - Read Locking: (global) - Write Locking: (none) Timing: - Parsing: 0.26 ms - Compiling: 55.69 ms - Evaluating: 60.66 ms - Printing: 0.37 ms - Total Time: 116.98 ms Query Plan: <QueryPlan compiled="true" updating="false"> <FnCount name="count" type="xs:integer" size="1"> <Extension type="document-node()*"> <DBPragma value=""> <QNm type="xs:QName" size="1">db:enforceindex</QNm> </DBPragma> <CachedPath type="document-node()*"> <ValueAccess index="text" type="text()*"> <IndexStaticDb type="item()*" database="00"/> <Str type="xs:string" size="1">urn:123</Str> </ValueAccess> <IterStep axis="parent" test="element(lmerFile:format)" type="element()?"/> <CachedStep axis="ancestor" test="node()" type="node()*"> <CachedPath type="item()*"> <IterStep axis="self" test="document-node()" type="document-node()?"/> </CachedPath> </CachedStep> </CachedPath> </Extension> </FnCount> </QueryPlan>
[2]
Compiling: - pre-evaluate expression list to xs:string sequence: ("00", "01") - merge steps: descendant::element(lmerFile:format) - inline $dbs_0 - rewrite where clause(s) Optimized Query: count((# Q{http://basex.org/modules/db}enforceindex #) { for $db_1 in ("00", "01") return (db:open($db_1))[(descendant::element(lmerFile:format)/text() = "urn:123")] }) Query: declare namespace lmerFile = "http://www.ddb.de/LMERfile"; count( (# db:enforceindex #) { let $dbs := ("00", "01") for $db in $dbs for $doc in db:open($db) where $doc//lmerFile:format/text() = 'urn:123' return $doc }) Result: - Hit(s): 1 Item - Updated: 0 Items - Printed: 4 b - Read Locking: (global) - Write Locking: (none) Timing: - Parsing: 0.27 ms - Compiling: 0.54 ms - Evaluating: 1778.21 ms - Printing: 0.31 ms - Total Time: 1779.33 ms Query Plan: <QueryPlan compiled="true" updating="false"> <FnCount name="count" type="xs:integer" size="1"> <Extension type="document-node()*"> <DBPragma value=""> <QNm type="xs:QName" size="1">db:enforceindex</QNm> </DBPragma> <GFLWOR type="document-node()*"> <For type="xs:string" size="1" name="$db" id="1"> <StrSeq type="xs:string+" size="2"> <Str type="xs:string" size="1">00</Str> <Str type="xs:string" size="1">01</Str> </StrSeq> </For> <IterFilter type="document-node()*"> <DbOpen name="db:open" type="document-node()*"> <VarRef type="xs:string" size="1" name="$db" id="1"/> </DbOpen> <CmpG op="=" type="xs:boolean" size="1"> <CachedPath type="text()*"> <IterStep axis="descendant" test="element(lmerFile:format)" type="element()*"/> <IterStep axis="child" test="text()" type="text()*"/> </CachedPath> <Str type="xs:string" size="1">urn:123</Str> </CmpG> </IterFilter> </GFLWOR> </Extension> </FnCount> </QueryPlan>
-----Ursprüngliche Nachricht----- Von: Christian Grün <christian.gruen@gmail.com> Gesendet: Montag, 13. Januar 2020 13:07 An: Braunschweig, Björn <bjoern.braunschweig@gwdg.de> Cc: basex-talk@mailman.uni-konstanz.de Betreff: Re: [basex-talk] slow search performance across multiple databases
Dear Björn,
Our Article on Indexes may give you some hints [1]:
• Via the 'enforceindex' pragma, you can help the optimizer by indicating that all of your database have up-to-date index structures. • With db:text, you can directly access the database indexes and rewrite the remaining steps of your query in reverse order:
db:text($db, 'urn:123')/parent::lmerFile:format/...
Hope this helps Christian
[1] http://docs.basex.org/wiki/Indexes
On Mon, Jan 13, 2020 at 11:56 AM Braunschweig, Björn <bjoern.braunschweig@gwdg.de> wrote:
Hello everyone,
First of all thank you for building such a great product!
Unfortunately I encounter extremely slow queries using a basex instance. My xml files are distributed over 256 databases (00-ff) with roughly 250MB each. All share a structure similar to [3]. I executed OPTIMIZE ALL for all databases, see [4]. [5] shows the info index output. The linux server which is running the basex instance shows great amounts of io. I replicated the data to my local desktop and see the same amount of high io usage across all database files. For example: [data\81\tbl.basex, data\81\txt.basex, ..., etc]
Querying a single database is fast and the info output window shows the following [2], but a query across all dbs just stalls and does not return a result in a reasonable time.
Could somebody give me a hint how to speed things up?
regards, Björn
Hi Björn, hi Omar, All filter expressions that yield document nodes in distinct document order will now be rewritten to path expressions. Feel free to check out the latest stable snapshot [1]. Christian [1] http://files.basex.org/releases/latest/ On Wed, Jan 15, 2020 at 9:22 AM Christian Grün <christian.gruen@gmail.com> wrote:
Hi Björn,
[2] Does not use index declare namespace lmerFile = "http://www.ddb.de/LMERfile"; count( (# db:enforceindex #) { let $dbs := ("00", "01") for $db in $dbs for $doc in db:open($db) where $doc//lmerFile:format/text() = 'urn:123' return $doc })
The index rewriting should take effect if you e.g. rewrite db:open($db)/* instead of db:open($db). The background:
* The where clause will be attached to the for expression as predicate * As thefor expression is a simple function call, the result will be a filter expression: db:open($db)[.//lmerFile:format/text() = 'urn:123'] * If the for expression is a path expression, the where clause will be attached to the last predicate of this path. The result will be a path again. * Only path expressions are rewritten to index access, no filters.
But thanks for your observation. We will check if we can implicitly rewrite your function call to a path expression if it’s embedded in an iteration like yours.
Best, Christian
Von: BaseX-Talk <basex-talk-bounces@mailman.uni-konstanz.de> Im Auftrag von Omar Siam Gesendet: Dienstag, 14. Januar 2020 17:06 An: basex-talk@mailman.uni-konstanz.de Betreff: Re: [basex-talk] slow search performance across multiple databases
Hi,
I have trouble seeing the original second query.
But if I got it correct then the problem is that to have BaseX automatically rewrite for indexing you have to supply the DB you want to search in as string literal. That is db:open("DB") and not let $db := "DB" db:open($db). Or for with multiple dbs for that matter.
Maybe I am wrong and the enforceindex pragma solves that now. I have to admit my code predates the pragma and always uses string literals when accessing a database.
Best regards
Omar
Am 14.01.2020 um 13:33 schrieb Braunschweig, Björn:
Dear Christian,
Thank you for your quick response.
Could you tell me why the following two queries have a difference in behaviour so that the first one apparently uses the text index [1] and the second one [2] does not?
Your second suggestion with db:text is nice but I cannot figure out how to efficiently extract other informations like something from the root element without crawling the whole database and performing a full table scan so to say.
Regards, Björn
[1] declare namespace lmerFile = "http://www.ddb.de/LMERfile"; count( (# db:enforceindex #) { let $dbs := ("00") for $db in $dbs for $doc in db:open($db) where $doc//lmerFile:format/text() = 'urn:diasid:fty:kopal:0000000000000000000000' return $doc })
Compiling: - merge steps: descendant::element(lmerFile:format) - rewrite for to let: for $db_1 in $dbs_0 - inline $dbs_0 - inline $db_1 - pre-evaluate db:open(database[,path]) to document-node() sequence: db:open("00") -> (db:open-pre("00", 0), ...) - apply text index for "urn:123" - rewrite cached filter to cached path: ((db:open-pre("00", 0), ...))[(descendant::element(lmerFile:format)/text() = "urn:123... -> db:text("00", "urn:123")/parent::element(lmerFile:format)/ancest... - rewrite where clause(s) - simplify FLWOR expression: db:text("00", "urn:123")/parent::element(lmerFile:format)/ancest... Optimized Query: count((# Q{http://basex.org/modules/db}enforceindex #) { db:text("00", "urn:123")/parent::element(lmerFile:format)/ancestor::node()[self::doc ument-node()] }) Query: declare namespace lmerFile = "http://www.ddb.de/LMERfile"; count( (# db:enforceindex #) { let $dbs := ("00") for $db in $dbs for $doc in db:open($db) where $doc//lmerFile:format/text() = 'urn:123' return $doc }) Result: - Hit(s): 1 Item - Updated: 0 Items - Printed: 4 b - Read Locking: (global) - Write Locking: (none) Timing: - Parsing: 0.26 ms - Compiling: 55.69 ms - Evaluating: 60.66 ms - Printing: 0.37 ms - Total Time: 116.98 ms Query Plan: <QueryPlan compiled="true" updating="false"> <FnCount name="count" type="xs:integer" size="1"> <Extension type="document-node()*"> <DBPragma value=""> <QNm type="xs:QName" size="1">db:enforceindex</QNm> </DBPragma> <CachedPath type="document-node()*"> <ValueAccess index="text" type="text()*"> <IndexStaticDb type="item()*" database="00"/> <Str type="xs:string" size="1">urn:123</Str> </ValueAccess> <IterStep axis="parent" test="element(lmerFile:format)" type="element()?"/> <CachedStep axis="ancestor" test="node()" type="node()*"> <CachedPath type="item()*"> <IterStep axis="self" test="document-node()" type="document-node()?"/> </CachedPath> </CachedStep> </CachedPath> </Extension> </FnCount> </QueryPlan>
[2]
Compiling: - pre-evaluate expression list to xs:string sequence: ("00", "01") - merge steps: descendant::element(lmerFile:format) - inline $dbs_0 - rewrite where clause(s) Optimized Query: count((# Q{http://basex.org/modules/db}enforceindex #) { for $db_1 in ("00", "01") return (db:open($db_1))[(descendant::element(lmerFile:format)/text() = "urn:123")] }) Query: declare namespace lmerFile = "http://www.ddb.de/LMERfile"; count( (# db:enforceindex #) { let $dbs := ("00", "01") for $db in $dbs for $doc in db:open($db) where $doc//lmerFile:format/text() = 'urn:123' return $doc }) Result: - Hit(s): 1 Item - Updated: 0 Items - Printed: 4 b - Read Locking: (global) - Write Locking: (none) Timing: - Parsing: 0.27 ms - Compiling: 0.54 ms - Evaluating: 1778.21 ms - Printing: 0.31 ms - Total Time: 1779.33 ms Query Plan: <QueryPlan compiled="true" updating="false"> <FnCount name="count" type="xs:integer" size="1"> <Extension type="document-node()*"> <DBPragma value=""> <QNm type="xs:QName" size="1">db:enforceindex</QNm> </DBPragma> <GFLWOR type="document-node()*"> <For type="xs:string" size="1" name="$db" id="1"> <StrSeq type="xs:string+" size="2"> <Str type="xs:string" size="1">00</Str> <Str type="xs:string" size="1">01</Str> </StrSeq> </For> <IterFilter type="document-node()*"> <DbOpen name="db:open" type="document-node()*"> <VarRef type="xs:string" size="1" name="$db" id="1"/> </DbOpen> <CmpG op="=" type="xs:boolean" size="1"> <CachedPath type="text()*"> <IterStep axis="descendant" test="element(lmerFile:format)" type="element()*"/> <IterStep axis="child" test="text()" type="text()*"/> </CachedPath> <Str type="xs:string" size="1">urn:123</Str> </CmpG> </IterFilter> </GFLWOR> </Extension> </FnCount> </QueryPlan>
-----Ursprüngliche Nachricht----- Von: Christian Grün <christian.gruen@gmail.com> Gesendet: Montag, 13. Januar 2020 13:07 An: Braunschweig, Björn <bjoern.braunschweig@gwdg.de> Cc: basex-talk@mailman.uni-konstanz.de Betreff: Re: [basex-talk] slow search performance across multiple databases
Dear Björn,
Our Article on Indexes may give you some hints [1]:
• Via the 'enforceindex' pragma, you can help the optimizer by indicating that all of your database have up-to-date index structures. • With db:text, you can directly access the database indexes and rewrite the remaining steps of your query in reverse order:
db:text($db, 'urn:123')/parent::lmerFile:format/...
Hope this helps Christian
[1] http://docs.basex.org/wiki/Indexes
On Mon, Jan 13, 2020 at 11:56 AM Braunschweig, Björn <bjoern.braunschweig@gwdg.de> wrote:
Hello everyone,
First of all thank you for building such a great product!
Unfortunately I encounter extremely slow queries using a basex instance. My xml files are distributed over 256 databases (00-ff) with roughly 250MB each. All share a structure similar to [3]. I executed OPTIMIZE ALL for all databases, see [4]. [5] shows the info index output. The linux server which is running the basex instance shows great amounts of io. I replicated the data to my local desktop and see the same amount of high io usage across all database files. For example: [data\81\tbl.basex, data\81\txt.basex, ..., etc]
Querying a single database is fast and the info output window shows the following [2], but a query across all dbs just stalls and does not return a result in a reasonable time.
Could somebody give me a hint how to speed things up?
regards, Björn
Hi Christian, Awesome, thank you very much. I followed your suggestion and it works well. Björn -----Ursprüngliche Nachricht----- Von: Christian Grün <christian.gruen@gmail.com> Gesendet: Mittwoch, 15. Januar 2020 11:24 An: Braunschweig, Björn <bjoern.braunschweig@gwdg.de> Cc: Omar Siam <Omar.Siam@oeaw.ac.at>; basex-talk@mailman.uni-konstanz.de Betreff: Re: [basex-talk] slow search performance across multiple databases Hi Björn, hi Omar, All filter expressions that yield document nodes in distinct document order will now be rewritten to path expressions. Feel free to check out the latest stable snapshot [1]. Christian [1] http://files.basex.org/releases/latest/ On Wed, Jan 15, 2020 at 9:22 AM Christian Grün <christian.gruen@gmail.com> wrote:
Hi Björn,
[2] Does not use index declare namespace lmerFile = "http://www.ddb.de/LMERfile"; count( (# db:enforceindex #) { let $dbs := ("00", "01") for $db in $dbs for $doc in db:open($db) where $doc//lmerFile:format/text() = 'urn:123' return $doc })
The index rewriting should take effect if you e.g. rewrite db:open($db)/* instead of db:open($db). The background:
* The where clause will be attached to the for expression as predicate * As thefor expression is a simple function call, the result will be a filter expression: db:open($db)[.//lmerFile:format/text() = 'urn:123'] * If the for expression is a path expression, the where clause will be attached to the last predicate of this path. The result will be a path again. * Only path expressions are rewritten to index access, no filters.
But thanks for your observation. We will check if we can implicitly rewrite your function call to a path expression if it’s embedded in an iteration like yours.
Best, Christian
Von: BaseX-Talk <basex-talk-bounces@mailman.uni-konstanz.de> Im Auftrag von Omar Siam Gesendet: Dienstag, 14. Januar 2020 17:06 An: basex-talk@mailman.uni-konstanz.de Betreff: Re: [basex-talk] slow search performance across multiple databases
Hi,
I have trouble seeing the original second query.
But if I got it correct then the problem is that to have BaseX automatically rewrite for indexing you have to supply the DB you want to search in as string literal. That is db:open("DB") and not let $db := "DB" db:open($db). Or for with multiple dbs for that matter.
Maybe I am wrong and the enforceindex pragma solves that now. I have to admit my code predates the pragma and always uses string literals when accessing a database.
Best regards
Omar
Am 14.01.2020 um 13:33 schrieb Braunschweig, Björn:
Dear Christian,
Thank you for your quick response.
Could you tell me why the following two queries have a difference in behaviour so that the first one apparently uses the text index [1] and the second one [2] does not?
Your second suggestion with db:text is nice but I cannot figure out how to efficiently extract other informations like something from the root element without crawling the whole database and performing a full table scan so to say.
Regards, Björn
[1] declare namespace lmerFile = "http://www.ddb.de/LMERfile"; count( (# db:enforceindex #) { let $dbs := ("00") for $db in $dbs for $doc in db:open($db) where $doc//lmerFile:format/text() = 'urn:diasid:fty:kopal:0000000000000000000000' return $doc })
Compiling: - merge steps: descendant::element(lmerFile:format) - rewrite for to let: for $db_1 in $dbs_0 - inline $dbs_0 - inline $db_1 - pre-evaluate db:open(database[,path]) to document-node() sequence: db:open("00") -> (db:open-pre("00", 0), ...) - apply text index for "urn:123" - rewrite cached filter to cached path: ((db:open-pre("00", 0), ...))[(descendant::element(lmerFile:format)/text() = "urn:123... -> db:text("00", "urn:123")/parent::element(lmerFile:format)/ancest... - rewrite where clause(s) - simplify FLWOR expression: db:text("00", "urn:123")/parent::element(lmerFile:format)/ancest... Optimized Query: count((# Q{http://basex.org/modules/db}enforceindex #) { db:text("00", "urn:123")/parent::element(lmerFile:format)/ancestor::node()[self: :doc ument-node()] }) Query: declare namespace lmerFile = "http://www.ddb.de/LMERfile"; count( (# db:enforceindex #) { let $dbs := ("00") for $db in $dbs for $doc in db:open($db) where $doc//lmerFile:format/text() = 'urn:123' return $doc }) Result: - Hit(s): 1 Item - Updated: 0 Items - Printed: 4 b - Read Locking: (global) - Write Locking: (none) Timing: - Parsing: 0.26 ms - Compiling: 55.69 ms - Evaluating: 60.66 ms - Printing: 0.37 ms - Total Time: 116.98 ms Query Plan: <QueryPlan compiled="true" updating="false"> <FnCount name="count" type="xs:integer" size="1"> <Extension type="document-node()*"> <DBPragma value=""> <QNm type="xs:QName" size="1">db:enforceindex</QNm> </DBPragma> <CachedPath type="document-node()*"> <ValueAccess index="text" type="text()*"> <IndexStaticDb type="item()*" database="00"/> <Str type="xs:string" size="1">urn:123</Str> </ValueAccess> <IterStep axis="parent" test="element(lmerFile:format)" type="element()?"/> <CachedStep axis="ancestor" test="node()" type="node()*"> <CachedPath type="item()*"> <IterStep axis="self" test="document-node()" type="document-node()?"/> </CachedPath> </CachedStep> </CachedPath> </Extension> </FnCount> </QueryPlan>
[2]
Compiling: - pre-evaluate expression list to xs:string sequence: ("00", "01") - merge steps: descendant::element(lmerFile:format) - inline $dbs_0 - rewrite where clause(s) Optimized Query: count((# Q{http://basex.org/modules/db}enforceindex #) { for $db_1 in ("00", "01") return (db:open($db_1))[(descendant::element(lmerFile:format)/text() = "urn:123")] }) Query: declare namespace lmerFile = "http://www.ddb.de/LMERfile"; count( (# db:enforceindex #) { let $dbs := ("00", "01") for $db in $dbs for $doc in db:open($db) where $doc//lmerFile:format/text() = 'urn:123' return $doc }) Result: - Hit(s): 1 Item - Updated: 0 Items - Printed: 4 b - Read Locking: (global) - Write Locking: (none) Timing: - Parsing: 0.27 ms - Compiling: 0.54 ms - Evaluating: 1778.21 ms - Printing: 0.31 ms - Total Time: 1779.33 ms Query Plan: <QueryPlan compiled="true" updating="false"> <FnCount name="count" type="xs:integer" size="1"> <Extension type="document-node()*"> <DBPragma value=""> <QNm type="xs:QName" size="1">db:enforceindex</QNm> </DBPragma> <GFLWOR type="document-node()*"> <For type="xs:string" size="1" name="$db" id="1"> <StrSeq type="xs:string+" size="2"> <Str type="xs:string" size="1">00</Str> <Str type="xs:string" size="1">01</Str> </StrSeq> </For> <IterFilter type="document-node()*"> <DbOpen name="db:open" type="document-node()*"> <VarRef type="xs:string" size="1" name="$db" id="1"/> </DbOpen> <CmpG op="=" type="xs:boolean" size="1"> <CachedPath type="text()*"> <IterStep axis="descendant" test="element(lmerFile:format)" type="element()*"/> <IterStep axis="child" test="text()" type="text()*"/> </CachedPath> <Str type="xs:string" size="1">urn:123</Str> </CmpG> </IterFilter> </GFLWOR> </Extension> </FnCount> </QueryPlan>
-----Ursprüngliche Nachricht----- Von: Christian Grün <christian.gruen@gmail.com> Gesendet: Montag, 13. Januar 2020 13:07 An: Braunschweig, Björn <bjoern.braunschweig@gwdg.de> Cc: basex-talk@mailman.uni-konstanz.de Betreff: Re: [basex-talk] slow search performance across multiple databases
Dear Björn,
Our Article on Indexes may give you some hints [1]:
• Via the 'enforceindex' pragma, you can help the optimizer by indicating that all of your database have up-to-date index structures. • With db:text, you can directly access the database indexes and rewrite the remaining steps of your query in reverse order:
db:text($db, 'urn:123')/parent::lmerFile:format/...
Hope this helps Christian
[1] http://docs.basex.org/wiki/Indexes
On Mon, Jan 13, 2020 at 11:56 AM Braunschweig, Björn <bjoern.braunschweig@gwdg.de> wrote:
Hello everyone,
First of all thank you for building such a great product!
Unfortunately I encounter extremely slow queries using a basex instance. My xml files are distributed over 256 databases (00-ff) with roughly 250MB each. All share a structure similar to [3]. I executed OPTIMIZE ALL for all databases, see [4]. [5] shows the info index output. The linux server which is running the basex instance shows great amounts of io. I replicated the data to my local desktop and see the same amount of high io usage across all database files. For example: [data\81\tbl.basex, data\81\txt.basex, ..., etc]
Querying a single database is fast and the info output window shows the following [2], but a query across all dbs just stalls and does not return a result in a reasonable time.
Could somebody give me a hint how to speed things up?
regards, Björn
participants (3)
-
Braunschweig, Björn -
Christian Grün -
Omar Siam