Hi Christian,
thanks for the help!
I have a working version of this now that performs well. Initially I didn't want to reconstruct parts of the message, because we have a couple of different versions of these containers and usually there are multiple namespaces and prefixes involved that should be preserved. But it turns out this was easier than I thought.
Thanks & greetings from Salzburg, Tom
________________________________
From: Christian Grün <christian.gruen@gmail.com>
Sent: Monday, 20 January 2020 19:06
To: Tom Rauchenwald (UNIFITS) <tom.rauchenwald@unifits.com>
Cc: basex-talk@mailman.uni-konstanz.de <basex-talk@mailman.uni-konstanz.de>
Subject: Re: [basex-talk] Help with a Query/Performance
I missed the obvious next step. The following query is evaluated in a few milliseconds:
declare variable $OFFSET1 := 3;
declare variable $OFFSET2 := 2;

let $container := db:open('tr-test')/Container
let $message := $container/*:MessageA[$OFFSET1]
let $detail := $message/MessageADetail[$OFFSET2]
return element { name($container) } {
  $container/*[contains(name(), 'MetaData')],
  element { name($message) } {
    $message/MessageAMetaData,
    element { name($detail) } {
      $detail/*
    }
  }
}
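Since the real documents are said to involve several namespaces and prefixes, here is a possible variant of the query above, offered only as a sketch. It assumes that the wrapper elements may be namespaced and that any attributes on them should be kept; neither is shown in the sample data. It uses node-name(), which returns the full QName (including the namespace URI), instead of name(), which returns only the lexical string, so a prefix does not have to be declared in the query. The wildcard steps (*:) and the @* lines are my additions.

```xquery
declare variable $OFFSET1 := 3;
declare variable $OFFSET2 := 2;

(: *: matches the local name in any namespace :)
let $container := db:open('tr-test')/*:Container
let $message := $container/*:MessageA[$OFFSET1]
let $detail := $message/*:MessageADetail[$OFFSET2]
(: node-name() yields an xs:QName, so the namespace is carried over :)
return element { node-name($container) } {
  (: assumption: keep attributes of the original container :)
  $container/@*,
  $container/*[contains(local-name(), 'MetaData')],
  element { node-name($message) } {
    $message/@*,
    $message/*:MessageAMetaData,
    element { node-name($detail) } {
      $detail/@*,
      $detail/*
    }
  }
}
```

If one of the offsets is out of range, node-name() returns an empty sequence and the element constructor raises a type error, so a real application would probably want to check for that first.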
On Mon, Jan 20, 2020 at 6:54 PM Christian Grün christian.gruen@gmail.com wrote:
Dear Tom,
If you have large elements, it will usually be faster to create new elements. Here’s one way to do it:
let $offset1 := 3
let $offset2 := 2
let $container := db:open('tr-test')/Container
return element Container {
  (: add meta data elements :)
  $container/*[starts-with(name(), 'ContainerMetaData')],
  (: alternative: add everything except Message elements
  $container/(* except (MessageA, MessageB, MessageC)),
  :)
  $container/MessageA[$offset1] update {
    delete node MessageADetail[position() != $offset2]
  }
}
There are probably ways to get this even faster; I may have a look at this tomorrow.
All the best from Konstanz, Christian
On Mon, Jan 20, 2020 at 10:01 AM Tom Rauchenwald (UNIFITS) tom.rauchenwald@unifits.com wrote:
Hi list,
I'm struggling with a query.
We have XML documents with a structure similar to this:
<Container>
  <ContainerMetaData1>FOO</ContainerMetaData1>
  <ContainerMetaData2>FOO</ContainerMetaData2>
  <MessageA>
    <MessageAMetaData>
      <MessageMetaData1>FOO</MessageMetaData1>
      <MessageMetaData2>FOO</MessageMetaData2>
    </MessageAMetaData>
    <MessageADetail>
      <DetailData1>FOO</DetailData1>
      <DetailData2>FOO</DetailData2>
    </MessageADetail>
    <MessageADetail>
      <DetailData1>FOO</DetailData1>
      <DetailData2>FOO</DetailData2>
    </MessageADetail>
  </MessageA>
  <MessageB>
    <MessageBMetaData>
      <MessageMetaData1>FOO</MessageMetaData1>
      <MessageMetaData2>FOO</MessageMetaData2>
    </MessageBMetaData>
    <MessageBDetail>
      <DetailData1>FOO</DetailData1>
      <DetailData2>FOO</DetailData2>
    </MessageBDetail>
  </MessageB>
  <MessageC>
    <MessageCMetaData>
      <MessageMetaData1>FOO</MessageMetaData1>
      <MessageMetaData2>FOO</MessageMetaData2>
    </MessageCMetaData>
    <MessageCDetail>
      <DetailData1>FOO</DetailData1>
      <DetailData2>FOO</DetailData2>
    </MessageCDetail>
  </MessageC>
</Container>
Messages are bundled in a container (up to n times for each message), and each message has details (also up to n times). Container and Message hold data that is the same for all of their details (it's basically a grouping). I'd like to retrieve a single Detail together with all the data associated with it: the MessageADetail itself, its MessageA (without all the other MessageADetails), and the Container (without all the other Messages). I know the position of the message (e.g., I know that I want the second MessageA), and I know the position of the Detail (e.g., I know that I want the 3rd Detail). The use case is to show the detail in context in a UI.
The query I came up with is the following (here I want to get the 2nd detail from the third MessageA):
let $fh := (
  copy $x := /*:Container
  modify (
    delete node $x/*:MessageA[position() != 3],
    delete node $x/*:MessageA[3]/*:MessageADetail[position() != 2],
    delete node $x/*:MessageB,
    delete node $x/*:MessageC
  )
  return $x
)
return $fh
This works well for small documents. For large documents it can take a couple of seconds to evaluate the query (our real-life documents do have more data/elements in Details and Message). I'm wondering if there's a better/more efficient way to do this. I tried formulating a query that doesn't do deletes, but I couldn't come up with a solution that performs better and is correct.
Any pointers would be very much appreciated.
Here's a function to generate sufficiently large test data:
declare function local:sample($numberOfMessages, $numberOfDetails) {
  <Container>
    <ContainerMetaData1>FOO</ContainerMetaData1>
    <ContainerMetaData2>FOO</ContainerMetaData2>
    {
      for $i in 1 to $numberOfMessages
      return <MessageA>
        <MessageAMetaData>
          <MessageMetaData1>FOO {$i}</MessageMetaData1>
          <MessageMetaData2>FOO {$i}</MessageMetaData2>
        </MessageAMetaData>
        {
          for $j in 1 to $numberOfDetails
          return <MessageADetail>
            <DetailData1>FOO {$j}</DetailData1>
            <DetailData2>FOO {$j}</DetailData2>
          </MessageADetail>
        }
      </MessageA>
    }
    <MessageB>
      <MessageBMetaData>
        <MessageMetaData1>FOO</MessageMetaData1>
        <MessageMetaData2>FOO</MessageMetaData2>
      </MessageBMetaData>
      <MessageBDetail>
        <DetailData1>FOO</DetailData1>
        <DetailData2>FOO</DetailData2>
      </MessageBDetail>
    </MessageB>
    <MessageC>
      <MessageCMetaData>
        <MessageMetaData1>FOO</MessageMetaData1>
        <MessageMetaData2>FOO</MessageMetaData2>
      </MessageCMetaData>
      <MessageCDetail>
        <DetailData1>FOO</DetailData1>
        <DetailData2>FOO</DetailData2>
      </MessageCDetail>
    </MessageC>
  </Container>
};
db:create('tr-test', local:sample(20, 100000), 'test.xml')
Thanks, Tom Rauchenwald