Hi Kendall,
Following up: the solution below (stripping the namespaces) worked well on the toy example I shared, but it does not scale with the size of the database. I ended up getting the following:
Error: Out of Main Memory.
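This happened even though BaseX was given 8 GB of memory (BASEX_JVM="-Xmx8g $BASEX_JVM"). The issue is due to the large size of the XML file that was loaded into the database (see below), presumably because the function has to rebuild a namespace-free copy of every node in main memory.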
I can go with the default namespace declaration, so no follow-up is needed unless you are curious and have the time to investigate.
Best, Ron
Database Properties
  NAME: DrugBank
  SIZE: 3778 MB
  NODES: 96333486
  DOCUMENTS: 1
  BINARIES: 0
  TIMESTAMP: 2017-09-01T14:57:50.000Z
  UPTODATE: true

Resource Properties
  INPUTPATH: /Volumes/Extra/Documents/Data Science/Data Sets/DrugBank/drugbank.xml
  INPUTSIZE: 3243 MB
  INPUTDATE: 2017-09-01T14:50:38.000Z
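For reference, a minimal sketch of the default-namespace route mentioned above. It queries the stored nodes directly instead of copying them, so it should avoid building a second tree in memory. The namespace URI shown is an assumption and should be checked against the xmlns value on the root element of drugbank.xml; the output element is reduced to two fields for brevity.

```xquery
(: Sketch only. The namespace URI is an assumption; replace it with the
   xmlns value declared on the root element of drugbank.xml. :)
declare default element namespace "http://www.drugbank.ca";

(: Unprefixed names in path steps now resolve to the default namespace,
   so the stored elements match without any namespace stripping. :)
for $drug in db:open('DrugBank')/drugbank/drug
where exists($drug/atc-codes/atc-code)
return
  <drug> {
    <ATC5> { string-join(distinct-values($drug/name), ' | ') } </ATC5>,
    <ATC4> { string-join(distinct-values($drug/atc-codes/atc-code/level[string-length(@code) = 5]), ' | ') } </ATC4>
  } </drug>
```

Note that with this declaration the constructed drug, ATC5 and ATC4 elements also end up in the default namespace, which may or may not matter for whatever consumes the output.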
On September 1, 2017 at 5:29:06 PM, Ron Katriel (rkatriel@mdsol.com) wrote:
Hi Kendall,
Yes, your solution works too (see query below). Really appreciate your help!
Best, Ron
declare namespace e = "http://example.com";

declare function e:strip-namespaces($node as node()) as node() {
  typeswitch ($node)
    case $node as document-node() return
      document { $node/node()/e:strip-namespaces(.) }
    case $node as element() return
      element { local-name($node) } {
        $node/@*,
        $node/node()/e:strip-namespaces(.)
      }
    default return $node
};

for $drug in e:strip-namespaces(db:open('DrugBankFail'))/drugbank/drug
where not(empty($drug/atc-codes/atc-code))
return
  <drug> {
    <ATC5> { string-join(distinct-values($drug/name), ' | ') } </ATC5>,
    <ATC4> { string-join(distinct-values($drug/atc-codes/atc-code/level[string-length(@code) = 5]), ' | ') } </ATC4>,
    <ATC3> { string-join(distinct-values($drug/atc-codes/atc-code/level[string-length(@code) = 4]), ' | ') } </ATC3>,
    <ATC2> { string-join(distinct-values($drug/atc-codes/atc-code/level[string-length(@code) = 3]), ' | ') } </ATC2>,
    <ATC1> { string-join(distinct-values($drug/atc-codes/atc-code/level[string-length(@code) = 1]), ' | ') } </ATC1>
  } </drug>
On September 1, 2017 at 5:21:11 PM, Kendall Shaw (kendall.shaw@workday.com) wrote:
I think my mail client altered my post to move ‘.’ characters to the end of what it thinks is a sentence.
This:
e:strip-namespaces().
Is supposed to be this:
e:strip-namespaces(.)