Hi Kendall,

Following up: the solution below (stripping the namespaces) worked well on the toy example I shared, but it does not scale to the size of the database. I ended up getting the following:

    Error: Out of Main Memory.

This happened even though I gave BaseX 8 GB of memory (BASEX_JVM="-Xmx8g $BASEX_JVM"). The problem is the size of the XML file loaded into the database (see below). Presumably it fails because stripping the namespaces builds a complete in-memory copy of the whole document.

I can go with the default namespace declaration, so no follow-up is needed, unless you are curious and have the time to investigate.
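For reference, here is an untested sketch of a third option. XQuery's *: wildcard name test matches an element by its local name in any namespace. That lets the query quoted below run directly against the namespaced document, with no stripped copy and no namespace URI to declare. It assumes the target is the DrugBank database from the properties listing below, and that the code attributes are unqualified, as attributes normally are.

```xquery
(: Sketch: match elements by local name with the *: wildcard so the
   namespaced document is queried in place, instead of first building a
   namespace-free copy of every node in main memory.
   Assumes the database name 'DrugBank' from the INFO DB listing. :)
for $drug in db:open('DrugBank')/*:drugbank/*:drug
(: all ATC codes of this drug, and all of their level elements :)
let $codes  := $drug/*:atc-codes/*:atc-code
let $levels := $codes/*:level
(: keep only drugs that have at least one ATC code :)
where exists($codes)
return <drug> {
  <ATC5> { string-join(distinct-values($drug/*:name), ' | ') } </ATC5>,
  <ATC4> { string-join(distinct-values($levels[string-length(@code) = 5]), ' | ') } </ATC4>,
  <ATC3> { string-join(distinct-values($levels[string-length(@code) = 4]), ' | ') } </ATC3>,
  <ATC2> { string-join(distinct-values($levels[string-length(@code) = 3]), ' | ') } </ATC2>,
  <ATC1> { string-join(distinct-values($levels[string-length(@code) = 1]), ' | ') } </ATC1>
} </drug>
```

The trade-off is that *: also matches same-named elements from any other namespace. In a single-vocabulary file like this one, that should be harmless. Because no default element namespace is declared, the constructed drug and ATC elements stay in no namespace, just as in the original query.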

Best,
Ron


Database Properties
 NAME: DrugBank
 SIZE: 3778 MB
 NODES: 96333486
 DOCUMENTS: 1
 BINARIES: 0
 TIMESTAMP: 2017-09-01T14:57:50.000Z
 UPTODATE: true

Resource Properties
 INPUTPATH: /Volumes/Extra/Documents/Data Science/Data Sets/DrugBank/drugbank.xml
 INPUTSIZE: 3243 MB
 INPUTDATE: 2017-09-01T14:50:38.000Z

On September 1, 2017 at 5:29:06 PM, Ron Katriel (rkatriel@mdsol.com) wrote:

Hi Kendall,

Yes, your solution works too (see query below). Really appreciate your help!

Best,
Ron


declare namespace e = "http://example.com";

declare function e:strip-namespaces($node as node()) as node() { 
  typeswitch ($node) 
  case $node as document-node() return document { $node/node()/e:strip-namespaces(.) } 
  case $node as element() return element {local-name($node)} { $node/@*, $node/node()/e:strip-namespaces(.) } 
  default return $node 
}; 

for $drug in e:strip-namespaces(db:open('DrugBankFail'))/drugbank/drug
where not(empty($drug/atc-codes/atc-code))
return <drug> {
  <ATC5> { string-join(distinct-values($drug/name), ' | ') } </ATC5>,
  <ATC4> { string-join(distinct-values($drug/atc-codes/atc-code/level[string-length(@code) = 5]), ' | ') } </ATC4>,
  <ATC3> { string-join(distinct-values($drug/atc-codes/atc-code/level[string-length(@code) = 4]), ' | ') } </ATC3>,
  <ATC2> { string-join(distinct-values($drug/atc-codes/atc-code/level[string-length(@code) = 3]), ' | ') } </ATC2>,
  <ATC1> { string-join(distinct-values($drug/atc-codes/atc-code/level[string-length(@code) = 1]), ' | ') } </ATC1>
} </drug>

On September 1, 2017 at 5:21:11 PM, Kendall Shaw (kendall.shaw@workday.com) wrote:

I think my mail client altered my post by moving ‘.’ characters to the end of what it thinks is a sentence.

This:

e:strip-namespaces().

Is supposed to be this:

e:strip-namespaces(.)