Platform: Intel i7, Windows 7 Enterprise, BaseX 8.4
Today I opened an 80MB XML file in the BaseX GUI, and I was amazed at the speed of BaseX, compared to e.g. trying to open, count, and extract stuff from the same file in emacs.
What I needed from BaseX was to find the size of the contents of the second-level elements (i.e. the elements immediately below the root element).
The visualizations were great for navigating the file, and the Map visualization showed the sizes I was interested in. I was comparing two files with the same structure, one of 60MB and the other of 80MB, and trying to figure out where the 20MB had gone.
One thing I was looking for, but did not find, was the number of child elements of a given element. I was hoping for a tooltip when hovering over an element in the visualizations (map, folder and tree), or a properties dialog when right-clicking an element in the visualization.
Is the information present somewhere? (I know the element child count has to be there somewhere, because without it, the map visualization couldn't be rendered...?)
How hard is this information to get at?
Thanks!
- Steinar
PS: What I ended up doing was selecting the elements whose size I wanted to find in the map and tree views, and then saving the contents of the Result view for each element. A bit clumsier than I would have liked, but much faster than doing it in emacs.
Hi Steinar,
Thanks for your mail.
What I needed from BaseX was finding the size of the contents of the second level elements (ie. the elements immediately below the root element).
If you want to know the number of child elements of the root element, you could run a simple XPath expression via the input bar [1] or editor panel [2] after opening the database:
count(/*/*)
As an alternative, you can first use the visualization to select the initial nodes, filter it (using the funnel icon in the upper right corner of the window) and then run the simple query * to see the number of results in the upper right corner.
Does this help? Do you have some basic experience with XPath or XQuery?
Christian
[1] http://docs.basex.org/wiki/GUI#XQuery
[2] http://docs.basex.org/wiki/GUI#Text_Editor
Christian Grün christian.gruen@gmail.com:
Hi Christian, thanks!
If you want to know the number of child elements of the root element, you could run a simple XPath expression via the input bar [1] or editor panel [2] after opening the database:
count(/*/*)
That only gives me the total number of child elements of the document elements, I think...?
(The result was 14 in this case).
I have a document with a structure like this:
<Top>
  <A>
    <Achild></Achild>
    <Achild></Achild>
  </A>
  <B>
    <Bchild></Bchild>
    <Bchild></Bchild>
    ...
  </B>
  ...
</Top>
And what I'm actually looking for in this case, is a list of the top level element names ("A" and "B" in my example) together with a count of their children.
I have tried to google up examples of XQuery expressions to do this today, but I haven't had any success in creating the desired results.
As an alternative, you can first use the visualization to select the initial nodes, filter it (using the funnel icon in the upper right corner of the window) and then run the simple query * to see the number of results in the upper right corner.
Hm... I selected a second level element in the Map, filtered it, and typed "select *" in the command window, and the upper right corner shows "0 Results".
Does this help?
It's a step on the way. Thanks! :-)
Do you have some basic experience with XPath or XQuery?
No experience with XQuery prior to today, but a fair amount of XSLT experience (which implies familiarity with XPath) back in 2000-2005 or thereabouts. But it's been a while.
And what I'm actually looking for in this case, is a list of the top level element names ("A" and "B" in my example) together with a count of their children.
Try e.g. one of these two queries:
• count(//A/*), count(//B/*)
• for $c in /Top/* return count($c/*)
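As a minimal sketch of the second query, the following runs it against an inline copy of the simplified example rather than an opened database. The sample document and its child counts are made up for illustration, and it assumes the elements are in no namespace. It also prints each element's name next to its count:

```xquery
(: Hypothetical sample document; no namespace is assumed. :)
let $doc := document {
  <Top>
    <A><Achild/><Achild/></A>
    <B><Bchild/><Bchild/><Bchild/></B>
  </Top>
}
(: One result line per child of Top: its name and its number of child elements. :)
for $c in $doc/Top/*
return name($c) || ": " || count($c/*)
```

For this sample the query returns the two strings "A: 2" and "B: 3".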
Hm... I selected a second level element in the Map, filtered it, and typed "select *" in the command window, and the upper right corner shows "0 Results".
What do you mean by "command window"? "select *" doesn't sound valid to me, but the plain asterisk character (without "select") should do the job. It's a valid query (a shortcut for child::element()), and it gives you all child elements of the nodes in the current context.
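A small sketch of what the bare asterisk selects, using a made-up inline element instead of the filtered context in the GUI:

```xquery
(: Hypothetical sample: "text" is a text node, A and B are child elements. :)
let $top := <Top>text<A/><B/></Top>
return (
  count($top/*),                 (: child elements only :)
  count($top/child::element()),  (: long form of the same step :)
  count($top/child::node())      (: also counts the text node :)
)
```

This returns 2, 2 and 3: the asterisk and child::element() select the same two elements, while child::node() also includes the text node.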
Christian Grün christian.gruen@gmail.com:
And what I'm actually looking for in this case, is a list of the top level element names ("A" and "B" in my example) together with a count of their children.
Try e.g. one of these two queries:
• count(//A/*), count(//B/*)
Those queries give me lines containing 0, one line per count(), unfortunately.
• for $c in /Top/* return count($c/*)
Stuff happens in the visualization views when I run this (some things are selected or unselected), but I get no output in the Result window, and the "Query Info" window shows:
Result:
- Hit(s): 0 Items
- Updated: 0 Items
- Printed: 0 Bytes
Could it have something to do with the namespaces in the document? I.e. the XPath expressions don't match, even though they look as if they should.
I was trying to simplify my example, and that may have been a bad idea...?
The actual documents I'm working on, are of the same type as the one in this zip file http://goo.gl/ULH089
The document element is <FEST>, and the counts I am interested in, are the count of the child elements of the direct children of <FEST>, with names starting with "Kat".
So the actual queries I've tried, have been:
- count(//KatLegemiddeldose/*), count(//KatLegemiddelMerkevare/*)
- for $c in /FEST/* return count($c/*)
Hm... I selected a second level element in the Map, filtered it, and typed "select *" in the command window, and the upper right corner shows "0 Results".
What do you mean by "command window"? "select *" doesn't sound valid to me, but the plain asterisk character (without "select") should do the job. It's a valid query (a shortcut for child::element()), and it gives you all child elements of the nodes in the current context.
Right! With just "*" I got "6728 results"... that was more like it! :-)
Could it be something to do with the namespacing in the document?
If your document has namespaces, you can use a wildcard for your prefix…
• count(//*:A/*), count(//*:B/*)
• for $c in /*:Top/* return count($c/*)
or define the prefix in the query prolog:
declare namespace abc = 'http:...';
/abc:Top
If you use count(), the result will be a number, which will be displayed in the textual result view. If you return nodes, they will also be highlighted in the visualizations.
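The effect of a default namespace on these path expressions can be sketched with an inline document. The namespace URI "urn:example:abc" and the prefix abc are placeholders chosen for this example only:

```xquery
(: Hypothetical namespace URI, bound to the prefix abc for use in paths. :)
declare namespace abc = "urn:example:abc";

(: The xmlns attribute puts Top and all of its descendants into that namespace. :)
let $doc := document {
  <Top xmlns="urn:example:abc">
    <A><Achild/><Achild/></A>
    <B><Bchild/></B>
  </Top>
}
return (
  count($doc//A/*),      (: unprefixed name test: looks for A in no namespace :)
  count($doc//*:A/*),    (: wildcard prefix: matches A in any namespace :)
  count($doc//abc:A/*)   (: declared prefix: matches A in urn:example:abc :)
)
```

This returns 0, 2 and 2. The unprefixed test finds nothing because the A element lives in a namespace, which is consistent with the zero counts reported earlier in the thread.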
The actual documents I'm working on, are of the same type as the one in this zip file http://goo.gl/ULH089
Google tells me that your "goo.gl shortlink has been disabled. It was found to be violating our Terms of Service."…
Christian Grün christian.gruen@gmail.com:
Could it be something to do with the namespacing in the document?
If your document has namespaces, you can use a wildcard for your prefix…
• count(//*:A/*), count(//*:B/*)
• for $c in /*:Top/* return count($c/*)
or define the prefix in the query prolog:
declare namespace abc = 'http:...';
/abc:Top
Yup, I found this out at work today (without access to the email I'm using on this list).
The document has a default namespace, and this worked fine:
declare default element namespace "http://www.kith.no/xmlstds/eresept/m30/2013-10-08";
for $c in /FEST/* return count($c/*)
If you use count(), the result will be a number, which will be displayed in the textual result view. If you return nodes, they will also be highlighted in the visualizations.
I wanted the element names together with their child counts, so I improved the query to this (which made for easily paste-able org-mode and Jira comment tables):
declare default element namespace "http://www.kith.no/xmlstds/eresept/m30/2013-10-08";
for $c in /FEST/* return concat("|", node-name($c), "|", count($c/*), "|")
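Since the elements of interest were described earlier as the children of <FEST> whose names start with "Kat", one possible variation adds a predicate on the local name. This is only a sketch: the inline document is a tiny stand-in for the real file, and the Header and Entry elements are hypothetical names. It uses local-name() instead of node-name() so that the output never carries a prefix:

```xquery
declare default element namespace "http://www.kith.no/xmlstds/eresept/m30/2013-10-08";

(: Tiny stand-in for the real file; Header and Entry are hypothetical names. :)
let $doc := document {
  <FEST>
    <Header/>
    <KatLegemiddeldose><Entry/><Entry/></KatLegemiddeldose>
    <KatLegemiddelMerkevare><Entry/></KatLegemiddelMerkevare>
  </FEST>
}
(: Keep only the children whose local name starts with "Kat". :)
for $c in $doc/FEST/*[starts-with(local-name(), "Kat")]
return concat("|", local-name($c), "|", count($c/*), "|")
```

For the stand-in document this returns "|KatLegemiddeldose|2|" and "|KatLegemiddelMerkevare|1|". To run it against an opened database, drop the let clause and start the path at /FEST instead of $doc/FEST.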
The actual documents I'm working on, are of the same type as the one in this zip file http://goo.gl/ULH089
Google tells me that your "goo.gl shortlink has been disabled. It was found to be violating our Terms of Service."…
So I also found out at work today, goo.gl shortlinks can't be used to link to zip files.
Hopefully Dropbox doesn't have this limitation: https://www.dropbox.com/s/p0gy01j0mfsp7ne/M30_Fest250Rekvirent_20151015.zip
Thanks for your help!
And thanks for BaseX! The more I use it, the more I like it! :-)
basex-talk@mailman.uni-konstanz.de