Improving the understanding for counting of entries in data groups
Hello,

I constructed the following XML file for another test of the software “BaseX 9.7”.

    <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <test_data>
      <info>
        <id>12</id>
        <topics>
          <topic>Demo1</topic>
          <topic>Demo2</topic>
        </topics>
      </info>
      <info>
        <id>23</id>
        <topics>
          <topic>Demo1</topic>
          <topic>Demo2</topic>
        </topics>
      </info>
      <info>
        <id>34</id>
        <topics>
          <topic>Test1</topic>
          <topic>Test2</topic>
          <topic>Test3</topic>
        </topics>
      </info>
      <info>
        <id>45</id>
        <topics>
          <topic>Test1</topic>
          <topic>Test2</topic>
          <topic>Test3</topic>
        </topics>
      </info>
      <info>
        <id>56</id>
        <topics>
          <topic>Test1</topic>
          <topic>Test2</topic>
          <topic>Test3</topic>
        </topics>
      </info>
      <info>
        <id>67</id>
        <topics>
          <topic>Probe1</topic>
        </topics>
      </info>
    </test_data>

I then tried out the following XQuery script.

    declare option output:method "csv";
    declare option output:csv "header=yes, separator=|";

    for $x in //test_data/info
    group by $topics := string-join($x/topics/topic/data(), "*")
    let $incidence := count($topics)
    order by $incidence descending
    return
      <csv>
        <record>
          <topic_combination>{$topics}</topic_combination>
          <incidence>{$incidence}</incidence>
        </record>
      </csv>

Corresponding test result:

    topic_combination|incidence
    Demo1*Demo2|1
    Test1*Test2*Test3|1
    Probe1|1

For this kind of data analysis, I would like to see the numbers “2” and “3” instead of “1” at the end of the first two rows. I would appreciate further advice for this use case.

Regards,
Markus
If the following result is the one you expect …

    topic_combination|incidence
    Test1*Test2*Test3|3
    Demo1*Demo2|2
    Probe1|1

… it is sufficient to replace …

    let $incidence := count($topics)

… by …

    let $incidence := count($x)

The string join yields a single item, which is assigned to $topics; thus, count($topics) returns 1. After the 'group by' clause, each variable that was declared before it is rebound to the sequence of its values across all tuples of the group. This means that count($x) returns …

• 1 if it’s called before 'group by'
• the number of grouped items if it’s called after 'group by'
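The difference between the two counts can be checked with a small standalone query. This is only a sketch: it embeds a shortened copy of the test data in a variable named $data (an assumption made here so that no database has to be opened) and returns plain strings instead of CSV records.

```xquery
(: Sketch only: $data is a hypothetical inline copy of the test data,
   shortened to the elements that matter for the grouping. :)
let $data :=
  <test_data>
    <info><id>12</id><topics><topic>Demo1</topic><topic>Demo2</topic></topics></info>
    <info><id>23</id><topics><topic>Demo1</topic><topic>Demo2</topic></topics></info>
    <info><id>34</id><topics><topic>Test1</topic><topic>Test2</topic><topic>Test3</topic></topics></info>
    <info><id>45</id><topics><topic>Test1</topic><topic>Test2</topic><topic>Test3</topic></topics></info>
    <info><id>56</id><topics><topic>Test1</topic><topic>Test2</topic><topic>Test3</topic></topics></info>
    <info><id>67</id><topics><topic>Probe1</topic></topics></info>
  </test_data>
for $x in $data/info
(: Before 'group by', $x is bound to exactly one info element. :)
group by $topics := string-join($x/topics/topic/data(), "*")
(: After 'group by', $x holds all info elements that share the key,
   whereas the grouping key $topics is still a single string. :)
let $incidence := count($x)
order by $incidence descending
return $topics || "|" || $incidence || " (count($topics) = " || count($topics) || ")"
```

The query should yield three strings, one per group: "Test1*Test2*Test3|3 (count($topics) = 1)", "Demo1*Demo2|2 (count($topics) = 1)" and "Probe1|1 (count($topics) = 1)". The counts of $x match the expected incidences, while the count of the grouping key is 1 for every group.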
Participants (2): Christian Grün, Markus Elfring