I have a similar situation in which I want to get all distinct values of a specific attribute. I’ve tried using 2 different approaches: group and distinct-values. On small or medium size databases group tends to be faster. When trying to get distinct values of a specific attribute from large databases however both approaches are timing out for me. I’m looking for a way to optimize this query:

 

distinct-values(for $db in db:list() return distinct-values(db:open($db)//@sec-type))

 

Thanks,

Vincent

 

 

From: basex-talk-bounces@mailman.uni-konstanz.de [mailto:basex-talk-bounces@mailman.uni-konstanz.de] On Behalf Of Christian Grün
Sent: Monday, July 11, 2016 2:46 PM
To: Alex Muir <alex.g.muir@gmail.com>
Cc: BaseX <basex-talk@mailman.uni-konstanz.de>
Subject: Re: [basex-talk] retrieve a sequence of all values within an attribute index

 

> I wrote the following query which returns 59 distinct periods from a 8gb
> db.. It's quite slow but it works

Ah, well… I guess that all the values are numeric? In that case, only
the min and max value will be stored in the statistics (and that won’t
help you in fact). Bad luck. You can call index:facets("13F") to get
more insight… Maybe we can fix that in future and store distinct
numbers as well.


> let $periods := distinct-values(db:open("13F")//data/@periodOfReport)
> let $transform :=
> <periods>
> {
> for $period in $periods
> return <period>{$period}</period>
> }
> </periods>
>
> return file:write('/var/www/appusec3.jahtoe.com/xml/periods.xml',
> $transform)
>
>
> Regards
> Alex
> tech.jahtoe.com
> bafila.jahtoe.com
>
> On Mon, Jul 11, 2016 at 6:21 PM, Christian Grün <christian.gruen@gmail.com>
> wrote:
>>
>> > any way to retrieve the index for a specific attribute name?
>>
>> Nope, sorry. The index itself has no information on the location of
>> the text and attribute values. You’ll have to use distinct-values:
>>
>> distinct-values(//periodOfReport)
>>
>> If the number of distinct values is smaller than MAXCATS [1], the path
>> index will be utilized to speed up your query [2]. You can set MAXCATS
>> to a much larger value, but this might slow down the time required for
>> opening a database.
>>
>> Hope this helps
>> Christian
>>
>> [1] http://docs.basex.org/wiki/Options#MAXCATS
>> [2] http://docs.basex.org/wiki/Indexes#Path_Index
>
>