Hi I am searching for the most effective way for returning element with highest string value.
*Situation* I have heap of xml files, containing MSG elements, having attribute id and inner element TGEN, denoting time of MSG creation <MSG id="abcd"> <TGEN>2011-02-23T15:21:14-01:00<TGEN> <some-content-here/> </MSG>
My aim is to select for each id value the latest version (having highest value of TGEN)
For comparing TGEN it is sufficient to treat it as string value, there is no need to cast it to dateTime, which is expensive.
*My question:* How to form the query, which will use existing indexes most efficiently. I am not yet very familiar with reading query plans so I am not able to be sure, it goes the most efficient way.
*Background* built in max function converts values into xs:double
http://www.xqueryfunctions.com/xq/functx_max-string.html declares function max-string
declare namespace functx = "http://www.functx.com"; declare function functx:max-string ( $strings as xs:anyAtomicType* ) as xs:string? {
max(for $string in $strings return string($string)) } ;
*Sample data file* Sample XML can be created using following xquery
element MJD { for $id in 1 to 10, $age_in_hours in 24 to 48 let $age := xs:dayTimeDuration(concat('PT', $age_in_hours, 'H')) let $tgen := xs:dateTime('2011-03-16T12:00:00Z') - $age return element MSG { attribute id {$id}, attribute age {$age}, element TGEN {$tgen} } }
*Our existing solution* We already did some optimization, biggest speed up gained by using group by Here is my latest version of xquery
let $latest_msgs := (for $msg in //MSG let $id := $msg/@id group by $id return ( (for $m in $msg let $tgen := $m//TGEN order by xs:string($tgen) descending return $m )[1]) ) return <MJD count='{count($latest_msgs)}' created='{max($latest_msgs//xs:string(TGEN))}'>{$latest_msgs}</MJD>
Any comments are welcome, especially about efficient use of indexes
Jan
basex-talk@mailman.uni-konstanz.de