On Tue, 29 Jun 2021 at 10:36, Christian Grün christian.gruen@gmail.com wrote:
Hi Reece,
Interesting thoughts. All I can say is that your iterator approach for sets looks pretty similar to something that I tried in the past.
More generally, it would be helpful for BaseX to have adapters for Java arrays, Lists, Sets, Maps, and Iterables/Iterators to XQuery (XDM) types and functions to construct them in XQuery (like my util:list-to-sequence function above).
One way would be to add new built-in functions to BaseX (in the Conversion Module, or in a new Java Module) that provide conversion functions for Java data structures. I guess it might be cleaner to convert lists and sets to arrays, as those data structures can also contain null references.
It would be useful to have Java Collection to sequence, Java Collection to array(*), and Java Map to map(*) converters, either in the Conversion Module or in a Java helper module. Saxon performs the Collection to sequence conversion automatically in its Java bindings: https://www.saxonica.com/documentation10/index.html#!extensibility/functions... .
My rationale for not converting them to arrays is to avoid a performance overhead when dealing with a large number of items. However, I can see how null values could be complicated to manage if the BaseX sequence interface doesn't do the flattening itself (if it did, null could be mapped to the empty sequence, as in the general Java mapping).
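To make the trade-off concrete, here is a minimal Java sketch. It models an XDM array as a list of members, where each member is itself a sequence, and an XDM sequence as a flat list. It assumes that Java null maps to the empty sequence. The class and method names are hypothetical and do not correspond to any BaseX API. When a collection has a null in the middle, the array form keeps all three positions, but the flattened sequence silently drops one:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch (not a BaseX API): an XDM array is modelled as a list of
 * members, each member being a sequence (a List); an XDM sequence is a flat List.
 */
class NullMappingSketch {

    /** Maps one Java value to a sequence: null becomes the empty sequence. */
    static List<Object> toSequenceItem(Object value) {
        return value == null ? List.of() : List.of(value);
    }

    /** Collection to array(*): one member per element, so positions are kept. */
    static List<List<Object>> toArrayMembers(List<?> values) {
        List<List<Object>> members = new ArrayList<>();
        for (Object v : values) {
            members.add(toSequenceItem(v));
        }
        return members;
    }

    /** Collection to sequence: members are concatenated, so empty ones vanish. */
    static List<Object> toFlatSequence(List<?> values) {
        List<Object> items = new ArrayList<>();
        for (Object v : values) {
            items.addAll(toSequenceItem(v));
        }
        return items;
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("a", null, "b");
        System.out.println(toArrayMembers(values)); // [[a], [], [b]]
        System.out.println(toFlatSequence(values)); // [a, b]
    }
}
```

Converting to array(*) therefore preserves the shape of a collection that may contain nulls. A sequence conversion avoids the extra wrapping, but it loses positional information.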
Additionally, I'm working in Kotlin, and the list values are non-nullable types, so that won't be an issue for my particular use case.
The main reason why we didn’t push this any further was that we didn’t want to give users additional incentives to resort to Java code. Many things can also be done in XQuery, the XQuery-Java mapping for data types can never be perfect, and we found that users often stumbled over it in the beginning. However, there are obviously always use cases in which direct data exchange between XQuery and Java is helpful, and less cumbersome than writing custom Java functions with custom entry points for XQuery function calls (as documented, e.g., in [2]).
Yeah. I'm experimenting with NLP, passing the text through a pipeline (tokenization, stemming/lemmatization, part-of-speech tagging, etc.) that looks something like this:
let $tokens := nlp:tokenize($node)
  => nlp:lemmatize()
  => nlp:pos-tag()
  => util:list-to-sequence()
for $token in $tokens
let $text := Token:get-text($token)
let $part-of-speech := util:set-to-sequence(Token:get-part-of-speech($token))
return <span class="token" title="{string-join($part-of-speech, ",")}">{$text}</span>
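For reference, below is a hypothetical Java stand-in for the Token type (the actual Kotlin class is not shown in the thread). It comes with a helper that builds the same comma-separated title on the Java side. One detail is easy to trip over when moving between the two languages. Java's String.join takes the separator first, whereas XQuery's fn:string-join takes the sequence first and the separator second.

```java
import java.util.Set;
import java.util.TreeSet;

class TitleSketch {

    /** Hypothetical stand-in for the pipeline's Token type; the real class is not shown. */
    static final class Token {
        private final String text;
        private final Set<String> partOfSpeech;

        Token(String text, Set<String> partOfSpeech) {
            this.text = text;
            this.partOfSpeech = partOfSpeech;
        }

        // Called from the XQuery above as Token:get-text and Token:get-part-of-speech.
        public String getText() { return text; }
        public Set<String> getPartOfSpeech() { return partOfSpeech; }
    }

    /** Java equivalent of the title attribute: String.join(separator, elements). */
    static String title(Token token) {
        // Sort a copy so the joined order does not depend on the Set implementation.
        return String.join(",", new TreeSet<>(token.getPartOfSpeech()));
    }

    public static void main(String[] args) {
        Token t = new Token("runs", Set.of("VBZ", "NNS"));
        System.out.println(t.getText() + " -> " + title(t)); // runs -> NNS,VBZ
    }
}
```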
I'm using Java (Kotlin, more accurately) to implement the logic that needs state (and possibly to share it with other projects), and tying it together in XQuery.
Maybe it would indeed be good to realize the set of additional functions as an XQuery module. We still haven’t defined a canonical way to promote and document external BaseX XQuery modules. Some users may remember that we assembled existing modules on our server some time ago [1], and other modules, such as Leo’s algorithms and data structures, can be found in private repositories [3]. So ideas on how to get this better organized are welcome.
There is http://cxan.org/, but I don't know how active it currently is.
Kind regards, Reece
Cheers, Christian
[1] https://files.basex.org/modules/
[2] https://docs.basex.org/wiki/Repository#Combined
[3] https://github.com/LeoWoerteler/xq-modules
On Mon, Jun 28, 2021 at 6:04 PM Reece Dunn msclrhd@googlemail.com wrote:
Hi,
I'm making use of the Java bindings in BaseX, with some of the functions returning List<String> and Set<String> types.
For List<String> I can adapt that to a sequence using:
declare namespace List = "java:java.util.List";

declare function util:list-to-sequence($list) {
  for $n in 0 to List:size($list) - 1
  return List:get($list, $n cast as xs:int)
};
However, I'm not sure how to do the equivalent for Set<String> (or, more generally, any Iterator<T>) without converting it to a list or array first. Unlike List, Set has no index-based get(int) method to pair with size(), so the direct way to traverse it is iterator(). Has anyone done this before?
The best I can come up with is the following. It relies on the size of the set matching the number of next() calls on the iterator, when it should really be checking hasNext():
declare namespace Set = "java:java.util.Set";
declare namespace Iterator = "java:java.util.Iterator";

declare function util:set-to-sequence($set) {
  let $iterator := Set:iterator($set)
  for $n in 0 to Set:size($set) - 1
  return Iterator:next($iterator)
};
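An alternative that does not depend on Set:size is to drain the iterator on the Java side. Looping on hasNext() means it also works for iterators whose length is unknown. The sketch below assumes a hypothetical helper class, IterAdapters, compiled and placed on BaseX's classpath; it is not part of BaseX. Note that it copies the elements into a list, which is exactly the conversion the question hoped to avoid. It trades that copy for correctness.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical helper (not part of BaseX): drains any Iterator or Iterable
 * into a List, driven by hasNext() rather than by a separately obtained size.
 */
class IterAdapters {

    /** Collects the remaining elements of an iterator until it is exhausted. */
    static <T> List<T> iteratorToList(Iterator<? extends T> iterator) {
        List<T> result = new ArrayList<>();
        while (iterator.hasNext()) {
            result.add(iterator.next());
        }
        return result;
    }

    /** Convenience overload for Sets, Lists and other Iterables. */
    static <T> List<T> iterableToList(Iterable<? extends T> iterable) {
        return iteratorToList(iterable.iterator());
    }

    public static void main(String[] args) {
        Set<String> tags = new LinkedHashSet<>(List.of("NN", "VB", "JJ"));
        System.out.println(iterableToList(tags)); // [NN, VB, JJ]
    }
}
```

From XQuery, the class could be bound with a java: namespace declaration in the same way as java.util.Set above. The resulting list could then be passed to util:list-to-sequence. The two methods have distinct names so that the binding does not have to resolve an overload.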
More generally, it would be helpful for BaseX to have adapters for Java arrays, Lists, Sets, Maps, and Iterables/Iterators to XQuery (XDM) types and functions to construct them in XQuery (like my util:list-to-sequence function above).
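As a rough illustration of what such adapters would need to cover, the sketch below dispatches on the runtime type of a value. It flattens arrays (including primitive arrays, via java.lang.reflect.Array), Iterables such as List and Set, Iterators, and Maps into a single list. The class name and the choice to represent a Map as its entries are assumptions for illustration. A built-in converter would more plausibly turn a Map into an XQuery map(*).

```java
import java.lang.reflect.Array;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: normalizes common Java container types into one List. */
class ContainerAdapter {

    static List<Object> toItems(Object value) {
        List<Object> items = new ArrayList<>();
        if (value == null) {
            return items; // null -> empty sequence
        } else if (value.getClass().isArray()) {
            // java.lang.reflect.Array also handles primitive arrays such as int[].
            int length = Array.getLength(value);
            for (int i = 0; i < length; i++) {
                items.add(Array.get(value, i));
            }
        } else if (value instanceof Map) {
            // Assumption: a Map is flattened to its entries (insertion order for LinkedHashMap).
            items.addAll(((Map<?, ?>) value).entrySet());
        } else if (value instanceof Iterable) {
            // Covers List, Set and any other Iterable; null elements are kept as-is here.
            for (Object o : (Iterable<?>) value) {
                items.add(o);
            }
        } else if (value instanceof Iterator) {
            Iterator<?> it = (Iterator<?>) value;
            while (it.hasNext()) {
                items.add(it.next());
            }
        } else {
            items.add(value); // a single value becomes a one-item sequence
        }
        return items;
    }

    public static void main(String[] args) {
        System.out.println(toItems(new int[] {1, 2, 3}));          // [1, 2, 3]
        System.out.println(toItems(List.of("a", "b").iterator())); // [a, b]
        Map<String, Integer> m = new LinkedHashMap<>();
        m.put("x", 1);
        System.out.println(toItems(m));                             // [x=1]
    }
}
```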
Kind regards, Reece