BaseX-Talk April 2022

basex-talk@mailman.uni-konstanz.de

24 participants
32 discussions

BaseX 9.6: The Summer Edition
by Christian Grün 28 Nov '24

Dear all,

We provide you with a new and fresh version of BaseX, our open source XML framework, database system and XQuery 3.1 processor:

https://basex.org/

Apart from our main focus (query rewritings and optimizations), we have added the following enhancements:

XQUERY: MODULES, FEATURES
- Archive Module, archive:write: stream large archives to file
- SQL Module: support for more SQL types
- Full-Text Module, ft:thesaurus: perform Thesaurus queries
- Fulltext, fuzzy search: specify Levenshtein limit
- UNROLLLIMIT option: control limit for unrolling loops

XQUERY: JAVA BINDINGS
- Java objects of unknown type are wrapped into function items
- results of constructor calls are returned as function items
- the standard package "java.lang." has become optional
- array arguments can be specified with the middle dot notation
- conversion can be controlled with the WRAPJAVA option
- better support for XQuery arrays and maps

WEB APPLICATIONS
- RESTXQ: Server-Timing HTTP headers are attached to the response

For a more comprehensive list of added and updated features, look into our documentation (docs.basex.org) and check out the GitHub issues (github.com/BaseXdb/basex/issues).

Have fun,
Your BaseX Team

11 43

recursively used variables
by Rob Stapper 12 Aug '24

Hi,

The code [1] below, also sent as an attachment, generates an error message: “Static variable depends on itself: $Q{http://www.w3.org/2005/xquery-local-functions}test”.

I use these variables to refer to the private functions in my modules so that I can easily refer to them in an inheritance situation. It is not a big problem for me, but I was wondering whether raising the error is justified or whether the code should work.

[1]===========================================
declare variable $local:test := local:test#1 ;

declare %private function local:test( $i) {
  if ( $i > 0)
  then $local:test( $i - 1)
} ;

$local:test( 10)
===========================================

Kind regards,
Rob Stapper
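A possible workaround, sketched here rather than taken from the thread: the cycle appears to arise because the variable's initializer refers to the function and the function body refers back to the variable. Assuming that reading is right, letting the function recurse by name removes the variable from its own dependency chain. The explicit `else ()` branch is added only to keep the sketch valid standard XQuery 3.1.

```xquery
(: Sketch: the variable still exposes the private function to other code,
   but the function body recurses by name instead of through the variable,
   so the variable's initializer no longer depends on the variable itself. :)
declare variable $local:test := local:test#1;

declare %private function local:test($i) {
  if ($i > 0) then local:test($i - 1) else ()
};

$local:test(10)
```

The trade-off is that the recursion no longer goes through the variable, so any indirection that the variable was meant to provide for the inheritance scenario is lost inside the function body.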

4 7

Add a comment to a backup?
by Jonathan Robie 13 May '22

I have been making backups before doing particularly complex things to my treebanks, and I find myself writing down information about what stage of processing a given backup corresponds to, for example:

"after replacing subtrees for missing compounds"

I wish I could associate these strings with backups in BaseX so I can more easily know which one I would restore if something went wrong.

Jonathan

4 8

Text index requires `/text()` in query
by Matthew Dziuban 02 May '22

Hi all,

I was recently debugging the performance of a query with an exact string comparison and discovered that the query seems to be rewritten to use the text index [1] only if I explicitly add `/text()` to the path I am comparing. My data looks like this:

  <data>
    <element><id>123</id></element>
  </data>

And my original query was:

  for $el in db:open('DatabaseName')/data/element
  where $el/id = '123'
  return $el

With 3 million <element> nodes in the database, this query took about 4 seconds, which made me question whether the text index was being used. I then changed the query to add `/text()` to the `where` clause, like so:

  for $el in db:open('DatabaseName')/data/element
  where $el/id/text() = '123'
  return $el

With this change, the query only takes 0.4 seconds. Is it expected that `/text()` is required to get the text index to kick in?

Thanks in advance,
Matt

[1] https://docs.basex.org/wiki/Indexes#Text_Index
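One way to take the optimizer out of the picture is to call the text index explicitly. The following is a minimal sketch, assuming the database has a text index and that the BaseX 9 function db:text is available; it is meant as a diagnostic, not as the recommended form of the query.

```xquery
(: Sketch: query the text index directly with db:text, then step up
   from the matching text node to its <id> parent and the <element> above. :)
db:text('DatabaseName', '123')/parent::id/parent::element
```

If this runs about as fast as the `/text()` variant, that would suggest the slow query is indeed not being rewritten for index access.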

2 6

Performance of ft:search function
by Tim Thompson 29 Apr '22

Hello,

I have a largish (5.4G) file with a full-text index that I am using to reconcile names in a local dataset. I've been experimenting with splitting the file into many smaller index files to improve performance. I group the entries by initial character and create a new index file for each distinct initial character. Each smaller file then gets its own full-text index. I've been following the approach outlined in the documentation for custom index structures <https://docs.basex.org/wiki/Indexes#Custom_Index_Structures>.

Using prof:track, I've noticed the following performance for different uses of ft:search. (Here, $db refers to the 5.4G file, and $index refers to a smaller 159MB subindex. Times are averaged across 10 runs of 1000 iterations for each expression.)

1. Direct lookup against large index
   Time: 23ms
   Expression: ft:search($db, $text)/../..

2. Direct lookup against subindex
   Time: 3.3ms
   Expression: ft:search($index, $text)/../..

3. Lookup against subindex file with reference to large index
   Time: 2.9ms
   Expression: let $s := ft:search($index, $text)/../.. return db:open-id($db, $s/id)/../..

My question is: why would the third expression be slightly faster (or at least not slower) than the second one, if it involves additional computation?

Thanks in advance,
Tim

--
Tim A. Thompson (he, him)
Librarian for Applied Metadata Research
Yale University Library

2 3

xsl:transform-report message truncation
by Andy Bunce 29 Apr '22

Hi,

Using 9.7.1:

  (: test transform :)
  let $xslt :=
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0">
      <xsl:template match="/">
        <xsl:message>I want to see all of the very long message aaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaa bbbbbbbbbbbbbbb cccccccccccc hhhhhhhhhhhhhhhhhhhhhhhhhhhh gggggg gggggggggggggggggg aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa important bit</xsl:message>
      </xsl:template>
    </xsl:stylesheet>
  return xslt:transform-report(<xml/>, $xslt)?messages

Returns:

  ["I want to see all of the very long message aaaaaaaaaaaaaaaaa
   aaaaaaaaaaaaaaaaa bbbbbbbbbbbbbbb cccccccccccc hhhhhhhhhhhhhhhhhhhhhhhhhhhh
   gggggg gggggggggggggggggg aaaaaaaaaaaaaaaaaaa..."]

Is it BaseX truncating this? Can it be turned off for this case?

/Andy

2 2

Date picture and xslt:transform()
by Zimmel, Daniel 29 Apr '22

Hi,

why do I get different results with the following two queries? xslt:transform() does not respect my date picture.

Expected result:

  <root>29. März 2022</root>

Query 1:

  <root>{format-date(xs:date('2022-03-29'), '[D]. [MNn] [Y]', 'de', (), ())}</root>

Result:

  <root>29. März 2022</root>

Query 2:

  declare namespace xsl = 'http://www.w3.org/1999/XSL/Transform';
  let $xslt :=
    <xsl:stylesheet version="3.0" xmlns:xs="http://www.w3.org/2001/XMLSchema" exclude-result-prefixes="xs">
      <xsl:template match="/">
        <root>
          <xsl:sequence select="format-date(xs:date('2022-03-29'), '[D]. [MNn] [Y]', 'de', (), ())"/>
        </root>
      </xsl:template>
    </xsl:stylesheet>
  let $xml := <root/>
  return
    for $xml in $xml
    return $xml => xslt:transform($xslt)

Result:

  <root>[Language: en]29. March 2022</root>

Running the XSLT with Saxon EE (not in BaseX via xslt:transform) returns (correctly):

  <root>29. März 2022</root>

Using BaseX 9.5.

Daniel

2 3

Stemming in BaseX Full-Text
by Tim Thompson 27 Apr '22

I'm currently involved in a project that's using MarkLogic, and I noticed that its implementation of English-language stemming differs from that of BaseX: e.g., "mouse" and "mice" both stem to "mouse." In BaseX, those words are stemmed separately. Is this a known limitation of the internal English syntax parser?

Example:

  db:create("stem-test",
    <data>
      <x>mouse</x>
      <y>mice</y>
    </data>,
    "data",
    map {"ftindex": true(), "stemming": true(), "language": "en"}
  ),
  update:output(
    ft:search("stem-test", "mice")
  )

Thanks,
Tim

--
Tim A. Thompson (he, him)
Librarian for Applied Metadata Research
Yale University Library

3 7

Using error and catch for error paths in REST API endpoint code
by Omar Siam 27 Apr '22

Way too often I have seen myself and my colleagues write huge, hard-to-understand, deeply nested if () then else code in RestXQ to handle every control path other than complete success. [1] is an example of such code, even though it uses the module I want to introduce here; it seems I have not had time to refactor it yet.

To add insult to injury, the code producing a 500, 404, 403, 401, 302 or maybe even 201 response code

* was neither short nor very uniform and
* would respond to the browser with XML/XHTML, JSON or text, but in most situations in the format the browser side had the hardest time handling.

For example, [2] delivers an XHTML message no matter what the Accept header says. Although it is an accurate message to a user, very similar XML snippets are found throughout that unit and may each need to be adapted separately if the HTML needs to be updated.

My XML snippets CRUD API started as a port of an apigility (now laminas api-tools [3]) API that worked with a relational backend to store XML snippets. The port does the same but uses BaseX for all data storage and is much better at querying with sophisticated XPaths. From that previous API I learned about RFC 7807. [4]

RFC 7807 “Problem Details for HTTP APIs” [5] is one of several specifications now available for reporting errors from REST APIs. This specification explicitly states what the errors should look like in XML as well as in JSON. Maybe this is a bit bold, but I used the URL of that very RFC 7807 as the resolved URI for my module. [6]

Please note: I have to admit my modules have something unusual in common. I use the namespace prefix “_” within the module and only use a more descriptive prefix when I use a module elsewhere. So the “_” prefix can map to numerous URIs in my code. I am not 100% sure whether there are downsides to this style, but I have used it for a while now and no problems come to mind.

I really dislike getting an error, especially during development of some service, without any indication of where it actually occurred. That is to say: I like stack traces in my errors. It would also be great if any runtime error in my code were reported as XML or JSON, depending on the format the browser asked for, just like the errors I raise explicitly.

A few parts provided by BaseX greatly help in getting all of this packaged in an xqm file:

* A stack trace is always available in “$err:additional” when catching errors [7]
* One catch-all error handler can be installed (“declare %rest:error('*') function”, although there are minor downsides to this catch-all handler) [8]
* The XML-based direct format BaseX uses to store JSON by default makes it very easy to transform RFC 7807 XML to JSON [9]
* It is easy to query the request header anywhere in RestXQ XQuery code running on BaseX [10]

I tried to choose easy-to-remember function names that make the code read like a sentence. For example, I created a function wrapping the user's code that says “return api-problem:or-result(user_function#x, [params, …])” [11]. Another example would be “return api-problem:result(<problem>[…]</problem>)”.

I also wanted to come up with an “intuitive way” to send standard HTTP response codes. What I came up with is a special namespace and a mapping of status codes to standard messages that is part of my module. So, for example, a status code 404 can be returned like this: “error(xs:QName('response-codes:_404'), $api-problem:codes_to_message(404), 'A custom message')” [12]. Something similar is probably possible with “web:response-header()”, but with the error function something like

  let $check_file_exists := if (not(file:exists($path))) then error(xs:QName('response-codes:_404'), $api-problem:codes_to_message(404), 'A custom message')

at the top of a more complex RestXQ function is possible.
There is no need to wrap custom code in one or several “if () then else” blocks. In my opinion this helps readability very much, especially if you have to check quite a few things before, say, writing something to the database. The idea also works great with permission checks that use %perm:check annotations. My first use of my api-problem module predates the addition of these helpful annotations. [13]

Some time before an HTTP header that reports the execution time became available, I wanted to measure execution time. So the functions take an xs:integer that should be obtained using “prof:current-ns()” [14] as the very first variable in a RestXQ function, and I try to execute a second “prof:current-ns()” as late as I can in my module [15]. That way I think I get a reasonably accurate timing result in my outputs, as long as the respective error does not come from the catch-all handler.

I also incorporated a quick and dirty HTML page rendering function that displays an error in detail if someone needs that [16]. As a small addition, this error page can link to error descriptions in the W3C standards. [17]

By the way: development started on BaseX, but then we had similar needs in another open-source XML database existing today, so I tried to port the code. Not only did that work rather well (probably also because I know the XQuery needed there rather well too), I also had a few new ideas and added them back to the BaseX version. For this reason there may be some leftover code in the module at the moment that reimplements some helpful, non-standard XQuery functions. The code is somewhat portable.

A few thoughts:

* Some say that it is necessary for security purposes to disable any stack traces in production environments. I do not really believe this does much good. But if one does not want to hard-code a Boolean switch in the module's source code: what would be the fastest external source one could use in terms of compile, optimization and execution time?
  Are stack traces not available as “$err:additional” when RESTXQERRORS are switched off?
* Is there any sane way to get the QName of an error with an unknown prefix as a string, as in the catch-all handler, and resolve it against all prefix-URI mappings known in some XQuery program?

[1] https://github.com/acdh-oeaw/vleserver_basex/blob/main/vleserver/users.xqm#…
[2] https://github.com/acdh-oeaw/vicav-app/blob/master/http.xqm#L24-L55
[3] https://api-tools.getlaminas.org/
[4] https://api-tools.getlaminas.org/documentation/modules/api-tools-api-problem
[5] https://datatracker.ietf.org/doc/html/rfc7807
[6] https://github.com/acdh-oeaw/api-problem4restxq/blob/master/api-problem/api…
[7] https://docs.basex.org/wiki/XQuery_3.0#Try.2FCatch
[8] https://docs.basex.org/wiki/RESTXQ#Catch_XQuery_Errors
[9] https://docs.basex.org/wiki/JSON_Module#Direct
[10] https://docs.basex.org/wiki/Request_Module#request:header
[11] https://github.com/acdh-oeaw/api-problem4restxq/blob/master/tests/api-probl…
[12] https://github.com/acdh-oeaw/api-problem4restxq/blob/master/tests/http.xqm#…
[13] https://github.com/acdh-oeaw/api-problem4restxq/blob/master/tests/http.xqm#…
[14] https://github.com/acdh-oeaw/vleserver_basex/blob/main/vleserver/dicts.xqm#…
[15] https://github.com/acdh-oeaw/api-problem4restxq/blob/master/api-problem/api…
[16] https://github.com/acdh-oeaw/api-problem4restxq/blob/master/tests/api-probl…
[17] https://github.com/acdh-oeaw/api-problem4restxq/blob/master/api-problem/api…

Best regards

--
Mag. Ing. Omar Siam
Austrian Center for Digital Humanities and Cultural Heritage
Österreichische Akademie der Wissenschaften | Austrian Academy of Sciences
Stellvertretende Behindertenvertrauensperson | Deputy representative for disabled persons
Wohllebengasse 12-14, 1040 Wien, Österreich | Vienna, Austria
T: +43 1 51581-7295
omar.siam(a)oeaw.ac.at | www.oeaw.ac.at/acdh
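The pattern described in this message (raise an error early, catch it in one place, answer with an RFC 7807 problem document) can be illustrated with a self-contained sketch. This is not the api-problem module's code: the namespace URI bound to response-codes is a hypothetical placeholder, the title is a hard-coded string instead of a lookup in the module's code-to-message map, and only the element names (problem, type, title, status, detail) and the urn:ietf:rfc:7807 namespace come from RFC 7807 itself.

```xquery
declare namespace err = 'http://www.w3.org/2005/xqt-errors';
(: Hypothetical namespace URI; the module's real URI is not given in the message. :)
declare namespace response-codes = 'https://example.org/response-codes';

try {
  (: Raise a 404 early instead of nesting the remaining logic in if/else. :)
  error(xs:QName('response-codes:_404'), 'Not Found', 'A custom message')
} catch response-codes:* {
  (: Map the caught error onto the XML form of an RFC 7807 problem document. :)
  <problem xmlns="urn:ietf:rfc:7807">
    <type>about:blank</type>
    <title>{ $err:description }</title>
    <status>{ substring-after(local-name-from-QName($err:code), '_') }</status>
    <detail>{ string($err:value) }</detail>
  </problem>
}
```

In a RestXQ setting the catch would more likely live in a shared error handler, and the response format would be chosen from the Accept header, as the message describes; both are left out here to keep the sketch runnable on its own.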

3 9

Integers as attribute values
by Giuseppe G. A. Celano 27 Apr '22

Hi Everyone,

I have an XML document with elements such as <div n="21">. If I run the query

  doc("file.xml")//div[@subtype="chapter"]//*/parent::div[@n=21]

I get the relevant div element, even though 21 is passed as an integer. On the other hand, if I type

  doc("file.xml")//div[@n=21]

I get the error "Cannot convert to xs:double", which can be solved by writing

  doc("file.xml")//div[@n="21"]

Is this because BaseX tries to convert the @n values of all div elements into numbers, so that no error is raised if all the returned @n values happen to be numbers (the comparison is then possible), whereas an error is raised otherwise? Is this BaseX specific?

Thanks.

Best,
Giuseppe
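For context: in the XQuery general comparison rules, an untyped attribute value compared with a number is cast to xs:double, and that cast fails for a value that is not numeric. This is defined by the language rather than by BaseX. It would be consistent with the behaviour described above if the first query only reaches div elements whose @n is numeric, though that is an assumption about the data. The sketch below uses a small inline stand-in for file.xml (hypothetical data) and shows two ways to compare without triggering the error.

```xquery
(: Inline stand-in for file.xml; the second @n is deliberately non-numeric. :)
let $doc := <text><div n="21"/><div n="21a"/></text>
return (
  (: String comparison: no cast to xs:double takes place. :)
  $doc//div[@n = '21'],
  (: Numeric comparison restricted to values that can be cast. :)
  $doc//div[@n castable as xs:integer][xs:integer(@n) = 21]
)
```

Both expressions select the first div only. The string form is the simpler choice when @n is an identifier; the guarded numeric form is useful when values such as "021" should also match.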

3 4
