Dear all,
We provide you with a new and fresh version of BaseX, our open source
XML framework, database system and XQuery 3.1 processor:
https://basex.org/
Apart from our main focus (query rewritings and optimizations), we
have added the following enhancements:
XQUERY: MODULES, FEATURES
- Archive Module, archive:write: stream large archives to file
- SQL Module: support for more SQL types
- Full-Text Module, ft:thesaurus: perform Thesaurus queries
- Full-Text, fuzzy search: specify Levenshtein limit
- UNROLLLIMIT option: control limit for unrolling loops
XQUERY: JAVA BINDINGS
- Java objects of unknown type are wrapped into function items
- results of constructor calls are returned as function items
- the standard package "java.lang." has become optional
- array arguments can be specified with the middle dot notation
- conversion can be controlled with the WRAPJAVA option
- better support for XQuery arrays and maps
WEB APPLICATIONS
- RESTXQ: Server-Timing HTTP headers are attached to the response
For a more comprehensive list of added and updated features, look into
our documentation (docs.basex.org) and check out the GitHub issues
(github.com/BaseXdb/basex/issues).
Have fun,
Your BaseX Team
Hi,
The code [1] below (also sent as an attachment) generates an error message: “Static variable depends on itself: $Q{http://www.w3.org/2005/xquery-local-functions}test”.
I use these variables to refer to the private functions in my modules, so that I can easily refer to them in an inheritance situation.
It’s not a big problem for me, but I was wondering whether the error is justified or whether this should work.
[1]===========================================
declare variable $local:test := local:test#1 ;
declare %private function local:test( $i) { if ( $i > 0) then $local:test( $i - 1) else () } ;
$local:test( 10)
===========================================
Kind regards,
Rob Stapper
Fellow BaseX Users!
You might have heard that XML Prague 2022 will take place in June.
(Unless the then-prevalent Greek letter makes it impossible even in that
time of year, of course.)
I asked Christian whether the BaseX team would organize a user group
meeting, since none has taken place for years now. Christian didn’t seem
to be very fond of organizing such a meeting. I asked him whether he
would be available to present new features, the roadmap, and for a Q&A
session if the users themselves organized such a meeting. He agreed, and
therefore I hereby ask the list members whether anyone will join me in
organizing this.
The plan looks as follows: We will apply for one or two 90-minute slots
via the CFP process (https://www.xmlprague.cz/cfp/). We don’t need to
have a fixed schedule by Dec. 20 yet (the end of the CFP as currently
announced; it will be extended anyway).
Christian was so kind as to create a new repo, user-group, on GitHub. We
will use one or more of its Wiki pages [1] in order to plan the event.
The page will eventually evolve into an agenda if you agree.
Looking forward to meeting many of you in Prague in June.
If anyone else would like to volunteer as an organizer, please come
forward. Maybe we can also coordinate this in the Wiki [2].
Gerrit
[1] https://github.com/BaseXdb/user-group/wiki/2022-06-XML-Prague
[2] https://github.com/BaseXdb/user-group/wiki/Members
Hi,
Recently I ran into serious (as in SERIOUS) performance trouble with expensive regexes, and it's no wonder why.
Here is my scenario:
* XML text documents with a total of 1 million text nodes
* A regex search string built from a huge dictionary of 50,000 strings (some of them 50 words long)
* a match must be wrapped in an element (= create a link)
I could drink many cups of tea while waiting for the regex to complete... hours... I ran out of tea!
Then I found the 2021 Balisage paper by Mary Holstege titled "Fast Bulk String Matching" [1], in which she explores the Aho-Corasick algorithm and implements it in XQuery. Marvellous! Following this, I can build a data structure that gives me fast results, but building the structure itself is still very slow due to the amount of text to build from. So this was not fast enough for my use case, or I may simply not be smart enough to apply it correctly :-|
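For reference, here is a minimal character-level Aho-Corasick sketch in Python. It is not the XQuery implementation from the paper, and the function names are illustrative. It shows the two phases of the algorithm: the automaton (a trie plus failure links) is built once from the dictionary alone, and each text is then scanned in a single left-to-right pass, however many patterns there are.

```python
from collections import deque


def build_automaton(patterns):
    """Build an Aho-Corasick automaton from the dictionary only."""
    goto = [{}]    # goto[state][char] -> next state (the trie)
    fail = [0]     # failure link of each state
    out = [set()]  # patterns recognized when reaching each state
    for pattern in patterns:
        state = 0
        for ch in pattern:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].add(pattern)
    # Breadth-first traversal: failure links of shallower states are
    # complete before deeper states need them.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] |= out[fail[nxt]]
    return goto, fail, out


def find_all(text, automaton):
    """Scan text once; return sorted (start offset, pattern) pairs."""
    goto, fail, out = automaton
    state = 0
    hits = []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pattern in out[state]:
            hits.append((i - len(pattern) + 1, pattern))
    return sorted(hits)


if __name__ == "__main__":
    ac = build_automaton(["he", "she", "his", "hers"])
    print(find_all("ushers", ac))  # [(1, 'she'), (2, 'he'), (2, 'hers')]
```

Because the automaton depends only on the dictionary, it can be built once and reused for every text node; a word-level variant would use tokens instead of characters as trie edges.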
So I tried tinkering with maps, which turned out to give me extraordinary performance gains:
* build a map from the string dictionary
* walk through all text nodes one by one
* for each text node, put every contiguous sequence of words, in word order (I need to find strings, not single words), into another map
* strip punctuation (for word boundaries) and normalize whitespace in both maps
* compare the keys of both maps
* give the new reduced string dictionary to the regular regex search
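The steps above can be sketched in Python as follows, under stated assumptions: the dictionary is a pipe-separated string of literal phrases (so they are escaped with re.escape), punctuation is replaced by spaces on both sides, and Python sets stand in for the BaseX maps. The function names and the optional max_len cap are hypothetical.

```python
import re


def normalize(text):
    """Replace punctuation with spaces and collapse whitespace."""
    return " ".join(re.sub(r"[^\w\s]", " ", text).split())


def dictionary_keys(strings):
    """Step 1: the pipe-separated dictionary as a set of normalized phrases."""
    return {p for p in (normalize(s) for s in strings.split("|")) if p}


def word_ngrams(text, max_len=None):
    """Steps 2-3: every contiguous word sequence of one text, in word order."""
    tokens = normalize(text).split()
    n = len(tokens)
    longest = n if max_len is None else min(max_len, n)  # optional cap (assumption)
    return {
        " ".join(tokens[start:start + length])
        for start in range(n)
        for length in range(1, min(longest, n - start) + 1)
    }


def reduced_pattern(strings, texts):
    """Steps 4-5: keep only dictionary phrases that occur as n-grams."""
    candidates = set()
    for text in texts:
        candidates |= word_ngrams(text)
    hits = dictionary_keys(strings) & candidates
    if not hits:
        return None  # an empty alternation would match every text
    # Longest phrases first, so overlapping alternatives prefer the longer one.
    ordered = sorted(hits, key=lambda h: (-len(h), h))
    return "|".join(re.escape(h) for h in ordered)


def matching_texts(strings, texts):
    """Step 6: run the much smaller regex over the normalized texts."""
    pattern = reduced_pattern(strings, texts)
    if pattern is None:
        return []
    rx = re.compile(pattern)
    return [t for t in texts if rx.search(normalize(t))]
```

Since candidates are generated per word, this only finds phrases aligned to word boundaries, whereas the plain regex also matches inside words. The map or set comparison costs one hash lookup per candidate, independent of the dictionary size.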
Comparing the maps does not tell me where in the text my strings are, but at least I know whether they occur at all. To find out exactly where (and how they fit my other regex context), I can then run the regular regex search with a massively reduced dictionary. Fast!
I am quite happy, BUT I still don't understand why this is so much faster for my sample data:
* plain method: 51,323.72 ms
* maps method: 597.94 ms (!)
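That is roughly 86 times faster (51,323.72 ms / 597.94 ms ≈ 85.8). A plausible but unverified explanation: in the maps method each candidate costs one hash lookup, whereas a backtracking regex engine may try an alternation of 50,000 strings alternative by alternative at each character position. The number of candidates per text node is n(n+1)/2 for n words, so long text nodes dominate; capping the n-gram length at the longest dictionary phrase (about 50 words, per the scenario above) bounds that growth. The hypothetical helper below counts the candidates:

```python
def ngram_count(n_words, max_len=None):
    """Count contiguous word sequences (map keys) for a text of n_words words."""
    longest = n_words if max_len is None else min(max_len, n_words)
    # A sequence of length k can start at n_words - k + 1 positions.
    return sum(n_words - k + 1 for k in range(1, longest + 1))


if __name__ == "__main__":
    print(ngram_count(200))              # 20100 (uncapped: 200 * 201 / 2)
    print(ngram_count(200, max_len=50))  # 8775 (capped at 50 words)
```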
Is this due to any optimizations done by BaseX or have I simply discovered the power of maps?
How would you do it? Is there still room for improvement?
Why does Aho-Corasick not help much with this scenario? Is it because the list of strings is simply too massive?
Why is this so much faster with text-splitting-to-map?
See the query examples [2], [3] below to better understand what I am trying to do (bulk data not included).
There is no normalization of punctuation in the examples, but that is only needed for completeness.
Best, Daniel
[1] http://www.balisage.net/Proceedings/vol26/print/Holstege01/BalisageVol26-Ho…
[2] plain method
(: read the documents and the dictionary, which is used as one big regex :)
let $textnodes := fetch:xml(file:base-dir()||file:dir-separator()||'example.xml')//text()
let $regex := file:read-text(file:base-dir()||file:dir-separator()||'strings.txt')
for $textnode in $textnodes
(: matches() already returns a boolean, so no comparison with true() is needed :)
where matches($textnode,$regex)
return $textnode
[3] maps method
(:~
: Create map from string
: Tokenization: each pipe-separated dictionary entry becomes a key
:
: @param $string
: @return Map
:)
declare function local:createMapFromString (
$string as xs:string
) as map(*) {
map:merge(
(: $entry avoids shadowing the $string parameter :)
for $entry in tokenize($string,'\|')
return map:entry($entry,$entry),
map { 'duplicates': 'use-first' })
};
(:~
: Create map from text nodes
: Write every contiguous sequence of words (word n-gram) of each
: text node, in word order, to the map
:
: @param $textnodes
: @return Map
:)
declare function local:createMapFromTextnodes (
$textnodes as xs:string+
) as map(*) {
map:merge(
for $node in $textnodes
let $text := normalize-space($node)
let $tokens := tokenize($text,' ')
let $map_nodes :=
map:merge(
for $start in 1 to fn:count($tokens)
for $length in 1 to fn:count($tokens) - $start + 1
return
map:entry(fn:string-join(fn:subsequence($tokens, $start, $length), ' '),'x')
)
return
$map_nodes)
};
(:~
: Compare two maps
:
: @param $map1
: @param $map2
: @return xs:string*
:)
declare function local:reduceMaps (
$map1 as map(*),
$map2 as map(*)
) as xs:string* {
for $key in map:keys($map1)
where map:contains($map2,$key)
let $value := map:get($map1,$key)
return $value
};
let $textnodes := fetch:xml(file:base-dir()||file:dir-separator()||'example.xml')//text()
let $strings := file:read-text(file:base-dir()||file:dir-separator()||'strings.txt')
let $map_words := local:createMapFromString($strings)
let $map_textnodes := local:createMapFromTextnodes($textnodes)
let $matches := local:reduceMaps($map_words,$map_textnodes)
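(: join the distinct matches into a reduced regex; skip the search when
nothing was found, as an empty regex would match every text node :)
let $regex := string-join(distinct-values($matches),'|')
for $textnode in $textnodes
where $regex ne '' and matches($textnode,$regex)
return $textnode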
Thanks. Does the query do what you are looking for?
On Tue, Dec 21, 2021 at 12:47 PM <benengbers(a)dds.nl> wrote:
>
> Christian Grün wrote on 21-12-2021 10:18:
> > Hi Ben,
>
> > return db:add($db, $doc, $table || '.xml')
> >
> > Could you give us small examples for <DB-name>, <DB-schema> and
> > <table-name>?
> >
> > Best,
> > Christian
>
> To the best of my knowledge, DB name and DB schema are identical in
> MySQL and MariaDB? The schema name I use is 'Innovate'.
> The table names are:
> +--------------------+
> | Tables_in_Innovate |
> +--------------------+
> | Dienst |
> | Mdw_Probleem |
> | Mdw_Wens |
> | Medewerker |
> | Medewerker_dienst |
> | Probleem |
> | Wens |
> +--------------------+
>
> Ben
> PS. I hope you'll see this reply. For a few days now, all mail from
> basex-talk has been refused by Thunderbird; at least I don't see it
> anymore...
Hi,
After completing my work on the R-client, I started working on a
Prolog-client.
Long ago I wrote an application in SWI-Prolog which operated on data
from a MySQL database. (In the meantime I changed from MySQL to
MariaDB.) My goal is to write a new version of that application, but now
based on data stored in BaseX.
In the BaseX GUI, I created an empty database "MariaBases".
The following code can be used to select data in MariaDB:
sql:init("org.mariadb.jdbc.Driver"),
let $con := sql:connect('jdbc:mariadb://localhost:3306/<DB-name>',
'<user>', '<password>')
return sql:execute($con, "select * from Mdw_Wens")
returns:
<sql:row xmlns:sql="http://basex.org/modules/sql">
<sql:column name="ID">1</sql:column>
<sql:column name="Medewerker_ID">5</sql:column>
<sql:column name="Wens_ID">1</sql:column>
</sql:row>
Is it possible to change the query-statement in such a way that the
results are added to MariaBases/<DB-schema>/<table-name>?
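As an illustration outside BaseX, here is a Python sketch of the layout the question describes: one XML document per table, stored under <DB-schema>/<table-name>.xml, with rows shaped like the sql:row output above. It assumes an in-memory SQLite database as a stand-in for MariaDB, and the root element named after the table is a hypothetical choice. In BaseX itself, such a document would be stored with db:add, as in the db:add call quoted earlier in this thread.

```python
import sqlite3
import xml.etree.ElementTree as ET

SQL_NS = "http://basex.org/modules/sql"  # namespace of the sql:row output above
ET.register_namespace("sql", SQL_NS)


def table_to_xml(con, table):
    """Serialize every row of one table as sql:row/sql:column elements."""
    root = ET.Element(table)  # hypothetical: root element named after the table
    # The table name is interpolated, so it must come from a trusted list.
    cur = con.execute(f'SELECT * FROM "{table}"')
    names = [d[0] for d in cur.description]
    for values in cur:
        row = ET.SubElement(root, f"{{{SQL_NS}}}row")
        for name, value in zip(names, values):
            col = ET.SubElement(row, f"{{{SQL_NS}}}column", name=name)
            col.text = "" if value is None else str(value)
    return ET.tostring(root, encoding="unicode")


def target_path(schema, table):
    """Path inside the database, e.g. Innovate/Mdw_Wens.xml."""
    return f"{schema}/{table}.xml"
```

Looping over the table list (Dienst, Mdw_Probleem, and so on) with these two functions would yield one document and one path per table.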
--
Ben Engbers
Dear all,
We conclude this year with yet another release of BaseX, our XML
framework, database and XQuery processor [1]. The new version contains
new performance tweaks and minor bug fixes, the details of which can
be examined on our GitHub page [2].
As we have received numerous inquiries regarding Log4j, I can reassure
you that BaseX is not affected by the vulnerabilities. We have our own
logging mechanism, which has no built-in code execution features. The
same applies to the Jetty web server.
Log4j once again demonstrates that Open Source is everywhere, and that
it takes valuable time and resources to build professional and secure
software. We send out a big thank you to those of you who are (and
have been) supporting us by paying for maintenance, sponsoring new
features and giving donations. It’s you who keep BaseX both
alive and up to date!
Have a good time,
Christian
[1] https://basex.org
[2] https://github.com/BaseXdb/basex/commits/master
Hi,
Sorry to ask, but where can I find or read more documentation on the
email.jar module? I'm working on a personal project, and besides sending
e-mails, I need to read some e-mails. Or are there other options for
this task?
Cheers.
Eliud Meza