Hello,
We are using BaseX 10.8 beta to validate XML files against XSDs.
Our XSDs conform to XSD 1.1, which is why all the necessary Apache
Xerces 12.2.2 files are in the lib/custom directory.
On top of that, we are running the Jetty web server by launching c:\Program Files
(x86)\BaseX\bin\basexhttp.bat
Finally, our REST HTTP call is as follows:
string REQUESTURL =
$"http://{HOST}:{PORT}/rest?run={XQUERY}&$xml={xmlFile}&$xsd={xsdFile}";
Our XQuery validation call is as follows:
let $result := validate:xsd-report($xml, $xsd, map {
  'http://apache.org/xml/features/validation/cta-full-xpath-checking': true(),
  'cache': true() })
return $result
Everything is working as expected. :)
Here is a question.
===================
The validate:xsd-report(...) call is using caching via the 'cache': true()
parameter.
We are updating XSDs from time to time.
Unfortunately, the validation is not picking up a new version of the XSD
file on the file system.
It is unaware that the *.xsd file was updated.
It is still using an old XSD version that is cached in memory.
Is it possible to invalidate the XSD cache without stopping and restarting
the Jetty web server?
IMHO, a full restart is too intrusive.
Maybe you could introduce a manual command for XSD cache invalidation in the
BaseX GUI, or some other mechanism to handle this scenario?
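In the meantime, a workaround we are considering (just a sketch, untested; the version-stamping logic is ours) is to make the schema location version-specific, so an updated XSD gets a fresh cache entry instead of hitting the stale one:

```xquery
(: Sketch of a cache-busting workaround (untested): copy the XSD to a
   file name that includes its last-modified timestamp, so each new
   version of the schema gets its own cache entry. :)
declare variable $xml external;
declare variable $xsd external;

let $stamp     := replace(string(file:last-modified($xsd)), '[:.]', '-')
let $versioned := replace($xsd, '\.xsd$', '-' || $stamp || '.xsd')
return (
  if (file:exists($versioned)) then () else file:copy($xsd, $versioned),
  validate:xsd-report($xml, $versioned, map {
    'http://apache.org/xml/features/validation/cta-full-xpath-checking': true(),
    'cache': true()
  })
)
```

This leaves stale copies behind on disk, though, so a built-in invalidation mechanism would still be preferable.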
Regards,
Yitzhak Khabinsky
Hi,
when trying to access https://bxfiddle.cloud.basexgmbh.de/, Chrome tells me:
Your connection is not private
Attackers might be trying to steal your information from
bxfiddle.cloud.basexgmbh.de (for example, passwords, messages, or credit
cards). Learn more
NET::ERR_CERT_DATE_INVALID
bxfiddle.cloud.basexgmbh.de normally uses encryption to protect your
information. When Chrome tried to connect to bxfiddle.cloud.basexgmbh.de
this time, the website sent back unusual and incorrect credentials. This
may happen when an attacker is trying to pretend to be
bxfiddle.cloud.basexgmbh.de, or a Wi-Fi sign-in screen has interrupted
the connection. Your information is still secure because Chrome stopped
the connection before any data was exchanged.
You cannot visit bxfiddle.cloud.basexgmbh.de right now because the
website uses HSTS. Network errors and attacks are usually temporary, so
this page will probably work later.
It seems the certificate expired on Jan 2nd of this (new) year.
Can anyone fix this?
Hi all,
I'm interested in creating backups manually so I can use a different
compression algorithm. Based on the source code[1], it looks like backups
are just created by adding each file (excluding upd.basex) in the database
directory to a .zip file, so I could do the same using tar and my
compression algorithm of choice. Is my understanding correct or am I
missing some other logic?
Thanks,
Matt
[1]
https://github.com/BaseXdb/basex/blob/main/basex-core/src/main/java/org/bas…
Hi,
I'm reaching out for suggestions on improving performance.
We are using BaseX to store and analyze around 350,000 to 500,000 XMLs.
The size of each XML varies between a few KB and 5 MB. Each day around 10k XMLs
get added/patched.
I have the following questions:
1) What is the optimal size or number of documents in a DB? Earlier I had one
DB with different collections, but inserts were too slow; it took more than 30s
just to replace a document. So I split it up by category into around
30 DBs. Inserts are fine now, but again, if there are too many documents in a
category, patching that DB slows down, and querying across all DBs also gets
slower. Is there an optimal number of DBs? Can I create many DBs, e.g. one for
every 10K XMLs? I read through
https://www.mail-archive.com/basex-talk@mailman.uni-konstanz.de/msg06310.ht…;
does having hundreds of DBs cause query performance degradation? Is there a better
solution?
2) Query performance has degraded with more documents in a DB. I also
noticed that, with or without the token/attribute index, there is not much
difference in query performance (they are just XML attribute queries).
Running "Optimize" after inserts to recreate the indexes takes too much time and
memory. I am not running it now, since my tests did not show a significant
improvement with the indexes versus without. Any suggestions for improving this?
3) Is it possible to run queries against specific XMLs only? I will have a
pre-filter based on user selection, and queries need to be run against only
those XMLs. There are a number of filters users can apply, and each time this
can result in a different set of XMLs against which the analysis has to be
performed (hence it is not feasible to create so many collections). Right now, I
am querying against all XMLs, even though I am interested only in a subset,
and doing post-filtering. I did go through
https://mailman.uni-konstanz.de/pipermail/basex-talk/2010-July/000495.html,
but again, having a regex that includes all the relevant file paths (sometimes
the entire set of documents) will slow it down.
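To clarify (3), this is roughly what I'm after (a sketch only; the DB name and paths are made up), where the pre-filter yields a list of document paths and only those documents are queried:

```xquery
(: Sketch: run the analysis only on the documents selected by the
   user's pre-filter, instead of scanning the whole database.
   'mydb' and the paths below are placeholders. :)
let $db    := 'mydb'
let $paths := ('cat1/doc-0001.xml', 'cat2/doc-0042.xml')  (: from the pre-filter :)
for $doc in $paths ! db:get($db, .)
return $doc//record[@status = 'active']
```

(In BaseX 9, db:get was called db:open.) What I don't know is whether access like this can still benefit from the attribute index.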
Thank you,
Deepak
Hi,
My databases are corrupted in a strange way. Everything worked yesterday
and I have not upgraded my system (automatic updates are NOT set on my OS).
In the WebDAV connector, all DB names except six appear as dates, for
example: 2023-06-14T07:37:56.294Z, 2023-12-12T09:56:02.722Z.
In the console, I get this error:
[qtp289639718-19] INFO com.bradmcevoy.http.HttpManager - PROPFIND ::
http://localhost:8972/webdav/ - http://localhost:8972/webdav/
bx_1 | Unparseable date: "app-pub-templates"
bx_1 | Unparseable date: "app-pubs"
bx_1 | Unparseable date: "app-tests"
bx_1 | Unparseable date: "ar-eg"
bx_1 | Unparseable date: "as-in"
bx_1 | Unparseable date: "az-az"
bx_1 | Unparseable date: "be-by"
bx_1 | Unparseable date: "bg-bg"
bx_1 | Unparseable date: "bn-bd"
...
It seems that the names and dates of the DBs have been interchanged.
I tried restoring the DBs from my backups (newer and older backups). I
also tried restarting the server. Same result. I have been using BaseX 10.7
(beta) for a few months. I could update BaseX to the
official release, but I would prefer to upgrade with healthy DBs, to avoid
adding a layer of complexity to the issue. I have not had a similar problem
in a decade of using BaseX, so I am a bit clueless about what else to try.
Thanks in advance for your help,
--
France Baril
Architecte documentaire / Documentation architect
france.baril(a)architextus.com
Hello,
Is it part of the spec that numbers in the “basic” JSON representation (of 7+ digits) be serialized using scientific notation? For example:
let $direct := <json type="object"><n type="number">1339029</n></json>
let $basic := <fn:map><fn:number key="n">1339029</fn:number></fn:map>
let $result := ($direct, $basic) ! serialize(., map {
"method": "json", "json": map {
"format": if (position() eq 1) {"direct"} else {"basic"}, "indent": "yes"
}
})
return $result
…produces two different results:
{
"n":1339029
}
{
"n":1.339029E6
}
I usually prefer working with the “basic” format, but the automatic conversion to scientific notation is inconvenient because the value is not directly castable to xs:integer.
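For now I can work around it by round-tripping through xs:double (just a sketch), since 1.339029E6 is a valid lexical form for xs:double but not for xs:integer:

```xquery
(: the "basic" output cannot be cast to xs:integer directly,
   but casting via xs:double recovers the original integer :)
xs:integer(xs:double("1.339029E6"))  (: returns 1339029 :)
```

Still, it would be nicer if integral values were serialized without the exponent in the first place.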
Thanks in advance,
Tim
--
Tim A. Thompson (he, him)
Librarian for Applied Metadata Research
Yale University Library
www.linkedin.com/in/timathompson<http://www.linkedin.com/in/timathompson>
timathom(a)protonmail.com<mailto:timothy.thompson@yale.edu>
I’m searching for short phrases where I may want to respect order or not and where the phrases may cross element boundaries.
For example, I have the phrase “Amazon Alexa Spoke” and I want to find any DITA topic whose title text includes “Amazon Alexa Spoke” in that order, or maybe I want those words in any order, depending on my search requirements.
When I run this query against my database I find occurrences where all three words are in the same parent element, i.e.:
<title>Create a connection record for the <ph>Amazon Alexa spoke</ph>
</title>
<title>Create a credential record for the <ph>Amazon Alexa spoke</ph>
</title>
<title>Set up the <ph>Amazon Alexa spoke</ph>
</title>
But I do not find it where one of the words is not in the same parent:
This title is *not* found (even though this is the one I actually want to have found):
<title><ph id="alexa">Amazon Alexa</ph> Spoke</title>
Reading the docs on ft:search(), it is clear that it is searching on text nodes:
“Returns all text nodes from the full-text index…”
So I think the behavior here is as documented.
Short of creating a separate database that removes the subelements within <title> elements, is there a way to use full text indexing to do the search I want? In particular, I want to be able to turn the ordered/unordered check on or off.
If I always wanted ordered I could just use a regular expression match—it wouldn’t be that efficient but efficiency is not a concern in this particular case (but I can see where it would be in a more general search support situation).
Or am I missing a more obvious solution to this requirement?
Note that in this case I don’t care about finding different word forms—for this particular search I only care about exact word matches.
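To make the requirement concrete, here is roughly what I'm trying to express (a sketch; I don't know whether this form can be rewritten to use the full-text index), matching against the title's whole string value so the phrase can cross child elements, with the ordered check as a toggle:

```xquery
(: Sketch: search the full string value of each <title>, so words split
   across child elements like <ph> are still found together; the
   'ordered' position filter is applied only when order matters. :)
declare variable $ordered as xs:boolean external := true();

for $title in //title
where
  if ($ordered)
  then $title contains text { 'Amazon', 'Alexa', 'Spoke' } all words ordered
  else $title contains text { 'Amazon', 'Alexa', 'Spoke' } all words
return $title
```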
Cheers,
E.
_____________________________________________
Eliot Kimber
Sr Staff Content Engineer
O: 512 554 9368
M: 512 554 9368
servicenow.com<https://www.servicenow.com>
LinkedIn<https://www.linkedin.com/company/servicenow> | Twitter<https://twitter.com/servicenow> | YouTube<https://www.youtube.com/user/servicenowinc> | Facebook<https://www.facebook.com/servicenow>
Hi,
I just discovered that the code samples on the BaseX wiki don't seem to be
working fully. I noticed it a couple of days ago and thought it was
temporary, but the problem is still there.
Regards,
Johan
I’m generating CSV data that includes URLs with multiple query parameters, so “&somekey” in them. These get serialized as “&amp;somekey” where I want “&somekey”.
My CSV XML looks like this:
<record>
<AppID>sn_admin_center</AppID>
<DocsURL>=HYPERLINK(https://docs.servicenow.com/csh?topicname=admin-center-intro&amp;version=vancouver)</DocsURL>
</record>
I’m then doing:
let $report := csv:serialize($csv, map{})
let $doWrite := file:write('/Users/eliot.kimber/temp/apps-to-topics.csv', $report)
To write the CSV file.
The resulting file looks like:
sn_admin_center,"=HYPERLINK(""https://docs.servicenow.com/csh?topicname=admin-center-intro&amp;version=va…"")"
Note that the “&” is still escaped as “&amp;”.
Reviewing the docs for the CSV module and the serialization options, I don’t see any option that looks like it would control how escaping is handled.
Is there a way to do what I want?
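The only thing I've come up with so far is blunt post-processing of the serialized string before writing it (a hack, and it assumes the "&amp;" escaping really is present in the string at that point):

```xquery
(: hack: undo the entity escaping in the serialized CSV string
   before writing it out; assumes "&amp;" appears literally :)
let $report := csv:serialize($csv, map {})
let $fixed  := replace($report, '&amp;amp;', '&amp;')
return file:write('/Users/eliot.kimber/temp/apps-to-topics.csv', $fixed)
```

But that could clobber a legitimate "&amp;" that was meant to survive, so a serialization option would be much better.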
Thanks,
E.
_____________________________________________
Eliot Kimber
Sr Staff Content Engineer
O: 512 554 9368
M: 512 554 9368
servicenow.com<https://www.servicenow.com>
LinkedIn<https://www.linkedin.com/company/servicenow> | Twitter<https://twitter.com/servicenow> | YouTube<https://www.youtube.com/user/servicenowinc> | Facebook<https://www.facebook.com/servicenow>