Hello,
We are using BaseX 10.8 beta to validate XML files against XSDs.
Our XSDs conform to XSD 1.1, which is why all the necessary Apache
Xerces 12.2.2 files are in the lib/custom directory.
On top of that, we are running the Jetty web server by launching c:\Program Files
(x86)\BaseX\bin\basexhttp.bat
Finally, our REST HTTP call is as follows:
string REQUESTURL =
$"http://{HOST}:{PORT}/rest?run={XQUERY}&$xml={xmlFile}&$xsd={xsdFile}";
Our XQuery validation call is as follows:
let $result := validate:xsd-report($xml, $xsd, map {
  'http://apache.org/xml/features/validation/cta-full-xpath-checking': true(),
  'cache': true() })
return $result
Everything is working as expected. :)
Here is a question.
===================
The validate:xsd-report(...) call is using caching via the 'cache': true()
parameter.
We are updating XSDs from time to time.
Unfortunately, the validation is not picking up a new version of the XSD
file on the file system.
It is unaware that the *.xsd file was updated.
It is still using an old XSD version that is cached in memory.
Is it possible to invalidate the XSD cache without stopping and restarting
the Jetty web server?
IMHO, a full restart is too intrusive.
Maybe you could introduce a manual command for XSD cache invalidation in the
BaseX GUI, or some other mechanism to handle this scenario?
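In the meantime, a workaround we are considering (just a sketch, untested; the version-stamping logic is ours) is to make the schema location version-specific, so an updated XSD gets a fresh cache entry instead of hitting the stale one:

```xquery
(: Sketch of a cache-busting workaround (untested): copy the XSD to a
   file name that includes its last-modified timestamp, so each new
   version of the schema gets its own cache entry. :)
declare variable $xml external;
declare variable $xsd external;

let $stamp     := replace(string(file:last-modified($xsd)), '[:.]', '-')
let $versioned := replace($xsd, '\.xsd$', '-' || $stamp || '.xsd')
return (
  if (file:exists($versioned)) then () else file:copy($xsd, $versioned),
  validate:xsd-report($xml, $versioned, map {
    'http://apache.org/xml/features/validation/cta-full-xpath-checking': true(),
    'cache': true()
  })
)
```

This leaves stale copies behind on disk, though, so a built-in invalidation mechanism would still be preferable.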
Regards,
Yitzhak Khabinsky
Hi,
when trying to access https://bxfiddle.cloud.basexgmbh.de/, Chrome tells me:
Your connection is not private
Attackers might be trying to steal your information from
bxfiddle.cloud.basexgmbh.de (for example, passwords, messages, or credit
cards). Learn more
NET::ERR_CERT_DATE_INVALID
bxfiddle.cloud.basexgmbh.de normally uses encryption to protect your
information. When Chrome tried to connect to bxfiddle.cloud.basexgmbh.de
this time, the website sent back unusual and incorrect credentials. This
may happen when an attacker is trying to pretend to be
bxfiddle.cloud.basexgmbh.de, or a Wi-Fi sign-in screen has interrupted
the connection. Your information is still secure because Chrome stopped
the connection before any data was exchanged.
You cannot visit bxfiddle.cloud.basexgmbh.de right now because the
website uses HSTS. Network errors and attacks are usually temporary, so
this page will probably work later.
It seems the certificate expired on Jan 2nd of this (new) year.
Can anyone fix this?
Hi all,
I'm interested in creating backups manually so I can use a different
compression algorithm. Based on the source code[1], it looks like backups
are just created by adding each file (excluding upd.basex) in the database
directory to a .zip file, so I could do the same using tar and my
compression algorithm of choice. Is my understanding correct or am I
missing some other logic?
Thanks,
Matt
[1]
https://github.com/BaseXdb/basex/blob/main/basex-core/src/main/java/org/bas…
Hi,
I'm reaching out for suggestions on improving performance.
We are using BaseX to store and analyze around 350,000 to 500,000 XMLs.
The size of each XML varies between a few KB and 5 MB. Each day around 10k XMLs
get added/patched.
I have the following questions:
1) What is the optimal size or number of documents in a DB? Earlier I had one
DB with different collections, but inserts were too slow; it took more than 30s
just to replace a document. So I split it up by category into around
30 DBs. Inserts are fine now, but again, if there are too many documents in a
category, patching that DB slows down, and querying across all DBs also gets
slower. Is there an optimal number of DBs? Can I create many DBs, e.g. one for
every 10K XMLs? I read through
https://www.mail-archive.com/basex-talk@mailman.uni-konstanz.de/msg06310.ht…;
does having hundreds of DBs cause query performance degradation? Is there a better
solution?
2) Query performance has degraded with more documents in a DB. I also
noticed that, with or without the token/attribute index, there is not much
difference in query performance (they are just XML attribute queries).
Running "Optimize" after inserts to recreate the indexes takes too much time and
memory. I am not running it now, since my tests did not show a significant
improvement with the indexes versus without. Any suggestions for improving this?
3) Is it possible to run queries against specific XMLs only? I will have a
pre-filter based on user selection, and queries need to be run against only
those XMLs. There are a number of filters users can apply, and each time this
can result in a different set of XMLs against which the analysis has to be
performed (hence it is not feasible to create so many collections). Right now, I
am querying against all XMLs, even though I am interested only in a subset,
and doing post-filtering. I did go through
https://mailman.uni-konstanz.de/pipermail/basex-talk/2010-July/000495.html,
but again, having a regex that includes all the relevant file paths (sometimes
the entire set of documents) will slow it down.
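To clarify (3), this is roughly what I'm after (a sketch only; the DB name and paths are made up), where the pre-filter yields a list of document paths and only those documents are queried:

```xquery
(: Sketch: run the analysis only on the documents selected by the
   user's pre-filter, instead of scanning the whole database.
   'mydb' and the paths below are placeholders. :)
let $db    := 'mydb'
let $paths := ('cat1/doc-0001.xml', 'cat2/doc-0042.xml')  (: from the pre-filter :)
for $doc in $paths ! db:get($db, .)
return $doc//record[@status = 'active']
```

(In BaseX 9, db:get was called db:open.) What I don't know is whether access like this can still benefit from the attribute index.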
Thank you,
Deepak
Hi,
My databases are corrupted in a strange way. Everything worked yesterday
and I have not upgraded my system (automatic updates are NOT set on my OS).
In the WebDAV connector, all DB names except six appear as dates, for
example: 2023-06-14T07:37:56.294Z, 2023-12-12T09:56:02.722Z.
In the console, I get this error:
[qtp289639718-19] INFO com.bradmcevoy.http.HttpManager - PROPFIND ::
http://localhost:8972/webdav/ - http://localhost:8972/webdav/
bx_1 | Unparseable date: "app-pub-templates"
bx_1 | Unparseable date: "app-pubs"
bx_1 | Unparseable date: "app-tests"
bx_1 | Unparseable date: "ar-eg"
bx_1 | Unparseable date: "as-in"
bx_1 | Unparseable date: "az-az"
bx_1 | Unparseable date: "be-by"
bx_1 | Unparseable date: "bg-bg"
bx_1 | Unparseable date: "bn-bd"
...
It seems that the names and dates of the DBs have been interchanged.
I tried restoring the DBs from my backups (newer and older backups). I
also tried restarting the server. Same result. I have been using BaseX 10.7
(beta) for a few months. I could update BaseX to the
official release, but I would prefer to upgrade with healthy DBs, to avoid
adding a layer of complexity to the issue. I have not had a similar problem
in a decade of using BaseX, so I am a bit clueless about what else to try.
Thanks in advance for your help,
--
France Baril
Architecte documentaire / Documentation architect
france.baril(a)architextus.com
Hello,
Is it part of the spec that numbers in the “basic” JSON representation (of 7+ digits) be serialized using scientific notation? For example:
let $direct := <json type="object"><n type="number">1339029</n></json>
let $basic := <fn:map><fn:number key="n">1339029</fn:number></fn:map>
let $result := ($direct, $basic) ! serialize(., map {
"method": "json", "json": map {
"format": if (position() eq 1) {"direct"} else {"basic"}, "indent": "yes"
}
})
return $result
…produces two different results:
{
"n":1339029
}
{
"n":1.339029E6
}
I usually prefer working with the “basic” format, but the automatic conversion to scientific notation is inconvenient because the value is not directly castable to xs:integer.
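For now I can work around it by round-tripping through xs:double (just a sketch), since 1.339029E6 is a valid lexical form for xs:double but not for xs:integer:

```xquery
(: the "basic" output cannot be cast to xs:integer directly,
   but casting via xs:double recovers the original integer :)
xs:integer(xs:double("1.339029E6"))  (: returns 1339029 :)
```

Still, it would be nicer if integral values were serialized without the exponent in the first place.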
Thanks in advance,
Tim
--
Tim A. Thompson (he, him)
Librarian for Applied Metadata Research
Yale University Library
www.linkedin.com/in/timathompson<http://www.linkedin.com/in/timathompson>
timathom(a)protonmail.com<mailto:timothy.thompson@yale.edu>
I’m searching for short phrases where I may want to respect order or not and where the phrases may cross element boundaries.
For example, I have the phrase “Amazon Alexa Spoke” and I want to find any DITA topic whose title text includes “Amazon Alexa Spoke” in that order, or maybe I want those words in any order, depending on my search requirements.
When I run this query against my database I find occurrences where all three words are in the same parent element, i.e.:
<title>Create a connection record for the <ph>Amazon Alexa spoke</ph>
</title>
<title>Create a credential record for the <ph>Amazon Alexa spoke</ph>
</title>
<title>Set up the <ph>Amazon Alexa spoke</ph>
</title>
But I do not find it where one of the words is not in the same parent:
This title is *not* found (even though this is the one I actually want to have found):
<title><ph id="alexa">Amazon Alexa</ph> Spoke</title>
Reading the docs on ft:search(), it is clear that it is searching on text nodes:
“Returns all text nodes from the full-text index…”
So I think the behavior here is as documented.
Short of creating a separate database that removes the subelements within <title> elements, is there a way to use full text indexing to do the search I want? In particular, I want to be able to turn the ordered/unordered check on or off.
If I always wanted ordered I could just use a regular expression match—it wouldn’t be that efficient but efficiency is not a concern in this particular case (but I can see where it would be in a more general search support situation).
Or am I missing a more obvious solution to this requirement?
Note that in this case I don’t care about finding different word forms—for this particular search I only care about exact word matches.
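To make the requirement concrete, here is roughly what I'm trying to express (a sketch; I don't know whether this form can be rewritten to use the full-text index), matching against the title's whole string value so the phrase can cross child elements, with the ordered check as a toggle:

```xquery
(: Sketch: search the full string value of each <title>, so words split
   across child elements like <ph> are still found together; the
   'ordered' position filter is applied only when order matters. :)
declare variable $ordered as xs:boolean external := true();

for $title in //title
where
  if ($ordered)
  then $title contains text { 'Amazon', 'Alexa', 'Spoke' } all words ordered
  else $title contains text { 'Amazon', 'Alexa', 'Spoke' } all words
return $title
```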
Cheers,
E.
_____________________________________________
Eliot Kimber
Sr Staff Content Engineer
O: 512 554 9368
M: 512 554 9368
servicenow.com<https://www.servicenow.com>
LinkedIn<https://www.linkedin.com/company/servicenow> | Twitter<https://twitter.com/servicenow> | YouTube<https://www.youtube.com/user/servicenowinc> | Facebook<https://www.facebook.com/servicenow>
Hi,
I just discovered that the code samples on the BaseX wiki don't seem to be
working fully. I noticed it a couple of days ago and thought it was
temporary, but the problem is still there.
Regards,
Johan
I’m generating CSV data that includes URLs with multiple query parameters, so “&somekey” in them. These get serialized as “&amp;somekey” where I want “&somekey”.
My CSV XML looks like this:
<record>
<AppID>sn_admin_center</AppID>
<DocsURL>=HYPERLINK(https://docs.servicenow.com/csh?topicname=admin-center-intro&amp;version=vancouver)</DocsURL>
</record>
I’m then doing:
let $report := csv:serialize($csv, map{})
let $doWrite := file:write('/Users/eliot.kimber/temp/apps-to-topics.csv', $report)
To write the CSV file.
The resulting file looks like:
sn_admin_center,"=HYPERLINK(""https://docs.servicenow.com/csh?topicname=admin-center-intro&amp;version=va…"")"
Note that the “&” is still escaped as “&amp;”.
Reviewing the docs for the CSV module and the serialization options, I don’t see any option that looks like it would control how escaping is handled.
Is there a way to do what I want?
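The only thing I've come up with so far is blunt post-processing of the serialized string before writing it (a hack, and it assumes the "&amp;" escaping really is present in the string at that point):

```xquery
(: hack: undo the entity escaping in the serialized CSV string
   before writing it out; assumes "&amp;" appears literally :)
let $report := csv:serialize($csv, map {})
let $fixed  := replace($report, '&amp;amp;', '&amp;')
return file:write('/Users/eliot.kimber/temp/apps-to-topics.csv', $fixed)
```

But that could clobber a legitimate "&amp;" that was meant to survive, so a serialization option would be much better.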
Thanks,
E.
_____________________________________________
Eliot Kimber
Sr Staff Content Engineer
O: 512 554 9368
M: 512 554 9368
servicenow.com<https://www.servicenow.com>
LinkedIn<https://www.linkedin.com/company/servicenow> | Twitter<https://twitter.com/servicenow> | YouTube<https://www.youtube.com/user/servicenowinc> | Facebook<https://www.facebook.com/servicenow>