Congrats on the latest version! Looking forward as usual to exploring the new features.

However, I'm perplexed by the decision to remove the text parser from the codebase. I understand the desire to streamline and to remove dependencies tied to lower-value features, but I've always found the text parser very useful. After installing BaseX 10.8 beta today, I had to refactor a process (parsing a set of interview transcripts generated by Zoom) that involved creating a DB from a directory of text files.

In addition, I noticed some unexpected results in how the text was parsed using standard methods. In BaseX 10.6, using the text parser in the GUI, the output looks like this:
<text>WEBVTT

1
00:00:02.910 --&gt; 00:00:27.240
...
</text>
Here, each line end is just a newline character (\n).

Using file:read-text or fn:unparsed-text (in 10.6 and 10.8 beta), the output looks like this:
<text>WEBVTT&#xD;
&#xD;
1&#xD;
00:00:02.910 --&gt; 00:00:27.240&#xD;
...
</text>
Here, each line ending also includes a carriage return (\r).

And if I instead store it as an XQuery value, I see the newline characters that aren't otherwise displayed in the GUI:
"WEBVTT&#xD;&#xA;&#xD;&#xA;1&#xD;&#xA;00:00:02.910 --> 00:00:27.240&#xD;&#xA;..."
So, the text parser seems to have done some normalization, which was also helpful.
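For reference, here is a minimal sketch of how that normalization might be reproduced by hand, assuming the only difference is CRLF (or lone CR) versus LF line endings. The function name local:normalize-newlines is hypothetical, and the sample string just mimics the start of the transcript above; I don't know exactly what the removed text parser did internally.

```xquery
(: Hypothetical helper: turn CRLF and lone CR line endings into LF. :)
declare function local:normalize-newlines($text as xs:string) as xs:string {
  (: '\r\n?' matches a carriage return optionally followed by a newline :)
  replace($text, '\r\n?', '&#xA;')
};

(: Sample input with CRLF line endings, as in the stored XQuery value. :)
let $raw := "WEBVTT&#xD;&#xA;&#xD;&#xA;1&#xD;&#xA;00:00:02.910 --> 00:00:27.240&#xD;&#xA;"
return <text>{ local:normalize-newlines($raw) }</text>
```

With a real file, the same helper could be applied to the result of file:read-text. Another option might be string-join(unparsed-text-lines($path), '&#xA;'), where $path is a placeholder for the file location, since fn:unparsed-text-lines splits on all three kinds of line ending.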

Any chance that it could be restored (by popular demand) in version 11? :)

Best regards,
Tim


--
Tim A. Thompson (he, him)
Librarian for Applied Metadata Research
Yale University Library

On Fri, Aug 4, 2023 at 06:55, Christian Grün (christian.gruen@gmail.com) wrote:
Dear all,

We’re pleased to announce version 10.7 of BaseX, our XML framework: https://basex.org.

The new release is a big step forward towards BaseX 11:

• We have added numerous new operators, functions and features of XQuery 4.
• The GUI editor and result view now provide full support for Unicode characters.
• Font rendering has been improved (you can tweak it in the Font Dialog).
• More BaseX 11 preview features are available (see docs.basex.org).
• Various bug fixes (web:forward, job:eval, main-memory documents)

Have fun,
Your BaseX Team