Hi Ben,
Yes, that’s possible. Office files are simple ZIP archives, so you can create a database with ZIP parsing turned on.
If you supply a Word file to the collection() function, the document will be parsed on-the-fly. Just run the following query on the attached document:
collection('HelloWorld.docx')//text()[. contains text 'hello']
In practice, you’ll surely have to invest some more time, as an Office text string may be distributed across multiple nodes.
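One possible way to deal with split strings is to join the text nodes of each paragraph before searching. The following is only a sketch: it assumes the document uses the standard WordprocessingML namespace, with paragraphs stored as w:p elements and text runs as w:t elements, so the element names may need adjusting for other formats such as LibreOffice files:

```xquery
(: Assumption: standard WordprocessingML namespace of .docx files :)
declare namespace w = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main';

(: Join all text runs of a paragraph, then search the combined string :)
for $p in collection('HelloWorld.docx')//w:p
let $text := string-join($p//w:t)
where $text contains text 'hello'
return $text
```

This returns the full text of every matching paragraph instead of single text nodes, so a word that Word has split across several runs can still be found.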
Best, Christian
On Tue, Jan 28, 2020 at 2:01 PM Ben Engbers <Ben.Engbers@be-logical.nl> wrote:
Hi,
While we were discussing possible use cases for BaseX, a colleague asked me whether it is also possible to load LibreOffice and Word documents into BaseX and then perform full-text analysis on them. In essence, these are both XML files, so it should be possible.
Does anybody have experience with this?
Ben