Hi,
I was looking through the feature list in the issue tracker to see what's in the pipeline. I suddenly remembered a feature from an XML database I used a couple of years ago called Qizx. It had a very neat feature where every database document and collection could have a special map with metadata properties. These do not affect the XML content in any way, but they can be accessed via special API calls or a Qizx-specific extension module.
A better explanation of this feature can be found in the Qizx manual (for example here http://kiwi.emse.fr/DN/qizx-manual.pdf on pages 18 and 57).
I have used such metadata properties on nodes to implement syncing XML documents with an SCM (Subversion). I stored revision IDs and other SCM control data in those properties. Authors would work in Subversion, and certain directories were kept synced to a Qizx database so we could easily create PDF publications of the latest XML with zero impact on the XML itself.
Maybe BaseX already uses something like that under the hood, I don't know. If so extending it or opening it for use would be useful I think, and generally cool :-)
+1
I would find this feature useful for several similar scenarios. I want to use BaseX for querying XML documents and keep BaseX synchronized with external archives/repositories where the XML files are maintained.
I've started to implement along these lines by creating a second database to hold metadata about documents in the actual database. If there is a better option I'll switch to it.
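A minimal sketch of the auxiliary-database idea described above, written in XQuery with inline sample data so that it runs without any database. The element names (`meta`, `entry`) and the `revision` and `synced` attributes are assumptions made for illustration and do not come from this thread; in BaseX the two sequences would presumably be read from two separate databases instead.

```xquery
xquery version "3.1";

(: Hypothetical content of the auxiliary metadata database:
   one entry per document path in the main database. :)
declare variable $meta :=
  <meta>
    <entry path="docs/a.xml" revision="1041" synced="2014-08-28"/>
    <entry path="docs/b.xml" revision="1017" synced="2014-08-20"/>
  </meta>;

(: Hypothetical list of document paths in the main database. :)
declare variable $paths := ("docs/a.xml", "docs/b.xml", "docs/c.xml");

(: Report the stored revision for each document, or flag it as untracked. :)
for $path in $paths
let $entry := $meta/entry[@path = $path]
return
  if (exists($entry))
  then concat($path, ": revision ", $entry/@revision)
  else concat($path, ": no metadata")
```

The drawback of this layout is that the two stores can drift apart, which is one reason built-in resource properties would be attractive.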
Vincent
________________________________________ From: basex-talk-bounces@mailman.uni-konstanz.de basex-talk-bounces@mailman.uni-konstanz.de on behalf of Marc van Grootel marc.van.grootel@gmail.com Sent: Thursday, August 28, 2014 5:38 PM To: BaseX Subject: [basex-talk] db documents metadata
[...]
@Marc:
For BaseX 8.0, we are planning to speed up our document index, and we could possibly enrich it with some more (possibly user-specific) metadata. I have added a reference to this mailing-list thread in the corresponding GitHub issue [1].
However, I am not sure if we should extend our existing APIs. Maybe it would be more consistent to provide an additional XQuery Module for that, or extend the Database Module. Additional metadata could be returned via db:list-details(), and we could add an updating function, something like db:store-details(). What do you think? Any more suggestions are welcome.
@Vincent:
I've started to implement along these lines by creating a second database to hold metadata about documents in the actual database. If there is a better option I'll switch to it.
I would be interested to know which metadata properties you are currently storing in this auxiliary database.
Thanks, Christian
[1] https://github.com/BaseXdb/basex/issues/804
Hi Christian,
That would be terrific and I think what you suggest is already sufficient. Maybe have another look at the Qizx manual or API to see what it offered. I found it a quite well-designed feature.
Why do you hesitate about adding API access to such data? Is that a technical/complexity concern or more of a design concern? If it's the latter, I would argue that such metadata would be there precisely to handle cases (such as the SCM sync) where one system (not an XQuery app) needs to manipulate information on nodes, as well as the nodes themselves, that can then be used in an XQuery app. Such an SCM sync tool, written, say, in Java, would then probably have to use the Java API to update the information on these nodes.
I was thinking of other use cases:
- ACL info
- Sync info (e.g. last-modified date, content-type etc., so one instance can cache nodes from another without touching the content itself)
- Properties that are computationally expensive to derive: calculate them in advance and then quickly search for and retrieve them (this case was also mentioned in the Qizx manual)
Of course when one has both the option of expressing the data as XML and as metadata properties you might ask yourself when to store where? But having this option is a good thing I believe.
Cheers, --Marc
On Fri, Aug 29, 2014 at 12:32 PM, Christian Grün christian.gruen@gmail.com wrote:
[...]
Hi Marc,
Why do you hesitate about adding API access to such data? Is that a technical/complexity or more a design concern?
One of the reasons is that we have quite a lot of different APIs, and it takes quite some time to provide new features in more than one API (which is often a user request if a feature turns out to be successful). This is why we tend to include new features either via BaseX commands or directly in XQuery, or in both.
Maybe we could provide additional commands and add XQuery functionality in a second step.
Some more questions:
[...] where one, system (not an XQuery app) needs to manipulate information on nodes as well as nodes themselves that can then be used in an XQuery app.
* Do you refer to document nodes, or nodes in general? In the latter case, we could also think about binding properties to node ids.
* Would it be sufficient to use strings for keys and values?
Christian
Hi Christian,
With nodes I meant database "nodes". E.g. a database == collection == collection node and a document == document node. I wasn't talking about nodes within a document. I don't think the latter is as valuable as the former. For myself I compare this a bit to the metadata saved in a CMS where folders and documents can get metadata for organizing the documents and being able to locate/find things based on more than just the path or collection name.
Yes, I think strings for keys and values are sufficient.
Regarding keys in maps, I would have liked to be able to have QNames as map keys, but sadly this is not allowed.
--Marc
On Tue, Sep 2, 2014 at 1:03 PM, Christian Grün christian.gruen@gmail.com wrote:
[...]
... was thinking if these metadata nodes would also exist for binary database resources or only for xml documents and collections.
Cheers, --Marc
On Tue, Sep 2, 2014 at 2:24 PM, Marc van Grootel marc.van.grootel@gmail.com wrote:
[...]
It would surely be consistent to allow metadata for both binary and xml resources.
On Tue, Sep 2, 2014 at 4:45 PM, Marc van Grootel marc.van.grootel@gmail.com wrote:
[...]
Hi Christian,
About the QNames, I was confused, just ignore my remark on that. I realize it now.
I fully agree with you but I'm not convinced that this metadata thing should be extended to all nodes. I see it as a more coarse-grained facility. But maybe others have use cases for such fine-grained metadata. Maybe better to do it like this first and see how people use it?
How do you currently think about this metadata and indexes? In Qizx, I think these properties are indexed so that querying on metadata is very fast.
--Marc
On Tue, Sep 2, 2014 at 4:49 PM, Christian Grün christian.gruen@gmail.com wrote:
[...]
How do you currently think about this metadata and indexes? In Qizx, I think these properties are indexed so that querying on metadata is very fast.
I haven't looked at Qizx in detail yet. What do such queries look like? Do you retrieve properties for a specific document, or do you retrieve all documents that contain a specific key or key/value combination?
Hi Christian,
I went through its user manual to summarize some of it (wherever I said "node" before, I now use "member" as the Qizx manual does).
- There are system properties that every node has (nature = collection|document and path, e.g. /foo/bar/baz.xml).
- There are custom properties.
- Property keys are strings.
- Property values are typed: boolean, long integer, double, string, date, a node (e.g. element), or any serializable Java object.
Querying on properties is fast, that's what I know (I'm not sure how it is worked into the indexes, so I shouldn't speak about its implementation :-)
Extension functions for property handling (ns xlib):
xlib:property-names($member) as xs:string* => list of property names on the member
xlib:get-property($member, $prop-name as xs:string) as item()? => value of the property
xlib:set-property($member, $name as xs:string, $value as item()) => sets a property (empty sequence clears a property)
xlib:commit() and xlib:rollback() for committing or rolling back property changes.
How to use in search:
xlib:query-properties($path, prop1=true())//foo/bar
xlib:query-properties resolves into a sequence of members which can then be queried further using regular xpath.
Note that the second argument can be a more complex XPath expression. Here's one I lifted from the manual to show it in an XQuery context.
for $doc in xlib:query-properties("/2005/propositions/*",
    creation-date > xs:date("2003-03-03")
    and x:fulltext(description, "suitable AND purpose"))
return xlib:property($doc, "path")
Hope that this is useful.
--Marc
On Tue, Sep 2, 2014 at 8:41 PM, Christian Grün christian.gruen@gmail.com wrote:
[...]
Hi Marc, thanks for summarizing the Qizx property features.
- There are system properties that every node has (nature = collection|document and path eg. /foo/bar/baz.xml).
In our case, members would probably be resources (raw data or XML documents). We could also provide properties for database paths -- as an alternative for collections -- but this would complicate matters as we would need to work with property hierarchies, in which local properties could override more global ones, etc.
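To illustrate what such a property hierarchy would involve, here is a small sketch using standard XQuery 3.1 maps. The property names and values are hypothetical, and nothing here reflects how BaseX would actually store properties; it only shows the override rule, where a resource-level value wins over a path-level one.

```xquery
xquery version "3.1";

(: Hypothetical properties set on a database path and on one resource in it. :)
let $path-props     := map { "owner": "docs-team", "status": "draft" }
let $resource-props := map { "status": "published" }

(: With 'use-last', entries in later maps override earlier ones,
   so the resource-level value wins over the path-level value. :)
let $effective := map:merge(
  ($path-props, $resource-props),
  map { "duplicates": "use-last" }
)
return ($effective?owner, $effective?status)
```

This returns "docs-team" and "published": the owner is inherited from the path, the status is overridden by the resource.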
Querying on properties is fast, that's what I know (not sure how it is worked into indexes so I shouldn't speak about it's implementation :-)
Out of interest: how many documents have you been working with (thousands? millions?).
Extension functions for property handling (ns xlib):
Interesting to see that there are also some XQuery functions available, so our approach wouldn't differ that much.
I am just thinking out loud. The current output of db:list-details looks as follows:
<resource raw="false" content-type="application/xml" modified-date="2012-02-02T19:13:42.000Z">file.xml</resource>
We could add user-specific properties as attributes, e.g.:
<resource raw="false" content-type="application/xml" modified-date="2012-02-02T19:13:42.000Z" creation-date="2003-03-03">file.xml</resource>
Obviously, the serialized representation of an element could get pretty bulky, and the property names would not be allowed to match existing property names.
Maybe it's better to introduce a new function that returns the following output...
<resource name="/path/to/file.xml">
  <creation-date>2003-03-03</creation-date>
  <description>what a wonderful file</description>
</resource>
Possible signatures:
* db:properties($db as xs:string) as element(resource)
* db:properties($db as xs:string, $path as xs:string) as element(resource)
It would then be possible to search for specific values as follows:
for $prop in db:properties("db")
where $prop/creation-date = '2003-03-03'
  and $prop/description contains text "suitable" ftand "purpose"
return $prop/@name/string()
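The same kind of search can be tried today against inline data. In this sketch the `$props` sequence is a stand-in for the output of the proposed db:properties("db") call, the second resource is invented sample data, and plain `contains()` replaces the full-text expression so that only standard functions are used.

```xquery
xquery version "3.1";

(: Inline stand-in for the output of the proposed db:properties("db") call. :)
let $props := (
  <resource name="/path/to/file.xml">
    <creation-date>2003-03-03</creation-date>
    <description>what a wonderful file</description>
  </resource>,
  <resource name="/path/to/other.xml">
    <creation-date>2005-11-30</creation-date>
    <description>suitable for this purpose</description>
  </resource>
)
for $prop in $props
(: Cast the string value to xs:date so the comparison is chronological. :)
where xs:date($prop/creation-date) > xs:date("2003-03-03")
  and contains($prop/description, "suitable")
  and contains($prop/description, "purpose")
return $prop/@name/string()
```

Only "/path/to/other.xml" matches, since the first resource fails both the date and the description test.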
It *could* be that the performance of such queries is not sufficient if we deal with lots and lots of resources, but we could introduce some custom query optimizations in a second step. We could also add a function to retrieve a single key (using exact match):
* db:property($db as xs:string, $path as xs:string, $key as xs:string) as xs:string
A second function could be used to store properties (the path would possibly need to refer to a single resource). The function would be *updating*, i.e., it would be added to the pending update list and executed after the evaluation of the query:
* db:set-property($db as xs:string, $path as xs:string, $key as xs:string, $value as xs:string)
The equivalent BaseX commands could be...
* PROP GET
* PROP GET [path]
* PROP GET [path] [key]
* PROP SET [path] [key] [value]
They would operate on an opened database and could be used as follows:
* PROP GET: returns all properties of all resources
* PROP GET path/to: returns all properties of resources in the specified path
* PROP GET "doc 1.xml": returns all properties of a specific resource
* PROP GET file.xml creation-date: returns a specific property
Both the database commands and the XQuery functions could be used with the existing APIs.
Suggestions? Christian
Hi Christian,
The amount of files was rather in the thousands than the millions. So yeah, still not "big" data ;-)
I am just thinking out loud. The current output of db:list-details looks as follows:
<resource raw="false" content-type="application/xml" modified-date="2012-02-02T19:13:42.000Z">file.xml</resource>
We could add user-specific properties as attributes, e.g.:
<resource raw="false" content-type="application/xml" modified-date="2012-02-02T19:13:42.000Z" creation-date="2003-03-03">file.xml</resource>
Obviously, the serialized representation of an element could get pretty bulky, and the property names would not be allowed to match existing property names.
Maybe it's better to introduce a new function that returns the following output...
<resource name="/path/to/file.xml">
  <creation-date>2003-03-03</creation-date>
  <description>what a wonderful file</description>
</resource>
Attributes with namespaces would be another possibility but this might needlessly complicate things and some are allergic to namespaces ;-)
As one of the possible value types (at least for Qizx, not sure if you want to support it for BaseX too) is nodes, storing the values in attributes wouldn't be good.
I think your second suggestion is probably best (using a separate resource element). Only I think that to avoid confusion with "resource" elements returned by db:list-details(), maybe it's better to call it "properties".
<properties name="/path/to/file.xml">
  <creation-date>2003-03-03</creation-date>
  <description>what a wonderful file</description>
</properties>
I would also think that it's worth mirroring the original system "properties" returned by db:list-details. So the return value from db:properties() would be, in fact, a superset of db:list-details(). The system properties being maintained by BaseX and not modifiable by the user.
Returning such an XML structure deviates from Qizx (which is fine of course, only mentioning it) in that Qizx only allows access to properties via accessor functions and through the query expressions inside xlib:query-properties().
I do feel that this is a bit clearer separation of concerns and underlining the meta-ness of these properties. Also may provide better control over access and easier to return a typed value. But I'm not sure as I'm not so familiar with how all this is implemented.
It *could* be that performance of such queries may not be sufficient enough if we deal with lots of lots of resources, but we could introduce some custom query optimizations in a second step. We could also add a function to retrieve a single key (using exact match):
- db:property($db as xs:string, $path as xs:string, $key as xs:string) as xs:string
Okay, so from this signature I understand that you're not planning on returning different property value types. Is that right? It would be very handy if a date property returned an xs:dateTime instead of just a string, and above I more or less assumed similar types as for Qizx. But if you think that's not feasible (now) ...
--Marc
Only I think that to avoid confusion with "resource" elements returned by db:list-details(), maybe it's better to call it "properties".
Makes sense.
I would also think that it's worth mirroring the original system "properties" returned by db:list-details.
I thought about this, too. Let's see.
Okay, so from this signature I understand that you're not planning on returning different property value types. Is that right?
Yes. As it's not straightforward how to represent different types with BaseX commands, and as string values can easily be converted to booleans, numbers and dates, I prefer to start with a simple approach, which may be extended in future.
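As a rough illustration of how string-valued properties can be converted on the caller's side, the following sketch uses only standard XQuery casts. The property names and values in the map are hypothetical.

```xquery
xquery version "3.1";

(: Hypothetical string-valued properties, as they would be stored. :)
let $props := map {
  "reviewed":      "true",
  "revision":      "1041",
  "creation-date": "2003-03-03"
}
return (
  xs:boolean($props?reviewed),                        (: xs:boolean true :)
  xs:integer($props?revision) + 1,                    (: xs:integer 1042 :)
  xs:date($props("creation-date")) < current-date(),  (: dates compare chronologically :)
  (: "castable as" guards against malformed values before converting :)
  "03/03/2003" castable as xs:date                    (: false: not an ISO 8601 date :)
)
```

The price of this approach is that every query has to know which type a given key is meant to have.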
Thanks for your feedback, Christian
Maybe something like the following could be a good compromise?
<resource raw="false" content-type="application/xml" modified-date="2012-02-02T19:13:42.000Z" name="file.xml">
  <property name="prop1">value1</property>
  <property name="prop2">value2</property>
</resource>
Just my two cents. M.
On 03/09/2014 15:22, Christian Grün wrote:
[...]
Maybe something like the following could be a good compromise?
<resource raw="false" content-type="application/xml" modified-date="2012-02-02T19:13:42.000Z" name="file.xml">
  <property name="prop1">value1</property>
  <property name="prop2">value2</property>
</resource>
Funny, I also thought about this solution. One advantage would be that we could use arbitrary strings as keys. However, it would make the output incompatible with previous versions (unless we use another function for it).
More suggestions are welcome, C.
Hi Christian,
The proposed signatures make sense to me. Copied from the github ticket:
db:properties($db as xs:string) as element(properties)*
db:properties($db as xs:string, $path as xs:string) as element(properties)*
db:property($db as xs:string, $path as xs:string, $key as xs:string) as xs:string
db:set-property($db as xs:string, $path as xs:string, $key as xs:string, $value as xs:string) as empty-sequence()
PROP GET
PROP GET [path]
PROP GET [path] [key]
PROP SET [path] [key] [value]
Possibly add signatures to allow removing a previously set property:
db:set-property($db as xs:string, $path as xs:string, $key as xs:string) as empty-sequence()
PROP SET [path] [key]
Allowing arbitrary strings as keys is definitely helpful. The returned values of db:properties and PROP GET could be like:
<properties resource="/path/to/file.xml">
  <property name="prop1">value1</property>
  <property name="prop2">value2</property>
</properties>
Any number of property key-value pairs could be set by users. The resource attribute would be unmodifiable, of course.
Thinking of possible use cases, one might use something like the below example to find all documents with a specific property value:
for $a in db:properties('testdb')[property[@name = 'prop1'][string() = 'value1']]
return
  if (db:is-xml('testdb', $a/@resource))
  then db:open('testdb', $a/@resource)
  else ()
Thanks, Vincent
-----Original Message----- From: basex-talk-bounces@mailman.uni-konstanz.de [mailto:basex-talk-bounces@mailman.uni-konstanz.de] On Behalf Of Christian Grün Sent: Wednesday, September 03, 2014 9:43 AM To: Marco Lettere Cc: BaseX Subject: Re: [basex-talk] db documents metadata
[...]
Possibly add signatures to allow removing a previously set property:
db:set-property($db as xs:string, $path as xs:string, $key as xs:string) as empty-sequence()
PROP SET [path] [key]
Thanks, I missed that! An empty string, or an empty sequence would be an alternative:
db:set-property($db , "file.xml", "version", () )
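The "empty value removes the property" convention can be sketched with a pure function over an XQuery 3.1 map. This is only a stand-in for the proposed updating function: `local:set-property` is a hypothetical helper that returns a new map and does not touch a database, and the keys and values are invented.

```xquery
xquery version "3.1";

(: Pure stand-in for the proposed updating function: returns a new property
   map instead of modifying a database. An empty sequence or an empty
   string removes the key. :)
declare function local:set-property(
  $props as map(*),
  $key   as xs:string,
  $value as xs:string?
) as map(*) {
  if (string($value) = "")
  then map:remove($props, $key)
  else map:put($props, $key, $value)
};

let $props   := map { "version": "3", "owner": "marc" }
let $updated := local:set-property($props, "version", "4")
let $cleared := local:set-property($updated, "owner", ())
return (map:keys($cleared), $cleared?version)
```

The result is "version" followed by "4": the owner key is gone and the version has been replaced. Treating the empty string as removal also means a property can never hold an empty value, which is a design choice rather than a necessity.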
Allowing arbitrary strings as keys is definitely helpful.
I see advantages and drawbacks for both approaches. Maybe it is also sufficient to restrict the character set of keys to those of element names (i.e., NCNames)?
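Whether a given key would survive the NCName restriction can be checked with a standard cast. The candidate keys below are made up for illustration.

```xquery
xquery version "3.1";

(: Candidate property keys; only some are usable as element names. :)
for $key in ("creation-date", "prop1", "my key", "1st-draft", "svn:revision")
return concat($key, " -> ", $key castable as xs:NCName)
```

Keys containing a space, starting with a digit, or containing a colon are rejected, so a key such as "svn:revision" would only be possible with the attribute-based representation.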
Thinking of possible use cases, one might use something like the below example to find all documents with a specific property value:
for $a in db:properties('testdb')[property[@name = 'prop1'][string() = 'value1']]
return
  if (db:is-xml('testdb', $a/@resource))
  then db:open('testdb', $a/@resource)
  else ()
Right. Here's one (of many) more way(s) to put it:
let $db := 'testdb'
let $path := db:properties($db)[prop1 = 'value1']/@resource[db:is-xml($db, .)]
return db:open('testdb', $path)
C.
I moved the suggestion to an extra GitHub issue:
https://github.com/BaseXdb/basex/issues/988
Edits are welcome, Christian
With nodes I meant database "nodes". E.g. a database == collection == collection node and a document == document node.
Ok, thanks. In our terminology, database nodes may be of any of the six node types (document, element, text, attribute, comment, proc.-instr.). But I got your point. -- We would be even more flexible if we allowed properties for all database nodes. On the other hand, the necessity for efficient data structures would arise soon after, because it would then be possible to set millions of millions of node properties. So it surely makes sense if we first focus on the document nodes in a database.
Regarding keys in maps, I would've liked to be able to have QNames as map keys, but sadly this is not allowed.
QNames are actually allowed in XQuery maps:
let $key := xs:QName('x')
let $map := map { $key: 'value' }
return $map($key)
Do you refer to other types of maps (maybe maps in Java)?
C.
Hi Christian,
The properties I'm storing/planning for in my ancillary database are:
- dateTime the source document was loaded to BaseX
- sha1 hash of the source document
  - used in determining if the source document has changed and should be replaced in BaseX
- identifiers assigned by our content management system and archive
- path to the source document
- filename of the source document
These properties could be stored using strings for keys and values.
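As a rough sketch of what such a record could look like, the following builds one entry from a map of strings, using the <resource>/<property> layout proposed elsewhere in this thread. The function local:resource, the key names and all values are placeholders chosen for illustration.

```xquery
(: Hypothetical helper: renders string key/value pairs as
   <property> children of a <resource> element. :)
declare function local:resource(
  $name  as xs:string,
  $props as map(xs:string, xs:string)
) as element(resource) {
  <resource name="{ $name }">{
    for $key in map:keys($props)
    order by $key
    return <property name="{ $key }">{ $props($key) }</property>
  }</resource>
};

(: Placeholder values only; real ones would come from the loader. :)
local:resource('file.xml', map {
  'loaded'     : string(current-dateTime()),
  'sha1'       : '...',
  'cms-id'     : '...',
  'source-path': '...',
  'filename'   : 'file.xml'
})
```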
An extension to db:list-details(), together with a function like db:store-details(), to allow setting and retrieving user-defined properties would work. A more extensive set of features, as Marc described based on Qizx, would also work and could support a larger variety of cases.
The ability to access these methods via a Java API or the BaseXClient API would be useful. Although, presumably a simple wrapper could be employed with the existing APIs to access the XQuery methods for querying and setting properties.
Thanks, Vincent
-----Original Message----- From: Christian Grün [mailto:christian.gruen@gmail.com] Sent: Friday, August 29, 2014 6:33 AM To: Lizzi, Vincent Cc: Marc van Grootel; BaseX Subject: Re: [basex-talk] db documents metadata
@Marc:
For BaseX 8.0, we are planning to speed up our document index, and we could possibly enrich it with some more (possibly user-specific) metadata. I have added a reference to this mailing-list thread in the corresponding GitHub issue [1].
However, I am not sure if we should extend our existing APIs. Maybe it would be more consistent to provide an additional XQuery Module for that, or extend the Database Module. Additional metadata could be returned via db:list-details(), and we could add an updating function, sth. like db:store-details(). What do you think? Any more suggestions are welcome.
@Vincent:
I've started to implement along these lines by creating a second database to hold metadata about documents in the actual database. If there is a better option I'll switch to it.
I would be interested to know which metadata properties you are currently storing in this auxiliary database.
Thanks, Christian
[1] https://github.com/BaseXdb/basex/issues/804
Hi Vincent,
I think that all your properties could be represented with the extension proposed in my last mail. Thinking ahead, it would only be consistent to store dateTime as a default property. We already have the modified-date property, but right now it is identical for all XML resources.
Christian
I've summarized the proposed extensions in GitHub: https://github.com/BaseXdb/basex/issues/804