Hello,
I am a happy BaseX user.
Currently I am trying to help other teams to find right solution to their data mining from huge xml data store. They are going after postgre and probably even db2. Nor that I have any influence over their decision, but I would like to alert them about something obvious.
I just wanted to have some comments from experts here, to give me guidance on what is a major different between these relational DBs supporting native xml storages as compared to BaseX or whatever. I know postgre doesn't support xquery, but db2 does. As you all are aware, our queries won't be complicated probably even xpath might work, just amount of data would be huge. We won't even care hugely about the time taken to return results as long as it's not days.
Please advice,
- Mansi
Hi Mansi,
No one answered so far, so there is probably no simple answer to your question.
As you indicated, many relational DBMS (such as DB2 and PostgreSQL) provide support for XPath as well, but in most cases, the systems do not create additional index structures on top of the XML nodes to speed up querying. However, as the queries you presented on the mailing list are simple XPath expressions, there are probably not so many chances to benefit from values index structures anyway.
If teams in your companies are trying DB2 and PostgreSQL anyway, I'll be looking forward to hearing more about their experiences. Commercial systems often disallow their users to publish performance results (and presenting them on the list is some way of publishing them), so you can also send them to me privately. If you go for PostgreSQL, this won't be a problem of course.
Hope this helps, Christian
Hi Mansi, my 2 cents here are mainly that none of the RDBMS based solution deliver the full support of the complete stack of XML technologies to their complete power (XPath, XQuery 3.x, XSLT, XSD, eXPath , aso). As far as I remember you can hardly find HTTP server functionality with RestXQ support there too. As soon as you have to write an application or service of a respectable complexity you will nee all of those functionalities. Another point to use here, at least with BaseX, is the footprint of the application. I don't know how large DB2 is but for sure installing, managing and using PostgreSQL is several levels of complexity higher than using BaseX. Finally I also think that for implementing a really performing tree based database the relational technology is not suited and a proper datamodel is required but I don't have figures or demonstrations for that so I'm also very curious to see those performance comparison results! :-D Hope this helps somehow. Greetings, Marco.
On 28/02/2015 12:11, Christian Grün wrote:
Hi Mansi,
No one answered so far, so there is probably no simple answer to your question.
As you indicated, many relational DBMS (such as DB2 and PostgreSQL) provide support for XPath as well, but in most cases, the systems do not create additional index structures on top of the XML nodes to speed up querying. However, as the queries you presented on the mailing list are simple XPath expressions, there are probably not so many chances to benefit from values index structures anyway.
If teams in your companies are trying DB2 and PostgreSQL anyway, I'll be looking forward to hearing more about their experiences. Commercial systems often disallow their users to publish performance results (and presenting them on the list is some way of publishing them), so you can also send them to me privately. If you go for PostgreSQL, this won't be a problem of course.
Hope this helps, Christian
Hi Mansi,
adding to the excellent advice given by Christian and Marco I would like to add two points:
1. I guess you could turn the question around: Why would you store your XML data in a relational database? To me it feels the most natural thing to store XML (or in general hierarchical data) in an XML database like BaseX, to store JSON data in a json database like MongoDB and table-like data in a relational database like Oracle and DB2. Of course, there are also other technical reasons why you might want to use a more mature system like DB2, but also non-technical reasons, for example a deep knowledge of your employees with SQL.
2. Choosing a database system is not just a technical, but also a business decision. So one point for this to keep in mind: Systems like DB2 and Oracle aren't cheap, I am sure the folks at IBM and Oracle will tell you that. On the other hand, BaseX is open-source (but provides commercial support). Of course, there are also many other open-source dbms like Postgres.
Cheers Dirk
On 03/01/2015 02:36 PM, Marco Lettere wrote:
Hi Mansi, my 2 cents here are mainly that none of the RDBMS based solution deliver the full support of the complete stack of XML technologies to their complete power (XPath, XQuery 3.x, XSLT, XSD, eXPath , aso). As far as I remember you can hardly find HTTP server functionality with RestXQ support there too. As soon as you have to write an application or service of a respectable complexity you will nee all of those functionalities. Another point to use here, at least with BaseX, is the footprint of the application. I don't know how large DB2 is but for sure installing, managing and using PostgreSQL is several levels of complexity higher than using BaseX. Finally I also think that for implementing a really performing tree based database the relational technology is not suited and a proper datamodel is required but I don't have figures or demonstrations for that so I'm also very curious to see those performance comparison results! :-D Hope this helps somehow. Greetings, Marco.
On 28/02/2015 12:11, Christian Grün wrote:
Hi Mansi,
No one answered so far, so there is probably no simple answer to your question.
As you indicated, many relational DBMS (such as DB2 and PostgreSQL) provide support for XPath as well, but in most cases, the systems do not create additional index structures on top of the XML nodes to speed up querying. However, as the queries you presented on the mailing list are simple XPath expressions, there are probably not so many chances to benefit from values index structures anyway.
If teams in your companies are trying DB2 and PostgreSQL anyway, I'll be looking forward to hearing more about their experiences. Commercial systems often disallow their users to publish performance results (and presenting them on the list is some way of publishing them), so you can also send them to me privately. If you go for PostgreSQL, this won't be a problem of course.
Hope this helps, Christian
Thanks Dirk...
I spent some time looking at the data. Seems they had historical reasons to store data in XML, but its not that hierarchal. So they don't care so much about different querying technologies. We are going to import into BaseX and play around...
Will keep you posted on our experiences with posture DB.
Thanks again for all inputs. - Mansi
On Tue, Mar 3, 2015 at 5:25 AM, Dirk Kirsten dk@basex.org wrote:
Hi Mansi,
adding to the excellent advice given by Christian and Marco I would like to add two points:
- I guess you could turn the question around: Why would you store your
XML data in a relational database? To me it feels the most natural thing to store XML (or in general hierarchical data) in an XML database like BaseX, to store JSON data in a json database like MongoDB and table-like data in a relational database like Oracle and DB2. Of course, there are also other technical reasons why you might want to use a more mature system like DB2, but also non-technical reasons, for example a deep knowledge of your employees with SQL.
- Choosing a database system is not just a technical, but also a business
decision. So one point for this to keep in mind: Systems like DB2 and Oracle aren't cheap, I am sure the folks at IBM and Oracle will tell you that. On the other hand, BaseX is open-source (but provides commercial support). Of course, there are also many other open-source dbms like Postgres.
Cheers Dirk
On 03/01/2015 02:36 PM, Marco Lettere wrote:
Hi Mansi, my 2 cents here are mainly that none of the RDBMS based solution deliver the full support of the complete stack of XML technologies to their complete power (XPath, XQuery 3.x, XSLT, XSD, eXPath , aso). As far as I remember you can hardly find HTTP server functionality with RestXQ support there too. As soon as you have to write an application or service of a respectable complexity you will nee all of those functionalities. Another point to use here, at least with BaseX, is the footprint of the application. I don't know how large DB2 is but for sure installing, managing and using PostgreSQL is several levels of complexity higher than using BaseX. Finally I also think that for implementing a really performing tree based database the relational technology is not suited and a proper datamodel is required but I don't have figures or demonstrations for that so I'm also very curious to see those performance comparison results! :-D Hope this helps somehow. Greetings, Marco.
On 28/02/2015 12:11, Christian Grün wrote:
Hi Mansi,
No one answered so far, so there is probably no simple answer to your question.
As you indicated, many relational DBMS (such as DB2 and PostgreSQL) provide support for XPath as well, but in most cases, the systems do not create additional index structures on top of the XML nodes to speed up querying. However, as the queries you presented on the mailing list are simple XPath expressions, there are probably not so many chances to benefit from values index structures anyway.
If teams in your companies are trying DB2 and PostgreSQL anyway, I'll be looking forward to hearing more about their experiences. Commercial systems often disallow their users to publish performance results (and presenting them on the list is some way of publishing them), so you can also send them to me privately. If you go for PostgreSQL, this won't be a problem of course.
Hope this helps, Christian
-- Dirk Kirsten, BaseX GmbH, http://basexgmbh.de |-- Firmensitz: Blarerstrasse 56, 78462 Konstanz |-- Registergericht Freiburg, HRB: 708285, Geschäftsführer: | Dr. Christian Grün, Dr. Alexander Holupirek, Michael Seiferle `-- Phone: 0049 7531 28 28 676, Fax: 0049 7531 20 05 22
basex-talk@mailman.uni-konstanz.de