One database instance per user
Hi, I intend to use one database for each user of the business domain (web application). Does anyone have experiences with lots of 'small' databases vs. one 'big' database regarding performance/scalability/stability? Thanks! Best regards, Erdal
Hi Erdal,

Depending on the data and its actual size, you might be interested in this article [1]. The title is self-explanatory: ‘Making a large treebank searchable online’. Instead of using one huge database, the authors (supervisors of mine) chose to split the data into many small databases. This is useful because you can prune before the query starts and only go through the data that you actually need. The benchmarks that we ran (not published yet) show that the bigger your dataset, the higher the performance gain.

To give you an idea: when running roughly 90 queries on a corpus of 15 million sentences (in treebank form, i.e. with dependency structures), the median overall query time was 2675 seconds in the regular version of the corpus, and merely 123 seconds in the re-organised database structure. Note that these results are not published yet, so please do not quote me from this email and wait for the publication next year.

I hope it helps, or gives you some new ideas!

Kind regards,
Bram Vanroy
http://bramvanroy.be/

[1]: http://www.lrec-conf.org/proceedings/lrec2014/workshops/LREC2014Workshop-CMLC2%20Proceedings-rev2.pdf#page=20
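The pruning idea described above can be illustrated with a small sketch. This is not the authors' implementation and it does not use BaseX; it only assumes that records can be grouped by some key (here a hypothetical user name) so that a query touches only the partitions that can contain matches. The names `build_partitions`, `query_pruned`, and `query_full` are made up for this example.

```python
from collections import defaultdict


def build_partitions(records, key):
    """Group records into many small 'databases', one per key value."""
    partitions = defaultdict(list)
    for record in records:
        partitions[key(record)].append(record)
    return partitions


def query_full(records, predicate):
    """Baseline: scan every record in one big collection."""
    hits = []
    scanned = 0
    for record in records:
        scanned += 1
        if predicate(record):
            hits.append(record)
    return hits, scanned


def query_pruned(partitions, wanted_keys, predicate):
    """Scan only the partitions named in wanted_keys."""
    hits = []
    scanned = 0
    for k in wanted_keys:
        # .get avoids creating empty partitions for unknown keys
        for record in partitions.get(k, []):
            scanned += 1
            if predicate(record):
                hits.append(record)
    return hits, scanned


# Hypothetical data: (user, document text) pairs.
records = [
    ("alice", "invoice 1"),
    ("bob", "invoice 2"),
    ("alice", "note 3"),
    ("carol", "invoice 4"),
]
partitions = build_partitions(records, key=lambda r: r[0])

full_hits, full_scanned = query_full(
    records, lambda r: r[0] == "alice" and "invoice" in r[1]
)
pruned_hits, pruned_scanned = query_pruned(
    partitions, ["alice"], lambda r: "invoice" in r[1]
)
print(full_hits == pruned_hits, full_scanned, pruned_scanned)
```

Both queries return the same hits, but the pruned version only visits the records in the selected partition. Whether this pays off in a real database depends on how well the partition key matches the typical queries.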
Hi Bram, Thanks a lot for sharing your results! Looking forward to reading your final publication. Best regards, Erdal