Thanks, everyone, for all the expert comments. For now, I am experimenting with the performance of simply deleting and inserting through the Java API. If this process takes a tiny bit longer, I figure I don't really care :) If it becomes unacceptable, I will use one of these suggestions.
Thanks once again.
StringList databases = List.list(context);
// Part of the file name that identifies the document across versions.
String namePart = XMLFileName.split("_")[1];
for (String database : databases) {
    try {
        // List every resource in this database and look for a matching name.
        for (String fileName : query("db:list('" + database + "')").split(" ")) {
            if (fileName.contains(namePart)) {
                query("db:delete('" + database + "','" + fileName + "')");
                logger.info("Deleted " + fileName + " from " + database);
                retVal = true;
                break; // stop after the first match in this database
            }
        }
    } catch (BaseXException e) {
        e.printStackTrace();
    }
}
On Mon, Aug 31, 2015 at 9:45 PM, Martín Ferrari ferrari_martin@hotmail.com wrote:
I forgot one thing: I got much better performance by just calling replace rather than delete and insert, but this is a DB with more than one million records. If performance is not important, I believe either way will do.
Martín.
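To make the replace suggestion concrete, here is a minimal Java sketch that only builds the query text. It assumes the db:replace($db, $path, $input) function from the BaseX Database Module of that period (the name and argument order may differ in other releases, so check the documentation for your version). The class name, the helper names, and the sample database and paths are hypothetical.

```java
// Sketch only: builds the XQuery text for a replace call; nothing is sent to BaseX here.
final class ReplaceQuery {

    /** Quotes a Java string as an XQuery string literal. */
    static String literal(String value) {
        // Inside an XQuery literal, '&' starts an entity reference and a
        // doubled quote stands for a single quote, so both must be escaped.
        return "'" + value.replace("&", "&amp;").replace("'", "''") + "'";
    }

    /** Assumes db:replace($db, $path, $input); verify against your BaseX version. */
    static String replace(String database, String path, String inputFile) {
        return "db:replace(" + literal(database) + ", "
                + literal(path) + ", " + literal(inputFile) + ")";
    }

    public static void main(String[] args) {
        // Hypothetical database name and file paths.
        System.out.println(replace("reports", "scan_1234_v1.xml", "/tmp/scan_1234_v2.xml"));
    }
}
```

The resulting string could be passed to whatever runs your queries. Note that a replace keeps the existing resource path, so if the new version has a different file name, delete and add may still be needed.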
From: ferrari_martin@hotmail.com
To: mansi.sheth@gmail.com; basex-talk@mailman.uni-konstanz.de
Date: Mon, 31 Aug 2015 16:35:33 +0000
Subject: Re: [basex-talk] Finding document based on filename
Hi Mansi,
I have a similar situation. I don't think there's a fast way to get documents when you only know a part of their names. It seems you need to know the exact name. In my case, we might be able to group documents by a common id, so we might create subfolders inside the DB and store/get the contents of a subfolder directly, which is pretty fast. I've also tried indexing, but insertions got really slow (I assume because indexing is not granular and indexes all values), and we need performance.
Oh, I've also tried using starts-with() instead of contains(), but it seems it does not pick up indexes.
Martín.
Date: Fri, 28 Aug 2015 16:52:37 -0400
From: mansi.sheth@gmail.com
To: basex-talk@mailman.uni-konstanz.de
Subject: [basex-talk] Finding document based on filename
Hello,
I will have hundreds of databases, each holding 100 XML documents. I want to devise an algorithm that, given a part of an XML file name, tells me which database(s) contain it, or returns null if the document is not currently present in any database. Based on that result, I would add the current document to the database. The goal is to always keep the latest version of a document in the DB, removing the older version while adding the newer one.
So far, the only way I could come up with is:
for $db in all-databases:
    open $db
    $fileNames = list $db
    for $eachFileName in $fileNames:
        if $eachFileName.contains(sub-xml filename):
            add $db to ret-list-db
return ret-list-db
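For reference, the scan above can be sketched in plain Java. This version works on an in-memory map from database name to file names rather than on a live BaseX instance, so how that map gets filled is left open. The class name, method name, and sample data are hypothetical, and it returns an empty list rather than null when nothing matches.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class DatabaseScan {

    /** Returns the names of all databases holding a file whose name contains namePart. */
    static List<String> databasesContaining(Map<String, List<String>> filesByDatabase,
                                            String namePart) {
        List<String> matches = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : filesByDatabase.entrySet()) {
            for (String fileName : entry.getValue()) {
                if (fileName.contains(namePart)) {
                    matches.add(entry.getKey());
                    break; // one hit per database is enough
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Hypothetical sample data standing in for the real database listings.
        Map<String, List<String>> files = new LinkedHashMap<>();
        files.put("db1", Arrays.asList("scan_1234_v1.xml", "scan_9999_v1.xml"));
        files.put("db2", Arrays.asList("scan_5678_v1.xml"));
        System.out.println(databasesContaining(files, "1234"));
    }
}
```

The cost is proportional to the total number of file names across all databases, which is the inefficiency the question is about.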
The above algorithm seems highly inefficient. Is there any indexing that can be done? Or do you suggest that, for each document insert, I maintain a separate XML document that lists each file inserted?
Once I get hold of the above list of databases, I would eventually delete that file and insert the latest version of it (which would have the same sub-XML file name). So constantly updating this external document also seems painful (maybe?).
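One way to picture the separate bookkeeping idea is a small side index kept in application memory. The sketch below is an assumption, not something BaseX provides: it maps the identifying part of a file name to the database that holds the document. The key rule (the second underscore-separated token) mirrors the split("_")[1] used in the reply at the top of this thread, and all class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

final class DocumentIndex {

    private final Map<String, String> databaseByKey = new HashMap<>();

    /** Hypothetical key rule: the second underscore-separated token of the file name. */
    static String keyOf(String fileName) {
        String[] parts = fileName.split("_");
        if (parts.length < 2) {
            throw new IllegalArgumentException("No key part in file name: " + fileName);
        }
        return parts[1];
    }

    /** Records where a document lives; returns the database that held this key before, or null. */
    String record(String fileName, String database) {
        return databaseByKey.put(keyOf(fileName), database);
    }

    /** Returns the database holding a document with the same key, or null if none is known. */
    String lookup(String fileName) {
        return databaseByKey.get(keyOf(fileName));
    }

    /** Drops the entry for this document's key, for example after a delete. */
    void forget(String fileName) {
        databaseByKey.remove(keyOf(fileName));
    }

    public static void main(String[] args) {
        DocumentIndex index = new DocumentIndex();
        index.record("scan_1234_v1.xml", "db1");
        // A newer version of the same document resolves to the same database.
        System.out.println(index.lookup("scan_1234_v2.xml"));
    }
}
```

Each insert or delete then costs one map update instead of a scan over every database. The trade-off is that the index has to be persisted and kept in sync with the databases, which is the maintenance burden raised above.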
Also, would it be faster to use XQuery script files through Java code, or to use the Java API for such operations?
How do you all deal with such operations ?
- Mansi