Re: [basex-talk] Finding document based on filename

1 Sep 2015

Thanks, guys, for all the expert comments. For now, I am experimenting with the
performance of just deleting and inserting using the Java API. If this
process takes a tiny bit longer, I don't really care, is what I figured :)
If it becomes unacceptable, I will use one of these suggestions.

Thanks once again.

// List all databases known to this context.
StringList databases = List.list(context);

for (String database : databases) {
    try {
        // db:list returns the resource names of one database; query(...) is
        // my own helper and is assumed to return them space-separated.
        String names = query("db:list('" + database + "')");
        for (String fileName : names.split(" ")) {
            // Match on the second "_"-separated token of the new file's name.
            if (fileName.contains(XMLFileName.split("_")[1])) {
                query("db:delete('" + database + "','" + fileName + "')");
                logger.info("Deleted " + fileName + " from " + database);
                retVal = true;
                break;
            }
        }
    } catch (BaseXException e) {
        e.printStackTrace();
    }
}

On Mon, Aug 31, 2015 at 9:45 PM, Martín Ferrari <ferrari_martin@hotmail.com>
wrote:
...
I forgot one thing, I got much better performance by just calling
replace rather than delete and insert, but this is a db with more than one
million records. If performance is not important, I believe either way will
do.
Martín.
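
Martín's suggestion can be sketched as a single db:replace call issued from Java in place of the delete/insert pair. The helper below only builds the query string; the class and method names are hypothetical, and it assumes the signature db:replace($db, $path, $input) as documented for BaseX 8 (later releases renamed some database functions, so check the documentation for your version). String arguments are quoted so that a file name containing an apostrophe does not break the query.

```java
// Hypothetical helper: builds the XQuery text only. Running it is left to
// whatever query(...) wrapper the application already has.
final class ReplaceQuery {

    // Quote a Java string as an XQuery string literal:
    // '&' must be written as an entity reference, and an apostrophe inside
    // an apostrophe-delimited literal is escaped by doubling it.
    static String literal(String value) {
        return "'" + value.replace("&", "&amp;").replace("'", "''") + "'";
    }

    // Assumed signature: db:replace($db, $path, $input), where $input is
    // the location of the new version of the document.
    static String replace(String database, String path, String inputLocation) {
        return "db:replace(" + literal(database) + ", "
                + literal(path) + ", " + literal(inputLocation) + ")";
    }

    public static void main(String[] args) {
        // Example names are made up for illustration.
        System.out.println(replace("reports", "scan_1234_v2.xml", "/tmp/scan_1234_v2.xml"));
    }
}
```

The resulting string can be passed to the same kind of query helper used for db:delete in the reply above.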
------------------------------
From: ferrari_martin@hotmail.com
To: mansi.sheth@gmail.com; basex-talk@mailman.uni-konstanz.de
Date: Mon, 31 Aug 2015 16:35:33 +0000
Subject: Re: [basex-talk] Finding document based on filename
Hi Mansi,
     I have a similar situation. I don't think there's a fast way to get
documents by only knowing a part of their names. It seems you need to know
the exact name. In my case, we might be able to group documents by a common
id, so we might create subfolders inside the DB and store/get the contents
of the subfolder directly, which is pretty fast.
     I've also tried indexing, but insertions got really slow (I assume
maybe because indexing is not granular, it indexes all values) and we
need performance.
Oh, I've also tried using starts-with() instead of contains(), but it
seems it does not pick up indexes.
Martín.
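
The subfolder idea could look roughly like this: each document is stored under a path that starts with its common id, and the group is later listed with the two-argument form db:list($db, $path). This is a sketch only; the class name, the character check and the way the path is composed are assumptions, not Martín's actual setup, and how a trailing slash in the path is treated should be verified against your BaseX version.

```java
import java.util.regex.Pattern;

// Hypothetical helper that composes grouped paths and the matching query.
final class GroupedPaths {

    // Assumption: ids, file names and database names use only these
    // characters, so they can be embedded in a query without escaping.
    private static final Pattern SAFE = Pattern.compile("[A-Za-z0-9_.-]+");

    private static String checked(String value) {
        if (!SAFE.matcher(value).matches()) {
            throw new IllegalArgumentException("unsupported name: " + value);
        }
        return value;
    }

    // Path under which a document is stored: "<groupId>/<fileName>".
    static String storagePath(String groupId, String fileName) {
        return checked(groupId) + "/" + checked(fileName);
    }

    // Query text listing every resource stored under the group's folder.
    static String listGroup(String database, String groupId) {
        return "db:list('" + checked(database) + "', '" + checked(groupId) + "/')";
    }
}
```

With this layout the lookup is by exact folder name, so no contains() or starts-with() scan over all resource names is needed.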
------------------------------
Date: Fri, 28 Aug 2015 16:52:37 -0400
From: mansi.sheth@gmail.com
To: basex-talk@mailman.uni-konstanz.de
Subject: [basex-talk] Finding document based on filename
Hello,
I will have hundreds of databases, each holding 100 XML documents. I want
to devise an algorithm that, given part of an XML file name, tells me which
database(s) contain it, or null if the document is not currently present in
any database. Based on that, I would add the current document to the
database. The goal is to always keep the latest version of a document in the
DB, removing the older version when adding the newer one.
So far, the only way I could come up with is:
for $db in all-databases:
      open $db
      $fileNames = list $db
      for $eachFileName in $fileNames:
            if $eachFileName.contains(sub-xml filename):
                  add $db to ret-list-db
return ret-list-db
The above algorithm seems highly inefficient. Is there any indexing that
can be done? Or do you suggest that, for each document insert, I maintain a
separate XML document that lists each file inserted, etc.?
Once I get hold of the above list of DBs, I would eventually delete that
file and insert the latest version of it (which would have the same
sub-xml file name). So constantly updating this external document also
seems painful (maybe?).
Also, would it be faster to use XQuery script files through Java code, or
to use the Java API for such operations?
How do you all deal with such operations?
- Mansi
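
For reference, the scan described above can be written out in Java against an in-memory snapshot. This is a sketch: the Map from database name to resource names is assumed to have been filled beforehand (for example from db:list output), and the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory version of the lookup loop.
final class DocumentLocator {

    // Returns the names of all databases holding at least one resource
    // whose name contains namePart; the list is empty when none does.
    static List<String> databasesContaining(Map<String, List<String>> resourcesByDb,
                                            String namePart) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : resourcesByDb.entrySet()) {
            for (String fileName : entry.getValue()) {
                if (fileName.contains(namePart)) {
                    hits.add(entry.getKey());
                    break; // one match per database is enough
                }
            }
        }
        return hits;
    }
}
```

The cost is linear in the total number of resource names, which is the inefficiency the question is about; an empty result plays the role of the "null" case.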

Mansi Sheth