[basex-talk] Whatever happened to DeepFS

Alexander Holupirek alexander.holupirek at uni-konstanz.de
Tue Nov 15 15:48:32 CET 2011


On 14.11.2011, at 21:48, Andy Bunce wrote:

> It is the metadata extraction part that is non trivial. 
> So packaging the libraries and calls for that sounds like a great way to go.
> 
> /Andy
> 
> On Mon, Nov 14, 2011 at 7:22 PM, John D. Mitchell <jdmitchell at gmail.com> wrote:
> On Nov 14, 2011, at 11:17 , Alexander Holupirek wrote:
> [...]
> > If you also want to have the extractor functionality ... we thought about packaging [2] it for BaseX and make it available as XQuery functions.  Just give us a hint and we will get going.
> 
> ++
> 
> Cheers,
> John

Thanks for your feedback.  We decided to go for the packaging approach and will provide an EXPath package [0] that produces an FSML database from a given file hierarchy.

It would be interesting to hear which file types are relevant to you.
The idea is to have transducer code [1] that, for example, extracts ID3 information for audio files:

    <file name="LockerBleiben.mp3" suffix="mp3" st_mode="0100644" st_size="4585915" st_mtime="1320945388000" st_uid="1000" st_gid="1000" st_nlink="1" bsid="70622d84-f4f7-4b90-95e2-9e1821e8d283">
      <folder name="ID3v2">
        <fact name="Title">Locker Bleiben</fact>
        <fact name="Artist">Die Fantastischen Vier</fact>
        <fact name="Composer">Andreas Rieke/Michael DJ Beck/Thomas Dürr/Michael B. Schmidt</fact>
        <fact name="Album">Lauschgift</fact>
        <fact name="Track">15/20</fact>
        <fact name="PartOfSet">1/1</fact>
        <fact name="Year">1995</fact>
        <fact name="Genre">Hip Hop/Rap</fact>
        <fact name="Compilation">1</fact>
        <fact name="Comment">(iTunPGAP) 0</fact>
        <fact name="EncodedBy">iTunes 8.0.2</fact>
      </folder>
      <folder name="Cover">
        ...
      </folder>
    </file>
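
To make the idea a bit more concrete, here is a minimal Python sketch of how such a file element could be assembled. It is only an illustration under a few assumptions, not the planned package: the transducer interface (a callable returning a folder name plus a dict of facts), the function names and the toy line-count transducer are all hypothetical, and the bsid attribute is left out because nothing above says how it is generated.

```python
import os
import xml.etree.ElementTree as ET


def line_count_transducer(path):
    """Hypothetical toy transducer: reports the line count of .txt files."""
    if not path.endswith(".txt"):
        return None  # this transducer does not handle the file type
    with open(path, "rb") as fh:
        lines = sum(1 for _ in fh)
    return "Text", {"Lines": str(lines)}


def file_element(path, transducers=()):
    """Build a <file> element with stat attributes plus extracted facts."""
    st = os.stat(path)
    name = os.path.basename(path)
    elem = ET.Element("file", {
        "name": name,
        "suffix": os.path.splitext(name)[1].lstrip("."),
        "st_mode": "0%o" % st.st_mode,             # octal, e.g. 0100644
        "st_size": str(st.st_size),
        "st_mtime": str(int(st.st_mtime * 1000)),  # milliseconds, as in the sample
        "st_uid": str(st.st_uid),
        "st_gid": str(st.st_gid),
        "st_nlink": str(st.st_nlink),
    })
    # Assumed interface: each transducer returns (folder_name, facts) or None.
    for transducer in transducers:
        result = transducer(path)
        if result is None:
            continue
        folder_name, facts = result
        folder = ET.SubElement(elem, "folder", {"name": folder_name})
        for fact_name, value in facts.items():
            fact = ET.SubElement(folder, "fact", {"name": fact_name})
            fact.text = value
    return elem
```

Calling ET.tostring(file_element(path, [line_count_transducer])) would then give one <file> element per path; a real transducer for ID3 tags would plug in at the same place.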

I am currently considering exiftool [2] by Phil Harvey to include metadata for numerous multimedia file types.
Another candidate is extracting full text and publisher metadata from PDF files, etc.

If you have something special in mind or want to comment on this, I'm all ears.

Thanks,
	Alex


[0] EXPath Packaging: http://docs.basex.org/wiki/Packaging
[1] "Transducer" was coined by Gifford et al., Semantic File Systems: http://dl.acm.org/citation.cfm?id=121138
[2] http://www.sno.phy.queensu.ca/~phil/exiftool/

