I am mainly interested in images (usually JPG) and audio (usually MP3). I don't know much about ExifTool, but it seems to be a Perl library. Nothing wrong with that :-), but it sounds like a heavy choice to wrap in a Java package?
XML Calabash has a cx:metadata-extractor extension step; for images it is a thin shell around Drew Noakes' library of the same name: http://www.drewnoakes.com/code/exif/ . It is mentioned at http://xmlcalabash.com/download/
MP3 is trickier, but https://github.com/mpatric/mp3agic looks like a possible candidate to me.
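To give a feel for the simplest part of the problem, here is a minimal sketch that reads the legacy ID3v1 tag (the fixed 128-byte block at the end of a file) using only the JDK. This is not mp3agic's API and it does not handle ID3v2, which is where most of the trickiness lives; the class name Id3v1Sketch and its methods are hypothetical, and ISO-8859-1 is assumed as the text encoding.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, for illustration only (not part of mp3agic or BaseX).
final class Id3v1Sketch {

    // An ID3v1 tag occupies the last 128 bytes of the file.
    private static final int TAG_SIZE = 128;

    /** Returns the ID3v1 fields, or an empty map if the data carries no ID3v1 tag. */
    static Map<String, String> parse(byte[] mp3) {
        Map<String, String> facts = new LinkedHashMap<>();
        if (mp3.length < TAG_SIZE) {
            return facts;
        }
        int start = mp3.length - TAG_SIZE;
        // The tag is identified by the three ASCII bytes "TAG".
        if (mp3[start] != 'T' || mp3[start + 1] != 'A' || mp3[start + 2] != 'G') {
            return facts;
        }
        // Fixed-width fields: title, artist and album are 30 bytes each, year is 4.
        facts.put("Title", field(mp3, start + 3, 30));
        facts.put("Artist", field(mp3, start + 33, 30));
        facts.put("Album", field(mp3, start + 63, 30));
        facts.put("Year", field(mp3, start + 93, 4));
        return facts;
    }

    /** Decodes one fixed-width field, stopping at the first NUL and trimming space padding. */
    private static String field(byte[] data, int offset, int length) {
        int end = offset;
        while (end < offset + length && data[end] != 0) {
            end++;
        }
        // Assumption: ISO-8859-1, the encoding conventionally used for ID3v1 text.
        return new String(data, offset, end - offset, StandardCharsets.ISO_8859_1).trim();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java Id3v1Sketch.java <file.mp3>");
            return;
        }
        // Reads the whole file for simplicity; only the last 128 bytes are inspected.
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        parse(bytes).forEach((name, value) -> System.out.println(name + ": " + value));
    }
}
```

Anything beyond this (ID3v2 frames, embedded cover art, text encodings) is where a dedicated library would earn its keep.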
/Andy
On Tue, Nov 15, 2011 at 2:48 PM, Alexander Holupirek <alexander.holupirek@uni-konstanz.de> wrote:
On 14.11.2011, at 21:48, Andy Bunce wrote:
It is the metadata extraction part that is non-trivial. So packaging the libraries and calls for that sounds like a great way to go.
/Andy
On Mon, Nov 14, 2011 at 7:22 PM, John D. Mitchell <jdmitchell@gmail.com> wrote:
On Nov 14, 2011, at 11:17 , Alexander Holupirek wrote: [...]
If you also want to have the extractor functionality ... we thought about packaging [2] it for BaseX and making it available as XQuery functions. Just give us a hint and we will get going.
++
Cheers, John
Thanks for your feedback. We decided to go for the packaging approach and to provide an EXPath package [0] in order to produce an FSML database of a given file hierarchy.
It would be interesting to hear what kind of file types are relevant for you. The idea is to have transducer code [1] that, for example, extracts ID3 information for audio files:
<file name="LockerBleiben.mp3" suffix="mp3" st_mode="0100644"
      st_size="4585915" st_mtime="1320945388000" st_uid="1000"
      st_gid="1000" st_nlink="1"
      bsid="70622d84-f4f7-4b90-95e2-9e1821e8d283">
  <folder name="ID3v2">
    <fact name="Title">Locker Bleiben</fact>
    <fact name="Artist">Die Fantastischen Vier</fact>
    <fact name="Composer">Andreas Rieke/Michael DJ Beck/Thomas Dürr/Michael B. Schmidt</fact>
    <fact name="Album">Lauschgift</fact>
    <fact name="Track">15/20</fact>
    <fact name="PartOfSet">1/1</fact>
    <fact name="Year">1995</fact>
    <fact name="Genre">Hip Hop/Rap</fact>
    <fact name="Compilation">1</fact>
    <fact name="Comment">(iTunPGAP) 0</fact>
    <fact name="EncodedBy">iTunes 8.0.2</fact>
  </folder>
  <folder name="Cover"> ... </folder>
</file>
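As a rough illustration of the output side of such a transducer, here is a minimal Java sketch that serializes a set of extracted facts into the file/folder/fact shape shown above. It is only a sketch under assumptions: the class FsmlSketch and the method toFsml are hypothetical and not part of BaseX, only the name and suffix attributes are written (the st_* attributes and bsid are left out), and the JDK's XMLStreamWriter is used so that attribute and text values are escaped properly.

```java
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical helper, for illustration only (not BaseX code).
final class FsmlSketch {

    /** Serializes one file entry with a single metadata folder of facts. */
    static String toFsml(String fileName, String folderName, Map<String, String> facts) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(out);

            xml.writeStartElement("file");
            xml.writeAttribute("name", fileName);
            // Derive the suffix attribute from the text after the last dot, if any.
            int dot = fileName.lastIndexOf('.');
            if (dot >= 0 && dot < fileName.length() - 1) {
                xml.writeAttribute("suffix", fileName.substring(dot + 1));
            }

            xml.writeStartElement("folder");
            xml.writeAttribute("name", folderName);
            for (Map.Entry<String, String> fact : facts.entrySet()) {
                xml.writeStartElement("fact");
                xml.writeAttribute("name", fact.getKey());
                xml.writeCharacters(fact.getValue()); // escapes &, < and >
                xml.writeEndElement();
            }
            xml.writeEndElement(); // folder
            xml.writeEndElement(); // file
            xml.close();
        } catch (XMLStreamException e) {
            // Writing to an in-memory buffer should not fail; rethrow unchecked if it does.
            throw new IllegalStateException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the facts in insertion order.
        Map<String, String> facts = new LinkedHashMap<>();
        facts.put("Title", "Locker Bleiben");
        facts.put("Artist", "Die Fantastischen Vier");
        facts.put("Album", "Lauschgift");
        System.out.println(toFsml("LockerBleiben.mp3", "ID3v2", facts));
    }
}
```

The extraction step itself (whichever library ends up supplying the facts) would only need to fill the map; the serialization stays the same for every file type.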
Currently I am thinking about using exiftool [2] by Phil Harvey to include metadata about numerous multimedia file types, to extract full text and publisher metadata from PDF files, etc.
If you have something special or want to comment on this, I'm all ears.
Thanks, Alex
[0] EXPath Packaging: http://docs.basex.org/wiki/Packaging
[1] Transducer, a term coined by Gifford et al., Semantic File Systems: http://dl.acm.org/citation.cfm?id=121138
[2] http://www.sno.phy.queensu.ca/~phil/exiftool/