Projects/Nepomuk/FileIndexing

Revision as of 17:10, 10 September 2012 by Vhanda

Nepomuk currently acts as the file indexer for the KDE platform, applications and workspaces. Even though we often stress that Nepomuk is more than a file indexer, we still need to index files properly.

File indexing solutions

Strigi

The KDE 4.9 software releases currently use Strigi's libstreamanalyzer to index files. Current problems with Strigi:

  • Difficult to contribute to
  • No documentation
  • Unmaintained
  • Does not reuse existing libraries

The File Formats section below lists the current status of indexing for different file types.

Roll our own?

File Formats

This section lists the different file formats and which of them are supported by each file indexing solution.

Images

  • JPEG - use exiv2; Strigi also uses exiv2, but its JPEG indexing is currently broken
  • PNG - Strigi uses its own analyzer, which also detects the application name, color depth and interlace mode
  • GIF - contains little metadata
  • EXIF
  • TIFF
  • BMP
  • SVG - Strigi stores them as plain text
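
Strigi's hand-written PNG analyzer is not reproduced here. The following is a minimal, independent sketch of how the PNG properties mentioned above (application name, color depth, interlace mode) can be read straight from the chunk layout defined by the PNG specification. It assumes the application name is stored in a `tEXt` chunk with the standard `Software` keyword; `read_png_metadata` is a hypothetical helper name.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_png_metadata(data: bytes) -> dict:
    """Collect size, bit depth, colour type, interlace mode and the
    "Software" text chunk (the creating application) from PNG bytes."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    info = {}
    pos = len(PNG_SIGNATURE)
    # Each chunk is: 4-byte length, 4-byte type, body, 4-byte CRC (not checked here).
    while pos + 8 <= len(data):
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        if ctype == b"IHDR" and len(body) == 13:
            width, height, depth, color, _comp, _filt, interlace = struct.unpack(">IIBBBBB", body)
            info.update(width=width, height=height, bit_depth=depth,
                        color_type=color, interlaced=interlace == 1)
        elif ctype == b"tEXt":
            # tEXt bodies are "keyword\0text", both Latin-1 encoded.
            keyword, _, text = body.partition(b"\x00")
            if keyword == b"Software":
                info["software"] = text.decode("latin-1")
        elif ctype == b"IEND":
            break
        pos += 12 + length  # skip length, type, body and CRC
    return info
```

Usage: `with open("photo.png", "rb") as f: print(read_png_metadata(f.read()))`. Reading the whole file keeps the sketch short; a real analyzer would read only chunk headers and skip the image data.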


If we just use exiv2, we cover almost all of these formats, and the code would be very simple.
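
A rough sketch of the single-library approach, assuming the `exiv2` command-line tool is installed and called with its `-pa` (print all metadata) option; a KDE implementation would more likely link against the libexiv2 C++ library instead. The parser assumes each output line has the columns key, type, count, value separated by whitespace, and the helper names are hypothetical.

```python
import subprocess


def parse_exiv2_output(text: str) -> dict:
    """Turn `exiv2 -pa` output lines ("key type count value") into a dict."""
    tags = {}
    for line in text.splitlines():
        parts = line.split(None, 3)  # the value column may itself contain spaces
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        tags[parts[0]] = parts[3].rstrip() if len(parts) == 4 else ""
    return tags


def read_image_metadata(path: str) -> dict:
    # Assumes the exiv2 CLI is on PATH. It may exit non-zero for files that
    # carry no metadata, so parse whatever it printed instead of raising.
    result = subprocess.run(["exiv2", "-pa", path], capture_output=True, text=True)
    return parse_exiv2_output(result.stdout)
```

The same call works for JPEG, PNG and TIFF alike, which is the point of the proposal: one code path instead of one analyzer per format.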

Videos

Strigi uses ffmpeg for everything except ID3, Vorbis and OggS. It also seeks through the file; it is not clear what that is for.

Overall, we could use ffmpeg for everything: it is very fast and supports almost all formats.
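
A minimal sketch of the "ffmpeg for everything" idea. It shells out to `ffprobe`, the probing tool shipped with FFmpeg, as a stand-in for linking against FFmpeg's libavformat directly, which is what a C++ indexer would more likely do. The selection of properties and the helper names `probe` and `parse_ffprobe_json` are assumptions for illustration.

```python
import json
import subprocess


def parse_ffprobe_json(text: str) -> dict:
    """Pick a few indexable properties out of ffprobe's JSON output."""
    data = json.loads(text)
    fmt = data.get("format", {})
    props = {
        "format": fmt.get("format_name"),
        # ffprobe reports the duration as a string of seconds
        "duration": float(fmt["duration"]) if "duration" in fmt else None,
        "tags": fmt.get("tags", {}),
    }
    for stream in data.get("streams", []):
        kind = stream.get("codec_type")
        if kind == "video" and "video_codec" not in props:
            props["video_codec"] = stream.get("codec_name")
            props["width"] = stream.get("width")
            props["height"] = stream.get("height")
        elif kind == "audio" and "audio_codec" not in props:
            props["audio_codec"] = stream.get("codec_name")
    return props


def probe(path: str) -> dict:
    # Assumes ffprobe is on PATH; check=True raises if it cannot read the file.
    out = subprocess.run(
        ["ffprobe", "-v", "quiet", "-print_format", "json",
         "-show_format", "-show_streams", path],
        capture_output=True, text=True, check=True)
    return parse_ffprobe_json(out.stdout)
```

Only the first video and first audio stream are recorded; files with several tracks would need a richer mapping.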

Audio

  • MP3
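
For MP3, the simplest metadata to read is an ID3v1 tag: a fixed 128-byte block at the end of the file that starts with `TAG`. The sketch below is a hypothetical helper, not Strigi's code. It handles ID3v1 and the ID3v1.1 track-number variant only and assumes Latin-1 text; ID3v2 tags, stored at the start of the file, are far more complex and are not covered.

```python
def read_id3v1(data: bytes):
    """Parse an ID3v1/ID3v1.1 tag from the last 128 bytes, or return None."""
    if len(data) < 128:
        return None
    tag = data[-128:]
    if tag[:3] != b"TAG":
        return None

    def text(field: bytes) -> str:
        # Fixed-width fields padded with NUL bytes or spaces; Latin-1 assumed.
        return field.split(b"\x00", 1)[0].decode("latin-1").strip()

    comment = tag[97:127]
    info = {
        "title": text(tag[3:33]),
        "artist": text(tag[33:63]),
        "album": text(tag[63:93]),
        "year": text(tag[93:97]),
        "comment": text(comment),
        "genre": tag[127],  # index into the standard ID3v1 genre list
    }
    # ID3v1.1: a zero byte followed by a non-zero byte at the end of the
    # comment field marks that last byte as the track number.
    if comment[28] == 0 and comment[29] != 0:
        info["track"] = comment[29]
    return info
```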

Documents

  • doc
  • docx
  • ODF
  • PDF
  • EPUB
  • MOBI
  • spreadsheet formats
  • presentation formats
  • LyX
  • TeX
  • CBZ - comic books
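
Several of the listed document formats are ZIP packages. As one example, ODF files (text, spreadsheet and presentation alike) keep their document metadata in a `meta.xml` member. This sketch reads a few of those fields using only Python's standard library; the function name and the chosen fields are assumptions, and it expects `meta.xml` to be present (`zipfile` raises `KeyError` otherwise). Formats such as doc, PDF or MOBI would each need their own parser.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "meta": "urn:oasis:names:tc:opendocument:xmlns:meta:1.0",
    "dc": "http://purl.org/dc/elements/1.1/",
}
PAGE_COUNT = "{%s}page-count" % NS["meta"]


def read_odf_metadata(data: bytes) -> dict:
    """Read title, creators and page count from an ODF package's meta.xml."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        root = ET.fromstring(zf.read("meta.xml"))
    meta = root.find("office:meta", NS)
    if meta is None:
        return {}
    result = {}
    for key, path in (("title", "dc:title"), ("creator", "dc:creator"),
                      ("initial_creator", "meta:initial-creator")):
        element = meta.find(path, NS)
        if element is not None and element.text:
            result[key] = element.text
    # meta:document-statistic carries counts as attributes (pages, words, ...)
    stats = meta.find("meta:document-statistic", NS)
    if stats is not None and stats.get(PAGE_COUNT):
        result["page_count"] = int(stats.get(PAGE_COUNT))
    return result
```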

Archives

  • tar
  • gzip
  • etc.
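
For archives, one option (an assumption, not something decided on this page) is to index the names of the contained files. Python's standard `tarfile` module, used here purely for illustration, can open plain and compressed tarballs with the same call. The helper name and the `limit` cap are hypothetical; a plain `.gz` of a single file is not a tarball and simply yields an empty list here.

```python
import io
import tarfile


def list_archive_members(data: bytes, limit: int = 1000) -> list:
    """Return names of regular files inside a tar archive, which may be
    gzip-, bzip2- or xz-compressed; return [] if it is not a tar archive."""
    names = []
    try:
        # "r:*" lets tarfile detect the compression method on its own.
        with tarfile.open(fileobj=io.BytesIO(data), mode="r:*") as tar:
            for member in tar:
                if member.isfile():
                    names.append(member.name)
                    if len(names) >= limit:
                        break  # cap the work done on huge archives
    except tarfile.TarError:
        pass  # not a tar archive, or damaged: keep whatever was read
    return names
```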

Emails

  • There was a bug report

Text Files

  • Text files
  • Source Code
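
Before indexing a text or source file, an indexer has to decide that it really is text. A common heuristic, assumed here rather than taken from Strigi, is to reject files whose first block contains a NUL byte and to require that block to decode as UTF-8. Files in legacy 8-bit encodings would be rejected, so a real implementation would need encoding detection; `looks_like_text` is a hypothetical name.

```python
import codecs


def looks_like_text(data: bytes, sample_size: int = 8192) -> bool:
    """Heuristic text check on the first `sample_size` bytes of a file."""
    sample = data[:sample_size]
    if b"\x00" in sample:
        return False  # NUL bytes almost never occur in text files
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        # final=False tolerates a multi-byte character cut off at the
        # sample boundary; a complete file must decode fully.
        decoder.decode(sample, final=len(data) <= sample_size)
    except UnicodeDecodeError:
        return False
    return True
```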

ISO images

Executable files


Content is available under Creative Commons License SA 4.0 unless otherwise noted.