Difference between revisions of "Projects/Nepomuk/FileIndexing"

(Listed different file formats)
 
Revision as of 12:50, 10 September 2012

Nepomuk currently acts as the file indexer for KDE. Even though we frequently tout that we are not just a file indexer, we need to index the files properly.

File Indexing solutions

Strigi

KDE 4.9 currently uses Strigi's libstreamanalyzer to index files. Current problems with Strigi:

  • Difficult to contribute to
  • No documentation
  • Un-maintained
  • Does not reuse libraries

This page lists the current status of indexing for different file types.

Roll our own?

File Formats

This section lists the different file formats and which of them each file indexing solution supports.

Images

  • JPEG - Use exiv2 - Strigi also uses exiv2 - currently broken
  • PNG - Strigi rolls its own - detects the application name, color depth and interlace mode as well
  • GIF - there isn't much metadata
  • EXIF
  • TIFF
  • BMP
  • SVG - Strigi stores them as plain text

We could just use exiv2 and cover almost everything. Plus, the code would be super simple.
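
For a sense of what a hand-rolled analyzer involves, the color depth and interlace mode mentioned for PNG sit in the fixed-layout IHDR chunk right after the file signature. The following is a minimal C++ sketch under that assumption, not Strigi's actual code; PngHeader and parsePngHeader are hypothetical names chosen for illustration, and the application name (which lives in a separate text chunk) is not handled.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical result type; the field names are illustrative only.
struct PngHeader {
    std::uint32_t width;
    std::uint32_t height;
    std::uint8_t bitDepth;   // bits per sample (color depth)
    std::uint8_t colorType;  // 0 gray, 2 RGB, 3 palette, 4 gray+alpha, 6 RGBA
    bool interlaced;         // true when the interlace method is 1 (Adam7)
};

// Read a big-endian 32-bit integer starting at p.
static std::uint32_t readBe32(const unsigned char* p) {
    return (std::uint32_t(p[0]) << 24) | (std::uint32_t(p[1]) << 16) |
           (std::uint32_t(p[2]) << 8) | std::uint32_t(p[3]);
}

// Parse the signature and IHDR chunk from the first bytes of a file.
// Returns false if the data is too short or is not a PNG.
bool parsePngHeader(const std::vector<unsigned char>& data, PngHeader& out) {
    static const unsigned char signature[8] = {0x89, 'P', 'N', 'G',
                                               0x0D, 0x0A, 0x1A, 0x0A};
    // 8 signature + 4 length + 4 type + 13 IHDR data bytes = 29.
    if (data.size() < 29) return false;
    if (std::memcmp(data.data(), signature, 8) != 0) return false;
    // The first chunk must be IHDR with a 13-byte payload.
    if (readBe32(&data[8]) != 13) return false;
    if (std::memcmp(&data[12], "IHDR", 4) != 0) return false;

    out.width = readBe32(&data[16]);
    out.height = readBe32(&data[20]);
    out.bitDepth = data[24];
    out.colorType = data[25];
    // Bytes 26 and 27 are compression and filter method; skipped here.
    out.interlaced = (data[28] == 1);
    return true;
}
```

A caller would read the first 29 bytes of a file into a vector and pass it to parsePngHeader. Even this small case needs manual offset and byte-order handling for a single format, which is the kind of code a shared library such as exiv2 would take off our hands.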

  • Videos
  • Audio
    • MP3
  • Documents
    • doc
    • docx
    • odf
    • pdf
    • epub
    • mobi
    • spreadsheet formats
    • Presentation Formats
    • lyx
    • tex
    • cbz - Comic books
  • Archives
    • tar
    • gzip
    • whatever ..
  • Emails
    • There was a bug report
  • Text Files
    • Text files
    • Source Code
  • ISO images
  • Executable Files

Content is available under Creative Commons License SA 4.0 unless otherwise noted.