Amarok/Archives/Proposals/FuzzyFilter

From KDE Community Wiki

making the current filters work in a fuzzy way

Requirements

  1. should work on both the collection browser filter (both in flat and treeview) and the playlist filter
  2. a slider could determine the grade of fuzzyness in 3 or 4 steps (eg. exact - slightly fuzzy - very fuzzy )
  3. should work as close as possible on the current google-like filter syntax (so you can first search in an exact way, and then "widen" the search by increasing the fuzzyness)

I tried to implement a mechanism to make the existing filter work in a fuzzy way, following these requirements. For the flat view in collection browser, it now works more or less...

But now I decided to give up (not without having learned something, see below)

why I gave up

1.the filter mechanism of amarok is quite powerful (OR, AND, field specifications, exclude, greater than/less than, scope(to which fields does the filter apply)), whereas a fuzzy search as envisaged is much more simple (no AND/OR needed, no exclude, no greater than/less than, fixed scope (artist, title, album))

  1. Requirement 3: how would you interprete a complex search term such as "enya OR enia AND album:"memory of trees" score:>60?
  2. such a search term may not make sense for the normal user, but it is possible with the current filter, so it has to be interpreted in some way when operating fuzzy

2.scope: on which fields does the search operate?

  1. in the current filter that depends on the view (tree vs. flat), on the visible columns and on the filter (comment:bla)
  2. I think the scope is sometimes not even transparent for a normal user (for a power user: yes, and for complex queries it is even needed) or he has to take care of being in the correct mode and having the correct columns visible to find items
  3. a fuzzy search for its primary use case should operate on artist, title, album (maybe comment). not more, not less.

3.performance problems

  1. due to the fact that the filter is so flexible, but fuzzy filtering cannot be done with SQL constructs, in some (not even absurd) cases the database query results in 50.000 results which now have to be filtered by the fuzzy algorithm => resulted in 5-60 seconds on my 3500 songs collection (highly depending on the scope, see above)

4.implementation/documentation problems

  1. implementing it was harder than expected, although it is less than half finished (in tree view it becomes even more complex than in flat view)
  2. due to the complexity of scope and search constructs, documenting this feature would not be easy (which parts of a complex query are interpreted fuzzy on which scope...), although fuzzy search can be implemented in a really self-explanatory way

5.ordering of results

  1. when filtering, the ordering should not be affected by the filter
  2. in fuzzy mode, the results should be ordered according to their match factor

6.tree view is good for browsing, but often not ideal for presenting search results (you may have to open up nodes before you see the relevant results)

=> from all these points I learned that fuzzy search and collection/playlist filter can not be merged easily into one thing

=> my consequence (oN) would be to add fuzzy search as a distinct, self-explanatory feature (as envisaged in my first attempt)

screenshots

of the partial implementation (what a pity that those queries took around 17s...)

query parts and how to interprete them in a fuzzy way

(old, but shows some of the difficulties I had when implementing it)

  • score:>80 (ie. all filters on numerical values) => no fuzzy interpretation => exact filtering
  • -Kelly (ie. all exlude filters) => no fuzzy interpretation => exact filtering
  • OR does OR make sense in a fuzzy filter term? (technically a bit difficult because if one part is fuzzy and the other exact we can still not do the exact part in the sql statement