Projects/Nepomuk/4.11
Revision as of 15:52, 5 February 2013
This page documents the rough roadmap for the KDE Workspaces 4.11 release.
- 1 Graph Mergers
- 2 File Watcher
- 3 Cleaner GUI
- 4 Nepomuk Tools
- 5 Nepomuk Service Management
- 6 Nepomuk Shell
- 7 Pure Socket Communication
- 8 Resource Identifier Unit tests
- 9 File Indexer
- 10 Web-Miner
- 11 Controller UI
- 12 Query Change Monitoring
Graph Mergers
Nepomuk currently creates a new graph during each transaction, which results in a large number of graphs. We can reduce this number by limiting each application to two graphs -
- 1 graph per application
- 1 discard-able graph per application
If two applications choose to store the same data, then that data will exist in multiple graphs at the same time.
This should make removeDataByApp much faster and simplify the data management code overall.
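A rough sketch of the idea, as a toy in-memory model rather than the actual storage code. The class, the method names and the application names are made up for illustration. It shows why removing an application's data becomes cheap: it is only a matter of dropping that application's two graphs.

```python
from collections import defaultdict


class GraphStore:
    """Toy in-memory model: every application owns at most two graphs."""

    def __init__(self):
        # (application, discardable) -> set of (subject, predicate, object)
        self.graphs = defaultdict(set)

    def add(self, app, triple, discardable=False):
        self.graphs[(app, discardable)].add(triple)

    def remove_data_by_app(self, app):
        # Dropping an application's data is just dropping its two graphs.
        self.graphs.pop((app, False), None)
        self.graphs.pop((app, True), None)

    def contains(self, triple):
        return any(triple in triples for triples in self.graphs.values())


store = GraphStore()
triple = ("file:///a.mp3", "nie:title", "Song")
store.add("amarok", triple)
store.add("dolphin", triple)          # same data, stored in a second graph
store.remove_data_by_app("amarok")
still_there = store.contains(triple)  # the copy in dolphin's graph survives
```

Note that the same triple lives in two graphs here, which matches the duplication mentioned above.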
File Watcher
The FileWatcher currently has only one backend, inotify, which provides all the functionality that we need. The drawback is that it requires installing a large number of watches.
It might be better to add more backends, grouped by the features they provide, and allow users to run a mix of them. Possible backends are -
- inotify - Provides all functionality
- fanotify - Provides new file creation / deletion
- kio - Provides file moves ONLY for KDE
One could allow users to choose which backend combination they want. A good default might be kio + inotify (without moves). This would also keep the number of inotify watches very low (depending on the number of indexed directories).
Alternatively, one could use fanotify + kio, which would require only 1 fanotify watch for monitoring file creation and deletion.
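To make the combinations concrete, here is a small sketch that checks whether a chosen set of backends covers every event the file watcher needs. The event names and the helper functions are assumptions for illustration, and the capability table only mirrors the list above.

```python
# Capabilities as listed above; the event names are illustrative.
BACKENDS = {
    "inotify": {"create", "delete", "move"},
    "fanotify": {"create", "delete"},
    "kio": {"move"},  # moves are only seen when done through KDE
}
REQUIRED = {"create", "delete", "move"}


def covered_events(selection):
    """Union of the events that the selected backends can report."""
    events = set()
    for name in selection:
        events |= BACKENDS[name]
    return events


def is_complete(selection):
    """True if the selection reports every event the file watcher needs."""
    return REQUIRED <= covered_events(selection)


fanotify_kio = is_complete(["fanotify", "kio"])
fanotify_only = is_complete(["fanotify"])
```

A check like this could be used to warn the user when a chosen combination misses an event type.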
Cleaner GUI
The Nepomuk Cleaner currently has a horrible GUI. Considering that it is an important tool, the GUI should be improved by dividing the jobs into the following categories -
- Data Migration
- Duplicate detection
- Invalid data removal
- Data removal
The Data Migration task should only need to be run once. The others can be run whenever the user wants. Ideally, there would be a QWizard-based UI where the user can choose exactly which jobs to run, and maybe even review the data before acting on it.
Nepomuk Tools
Nepomuk should ship with some simple tools to control and inspect it. The most important ones are -
- NepomukCtl - A simple tool to start, stop and restart Nepomuk and any of its services.
- NepomukShow - A simple tool to dump the Nepomuk data onto the terminal. Asking users to run queries by hand is hard, so it's better to have a specialized tool for this.
NepomukShow already exists in vhanda's scratch repo. It should be polished and then shipped with nepomuk-core.
Nepomuk Service Management
Currently each Nepomuk Service is run under a nepomukservicestub executable. This makes it hard for users to provide accurate debugging information because the process name they see is nepomukservicestub. It also makes per-service optimizations harder.
It might be better to move towards each service having its own process. This would have the following benefits -
- Each Nepomuk service would have its own easily recognizable process.
- The service could be a KApplication or QCoreApplication depending on its needs.
Currently, all services are KApplications because the file indexer needs to create an instance of KIdleTime, which requires widgets. This results in a lot of extra memory being spent on loading QWidget code that is simply not required.
Also, with this we can rename the nepomukserver to nepomuk_control or something. It's stupid to call something the server when it doesn't do the job of a server.
Nepomuk Shell
The Nepomuk shell currently does not provide much value. Ideally it should help with debugging Nepomuk. That means providing a list of running queries, and a log showing which data management functions have been called.
Preferably something like this -
- StoreResources - With this graph
- AddProperty - resUri - property - value
- SetProperty - resUri - property - value
One can then select an individual operation to see what data was passed into it (for example into StoreResources) and what the result of the operation was.
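The log described above could be backed by something as simple as the following sketch. The class and field names are hypothetical and not part of any existing Nepomuk API. Each entry keeps the call name, its arguments and the result, so the shell can show a one-line summary and the details on selection.

```python
from dataclasses import dataclass, field


@dataclass
class Operation:
    name: str            # e.g. "StoreResources", "AddProperty"
    args: tuple          # what was passed into the call
    result: str = "ok"   # outcome, shown when the entry is selected

    def summary(self):
        return " - ".join([self.name, *map(str, self.args)])


@dataclass
class OperationLog:
    entries: list = field(default_factory=list)

    def record(self, name, *args, result="ok"):
        self.entries.append(Operation(name, args, result))

    def summaries(self):
        return [op.summary() for op in self.entries]


log = OperationLog()
log.record("AddProperty", "resUri", "property", "value")
log.record("SetProperty", "resUri", "property", "value")
lines = log.summaries()
```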
Pure Socket Communication
Currently we use a mix of dbus and a local socket for clients to communicate with the Nepomuk Storage. This is not good, especially since dbus was not designed for data transfer, yet we use it to transfer query results (QueryServiceClient) and indexing data (StoreResources).
There is already a rough proof of concept of this in nepomuk and soprano in the branch customCommandsOverSockets. It works, but it's messy because it involves both blocking and non-blocking communication over the same socket.
Maybe we should move to a completely asynchronous socket communication? How would we go about that? It would require moving away from the Soprano Model concept. Can we really do that?
Maybe we could create our own client-server communication and not use Soprano? How would that help?
Resource Identifier Unit tests
The ResourceIdentifier really needs some unit tests to make sure resources are getting identified properly, especially for data such as emails, series, seasons, etc. We have all noticed that it acts a little strange at times.
File Indexer
The File Indexer needs the following changes -
Better Plugin System
Plugin based versioning
Each plugin should be able to set its version number and indicate whether its files need to be reindexed when the plugin is updated. We can do this very easily, considering that each plugin operates on a set of mimetypes.
One could store the following info in the nepomukstrigirc -
[Plugins]
ffmpegPlugin=0.1
taglibPlugin=0.1
When the plugin version number is updated, we remove all the indexed data for the files with those mimetypes and then reindex them.
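The version check itself is small. Below is a sketch that compares the stored [Plugins] section against the versions the plugins report now. The function name and the way current versions are passed in are assumptions, and Python's configparser merely stands in for KConfig here.

```python
import configparser


def plugins_to_reindex(stored_ini, current_versions):
    """Return plugins whose current version differs from the stored one."""
    config = configparser.ConfigParser()
    config.optionxform = str  # keep key case, e.g. "ffmpegPlugin"
    config.read_string(stored_ini)
    stored = dict(config["Plugins"]) if config.has_section("Plugins") else {}
    return sorted(
        name for name, version in current_versions.items()
        if stored.get(name) != version
    )


stored_ini = "[Plugins]\nffmpegPlugin=0.1\ntaglibPlugin=0.1\n"
current = {"ffmpegPlugin": "0.2", "taglibPlugin": "0.1"}
outdated = plugins_to_reindex(stored_ini, current)
```

For every plugin in the returned list, the indexed data for its mimetypes would be removed and those files queued for reindexing. A plugin that is missing from the stored config counts as changed.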
Plugin mimetype priority
Different plugins can support a number of mimetypes. Each plugin should give a number indicating how confident it is in handling each mimetype.
For example, both taglib and ffmpeg can handle mp4 files, but ffmpeg handles them better. However, some distros do not ship ffmpeg extractors.
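A sketch of how the priorities could be used to pick a plugin. The confidence numbers are invented, and the table layout and function name are assumptions. The point is only that the most confident installed plugin wins, with a fallback when the preferred one is not shipped.

```python
# plugin -> {mimetype: confidence}; the numbers are made up for illustration.
PLUGINS = {
    "ffmpeg": {"video/mp4": 90},
    "taglib": {"video/mp4": 50, "audio/mpeg": 90},
}


def best_plugin(mimetype, installed):
    """Pick the installed plugin that is most confident about the mimetype."""
    candidates = [
        (PLUGINS[name][mimetype], name)
        for name in installed
        if mimetype in PLUGINS.get(name, {})
    ]
    return max(candidates)[1] if candidates else None


with_ffmpeg = best_plugin("video/mp4", ["taglib", "ffmpeg"])
without_ffmpeg = best_plugin("video/mp4", ["taglib"])
```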
Okular based Indexer
Okular handles a number of document types. However, it uses QWidgets which could pop up and ask for a password. We need to rework Okular to allow us to use its backends directly.
Web-Miner
We need to ship this with 4.11. What needs to be done?
Controller UI
Nepomuk users never know what is going on. We need to show them detailed info about what is happening. Jorg already has some mockups, but they could use more work. We need to make a list of what needs to be done.
Query Change Monitoring
Currently the Query Service reruns each query when relevant data changes. Re-running the entire query is not exactly practical. We could instead append this to the query -
?r nao:lastModified ?m . FILTER( ?m > DateTimeOfLastQueryExecution ) .
This way we only get results that were modified after the last query execution. This obviously doesn't handle data removal.
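As a sketch, appending the filter could look like the following. The function name and the surrounding select statement are assumptions. It also assumes that the query's graph pattern binds ?r and that the nao and xsd prefixes are known to the store.

```python
from datetime import datetime, timezone


def incremental_query(graph_pattern, last_run):
    """Restrict a graph pattern on ?r to resources modified after last_run."""
    stamp = last_run.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    literal = f'"{stamp}"^^xsd:dateTime'
    return (
        "select distinct ?r where { "
        f"{graph_pattern} "
        f"?r nao:lastModified ?m . FILTER( ?m > {literal} ) . "
        "}"
    )


last_run = datetime(2013, 2, 5, 15, 52, tzinfo=timezone.utc)
query = incremental_query("?r a nfo:FileDataObject .", last_run)
```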
Data changes can be divided into 2 parts -
- Data addition
- Data removal
  - Specific data is removed
  - The entire resource is removed
We can easily handle data addition with the nao:lastModified trick, and removal of an entire resource can also be handled easily. How do we handle specific data removal? I don't think this can be done generically.
One way of handling this is to keep track of which resources had some data removed and re-run the query only against those resources - FILTER( ?r in (<..>, <...>) ) . If the list is too long, maybe we just re-run the entire query?
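A sketch of that fallback logic. The cut-off value, the function names and the example URIs are hypothetical. Resources that were in the previous result set, had data removed, and no longer match the restricted query are the ones to drop from the results.

```python
MAX_FILTER_SIZE = 50  # hypothetical cut-off, would need tuning


def recheck_query(graph_pattern, touched, limit=MAX_FILTER_SIZE):
    """Query to run after removals, or None if nothing was touched."""
    if not touched:
        return None
    base = f"select distinct ?r where {{ {graph_pattern} "
    if len(touched) > limit:
        return base + "}"  # list too long: re-run the entire query
    uris = ", ".join(f"<{uri}>" for uri in sorted(touched))
    return base + f"FILTER( ?r in ({uris}) ) . }}"


def results_to_drop(previous_results, touched, still_matching):
    """Previously matching resources that lost data and no longer match."""
    return (previous_results & touched) - still_matching


pattern = "?r a nfo:FileDataObject ."
restricted = recheck_query(pattern, {"nepomuk:/res/b", "nepomuk:/res/a"})
full = recheck_query(pattern, {"x", "y", "z"}, limit=2)
```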