This page documents the rough roadmap for the KDE Workspaces 4.11 release.
Nepomuk currently creates a new graph for each transaction, which results in a very large number of graphs. We can reduce this by capping the total number of graphs, e.g. by giving each application its own graph -
If two applications choose to store the same data, that data will then exist in multiple graphs at the same time.
This should result in a massive speedup of removeDataByApp and an overall simplification of the data management code.
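The effect on removeDataByApp can be sketched as follows. This is an illustrative model, not Nepomuk's real API: if every application writes into its own named graph, removing an app's data reduces to dropping a single graph instead of scanning every per-transaction graph for that app's statements.

```python
class GraphStore:
    def __init__(self):
        self.graphs = {}  # app name -> set of triples in that app's graph

    def store(self, app, triple):
        # Two apps storing the same triple keep independent copies,
        # one per graph, as described above.
        self.graphs.setdefault(app, set()).add(triple)

    def remove_data_by_app(self, app):
        # One dictionary deletion replaces a walk over many graphs.
        self.graphs.pop(app, None)

store = GraphStore()
store.store("fileindexer", ("nepomuk:/res/1", "nie:url", "file:///a.txt"))
store.store("dolphin", ("nepomuk:/res/1", "nie:url", "file:///a.txt"))
store.remove_data_by_app("fileindexer")
print(sorted(store.graphs))  # dolphin's copy survives
```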
The FileWatcher currently has only one backend, based on inotify, which provides all the functionality we need, except that it has to install a very large number of watches.
It might be better to add more backends based on the features they provide, and allow users to run a mix of them. Possible backends are -
One could allow users to choose which backend combination they want. A good default might be kio + inotify (without move events); this way the number of inotify watches would stay very low (depending on the number of indexed directories).
One could even use fanotify + kio, which would require only a single fanotify watch for file creation and deletion monitoring.
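The backend mixing above can be modeled as a simple union of capabilities. The backend names follow the text, but the capability sets here are illustrative assumptions, not the real FileWatcher API:

```python
# Hypothetical capability model for mixing watcher backends.
BACKENDS = {
    "inotify":  {"create", "delete", "modify", "move"},  # one watch per directory
    "fanotify": {"create", "delete"},                    # a single global watch
    "kio":      {"create", "delete", "move"},            # only KDE-initiated operations
}

def combined_capabilities(chosen):
    """Union of what the chosen backends can report."""
    caps = set()
    for name in chosen:
        caps |= BACKENDS[name]
    return caps

# e.g. the "kio + inotify" default mentioned above:
print(sorted(combined_capabilities(["kio", "inotify"])))
```

Picking a combination then becomes a matter of choosing the cheapest set of backends whose union covers the events the user cares about.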
The Nepomuk Cleaner currently has a horrible GUI. Considering that it is an important tool, it should be improved as follows -
Divide the jobs based on -
The Data Migration task should only ever need to run once. The others can be run whenever the user wants. Ideally there would be some kind of QWizard-based UI where the user could choose exactly which jobs to run, and maybe even review the data before acting on it.
Nepomuk should ship with some simple tools for controlling it. The most important ones are -
NepomukShow already exists in vhanda's scratch repo. It should be polished and then shipped with nepomuk-core.
Currently every Nepomuk service runs inside a nepomukservicestub executable. This makes it hard for users to provide accurate debugging information, because the process name they see is nepomukservicestub. It also makes it harder to apply per-service optimizations.
It might be better to move towards each service having its own process. This would have the following benefits -
Currently all services are KApplications because the file indexer needs to create an instance of KIdleTime, which requires widgets. This wastes a lot of memory loading QWidget code that is simply not required.
With this change we could also rename the nepomukserver to nepomuk_control or similar. It makes no sense to call something a server when it does not do the job of one.
The Nepomuk shell currently does not provide much value. It should ideally help with debugging Nepomuk, which means providing a list of running queries and logs showing which data management functions have been called.
Preferably something like this -
One could then select an individual operation to see exactly what data was passed into storeResources and what the result of the operation was.
=Pure Socket Communication=
Currently clients use a mix of D-Bus and a local socket to communicate with the Nepomuk Storage. This is not good, especially since D-Bus was not designed for bulk data transfer, yet we use it to transfer query results (QueryServiceClient) and indexing data (StoreResources).
There is already a rough proof of concept of this in nepomuk and soprano, in the customCommandsOverSockets branch. It works, but it is messy because it mixes blocking and non-blocking communication over the same socket.
Maybe we should move to completely asynchronous socket communication? How would we go about that? It would require moving away from the Soprano Model concept - can we really do that?
Maybe we could create our own client-server communication layer and not use Soprano at all? How would that help?
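As a thought experiment, a fully asynchronous protocol mostly needs a message framing scheme so that requests and responses can be matched without blocking. The length-prefixed framing below is an assumption for illustration, not the customCommandsOverSockets wire format:

```python
import socket
import struct

def send_message(sock, payload: bytes):
    # 4-byte big-endian length prefix, then the payload.
    sock.sendall(struct.pack(">I", len(payload)) + payload)

def recv_message(sock) -> bytes:
    def recv_exact(n):
        buf = b""
        while len(buf) < n:
            chunk = sock.recv(n - len(buf))
            if not chunk:
                raise ConnectionError("peer closed the socket")
            buf += chunk
        return buf
    (length,) = struct.unpack(">I", recv_exact(4))
    return recv_exact(length)

# Local demo over a socketpair standing in for the client/storage connection:
client, server = socket.socketpair()
send_message(client, b"executeQuery SELECT ?r WHERE { ?r a nfo:FileDataObject }")
print(recv_message(server))
```

Because every message carries its own length, either side can read frames as they arrive from an event loop, with no need to block a Soprano Model call while a query streams its results.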
The ResourceIdentifier really needs some unit tests to make sure things are being identified properly, especially resources such as emails, series, seasons, etc. We have all noticed that it acts a little strange at times.
The File Indexer needs the following changes -
Each plugin should be able to set its version number and indicate whether files should be reindexed when the plugin is updated. We can do this quite easily, since each plugin operates on a fixed set of mimetypes.
One could store the following info in the nepomukstrigirc -
 [Plugins]
 ffmpegPlugin=0.1
 taglibPlugin=0.1
When a plugin's version number changes, we remove all the indexed data for files with those mimetypes and then reindex them.
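The version check can be sketched as below. The config format mirrors the nepomukstrigirc example above; the plugin-to-mimetype mapping is a hypothetical stand-in for whatever each plugin actually declares:

```python
import configparser

# Hypothetical: which mimetypes each plugin is responsible for.
PLUGIN_MIMETYPES = {
    "ffmpegPlugin": {"video/mp4", "video/x-matroska"},
    "taglibPlugin": {"audio/mpeg", "audio/flac"},
}

def mimetypes_to_reindex(stored_cfg: str, current_versions: dict) -> set:
    """Compare stored plugin versions against the installed ones and
    return the mimetypes whose files must be dropped and reindexed."""
    cfg = configparser.ConfigParser()
    cfg.read_string(stored_cfg)
    stale = set()
    for plugin, version in current_versions.items():
        if cfg.get("Plugins", plugin, fallback=None) != version:
            # Version bump: reindex every file of this plugin's mimetypes.
            stale |= PLUGIN_MIMETYPES.get(plugin, set())
    return stale

stored = "[Plugins]\nffmpegPlugin=0.1\ntaglibPlugin=0.1\n"
print(sorted(mimetypes_to_reindex(stored, {"ffmpegPlugin": "0.2",
                                           "taglibPlugin": "0.1"})))
```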
Different plugins can support a number of mimetypes. Each plugin should provide a number indicating how confident it is in handling a given mimetype.
For example, both taglib and ffmpeg can handle mp4 files, but ffmpeg handles them better. However, some distros do not ship the ffmpeg extractors.
Okular handles a number of document types, but it uses QWidgets, which could pop up and ask for a password. We need to rework Okular to let us use its backends directly.
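Confidence-based selection could work as sketched here. The scores and plugin names are illustrative assumptions; the source only says that ffmpeg handles mp4 better than taglib and that ffmpeg may be absent:

```python
# Hypothetical per-mimetype confidence scores reported by each plugin.
CONFIDENCE = {
    "ffmpegPlugin": {"video/mp4": 90, "audio/mpeg": 40},
    "taglibPlugin": {"video/mp4": 50, "audio/mpeg": 90},
}

def pick_plugin(mimetype, installed):
    """Pick the installed plugin with the highest confidence for a mimetype,
    or None if no installed plugin can handle it."""
    candidates = [
        (CONFIDENCE[p].get(mimetype, 0), p)
        for p in installed
        if CONFIDENCE[p].get(mimetype, 0) > 0
    ]
    return max(candidates)[1] if candidates else None

print(pick_plugin("video/mp4", ["ffmpegPlugin", "taglibPlugin"]))  # ffmpegPlugin
# A distro without the ffmpeg extractor falls back to taglib:
print(pick_plugin("video/mp4", ["taglibPlugin"]))  # taglibPlugin
```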
We need to ship this with 4.11. What needs to be done?
Nepomuk users never know what is going on. We need to show them detailed information about what the system is doing. Jorg already has some mockups, but they could use more work. We need to compile a list of what needs to be done.
Currently the Query Service reruns each query when relevant data changes. Re-running the entire query is not exactly practical. We could instead append this to the query -
 ?r nao:lastModified ?m . FILTER( ?m > DateTimeOfLastQueryExecution ) .
So that we only get query results modified after the last execution. This obviously doesn't handle data removal.
Data changes can be divided into 2 parts -
We can easily handle data addition with the nao:lastModified trick, and complete resource deletion can also be handled easily. But how do we handle removal of specific data from a resource? I don't think this can be done generically.
One way of handling it is to keep track of which resources had some data removed and re-run the query only against those resources - FILTER( ?r IN (<..>, <...>) ) . If the list is too long, we can just re-run the entire query.
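The two delta queries described above can be sketched as plain string rewriting. This is an illustration of the idea only; a real service would build proper query objects, and the variable names are assumptions:

```python
def additions_query(base_where: str, last_run_iso: str) -> str:
    # Only resources modified since the last execution can add new results.
    return ("SELECT ?r WHERE { " + base_where +
            " ?r nao:lastModified ?m . " +
            f'FILTER( ?m > "{last_run_iso}"^^xsd:dateTime ) . }}')

def removal_check_query(base_where: str, touched: list) -> str:
    # Re-run the query only against resources known to have lost data;
    # results that drop out of this set are removals.
    uris = ", ".join(f"<{u}>" for u in touched)
    return ("SELECT ?r WHERE { " + base_where +
            f" FILTER( ?r IN ({uris}) ) . }}")

print(additions_query("?r a nfo:FileDataObject .", "2013-01-01T00:00:00Z"))
print(removal_check_query("?r a nfo:FileDataObject .", ["nepomuk:/res/abc"]))
```

Keeping additions and removal checks as separate queries avoids conflating the two cases; the fallback to a full re-run applies only when the touched-resource list grows too long.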