This is a simple copy of the original Dynamic Collection wiki page. While the old page is mainly aimed at users, this page still contains the original developer information.
Amarok currently does not support tracks on removable devices very well. It assumes that all tracks which are part of the collection are available all the time and that their path does not change over time. Dynamic collection is the working title I just invented for a project to improve the way Amarok identifies the tracks in its collection (feel free to propose others). It should solve much of bug #87391.
Please note that this is only about including files which are not on local disks in Amarok's local collection DB in a better way. It is not about actually using remote collection databases. If I understood Andrew correctly, his SoC proposal was about just that, but by now his ideas are pretty close to mine. Andrew, maybe you have time to add your thoughts to this page.
Amarok uses an audio file's absolute path as a unique identifier. That works fine as long as Amarok's whole collection is stored in a way that makes it permanently accessible to Amarok under the same path. When parts of the collection are not stored that way, e.g. on external hard disks or network shares, well-known problems arise:
To play a file, a valid URL pointing to that file is required. The core of my idea is a change of the way Amarok stores that URL. Each URL can be split into two parts, a first part which identifies the device a file is stored on, and a second part which gives the relative path of the file on that device.
Amarok can then automagically generate a unique identifier of the device and store that identifier together with the relative path in the collection database. All files on the same device would have the same identifier, but different relative paths. Instead of a single field as primary key, Amarok would then use the device identifier and the relative path as a composite primary key.
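The composite key described above can be sketched with SQLite from Python's standard library. This is only an illustration, not Amarok's actual schema: the table name tracks and the columns deviceid and rpath are assumptions made for this example.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE tracks (
        deviceid INTEGER NOT NULL,   -- identifies the device holding the file
        rpath    TEXT    NOT NULL,   -- path relative to the device's root
        title    TEXT,
        PRIMARY KEY (deviceid, rpath)
    )
    """
)

def add_track(deviceid, rpath, title):
    """Insert one track; raises sqlite3.IntegrityError on a duplicate key."""
    with conn:
        conn.execute(
            "INSERT INTO tracks (deviceid, rpath, title) VALUES (?, ?, ?)",
            (deviceid, rpath, title),
        )

# Same device, different relative paths.
add_track(1, "music/a.ogg", "Song A")
add_track(1, "music/b.ogg", "Song B")
# The same relative path on another device is a different track.
add_track(2, "music/a.ogg", "Song A (copy on USB disk)")
```

Neither field is unique on its own; only the pair is, so inserting the same (deviceid, rpath) pair twice is rejected.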
For this to work, we need ways to identify devices. Some ideas:
"Plugins" is probably the first thing which comes to one's mind when reading that list.
There are probably better terms than device which could be used here, but it is the only one which came to my mind while writing this. It can mean, among others, partitions on external hard disks, CD-ROMs or mounted network shares.
Amarok can replace a file's absolute path with a device id and a relative path just before storing the song in the database, and generate an absolute path which is valid at that time from the device id and relative path right after retrieving the information from the collection database. This would make it quite transparent for the rest of the code.
It may prove sensible to eventually remove the current URL field, given it would be generated from the device ID and relative path, and would thus be redundant. This would be a large task (with little gain other than a slightly more logical DB schema) and would be best done after the new system is fully working.
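A rough sketch of that two-way conversion, assuming a simple table that maps device ids to their current mount points. The MOUNTS table and both function names are made up for this example and are not taken from Amarok's code.

```python
import posixpath

# Hypothetical mount table: device id -> current mount point.
# A real implementation would fill this in from device detection.
MOUNTS = {
    1: "/",
    2: "/media/usbdisk",
}

def to_device_path(abs_path, mounts=MOUNTS):
    """Split an absolute path into (device id, relative path).

    The longest mount point that contains the path wins, so a file under
    /media/usbdisk is attributed to device 2 rather than to the root device.
    """
    best_id, best_mount = None, ""
    for device_id, mount in mounts.items():
        prefix = mount.rstrip("/") + "/"
        if abs_path.startswith(prefix) and len(mount) > len(best_mount):
            best_id, best_mount = device_id, mount
    if best_id is None:
        raise ValueError(f"no known device contains {abs_path!r}")
    return best_id, posixpath.relpath(abs_path, best_mount)

def to_absolute_path(device_id, rel_path, mounts=MOUNTS):
    """Rebuild an absolute path; raises KeyError if the device is not mounted."""
    return posixpath.join(mounts[device_id], rel_path)
```

If the USB disk is later mounted somewhere else, only the mount table changes; the stored (device id, relative path) pair stays valid and to_absolute_path yields the new location.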
In the proof-of-concept patch I sent to the Amarok mailing list, the class MountPointManager encapsulates all the code to generate a device id (usually called mediaid there) and a relative path from an absolute path and vice versa. CollectionDB and QueryBuilder handle the changed DB schema as transparently as possible for the rest of the code. Generally, the existence of the dynamic collection can be hidden in the persistence layer to a very high degree.
Everywhere Amarok uses the URL or directory return value, we have to add the device id field as a return value. Additionally, when we use the URL or directory value as a filter, we have to filter using the URL/directory field (containing the relative path) and the device id field. Special case: filtering using LIKE on the URL/directory field. It is not going to work if the search string matches a part of the mount point, because the mount point is no longer part of the stored value.
We can restrict all SQL queries to the songs which are currently available by adding deviceId IN (<list of available device ids>) to the query's where clause.
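As an illustration of that where clause, the following sketch builds the IN list with bound parameters. The table layout and the function name are assumptions for this example, and SQLite merely stands in for whichever database backend is in use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified table for the example.
conn.execute("CREATE TABLE tracks (deviceid INTEGER, rpath TEXT, title TEXT)")
conn.executemany(
    "INSERT INTO tracks VALUES (?, ?, ?)",
    [
        (1, "home/me/a.ogg", "On the local disk"),
        (2, "music/b.ogg", "On the USB disk"),
        (3, "c.ogg", "On a CD"),
    ],
)

def available_tracks(conn, available_ids):
    """Return relative paths of tracks whose device is currently available."""
    if not available_ids:
        return []  # an empty IN list is not portable SQL, so short-circuit
    placeholders = ", ".join("?" for _ in available_ids)
    sql = (
        "SELECT rpath FROM tracks "
        f"WHERE deviceid IN ({placeholders}) ORDER BY deviceid"
    )
    return [row[0] for row in conn.execute(sql, list(available_ids))]
```

With devices 1 and 2 available, the track on the CD (device 3) stays in the database but is left out of the result.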
Amarok uses only the file access protocol at the moment. If I understand the news from k4m correctly, one of the devs there made it possible to play songs directly using network protocols like scp. Using a URL like scp://hostname/aDir/aSong as an example, scp://hostname simply identifies the device, and aDir/aSong is the relative path on that device.
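The split of the example URL can be sketched with Python's standard URL parser. The function name is invented for this illustration.

```python
from urllib.parse import urlsplit

def split_remote_url(url):
    """Split a network URL into (device identifier, relative path)."""
    parts = urlsplit(url)
    if not parts.scheme or not parts.netloc:
        raise ValueError(f"not a network URL: {url!r}")
    # Scheme plus host identify the device; the rest is the path on it.
    device = f"{parts.scheme}://{parts.netloc}"
    return device, parts.path.lstrip("/")
```

For scp://hostname/aDir/aSong this gives the device scp://hostname and the relative path aDir/aSong.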
ATF associates a unique value with a file to keep track of the file even if the user moves it around/renames it outside of Amarok. At the moment, ATF uses the file's unique value to update the file's primary key, the absolute file name. So it should be no problem to use ATF with a dynamic collection by using the composite primary key instead.
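A small sketch of that idea: when a file with a known unique value turns up in a new place, the composite key of its row is updated. The column name uniqueid and the function relocate are assumptions for this example, not ATF's actual implementation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: composite key plus the per-file unique value.
conn.execute(
    "CREATE TABLE tracks ("
    " deviceid INTEGER NOT NULL, rpath TEXT NOT NULL, uniqueid TEXT UNIQUE,"
    " PRIMARY KEY (deviceid, rpath))"
)
conn.execute("INSERT INTO tracks VALUES (1, 'old/song.ogg', 'abc123')")

def relocate(conn, uniqueid, new_deviceid, new_rpath):
    """Point the row with this unique value at the file's new location.

    Returns True if exactly one row was updated.
    """
    with conn:
        cur = conn.execute(
            "UPDATE tracks SET deviceid = ?, rpath = ? WHERE uniqueid = ?",
            (new_deviceid, new_rpath, uniqueid),
        )
    return cur.rowcount == 1

# The user moved the file from the local disk to the USB disk.
relocate(conn, "abc123", 2, "music/song.ogg")
```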
Thinking about remote collections (this should probably go somewhere further down the page):
We could add a user-definable name to each device in the database. That would allow us to show the user a message like "The file is stored on a CD-ROM which is currently not mounted. Please insert and mount CD <name> and try again." The same could happen for tracks on a USB drive, network share, etc.
In my opinion, this is definitely necessary for the user to be able to keep track of all the devices. One other example of where this would be necessary, aside from the above, is a device configuration/add/remove screen. --Andrewt512 09:22, 31 May 2006 (EDT)
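To illustrate how a device name could feed such a message, here is a minimal sketch. The DEVICES table, its fields and the function locate are invented for this example.

```python
# Hypothetical device table: id -> (user-defined name, kind, mount point or None).
DEVICES = {
    1: ("Laptop disk", "hard disk", "/"),
    3: ("Road Trip Mix", "CD", None),  # known to the collection, not mounted
}

def locate(device_id, rel_path, devices=DEVICES):
    """Return (absolute path, None) or (None, message for the user)."""
    name, kind, mount = devices[device_id]
    if mount is None:
        return None, (
            f"The file is stored on a {kind} which is currently not mounted. "
            f'Please insert and mount "{name}" and try again.'
        )
    return mount.rstrip("/") + "/" + rel_path, None
```

The same table could back a device configuration screen, since it already lists every device the collection knows about.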
Currently, when there are changes to tables such as the tags tables, we drop them and recreate them. This doesn't matter as everything is mounted, so we just recreate the information in the new format by scanning again.
If we had removable media devices in the collection, then it would be bad to lose the information due to a minor update of the tags table. Hence, it will be necessary to write code to upgrade after a version number bump of the tags table.
It would also be bad to lose information due to a full database rescan, which is effectively the same as above, but started by the user. In that case, we should probably only rescan the devices that are available, and leave everything else untouched.
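A sketch of such a partial rescan, assuming the scanner hands back its results per device. The table layout and the rescan function are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified table for the example.
conn.execute("CREATE TABLE tracks (deviceid INTEGER, rpath TEXT, title TEXT)")
conn.executemany(
    "INSERT INTO tracks VALUES (?, ?, ?)",
    [(1, "a.ogg", "Stale title"), (3, "c.ogg", "Track on an unmounted CD")],
)

def rescan(conn, scanned):
    """Replace the rows of the scanned devices only.

    `scanned` maps each available device id to the (rpath, title) pairs a
    fresh scan found there. Devices missing from the mapping are untouched.
    """
    with conn:  # one transaction: either every device is replaced or none
        for device_id, found in scanned.items():
            conn.execute("DELETE FROM tracks WHERE deviceid = ?", (device_id,))
            conn.executemany(
                "INSERT INTO tracks VALUES (?, ?, ?)",
                [(device_id, rpath, title) for rpath, title in found],
            )

# Only device 1 is mounted, so only its rows are rebuilt.
rescan(conn, {1: [("a.ogg", "Fresh title"), ("b.ogg", "New song")]})
```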
Amarok saves playlists as m3u files and stores a file's absolute path at the time of the playlist's creation in it. That won't work with Dynamic Collection where we try to avoid storing absolute paths anywhere. I'm not sure how to solve this problem yet.
A first thought about collections might suggest that there are simply sources that can be mounted (eg Hard Drives, USB drives, CDs, Samba) and those that can't (eg iTunes' DAAP and Ampache shares, Samba without smbfs, FTP). This, however, naively overlooks one far more important distinction: how the information is stored.
In an Ampache share, there is a database of all the music on the server. All the artists present can be listed, without having to see all the song titles. There are even IDs for every artist, album and song. It can even perform searches. In short, it does all the hard work for us.
Compare this to a USB drive, or an FTP share, where we only get to see the files and their locations and have to get the metadata ourselves.
So, there are really four different types of collections.
Mountable with database:
Non-mountable with database:
To interoperate with the Dynamic Collection ideas above, a plugin would need to:
It is, however, the second point where the real difference between collections with and without databases shows.
Without a database, there is no option but to scan all the files (eg with taglib) to generate the metadata and to add them to Amarok's DB.
With a database, however, there are many possible ways the task can be approached, all with different drawbacks:
Importing all the songs in one go into the Amarok DB by asking for the metadata of all songs from the remote DB:
Importing on demand (ie for an Artist->Album->Song view in the Collection Browser, first get the artist list, then get the album list for an artist when it is expanded. When a song is finally added to the playlist, add that song to Amarok's DB):
To integrate remote collection databases seamlessly into the rest of the collection, we should simply copy the whole remote database (if the network is fast enough to play music from the remote location, bandwidth is probably not an issue). It is almost certain that there will be an option to search in the whole collection instead of only the active collection (active collection: the songs which are stored on devices that are mounted/accessible at the time of the search). To give the user a consistent view on the collection, we have to import the whole remote DB into Amarok's local DB: the user expects to be able to search for all the files in the remote database even if he is not connected to it, just like he is able to do with, for example, unmounted CDs or USB drives. --Mkossick 18:50, 31 May 2006 (EDT)
Personally, I favour Importing on Demand: it has lower bandwidth requirements, and I think we can assume the latency to be reasonable for a remote collection. The problem with all remote collections is that you can't tell if things have been changed. With Importing on Demand, you shrug and carry on until you can see for certain that something has been changed, then you make the change yourself as best you can (realistically, you only remove things). With the Import-It-All approach, you need to resynchronise periodically. If you have lots of remote collections (I personally am on a university network; there are tens of people running iTunes), that's going to be a lot of downloading. That means it could well feel slower to the user, especially when they first try out the features (and give up because it takes 10 minutes to scan and eats half of their bandwidth whilst doing so). --Andrewt512 18:41, 10 June 2006 (EDT)
I think we need both options. For things that have a remote database that can be easily queried, there is no need to make a persistent local copy of the database. Other collections might require just that, however. A plugin interface was mentioned earlier and I definitely think that is the way to go. We might, however, need two different kinds of plugins: some that copy all info about the collection to a local database, and some that act as a proxy for a remote database. How this fits together architecturally, however, I have no idea at the moment! --Freespirit 13:22, 12 June 2006 (EDT)
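The two approaches discussed above can be contrasted in a small sketch. Everything here is hypothetical: FakeRemoteDB stands in for a remote collection database and counts round trips, OnDemandBrowser plays the proxy role, and import_all plays the copy-everything role. It is not a proposal for the actual plugin interface.

```python
class FakeRemoteDB:
    """Stand-in for a remote collection database; counts round trips."""

    def __init__(self, data):
        self._data = data  # {artist: {album: [song, ...]}}
        self.requests = 0

    def artists(self):
        self.requests += 1
        return sorted(self._data)

    def albums(self, artist):
        self.requests += 1
        return sorted(self._data[artist])


class OnDemandBrowser:
    """Proxy-style access: fetch each level only when the user expands it."""

    def __init__(self, remote):
        self._remote = remote
        self._artists = None
        self._albums = {}

    def artists(self):
        if self._artists is None:
            self._artists = self._remote.artists()
        return self._artists

    def albums(self, artist):
        if artist not in self._albums:
            self._albums[artist] = self._remote.albums(artist)
        return self._albums[artist]


def import_all(remote):
    """Copy-style access: pull the whole artist/album tree in one go."""
    return {artist: remote.albums(artist) for artist in remote.artists()}
```

Browsing one artist on demand costs two requests regardless of the collection's size, while import_all needs one request per artist plus one for the artist list. In exchange, the imported copy remains searchable when the remote side is unreachable.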
To be continued...
(Reminders of topics still to be covered: Searching remote collections)