KDE Linux/Delta

From KDE Community Wiki

This page discusses the elusive delta updates.

A delta update uses a binary delta file between two distinct versions of a file to construct the new version from only the old version and the delta. In our case: if you are on 1.erofs and want to update to 2.erofs, you need a delta file that bridges the gap; the client system can then take 1.erofs and apply the delta to produce 2.erofs. The notable advantage is that you only need to download the changed portions of 2.erofs, not the whole file. The notable disadvantage is that you need special preparation on the server and special support on the client.
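The "old + delta = new" idea can be sketched in a few lines of Go. This is a toy illustration, not the format any real delta tool uses: the `Patch` type and `applyDelta` function are invented for this example, and real delta formats (bsdiff, xdelta, casync) are far more compact.

```go
package main

import (
	"fmt"
)

// Patch is a toy stand-in for one entry in a binary delta: overwrite
// len(Data) bytes of the output starting at Offset. Real delta
// formats are far more compact and also encode copies from arbitrary
// positions in the old file.
type Patch struct {
	Offset int
	Data   []byte
}

// applyDelta reconstructs the new version using only the old version
// and the delta.
func applyDelta(old []byte, newLen int, patches []Patch) []byte {
	out := make([]byte, newLen)
	copy(out, old) // start from the old content
	for _, p := range patches {
		copy(out[p.Offset:], p.Data)
	}
	return out
}

func main() {
	v1 := []byte("1.erofs: AAAA BBBB CCCC")
	// Only the middle block changed between the versions, so the
	// delta carries just those four bytes.
	delta := []Patch{{Offset: 9, Data: []byte("XXXX")}}
	v2 := applyDelta(v1, len(v1), delta)
	fmt.Println(string(v2)) // 1.erofs: XXXX BBBB CCCC
}
```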

A delta download is somewhat orthogonal to that and means you only download the parts you are missing. It does not necessarily have to be a binary delta from the server! This is, for example, how a torrent works: the file gets split into chunks, each with a checksum; the server ships a bill of all chunks (in the case of a torrent, that information is inside the torrent file itself); and if the client already has a chunk, it doesn't download it again. This could technically be utilized here as well if one uses 1.erofs as "input" for a download of 2.erofs. The challenge is that there needs to be sizable chunk overlap for this to have any benefit, which in turn requires the files to be reproducible: 1.erofs must not have a randomized data order, and 2.erofs needs to somewhat "increment" on it. Otherwise there will be little to no chunk overlap and the entire method becomes pointless. That is unfortunately what Harald observed with our current erofs. It's possible that identical chunks still exist but move around, but torrent clients probably don't account for that.
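Measuring chunk overlap between two image versions can be sketched as follows. This uses naive fixed-size chunks for brevity; real tools like casync/desync use content-defined chunking so an insertion doesn't shift every subsequent chunk boundary. The function names are invented for this sketch.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// chunkSums splits data into fixed-size chunks and returns the set of
// chunk checksums. (Fixed-size chunking is a simplification; casync
// and friends cut chunks at content-defined boundaries.)
func chunkSums(data []byte, size int) map[[32]byte]bool {
	sums := make(map[[32]byte]bool)
	for off := 0; off < len(data); off += size {
		end := off + size
		if end > len(data) {
			end = len(data)
		}
		sums[sha256.Sum256(data[off:end])] = true
	}
	return sums
}

// overlap counts how many chunks of the new version already exist in
// the old one -- the portion a delta download could skip.
func overlap(oldSums, newSums map[[32]byte]bool) (shared, total int) {
	for sum := range newSums {
		total++
		if oldSums[sum] {
			shared++
		}
	}
	return
}

func main() {
	v1 := []byte("aaaabbbbccccdddd")
	v2 := []byte("aaaabbbbXXXXdddd") // one 4-byte chunk changed
	shared, total := overlap(chunkSums(v1, 4), chunkSums(v2, 4))
	fmt.Printf("%d of %d chunks reusable\n", shared, total) // 3 of 4
}
```

With a non-reproducible image layout the `shared` count collapses toward zero, which is exactly the problem observed with the current erofs output.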

On an implementation level we have the problem that neither we nor systemd support either method. While there are plans to have systemd provide a more pluggable download facility, it does not exist yet, so it may make sense to ponder add-on software that can sit between systemd and the server to inject delta behavior.

As for generating binary deltas, the following tools seem popular:

Upstream systemd is trying to secure funding and ideally will solve all of this for us: https://github.com/systemd/systemd/issues/33351


kde-linux-sysupdated

To implement delta downloads we've created a special daemon that sits between systemd-sysupdate and the actual remote server. It lives at https://invent.kde.org/kde-linux/kde-linux-sysupdated and is based on https://github.com/folbricht/desync. The general feature set is explained in this blog post.

It's largely a stopgap measure until systemd can provide us better anchor points to hook into the update process. The fundamental process would remain largely the same with a future systemd expansion, though.

The basic idea is that the erofs is divided into small pieces called chunks, each with a checksum. The chunks are then recorded in an associated index file ending in .caibx. Given at least one index file and one erofs file on disk, we can calculate a delta download between what we have on disk and what is available on the server.

Range Requests

One particular implementation of this, and the currently available one, is based on HTTP range requests. The client downloads the index of the new version and (currently) generates an ad-hoc index of all erofs versions on disk. It then simulates the construction of the new version chunk by chunk, based on what is available in the indexes. All chunks that are already present locally are picked up from the local files; all new chunks are downloaded from the server using HTTP range requests for just that specific byte range. This makes the downloads relatively small.

Content Addressable Store

A future variant of this would actually split the erofs file itself into chunks, removing the need for the complete file. All chunks would be put into a directory tree based on their checksum, called a store. If a chunk is already present it doesn't need updating. Only new chunks are added to the store.

The client works in much the same way, except that instead of HTTP range requests it can issue plain GET requests against the relevant chunk's store location. So the client downloads the index, takes the local indexes, and builds a plan for what needs picking from where, as per usual. When it comes time to download, instead of issuing a range request it takes the checksum of the chunk from the index and downloads the chunk at `kde-linux/sysupdate/v3/asdf/asdfghkl` using a regular GET request.

The content store itself carries no signatures because the assembled file must match the SHA-256 from our SHA256SUMS file, and that file in turn is signed. That is to say: the intermediate blobs need no signing because the assembled output eventually has to match a trusted checksum.

Pluggable systemd

Ideally systemd would allow us to plug directly into the update process so that our code is actually in charge of assembling the file. This would allow more efficient data copying from the existing files, plus concurrent assembly of local and remote parts: because we know how many chunks there are and where they go, we can allocate the whole file and fill chunks into it concurrently. This is part of the reason Go is used for this project, for its ease of threading.

Try it out

To try out KDE Linux's experimental delta updates feature, run:

sudo systemctl enable --now kde-linux-sysupdated.socket

Then reboot, and apply updates normally using Discover or updatectl.

The team is especially interested in usage feedback from people whose storage is not an NVMe SSD, such as SATA SSDs and especially hard disks.