Mika's Digital Garden

August 7

Generic Diff Format

A few years ago I stumbled across a specification published by the W3C: the Generic Diff Format, or GDIFF for short. It defines a binary format for describing the difference between two streams of binary data. Since any data (strings, images and so on) can be represented as a stream of bytes, GDIFF can be used to represent the difference between any two streams of data.

For example, if you have two streams of data, B1 and B2, you can compute the difference between them and store it as a GDIFF, D. Applying D to B1 then reproduces B2. If the algorithm that produces the difference is good and the two streams have a lot in common, D should be much smaller than either B1 or B2. If B1 and B2 are totally different, D will be at least as large as B2.
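
To make the idea of "applying D to B1" concrete, here is a minimal sketch of a GDIFF reader in Python. The opcode layout (magic number 0xd1ffd1ff, version 4, DATA commands 1 to 248, COPY commands 249 to 255, big-endian operands) follows my reading of the W3C note and should be checked against the spec before relying on it. The function name apply_gdiff is made up for this illustration and is not the API of the library described further down.

```python
import struct

MAGIC = b"\xd1\xff\xd1\xff"   # 4-byte magic number that opens a GDIFF stream
VERSION = 4                    # 1-byte version number that follows the magic

# COPY opcodes mapped to the big-endian struct layout of (position, length).
COPY_FORMATS = {
    249: ">HB", 250: ">HH", 251: ">Hi",
    252: ">iB", 253: ">iH", 254: ">ii", 255: ">qi",
}


def apply_gdiff(source: bytes, patch: bytes) -> bytes:
    """Rebuild the target stream from `source` and a GDIFF `patch`."""
    if patch[:4] != MAGIC or patch[4:5] != bytes([VERSION]):
        raise ValueError("not a GDIFF version 4 stream")
    out = bytearray()
    i = 5
    while i < len(patch):
        op = patch[i]
        i += 1
        if op == 0:                       # end-of-file command
            return bytes(out)
        if op >= 249:                     # COPY: take bytes from the source
            fmt = COPY_FORMATS[op]
            pos, length = struct.unpack_from(fmt, patch, i)
            i += struct.calcsize(fmt)
            out += source[pos:pos + length]
            continue
        if op <= 246:                     # DATA: the opcode itself is the length
            length = op
        elif op == 247:                   # DATA: unsigned 16-bit length
            (length,) = struct.unpack_from(">H", patch, i)
            i += 2
        else:                             # op == 248, DATA: 32-bit length
            (length,) = struct.unpack_from(">i", patch, i)
            i += 4
        out += patch[i:i + length]        # literal bytes stored in the patch
        i += length
    raise ValueError("patch ended without an end-of-file command")


# B1 is "ABCDEFG"; the patch copies "AB", inserts "XY", then copies "CD".
b1 = b"ABCDEFG"
d = MAGIC + bytes([VERSION, 249, 0, 0, 2, 2]) + b"XY" + bytes([249, 0, 2, 2, 0])
b2 = apply_gdiff(b1, d)   # b"ABXYCD"
```

The patch only carries the two new bytes "XY" as literal data. Everything else is expressed as references into B1, which is where the size saving comes from.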

Usage Scenarios

Probably the most useful area for GDIFF is versioning applications, such as document and content management systems, source code storage and so on. In such systems you could use GDIFF to store different versions of the same document as reverse deltas, where only the latest version of a document is stored in full. To get to an older version, you take the full (latest) version and apply one or more GDIFFs to it in sequence, one per version you step back.

For instance, say you have a document with four versions, v1 through v4. To get to each version, you apply the following logic.

  • v4: This is the only version that is stored in full, so you simply read it.
  • v3: Get v4 and apply one GDIFF to v4 to produce v3.
  • v2: Get v4 and apply the GDIFF that produces v3. Then take the GDIFF for v2 and apply it to v3.
  • v1: Get v4 and apply the GDIFF that produces v3. Then take the GDIFF for v2 and apply it to v3. Finally, take the GDIFF for v1 and apply it to v2.
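
The chain above can be sketched in a few lines of Python. To keep the example short, a delta here is a plain list of ("copy", position, length) and ("data", bytes) tuples instead of a real GDIFF byte stream. That representation and the names apply_delta and get_version are assumptions made for this illustration only.

```python
def apply_delta(source: bytes, delta: list) -> bytes:
    """Apply a simplified COPY/DATA delta (a stand-in for a real GDIFF)."""
    out = bytearray()
    for op in delta:
        if op[0] == "copy":
            _, pos, length = op
            out += source[pos:pos + length]   # reuse bytes from the source
        else:
            out += op[1]                      # literal bytes from the delta
    return bytes(out)


def get_version(latest: bytes, reverse_deltas: list, number: int) -> bytes:
    """Rebuild version `number`, counting from 1.

    reverse_deltas[k - 1] turns version k + 1 into version k, so with
    three deltas the full copy held in `latest` is version 4.
    """
    newest = len(reverse_deltas) + 1
    if not 1 <= number <= newest:
        raise ValueError(f"version must be between 1 and {newest}")
    current = latest
    # Walk backwards from the newest version, one delta per step.
    for v in range(newest - 1, number - 1, -1):
        current = apply_delta(current, reverse_deltas[v - 1])
    return current


v4 = b"hello brave new world"
reverse_deltas = [
    [("copy", 0, 5)],                    # v2 -> v1: b"hello"
    [("copy", 0, 6), ("copy", 12, 5)],   # v3 -> v2: b"hello world"
    [("copy", 0, 12), ("copy", 16, 5)],  # v4 -> v3: b"hello brave world"
]

v1 = get_version(v4, reverse_deltas, 1)  # applies three deltas in turn
```

Reading the latest version costs nothing extra, while each older version costs one more delta application. That trade-off suits systems where recent versions are requested far more often than old ones.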

When you store a new version (v5) of this document, you only have to do the following things:

  1. Save the full version of v5.
  2. Compute the GDIFF that turns v5 into v4, and replace the full copy of v4 with that GDIFF.

Older versions do not have to be modified.
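
The two storage steps can be sketched as follows. The delta is again a simplified list of ("copy", position, length) and ("data", bytes) tuples, not real GDIFF bytes, and it is produced with Python's standard difflib.SequenceMatcher, which is just a convenient stand-in and not the algorithm used by any GDIFF library. The names make_delta and add_version are hypothetical.

```python
from difflib import SequenceMatcher


def make_delta(source: bytes, target: bytes) -> list:
    """Describe `target` as COPY ranges from `source` plus literal DATA runs."""
    delta = []
    matcher = SequenceMatcher(None, source, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            delta.append(("copy", i1, i2 - i1))      # bytes shared with source
        elif tag in ("replace", "insert"):
            delta.append(("data", target[j1:j2]))    # bytes only in target
        # "delete": source bytes that do not appear in target, nothing to emit
    return delta


def add_version(latest: bytes, reverse_deltas: list, new_full: bytes) -> bytes:
    """Store `new_full` as the latest version.

    The previous full copy is replaced by a delta that rebuilds it from
    the new version; older deltas are left untouched.
    """
    reverse_deltas.append(make_delta(new_full, latest))
    return new_full


reverse_deltas = []
latest = b"hello world"
latest = add_version(latest, reverse_deltas, b"hello brave world")
```

Only one delta is computed per save, no matter how many versions already exist, which is why older versions never have to be rewritten.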

The Implementation

When I first discovered the GDIFF spec a few years back, I tried for a while to write code that would produce GDIFFs. I did not end up with anything worth saving, probably because I did not put my mind to it properly. A couple of weeks ago, however, the topic came up again while I was talking with a colleague, and I thought: why not give it one more go?

The result, including source code, can be found on CodePlex at http://www.codeplex.com/GDIFF. Please feel free to download the source code and give your feedback through either the Issue Tracker or the discussion board. If you feel like joining the development, drop me a line and I'll add you to the project as a contributor.

Code Examples

I've created a separate page on the project wiki where I'm writing examples on how to use the GDIFF library in your code. Please have a look at the page here.
