Diff, Merge and Patch your Models with Helios

OK, you're stuck at home, one of the many victims of budget cuts? You did not get the chance to come to EclipseCon? Here is a rough transcript of the talk I just gave:

This talk will tackle team work with models. Once you use models in your development process, they matter as much as the source code. Don't you want to be able to diff, merge or even patch your models just like you do with text files?

The good news is that, unlike text files, models have a semantic structure defined by their Ecore model. As such we are able to compare the models semantically; comparing the serialization (XMI or other) is often meaningless.

By the way, I'm the project lead of EMF Compare. The project was contributed to Eclipse in early 2007. At that time many EMF adopters realized that this piece was missing in the Modeling ecosystem, and this lack was often a blocker!

So here we are, three years later. EMF Compare, at first part of the EMF Technology project, has graduated and is now part of the EMF project itself!

Just like Transaction, Validation or CDO, Compare is one of the many pieces you can reuse as a framework, or just as a tool. Its focus is quite narrow: comparing, merging and patching any kind of EMF model, whether it is a UML model or a domain-specific one.

As we graduated, we have been focusing on keeping a stable API you can rely on. We really think that EMF's popularity is largely due to the fact that depending on it is easy, as it is completely forward compatible. Working nicely as a pure Java jar library is another key asset of EMF, and we tried to stick to that for the Compare project: our framework can be used as a Java jar, without depending on Equinox or any extension point.

We could sum up the Eclipse IDE spirit as: be extensible, be customizable, be integrated. We are sticking to this motto too: you can extend or customize any part of the comparison process.

The compare and merge features are completely integrated with the Eclipse Team API. When you launch a comparison from the workspace or from a history, if the file is in fact a model, EMF Compare will open and show you the differences, allowing you to merge, or to switch back to the serialization diff.

Let's have a look at the tool through a demo. The demo gets cooler and cooler as it goes, so it starts by comparing an old-fashioned UML model on a dying CVS repository.

A bit cooler: comparing a domain-specific model on an SVN repository.

Total coolness: comparing an Xtext DSL semantically, and merging it, on top of a Git repository!

That's just the tip of the iceberg. EMF Compare has a few more features and is useful in a lot of contexts, but rather than listing all these details I'll now focus on the inside and reveal what kind of magic makes this clock tick.

As I said at the beginning, the good news when comparing models is that we have semantic information: we are not comparing plain text. There is a drawback though: models are graphs, and matching similar graphs is a complex and tricky problem.

The first thing we have to do for a comparison is to match the elements from both versions of the model.

If you've got IDs, that part is trivial: EMF Compare will use your IDs (either business ones or technical ones). If you don't, we provide what we call the "generic match engine", which uses a few statistical metrics to match the elements.

For a given element, this engine will extract its type information, its content values, its relations with other elements and its name if we can detect one. Each piece of this extraction is compared with the same piece from other elements to compute a similarity coefficient, and from this coefficient we can try to get closer and closer to the perfect match.
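To make the idea of a similarity coefficient more concrete, here is a minimal sketch in plain Java. It is not the generic match engine's code: the Element class, the choice of Jaccard overlap and character bigrams, and the weights are all assumptions made up for illustration. It only shows how the four extracted pieces (type, name, content values, relations) could be folded into one number between 0 and 1.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: not EMF Compare's actual classes, metrics or weights.
public class SimilaritySketch {

    /** Minimal stand-in for a model element (an assumption, not an EObject). */
    public static final class Element {
        final String type;
        final String name;
        final Set<String> values;    // attribute values, e.g. "abstract=false"
        final Set<String> relations; // names of the elements it references

        public Element(String type, String name, Set<String> values, Set<String> relations) {
            this.type = type;
            this.name = name;
            this.values = values;
            this.relations = relations;
        }
    }

    /** Convenience helper to build a set of strings. */
    public static Set<String> setOf(String... items) {
        return new HashSet<>(Arrays.asList(items));
    }

    /** Jaccard overlap of two sets; two empty sets count as identical. */
    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 1.0;
        }
        Set<String> common = new HashSet<>(a);
        common.retainAll(b);
        Set<String> all = new HashSet<>(a);
        all.addAll(b);
        return (double) common.size() / all.size();
    }

    /** Character bigrams of a name, so that similar names still overlap. */
    static Set<String> bigrams(String s) {
        Set<String> result = new HashSet<>();
        for (int i = 0; i + 1 < s.length(); i++) {
            result.add(s.substring(i, i + 2));
        }
        return result;
    }

    /** Weighted similarity in [0, 1]; the weights are arbitrary illustrative values. */
    public static double similarity(Element a, Element b) {
        double type = a.type.equals(b.type) ? 1.0 : 0.0;
        double name = jaccard(bigrams(a.name), bigrams(b.name));
        double content = jaccard(a.values, b.values);
        double relations = jaccard(a.relations, b.relations);
        return 0.4 * type + 0.3 * name + 0.2 * content + 0.1 * relations;
    }

    public static void main(String[] args) {
        Element before = new Element("Class", "Customer", setOf("abstract=false"), setOf("Order"));
        Element renamed = new Element("Class", "Customers", setOf("abstract=false"), setOf("Order"));
        Element other = new Element("Package", "billing", setOf(), setOf());
        System.out.println(similarity(before, renamed)); // renamed class: high score
        System.out.println(similarity(before, other));   // unrelated element: low score
    }
}
```

A matcher built on such a function would pair each element with its highest-scoring candidate above some threshold and leave the rest unmatched.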

Once this engine has done its job, it provides a Match Model grouping all this information and weaving the other models together.

It gets more complicated (and therefore more interesting) when using source control management systems. Then you have to match three versions of a model: yours, the remote one, and the common ancestor of those two versions.

To do so we build two match models, one between your local version and the common ancestor, the other between the remote version and the common ancestor, and we combine those two match models into one, weaving the three models together.
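The combination step can be pictured with a small plain-Java sketch. Everything here is a simplification for illustration: elements are reduced to string identifiers, each two-way match is a map keyed by the ancestor element, and the Match3 class is a hypothetical name, not an EMF Compare type. Elements that exist only on one side, with no ancestor, are left out of this sketch.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: not EMF Compare's actual Match Model.
public class ThreeWayMatchSketch {

    /** An ancestor element with its counterpart on each side (null when there is none). */
    public static final class Match3 {
        public final String ancestor;
        public final String local;
        public final String remote;

        Match3(String ancestor, String local, String remote) {
            this.ancestor = ancestor;
            this.local = local;
            this.remote = remote;
        }
    }

    /** Joins two two-way matches on their shared key, the ancestor element. */
    public static Map<String, Match3> combine(Map<String, String> ancestorToLocal,
                                              Map<String, String> ancestorToRemote) {
        Set<String> ancestors = new LinkedHashSet<>(ancestorToLocal.keySet());
        ancestors.addAll(ancestorToRemote.keySet());

        Map<String, Match3> result = new LinkedHashMap<>();
        for (String ancestor : ancestors) {
            result.put(ancestor, new Match3(ancestor,
                    ancestorToLocal.get(ancestor),    // null if removed locally
                    ancestorToRemote.get(ancestor))); // null if removed remotely
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> local = new HashMap<>();
        local.put("Customer", "Customer (local)");
        Map<String, String> remote = new HashMap<>();
        remote.put("Customer", "Customer (remote)");
        remote.put("Order", "Order (remote)");

        for (Match3 m : combine(local, remote).values()) {
            System.out.println(m.ancestor + " | " + m.local + " | " + m.remote);
        }
    }
}
```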

At this stage it should be obvious that the faster the match engine is, the faster you'll get a result.

To be honest, the generic match engine is not so fast. Having little clue about the models it is matching, it spends a lot of time browsing the structure, trying to match things which probably have no chance of being the same.

Being aware of that, we made it easy to define your own match engine, specific to your Ecore model. Doing so will no doubt give you better results, and much faster.

Let's take a step back. What are we trying to do?

We are trying to turn two versions of a graph into a set of events. In reality we are trying to reconstruct, "a posteriori", the history of the graph: what changes have been made to transform the original one into the new one.

Match computation is done by the Match Engine, and diff computation by the Diff Engine. The Diff Engine has to produce a Diff Model from a Match Model.

In fact, once you have the Match Model, deducing the Diff Model is not a huge task: you basically have to browse the matched elements, checking for changed attributes and references, and then create the corresponding deletion or addition event for each "unmatched element".
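The walk described above can be sketched in a few lines of plain Java. This is an illustration under simplifying assumptions, not the Diff Engine itself: a model is reduced to a map from element id to attribute map, the match is a map from old id to new id, references are ignored, and events are plain strings rather than a Diff Model.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch: not EMF Compare's actual Diff Engine.
public class DiffSketch {

    /** Assumes every id used in {@code matches} exists in the corresponding model. */
    public static List<String> diff(Map<String, Map<String, String>> oldModel,
                                    Map<String, Map<String, String>> newModel,
                                    Map<String, String> matches) {
        List<String> events = new ArrayList<>();

        // 1. Matched elements: report every attribute whose value differs.
        for (Map.Entry<String, String> match : new TreeMap<>(matches).entrySet()) {
            Map<String, String> before = oldModel.get(match.getKey());
            Map<String, String> after = newModel.get(match.getValue());
            Set<String> attributes = new TreeSet<>(before.keySet());
            attributes.addAll(after.keySet());
            for (String attribute : attributes) {
                if (!Objects.equals(before.get(attribute), after.get(attribute))) {
                    events.add("CHANGE " + match.getKey() + "." + attribute + ": "
                            + before.get(attribute) + " -> " + after.get(attribute));
                }
            }
        }

        // 2. Old elements without a match have been deleted.
        for (String id : new TreeSet<>(oldModel.keySet())) {
            if (!matches.containsKey(id)) {
                events.add("DELETE " + id);
            }
        }

        // 3. New elements without a match have been added.
        for (String id : new TreeSet<>(newModel.keySet())) {
            if (!matches.containsValue(id)) {
                events.add("ADD " + id);
            }
        }
        return events;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> oldModel = new HashMap<>();
        oldModel.put("stock", Collections.singletonMap("value", "12"));
        Map<String, Map<String, String>> newModel = new HashMap<>();
        newModel.put("stock", Collections.singletonMap("value", "34"));
        Map<String, String> matches = Collections.singletonMap("stock", "stock");

        for (String event : diff(oldModel, newModel, matches)) {
            System.out.println(event);
        }
    }
}
```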

Here again, you can plug in your own diff engine, and you can even define your own diffs specific to your formalism. Instead of having a "stock value changed from 12 to 34" event you can define yours as being "stock value has been increased from 12 to 34" and even aggregate several atomic diffs in a single top-level one.

Being a first-class model itself, the diff model can be leveraged through model-to-model or model-to-text transformations to publish the changes in another format.

Now you should have a basic understanding of what EMF Compare is trying to solve and in which way. We've seen that stability, both in terms of API and code, is our primary goal right now, but does that mean nothing new is being done in Compare?

For Helios we fixed many issues thanks to the community feedback. Support for fragmented models and matching of referenced resources have been greatly improved.

The primary form of feedback is bug reports, but we also had quite a few contributions. Among those are a new API to scope the matching process and a whole new set of plugins to create model-independent diffs that are resilient to transformations in the model you want to apply the diff to.

So many things to discuss in such a short time frame.

Give it a try: EMF Compare is part of the Eclipse Modeling Platform SDK. Download the package and you're done.

I would be happy to discuss with you, either IRL or through electronic means. Please use the EMF newsgroup, the Bugzilla or the #eclipse-modeling IRC channel on freenode. We're also available on new trendy channels like Twitter: @bruncedric.

Want more? Try downloading the Eclipse Modeling Package!

Cédric Brun
