FeedSync, a bit more detail

I wrote a piece for GigaOm today on Microsoft’s new FeedSync, a clever blend of RSS feeds and an update system that lets you keep frequently changing bits of data, like an address or a calendar event, up to date.

In putting the article together, I asked Jonathan Ginter what he thought. He gave me a pretty thoughtful analysis that I didn’t have room to print in its entirety over at GigaOm; here it is.

I’d never heard of FeedSync, so I looked it up, glanced over the spec, etc. It’s an interesting idea and a neat way to leverage RSS to accomplish something different. I expect it will do what it was designed to do quite well, and it has the hallmarks of a solution that might really catch on (although you can never really tell with these things). Its major flaw appears to be that a publisher cannot know for certain whether a given subscriber has actually received all of its updates, which is a characteristic of RSS feeds in general. This occasionally forces the subscriber to do some heavy lifting, and I’m not sure how that will scale if the cache being synchronized is very large. I like the fact that it is based on open protocols, unlike other cache sync solutions out there right now (e.g., Coherence).

In FeedSync, the publisher must provide its current cache state as an “initial” feed, and all updates to that initial state are provided as an “update” feed. Subscribers start by processing the full “initial” feed and then follow the “update” feed. However, publishers are allowed to roll updates off the update feed, making them unavailable to any subscriber that has not yet polled that feed. Consequently, subscribers are expected to notice whenever they appear to have missed something on the update feed. When this happens, they must re-process the entire initial feed, followed by the update feed, to get back in sync. If the cache is very large, I can only assume that processing the initial feed could be a tremendous burden on the subscriber, especially over brittle communication links. Two-way sync simply forces each side to play both roles, which only adds to the burden.
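To make the roll-off problem concrete, here is a minimal Python sketch of the initial/update feed model as described above. It is not the FeedSync wire format or any real API: the per-update sequence numbers, the retention limit (max_updates), and the rule that rolled-off updates get folded into the initial feed are all assumptions made for illustration. The subscriber detects a gap when the oldest unseen update on the feed is not the next sequence number it expects, and falls back to the expensive full re-read.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Update:
    seq: int                # hypothetical publisher-assigned sequence number
    item_id: str
    value: Optional[str]    # None marks a deletion in this sketch


def apply_update(cache: dict, u: Update) -> None:
    if u.value is None:
        cache.pop(u.item_id, None)
    else:
        cache[u.item_id] = u.value


@dataclass
class Publisher:
    max_updates: int = 3                          # hypothetical retention limit
    base: dict = field(default_factory=dict)      # "initial" feed contents
    base_seq: int = 0                             # last update folded into base
    updates: list = field(default_factory=list)   # "update" feed contents
    next_seq: int = 1

    def change(self, item_id: str, value: Optional[str]) -> None:
        self.updates.append(Update(self.next_seq, item_id, value))
        self.next_seq += 1
        # Roll the oldest updates off the feed, folding them into the initial
        # feed so that a full re-read still reconstructs the current state.
        while len(self.updates) > self.max_updates:
            old = self.updates.pop(0)
            apply_update(self.base, old)
            self.base_seq = old.seq


@dataclass
class Subscriber:
    cache: dict = field(default_factory=dict)
    last_seq: int = 0
    full_resyncs: int = 0

    def resync(self, pub: Publisher) -> None:
        # Expensive path: re-read the whole initial feed, then the update feed.
        self.cache = dict(pub.base)
        self.last_seq = pub.base_seq
        self.full_resyncs += 1
        self._apply_fresh(pub)

    def poll(self, pub: Publisher) -> None:
        fresh = [u for u in pub.updates if u.seq > self.last_seq]
        if fresh and fresh[0].seq != self.last_seq + 1:
            # A gap: some updates rolled off before this subscriber saw them.
            self.resync(pub)
        else:
            self._apply_fresh(pub)

    def _apply_fresh(self, pub: Publisher) -> None:
        for u in pub.updates:
            if u.seq > self.last_seq:
                apply_update(self.cache, u)
                self.last_seq = u.seq
```

Note that all of the recovery work sits on the subscriber side: the publisher never learns that a subscriber fell behind, which is exactly the "heavy lifting" concern raised above.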

As for how FeedSync compares with JMS, the two are founded on different business needs. Reduced to its simplest form, JMS delivers messages from publisher to subscriber, whereas FeedSync is trying to synchronize data caches.

JMS is really just another message-oriented middleware (MOM) framework (MQSeries is another example): its job is to deliver messages from point to point in a secure, reliable fashion. One of the basic assumptions of MOM is that the publisher no longer holds on to a message once it is sent. This is crucial: because the publisher has discarded the message, and one message may go to many subscribers, a MOM must provide features such as internal message storage and built-in retry strategies. It can natively handle each subscriber's delivery issues separately without impacting the publisher. Essentially, the publisher hands the message to the MOM and deletes it locally without having to worry about complex retry mechanisms. Since the publisher does not keep the information, MOM solutions treat delivery failure as a potential crisis. Finally, MOM frameworks are built around the idea that once the subscriber has the data, the publisher and the middleware no longer have it.
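The MOM assumptions above (the broker keeps its own copy per subscriber, retries failed deliveries independently, and forgets a message once it is acknowledged) can be sketched with a toy in-memory broker. This is a hypothetical illustration in plain Python, not the JMS API; the class and method names are invented, and a raised ConnectionError stands in for a failed delivery.

```python
from collections import deque


class Broker:
    """Toy message broker (hypothetical; not the JMS API)."""

    def __init__(self):
        self.queues = {}   # subscriber name -> deque of pending messages

    def subscribe(self, name):
        self.queues[name] = deque()

    def publish(self, message):
        # The broker stores its own copy for every subscriber, so after this
        # call the publisher is free to discard the message.
        for q in self.queues.values():
            q.append(message)

    def deliver(self, name, handler):
        """Deliver pending messages to one subscriber, in order.

        A message is removed only after the handler succeeds. On failure it
        stays queued for a later retry, and other subscribers are unaffected.
        Returns the number of messages delivered in this attempt.
        """
        q = self.queues[name]
        delivered = 0
        while q:
            try:
                handler(q[0])
            except ConnectionError:
                break        # leave the message in place for a retry
            q.popleft()      # acknowledged: the broker no longer holds it
            delivered += 1
        return delivered
```

Once every subscriber's queue is drained, the message exists only on the subscriber side, which is the "moved, not shared" behavior that distinguishes MOM from cache synchronization.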

FeedSync is much more aligned with the notion of distributed data caches. FeedSync exposes the full data cache through its “initial” feed, while maintaining an “update” feed for changes to the state of the cache. The subscriber polls the publisher, which is not the case in MOM frameworks, where the publisher and subscriber are often completely decoupled. Data does not disappear as it passes through a FeedSync system, unlike in MOM solutions, where data is moved rather than shared. In FeedSync, delivery failure is the subscriber’s problem. Unlike MOM solutions, FeedSync has built-in protocols for data synchronization issues, e.g., merging changes, flagging collisions, and deleting data items (a deleted item is represented by what it calls a “tombstone”). JMS simply delivers data from point A to point B with none of these built-in semantics.
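The sync semantics listed above (merging changes, flagging collisions, keeping deletions as tombstones) can be illustrated with a deliberately simplified merge. This Python sketch is loosely inspired by the idea of a per-item update counter plus a deleted flag; it is not the FeedSync specification's algorithm, which tracks a fuller per-item update history. The Item fields, the "higher version wins" rule, and the tie-break by endpoint name are all assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Item:
    version: int            # bumped by whichever endpoint edits the item
    by: str                 # endpoint that made the latest edit
    value: Optional[str]
    deleted: bool = False   # tombstone: deletions are kept, not dropped
    conflicts: list = field(default_factory=list)  # losing versions, flagged


def merge_item(local: Optional[Item], remote: Item) -> Item:
    if local is None:
        return remote
    if remote.version != local.version:
        # One side has seen more edits; in this sketch, the newer one wins.
        return remote if remote.version > local.version else local
    if remote.by == local.by:
        return local        # the same edit, seen twice
    # Concurrent edits at the same version: choose a deterministic winner
    # (here, the lexically greater endpoint name) and keep the loser flagged
    # so that a user or application can resolve the collision later.
    winner, loser = sorted([local, remote], key=lambda i: i.by, reverse=True)
    return Item(winner.version, winner.by, winner.value, winner.deleted,
                winner.conflicts + [loser])


def merge_feeds(local: dict, remote: dict) -> dict:
    """Merge a remote feed (item_id -> Item) into a copy of the local one."""
    merged = dict(local)
    for item_id, item in remote.items():
        merged[item_id] = merge_item(local.get(item_id), item)
    return merged
```

Because the tie-break is deterministic, both endpoints arrive at the same winner regardless of which side runs the merge, and because deleted items survive as tombstones, a deletion propagates like any other update instead of silently vanishing.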

JMS is similar to web services in many respects: it carries documents, expects to be asynchronous, and so on. Could you replace web services with FeedSync (or the reverse)? Would you want to?

I’m sure my opinions on this would draw a lot of defensive rebuttals, though, from both sides. 😉

Figured his perspective was too good to keep to myself, and it's a decent clarification of the subtle differences between MOM and cache updates. It’s always nice to be able to ask people smarter than yourself for opinions, then pass them off as your own.