Living in a virtual world

I’m at Web2Expo this week, followed by Interop next week. I’m moderating a discussion between database heavyweights: Brian Aker, Director of Technology for MySQL; Dave Campbell, Microsoft Technical Fellow behind MS SQL Server; and Matt Domo, General Manager of Amazon SimpleDB. My first question should really be, “Why isn’t Oracle here?”

But with the exception of that session, everything I’m seeing is virtual. Here’s my list of ten companies with a virtualization theme you should probably know about.

  1. Elastra, whose seasoned CEO has them out of the gates in record time with a technology for provisioning whole app clusters across any grid you want.
  2. Platespin, who pull physical machines into the virtual world, and were just acquired by Novell.
  3. Bluelane, focusing on security within the virtual machine — since if someone owns your hypervisor, you’re a whole new kind of crispy toast.
  4. Stacksafe, who I interviewed for GigaOm out on the pier. Very interesting use of virtualization to do pre-production testing.
  5. 3Tera, who launched a major upgrade to their virtual machine management technology at Web2Expo this week.
  6. Rightscale, who make virtual machine management tools. They just closed a round from Benchmark.

But wait, you say, that’s not ten! Yep, there are four in the list that I can’t get into yet. Stay tuned. They’re all twists on essential parts of a modern application, using bits instead of atoms. And that changes everything.

Four kinds of visibility web operators need

Most of the web operators we talk to have some degree of visibility into what goes on within their applications. But many lack a complete picture of their site.

There are hundreds of tools available to show what’s going on with a production website. But the problems arise when people try to use the wrong tool for the job, which often leads to bad conclusions. In my experience, operational questions fall into four major categories:

  • What did my users do?
  • Could they do it?
  • Why did they do it?
  • How did they do it?

There are four classes of tool that answer these four questions. But they’re all similar enough to cause confusion. Here’s a clarification.


The downside of getting readers

It’s been an interesting day.

Over the past few weeks, I’ve been bothering friends of mine with questions about how the Internet as we know it might die. Many of them are part of the Bitcurrent crew. We came up with a decent list, which I cleaned up and posted on GigaOm.

1,500 Diggs and a lot of comments later, I have a lot of thoughts on the responses. First (though I shouldn’t be surprised) is the vitriol of Digg: any “top 10” list is instantly branded lame, even as it drives huge traffic. But there was some useful dialogue in the noise, too.

FeedSync, a bit more detail

I wrote a piece for GigaOm today on Microsoft’s new FeedSync, a clever blend of RSS feeds and an update system that allows you to keep changing bits of data — like an address or a calendar event — up to date.

In putting the article together, I asked Jonathan Ginter what he thought. He gave me a pretty thoughtful analysis that I didn’t have room to print in its entirety over at GigaOm; here it is.

I’d never heard of FeedSync, so I looked it up, glanced over the spec, etc. It’s an interesting idea and a neat way to leverage RSS to accomplish something different. I expect that it will do what it is designed to do quite well, and it has the hallmarks of a solution that might really catch on (although you can never really tell with these things). Its major flaw appears to be that a publisher has no way to know for certain whether a given subscriber has actually received all of its updates, which is a characteristic of RSS feeds in general. This forces the subscriber to do some heavy lifting on occasion. I’m not sure how this will scale if the cache being synchronized is very large. I like the fact that it is based on open protocols, unlike other cache sync solutions out there right now (e.g., Coherence).

In FeedSync, the publisher must provide its current cache state as an “initial” feed. All updates on that initial state are provided as an “update” feed. Subscribers to this service start by processing the full “initial” feed and then focus on the “update” feed. However, publishers are allowed to roll updates off the update feed, making them unavailable to any subscribers that have not yet polled that feed. Consequently, subscribers are expected to notice whenever they appear to have missed something on the update feed. When this happens, they must re-process the entire initial feed – followed by the update feed – to bring them back in sync. If the cache is very large, I can only assume that processing the initial feed could be a tremendous burden on the subscriber, especially in the presence of brittle communication. Two-way sync simply forces both sides to take on both roles and increases the burden.
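The subscriber-side catch-up logic described above can be sketched roughly as follows. This is a minimal illustration, not the actual FeedSync wire format (which rides on RSS/Atom with its own sync extensions): the sequence numbers, feed structures, and the `sync` function are all hypothetical stand-ins.

```python
def sync(cache, last_seen, initial_feed, update_feed):
    """Apply updates; fall back to a full re-sync if updates were missed.

    cache: the subscriber's local dict of item_id -> value.
    last_seen: highest update sequence number the subscriber has applied.
    initial_feed: dict snapshot of the publisher's full cache state.
    update_feed: ordered list of (seq, item_id, value); the publisher may
    have rolled old entries off the front of this list.
    """
    oldest = update_feed[0][0] if update_feed else None
    missed = oldest is not None and oldest > last_seen + 1
    if missed:
        # Gap detected: updates rolled off before we polled. Re-process
        # the entire initial feed, then apply the update feed on top.
        cache.clear()
        cache.update(initial_feed)
        last_seen = 0  # re-apply whatever is still on the update feed
    for seq, item_id, value in update_feed:
        if seq > last_seen:
            cache[item_id] = value
            last_seen = seq
    return last_seen
```

The expensive path is exactly the one Jonathan flags: when `missed` is true, the subscriber re-reads the whole initial feed, which is painful if the cache is large or the link is brittle.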

As for how it compares with JMS, they are founded on different business needs. In its simplest reduction, JMS delivers messages from publisher to subscriber whereas FeedSync is trying to synchronize data caches.

JMS is really just another message-oriented middleware (MOM) framework – i.e., it delivers messages point-to-point in a secure, reliable fashion (MQSeries being another example). One of the basic assumptions for MOM is that the publisher will no longer hold on to the message once it is sent. This is crucial, since a 1-to-many delivery scenario means that MOMs must have features such as internal message storage, built-in retry strategies, etc. They can natively deal with the idea that each subscriber must be dealt with separately for delivery issues without impacting the publisher. Essentially, the publisher is able to hand the message to the MOM and delete it locally without having to worry about complex retry mechanisms. Since the publisher does not keep the information, MOM solutions assume that delivery failure is a potential crisis. Finally, MOM frameworks are built around the idea that once the subscriber has the data, the publisher and the middleware no longer have it.
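The hand-off model described here can be shown with a toy broker: the publisher gives the message to the middleware and keeps no copy, and the middleware tracks delivery per subscriber independently. `ToyBroker` is an invented illustration, not a real JMS or MQSeries API.

```python
from collections import deque


class ToyBroker:
    """Toy MOM: stores messages per subscriber until each one consumes them."""

    def __init__(self):
        self.queues = {}  # one independent delivery queue per subscriber

    def subscribe(self, name):
        self.queues[name] = deque()

    def publish(self, message):
        # 1-to-many fan-out: the broker keeps a copy per subscriber,
        # so the publisher can discard its local copy immediately.
        for q in self.queues.values():
            q.append(message)

    def deliver(self, name):
        # Once delivered, the message leaves the middleware too.
        q = self.queues[name]
        return q.popleft() if q else None
```

Note how one slow subscriber’s backlog never affects the publisher or the other subscribers, which is the decoupling MOM is built around.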

FeedSync is much more aligned with the notion of distributed data caches. FeedSync exposes the full data cache through its “initial” feed queue, while maintaining an “update” feed queue for changes to the state of the data cache. The subscriber is polling the sender, which is not the case in MOM frameworks, where the publisher and subscriber are often completely decoupled. Data does not disappear as it passes through a FeedSync system, as it does with MOM-oriented solutions where data is moved instead of shared. In FeedSync, delivery failure is the subscriber’s problem. Unlike MOM solutions, FeedSync has built-in protocols to handle issues related to data synchronization – e.g., merging changes, flagging collisions, deleting data items (which it calls a “tombstone”), etc. JMS simply delivers data from point A to point B with none of these built-in semantics.
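The synchronization semantics mentioned above (merging, collision flagging, tombstones) can be sketched as a last-writer-wins merge. This is a loose approximation: the field names and the `merge` function are hypothetical, and the real spec tracks richer per-item version and update history rather than a single counter.

```python
def merge(local, remote):
    """Merge two views of the same item; the higher version wins.

    Each view is a dict with 'version', 'value', and 'deleted' (the
    tombstone flag, so deletions propagate like any other update).
    Equal versions with different contents are a collision, which
    gets flagged rather than silently resolved.
    """
    if local["version"] == remote["version"]:
        if local != remote:
            return {**local, "conflict": True}  # flag the collision
        return local
    return local if local["version"] > remote["version"] else remote
```

A plain MOM delivery pipe has no equivalent of any of this; it would hand both versions to the application and leave the reconciliation entirely to it.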

JMS is similar to web services in a lot of respects – i.e., it carries documents, expects to be asynchronous, etc. Could you replace web services with FeedSync (or the reverse)? Would you want to?

I’m sure my opinions on this would draw a lot of defensive rebuttals, though, from both sides. 😉

I figured his perspective was too good to keep to myself, and it’s a decent clarification of the subtle differences between MOM and cache updates. It’s always nice to be able to ask people smarter than you for their opinions, then pass them off as your own.