Cloud Computing Lessons

Wednesday, May 6th, 2009

Cloud computing means lots of different things, and much of it is hype. At Yieldex, we’ve been using cloud computing, specifically Amazon Web Services, as a key part of our infrastructure for the better part of a year, and we thought we’d pass on a few of our lessons learned. As you might expect, the services we use have trade-offs. If your challenge fits within the parameters, cloud computing can be a huge win, but it’s not the answer for everything.

All of these lessons are the result of the hard work of our entire engineering team, most notably Craig and Calvin. These guys are among the best in the world at scaling to solve enormous data and computation problems with a cloud infrastructure. We could not have built this company and these solutions without them.

For a startup, there are a number of compelling reasons to use a cloud infrastructure for virtually every new project. You don’t get locked into a long-term investment in hardware and data centers, and it’s easy to experiment, change your mind, and try a different approach. You don’t have to spend precious capital on servers and storage, wait days or weeks for them to arrive, and then spend a day or two setting them up. If your application scales horizontally, then you can add customers, storage, and processing with minimal cost and time delay. All these things are touted by cloud providers, and basically boil down to: focus on your business, not your infrastructure.

Sometimes, however, you do need to focus on the infrastructure. We provide our customers with analytics and optimization based on our unique and proprietary DynamicIQ engine. Our first customer was a decent-sized web property, and we were able to complete our DynamicIQ daily processing on several gigabytes of data using just one instance in less than an hour. Our next customer, however, was 10x the size. And the one after that was 10x larger again: hundreds of gigabytes per day. Fortunately, we had designed our DynamicIQ engine to easily parallelize across multiple instances. We spent some time learning how to start up instances, distribute jobs to them, and shut them back down again, but because we had designed the engine for this eventuality, we were able to use the cloud to cost-effectively scale to even the largest sites on the web.
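To make the scaling arithmetic concrete, here is a minimal Python sketch of sizing a fleet and spreading jobs across it. It is not how DynamicIQ works internally. The function names, the throughput figure, and the greedy largest-job-first assignment are all assumptions chosen for illustration, and the sketch deliberately leaves out the calls that actually start and stop instances.

```python
import heapq
import math


def instances_needed(total_gb, gb_per_instance_hour, deadline_hours):
    """Smallest number of instances that can process total_gb before the deadline.

    gb_per_instance_hour is an assumed, measured throughput for one instance.
    """
    capacity_per_instance = gb_per_instance_hour * deadline_hours
    return max(1, math.ceil(total_gb / capacity_per_instance))


def distribute_jobs(job_sizes_gb, num_instances):
    """Greedy assignment: take jobs largest first, give each to the least-loaded instance.

    Returns one list of job indices per instance.
    """
    # Min-heap of (current load in GB, instance index).
    heap = [(0.0, i) for i in range(num_instances)]
    heapq.heapify(heap)
    assignments = [[] for _ in range(num_instances)]

    jobs = sorted(enumerate(job_sizes_gb), key=lambda pair: pair[1], reverse=True)
    for job_id, size in jobs:
        load, idx = heapq.heappop(heap)
        assignments[idx].append(job_id)
        heapq.heappush(heap, (load + size, idx))
    return assignments


if __name__ == "__main__":
    # Hypothetical numbers: 500 GB per day, 5 GB per instance-hour, 1-hour window.
    n = instances_needed(500, 5, 1)
    print(n, "instances")
    print(distribute_jobs([4, 3, 2, 1], 2))
```

Because the instances only run for the duration of the batch, the bill tracks the total instance-hours of work rather than the peak number of machines, which is what makes this kind of horizontal scaling cost-effective.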

We also have BusinessIQ, which is basically an application server that provides query processing and a user interface into our analytics. We started with this server in the cloud too, but as we bumped up against other scalability issues, we found that the cloud doesn’t solve every problem. For example, we provide a sophisticated scenario analysis capability. Calculating a “what-if” scenario requires processing a huge amount of data in a very short time. For our larger customers, a single cloud instance did not have enough memory to perform this operation. Trying to stay true to the cloud paradigm, we implemented a distributed cache across multiple instances, but this didn’t work well because of limitations on I/O. We ended up having to go to a hybrid model, where we bought and hosted our own servers with large memory footprints, so we could provide this functionality.
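One way to see why a distributed cache can run into I/O limits is to count how many lookups have to leave the machine running the query. The sketch below is only an illustration. It assumes keys are spread across nodes by a simple hash, which may not be how our cache was partitioned, and the function names and key format are made up.

```python
import hashlib


def node_for_key(key, num_nodes):
    """Map a cache key to a node index using a stable hash.

    hashlib is used instead of the built-in hash(), which is salted per process.
    """
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_nodes


def remote_fetches(keys, num_nodes, local_node=0):
    """Count the keys stored on a node other than the one running the query."""
    return sum(1 for k in keys if node_for_key(k, num_nodes) != local_node)


if __name__ == "__main__":
    # Hypothetical workload: one scenario touching 100,000 cache entries.
    keys = ["segment-%d" % i for i in range(100000)]
    for nodes in (1, 2, 4, 8):
        print(nodes, "nodes:", remote_fetches(keys, nodes), "remote lookups")
```

With keys spread evenly over N nodes, roughly (N - 1)/N of the lookups cross the network, so adding nodes to gain memory also raises the share of reads that pay a network round trip. A single server with a large memory footprint avoids that cost entirely.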

We have been very happy users of the Amazon Web Services cloud, and not just because we won the award. We would not have been able to get our business off the ground without the cost-effective scalability of the Amazon infrastructure. While it’s not for every application, for the right application, it truly changes the game.