Drizzle and Gearman from Boston MySQL User Group

12 07 2009

I went to the Boston MySQL Meetup on Monday for a fairly crowded and lively discussion on Drizzle and Gearman. The talk was led by Eric Day and Patrick Galbraith, two very well known members of the MySQL community.

Here is a link to the presentation.

Drizzle

In short, Drizzle is a fork of the not-yet-released MySQL 6.0 and represents a return to the first principles upon which MySQL was founded: fast, lightweight, and optimized for web applications. The genesis was a customer advisory board (CAB) meeting in 2005 where the core MySQL users felt that the database was heading in the wrong direction (meaning towards corporate needs) and was not being responsive to the needs of web app developers. The Drizzle developers have standardized on C++ (MySQL is a mix of C and C++), have stripped out a bunch of unused code, and have modularized it around a microkernel architecture that should make it far easier to maintain and extend going forward. Here is a list of the differences between Drizzle and MySQL.

One key data point is that MySQL 6 is over 1.2 million lines of C and C++ code. Drizzle has gotten that down to under 300k lines. The team supporting this is on the research side of Sun and is separate from the core MySQL team. It should be interesting to see how this plays out. Other facts: over 100 community contributions have been accepted to date. That’s pretty significant for a project that’s only a year old.

Drizzle could very easily become the “new MySQL” but it’s too early to tell if this will happen.

Gearman (anagrams to Manager)

Gearman is a lightweight, scale-out distributed messaging service. In some ways, it’s a bit of a cross between a message queue (e.g. IBM WebSphere MQ) and a publish & subscribe engine. Gearman enables applications to “job out” tasks to other servers in a synchronous or asynchronous manner.

Gearman was created to solve the “kittens” problem at LiveJournal, where resizing images (often of kittens) on uploads was killing their servers. They created a resizing farm and moved it off of the Web servers. The LiveJournal application then published out the resizing work to this farm using Gearman’s Client API. The Gearman Job Server then jobbed it out to a Gearman Worker (in this case a resizing server) which performed the image actions and notified the Job Server when it was done.
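
To make the client, Job Server, and Worker roles a bit more concrete, here is a rough sketch in Python. It is only a toy, in-process model of the flow described above and does not use Gearman or its actual API; the names `ToyJobServer`, `resize_worker`, and `fake_resize` are made up for illustration, and the "resize" just halves a pair of numbers instead of touching a real image.

```python
import queue
import threading


class ToyJobServer:
    """Hypothetical in-process stand-in for a job server (not Gearman itself)."""

    def __init__(self):
        self._jobs = queue.Queue()

    def submit(self, function_name, payload):
        """Client side: queue a job and return a handle the client can wait on."""
        job = {
            "function": function_name,
            "payload": payload,
            "done": threading.Event(),
            "result": None,
        }
        self._jobs.put(job)
        return job

    def grab_job(self):
        """Worker side: block until a job (or a shutdown marker) is available."""
        return self._jobs.get()

    def shutdown(self):
        """Queue a None marker so one waiting worker exits its loop."""
        self._jobs.put(None)


def resize_worker(server, handlers):
    """Worker loop: take jobs, run the matching handler, report completion."""
    while True:
        job = server.grab_job()
        if job is None:
            break
        job["result"] = handlers[job["function"]](job["payload"])
        job["done"].set()  # tells the waiting client the job is finished


def fake_resize(payload):
    """Pretend resize: halve the dimensions rather than process an image."""
    width, height = payload["size"]
    return {"name": payload["name"], "size": (width // 2, height // 2)}


def run_demo():
    server = ToyJobServer()
    worker = threading.Thread(
        target=resize_worker, args=(server, {"resize": fake_resize})
    )
    worker.start()

    # The web application plays the client role: hand off the work, then wait.
    job = server.submit("resize", {"name": "kitten.jpg", "size": (800, 600)})
    job["done"].wait(timeout=5)

    server.shutdown()
    worker.join()
    return job["result"]


if __name__ == "__main__":
    print(run_demo())
```

Waiting on the handle corresponds to the synchronous case; a client that submits and moves on without waiting would be the asynchronous one.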

Think of Gearman a bit like Amazon’s Simple Queue Service (SQS), but open source and portable to any server environment.  It’s also really fast (the claim was 50,000 actions per second on an 8 core machine).

Gearman is also fault-tolerant. Each client connects to a single Job Server, but fails over to another Job Server automatically if the first one is unavailable. Workers connect to all Job Servers, so any Job Server can dispatch any task.
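
The failover behavior can be sketched the same way. This is a simplified model under my own assumptions, not Gearman's real client logic: `ToyServer`, `register_worker`, and `submit_with_failover` are hypothetical names, a "down" server is just a flag, and the worker is a plain function.

```python
class ToyServer:
    """Hypothetical job server that can be marked as unavailable."""

    def __init__(self, name, up=True):
        self.name = name
        self.up = up
        self.workers = []

    def dispatch(self, task):
        """Run the task on the first registered worker, if this server is up."""
        if not self.up:
            raise ConnectionError(f"{self.name} is unavailable")
        if not self.workers:
            raise RuntimeError(f"{self.name} has no workers")
        return self.workers[0](task), self.name


def register_worker(worker, servers):
    """Workers connect to every job server, so any server can dispatch to them."""
    for server in servers:
        server.workers.append(worker)


def submit_with_failover(servers, task):
    """Clients use one server at a time and move to the next if it is down."""
    last_error = None
    for server in servers:
        try:
            return server.dispatch(task)
        except ConnectionError as error:
            last_error = error
    raise ConnectionError("all job servers are unavailable") from last_error


if __name__ == "__main__":
    servers = [ToyServer("job-server-a", up=False), ToyServer("job-server-b")]
    register_worker(str.upper, servers)  # stand-in worker: upper-cases the task
    print(submit_with_failover(servers, "resize"))
```

Because the worker is registered with both servers, the task still gets done when the first server is down; the client only has to try the next one in its list.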

This is such a universally applicable capability that I’m a bit surprised that nobody has done this before. There are other open source message queues, including JBoss Messaging, ActiveMQ and others. Those are targeted at replacing IBM WebSphere MQ and not necessarily at web-scale applications. They are heavier and more feature-rich, but not as fast or lightweight.

We could have used Gearman at Givvy for our charity search process and at one point I had discussed using SQS with the team.  The need to maintain and regularly update the search index (using Solr), had we achieved any market traction, would have required a level of decoupling of search from the core application.  Gearman is supposedly in use by LiveJournal, Yahoo!, Digg and others.

I can envision some interesting use-cases:

  • image resizing
  • video transcoding
  • search index updates
  • e-commerce order processing
  • data replication (database and file)
  • any type of Map Reduce-like processing (parallel operations on a large data set)
  • HPC distributed processing (risk analytics, etc.)

One response

29 11 2009
ilan

“The need to maintain and regularly update the search index (using Solr), had we achieved any market traction, would have required a level of decoupling of search from the core application. ”

– I like this …

Thanks!
