Last week Amazon announced the addition of full database consistency as an option for SimpleDB users. Most of you know that SimpleDB is a “NoSQL” database that allows you to build very scalable Web apps without the typical scaling limitations of SQL databases. One of the limitations of SimpleDB has been its reliance on “eventual consistency” at a transaction level (see Amazon CTO Werner Vogels’ post on Eventually Consistent data for more details – it’s a good read. See his post on the update here).
In short, “eventually consistent” means that an update may not be reflected in the next read of that “object,” but will eventually get there. Consistency is the “C” in the ACID (Atomicity, Consistency, Isolation and Durability) properties that define a proper transactional database. In shared data systems there is the CAP Theorem, which states that (as Werner shares) “of the three properties of shared-data systems–data consistency, system availability, and tolerance to network partitions–only two can be achieved at any given time.”
For most of today’s distributed Web systems, the primary trade-off for achieving consistency is performance. Ensuring that writes are fully propagated across your system before allowing reads affects performance (and, in rare instances, possibly availability). Okay, so we know that very large systems may make this trade-off, but until now Amazon made eventual consistency the only model. If you wanted to enforce consistency, you needed to use a different database solution. Not any more.
Amazon has added two features to SimpleDB to address this. The first is Consistent Reads, which lets you ensure that your data is fully up to date so no queries return stale data. Here is a nice chart from Werner’s post comparing the old (eventually consistent) model and the new option for a consistent read.
| Eventually consistent read | Consistent read |
| --- | --- |
| Stale reads possible | No stale reads |
| Lowest read latency | Higher read latency |
| Highest read throughput | Lower read throughput |
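To make the trade-off concrete, here is a toy Python model of the two read modes. It is a sketch of the semantics only – the class, method names, and replication scheme are all invented for illustration, not the SimpleDB API.

```python
import random

class ToySimpleDB:
    """Toy model of SimpleDB-style replication: writes land on a primary
    replica and reach secondaries lazily, so an eventually consistent
    read may return stale data until propagation finishes."""

    def __init__(self, replicas=3):
        self.replicas = [{} for _ in range(replicas)]

    def put(self, item, attrs):
        # A write hits the primary replica immediately...
        self.replicas[0][item] = dict(attrs)

    def propagate(self):
        # ...and reaches the secondaries some time later.
        for r in self.replicas[1:]:
            r.update({k: dict(v) for k, v in self.replicas[0].items()})

    def get(self, item, consistent_read=False):
        if consistent_read:
            # Consistent read: always see the latest acknowledged write.
            return self.replicas[0].get(item)
        # Eventually consistent read: any replica may answer,
        # including one the write hasn't reached yet.
        return random.choice(self.replicas).get(item)

db = ToySimpleDB()
db.put("item1", {"color": "red"})
# Before propagation, only a consistent read is guaranteed current:
assert db.get("item1", consistent_read=True) == {"color": "red"}
db.propagate()
```

The consistent path always answers from the freshest copy, which is exactly why its latency and throughput are worse: it cannot spread reads across replicas that might be behind.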
The second feature, Conditional Put and Delete, is a bit more complicated.
“Conditional Puts allow inserting or replacing values of one or more attributes for a given item if the expected consistent value of a single-valued attribute matches the specified expected value. Conditional puts are a mechanism to eliminate lost updates caused by concurrent writers writing to the same item as long as all concurrent writers use conditional updates.”
My assumption here is that you read an attribute before doing an update, then use the value you read as a condition for your update to be accepted. That way, if another process jumps in ahead of you with an update, you don’t overwrite its change. You have to write the handling code that decides what to do next when the condition fails, but you get more control.
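That read-then-conditional-write pattern can be sketched in a few lines of Python. This is a compare-and-set against a plain dict to show the semantics; `conditional_put` and `ConditionalPutError` are hypothetical names, not the real SimpleDB API.

```python
class ConditionalPutError(Exception):
    """Raised when the expected value no longer matches, i.e. a lost
    update would have occurred."""

def conditional_put(store, item, attr, new_value, expected_value):
    """Hypothetical compare-and-set helper: write attr only if its
    current value still matches what the caller read earlier."""
    current = store.get(item, {}).get(attr)
    if current != expected_value:
        raise ConditionalPutError(
            f"expected {expected_value!r}, found {current!r}")
    store.setdefault(item, {})[attr] = new_value

store = {"account1": {"balance": "100"}}

# Read first, then condition the write on the value we read:
seen = store["account1"]["balance"]
conditional_put(store, "account1", "balance", "90", expected_value=seen)

# A concurrent writer still holding the stale value is rejected
# instead of silently clobbering the update above:
try:
    conditional_put(store, "account1", "balance", "80", expected_value="100")
except ConditionalPutError:
    pass  # the caller decides whether to re-read and retry
```

The key point from Amazon’s description holds here too: lost updates are eliminated only as long as *every* concurrent writer uses the conditional form.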
“Conditional deletes allow deleting an item, an attribute or an attribute’s value for a given item if the expected consistent value of a single-valued attribute of an item matches the specified expected value. If the current value does not match the expected value, or if the attribute is gone altogether, the delete is rejected.”
The use case would be similar to the put above. Again, you have a lot more control to avoid stomping on another delete or update that just happened…
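A quick sketch of that delete behavior, again against a plain dict with an invented helper name rather than the actual API: the delete only goes through if the attribute still holds the expected value.

```python
def conditional_delete(store, item, attr, expected_value):
    """Hypothetical helper mirroring conditional-delete semantics:
    remove attr from item only if its current value matches."""
    current = store.get(item, {}).get(attr)
    if current != expected_value:
        # Value changed, or the attribute is gone: reject the delete.
        return False
    del store[item][attr]
    return True

store = {"item1": {"status": "pending", "owner": "alice"}}

# First delete succeeds; the expectation matches the stored value:
assert conditional_delete(store, "item1", "status", "pending") is True
# A second delete with the now-stale expectation is rejected,
# so we don't stomp on whatever happened in between:
assert conditional_delete(store, "item1", "status", "pending") is False
```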
Taken together, these changes make Amazon SimpleDB a more robust solution for managing databases for Web-scale applications. You can choose to favor performance and availability, or you can impose consistency through these two features. It’s a good update to a service that has seemed to lag in usage relative to other Amazon tools.