Understanding ACID

A common measure of data integrity is that all transactions that modify data have the ACID properties:

  • Atomic: All operations in the transaction succeed or they all fail.
  • Consistent: The state of the data complies with all constraints before and after the transaction.
  • Isolated: Concurrent transactions behave as if they were executed one after another (serialized).
  • Durable: When a transaction completes successfully, the results are persisted.

The ACID properties are not specific to relational databases, but they are often used in that context, mostly because relational schemas, with their formal constraints, provide a convenient measure of consistency. The isolation property often has serious performance implications and may be relaxed in systems that prefer high performance and eventual consistency.
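To make the idea of schema constraints as a measure of consistency concrete, here is a minimal sketch using Python's built-in sqlite3 module. The accounts table, the non-negative balance rule, and the helper names make_accounts_db and try_withdraw are assumptions chosen for illustration, not part of any particular system.

```python
import sqlite3


def make_accounts_db():
    """Create an in-memory database whose schema encodes one invariant:
    an account balance may never be negative."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        " name TEXT PRIMARY KEY,"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.execute("INSERT INTO accounts VALUES ('alice', 100)")
    conn.commit()
    return conn


def try_withdraw(conn, name, amount):
    """Attempt a withdrawal; return False if it would violate the constraint."""
    try:
        # 'with conn' commits on success and rolls back on an exception.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, name),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected the update; the data is unchanged.
        return False
```

In this sketch, withdrawing 150 from an account that holds 100 is refused by the database itself, so the data complies with the constraint both before and after the attempted transaction.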

The need for durability is self-evident: there is no point going to all the trouble of a transaction if its results can't be safely persisted. There are different levels of persistence:

  • Persistence to disk: Can survive restart of the node, but not a disk failure
  • Redundant memory on multiple nodes: Can survive restart of a node and disk failure, but not temporary failure of all the nodes
  • Redundant disks: Can survive the failure of a disk
  • Geo-distributed replicas: Can survive a whole data center being down
  • Backups: Cheaper to store a lot of information, but slower to restore and often lag behind real time
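As a rough illustration of the first level only (persistence to a single local disk), the sketch below writes data to a temporary file, asks the operating system to flush it to the storage device, and then atomically renames it into place. The function names durable_write and demo and the file name state.bin are hypothetical; real databases typically use write-ahead logs and more elaborate schemes.

```python
import os
import tempfile


def durable_write(path, data):
    """Write bytes so that a reader sees either the old or the new content,
    and the new content has been handed to the disk before we return."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as f:
        f.write(data)
        f.flush()             # push Python's buffer to the OS
        os.fsync(f.fileno())  # ask the OS to push its cache to the device
    os.replace(tmp_path, path)  # atomic rename over the target path


def demo():
    """Write a value and read it back from a throwaway directory."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "state.bin")
        durable_write(path, b"committed")
        with open(path, "rb") as f:
            return f.read()
```

This survives a process or node restart but, as the list above notes, not the loss of the disk itself. On POSIX systems a stricter version would also fsync the containing directory so that the rename is persisted.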

The atomicity requirement is also a no-brainer. Nobody likes partial changes, which can violate data integrity and break the system in unpredictable ways that are difficult to troubleshoot.
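The following sketch shows what "all succeed or all fail" looks like in practice, again using Python's built-in sqlite3 module. The accounts table and the helpers open_bank, transfer, and balances are assumptions made up for this example. A transfer is two updates; if the second one fails, the first must be undone.

```python
import sqlite3


def open_bank():
    """Create an in-memory database with two example accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
    )
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Move money between accounts as a single atomic transaction."""
    # 'with conn' commits if the block finishes, rolls back if it raises.
    with conn:
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = ?",
            (amount, src),
        )
        cur = conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = ?",
            (amount, dst),
        )
        if cur.rowcount != 1:
            # The debit already ran; raising here triggers the rollback.
            raise LookupError(f"unknown account: {dst}")


def balances(conn):
    """Return the current balances as a dict keyed by account name."""
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))
```

Calling transfer(conn, "alice", "carol", 30) raises LookupError because there is no carol account, and alice's balance is left untouched rather than being debited with nowhere for the money to go. Without the transaction, the same failure would leave exactly the kind of partial change described above.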