
DynamoDB Gotchas

Here are a few gotchas when working with DynamoDB that a newcomer may not expect. DynamoDB is sold on its simplicity, high throughput and low latency. Its ideal use case is an application that needs fast lookups on a key that returns a single item from a big data set, such as storing web site user preferences. The idea is that you have a hash key that is distributed well across the data set. It's not particularly well suited if the lookup key is not known at the time of schema design, if the lookup key value can change over time, or if multiple different query attributes are needed on the same data set. Even if you are aware of these limitations, there are some gotchas.

Provisioned Throughput and Hot Partitions

Provisioned throughput is split evenly across the partitions of a table. What does this mean in practice? If you have a table that is split across 10 partitions and have provisioned the table for 50 read units of capacity, then each partition will have only 5 read units of capacity. This won’t typically be a problem if your reads are distributed evenly across the table, but if you have a large number of reads on a small set of the data you can end up with a single partition receiving a disproportionate amount of traffic.
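The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the even split described in this post: the function names are made up for the example, and it ignores things like burst capacity and item size.

```python
def per_partition_capacity(provisioned_units: float, partition_count: int) -> float:
    """Capacity each partition gets if table throughput is split evenly."""
    if partition_count < 1:
        raise ValueError("partition_count must be at least 1")
    return provisioned_units / partition_count


def would_throttle(hot_partition_reads: float,
                   provisioned_units: float,
                   partition_count: int) -> bool:
    """True if reads hitting one partition exceed that partition's share."""
    share = per_partition_capacity(provisioned_units, partition_count)
    return hot_partition_reads > share


# The example from the text: 50 read units over 10 partitions.
print(per_partition_capacity(50, 10))  # 5.0

# 20 reads/sec against one partition is throttled, even though the
# table as a whole is provisioned for 50.
print(would_throttle(20, 50, 10))  # True
```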

There isn’t very good monitoring of partition-level stats in Amazon CloudWatch. It’s difficult to even know how many partitions a table has. The symptom is that you will start to see throttled requests even though you are well below the provisioned capacity for the table.

There is no way to manage the partitioning yourself and split only the hot partition (as you can with Kinesis), so your easiest option is to implement a caching layer in the application to reduce frequent requests to the same items.
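As a rough idea of what such a caching layer could look like, here is a minimal in-process cache with a time-to-live. It is a sketch, not a recommendation of a particular design: TTLCache and fetch_from_table are hypothetical names, and fetch_from_table stands in for whatever call your application makes to read an item from DynamoDB.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Tiny read-through cache: repeat reads of a hot key skip the table."""

    def __init__(self,
                 fetch: Callable[[Hashable], Any],
                 ttl_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch          # called only on a miss or after expiry
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so expiry can be tested
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get(self, key: Hashable) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]          # still fresh, no request to the table
        value = self._fetch(key)
        self._entries[key] = (now + self._ttl, value)
        return value


# Hypothetical stand-in for a real DynamoDB read.
calls = []

def fetch_from_table(key):
    calls.append(key)
    return {"user_id": key, "theme": "dark"}


cache = TTLCache(fetch_from_table, ttl_seconds=30.0)
cache.get("user-1")
cache.get("user-1")
print(len(calls))  # 1, the second read was served from the cache
```

The trade-off is staleness: a cached item can be up to ttl_seconds out of date, so pick a TTL that the data can tolerate. A shared cache such as Memcached or Redis would be the next step if several application servers hit the same hot keys.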

Partition Splits Are One Way

Increasing the provisioned read/write throughput or the amount of stored data may cause DynamoDB to require more partitions for a table, which triggers a partition split. Reducing capacity does not reduce the number of partitions, as partitions are never merged. This is a problem when trying to handle a burst of traffic. If you temporarily increase the provisioned throughput to accommodate something like a bulk load of data, the partitions will split, but reducing the capacity afterwards won’t reduce the number of partitions. So the number of partitions on a table is based on the maximum storage or throughput it has ever been configured with. Because the lower throughput is then spread across more partitions, each partition gets a smaller share, which can be a contributing factor to the hot partition problem above.
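To make the one-way behaviour concrete, here is a simplified model of it. The per-partition limits used below (3,000 read units, 1,000 write units, 10 GB) are assumptions for the sake of the example, not figures from this post, and the real service may split partitions differently. TableModel is a hypothetical name. The point is only that the partition count follows the high-water mark.

```python
import math

# Assumed per-partition limits, for illustration only.
READ_UNITS_PER_PARTITION = 3000
WRITE_UNITS_PER_PARTITION = 1000
GB_PER_PARTITION = 10


def partitions_needed(read_units: float, write_units: float, size_gb: float) -> int:
    """Partitions required by throughput or by storage, whichever is larger."""
    by_throughput = math.ceil(read_units / READ_UNITS_PER_PARTITION
                              + write_units / WRITE_UNITS_PER_PARTITION)
    by_size = math.ceil(size_gb / GB_PER_PARTITION)
    return max(1, by_throughput, by_size)


class TableModel:
    """Tracks a partition count that can grow but never shrink."""

    def __init__(self, read_units: float, write_units: float, size_gb: float) -> None:
        self.write_units = write_units
        self.partitions = partitions_needed(read_units, write_units, size_gb)

    def update(self, read_units: float, write_units: float, size_gb: float) -> None:
        self.write_units = write_units
        needed = partitions_needed(read_units, write_units, size_gb)
        # Partitions are never merged, so keep the largest count seen so far.
        self.partitions = max(self.partitions, needed)

    def write_units_per_partition(self) -> float:
        return self.write_units / self.partitions


table = TableModel(read_units=1000, write_units=500, size_gb=5)
print(table.partitions)  # 1

table.update(read_units=1000, write_units=10000, size_gb=5)  # bulk load
print(table.partitions)  # 11

table.update(read_units=1000, write_units=500, size_gb=5)    # scale back down
print(table.partitions)  # 11, the split is not undone
```

After scaling back down in this model, the 500 write units are spread over 11 partitions instead of 1, so each partition has far less headroom than it did before the bulk load.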

See Guidelines for Working with Tables for more information on how this works.

Expensive

There have been a few blog posts on the cost of DynamoDB compared to other database layers. The summary is that, for larger data sets, you are effectively paying 50x the cost of Aurora or RDS for the convenience of DynamoDB. In my recent projects DynamoDB has become a major portion of hosting costs.

Other references:

http://blog.cfelde.com/2015/08/amazon-aurora-performance-as-a-nosql-store/

 
