Amazon DynamoDB

Amazon DynamoDB is a fast and flexible nonrelational database service for any scale. DynamoDB enables customers to offload the administrative burdens of operating and scaling distributed databases to AWS so that they don’t have to worry about hardware provisioning, setup and configuration, throughput capacity planning, replication, software patching, or cluster scaling. Amazon DynamoDB is a fully managed NoSQL database service that provides fast and predictable performance with seamless scalability.

  • Using DynamoDB, users can create database tables that can store and retrieve any amount of data and serve any level of request traffic.
  • DynamoDB lets users offload the administrative burdens of operating and scaling a distributed database.
  • DynamoDB handles hardware provisioning, setup and configuration, replication, software patching, and cluster scaling.
  • DynamoDB also offers encryption at rest, which eliminates the operational burden and complexity involved in protecting sensitive data. 
  • Users can scale up or scale down their tables’ throughput capacity without downtime or performance degradation.
  • Using the AWS Management Console, users can monitor resource utilization and performance metrics.

Amazon DynamoDB Benefits

DynamoDB supports some of the world’s largest-scale applications by providing consistent, single-digit-millisecond response times at any scale. Users can build applications with virtually unlimited throughput and storage. DynamoDB global tables replicate data across multiple AWS Regions, which gives globally distributed applications fast, local access to data. DynamoDB Accelerator (DAX) provides a fully managed in-memory cache that enables even faster access, with microsecond latency.

DynamoDB is serverless: there are no servers to provision, patch, or manage and no software to install, maintain, or operate. DynamoDB automatically scales tables up and down to adjust for capacity and maintain performance. Availability and fault tolerance are built in, eliminating the need to architect applications for these capabilities. DynamoDB provides both provisioned and on-demand capacity modes so that users can optimize costs by specifying capacity per workload.

DynamoDB supports ACID transactions to enable users to build business-critical applications at scale. It encrypts all data by default and provides fine-grained identity and access control on all tables. Users can create full backups of hundreds of terabytes of data instantly with no performance impact on their tables, and recover to any point in time in the preceding 35 days with no downtime. By exporting DynamoDB table data to a data lake in Amazon S3, users can perform analytics at any scale.

When reading data from DynamoDB, users can specify whether they want the read to be eventually consistent or strongly consistent. The eventual consistency option maximizes read throughput, but the result might not reflect a recently completed write. DynamoDB also gives users the flexibility and control to request a strongly consistent read if their application, or an element of the application, requires it. A strongly consistent read returns a result that reflects all writes that received a successful response before the read.
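
The difference is a single request parameter. Below is a minimal boto3 sketch of both read types; the table name and key attributes (Music, Artist, SongTitle) are hypothetical.

```python
import boto3

dynamodb = boto3.client("dynamodb")

key = {"Artist": {"S": "No One You Know"}, "SongTitle": {"S": "Call Me Today"}}

# Eventually consistent read (the default): cheaper, but may not yet
# reflect a very recent write.
resp = dynamodb.get_item(TableName="Music", Key=key)

# Strongly consistent read: reflects all writes that succeeded before it.
resp = dynamodb.get_item(TableName="Music", Key=key, ConsistentRead=True)
```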

Amazon DynamoDB Features

Amazon DynamoDB Accelerator (DAX)

Amazon DynamoDB Accelerator (DAX) is a fully managed, highly available, in-memory cache for Amazon DynamoDB that delivers up to a 10 times performance improvement—from milliseconds to microseconds—even at millions of requests per second.

  • DynamoDB + DAX takes performance to the next level with response times in microseconds for millions of requests per second for read-heavy workloads. With DAX, users’ applications remain fast and responsive, even when a popular event or news story drives unprecedented request volumes their way.
  • DAX is tightly integrated with DynamoDB—users provision a DAX cluster, use the DAX client SDK to point the existing DynamoDB API calls at the DAX cluster (see the sketch after this list), and let DAX handle the rest. Because the retrieval of cached data reduces the read load on existing DynamoDB tables, users may be able to reduce their provisioned read capacity and lower overall operational costs.
  • DAX automates many common administrative tasks such as failure detection, failure recovery, and software patching.
  • Users can use AWS Identity and Access Management (IAM) to assign unique security credentials to each user and control each user’s access to services and resources. With Amazon CloudWatch, users can gain systemwide visibility into resource utilization, application performance, and operational health. With AWS CloudTrail, users can easily log and audit changes to cluster configuration. DAX supports Amazon VPC for secure and easy access from existing applications.
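
A minimal sketch of pointing reads at a DAX cluster, assuming the amazondax Python client (pip package amazon-dax-client); the cluster endpoint, table, and key names are placeholders.

```python
import botocore.session
from amazondax import AmazonDaxClient  # pip install amazon-dax-client

session = botocore.session.get_session()

# Point the client at the DAX cluster endpoint instead of DynamoDB itself.
dax = AmazonDaxClient(
    session,
    region_name="us-east-1",
    endpoint_url="daxs://my-cluster.abc123.dax-clusters.us-east-1.amazonaws.com",
)

# Same data-plane calls as the boto3 DynamoDB client: cache hits return in
# microseconds, and misses fall through to the underlying table.
resp = dax.get_item(
    TableName="Music",
    Key={"Artist": {"S": "No One You Know"}, "SongTitle": {"S": "Call Me Today"}},
)
```
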
Amazon DynamoDB global tables

Global tables build on the global Amazon DynamoDB footprint to provide users with a fully managed, multi-region, and multi-master database that delivers fast, local read and write performance for massively scaled, global applications. Global tables replicate users’ DynamoDB tables automatically across their choice of AWS Regions.

  • Global tables enable users’ applications to stay highly available even in the unlikely event of isolation or degradation of an entire Region.
  • Multi-master replication ensures that updates performed in any Region are propagated to other Regions, and that data in all Regions is eventually consistent.
  • Global tables enable users to read and write data locally, providing single-digit-millisecond latency for globally distributed applications at any scale.
  • Users can select the Regions where they need data replicated, and DynamoDB handles the rest (see the sketch after this list). Applications access global tables via the existing DynamoDB APIs and endpoints.
  • Global tables can help applications stay available and high performing for business continuity. If a single AWS Region becomes isolated or degraded, the application can redirect to a different Region and perform reads and writes against a different replica table.
  • Any changes made to any item in any replica table are replicated to all the other replicas within the same global table. In a global table, a newly written item is usually propagated to all replica tables within a second. With a global table, each replica table stores the same set of data items.
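
Adding a replica Region to an existing table is a single API call. Below is a minimal boto3 sketch, assuming global tables version 2019.11.21 and a hypothetical table name.

```python
import boto3

ddb = boto3.client("dynamodb", region_name="us-east-1")

# Add a replica of the table in eu-west-1; DynamoDB creates the replica
# table and begins propagating changes automatically.
ddb.update_table(
    TableName="Music",
    ReplicaUpdates=[{"Create": {"RegionName": "eu-west-1"}}],
)
```
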
NoSQL Workbench

NoSQL Workbench for Amazon DynamoDB is a cross-platform client-side application for modern database development and operations and is available for Windows, macOS, and Linux. NoSQL Workbench is a unified visual tool that provides data modeling, data visualization, and query development features to help design, create, query, and manage DynamoDB tables.

  • Data modeling: With NoSQL Workbench for DynamoDB, users can build new data models from, or design models based on, existing data models that satisfy the application’s data access patterns. Users also can import and export the designed data model at the end of the process.
  • Data visualization: The data model visualizer provides a canvas where users can map queries and visualize the access patterns (facets) of the application without having to write code. Every facet corresponds to a different access pattern in DynamoDB. Users can manually add data to the data model or import data from MySQL.
  • Operation building: NoSQL Workbench provides a rich graphical user interface for users to develop and test queries. Users can use the operation builder to view, explore, and query datasets. They can also use the structured operation builder to build and perform data plane operations. The tool supports projection and condition expressions, and lets users generate sample code in multiple languages.
Point-in-time recovery (PITR)

Amazon DynamoDB enables users to back up their table data continuously by using point-in-time recovery (PITR). When users enable PITR, DynamoDB backs up the table data automatically with per-second granularity so that it can be restored to any given second in the preceding 35 days. PITR helps protect against accidental writes and deletes. By using PITR, users can back up tables with hundreds of terabytes of data, with no impact on the performance or availability of the production applications. Users also can recover PITR-enabled DynamoDB tables that were deleted in the preceding 35 days, and restore tables to their state just before they were deleted.

  • Easy to use: PITR is built into the DynamoDB console, so users can enable it, or create, restore, and delete backups, with a single click. Users can fully automate creation, retention, restoration, and deletion of backups via APIs (see the sketch after this list).
  • Fully managed: PITR backups are automatically encrypted and catalogued, easily discoverable, and retained until the user explicitly deletes them.
  • Fast and scalable: Users can enable PITR on tables of any size, and restore from a backup to a new table across AWS Regions to help meet multi-regional compliance and regulatory requirements, and to develop a disaster recovery and business continuity plan.
  • No performance impact: PITR does not consume any provisioned table capacity and has no impact on the performance or availability of the production applications.
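
A minimal boto3 sketch of the PITR APIs; the table names and restore timestamp are hypothetical.

```python
import boto3
from datetime import datetime, timezone

ddb = boto3.client("dynamodb")

# Enable continuous backups with point-in-time recovery on a table.
ddb.update_continuous_backups(
    TableName="Music",
    PointInTimeRecoverySpecification={"PointInTimeRecoveryEnabled": True},
)

# Restore to a new table as the source existed at a given second
# (must be within the preceding 35 days).
ddb.restore_table_to_point_in_time(
    SourceTableName="Music",
    TargetTableName="Music-restored",
    RestoreDateTime=datetime(2024, 1, 15, 12, 0, 0, tzinfo=timezone.utc),
)
```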

DynamoDB Components

Tables, items, and attributes are the core components of DynamoDB. A table is a collection of items, and each item is a collection of attributes. DynamoDB uses primary keys to uniquely identify each item in a table and secondary indexes to provide more querying flexibility. DynamoDB Streams enables users to capture data modification events in DynamoDB tables.

  • Tables – DynamoDB stores data in tables, and a table is a collection of data.
  • Items – Each table contains zero or more items. An item is a group of attributes that is uniquely identifiable among all of the other items. Items in DynamoDB are similar in many ways to rows, records, or tuples in other database systems. There is no limit to the number of items customers can store in a table.
  • Attributes – Each item is composed of one or more attributes. An attribute is a fundamental data element, something that does not need to be broken down any further.
    • Attributes in DynamoDB are similar in many ways to fields or columns in other database systems.

Primary Key:- The primary key uniquely identifies each item in the table, so that no two items can have the same key. DynamoDB supports two different kinds of primary keys:

  • Partition key:- A simple primary key, composed of one attribute known as the partition key.
  • DynamoDB uses the partition key’s value as input to an internal hash function. The output from the hash function determines the partition (physical storage internal to DynamoDB) in which the item will be stored.
  • Each primary key attribute must be a scalar (meaning that it can hold only a single value). The only data types allowed for primary key attributes are string, number, or binary.

Partition key and sort key:- Referred to as a composite primary key because it is composed of two attributes. The first attribute is the partition key, and the second attribute is the sort key.

  • DynamoDB uses the partition key value as input to an internal hash function. The output from the hash function determines the partition (physical storage internal to DynamoDB) in which the item will be stored. 
  • All items with the same partition key value are stored together, in sorted order by sort key value (see the table-creation sketch after this list).
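
A minimal boto3 sketch of creating a table with a composite primary key; the table and attribute names are illustrative.

```python
import boto3

ddb = boto3.client("dynamodb")

# Artist is the partition key (HASH) and SongTitle the sort key (RANGE).
ddb.create_table(
    TableName="Music",
    KeySchema=[
        {"AttributeName": "Artist", "KeyType": "HASH"},      # partition key
        {"AttributeName": "SongTitle", "KeyType": "RANGE"},  # sort key
    ],
    AttributeDefinitions=[
        {"AttributeName": "Artist", "AttributeType": "S"},
        {"AttributeName": "SongTitle", "AttributeType": "S"},
    ],
    BillingMode="PAY_PER_REQUEST",  # on-demand capacity mode
)
```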

Secondary Index:– A secondary index lets customers query the data in the table using an alternate key, in addition to queries against the primary key. DynamoDB supports two kinds of indexes:

  • Global secondary index:- An index with a partition key and sort key that can be different from those on the table.
  • Local secondary index:- An index that has the same partition key as the table, but a different sort key.
  • Each table in DynamoDB has a default limit of 20 global secondary indexes and 5 local secondary indexes (see the sketch after this list).
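
A minimal boto3 sketch of adding a global secondary index to an existing on-demand table; the index and attribute names are illustrative.

```python
import boto3

ddb = boto3.client("dynamodb")

# Create a GSI keyed on Genre so the table can be queried by an
# alternate attribute. A provisioned-mode table would also need a
# ProvisionedThroughput block for the index.
ddb.update_table(
    TableName="Music",
    AttributeDefinitions=[{"AttributeName": "Genre", "AttributeType": "S"}],
    GlobalSecondaryIndexUpdates=[{
        "Create": {
            "IndexName": "Genre-index",
            "KeySchema": [{"AttributeName": "Genre", "KeyType": "HASH"}],
            "Projection": {"ProjectionType": "ALL"},
        }
    }],
)
```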

DynamoDB Streams:- DynamoDB Streams is an optional feature that captures data modification events in DynamoDB tables. The data about these events appear in the stream in near-real time, and in the order that the events occurred, and each event is represented by a stream record. When a stream on a table is enabled, DynamoDB Streams writes a stream record whenever one of the following events occurs:

  • A new item is added to the table: The stream captures an image of the entire item, including all of its attributes.
  • An item is updated: The stream captures the “before” and “after” image of any attributes that were modified in the item.
  • An item is deleted from the table: The stream captures an image of the entire item before it was deleted (see the sketch after this list).
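
A minimal boto3 sketch of enabling a stream and reading one batch of records; the table name is hypothetical.

```python
import boto3

ddb = boto3.client("dynamodb")
streams = boto3.client("dynamodbstreams")

# Record both the "before" and "after" images of modified items.
ddb.update_table(
    TableName="Music",
    StreamSpecification={"StreamEnabled": True,
                         "StreamViewType": "NEW_AND_OLD_IMAGES"},
)

# Read one batch of stream records from the first shard.
arn = ddb.describe_table(TableName="Music")["Table"]["LatestStreamArn"]
shard = streams.describe_stream(StreamArn=arn)["StreamDescription"]["Shards"][0]
it = streams.get_shard_iterator(
    StreamArn=arn,
    ShardId=shard["ShardId"],
    ShardIteratorType="TRIM_HORIZON",  # start from the oldest record
)["ShardIterator"]
records = streams.get_records(ShardIterator=it)["Records"]
```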

Schemaless Web-scale 

DynamoDB is schemaless and well suited to Web-scale applications, including social networks, gaming, media sharing, and the Internet of Things (IoT). Every table must have a primary key to uniquely identify each data item, but there are no similar constraints on other non-key attributes. DynamoDB can manage structured or semistructured data, including JSON documents.

Users can use the AWS Management Console or the AWS CLI to work with DynamoDB and perform ad hoc tasks. Applications can use the AWS software development kits (SDKs) to work with DynamoDB using object-based, document-centric, or low-level interfaces.

  • DynamoDB is optimized for compute, so performance is mainly a function of the underlying hardware and network latency. As a managed service, DynamoDB insulates users and their applications from these implementation details.

DynamoDB is designed to scale out using distributed clusters of hardware. This design allows increased throughput without increased latency. Customers specify their throughput requirements, and DynamoDB allocates sufficient resources to meet those requirements. There are no upper limits on the number of items per table, nor the total size of that table.

Amazon DynamoDB global tables provide a fully managed solution for deploying a multiregion, multi-master database, without having to build and maintain replication solutions. With global tables, customers can specify the AWS Regions where they want the table to be available. DynamoDB performs all of the necessary tasks to create identical tables in these Regions and propagate ongoing data changes to all of them.

    • DynamoDB global tables are ideal for massively scaled applications with globally dispersed users. 
    • Global tables provide automatic multi-master replication to AWS Regions worldwide, enabling customers to deliver low-latency data access to their users no matter where they are located.
    • Transactional operations provide atomicity, consistency, isolation, and durability (ACID) guarantees only within the Region where the write is made originally (see the sketch after this list). Transactions are not supported across Regions in global tables.
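
A minimal boto3 sketch of a single-Region transactional write; the table, key, and attribute names are hypothetical.

```python
import boto3

ddb = boto3.client("dynamodb")

# Both operations succeed or fail together. The conditional update
# rolls back the Put if there is no stock left to reserve.
ddb.transact_write_items(
    TransactItems=[
        {"Put": {
            "TableName": "Orders",
            "Item": {"OrderId": {"S": "o-1001"}, "Status": {"S": "PLACED"}},
        }},
        {"Update": {
            "TableName": "Inventory",
            "Key": {"SKU": {"S": "widget-42"}},
            "UpdateExpression": "SET Stock = Stock - :n",
            "ConditionExpression": "Stock >= :n",
            "ExpressionAttributeValues": {":n": {"N": "1"}},
        }},
    ]
)
```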

Migrate to Amazon DynamoDB

Amazon DynamoDB is a NoSQL database with performance at any scale. Migrating to DynamoDB can save time and resources compared to other databases that require local provisioning and maintenance. DynamoDB is a fully managed service, so users do not have to perform tasks such as cluster management, operating system patching, and security updates. 

Indexing: When creating a table in Amazon DynamoDB, users need to specify the primary key of the table. The primary key uniquely identifies each item in the table so that no two items can have the same key. DynamoDB supports two kinds of primary keys:

  • Partition key – A simple primary key, composed of one attribute known as the partition key.
  • Partition key and sort key – Referred to as a composite primary key, this type of key is composed of two attributes. The first attribute is the partition key, and the second attribute is the sort key.

DynamoDB provides fast access to items in a table by specifying primary key values. However, to allow efficient access to data with attributes other than the primary key, many applications might benefit from having one or more secondary (or alternate) keys available. To address this scenario, DynamoDB supports two types of secondary indexes:

  • Global secondary index – An index with a partition key and a sort key that can be different from those on the base table
  • Local secondary index – An index that has the same partition key as the base table, but a different sort key

Users can create up to five local secondary indexes when creating a table, each referencing the same partition key as the base table together with a different sort key. Users can also create up to 20 global secondary indexes (the default quota) keyed on either a partition key alone or a partition key and a sort key, using attributes other than the item’s primary key.

Queries: Amazon DynamoDB provides the following three operations for retrieving data from a table:

  • GetItem – Retrieves a single item from a table by its primary key.
  • Query – Retrieves all items that have a specific partition key. In addition, users can apply a condition to the sort key, or a filter to any other field in the table, and retrieve only a subset of the data.
  • Scan – Retrieves all items in the specified table. It provides more flexibility in defining filter conditions on any field in the table, but it can be costly or time-consuming because it scans the entire contents of the table (see the sketch after this list).
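
A minimal boto3 sketch of the three retrieval operations; the table, key, and attribute names are illustrative.

```python
import boto3

ddb = boto3.client("dynamodb")

# GetItem: fetch one item by its full primary key.
item = ddb.get_item(
    TableName="Music",
    Key={"Artist": {"S": "No One You Know"}, "SongTitle": {"S": "Call Me Today"}},
)["Item"]

# Query: all items sharing a partition key, optionally narrowed by a
# sort-key condition.
songs = ddb.query(
    TableName="Music",
    KeyConditionExpression="Artist = :a AND begins_with(SongTitle, :t)",
    ExpressionAttributeValues={":a": {"S": "No One You Know"},
                               ":t": {"S": "Call"}},
)["Items"]

# Scan: reads the whole table; the filter is applied after the read, so
# capacity is consumed for every item scanned.
rock = ddb.scan(
    TableName="Music",
    FilterExpression="Genre = :g",
    ExpressionAttributeValues={":g": {"S": "Rock"}},
)["Items"]
```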

In DynamoDB, users perform Query operations directly against an index, so it is always advisable to design the schema to leverage indexes for efficient lookups. Here are some of the databases that can be migrated to DynamoDB:

  1. Migrating from MongoDB to DynamoDB 
  2. Migrating from MySQL to DynamoDB
  3. Migrating from Cassandra to DynamoDB
  4. Migrating from an RDBMS to DynamoDB 

#01

MongoDB
AWS DMS supports migration from a MongoDB collection as a source to a DynamoDB table as a target. AWS DMS supports the MongoDB migration in two modes:

  • Document mode: In this mode, AWS DMS migrates all the JSON data into a single column named “_doc” in the target DynamoDB table.
  • Table mode: In this mode, AWS DMS scans a specified number of documents in the MongoDB database and creates a sample schema with all the keys and their types. During migration, users can use the object mapping feature in AWS DMS to transform the original data from MongoDB to the desired structure in DynamoDB.

To perform a database migration, AWS DMS connects to the source MongoDB database, reads the source data, transforms the data for consumption by the target DynamoDB tables, and loads the data into the DynamoDB tables. For a sharded collection, MongoDB distributes documents across shards using the shard key. To migrate a sharded collection from a sharded cluster, users need to migrate each shard separately.

Following are the high-level tasks involved in migrating data from a MongoDB sharded cluster to DynamoDB using AWS DMS:

  • Prepare the MongoDB cluster for migration.
  • Create the replication server.
  • Create the source MongoDB endpoint and the target DynamoDB endpoint.
  • Create and start the replication tasks to migrate data between the MongoDB cluster and DynamoDB.

#02

MySQL
Many companies consider migrating from relational databases like MySQL to Amazon DynamoDB, a fully managed, fast, highly scalable, and flexible NoSQL database service. For example, DynamoDB can increase or decrease capacity based on traffic, in accordance with business needs. The total cost of services can be optimized more easily than for the typical media-based RDBMS. However, migrations can have two common issues:

  • Service outage due to downtime, especially when customer service must be seamlessly available 24/7/365
  • Different key design between RDBMS and DynamoDB

There are two methods of seamlessly migrating data from MySQL to DynamoDB, minimizing downtime and converting the MySQL key design into one more suitable for NoSQL.

  1. Use AWS DMS: AWS DMS supports migration to a DynamoDB table as a target. Users can use object mapping to restructure the original data to the desired structure of the data in DynamoDB during migration.
  2. Use EMR, Amazon Kinesis, and Lambda with custom scripts: Consider this method when more complex conversion processes and flexibility are required. Fine-grained user control is needed for grouping MySQL records into fewer DynamoDB items, determining attribute names dynamically, adding business logic programmatically during migration, supporting more data types, or adding parallel control for one big table.

AWS Database Migration Service (AWS DMS) can migrate users’ data to and from most widely used commercial and open-source databases. It supports homogeneous and heterogeneous migrations between different database platforms.

Amazon EMR is a managed Hadoop framework that helps users process vast amounts of data quickly. Users can build EMR clusters easily with preconfigured software stacks that include Hive and other business software.

Amazon Kinesis can continuously capture and retain vast amounts of data such as transactions, IT logs, or clickstreams for up to 7 days.

AWS Lambda helps users run code without provisioning or managing servers. Code can be automatically triggered by other AWS services such as Amazon Kinesis Streams.

#03

Cassandra

Cassandra offers robust support for clusters spanning multiple datacenters, and it is highly scalable and high performing. Cassandra uses a peer-to-peer distributed architecture across its nodes, and data is distributed among all the nodes in a cluster. Each node is independent and interconnected with the other nodes.

The serverless provisioning model of DynamoDB eliminates the need to overprovision database infrastructure and is provided without the need for specialized resourcing or licensing. As a result, DynamoDB-backed applications run with as much as a 70 percent total cost of ownership savings when compared to Cassandra.

Popular Cassandra features and third-party tools such as Transparent Data Encryption, multiple data center replication, and backup and restore are simplified with DynamoDB. Global tables, point-in-time recovery, and encryption at rest provide developers functionality similar to what Cassandra offers. However, these capabilities have push-button implementation without overhead or downtime.

To offload the migration load from the primary Cassandra cluster, and to ensure the data consistency necessary for additional migration processing, users can create a new on-premises or Amazon EC2 Cassandra data center. To migrate data from a Cassandra cluster to a DynamoDB target:

  1. Roll out a new Cassandra data center using the AWS SCT Clone Data Center Wizard, or prepare and use the data center on their own.
  2. Extract the data from the existing or newly cloned Cassandra cluster by using data extraction agents, the AWS SCT, and AWS DMS tasks.

Data extraction is carried out directly from binary .db files with the Cassandra driver and data extraction agents. The following are the main benefits of this approach:

  • Users can use multiple data extraction agents as nodes to expedite the data extraction process.
  • Access is required to file systems only (there is no need for the Cassandra cluster to be active).

During the data extraction process, data is extracted into .csv files, and metadata is stored in table-mapping and task-setting JSON files, which AWS DMS tasks use.

Amazon DynamoDB pricing

Pricing for On-Demand Capacity

With on-demand capacity mode, users pay per request for the data reads and writes their applications perform on their tables. Users do not need to specify how much read and write throughput they expect their application to perform, because DynamoDB instantly accommodates workloads as they ramp up or down.

Read request unit: API calls to read data from a table are billed in read request units. DynamoDB read requests can be either strongly consistent, eventually consistent, or transactional. For items up to 4 KB in size, a strongly consistent read request requires one read request unit, an eventually consistent read request requires one-half read request unit, and a transactional read request requires two read request units. For items larger than 4 KB, additional read request units are required.

Write request unit: API calls to write data to a table are billed in write request units. A standard write request unit can write an item up to 1 KB. For items larger than 1 KB, additional write request units are required. A transactional write requires two write request units.
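
The unit arithmetic rounds item sizes up to the next billing increment. Here is a small worked sketch of the on-demand math described above:

```python
import math

def read_request_units(item_kb: float, kind: str = "eventual") -> float:
    """Request units consumed by one on-demand read of an item."""
    increments = math.ceil(item_kb / 4)    # billed in 4 KB increments
    if kind == "strong":
        return increments                  # 1 unit per increment
    if kind == "transactional":
        return 2 * increments              # 2 units per increment
    return 0.5 * increments                # eventually consistent

def write_request_units(item_kb: float, transactional: bool = False) -> int:
    """Request units consumed by one on-demand write of an item."""
    increments = math.ceil(item_kb)        # billed in 1 KB increments
    return (2 if transactional else 1) * increments

print(read_request_units(11, "eventual"))  # ceil(11/4)=3 -> 1.5 units
print(write_request_units(3.5))            # ceil(3.5)=4  -> 4 units
```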

Replicated write request unit: When using DynamoDB global tables, data is written automatically to the multiple AWS Regions of the user’s choice. Each write occurs in the local Region as well as the replicated Regions.

Streams read request unit: Each GetRecords API call to DynamoDB Streams is a streams read request unit. Each streams read request unit can return up to 1 MB of data.

Transactional read/write requests: In DynamoDB, a transactional read or write differs from a standard read or write because it guarantees all operations contained in a single transaction set succeed or fail as a set.

Change data capture units: DynamoDB can capture item-level changes in users’ DynamoDB tables and replicate them to other AWS services such as Amazon Kinesis Data Streams and AWS Glue Elastic Views. DynamoDB captures these changes as delegated operations, which means DynamoDB performs the replication on users’ behalf. DynamoDB charges one change data capture unit for each write to the table (up to 1 KB). For items larger than 1 KB, additional change data capture units are required.

Pricing for Provisioned Capacity

With provisioned capacity mode, users specify the number of data reads and writes per second that they require for their application. Users can use auto scaling to automatically adjust the table’s capacity based on the specified utilization rate, ensuring application performance while reducing costs.

Read capacity unit (RCU): Each API call to read data from a table is a read request. Read requests can be strongly consistent, eventually consistent, or transactional. For items up to 4 KB in size, one RCU can perform one strongly consistent read request per second, or two eventually consistent read requests per second. Items larger than 4 KB require additional RCUs. Transactional read requests require two RCUs to perform one read per second for items up to 4 KB.

Write capacity unit (WCU): Each API call to write data to a table is a write request. For items up to 1 KB in size, one WCU can perform one standard write request per second. Items larger than 1 KB require additional WCUs. Transactional write requests require two WCUs to perform one write per second for items up to 1 KB.
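
A small worked sizing sketch under the rules above; the workload numbers (80 strongly consistent reads per second of 6 KB items, 20 standard writes per second of 1.5 KB items) are illustrative.

```python
import math

read_rate, read_kb = 80, 6      # strongly consistent reads per second
write_rate, write_kb = 20, 1.5  # standard writes per second

rcus = read_rate * math.ceil(read_kb / 4)    # 80 * 2 = 160 RCUs
wcus = write_rate * math.ceil(write_kb / 1)  # 20 * 2 = 40 WCUs
print(rcus, wcus)                            # 160 40
```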

Replicated write capacity unit (rWCU): When using DynamoDB global tables, users’ data is written automatically to the multiple AWS Regions of their choice. Each write occurs in the local Region as well as the replicated Regions.

Streams read request unit: Each GetRecords API call to DynamoDB Streams is a streams read request unit. Each streams read request unit can return up to 1 MB of data.

Transactional read/write requests: In DynamoDB, a transactional read or write differs from a standard read or write because it guarantees that all operations contained in a single transaction set succeed or fail as a set.

Change data capture units: DynamoDB can capture item-level changes in users’ DynamoDB tables and replicate them to other AWS services such as Amazon Kinesis Data Streams and AWS Glue Elastic Views. DynamoDB captures these changes as delegated operations, which means DynamoDB performs the replication on users’ behalf. DynamoDB charges one change data capture unit for each write to the table (up to 1 KB). For items larger than 1 KB, additional change data capture units are required.

Amazon DynamoDB Best Practices 

In a database, certain key table design decisions heavily influence overall query performance. The design choices that customers make also have a significant effect on storage requirements, which in turn affects query performance by reducing the number of I/O operations and minimizing the memory required to process queries. To get these benefits, customers need to apply the best practices presented by AWS for optimizing query performance. Here are some of them:

  • Choose the Best Sort Key:- Amazon Redshift stores customers’ data on disk in sorted order according to the sort key. The Amazon Redshift query optimizer uses sort order when it determines optimal query plans.
  • Choose the Best Distribution Style:- When customers execute a query, the query optimizer redistributes the rows to the compute nodes as needed to perform any joins and aggregations. The goal in selecting a table distribution style is to minimize the impact of the redistribution step by locating the data where it needs to be before the query is run.
  • Define Primary Key and Foreign Key Constraints:- Define primary key and foreign key constraints between tables wherever appropriate. Even though they are informational only, the query optimizer uses those constraints to generate more efficient query plans.
  • Use Date/Time Data Types for Date Columns:- Amazon Redshift stores DATE and TIMESTAMP data more efficiently than CHAR or VARCHAR, which results in better query performance. Use the DATE or TIMESTAMP data type, depending on the resolution you need, rather than a character type when storing date/time information.
  • Use a COPY Command to Load Data:- The COPY command loads data in parallel from Amazon S3, Amazon EMR, Amazon DynamoDB, or multiple data sources on remote hosts. COPY loads large amounts of data much more efficiently than using INSERT statements, and stores the data more effectively as well.
  • Split Load Data into Multiple Files:- The COPY command loads the data in parallel from multiple files, dividing the workload among the nodes in the customer’s cluster. The number of files should be a multiple of the number of slices in the cluster.
  • Compress Data Files:- Individually compress load files using gzip, lzop, bzip2, or Zstandard for large datasets.
