Amazon Glacier

Amazon Glacier is an extremely low-cost storage service that provides highly secure, durable, and flexible storage for data backup and archival. With Amazon Glacier, users can reliably store their data for a fraction of a cent per gigabyte per month (exact pricing varies by AWS Region and changes over time). Amazon Glacier enables users to offload the administrative burdens of operating and scaling storage to AWS, so that they don’t have to worry about capacity planning, hardware provisioning, data replication, hardware failure detection and repair, or time-consuming hardware migrations.

Users store data in Amazon Glacier as archives. An archive can represent a single file or combination of several files to be uploaded as a single archive. Retrieving archives from Amazon Glacier requires the initiation of a job. Users organize the archives in vaults. Users control access to the vaults using the AWS Identity and Access Management (IAM) service. Amazon Glacier is designed for use with other Amazon Web Services. Amazon S3 allows users to seamlessly move data between Amazon S3 and Amazon Glacier using data lifecycle policies. Users can also use AWS Import/Export to accelerate moving large amounts of data into Amazon Glacier using portable storage devices for transport.

  • Organizations are using Amazon Glacier to support a number of use cases. These include archiving offsite enterprise information, media assets, research and scientific data, digital preservation and magnetic tape replacement.
  • Amazon Glacier is a low-cost storage service designed to store data that is infrequently accessed and long lived. Amazon Glacier jobs typically complete in 3 to 5 hours.
  • To keep costs low yet suitable for varying retrieval needs, Amazon S3 Glacier provides three options for access to archives, from a few minutes to several hours, and S3 Glacier Deep Archive provides two access options ranging from 12 to 48 hours.

Amazon Glacier Benefits

Amazon Glacier is designed to provide average annual durability of 99.999999999% (11 nines) for an archive. The service redundantly stores data in multiple facilities and on multiple devices within each facility. To increase durability, Amazon Glacier synchronously stores data across multiple physical Availability Zones that are geographically separated within an AWS Region before returning SUCCESS on uploading archives. Unlike traditional systems, which can require laborious data verification and manual repair, Amazon Glacier performs regular, systematic data integrity checks and is built to be automatically self-healing.

Amazon Glacier scales to meet growing and often unpredictable storage requirements. A single archive is limited to 40 TB in size, but there is no limit to the total amount of data users can store in the service. Whether storing petabytes or gigabytes, Amazon Glacier automatically scales the storage up or down as needed. Amazon S3 Glacier and S3 Glacier Deep Archive are designed to be the lowest cost Amazon S3 storage classes, allowing users to archive large amounts of data at a very low cost. This makes it feasible to retain all the data for use cases like data lakes, analytics, IoT, machine learning, compliance, and media asset archiving. 

In addition to integration with most AWS services, Amazon S3 object storage services include tens of thousands of consulting, systems integrator, and independent software vendor partners, with more joining every month. AWS Partner Network partners have adapted their services and software to work with Amazon S3 storage classes for solutions like Backup & Recovery, Archiving, and Disaster Recovery. No other cloud provider has more partners with solutions that are pre-integrated to work with their service.

The Amazon S3 Glacier and S3 Glacier Deep Archive storage classes offer sophisticated integration with AWS CloudTrail to log, monitor, and retain storage API call activities for auditing, and support three different forms of encryption. Amazon Glacier uses server-side encryption to encrypt all data at rest. Amazon Glacier handles key management and key protection for users, using one of the strongest block ciphers available, 256-bit Advanced Encryption Standard (AES-256). Users who want to manage their own keys can encrypt data prior to uploading it.

Amazon Glacier Features

Vault Lock: Amazon S3 Glacier Vault Lock allows users to easily deploy and enforce compliance controls on individual S3 Glacier vaults via a lockable policy. Users can specify controls such as “Write Once Read Many” (WORM) in a Vault Lock policy and lock the policy from future edits. Once locked, the policy becomes immutable and Amazon S3 Glacier will enforce the prescribed controls to help achieve compliance objectives. 
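As an illustration of the kind of control a Vault Lock policy can express, the following Python sketch builds a WORM-style policy document that denies archive deletion until an archive reaches a given age. The Region, account ID, vault name, and retention period are placeholder assumptions. The `glacier:DeleteArchive` action and `glacier:ArchiveAgeInDays` condition key are taken from the S3 Glacier policy documentation. The helper only produces the JSON text; submitting and locking the policy is not shown.

```python
import json


def build_worm_lock_policy(region: str, account_id: str, vault_name: str,
                           retention_days: int) -> str:
    """Return Vault Lock policy JSON that denies deleting archives
    younger than `retention_days`. All argument values are placeholders."""
    if retention_days <= 0:
        raise ValueError("retention_days must be positive")
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "deny-delete-before-retention",
                "Principal": "*",
                "Effect": "Deny",
                "Action": "glacier:DeleteArchive",
                "Resource": (
                    f"arn:aws:glacier:{region}:{account_id}:vaults/{vault_name}"
                ),
                # Condition values in IAM policy documents are strings.
                "Condition": {
                    "NumericLessThan": {
                        "glacier:ArchiveAgeInDays": str(retention_days)
                    }
                },
            }
        ],
    }
    return json.dumps(policy, indent=2)
```

For example, `build_worm_lock_policy("us-west-2", "123456789012", "examplevault", 365)` returns a policy that blocks deletion during an archive's first year in the hypothetical vault `examplevault`.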

Vault access policies: Vault access policies allow users to easily manage access to individual S3 Glacier vaults. Users can define an access policy directly on a vault to grant vault access to users and business groups internal to the organization, as well as to external business partners.

Vault inventory: Amazon S3 Glacier maintains an inventory of all archives in each vault for disaster recovery or occasional reconciliation. The vault inventory is updated approximately once a day. Users can request a vault inventory as either a JSON or CSV file, which contains details about the archives including the size, creation date, and the archive description if one was provided during upload. The inventory represents the state of the vault as of the most recent inventory update.

Access control: Amazon S3 Glacier uses AWS Identity and Access Management (IAM) to help users securely control access to AWS and Amazon S3 Glacier data. Users can create IAM users, assign individual security credentials (that is, access keys, passwords, and multi-factor authentication devices), and apply IAM policies to each Amazon S3 Glacier vault to grant permitted activities to intended users.

Tagging support: Amazon S3 Glacier allows users to tag S3 Glacier vaults for easier resource and cost management. Tags are labels that users can define and associate with the vaults, and using tags adds filtering capabilities to operations such as AWS cost reports. 

Audit logs: Amazon S3 Glacier supports audit logging with AWS CloudTrail, which records Amazon S3 Glacier API calls for the account and delivers the log files to the user. These log files provide visibility into actions performed on Amazon S3 Glacier assets. Audit logging can help implement compliance and governance objectives for a cloud-based archival system.

Data upload and retrieval are done using the AWS SDKs or the underlying Amazon S3 Glacier API. Amazon S3 Glacier is supported by the AWS SDKs for Java, .NET, PHP, and Python (Boto). The SDK libraries wrap the underlying Amazon S3 Glacier API, simplifying your programming tasks. These SDKs provide libraries that map to an underlying REST API and enable users to easily construct requests and process responses. The AWS SDKs for Java and SDK for .NET offer high-level and low-level API libraries.

Low-level API: The low-level wrapper libraries map closely to the underlying Amazon S3 Glacier API and provide the most complete implementation of the underlying Amazon S3 Glacier operations.

High-level API: The high-level APIs further simplify application development with a higher level of abstraction for some of the operations. For example, when uploading an archive, the high-level API automatically computes the checksum for the user.
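To make concrete what the high-level API spares the user from, the sketch below computes a SHA-256 tree hash by hand, which is the checksum scheme the S3 Glacier API documentation describes for uploads (the text above does not name the algorithm, so treat that as outside knowledge). The function name `tree_hash` is illustrative and is not part of any SDK.

```python
import hashlib

MIB = 1024 * 1024  # tree-hash leaves are computed over 1 MiB chunks


def tree_hash(data: bytes) -> str:
    """Return the SHA-256 tree hash of `data` as a hex string."""
    # Leaf level: hash each 1 MiB chunk (empty input yields one empty chunk).
    chunks = [data[i:i + MIB] for i in range(0, len(data), MIB)] or [b""]
    level = [hashlib.sha256(chunk).digest() for chunk in chunks]
    # Hash adjacent pairs of digests until a single root digest remains.
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                pair = level[i] + level[i + 1]
                next_level.append(hashlib.sha256(pair).digest())
            else:
                next_level.append(level[i])  # odd digest is carried up as-is
        level = next_level
    return level[0].hex()
```

For payloads of 1 MiB or less the tree hash equals the plain SHA-256 digest, which is a convenient sanity check when comparing against an SDK-computed value.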

Amazon S3 Glacier works together with Amazon S3 lifecycle rules to help automate archiving of Amazon S3 data and reduce overall storage costs. Users can easily set up a rule that stores all previous Amazon S3 object versions in the lower cost S3 Glacier storage class and deletes them from S3 Glacier storage after 100 days. 
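A rule like the one described above could be expressed as a lifecycle configuration along the following lines. This is a sketch under assumptions: the rule ID is hypothetical, the one-day delay before transitioning noncurrent versions is an arbitrary choice, and `apply_rule` requires boto3 and valid AWS credentials. The dictionary layout follows the S3 `PutBucketLifecycleConfiguration` request shape.

```python
def build_noncurrent_version_rule(transition_after_days: int = 1,
                                  expire_after_days: int = 100) -> dict:
    """Build a lifecycle configuration that archives previous object
    versions to the GLACIER storage class and later deletes them."""
    if expire_after_days <= transition_after_days:
        raise ValueError("expiration must come after the transition")
    return {
        "Rules": [
            {
                "ID": "archive-previous-versions",  # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # empty prefix: whole bucket
                "NoncurrentVersionTransitions": [
                    {
                        "NoncurrentDays": transition_after_days,
                        "StorageClass": "GLACIER",
                    }
                ],
                "NoncurrentVersionExpiration": {
                    "NoncurrentDays": expire_after_days
                },
            }
        ]
    }


def apply_rule(bucket_name: str) -> None:
    """Send the rule to S3. Needs boto3 and credentials; not run here."""
    import boto3  # imported lazily so the builder works without the SDK

    boto3.client("s3").put_bucket_lifecycle_configuration(
        Bucket=bucket_name,
        LifecycleConfiguration=build_noncurrent_version_rule(),
    )
```

Keeping the configuration builder separate from the API call makes the rule easy to inspect or unit test before it is applied to a real bucket.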

Amazon S3 Glacier provides three retrieval features for archives to meet varying access time and cost requirements: Expedited, Standard, and Bulk retrievals. Archives requested using Expedited retrievals are typically available within 1 – 5 minutes, allowing users to quickly access data when occasional urgent requests for a subset of archives are required. With Standard retrievals, archives typically become accessible within 3 – 5 hours. Or users can use Bulk retrievals to cost-effectively access significant portions of the data, even petabytes, for just a quarter-of-a-cent per GB.

Data retrieval policies: Amazon S3 Glacier data retrieval policies allow users to define data retrieval limits with a few clicks in the AWS console. Users can limit retrievals to “Free Tier Only” or, if they want to retrieve more than the free tier allows, specify a “Max Retrieval Rate” to limit the retrieval speed and establish a retrieval cost ceiling.
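The same limits can also be set programmatically. The sketch below builds the policy document that the S3 Glacier `SetDataRetrievalPolicy` operation expects, where the `FreeTier` and `BytesPerHour` strategies correspond to the two console options above. The helper names are illustrative, and `apply_retrieval_policy` assumes boto3 and AWS credentials are available.

```python
from typing import Optional


def build_retrieval_policy(max_bytes_per_hour: Optional[int] = None) -> dict:
    """Return a data retrieval policy document.

    With no argument, retrievals are limited to the free tier; with a
    byte count, the hourly retrieval rate is capped at that value.
    """
    if max_bytes_per_hour is None:
        rule = {"Strategy": "FreeTier"}
    elif max_bytes_per_hour > 0:
        rule = {"Strategy": "BytesPerHour", "BytesPerHour": max_bytes_per_hour}
    else:
        raise ValueError("max_bytes_per_hour must be positive")
    return {"Rules": [rule]}


def apply_retrieval_policy(policy: dict) -> None:
    """Apply the policy to the current account. Not run here."""
    import boto3  # lazy import: only needed when calling AWS

    # accountId "-" tells S3 Glacier to use the caller's own account.
    boto3.client("glacier").set_data_retrieval_policy(
        accountId="-", Policy=policy
    )
```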

Amazon S3 Glacier can be accessed using the AWS Management Console, an easy-to-use web interface that provides the capability to create vaults, configure vault-level access permissions, and set up SNS notifications for data retrieval. The console also presents a storage usage summary for each vault as well as the last refresh time for the vault inventory.

Amazon S3 Glacier Select allows queries to run directly on data stored in Amazon S3 Glacier without having to retrieve the entire archive. Amazon S3 Glacier Select changes the value of archive storage by allowing users to process and find only the bytes they need out of the archive to use for analytics.

A user's analytics application can call the Amazon S3 Glacier Select API to retrieve only the data relevant to a query from the Amazon S3 Glacier archive. Amazon S3 Glacier Select is also planned to integrate with Amazon Athena and Amazon Redshift Spectrum, so that users can consider S3 Glacier archives part of the data lake.

Prior to S3 Glacier Select, an Amazon S3 Glacier archive had to be completely restored before the data could be used. Now AWS customers can use S3 Glacier Select to lower their costs and uncover more insights from their archive data.

AWS Snowball can accelerate movement of large amounts of data into and out of AWS using portable storage devices for transport. AWS transfers data directly onto and off of storage devices using Amazon’s high-speed internal network and bypassing the Internet. For large data sets, AWS Snowball is often faster than Internet transfer and more cost effective than upgrading connectivity. Users can use AWS Snowball for migrating data into the cloud, distributing content to their customers, sending backups to AWS, and disaster recovery.

AWS Direct Connect establishes a high-bandwidth, dedicated network connection from the user's premises to AWS. With AWS Direct Connect, users can transfer business-critical data directly from their datacenter into AWS, bypassing the Internet service provider and removing network congestion. AWS Direct Connect provides 1 Gbps and 10 Gbps connections, and users can provision multiple connections if more capacity is needed.

Amazon S3 Glacier Select

Amazon S3 Glacier Select allows users to run queries on the data stored in Amazon S3 Glacier, without the need to restore the entire object to a hotter tier like Amazon S3. With Amazon S3 Glacier Select, users can perform filtering and basic querying using a subset of SQL directly against their data in Amazon S3 Glacier. When users provide a SQL query and list of Amazon S3 Glacier objects, Amazon S3 Glacier Select will run the query in-place and write the output results to the bucket that was specified in Amazon S3.

  • Amazon S3 Glacier Select enables users to perform analysis on their data in Amazon S3 Glacier without first staging it in a hotter storage tier like Amazon S3, which makes it cheaper, faster, and easier to gather insights from cold data in Amazon S3 Glacier.
  • Higher-level Big Data applications can also use the Amazon S3 Glacier Select APIs to provide Amazon S3 Glacier as an additional data source, so that customers can use their tools and languages against their S3 Glacier data.
  • Customers occasionally face situations where they must filter on specific keys and respond within a few hours, for example in response to an audit. A customer might need to query all of their usage logs for the past year to respond to a billing dispute.
  • Amazon S3 Glacier Select can unlock business value in archives, opening up multiple scenarios for using Amazon S3 Glacier with Big Data, IoT, and custom analytics workloads.

S3 Glacier provides three retrieval options – Expedited, Standard, and Bulk. All of these options provide different retrieval times and costs. Amazon S3 Glacier Select works with each of these retrieval options, allowing users to choose the option best aligned to the speed at which they want the query to return results.

  • Data accessed using Expedited retrievals are typically made available within 1 – 5 minutes.
  • Standard retrievals complete within 3 – 5 hours.
  • Bulk retrievals complete within 5 – 12 hours. 
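One way to reason about the trade-off is to pick the cheapest tier whose typical worst-case time still fits a deadline. The sketch below encodes the upper bounds quoted above; the cost ordering (Bulk cheapest, Expedited most expensive) and the function name are assumptions of this example, and the times are typical figures rather than guarantees.

```python
from typing import Optional

# (tier name, typical upper bound in minutes), ordered cheapest first.
TIERS = [
    ("Bulk", 12 * 60),      # 5 to 12 hours
    ("Standard", 5 * 60),   # 3 to 5 hours
    ("Expedited", 5),       # 1 to 5 minutes
]


def cheapest_tier_within(deadline_minutes: float) -> Optional[str]:
    """Return the cheapest tier that typically completes by the deadline,
    or None if even Expedited is unlikely to make it."""
    for name, worst_case_minutes in TIERS:
        if worst_case_minutes <= deadline_minutes:
            return name
    return None
```

For instance, a result needed within eight hours maps to Standard, because Bulk can take up to twelve.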

Legacy archival solutions: Legacy archival solutions, such as on-premises tape libraries, have highly restricted data retrieval throughput and rarely have idle compute capacity nearby. The problem is even worse if tapes have been sent to an off-site storage facility.

  • Running any kind of analysis on legacy archival solutions can easily take anywhere from weeks to even months.
  • In contrast, with Amazon S3 Glacier Select it is easy to analyze Amazon S3 Glacier data in place quickly and inexpensively, at latencies the user chooses, ranging from minutes to hours.

Querying Archives with S3 Glacier Select

With S3 Glacier Select, users can perform filtering operations using simple Structured Query Language (SQL) statements directly on their data in S3 Glacier. When users provide an SQL query for an S3 Glacier archive object, S3 Glacier Select runs the query in place and writes the output results to Amazon S3. Using S3 Glacier Select, users can run queries and custom analytics on data stored in S3 Glacier without having to restore the data to a hotter tier like Amazon S3.

  • When users perform select queries, S3 Glacier provides three data access tiers: expedited, standard, and bulk. These tiers provide different data access times and costs, and users can choose any one of them depending on how quickly the data needs to be available.
  • Users can use S3 Glacier Select with the AWS SDKs, the S3 Glacier REST API, and the AWS Command Line Interface (AWS CLI).
  • Users can use S3 Glacier Select with the AWS SDKs, the S3 Glacier REST API, and the AWS Command Line Interface (AWS CLI). 

The following requirements and quotas apply when using S3 Glacier Select:

  • Archive objects that are queried by S3 Glacier Select must be formatted as uncompressed comma-separated values (CSV).
  • Users need to have an S3 bucket to work with. In addition, the AWS account they use to initiate an S3 Glacier Select job needs write permissions for the S3 bucket. The Amazon S3 bucket must be in the same AWS Region as the vault that contains the archive object being queried.
  • Users must have permission to call Get Job Output (GET output).
  • There are no quotas on the number of records that S3 Glacier Select can process. An input or output record must not exceed 1 MB; otherwise, the query fails. There is a quota of 1,048,576 columns per record.
  • There is no quota on the size of the final result. However, the results are broken into multiple parts.
  • An SQL expression is limited to 128 KB.

Using S3 Glacier Select, users can use SQL commands to query S3 Glacier archive objects that are in uncompressed CSV format. Within this restriction, users can perform simple query operations on their text-based data in S3 Glacier.

To query S3 Glacier data, create a select job using the Initiate Job (POST jobs) operation. When initiating a select job, users provide the SQL expression, the archive to query, and the location to store the results in Amazon S3.

S3 Glacier Select supports a subset of the ANSI SQL language. It supports common filtering SQL clauses like SELECT, FROM, and WHERE. It does not support SUM, COUNT, GROUP BY, JOINS, DISTINCT, UNION, ORDER BY, and LIMIT.
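Putting the pieces above together, a select job request might be assembled as in the following sketch. The parameter layout follows the `jobParameters` structure of the S3 Glacier Initiate Job operation as exposed by boto3. The function name, the choice of `FileHeaderInfo`, and the reading of "128 KB" as 128 * 1024 bytes are assumptions of this example, and availability of S3 Glacier Select should be confirmed against current AWS documentation before relying on it.

```python
MAX_SQL_BYTES = 128 * 1024  # assumes "128 KB" means 128 * 1024 bytes
VALID_TIERS = ("Expedited", "Standard", "Bulk")


def build_select_job(archive_id: str, sql: str, bucket: str, prefix: str,
                     tier: str = "Standard") -> dict:
    """Build jobParameters for a select job over an uncompressed CSV archive."""
    if tier not in VALID_TIERS:
        raise ValueError(f"tier must be one of {VALID_TIERS}")
    if len(sql.encode("utf-8")) > MAX_SQL_BYTES:
        raise ValueError("SQL expression exceeds the 128 KB limit")
    return {
        "Type": "select",
        "ArchiveId": archive_id,
        "Tier": tier,
        "SelectParameters": {
            # Treat the first CSV row as column names (an assumption).
            "InputSerialization": {"csv": {"FileHeaderInfo": "USE"}},
            "ExpressionType": "SQL",
            "Expression": sql,
            "OutputSerialization": {"csv": {}},
        },
        # Results are written to this bucket, which must be in the
        # same AWS Region as the vault.
        "OutputLocation": {"S3": {"BucketName": bucket, "Prefix": prefix}},
    }
```

The resulting dictionary would be passed as `jobParameters` to the Glacier client's `initiate_job` call together with the vault name.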

Amazon Glacier Data Model

The Amazon S3 Glacier (S3 Glacier) data model core concepts include vaults and archives. S3 Glacier is a REST-based web service. In terms of REST, vaults and archives are the resources. In addition, the S3 Glacier data model includes job and notification-configuration resources. These resources complement the core resources.

  • Vault: A vault is a container for storing archives. When users create a vault, they specify a name and choose an AWS Region where the vault is created. Each vault resource has a unique address.
  • Archive: An archive can be any data such as a photo, video, or document and is a base unit of storage in S3 Glacier. Each archive has a unique ID and an optional description. Each archive has a unique address.
  • Job: S3 Glacier jobs can perform a select query on an archive, retrieve an archive, or get an inventory of a vault. When performing a query on an archive, users initiate a job providing a SQL query and list of S3 Glacier archive objects. S3 Glacier Select runs the query in place and writes the output results to Amazon S3.
  • Notification Configuration: Because jobs take time to complete, S3 Glacier supports a notification mechanism to notify when a job is completed. Users can configure a vault to send notification to an Amazon Simple Notification Service (Amazon SNS) topic when jobs complete.
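The notification-configuration resource described above can be sketched as follows. The two event names are the ones S3 Glacier defines for job completion, and `set_vault_notifications` is the corresponding boto3 Glacier client method. The helper names and the idea of passing the client in as a parameter are choices made for this example.

```python
VALID_EVENTS = ("ArchiveRetrievalCompleted", "InventoryRetrievalCompleted")


def build_notification_config(sns_topic_arn: str, events=VALID_EVENTS) -> dict:
    """Return a vault notification configuration for the given SNS topic."""
    unknown = set(events) - set(VALID_EVENTS)
    if unknown:
        raise ValueError(f"unsupported events: {sorted(unknown)}")
    return {"SNSTopic": sns_topic_arn, "Events": list(events)}


def configure_vault(client, vault_name: str, sns_topic_arn: str) -> None:
    """Attach the configuration to a vault.

    `client` is expected to behave like boto3.client("glacier").
    """
    client.set_vault_notifications(
        accountId="-",  # "-" means the account that owns the credentials
        vaultName=vault_name,
        vaultNotificationConfig=build_notification_config(sns_topic_arn),
    )
```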

Vault

A vault is a way to group archives together in Amazon S3 Glacier. Users organize their data in Amazon S3 Glacier using vaults, and each archive is stored in a vault. Users can control access to their data by setting vault-level access policies using the AWS Identity and Access Management (IAM) service. Using the Amazon Simple Notification Service (Amazon SNS), users can also attach notification policies to vaults, which enable them or their application to be notified when data requested for retrieval is ready for download.

  • When creating a vault, users specify a vault name and the AWS Region in which to create the vault. For a list of supported AWS Regions, see Accessing Amazon S3 Glacier.
  • Amazon S3 Glacier allows users to tag S3 Glacier vaults for easier resource and cost management. Tags are labels that users can define and associate with vaults, and using tags adds filtering capabilities to operations such as AWS cost reports.
  • Users can create up to 1,000 vaults per account per region.
  • Users may delete any S3 Glacier vault that does not contain any archives using the AWS Management Console, the Amazon Glacier direct APIs or the SDKs.

S3 Glacier supports various vault operations. Vault operations are specific to particular AWS Regions. In other words, when creating a vault, users create it in a specific AWS Region. When listing vaults, S3 Glacier returns the vault list from the AWS Region specified in the request.

  • Users can retrieve vault information (metadata) such as the vault creation date, number of archives in the vault, and the total size of all the archives in the vault. S3 Glacier provides API calls to retrieve this information for a specific vault or all the vaults in a specific AWS Region in the account. 
  • A vault inventory refers to the list of archives in a vault. For each archive in the list, the inventory provides archive information such as archive ID, creation date, and size. S3 Glacier updates the vault inventory approximately once a day, starting on the day the first archive is uploaded to the vault. A vault inventory must exist before users can download it.
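Once a JSON inventory has been downloaded, summarizing it is straightforward. The sketch below assumes the field names used in the S3 Glacier inventory output (`InventoryDate`, `ArchiveList`, `Size`); the sample values in the usage note are invented for illustration.

```python
import json


def summarize_inventory(inventory_json: str) -> dict:
    """Return the inventory date, archive count, and total size in bytes
    for a vault inventory supplied as JSON text."""
    inventory = json.loads(inventory_json)
    archives = inventory.get("ArchiveList", [])
    return {
        "inventory_date": inventory.get("InventoryDate"),
        "archive_count": len(archives),
        "total_bytes": sum(archive["Size"] for archive in archives),
    }
```

Because the inventory is refreshed only about once a day, the totals describe the vault as of `inventory_date`, not necessarily its current contents.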

Archives

An archive is a durably stored block of information. It is any object, such as a photo, video, or document, that users store in a vault. It is a base unit of storage in Amazon S3 Glacier (S3 Glacier). Each archive has a unique ID and an optional description. When uploading an archive, S3 Glacier returns a response that includes an archive ID. This archive ID is unique in the AWS Region in which the archive is stored. Users store data in Amazon S3 Glacier as archives.

  • Users may upload a single file as an archive, but costs will be lower if multiple files are aggregated into a single archive.
  • TAR and ZIP are common formats that customers use to aggregate multiple files into a single file before uploading to Amazon S3 Glacier. The total volume of data and number of archives stored are unlimited.
  • Individual Amazon S3 Glacier archives can range in size from 1 byte to 40 terabytes. The largest archive that can be uploaded in a single Upload request is 4 gigabytes. For items larger than 100 megabytes, customers should consider using the Multipart upload capability.
  • Archives stored in Amazon S3 Glacier are immutable, i.e. archives can be uploaded and deleted but cannot be edited or overwritten.
  • When uploading large archives (100 MB or larger), users can use multipart upload to achieve higher throughput and reliability. Multipart uploads allow users to break a large archive into smaller chunks that are uploaded individually. Once all the constituent parts are successfully uploaded, they are combined into a single archive.
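The chunking step of a multipart upload can be sketched as below. It relies on constraints from the S3 Glacier multipart upload documentation that are not stated above: part sizes are 1 MiB multiplied by a power of two, up to 4 GiB, and an upload may have at most 10,000 parts. The function names are illustrative, and the range strings follow the `bytes start-end/*` form used when uploading each part.

```python
MIB = 1024 * 1024
MAX_PARTS = 10_000
MAX_PART_SIZE = 4 * 1024 * MIB  # 4 GiB


def choose_part_size(archive_size: int) -> int:
    """Return the smallest valid part size that fits the archive
    into at most MAX_PARTS parts."""
    if archive_size <= 0:
        raise ValueError("archive_size must be positive")
    part_size = MIB
    while part_size * MAX_PARTS < archive_size:
        part_size *= 2  # part sizes are 1 MiB times a power of two
    if part_size > MAX_PART_SIZE:
        raise ValueError("archive is too large for a multipart upload")
    return part_size


def part_ranges(archive_size: int, part_size: int):
    """Yield the byte range string for each part, in upload order."""
    for start in range(0, archive_size, part_size):
        end = min(start + part_size, archive_size) - 1
        yield f"bytes {start}-{end}/*"
```

Every part except the last has exactly `part_size` bytes; the last part carries whatever remains.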

S3 Glacier supports basic archive operations such as uploading, downloading, and deleting. 

  • Users can upload an archive in a single operation or upload it in parts. The API call for uploading an archive in parts is known as Multipart Upload.
  • Downloading an archive is an asynchronous operation. Users need to initiate a job in order to download a specific archive. Once the job request is received, S3 Glacier prepares the archive for download. After the job completes, users can download the archive data.
  • S3 Glacier provides an API call that users can use to delete one archive at a time. After uploading an archive, users cannot update its content or its description. To update an archive's content or description, users must delete the archive and upload another one.
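The asynchronous download flow described above (initiate a job, wait for it to complete, then fetch the output) might look like the following sketch. It uses the boto3 Glacier client methods `initiate_job`, `describe_job`, and `get_job_output`. The function name, polling interval, and injectable `sleep` parameter are assumptions of this example. In practice an SNS notification is usually preferable to polling for jobs that take hours.

```python
import time


def retrieve_archive(client, vault_name: str, archive_id: str,
                     tier: str = "Standard", poll_seconds: int = 900,
                     sleep=time.sleep) -> bytes:
    """Start an archive-retrieval job, wait for it, and return the bytes.

    `client` is expected to behave like boto3.client("glacier").
    The whole archive is read into memory, so this suits small archives only.
    """
    job = client.initiate_job(
        accountId="-",
        vaultName=vault_name,
        jobParameters={
            "Type": "archive-retrieval",
            "ArchiveId": archive_id,
            "Tier": tier,
        },
    )
    job_id = job["jobId"]

    # Poll until S3 Glacier reports that the job has finished.
    while True:
        status = client.describe_job(
            accountId="-", vaultName=vault_name, jobId=job_id
        )
        if status["Completed"]:
            break
        sleep(poll_seconds)

    if status["StatusCode"] != "Succeeded":
        raise RuntimeError(f"retrieval job {job_id} did not succeed")

    output = client.get_job_output(
        accountId="-", vaultName=vault_name, jobId=job_id
    )
    return output["body"].read()
```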


Amazon Glacier Security 

Amazon S3 Glacier (S3 Glacier) provides highly durable cloud storage for data archiving and long-term backup. S3 Glacier is designed to deliver 99.999999999 percent durability and provides comprehensive security and compliance capabilities that can help users meet stringent regulatory requirements. S3 Glacier redundantly stores data in multiple AWS Availability Zones (AZ) and on multiple devices within each AZ. To increase durability, S3 Glacier synchronously stores your data across multiple AZs before confirming a successful upload.

  • Data protection refers to protecting data while in-transit (as it travels to and from Amazon S3 Glacier) and at rest (while it is stored in AWS data centers). Users can protect data in transit that is uploaded directly to S3 Glacier using Secure Sockets Layer (SSL) or client-side encryption.
  • Users can access S3 Glacier through Amazon S3. Amazon S3 supports lifecycle configuration on an Amazon S3 bucket, which enables users to transition objects to the S3 Glacier storage class for archival. Data in transit between Amazon S3 and S3 Glacier via lifecycle policies is encrypted using SSL.
  • Data at rest stored in S3 Glacier is automatically server-side encrypted using 256-bit Advanced Encryption Standard (AES-256) with keys maintained by AWS. As an additional safeguard, AWS encrypts the key itself with a master key that it regularly rotates.
  • With server-side encryption, Amazon S3 Glacier encrypts data as it writes it to its data centers and decrypts it for users when they access it. This is also known as data encryption at rest.
  • Access to Amazon S3 Glacier (S3 Glacier) requires credentials that AWS can use to authenticate user requests. Those credentials must have permissions to access AWS resources, such as an S3 Glacier vault or an Amazon S3 bucket.
  • When using S3 Glacier via Amazon S3, users can use Amazon CloudWatch alarms to watch a single metric over a specified time period. If the metric exceeds a given threshold, a notification is sent to an Amazon SNS topic or AWS Auto Scaling policy.
