Everything you need to know about Amazon S3

Amazon S3 Buckets

Amazon S3 is one of the cornerstones of your cloud infrastructure environment. Before we get started, here are a few quick-fire facts:

  • The smallest object you can store in S3 is 1 byte

  • The largest object you can store in S3 is 5 TB

  • When an object is uploaded, it is automatically replicated across multiple availability zones within the region

  • By default, you can have up to 100 S3 buckets per account (this is a soft limit that can be raised on request)

  • The bucket ownership cannot be transferred

Next, we have S3 security, which is pretty comprehensive. Here is a high level overview:

  • All objects are private by default

  • To share an S3 bucket with another AWS account we can use an ACL (access control list)

  • IAM policies grant users or groups of users access to buckets

  • Public content can be downloaded via a URL to the file

  • We can also create a signed (presigned) URL, which is generated by signing the request with your AWS credentials. This produces a unique URL with an expiry time, so the person you’ve sent the link to must download the document before the link expires.

  • Bucket policies can be used to further enhance security

    • You can grant permissions to anonymous users

    • You can restrict access to certain IP addresses

    • You can restrict access based on the HTTP Referer header (i.e. the specific website the request comes from)
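As a sketch of that last group of options, here is what an IP-restricting bucket policy might look like. The bucket name and CIDR range are placeholders; the resulting JSON document is what you would apply with the AWS CLI or an SDK (e.g. boto3's put_bucket_policy):

```python
import json

def ip_restricted_policy(bucket_name, allowed_cidr):
    """Build a bucket policy that only allows object reads from one IP range."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "AllowReadFromTrustedRange",
            "Effect": "Allow",
            "Principal": "*",                     # anonymous users included
            "Action": "s3:GetObject",
            "Resource": f"arn:aws:s3:::{bucket_name}/*",
            "Condition": {"IpAddress": {"aws:SourceIp": allowed_cidr}},
        }],
    }

# Placeholder bucket and documentation CIDR range:
policy = ip_restricted_policy("example-bucket", "203.0.113.0/24")
print(json.dumps(policy, indent=2))
```

Swapping `IpAddress` for `NotIpAddress` (with `Effect: Deny`) inverts the rule to block a range instead.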

Everything in S3 can be encrypted. To do this, you have two primary methods. The first is server-side encryption: when a document is uploaded to an S3 bucket, it is encrypted before being saved to disk, then decrypted when it is downloaded.

Alternatively, you can use your own encryption keys: you encrypt the file yourself before uploading it to S3 and decrypt it after download. This does mean that you’ll be looking after your own encryption keys, which can cause chaos if you lose them, so it is always worth considering integrating with the AWS Key Management Service (KMS).

There are a number of uses for Amazon S3 buckets beyond just file storage. Firstly, you can use S3 to host static HTML web pages. This is used in conjunction with Route 53, which points your domain name at the S3 bucket.
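A minimal sketch of switching on static website hosting: the dict below matches the WebsiteConfiguration shape that boto3's put_bucket_website expects, and the document names and bucket name are placeholders:

```python
def website_configuration(index_doc="index.html", error_doc="error.html"):
    """Build a static-website configuration for an S3 bucket."""
    return {
        "IndexDocument": {"Suffix": index_doc},  # served for directory requests
        "ErrorDocument": {"Key": error_doc},     # served on 4xx errors
    }

# e.g. (assuming an s3 client):
# s3.put_bucket_website(Bucket="example-bucket",
#                       WebsiteConfiguration=website_configuration())
```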

S3 can also act as an origin for the CloudFront CDN. What that means is that S3 can store all the images and PDF documents, and CloudFront can then pull the files from S3 and serve them through its network of edge locations.

Multipart upload lets you upload an object in parts, so multiple parts of the same object can be uploaded concurrently and the file transfers much faster. You must use it for objects over 5 GB, but it is recommended for any object over 100 MB for speed purposes.
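As a rough sketch of how the split works: S3 requires every part except the last to be at least 5 MB, and allows at most 10,000 parts per object. SDKs such as boto3 handle this automatically via their transfer managers, but the arithmetic looks like this:

```python
import math

MIN_PART_SIZE = 5 * 1024 * 1024  # 5 MB minimum for all but the last part
MAX_PARTS = 10_000               # hard S3 limit per object

def part_count(object_size, part_size=8 * 1024 * 1024):
    """Number of parts a multipart upload splits an object into."""
    if part_size < MIN_PART_SIZE:
        raise ValueError("part size must be at least 5 MB")
    parts = math.ceil(object_size / part_size)
    if parts > MAX_PARTS:
        raise ValueError("too many parts: increase the part size")
    return parts

# A 100 MB object with an 8 MB part size splits into 13 parts,
# all of which can be uploaded concurrently.
print(part_count(100 * 1024 * 1024))  # → 13
```

The 8 MB default here mirrors boto3's default multipart chunk size, but any value of 5 MB or more works.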

When we upload an object to S3, it stays within the region we upload it to. It is then replicated across the availability zones within that region.

There is a behaviour known as eventual consistency, which historically was most visible in the US East (N. Virginia) region, where S3 did not offer read-after-write consistency for new objects. If you upload an object and your application immediately tries to read it, the file might not be replicated to every availability zone yet, which can cause an object-not-found error. (S3 has since introduced strong read-after-write consistency across all regions, but applications written under the old model had to handle this.)
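Under the old consistency model, applications typically worked around this with a short retry loop. Here is a minimal sketch, where `fetch` is a placeholder for whatever call reads the object (e.g. boto3's get_object):

```python
import time

def read_with_retry(fetch, attempts=5, delay=0.5):
    """Retry a read a few times to ride out eventual consistency.

    `fetch` is any zero-argument callable that raises while the object
    is not yet visible; the delay doubles after each failed attempt.
    """
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception:
            if attempt == attempts - 1:
                raise  # still not visible after all attempts: give up
            time.sleep(delay)
            delay *= 2  # simple exponential backoff
```

In real code you would catch the SDK's specific not-found exception rather than a bare Exception.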

We can also set up event notifications that fire when a specific event or action occurs against the S3 bucket. For example, if RRS (Reduced Redundancy Storage) is enabled and an object is lost, S3 can publish an RRS object lost notification. The notification can target SNS, SQS or Lambda, which lets us write automation to recreate the lost object and notify the administrators that this has happened.
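As a sketch, the notification configuration for that RRS example might look like the following. The event name `s3:ReducedRedundancyLostObject` is the one S3 publishes for this case; the SNS topic ARN is a placeholder, and the dict matches the shape boto3's put_bucket_notification_configuration expects:

```python
def rrs_lost_notification(topic_arn):
    """Notification config: publish to an SNS topic when an RRS object is lost."""
    return {
        "TopicConfigurations": [{
            "TopicArn": topic_arn,
            "Events": ["s3:ReducedRedundancyLostObject"],
        }]
    }

# e.g. (assuming an s3 client and placeholder account/topic):
# s3.put_bucket_notification_configuration(
#     Bucket="example-bucket",
#     NotificationConfiguration=rrs_lost_notification(
#         "arn:aws:sns:us-east-1:123456789012:rrs-object-lost"))
```

Swapping `TopicConfigurations` for `QueueConfigurations` or `LambdaFunctionConfigurations` targets SQS or Lambda instead.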


This article was brought to you by Netshock. Netshock aims to provide technology guides and insight to our readers.
