Kinesis Streams: An overview

Overview details:

  • Kinesis streams are fully managed
  • They enable data processing on the data stream using Lambda to enable transformation of data (discussed in detail later)
  • They can concurrently be consumed by multiple applications
  • The streams can replay data within the data retention period for reprocessing
  • Streams provide sub-second processing latency, which means that data is available for use by your application less than a second after being ingested into the stream
  • They guarantee the order of data based on the order that it was ingested into the stream (FIFO)
  • They guarantee the delivery of data at least once

Worker instances in a stream are referred to as shards. The number of shards determines the throughput of a Kinesis stream. Shards are spread across multiple availability zones for availability and durability.

Shard Limits:

  • Each shard allows a max of 1MB/s or 1000 records/s into the stream
  • Each shard allows for 2MB/s or 5 transactions/s out of the stream – allowing for multiple consumers to read data out of the stream. A transaction could be a batched group of records.

Scaling your stream:

Re-sharding allows you to scale streams by adding or removing shards. You can either merge or split shards (as described below). Both the merge and split functions take less than 1 minute to complete. It’s only possible to merge or split a single shard at a time. So, 20 shards will take ~20 mins.

  • Merge Shards: this process takes 2 shards and combines the data onto a single shard, reducing the overall capacity of the stream.
  • Split shard: this process creates a new shard and splits the data from the existing shard between the two, increasing the overall capacity of the stream.

When data is added to a stream, it must include: the stream name, the payload and the partition key.

You use partition keys to keep related data together on the same shard (e.g. usage could have a partition key of city, to keep all data for a particular city on a shard) or all the data for a specific B2B customer on a single shard. Kinesis will automatically distribute the keys for you across shards as it sees fit.

General Limits:

  • 50 shards for US East, US West and EU Regions. 25 for all other regions. You can request more, this is a soft limit.
  • 24 hours of retention period by default. Max 7 days
  • Max payload size is 1MB

Shard Limits:

  • 1MB/s IN
  • 1000 records/s IN
  • 2MB/s OUT
  • 5 transactions/s OUT
  • 10MB Max GetRecords API
  • 5 calls per second on the CreateSteam, DeleteStream, ListStream, GetShardIterator and MergeShards APIs
  • 10 calls per second on the DescribeStream API
  • Kinesis is designed for more consumers than producers

Cloudtrail & Cloudwatch Integration

Kinesis streams integrate with Cloudtrail, which provides you will a full audit trail on the API calls made against each of the APIs listed above.

Kinesis is also integrated with Cloudwatch, which enables us to monitor metrics such as:

  • Number of records in
  • Number of records out
  • Size of data ingested
  • Count of records ingested
  • Stream latency
  • WriteProvisionedThroughputExceeded – means that there are not enough shards to handle the input
  • ReadProvisionedThroughputExceeded – means that there are not enough shards to handle the output

Kinesis Stream Pricing

Streams are priced:

  • Per shard hour 
  • Per PUT payload unit (rounded to the nearest 25kb)
  • For extended data retentions.

If you have 10 * 1kb files, you should aggregate them. Otherwise, each will be rounded up to 25kb and you’ll be paying for 250kb data, rather than 10kb

Getting data out of the streams

The Kinesis Client Library (KCL) is the easiest way to get Data out of Kinesis streams into a consumer (e.g. EC2 instances). A few key points on KCL are:

  • KCL only works for Kinesis stream service, and is not for use with Firehose
  • It’s available for a wide variety of languages, making it applicable to many projects
  • Automatically creates a DynamoDB table with the same name as your application to manage state
  • Multiple KCLs can work on the same or different streams. E.g. One KCL can take and utilise data; another can pipe into machine learning etc…
  • KCL checkpoints processed records, so it knows where it left off, should there be any issues or breaks in the feed processing
  • KCLs can load balance among each other
  • KCL automatically deals with stream scaling (shard splits / merges) and automatically knows to pull from newly provisioned shards


  • Always over provision shards slightly to cope with spikes in demand
  • Use multiple consumers for A/B testing without downtime enabling you to test new code on the stream, by replaying the stream, without impacting the live system
  • Dump data into S3 to decouple the application from the streaming function. It’s cheap & durable
  • Lambda is great for transformations and processing (e.g. enrichment)
  • Keep state in DynamoDB, so you can apply logic. For example, if you require ‘exactly-once’ delivery, you should utilise DynamoDB to store transaction IDs. Logic can then be built into the consumer application ensure no duplicate records
  • Utilise tagging on streams to identify data. This is useful for cost segregation between clients.