Data and IoT unveiled as a major focus at AWS re:Invent 2016

Image Credit: Amazon Web Services

re:Invent is now well and truly over for another year, and while the flood of announcements has slowed to a trickle, in true AWS style it hasn't stopped.

Our team have now had time to absorb the major announcements, and we're going to discuss our top 5 key releases. We've painstakingly picked out the ones we think will have the most impact on our customers and the industry, along with a few that we simply think are cool.

From the flood of announcements that came out of re:Invent, the most exciting releases shared a central theme: Big Data, Analytics and IoT are going to be a significant focus for many organisations in 2017. From data transfer via AWS Snowmobile and faster analysis with Amazon Athena, through to Greengrass providing cloud connectivity in the most remote corners of the world, the underlying theme is changing what organisations can do with their data.

Without further ado here are our top 5.

AWS Snowmobile

Of course we had to pick Snowmobile. It's cool, there's no doubting that. Who wouldn't want a bright white 45ft shipping container sitting at their datacentre – one that requires 100Gbps fibre, 3-phase power and chilled water, not to forget the round-the-clock security detail on site and in transit? I know I would.

In reality, who has this much data? 100PB of data – most likely very unstructured – is clearly a problem for at least a few AWS customers, and I'm sure there are some companies in Australia who would make use of Snowmobile (if you are one such company, we really want to talk to you). Think government or regulatory institutions with all those years of records, and potentially a mountain of digitised paper records too. It can start to add up.
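To put 100PB in perspective, a quick back-of-envelope calculation shows why driving the data to AWS beats pushing it over a network. The link speeds and sustained utilisation below are illustrative assumptions, not AWS figures:

```python
# Back-of-envelope: how long would 100 PB take over a network link?
# Link utilisation and speeds are illustrative assumptions.

PETABYTE = 10**15  # bytes (decimal, as storage capacity is marketed)

def transfer_days(data_bytes, link_gbps, utilisation=0.8):
    """Days to push data_bytes over a link at the given sustained utilisation."""
    bytes_per_sec = link_gbps * 10**9 / 8 * utilisation
    return data_bytes / bytes_per_sec / 86400

snowmobile = 100 * PETABYTE
print(f"1 Gbps link:  {transfer_days(snowmobile, 1):,.0f} days")   # decades
print(f"10 Gbps link: {transfer_days(snowmobile, 10):,.0f} days")  # years
```

Even a dedicated 10Gbps link takes years to move 100PB, which is exactly the gap Snowmobile fills.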

More likely you are in the Snowball category (or maybe several). These are still cool, but are so 2015. 2016 brings you Snowball Edge, with just a tad more storage and compute, as well as Greengrass.

AWS Greengrass

AWS Greengrass is really exciting. It provides a way to talk to the cloud without the cloud being there – and the potential for that is amazing.

Today, many of the devices we take for granted simply don't work without an Internet connection. When I was on the flight over to re:Invent, I realised what a massive inconvenience it is to be without the Internet. I'm so used to being able to connect to everything, stream music and videos, and look things up. Even on a more recent flight, I hadn't synced my Kindle, and when I finished the book I was reading there wasn't another one ready to read. I was lost!

I realise these aren't great examples of why we would use Greengrass. But the point is that our reliance on Internet connectivity is now as ubiquitous as power, gas, running water and the telephone – even more so, as it's completely mobile and with us almost everywhere we go.

Even with this high level of connectivity, I would say it's still not as reliable as power, and still not available everywhere (think remote locations such as deserts, mines or ships at sea). And where it is available (via satellite), it's very expensive. Greengrass gives devices the ability to make decisions locally and buffer data even when no connection to the cloud is present.
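That store-and-forward pattern is easy to sketch. The code below is our own illustration of the idea – act on readings locally, buffer while offline, flush the backlog when connectivity returns – and the names are ours, not the Greengrass API:

```python
# Hypothetical sketch of the store-and-forward pattern Greengrass enables.
# Local decisions happen regardless of connectivity; readings queue up
# while offline and drain once the cloud is reachable again.
from collections import deque

class EdgeBuffer:
    def __init__(self, maxlen=1000):
        self.pending = deque(maxlen=maxlen)  # oldest readings drop first

    def handle(self, reading, connected, publish):
        # Local decision logic runs even with no cloud connection.
        if reading["temp_c"] > 80:
            reading["alert"] = True
        if connected:
            while self.pending:              # drain the offline backlog first
                publish(self.pending.popleft())
            publish(reading)
        else:
            self.pending.append(reading)

sent = []
buf = EdgeBuffer()
buf.handle({"temp_c": 85}, connected=False, publish=sent.append)  # buffered
buf.handle({"temp_c": 20}, connected=True, publish=sent.append)   # backlog + new
print(len(sent))  # prints 2 – both readings delivered once back online
```

The bounded deque is the important design choice: on a constrained device you shed the oldest data rather than exhausting memory during a long outage.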

Greengrass isn't available yet, but you can sign up for the preview. You can get started with Greengrass on any Linux 4.4-based distribution; AWS have validated it on Ubuntu and Amazon Linux. The hardware requirements are minimal (a 1GHz CPU and 128MB of RAM), and it runs on both x86 and ARM architectures. Qualcomm, along with others, have announced support for Greengrass and plan to release products using the technology soon.

Amazon Athena

If you have a lot of data in standard formats (e.g. CSV, JSON, ORC or Parquet) – or data that can be converted to one of these – and it is, or could be, stored in S3, then you can get started on your 'Big Data' adventure today with Amazon Athena. The data doesn't even need to be structured.

There is no need to spin up EMR or Redshift clusters, and no requirement to learn R or Spark. You can open the Athena console, define your schema and get started with plain SQL. This opens the door to data analysis for even the smallest of IT shops – and remember, data is powerful; used correctly it can mean great things.
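To give a flavour of what "define your schema, then plain SQL" looks like, here is a small Python helper that renders the kind of CREATE EXTERNAL TABLE statement you would run in the Athena console over CSV data in S3. The bucket name, table and columns are hypothetical:

```python
# Render an Athena-style DDL statement for CSV data sitting in S3.
# Bucket, table and column names below are hypothetical examples.

def athena_ddl(table, s3_path, columns):
    """Build a CREATE EXTERNAL TABLE statement for comma-delimited data."""
    cols = ",\n  ".join(f"{name} {ctype}" for name, ctype in columns)
    return (
        f"CREATE EXTERNAL TABLE {table} (\n  {cols}\n)\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        f"LOCATION 's3://{s3_path}/';"
    )

ddl = athena_ddl(
    "orders",
    "my-analytics-bucket/orders",
    [("order_id", "string"), ("amount", "double"), ("placed_at", "string")],
)
print(ddl)
```

Once the table exists, a query is just SQL, e.g. `SELECT placed_at, sum(amount) FROM orders GROUP BY placed_at;` – no cluster to manage at any point.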

Amazon Athena uses Presto under the hood and integrates with QuickSight for easy visualisation of your data. The possibilities are endless. For example, you could use Athena to analyse customer spending history and seasonality within your Magento store without loading data into (or learning) Redshift or EMR, or process CloudFront, CloudTrail and ELB logs to analyse web requests and AWS API calls with ease.


New EC2 Instance Types

The really cool new compute types are F1 and Elastic GPUs, but I'm going to call out the families below specifically, as these will be of great interest to many of our traditional-workload customers running COTS or open-source systems on EC2 and RDS.

The new t2.xlarge and t2.2xlarge are now a really compelling option for mildly memory-intensive workloads or larger-memory development projects. If you are currently going through a cost-optimisation process with us, it might be worth evaluating some of your C4 and M4 workloads. For instance, if you are using c4.2xlarge instances today, you might be able to move to t2.xlarge for a significant cost reduction. Similarly, some of your m4.xlarge instances could be replaced with t2.xlarge for a modest cost reduction but higher burst performance.
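The arithmetic behind that kind of move is simple enough to sketch. The hourly prices below are illustrative figures for the sake of the example, not current AWS pricing, and any real evaluation should also account for t2 CPU credits:

```python
# Illustrative monthly on-demand cost comparison for an instance-family
# move. Prices are example figures only, not current AWS pricing.

HOURS_PER_MONTH = 730

prices = {            # USD per hour, illustrative
    "c4.2xlarge": 0.398,
    "m4.xlarge":  0.215,
    "t2.xlarge":  0.188,
}

def monthly(instance):
    """On-demand cost of one instance running all month."""
    return prices[instance] * HOURS_PER_MONTH

saving = monthly("c4.2xlarge") - monthly("t2.xlarge")
print(f"c4.2xlarge -> t2.xlarge saves about ${saving:,.2f}/month per instance")
```

Multiplied across a fleet, even a modest per-instance delta justifies the performance testing needed to confirm the burstable t2 profile fits the workload.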

Then there is the new i3 family. These are great for customers with really I/O-intensive workloads, like NoSQL databases or massive caches. You can have up to 15TB of low-latency Non-Volatile Memory Express (NVMe) SSD storage, delivering up to 3.3 million IOPS at a 4KB block size and 16GB/s of throughput.

To keep those disks loaded with I/O you'll need plenty of vCPUs (64 of them), plenty of RAM (488GB worth) and the new Elastic Network Adapter (ENA) released earlier this year. ENA provides the next generation of enhanced networking for AWS, with checksum generation, multi-queue device interfaces and receive-side steering. If you have very large I/O-intensive workloads, the new i3 instance types should be a consideration.

The new r4 instance types run from r4.large to r4.16xlarge, ranging from 2 vCPUs and 15GB of RAM up to 64 vCPUs and 488GB, with up to 20Gbps networking. They feature DDR4 memory and dual-socket Intel Xeon Broadwell processors (2.3GHz). If you are currently working with us on a rightsizing or cost-optimisation project, you'll want to consider moving your r3 instance types to the new r4s – you might even be able to save a little by dropping a size. Remember, the r instance types are perfect for memory-intensive workloads, and I expect to see them available as options for RDS and ElastiCache soon.

The new C5 instance types are really exciting, as they bring a 2x performance increase over the c4 series. They feature Intel's Skylake processor, the successor to Broadwell. These processors are very fast and include the AVX-512 instruction set. When released – expected in early 2017 – there will be six sizes, ranging up to 72 vCPUs and 144GB of RAM. These instances will also support the ENA adapter and should be considered as replacements for c4 instance types.

Depending on performance testing, you will likely be able to drop one size down from your existing c4 type and reduce your spend. If you have very computationally expensive workloads and currently make use of the AVX instructions, you might get a further improvement by moving to the c5 instance types and compiling your applications for the new micro-architecture to get that edge. If you are looking at purchasing c4 RIs, I would consider holding off for a month or two into the new year and then evaluating whether the c5s are a better fit for your workload.

AWS Organizations

We're kind of cheating by picking Organizations, as it wasn't announced at the keynotes but during re:Invent itself. Still, its potential impact on the way we manage accounts will be profound, both for us and for our customers.

Since we started using AWS to deploy customer workloads several years ago, we have built up a lot of tooling to manage multiple accounts. Tooling like this just didn't exist, open-source or commercial, so we built our own. As many are likely aware, some parts of the AWS account provisioning process are very manual and have no API, which has frustrated us as we strive to automate as much as possible in our workflows.

With Organizations, we expect to be able to move to hierarchical policy management, allowing us to build smarter access control for our customers and staff alike and ensure we are providing the best security posture we can. The other significant benefit we hope to see is the ability to provision AWS accounts entirely programmatically and have them automatically pick up our policies, roles and billing details without any manual steps. This will mean faster, less error-prone turnaround of new accounts, policy changes and role changes.

These are just a handful of the announcements that have come out over the past month, and I have only scratched the surface of each. If you missed my earlier pieces on the pre-re:Invent announcements and updates from the ground, feel free to catch up with all the latest announcements from AWS.

Download your free AWS TCO Checklist
Greg Cockburn
Principal Cloud Architect/Chief Engineer for Bulletproof, Greg has extensive expertise in the Information Technology space, specialising in UNIX and Linux systems administration and broader application operations. Certified as an AWS APN Cloud Warrior and recognised for his AWS subject-matter expertise, Greg has most recently focused on human organisation and how culture, process and interaction help teams and businesses succeed.