Newsletter #4: Google's AlloyDB Database, History of AWS S3
Welcome to Data Management Newsletter #4, where I curate interesting articles on data management and data protection for data practitioners and executives. For this week's newsletter, I want to share two long-form articles I wrote on AlloyDB, Google's new PostgreSQL-compatible database, and AWS S3.
AlloyDB Design & Architecture
One of the more exciting announcements at Google I/O this year was the preview of AlloyDB for PostgreSQL, a fully-managed, PostgreSQL-compatible database service on Google Cloud.
The AlloyDB team boldly made these claims about the service:
- Compared to standard PostgreSQL, AlloyDB is more than 4x faster for transactional workloads, and up to 100x faster for analytical workloads.
- Compared to Amazon's service (they don't say but we can guess it's Aurora 🤔), AlloyDB is 2x faster for transactional workloads.
These are impressive numbers!
How were the AlloyDB developers able to achieve this kind of acceleration for both transactional and analytical workloads? What architectural innovations were necessary to get there? And are there lessons to be learned for architects and developers of similar cloud-scale distributed systems?
I had a lot of fun writing a blog post about the architecture and design details of AlloyDB, focusing particularly on its storage engine, which is the secret sauce behind these performance numbers.
Read the entire post on the Dragon's Egg website!
History of AWS S3
AWS S3 is notorious for exposing confidential data through leaky, misconfigured buckets. Common misconfigurations include leaving a bucket's data open to the public, not enabling SSE (server-side encryption), or even leaving a copy of a bucket's access keys in plaintext in a public GitHub repo!
To be fair, these incidents reflect the operational challenges of managing a service like S3 that is both easy to use and automation-friendly. The typical DevOps view of S3 centers on its vulnerabilities and on playbooks to provision, manage, and secure buckets in a way that safeguards confidential data while letting teams stay agile and automation-driven.
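To make the playbook idea concrete, here is a minimal sketch of an automated bucket-hygiene check. The `audit_bucket` function and the shape of its config dict are hypothetical illustrations, not an AWS API; in practice you would populate such a dict from real S3 metadata (e.g. the bucket's public-access-block and default-encryption settings) before evaluating it.

```python
def audit_bucket(config):
    """Return a list of warnings for common S3 misconfigurations.

    `config` is an illustrative dict, assumed to hold two settings:
      - "block_public_access": whether public access is blocked
      - "default_encryption": the default SSE algorithm, or None
    """
    warnings = []
    # Public access should be blocked at the bucket level.
    if not config.get("block_public_access", False):
        warnings.append("public access is not blocked")
    # Server-side encryption should be enabled by default.
    if config.get("default_encryption") not in ("AES256", "aws:kms"):
        warnings.append("server-side encryption is not enabled")
    return warnings

# Example: a bucket left open and unencrypted triggers both warnings.
risky = {"block_public_access": False, "default_encryption": None}
print(audit_bucket(risky))
# → ['public access is not blocked', 'server-side encryption is not enabled']
```

A check like this can run in CI or on a schedule, so a drifted bucket is flagged before it becomes a headline.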
Beyond the operational challenges, though, how might an architect or a technologist look at the S3 service itself? Are there examples of successful products that have incorporated it into their architectures in unique ways? What can we learn from its evolution from a Simple Storage Service to the backbone of modern data infrastructure?
I wrote an architect-technologist's view of S3 that chronicles the pivotal role it has played as a key architectural component of the successful data warehouses and data lakes of the last decade, and the promise it holds for the next generation of data products.
Read the entire post on the Dragon's Egg website!
I had a lot of fun writing these, and I hope you enjoy reading them as well! If you want to share any thoughts or feedback, just hit the Reply button!
Cheers, and I hope you have a great weekend!