Optimizing Amazon Simple Queue Service (SQS) for speed and scale



After several public betas, we launched Amazon Simple Queue Service (Amazon SQS) in 2006. Nearly 20 years later, this fully managed service is still a fundamental building block for microservices, distributed systems, and serverless applications, processing over 100 million messages per second at peak times.

Because there’s always a better way, we continue to look for ways to improve performance, security, internal efficiency, and so forth. When we do find a potential way to do something better, we’re careful to preserve existing behavior, and often run the new and old systems in parallel so that we can compare results.

Today I would like to tell you how we recently made improvements to Amazon SQS to reduce latency, increase fleet capacity, mitigate an approaching scalability cliff, and reduce power consumption.

Improving SQS
Like many AWS services, Amazon SQS is implemented using a collection of internal microservices. Let’s focus on two of them today:

Customer Front-End – The customer-facing front-end accepts, authenticates, and authorizes API calls such as CreateQueue and SendMessage. It then routes each request to the storage back-end.

Storage Back-End – This internal microservice is responsible for persisting messages sent to standard (non-FIFO) queues. Using a cell-based model, each cluster in the cell contains multiple hosts, each customer queue is assigned to one or more clusters, and each cluster is responsible for a multitude of queues.

Connections – Old and New
The original implementation used a connection per request between these two services. Each front-end had to connect to many hosts, which mandated the use of a connection pool, and also risked reaching an ultimate, hard-wired limit on the number of open connections. While it is often possible to simply throw hardware at problems like this and scale out, that’s not always the best way. It simply moves the moment of truth (the “scalability cliff”) into the future and does not make efficient use of resources.
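To get a feel for why a connection-per-request design runs into a ceiling, here is a back-of-the-envelope sketch. It assumes a deliberately simple model in which each in-flight request to a host holds one open connection; the function names and every number are illustrative and are not SQS figures.

```python
# Illustrative model only: all names and numbers are hypothetical, not SQS data.


def open_connections(hosts: int, in_flight_per_host: int) -> int:
    """Connections one front-end holds open when each in-flight request
    to a back-end host needs its own connection."""
    return hosts * in_flight_per_host


def max_hosts_before_limit(connection_limit: int, in_flight_per_host: int) -> int:
    """Largest number of back-end hosts one front-end can reach before it
    hits a fixed limit on open connections."""
    return connection_limit // in_flight_per_host


if __name__ == "__main__":
    limit = 65_000   # hypothetical per-front-end connection limit
    in_flight = 100  # hypothetical concurrent requests per back-end host
    for hosts in (100, 500, 1_000):
        used = open_connections(hosts, in_flight)
        status = "ok" if used <= limit else "over the limit"
        print(f"{hosts} hosts: {used} connections ({status})")
    print("cliff at", max_hosts_before_limit(limit, in_flight), "hosts")
```

In this model the connection count grows with the product of hosts and concurrency, so adding hosts to scale out also pushes every front-end closer to the limit.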

After carefully considering several long-term solutions, the Amazon SQS team invented a new, proprietary binary framing protocol between the customer front-end and storage back-end. The protocol multiplexes multiple requests and responses across a single connection, using 128-bit IDs and checksumming to prevent crosstalk. Server-side encryption provides an additional layer of protection against unauthorized access to queue data.
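The actual wire format is proprietary and has not been published, so the following is only a minimal sketch of the general technique: frames that carry a 128-bit request ID and a checksum so that many requests can share one connection. The frame layout, the use of CRC-32, and the names encode_frame and decode_frames are all assumptions made for illustration.

```python
import struct
import uuid
import zlib

# Hypothetical frame layout (NOT the real SQS protocol):
#   16 bytes  request ID (128 bits)
#    4 bytes  payload length, unsigned, network byte order
#    N bytes  payload
#    4 bytes  CRC-32 over ID + length + payload
HEADER = struct.Struct("!16sI")
TRAILER = struct.Struct("!I")


def encode_frame(request_id: bytes, payload: bytes) -> bytes:
    """Wrap one request or response in a frame."""
    if len(request_id) != 16:
        raise ValueError("request_id must be exactly 16 bytes")
    head = HEADER.pack(request_id, len(payload))
    return head + payload + TRAILER.pack(zlib.crc32(head + payload))


def decode_frames(buffer: bytes):
    """Split a byte stream into (request_id, payload) pairs.

    Returns (frames, remainder); remainder holds any incomplete trailing
    frame so the caller can prepend it to the next read.
    """
    frames = []
    offset = 0
    while len(buffer) - offset >= HEADER.size:
        request_id, length = HEADER.unpack_from(buffer, offset)
        end = offset + HEADER.size + length + TRAILER.size
        if end > len(buffer):
            break  # frame not fully received yet
        body_end = end - TRAILER.size
        (expected_crc,) = TRAILER.unpack_from(buffer, body_end)
        if zlib.crc32(buffer[offset:body_end]) != expected_crc:
            raise ValueError("checksum mismatch: frame is corrupt")
        frames.append((request_id, buffer[offset + HEADER.size:body_end]))
        offset = end
    return frames, buffer[offset:]


if __name__ == "__main__":
    # Two requests interleaved on one connection, matched back up by ID.
    id_a, id_b = uuid.uuid4().bytes, uuid.uuid4().bytes
    stream = encode_frame(id_a, b"first") + encode_frame(id_b, b"second")
    frames, rest = decode_frames(stream)
    by_id = dict(frames)
    print(by_id[id_a], by_id[id_b], rest)
```

Because every frame names the request it belongs to, responses can come back in any order on the shared connection, and the checksum lets the receiver reject a damaged frame instead of delivering its payload to the wrong request.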

It Works!
The new protocol was put into production earlier this year and has processed 744.9 trillion requests as I write this. The scalability cliff has been eliminated and we are already looking for other places to put this new protocol to work.

Performance-wise, the new protocol has reduced dataplane latency by 11% on average, and by 17.4% at the P90 mark. In addition to making SQS itself more performant, this change benefits services that build on SQS as well. For example, messages sent through Amazon Simple Notification Service (Amazon SNS) now spend 10% less time “inside” before being delivered. Finally, due to the protocol change, the existing fleet of SQS hosts (a mix of X86 and Graviton-powered instances) can now handle 17.8% more requests than before.
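One way to read the 17.8% figure is to ask how many hosts a fixed workload would need after the change. The sketch below assumes that capacity scales linearly with host count, which is a simplification on my part; the function name is hypothetical.

```python
def relative_hosts_needed(capacity_gain: float) -> float:
    """Fraction of the original host count needed to serve the same load,
    assuming each host now handles (1 + capacity_gain) times the requests
    and that capacity scales linearly with the number of hosts."""
    return 1.0 / (1.0 + capacity_gain)


if __name__ == "__main__":
    fraction = relative_hosts_needed(0.178)  # 17.8% more requests per host
    print(f"{fraction:.1%} of the hosts, {1 - fraction:.1%} fewer")
```

Under that assumption, the same traffic would fit on roughly 85% of the hosts.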

More to Come
I hope that you have enjoyed this little peek inside the implementation of Amazon SQS. Let me know in the comments, and I’ll see if I can find some more stories to share.

Jeff;


