Session Handling for 1 million requests per hour

Newton · Published in Paper Planes · Jan 25, 2017

The Problem

We have a varied, distributed server stack that uses three different languages across 50–100 EC2 instances running at any given time. We handle about 20 million requests a day, with more than a million requests per hour at peak. Most of these requests (75%) are authenticated, i.e. they hit a session store. This made us re-evaluate whether our session store was maintainable as well as scalable.
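A quick back-of-the-envelope calculation shows what those figures mean for the session store (a Python sketch using only the numbers above):

```python
# Rough load numbers from the figures above: 1M requests/hour at peak,
# with 75% of requests authenticated (and therefore hitting the session store).
PEAK_REQUESTS_PER_HOUR = 1_000_000
AUTHENTICATED_FRACTION = 0.75

peak_rps = PEAK_REQUESTS_PER_HOUR / 3600                  # ~278 requests/second
session_lookups_rps = peak_rps * AUTHENTICATED_FRACTION   # ~208 session hits/second

print(f"{peak_rps:.0f} req/s at peak, {session_lookups_rps:.0f} session lookups/s")
```

Roughly 200 session-store reads per second, every second, at peak: that is the sustained load any candidate store had to absorb.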

The Options

We had various options for the session store: Raw file, MySQL, Memcached, Redis and DynamoDB. Here is a quick note on each of them and what we ended up with.

Raw file

Clearly, a file-based session store would never work. Imagine multiple servers reading from and writing to a raw file, serializing every access. We would have to keep the session files on a single server so the other servers could read and write them, and deal with high I/O and backup issues on top of that. Whoa! I can’t imagine doing that.

MySQL

“Let’s use MySQL for the session store,” a team member suggested: MySQL, the database buddy of PHP. We would need highly provisioned read IOPS on multiple read replicas and would have to manage sharding ourselves. That is quite a bit of work and high maintenance: every module would have to absorb this complexity, and we would still need to ensure high availability. With its high maintenance and complex sharding logic tightly coupled to application logic, we rejected MySQL.

Memcached & Redis

Memcached was quickly dropped because of its lack of persistence. Redis was fast, could handle lots of connections and has an option to flush to disk for persistence. AWS also offers it as part of ElastiCache. It looked like a front runner. But Redis has limited querying ability and can handle only about 1k–3k requests/second, beyond which sharding would be needed. So we kept Redis as a last resort.

DynamoDB

Time to evaluate DynamoDB. At the outset, it looked great. It was easy to increase the read/write provisioning, and it has primary and secondary indexes, so unlike Redis we could query on multiple keys. Session handling was one of the use cases for which DynamoDB was being promoted. The problem with DynamoDB was executing update/delete queries. Say we want to delete all of a user’s sessions except the current one. There is no simple way to do that: you have to fetch all of the user’s sessions, iterate through them while skipping the current one, and then fire a batch write. The same is probably true of every other session store here except MySQL.
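To make the shape of that workaround concrete, here is a minimal Python sketch of the fetch–filter–batch pattern. The item shapes and helper names are illustrative; in practice the fetch would be a DynamoDB Query and each batch a BatchWriteItem call, which accepts at most 25 delete requests per call.

```python
# "Delete all sessions except the current one", as a fetch-filter-batch pattern.
# In production: Query the user's sessions, then issue BatchWriteItem calls
# (max 25 delete requests per call). Item shapes here are hypothetical.

def sessions_to_delete(sessions, current_session_id):
    """Keep the current session; mark everything else for deletion."""
    return [s for s in sessions if s["session_id"] != current_session_id]

def chunk(items, size=25):
    """BatchWriteItem caps at 25 requests, so split deletions into chunks."""
    return [items[i:i + size] for i in range(0, len(items), size)]

# Example: a user with 60 sessions, keeping session "42".
all_sessions = [{"user_id": "u1", "session_id": str(i)} for i in range(60)]
doomed = sessions_to_delete(all_sessions, "42")
batches = chunk(doomed)  # each batch becomes one BatchWriteItem call
```

Each batch then maps to one BatchWriteItem request of DeleteRequest entries; a robust version would also re-queue any UnprocessedItems the API returns.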

There was another gotcha: when firing updateItem, if you forget to mention the ‘Expected’ clause, it will create a new entry in the table with the keys in ‘KeyConditions’. This was easily handled by writing a wrapper that enforces ‘Expected’.
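The wrapper idea can be sketched like so (Python for brevity, though our stack used PHP; the client object and parameter names are hypothetical stand-ins, and the guard is the point):

```python
# Guard against updateItem silently upserting a new item when the
# 'Expected' clause is forgotten. `client.update_item` is a stand-in
# for the real SDK call; the enforced precondition is what matters.

class MissingExpectedClause(Exception):
    """Raised when an updateItem call omits the 'Expected' clause."""

def safe_update_item(client, params):
    if "Expected" not in params:
        raise MissingExpectedClause(
            "updateItem without 'Expected' would create a new item")
    return client.update_item(**params)
```

Every module then goes through `safe_update_item`, so a forgotten clause fails loudly at call time instead of quietly inserting garbage rows.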

One of our components is written in C++, and of course there is no AWS SDK for C++. We had to hand-write curl calls to query DynamoDB, and anyone who has tried to fire a curl call against the AWS APIs knows the complex series of hashing required to create the request signatures. This again required some programming and testing, but it was a one-time effort.
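For the curious, that “series of hashing” is AWS Signature Version 4 key derivation: four chained HMAC-SHA256 steps, after which the derived key signs the canonical request. A Python sketch of just the key-derivation step, following the SigV4 spec (the credential and date here are placeholders):

```python
import hashlib
import hmac

# AWS Signature Version 4 key derivation: four chained HMAC-SHA256 steps.
# The resulting key is then used to sign the request's "string to sign".

def _hmac(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()

def sigv4_signing_key(secret_key: str, date: str, region: str, service: str) -> bytes:
    k_date = _hmac(("AWS4" + secret_key).encode("utf-8"), date)  # e.g. "20170125"
    k_region = _hmac(k_date, region)                             # e.g. "us-east-1"
    k_service = _hmac(k_region, service)                         # e.g. "dynamodb"
    return _hmac(k_service, "aws4_request")

key = sigv4_signing_key("PLACEHOLDER_SECRET", "20170125", "us-east-1", "dynamodb")
```

Getting every step byte-for-byte right (encodings, date format, lowercase service name) is exactly why hand-rolling this in C++ with curl took some testing.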

All of these seemed like much smaller issues compared to the alternatives, so we went with DynamoDB. Over the last few months, as our traffic grew from 10,000 requests per hour to 1 million requests per hour, all we had to do to scale was increase read IOPS from 100 to 1,000 and write IOPS from 10 to 100. That’s the extent of it.

Tips to integrate DynamoDB with PHP

AWS’ PHP SDK uses the Guzzle framework to handle all HTTP requests. It has retry logic for DynamoDB calls (in case of failure it retries up to 11 times) with default curl timeouts. At over 100 requests per second per server, your PHP threads keep waiting, and we cannot afford PHP threads waiting that long just for session verification. So we overrode the config, reducing the failure retry count to 3 and the curl timeout to 2 seconds.
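The same policy in generic form: bound the attempts at 3 instead of the default 11, so a failing dependency cannot pin a thread for long. This is an illustrative Python sketch, not the PHP SDK’s actual configuration API (the per-attempt timeout there is set via the curl options):

```python
import time

# Bounded retries: at most `max_attempts` calls, then surface the error.
# The per-attempt timeout (2 s in our setup) lives in the HTTP client config;
# `backoff_seconds` here is an illustrative knob.

def call_with_retries(fn, max_attempts=3, backoff_seconds=0.0):
    last_error = None
    for _attempt in range(max_attempts):
        try:
            return fn()
        except Exception as err:  # in practice: only retryable errors
            last_error = err
            if backoff_seconds:
                time.sleep(backoff_seconds)
    raise last_error
```

With 3 attempts and a 2-second timeout, the worst case per session lookup is bounded at roughly 6 seconds instead of the 20-plus seconds the defaults allowed.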

The second optimization we performed: if, and only if, your code base runs on AWS, you can make your AWS API calls over HTTP instead of HTTPS. This cuts the API response time roughly in half. Again, AWS support suggests this only if your code runs on EC2 instances in the same region.

One major drawback: we recently wanted to switch to the HipHop Virtual Machine (HHVM), but HHVM does not pass Guzzle’s test cases, and API calls to DynamoDB were being sent with incomplete data. Hence, for now, we could not use HHVM. The folks at Facebook are apparently working on Guzzle support; we hope it’s available soon.

Would love to hear what you’re using for your session store, or about any interesting experiences you’ve had with DynamoDB.
