HQ: Scaling A Web App to Meet Demand

For the last few months, most nights at 9 p.m. I've grabbed my phone, opened the HQ app, and joined a live trivia game. For 15 minutes, users from around the world answer increasingly difficult trivia questions for money. You miss a question, you're out. You can keep watching, but you won't win any money. I've never won, but I've made it to question 10 (out of 12) once.

Being a nerd, I'm most interested in the technology stack behind HQ. The people who run the app haven't talked publicly about the tech behind it, but I'm really intrigued.

As the app has grown in popularity, and has now rolled out an Android version, it's not out of the ordinary for a million people to be in a game, which presents a pretty big technical challenge. Just watch the chat to see users complaining about glitches, timeouts, lag, and so on.

The app has three parts: the live video of the host, the (awful) game chat, and the questions/answers. The video is probably the easiest to scale. The app, after all, is from the creators of Vine. They were working on a video app that morphed into HQ, so they get streaming mobile video.

The chat system is also pretty straightforward. Think very fast-moving IRC, just with an overabundance of emojis. I swipe it away most days.

Where my interest is really piqued is how the question-and-answer portion works. In the game, users get 10 seconds to answer each question. Within moments after time is up, the host and the app display the number of people who got it right.

While there are only three answer choices, a million users can bombard an API with requests within the same 10-second window. That's a lot for any system to handle. The system then needs to aggregate those answers, calculate how many players got it right, and not only update the app but tell each user whether they answered correctly or have been eliminated.
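I have no idea how HQ actually does this, but as a sketch: one common way to absorb a burst of writes like that is an in-memory store with atomic counters, such as Redis. Everything below (the key layout, the function names) is my own guess, not theirs:

```python
import redis

# Purely illustrative: the Redis choice, question IDs, and key layout
# are my assumptions, not anything HQ has confirmed.
r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def record_answer(question_id: int, user_id: str, choice: str) -> None:
    """Record one player's answer using O(1) atomic operations."""
    # One counter per (question, choice). INCR is atomic, so a million
    # concurrent writers can't race each other.
    r.incr(f"q:{question_id}:count:{choice}")
    # Remember each player's pick so we can tell them if they survived.
    r.hset(f"q:{question_id}:picks", user_id, choice)

def right_answer_count(question_id: int, correct_choice: str) -> int:
    """The number shown on screen: a single counter read, not a scan."""
    return int(r.get(f"q:{question_id}:count:{correct_choice}") or 0)

def is_eliminated(question_id: int, user_id: str, correct_choice: str) -> bool:
    """Per-player verdict pushed back to the app after time expires."""
    return r.hget(f"q:{question_id}:picks", user_id) != correct_choice
```

The nice part of a scheme like this is that the on-screen tally is a single key read, no matter how many players answered.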

As I said earlier, I don't really know what the tech stack is, but I would assume it uses Google's or Amazon's cloud to handle that sort of mass rush of inputs.

Perhaps they use Amazon Simple Notification Service (SNS). Amazon describes SNS as a fully managed pub/sub messaging service that makes it easy to decouple and scale microservices, distributed systems, and serverless applications. That's pretty good techno-speak for taking in a bunch of very short messages and routing them wherever they need to go: other services, additional code, and so on. A service like that could take in a million messages and process them in a few seconds.
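Again, purely hypothetical, but the publish side of that pattern could be as simple as this (the topic ARN is a placeholder I made up):

```python
import json
import boto3

# Assumption: answers get published to an SNS topic and aggregated
# downstream. The ARN below is fake.
sns = boto3.client("sns", region_name="us-east-1")
TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:hq-answers"

def submit_answer(question_id: int, user_id: str, choice: str) -> None:
    """Fire-and-forget: publish the answer and return to the client fast."""
    sns.publish(
        TopicArn=TOPIC_ARN,
        Message=json.dumps(
            {"question_id": question_id, "user_id": user_id, "choice": choice}
        ),
    )
```

Subscribers on the other end of the topic (an SQS queue, a Lambda function, whatever) can then drain and aggregate the answers at their own pace, instead of a single API server eating the whole spike.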

With the explosive growth of the app in the last few months, managing a million concurrent users is a major challenge. There have been glitches and errors; I've been eliminated even though I answered correctly. The strain was especially noticeable when HQ decided to run a game during the Super Bowl halftime show. Two million of my closest friends and I joined, and the system just couldn't handle it. Instead of showing results within seconds, it took a minute to process 2 million entries. I'd love to see the MRTG graphs for the servers running that game.

If you haven’t tried the game, I’d recommend it. You can even use my code and get a free life in the game. See you at 9 p.m.