Understanding rate limits

Why APIs have them and how to overcome them

May 08, 2023

Why do public APIs have rate limits? Public means open to anyone, including bad actors on the internet. And if you don't know what you're doing, your own traffic can sometimes look just as hostile from the other end.

Rate limits are primarily implemented to keep servers from being overwhelmed, whether by a deliberate DDoS attack from bad actors or by an accidental flood of requests from well-meaning users. Every server has a threshold of requests it can handle, and once traffic crosses it, the server slows down or crashes. To keep the API up and able to serve all customers fairly, the maintainers have to cap how many requests each client can make.

Even if an API service doesn't get that much load, or isn't popular enough to attract a DDoS attack, rate limits are often implemented to reduce server costs and keep responses fast. Most APIs are going to have these rate limits.

So, how do you work within these rate limits?

It's simple. Consume when you can, and wait for the next available window.

Usually, APIs are limited per minute, and sometimes per hour, per day, or even per month. When a request fails with an HTTP 429 (Too Many Requests) status code, retry it in the next minute. If the limit is per hour, day, or month, the retry has to be moved to the start of the next corresponding window.
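Here is a minimal sketch of that replay loop in Python. It is not tied to any particular HTTP library: `send` is a hypothetical callable that performs the request and returns a `(status, headers)` pair, and `call_with_replay` is a name made up for this example. It also assumes that, when the server sends the standard `Retry-After` header as a number of seconds, that value is a better hint than guessing the next minute boundary.

```python
import time
from typing import Callable, Mapping, Tuple

# Hypothetical response shape for this sketch: (status code, headers).
Response = Tuple[int, Mapping[str, str]]


def seconds_until_next_minute(now: float) -> float:
    """Seconds left until the next minute boundary of the given timestamp."""
    return 60.0 - (now % 60.0)


def call_with_replay(
    send: Callable[[], Response],
    max_attempts: int = 3,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.time,
) -> Response:
    """Call send(); on HTTP 429, wait for the next window and replay."""
    response = send()
    for _ in range(max_attempts - 1):
        status, headers = response
        if status != 429:
            break
        # Assumption: header names are looked up exactly as written here.
        # Only the "number of seconds" form of Retry-After is handled.
        retry_after = headers.get("Retry-After", "")
        if retry_after.isdigit():
            wait = float(retry_after)
        else:
            # Assumption: a per-minute limit that resets on the minute.
            wait = seconds_until_next_minute(clock())
        sleep(wait)
        response = send()
    return response
```

For hourly or daily limits, the same loop applies, but the fallback wait would be computed against the hour or day boundary instead of the minute.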

Then how do you prevent API failures caused by too-many-requests errors?

1. Prevent it

Some public APIs report the current rate-limit status in their response headers. Keep watching these headers and use them to decide whether you should make the next API call.

For example, the GitHub API documentation says that every response includes several headers with the x-ratelimit prefix, such as x-ratelimit-limit, x-ratelimit-remaining, and x-ratelimit-reset.
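As a rough sketch, this is how you might read those headers before making the next call. The helper name `seconds_to_wait` is made up for this example. It assumes GitHub-style semantics, where `x-ratelimit-remaining` is the number of requests left in the current window and `x-ratelimit-reset` is the time the window resets, in UTC epoch seconds. Other APIs may use different names or units, so check their documentation.

```python
import time
from typing import Mapping, Optional


def seconds_to_wait(headers: Mapping[str, str], now: Optional[float] = None) -> float:
    """Return 0.0 if another call is allowed, else seconds until the quota resets."""
    # HTTP header names are case-insensitive, so normalize them first.
    lowered = {name.lower(): value for name, value in headers.items()}

    # Assumption: if the header is missing, the API is not reporting a limit.
    remaining = int(lowered.get("x-ratelimit-remaining", "1"))
    if remaining > 0:
        return 0.0

    # Assumption: the reset value is a UTC epoch timestamp in seconds.
    reset_at = float(lowered.get("x-ratelimit-reset", "0"))
    current = time.time() if now is None else now
    return max(0.0, reset_at - current)
```

A caller would pass in the headers of the latest response and either proceed right away when the result is `0.0` or hold off for the returned number of seconds.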

Okay, so you decide to fail user actions once the allowed limit is reached. What is the best way to do it?

2. Fail fast

When you know something in a workflow is going to fail, it's better to surface the error to the user as early as possible. If a particular API call is going to fail because of the rate limit, raise the error right away or, better, don't even allow the user to invoke the action until quota is available again.

For example, banks do not allow you to withdraw or transfer more than a specific daily amount. They usually show the error when you attempt the action rather than towards its completion.
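One way to fail fast in code is to keep a local counter and check it before starting any work. The sketch below is a simple fixed-window guard. `FixedWindowGuard` and `RateLimitExceeded` are names invented for this example, and it assumes you already know the API's limit and that your local window roughly lines up with the server's, which may not hold exactly in practice.

```python
import time
from typing import Callable


class RateLimitExceeded(Exception):
    """Raised before any work starts when the local quota is used up."""

    def __init__(self, retry_in: float) -> None:
        super().__init__(f"rate limit reached, try again in {retry_in:.0f}s")
        self.retry_in = retry_in


class FixedWindowGuard:
    """Counts calls in a fixed window and rejects early once the limit is hit."""

    def __init__(
        self,
        limit: int,
        window_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._limit = limit
        self._window = window_seconds
        self._clock = clock
        self._window_start = clock()
        self._used = 0

    def check(self) -> None:
        """Reserve one call, or raise RateLimitExceeded without doing any work."""
        now = self._clock()
        if now - self._window_start >= self._window:
            # A new window has started, so the quota is fresh again.
            self._window_start = now
            self._used = 0
        if self._used >= self._limit:
            raise RateLimitExceeded(self._window - (now - self._window_start))
        self._used += 1
```

Call `guard.check()` at the very top of the user action. If it raises, show the user the `retry_in` value, or disable the button for that long, instead of failing halfway through the workflow.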

If the user's action cannot be prevented, you must think of alternatives. One is to delay the rest of the workflow until the APIs are next available.

3. Queue them up

Let's take the same banking example. Whether to allow a withdrawal is a critical decision that has to be made on the spot. Not every step of the workflow is that critical, though.

Suppose the bank uses a communication services API for post-action SMS, say, an acknowledgment of a money transfer or withdrawal. The user action need not be stopped just because the SMS can't be sent in the same minute. The SMS can be delayed to the next minute or so.

You can implement exactly this: schedule the rest of the workflow for the next available time. From the scheduled list, execute workflows in batches up to the current quota. Let the rest of the list wait until the next available time, and repeat until the list is empty.
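The batching loop described above can be sketched in a few lines of Python. This is an in-process illustration only: `drain_in_batches` is a hypothetical helper, each task is assumed to be a zero-argument callable (for example, one that sends a single SMS), and the quota is assumed to be a fixed number of calls per window.

```python
import time
from collections import deque
from typing import Any, Callable, Iterable, List


def drain_in_batches(
    tasks: Iterable[Callable[[], Any]],
    quota_per_window: int,
    window_seconds: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Any]:
    """Run queued tasks in batches of at most quota_per_window per window."""
    if quota_per_window < 1:
        raise ValueError("quota_per_window must be at least 1")

    pending = deque(tasks)
    results: List[Any] = []
    while pending:
        # Execute as many tasks as the current quota allows.
        for _ in range(min(quota_per_window, len(pending))):
            task = pending.popleft()
            results.append(task())
        # Let the rest of the list wait for the next available window.
        if pending:
            sleep(window_seconds)
    return results
```

With a quota of 2 per minute and 5 queued tasks, this runs 2 tasks, waits, runs 2 more, waits again, and then runs the last one. The list lives only in memory here, so anything still queued is lost if the process restarts.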

When your application has to do this regularly, you are better off with a distributed message queuing service that holds the list and keeps the scheduler and worker operations running smoothly. This is one of the critical mechanisms in microservices architectures.

This is not an exhaustive list of alternatives, just some of the common patterns. I hope you find them useful. Let me know if you think of other options or spot flaws.

