Implementing Rate Limiting with Resilience4j

August 5, 2020

In the previous article in this series, we learned about Resilience4j and how to use its Retry module. Let’s now learn about the RateLimiter - what it is, when and how to use it, and what to watch out for when implementing rate limiting (or “throttling”, as it’s also called).

Example Code

This article is accompanied by a working code example on GitHub.

What is Resilience4j?

Please refer to the description in the previous article for a quick intro into how Resilience4j works in general.

What is Rate Limiting?

We can look at rate limiting from two perspectives - as a service provider and as a service consumer.

Server-side Rate Limiting

As a service provider, we implement rate limiting to protect our resources from overload and Denial of Service (DoS) attacks.

To meet our service level agreement (SLA) with all our consumers, we want to ensure that one consumer that is causing a traffic spike doesn’t impact the quality of our service to others.

We do this by setting a limit on how many requests a consumer is allowed to make in a given unit of time. We reject any requests above the limit with an appropriate response, like HTTP status 429 (Too Many Requests). This is called server-side rate limiting.

The rate limit is specified in terms of requests per second (rps), requests per minute (rpm), or similar. Some services have multiple rate limits for different durations (50 rpm and not more than 2500 rph, for example) and different times of day (100 rps during the day and 150 rps at night, for example). The limit may apply to a single user (identified by user id, IP address, API access key, etc.) or a tenant in a multi-tenant application.

Client-side Rate Limiting

As a consumer of a service, we want to ensure that we are not overloading the service provider. Also, we don’t want to incur unexpected costs - either monetarily or in terms of quality of service.

This could happen if the service we are consuming is elastic. Instead of throttling our requests, the service provider might charge us extra for the additional load. Some even ban misbehaving clients for short periods. Rate limiting implemented by a consumer to prevent such issues is called client-side rate limiting.

When to Use RateLimiter?

resilience4j-ratelimiter is intended for client-side rate limiting.

Server-side rate limiting requires things like caching and coordination between multiple server instances, which Resilience4j does not support. For server-side rate limiting, there are API gateways and API filters like Kong API Gateway and Repose API Filter. Resilience4j's RateLimiter module is not intended to replace them.

Resilience4j RateLimiter Concepts

A thread that wants to call a remote service first asks the RateLimiter for permission. If the RateLimiter permits it, the thread proceeds. Otherwise, the RateLimiter parks the thread or puts it in a waiting state.

The RateLimiter creates new permissions periodically. When a permission becomes available, the thread is notified and it can then continue.

The number of calls that are permitted during a period is called limitForPeriod. How often the RateLimiter refreshes the permissions is specified by limitRefreshPeriod. How long a thread can wait to acquire permission is specified by timeoutDuration. If no permission is available at the end of the wait time, the RateLimiter throws a RequestNotPermitted runtime exception.
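To make these concepts concrete, here is a minimal sketch that acquires permissions by hand with acquirePermission() rather than through the decorators shown later (the configuration values are illustrative):

// Illustrative config: 5 calls (limitForPeriod) per second (limitRefreshPeriod),
// with callers willing to wait up to 1s for a permission (timeoutDuration)
RateLimiterConfig config = RateLimiterConfig.custom()
  .limitForPeriod(5)
  .limitRefreshPeriod(Duration.ofSeconds(1))
  .timeoutDuration(Duration.ofSeconds(1))
  .build();
RateLimiter limiter = RateLimiter.of("remoteService", config);

// acquirePermission() blocks for up to timeoutDuration and returns
// false if no permission became available in that time
if (limiter.acquirePermission()) {
  // proceed with the remote call
} else {
  // handle the rejection
}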

Using the Resilience4j RateLimiter Module

RateLimiterRegistry, RateLimiterConfig, and RateLimiter are the main abstractions in resilience4j-ratelimiter.

RateLimiterRegistry is a factory for creating and managing RateLimiter objects.

RateLimiterConfig encapsulates the limitForPeriod, limitRefreshPeriod and timeoutDuration configurations. Each RateLimiter object is associated with a RateLimiterConfig.

RateLimiter provides helper methods to create decorators for the functional interfaces or lambda expressions containing the remote call.

Let’s see how to use the various features available in the RateLimiter module. Assume that we are building a website for an airline to allow its customers to search for and book flights. Our service talks to a remote service encapsulated by the class FlightSearchService.

Basic Example

The first step is to create a RateLimiterConfig:

RateLimiterConfig config = RateLimiterConfig.ofDefaults();

This creates a RateLimiterConfig with default values for limitForPeriod (50), limitRefreshPeriod (500ns), and timeoutDuration (5s).

Suppose our contract with the airline’s service says that we can call their search API at 1 rps. Then we would create the RateLimiterConfig like this:

RateLimiterConfig config = RateLimiterConfig.custom()
  .limitForPeriod(1)
  .limitRefreshPeriod(Duration.ofSeconds(1))
  .timeoutDuration(Duration.ofSeconds(1))
  .build();

If a thread is not able to acquire permission in the 1s timeoutDuration specified, it will error out.

We then create a RateLimiter and decorate the searchFlights() call:

RateLimiterRegistry registry = RateLimiterRegistry.of(config);
RateLimiter limiter = registry.rateLimiter("flightSearchService");
// FlightSearchService and SearchRequest creation omitted
Supplier<List<Flight>> flightsSupplier = 
  RateLimiter.decorateSupplier(limiter,
    () -> service.searchFlights(request));

Finally, we use the decorated Supplier<List<Flight>> a few times:

for (int i = 0; i < 3; i++) {
  System.out.println(flightsSupplier.get());
}
The timestamps in the sample output show one request being made every second:

Searching for flights; current time = 15:29:39 847
Flight search successful
[Flight{flightNumber='XY 765', ... }, ... ]
Searching for flights; current time = 15:29:40 786
[Flight{flightNumber='XY 765', ... }, ... ]
Searching for flights; current time = 15:29:41 791
[Flight{flightNumber='XY 765', ... }, ... ]

If we exceed the limit, we get a RequestNotPermitted exception:

Exception in thread "main" io.github.resilience4j.ratelimiter.RequestNotPermitted: RateLimiter 'flightSearchService' does not permit further calls
  at io.github.resilience4j.ratelimiter.RequestNotPermitted.createRequestNotPermitted(
  at io.github.resilience4j.ratelimiter.RateLimiter.waitForPermission(
  ... other lines omitted ...

Decorating Methods Throwing Checked Exceptions

Suppose we’re calling FlightSearchService.searchFlightsThrowingException() which can throw a checked Exception. Then we cannot use RateLimiter.decorateSupplier(). We would use RateLimiter.decorateCheckedSupplier() instead:

CheckedFunction0<List<Flight>> flights = 
  RateLimiter.decorateCheckedSupplier(limiter,
    () -> service.searchFlightsThrowingException(request));

try {
  System.out.println(flights.apply());
} catch (Throwable t) {
  // exception handling
}
RateLimiter.decorateCheckedSupplier() returns a CheckedFunction0 which represents a function with no arguments. Notice the call to apply() on the CheckedFunction0 object to invoke the remote operation.

If we don’t want to work with Suppliers, RateLimiter provides more helper decorator methods like decorateFunction(), decorateCheckedFunction(), decorateRunnable(), decorateCallable(), etc. to work with other language constructs. The decorateChecked* methods are used to decorate methods that throw checked exceptions.
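For example, here is a minimal sketch of rate-limiting an operation that returns nothing via decorateRunnable() (pingService() is a hypothetical method, not part of the example project):

// pingService() is hypothetical - any void remote call works the same way
Runnable rateLimitedPing = RateLimiter.decorateRunnable(
    limiter, () -> service.pingService());
rateLimitedPing.run();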

Applying Multiple Rate Limits

Suppose the airline’s flight search had multiple rate limits: 2 rps and 40 rpm. We can apply multiple limits on the client-side by creating multiple RateLimiters:

RateLimiterConfig rpsConfig = RateLimiterConfig.custom()
  .limitForPeriod(2)
  .limitRefreshPeriod(Duration.ofSeconds(1))
  .build();

RateLimiterConfig rpmConfig = RateLimiterConfig.custom()
  .limitForPeriod(40)
  .limitRefreshPeriod(Duration.ofMinutes(1))
  .build();

RateLimiterRegistry registry = RateLimiterRegistry.of(rpsConfig);
RateLimiter rpsLimiter = 
  registry.rateLimiter("flightSearchService_rps", rpsConfig);
RateLimiter rpmLimiter = 
  registry.rateLimiter("flightSearchService_rpm", rpmConfig);        

We then decorate the searchFlights() method using both the RateLimiters:

Supplier<List<Flight>> rpsLimitedSupplier = 
  RateLimiter.decorateSupplier(rpsLimiter,
    () -> service.searchFlights(request));

Supplier<List<Flight>> flightsSupplier 
  = RateLimiter.decorateSupplier(rpmLimiter, rpsLimitedSupplier);

The sample output shows 2 requests being made every second and being limited to 40 requests:

Searching for flights; current time = 15:13:21 246
Searching for flights; current time = 15:13:21 249
Searching for flights; current time = 15:13:22 212
Searching for flights; current time = 15:13:40 215
Exception in thread "main" io.github.resilience4j.ratelimiter.RequestNotPermitted: 
RateLimiter 'flightSearchService_rpm' does not permit further calls 
at io.github.resilience4j.ratelimiter.RequestNotPermitted.createRequestNotPermitted(
at io.github.resilience4j.ratelimiter.RateLimiter.waitForPermission(

Changing Limits at Runtime

If required, we can change the values for limitForPeriod and timeoutDuration at runtime:

// new values shown here are illustrative
limiter.changeLimitForPeriod(2);
limiter.changeTimeoutDuration(Duration.ofSeconds(2));
This feature is useful if our rate limits vary based on time of day, for example - we could have a scheduled thread to change these values. The new values won’t affect the threads that are currently waiting for permissions.
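As a sketch, such a scheduled change could look like this (the night-time limit of 150 echoes the earlier example; computeDelayUntilNight() is a hypothetical helper standing in for real scheduling logic):

ScheduledExecutorService scheduler =
    Executors.newSingleThreadScheduledExecutor();
scheduler.schedule(
    () -> limiter.changeLimitForPeriod(150), // night-time limit (illustrative)
    computeDelayUntilNight(),                // hypothetical helper
    TimeUnit.SECONDS);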

Using RateLimiter and Retry Together

Let’s say we want to retry if we get a RequestNotPermitted exception since it is a transient error. We would create RateLimiter and Retry objects as usual. We then decorate a rate-limited Supplier and wrap it with a Retry:

Supplier<List<Flight>> rateLimitedFlightsSupplier = 
  RateLimiter.decorateSupplier(limiter,
    () -> service.searchFlights(request));

Supplier<List<Flight>> retryingFlightsSupplier = 
  Retry.decorateSupplier(retry, rateLimitedFlightsSupplier);
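
For completeness, here is one way the Retry used above might be created (a sketch; judging by the PT1S wait in the output below, the example uses a 1-second wait duration, and the instance name matches the one in the output):

RetryConfig retryConfig = RetryConfig.custom()
  .waitDuration(Duration.ofSeconds(1)) // matches the PT1S wait in the output
  .build();
RetryRegistry retryRegistry = RetryRegistry.of(retryConfig);
Retry retry = retryRegistry.retry("rateLimitedFlightSearch");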

The sample output shows the request being retried for a RequestNotPermitted exception:

Searching for flights; current time = 17:10:09 218
[Flight{flightNumber='XY 765', flightDate='07/31/2020', from='NYC', to='LAX'}, ...]
2020-07-27T17:10:09.484: Retry 'rateLimitedFlightSearch', waiting PT1S until attempt '1'. Last attempt failed with exception 'io.github.resilience4j.ratelimiter.RequestNotPermitted: RateLimiter 'flightSearchService' does not permit further calls'.
Searching for flights; current time = 17:10:10 492
2020-07-27T17:10:10.494: Retry 'rateLimitedFlightSearch' recorded a successful retry attempt...
[Flight{flightNumber='XY 765', flightDate='07/31/2020', from='NYC', to='LAX'}, ...]

The order in which we created the decorators is important. It would not work if we wrapped the Retry with the RateLimiter.

RateLimiter Events

RateLimiter has an EventPublisher which generates events of the types RateLimiterOnSuccessEvent and RateLimiterOnFailureEvent when a remote operation is called, indicating whether acquiring a permission was successful. We can listen for these events and log them, for example:

RateLimiter limiter = registry.rateLimiter("flightSearchService");
limiter.getEventPublisher().onSuccess(e -> System.out.println(e.toString()));
limiter.getEventPublisher().onFailure(e -> System.out.println(e.toString()));

The sample output shows what’s logged:

RateLimiterEvent{type=SUCCESSFUL_ACQUIRE, rateLimiterName='flightSearchService', creationTime=2020-07-21T19:14:33.127+05:30}
... other lines omitted ...
RateLimiterEvent{type=FAILED_ACQUIRE, rateLimiterName='flightSearchService', creationTime=2020-07-21T19:14:33.186+05:30}

RateLimiter Metrics

Suppose after implementing client-side throttling we find that the response times of our APIs have increased. This is possible - as we have seen, if permissions are not available when a thread invokes a remote operation, the RateLimiter puts the thread in a waiting state.

If our request handling threads are often waiting to get permission, it could mean that our limitForPeriod is too low. Perhaps we need to work with our service provider and get additional quota provisioned first.

Monitoring RateLimiter metrics helps us identify such capacity issues and ensure that the values we’ve set on the RateLimiterConfig are working well.

RateLimiter tracks two metrics: the number of available permissions (resilience4j.ratelimiter.available.permissions) and the number of threads waiting for permissions (resilience4j.ratelimiter.waiting_threads).

First, we create RateLimiterConfig, RateLimiterRegistry, and RateLimiter as usual. Then, we create a MeterRegistry and bind the RateLimiterRegistry to it:

MeterRegistry meterRegistry = new SimpleMeterRegistry();
TaggedRateLimiterMetrics.ofRateLimiterRegistry(registry)
  .bindTo(meterRegistry);

After running the rate-limited operation a few times, we display the captured metrics:

Consumer<Meter> meterConsumer = meter -> {
  String desc = meter.getId().getDescription();
  String metricName = meter.getId().getName();
  Double metricValue = StreamSupport.stream(meter.measure().spliterator(), false)
    .filter(m -> m.getStatistic().name().equals("VALUE"))
    .findFirst()
    .map(m -> m.getValue())
    .orElse(0.0);
  System.out.println(desc + " - " + metricName + ": " + metricValue);
};
meterRegistry.forEachMeter(meterConsumer);
Here’s some sample output:

The number of available permissions - resilience4j.ratelimiter.available.permissions: -6.0
The number of waiting threads - resilience4j.ratelimiter.waiting_threads: 7.0

The negative value for resilience4j.ratelimiter.available.permissions shows the number of permissions that have been reserved for requesting threads. In a real application, we would export the data to a monitoring system periodically and analyze it on a dashboard.

Gotchas and Good Practices When Implementing Client-side Rate Limiting

Make the Rate Limiter a Singleton

All calls to a given remote service should go through the same RateLimiter instance. For a given remote service the RateLimiter must be a singleton.

If we don’t enforce this, some areas of our codebase may make a direct call to the remote service, bypassing the RateLimiter. To prevent this, the actual call to the remote service should be in a core, internal layer and other areas should use a rate-limited decorator exposed by the internal layer.
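One way to structure this, as a rough sketch (the class and method names here are illustrative, not from the example project):

// The raw FlightSearchService never leaves this class; every caller
// goes through the single rate-limited entry point below
public class FlightSearchClient {
  private final FlightSearchService service = new FlightSearchService();
  private final RateLimiter limiter =
      RateLimiter.of("flightSearchService", RateLimiterConfig.ofDefaults());

  // The only public way to search flights - always rate-limited
  public List<Flight> searchFlights(SearchRequest request) {
    return RateLimiter.decorateSupplier(limiter,
        () -> service.searchFlights(request)).get();
  }
}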

How can we ensure that a new developer understands this intent in the future? Check out Tom’s article which shows one way of solving such problems by organizing the package structure to make such intents clear. Additionally, it shows how to enforce this by codifying the intent in ArchUnit tests.

Configure the Rate Limiter for Multiple Server Instances

Figuring out the right values for the configurations can be tricky. If we are running multiple instances of our service in a cluster, the value for limitForPeriod must account for this.

For example, if the upstream service has a rate limit of 100 rps and we have 4 instances of our service, then we would configure 25 rps as the limit on each instance.
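A sketch of deriving the per-instance limit, assuming the instance count is available from our deployment configuration (instanceCount is a made-up variable):

int upstreamLimitRps = 100; // the upstream service's overall limit
int instanceCount = 4;      // hypothetical: read from deployment config

RateLimiterConfig config = RateLimiterConfig.custom()
  .limitForPeriod(upstreamLimitRps / instanceCount) // 25 rps per instance
  .limitRefreshPeriod(Duration.ofSeconds(1))
  .build();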

This assumes, however, that the load on each of our instances will be roughly the same. If that’s not the case or if our service itself is elastic and the number of instances can vary, then Resilience4j’s RateLimiter may not be a good fit.

In that case, we would need a rate limiter that maintains its data in a distributed cache rather than in memory, as Resilience4j's RateLimiter does. But that would impact the response times of our service. Another option is to implement some kind of adaptive rate limiting. While Resilience4j may support it in the future, it is not clear when it will be available.

Choose the Right Timeout

For the timeoutDuration configuration value, we should keep the expected response times of our APIs in mind.

If we set the timeoutDuration too high, the response times and throughput will suffer. If it is too low, our error rate may increase.

Since there could be some trial and error involved here, a good practice is to maintain the values we use in RateLimiterConfig like timeoutDuration, limitForPeriod, and limitRefreshPeriod as a configuration outside our service. Then we can change them without changing code.
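For instance, a minimal sketch reading these values from system properties (the property names are made up; a configuration file or config server would work equally well):

// Hypothetical property names - externalized so they can change without a code change
int limitForPeriod = Integer.getInteger("ratelimiter.limitForPeriod", 1);
long refreshPeriodMs = Long.getLong("ratelimiter.limitRefreshPeriodMs", 1000L);
long timeoutMs = Long.getLong("ratelimiter.timeoutDurationMs", 1000L);

RateLimiterConfig config = RateLimiterConfig.custom()
  .limitForPeriod(limitForPeriod)
  .limitRefreshPeriod(Duration.ofMillis(refreshPeriodMs))
  .timeoutDuration(Duration.ofMillis(timeoutMs))
  .build();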

Tune Client-side and Server-side Rate Limiters

Implementing client-side rate limiting does not guarantee that we will never get rate limited by our upstream service.

Suppose we had a limit of 2 rps from the upstream service and we had configured limitForPeriod as 2 and limitRefreshPeriod as 1s. If we make two requests in the last few milliseconds of the second, with no other calls until then, the RateLimiter would permit them. If we make another two calls in the first few milliseconds of the next second, the RateLimiter would permit them too since two new permissions would be available. But the upstream service could reject these two requests since servers often implement sliding window-based rate limiting.

To guarantee that we never exceed the upstream service's rate limit, we would need to configure the fixed window in the client to be shorter than the sliding window in the service. So if we had configured limitForPeriod as 1 and limitRefreshPeriod as 500ms in the previous example, we would not get a rate limit exceeded error. But then, all three requests after the first one would wait, increasing the response times and reducing the throughput. Check out this video which talks about the problems with static rate limiting and the advantages of adaptive control.
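The safer-but-slower configuration from this example would look like this:

// 1 permission every 500ms: the client's fixed window is now shorter
// than the server's 1s sliding window, per the discussion above
RateLimiterConfig config = RateLimiterConfig.custom()
  .limitForPeriod(1)
  .limitRefreshPeriod(Duration.ofMillis(500))
  .build();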

Conclusion
In this article, we learned how we can use Resilience4j’s RateLimiter module to implement client-side rate limiting. We looked at the different ways to configure it with practical examples. We learned some good practices and things to keep in mind when implementing rate limiting.

You can play around with a complete application illustrating these ideas using the code on GitHub.

Written By:

Saajan Nagendra

Saajan is an architect with deep experience building systems in several business domains. With a clean and minimalist approach to design, he is passionate about code - the aesthetics of it and creating maintainable and flexible solutions. He enjoys both sharing with and learning from others.
