Modelling back-end service requests


I’m trying to understand how best to realistically model the maximum load on a back-end service (let’s call it BE). My back-end service is 4 nodes sitting behind a load balancer and is not exposed to the internet. The back-end service has two main clients (let’s call them A and B). A and B are front-end web sites sitting on the internet. A has 4 nodes and B has 4 nodes. Each node of A and B is a Tomcat web application that uses Apache HttpClient to make requests (of type A and type B, respectively) to the back-end service via the load balancer. Each HttpClient instance has a connection pool with a maximum of 150 connections.

How would I realistically model the load on BE so it mimics the maximum I could expect in the production environment? I’m thinking that I should create a scenario A and inject 4 * 150 = 600 users (the theoretical maximum number of connections I can expect across the four A nodes), replaying the types of requests I expect, then do the same for client B in a scenario B. I would point Gatling at the load balancer (rather than at my individual back-end service nodes) and measure the throughput and response time. From what I understand this is essentially a closed system with a fixed set of concurrent users - is this correct, and is this the best way to load test such a back-end service?
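For what it’s worth, that closed-model setup might look like the sketch below in Gatling’s Scala DSL (the base URL, endpoints, and durations are my own placeholder assumptions, not from the real system):

```scala
import scala.concurrent.duration._
import io.gatling.core.Predef._
import io.gatling.http.Predef._

class BackEndMaxLoadSimulation extends Simulation {

  // Point at the load balancer, not the individual BE nodes
  val httpProtocol = http.baseUrl("http://be-load-balancer") // hypothetical URL

  // Placeholder request mixes for clients A and B
  val scenarioA = scenario("Client A requests")
    .exec(http("request A").get("/api/a")) // hypothetical endpoint

  val scenarioB = scenario("Client B requests")
    .exec(http("request B").get("/api/b")) // hypothetical endpoint

  setUp(
    // 4 nodes x 150 pooled connections = 600 concurrent users per client
    scenarioA.inject(constantConcurrentUsers(600).during(10.minutes)),
    scenarioB.inject(constantConcurrentUsers(600).during(10.minutes))
  ).protocols(httpProtocol)
}
```

`constantConcurrentUsers` is a closed-model injection profile: Gatling holds the number of concurrent users fixed, which matches the fixed-pool behaviour described above.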

Thanks for your help, and for such a great tool!

Here is another approach. Why not isolate the Web and BE tiers down to a single unit each and run the test to establish a baseline metric for each tier?

Internet → 1 Web (150 con pool max) → 1 BE

If you do this first, you will know:

  1. The optimal Gatling user load needed to max out the Web tier's connection pool of 150.
  2. How much headroom the BE tier has once the Web tier's connections are maxed out.

Once you have these numbers, you can add one box to each tier and compare with the previous run. You will feel much better with this approach because you will have actual data with which to predict an optimal setting. This will also help you with capacity planning.
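To find the concurrency at which a single Web node's 150-connection pool saturates, one option (my own sketch, not from this thread) is to ramp concurrent users past the pool size and watch where throughput plateaus:

```scala
import scala.concurrent.duration._
import io.gatling.core.Predef._
import io.gatling.http.Predef._

class SingleTierBaselineSimulation extends Simulation {

  val httpProtocol = http.baseUrl("http://single-web-node") // hypothetical URL

  val scn = scenario("Saturation ramp")
    .exec(http("request").get("/api/test")) // hypothetical endpoint

  setUp(
    // Ramp well past the 150-connection pool; throughput should plateau
    // around the concurrency that saturates the Web tier.
    scn.inject(rampConcurrentUsers(1).to(300).during(10.minutes))
  ).protocols(httpProtocol)
}
```

The concurrency level at the plateau is the "lowest common metric" for that tier; repeating the run against the BE unit gives the equivalent number for the back end.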

Good luck!


I would configure the Gatling HTTP client to behave as a single client, instead of the default behavior where each virtual user is like a browser with its own connection pool.
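If I recall correctly, the relevant setting is `shareConnections` on the HTTP protocol; a minimal sketch, assuming Gatling's Scala DSL (the base URL is a placeholder):

```scala
import io.gatling.core.Predef._
import io.gatling.http.Predef._

// With shareConnections, all virtual users share a single connection
// pool, mimicking one server-side HTTP client rather than many browsers.
val httpProtocol = http
  .baseUrl("http://be-load-balancer") // hypothetical URL
  .shareConnections
```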

Note that, IIRC, Apache HttpClient enqueues requests when no connection is available in the pool. If you rely on that behavior, I don't think you'll be able to reproduce it.


Thanks for the feedback, Stéphane. If all my requests are sent to the same host, what is the difference between:

  1. the `.maxConnectionsPerHost` method (Scala DSL)
  2. `gatling.conf` - `gatling.http.ahc.maxConnectionsPerHost`
  3. `gatling.conf` - `gatling.http.ahc.maxConnections`

The latter two are AsyncHttpClient internals. Their scope is the whole client instance, not individual virtual users. We should probably stop exposing them.
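So, to cap the pool from the DSL rather than through the AsyncHttpClient config keys, something like this sketch should work (the 150 is my assumption, chosen to match the Apache HttpClient pool described in the original question):

```scala
import io.gatling.core.Predef._
import io.gatling.http.Predef._

// Combined with shareConnections, maxConnectionsPerHost caps the single
// shared pool at 150 connections to the load balancer, matching the
// Apache HttpClient pool being modelled.
val httpProtocol = http
  .baseUrl("http://be-load-balancer") // hypothetical URL
  .shareConnections
  .maxConnectionsPerHost(150)
```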