Executing a large number of requests using few connections

I’m trying to benchmark an API whose expected load will consist of a large number of requests sent over significantly fewer connections. For example, using wrk we were able to achieve 500 requests per second using 10-20 connections. Is it possible to simulate this same kind of load using Gatling? I’ve tried using the shareConnections option, but this resulted in approximately one connection being opened for every request, so about 400 to 500 connections were in use throughout the test. Is there any way to reduce the number of active connections even further, while still maintaining a high request rate? Here’s my injection logic:

setUp(scn.inject(constantUsersPerSec(targetQPS) during(duration seconds))).throttle(reachRps(targetQPS.toInt) in (rampRate seconds), holdFor(duration seconds))

I’m guessing that the best Gatling can do here is limit connections to one per user, which would still be far too many connections for our expected load. Thank you for any help. I understand this isn’t exactly what Gatling is designed for.

Hi Jeremy,

Maybe I don’t understand the problem at hand, but I think this is pretty much the default behaviour of Gatling:

  • please see https://gatling.io/docs/current/http/http_protocol/
  • IMHO, if you fire up a bunch of virtual users and use “http.maxConnectionsPerHost(1)”, you should have one connection per user
  • the only thing left to do is to have a loop/feeder so that each virtual user sends a large number of requests
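A minimal sketch of the approach in the list above, assuming the Gatling 3 Scala DSL. The base URL, the request path, the user count, and the durations are all hypothetical placeholders rather than values from this thread. A fixed set of virtual users is started once, each user loops over the request, and a throttle caps the overall rate:

```scala
import scala.concurrent.duration._

import io.gatling.core.Predef._
import io.gatling.http.Predef._

class FewConnectionsSimulation extends Simulation {

  // Hypothetical values; adjust them to the API under test.
  val users = 20               // roughly the number of connections wanted
  val targetRps = 500          // upper limit on the overall request rate
  val rampUp = 10.seconds
  val testDuration = 60.seconds

  val httpProtocol = http
    .baseUrl("http://localhost:8080") // assumed address of the API
    .maxConnectionsPerHost(1)         // at most one connection per virtual user

  // Each virtual user keeps sending requests in a loop instead of
  // sending a single request and terminating.
  val scn = scenario("Few connections, many requests")
    .during(testDuration) {
      exec(http("request").get("/")) // assumed endpoint
    }

  // Start all users at once (closed model); the throttle only caps the rate.
  setUp(scn.inject(atOnceUsers(users)))
    .protocols(httpProtocol)
    .throttle(reachRps(targetRps).in(rampUp), holdFor(testDuration))
}
```

Note that the throttle is only an upper limit. Whether 20 looping users actually reach the target rate depends on how quickly the server responds.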

Thanks in advance,

Siegfried Goeschl

shareConnections lets Gatling share the connection pool amongst virtual users.
With this setup, when a virtual user isn’t using a connection, another virtual user can use it instead of it sitting idle in the pool.

Obviously, this helps reduce the number of concurrent connections only if connections would otherwise be idle, typically when virtual users pause between requests.
If you don’t have any pause in your scenario, the number of concurrent connections will be equal to the number of concurrent virtual users.
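As a rough back-of-the-envelope check of that relationship, here is a small sketch in plain Scala (no Gatling needed). It assumes a simplified model where every virtual user holds exactly one connection for its whole lifetime. The lifetime and response time used below are hypothetical numbers, not measurements from this thread:

```scala
object ConnectionEstimate {

  // Open model (e.g. constantUsersPerSec): users arrive at a fixed rate and
  // each one holds a connection for as long as it lives.
  // Concurrent users = arrival rate x user lifetime (Little's law).
  def concurrentUsers(arrivalsPerSec: Int, userLifetimeMs: Int): Double =
    arrivalsPerSec * userLifetimeMs / 1000.0

  // Closed model (a fixed set of looping users without pauses): each user
  // completes 1000 / responseTimeMs requests per second, so this is the
  // number of users (and connections) needed to sustain the target rate.
  def usersNeeded(targetRps: Int, responseTimeMs: Int): Int =
    math.ceil(targetRps * responseTimeMs / 1000.0).toInt

  def main(args: Array[String]): Unit = {
    // Hypothetical: 500 new users per second, each alive for about 1 second.
    println(concurrentUsers(500, 1000)) // 500.0
    // Hypothetical: 500 requests per second with a 40 ms response time.
    println(usersNeeded(500, 40))       // 20
  }
}
```

Under these assumptions, injecting 500 new users per second keeps several hundred connections open at once, whereas about 20 looping users would be enough if each request took around 40 ms.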

AFAIK, amongst other tricks, wrk uses HTTP pipelining: it sends multiple requests at the same time on the same socket.
This technique sure is efficient (you can pack multiple requests inside a single TCP segment) but you have to make sure that:

  • your server properly supports pipelining and doesn’t mess up the order of the responses

  • the clients you’re trying to simulate really perform pipelining (browsers don’t), otherwise you’re testing a behavior different from what happens on your real system and are comparing apples and oranges