Max qps testing of an API

Hi,

First I want to thank everyone who’s worked on Gatling. I’m a sysadmin trying to test the maximum QPS an API can handle, and Gatling has been my constant friend for the last few days. That said, I’m running into problems, and it’s hard for me to tell whether the issue is Gatling or the service. I simply can’t get above ~4000 mean requests/sec no matter what I do. Here are my current conf tweaks:

allowPoolingConnections = true

And I’ve been tweaking and tinkering with:

package nroute

import io.gatling.core.Predef._
import io.gatling.http.Predef._
import scala.concurrent.duration._

class BasicSimulation extends Simulation {

  val httpConf = http
    .baseURL("http://10.124.151.134:8280")
    .maxConnectionsPerHost(1000)
    .shareConnections

  val tsvFeeder = tsv("test2.json", rawSplit = true).random
  val time: Int = 120

  val scn = scenario("BasicSimulation")
    .during(time) {
      feed(tsvFeeder)
        .exec(http("Send JSON")
          .post("/mopub")
          .body(StringBody("${record}"))
          .asJSON)
    }

  setUp(
    scn.inject(
      atOnceUsers(50)
    )
  ).protocols(httpConf)

}

My test2.json is a single entry, like:

record
{json}

I’ve also been testing with a test.json that’s a 2G file of entries selected at random, but the results are identical. My questions are:

1/ What is the “best way” to simulate 1, then 50, non-human users posting events as fast as possible. That’s the only real thing I want to test, it simulates our use case best. No browsers will be hitting this. I think in the real world it’s going to open a single connection, using keep-alive, and just blast events in a constant stream as fast as they can be handled.
2/ What is the fastest mean average/sec people have seen, am I hitting some limits of gatling here?

Anything that would help me improve my testing would be VERY much appreciated, as I’ve been bumping my head against this all week trying to work out the best way to test as fast as possible.

I simply can’t get above ~4000 mean requests/sec no matter what I do.

What do your OS and setup look like (hopefully Linux)? How did you tune the TCP stack?

allowPoolingConnections = true

Don’t bother, that’s the default

.maxConnectionsPerHost(1000)

This limit handling comes at a cost, so you should remove it since you don’t actually use it (50 virtual users max)

.shareConnections

So this is indeed the way to simulate user-agent programs that can deal with keep-alive.

.body(StringBody("${record}"))

You can maybe go a bit faster if you convert your JSON payload into UTF-8 bytes at the feeder level (so the conversion happens upstream), saving on encoding during the run. But we’re talking micro-tuning here.
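The idea, illustrated in plain Scala (this is just a sketch with a made-up payload and names, not Gatling’s feeder API): encode the string to UTF-8 bytes once, upstream, and reuse the byte array on every request instead of re-encoding per call.

```scala
import java.nio.charset.StandardCharsets

// Illustration only: pre-encode a JSON payload once, outside the hot path.
// In Gatling this conversion would live in the feeder; the object name and
// the payload below are made up for the example.
object PreEncode {
  val payload: String = """{"event":"click","ts":1234567890}"""

  // Encoded once, upstream; every simulated request reuses these bytes.
  val payloadBytes: Array[Byte] = payload.getBytes(StandardCharsets.UTF_8)

  def main(args: Array[String]): Unit = {
    println(payloadBytes.length) // 33 (ASCII-only payload, 1 byte per char)
    println(new String(payloadBytes, StandardCharsets.UTF_8) == payload) // true
  }
}
```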

1/ What is the “best way” to simulate 1, then 50, non-human users posting events as fast as possible. That’s the only real thing I want to test, it simulates our use case best. No browsers will be hitting this. I think in the real world it’s going to open a single connection, using keep-alive, and just blast events in a constant stream as fast as they can be handled.

share the connection pool like you’re doing.
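Putting the advice together, the trimmed setup under discussion might look like this sketch (same Gatling 2.x DSL, base URL, and file names as the original simulation; not standalone runnable code outside a Gatling project):

```scala
// Sketch: the original simulation with the unused maxConnectionsPerHost
// limit removed; shareConnections keeps a shared keep-alive pool.
class StreamSimulation extends Simulation {

  val httpConf = http
    .baseURL("http://10.124.151.134:8280")
    .shareConnections

  val tsvFeeder = tsv("test2.json", rawSplit = true).random

  val scn = scenario("StreamSimulation")
    .during(120) {
      feed(tsvFeeder)
        .exec(http("Send JSON")
          .post("/mopub")
          .body(StringBody("${record}"))
          .asJSON)
    }

  // atOnceUsers(1) models a single streaming client; rerun with
  // atOnceUsers(50) for the 50-client case.
  setUp(scn.inject(atOnceUsers(1))).protocols(httpConf)
}
```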

2/ What is the fastest mean average/sec people have seen, am I hitting some limits of gatling here?

Depends on many things, one being payload size. My own personal record is ~40k rps with very small payloads (~20 bytes).
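For scale, a quick back-of-envelope using the numbers from this thread shows how little raw bandwidth such small payloads need, so at these rates the bottleneck is per-request overhead rather than the network.

```scala
// Back-of-envelope: payload bandwidth implied by a request rate.
object Bandwidth {
  def payloadMBps(rps: Int, bytesPerReq: Int): Double =
    rps.toLong * bytesPerReq / 1e6

  def main(args: Array[String]): Unit = {
    // ~40k rps with ~20-byte bodies is under 1 MB/s of payload.
    println(payloadMBps(40000, 20)) // 0.8
    // The ~4000 rps plateau from the original post, same payload size:
    println(payloadMBps(4000, 20)) // 0.08
  }
}
```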

Hi,

I simply can’t get above ~4000 mean requests/sec no matter what I do.

What do your OS and setup look like (hopefully Linux)? How did you tune the TCP stack?

This bit is a little ugly, and I’ve been experimenting to see if it hurts me. I have CentOS 7 (untuned, straight out of the box) running Docker, with 4 containers: one is an haproxy container and the other 3 are the application services, the idea being to round-robin between them for better throughput. Each app can handle a configurable number of users, but my understanding is we’re basically going to get a single connection (or a couple) from upstream, and that’s all we get.

allowPoolingConnections = true

Don’t bother, that’s the default

.maxConnectionsPerHost(1000)

This limit handling comes at a cost, so you should remove it since you don’t actually use it (50 virtual users max)

Ah, good to know; this was the bit I was unsure about when experimenting.

[snip]

2/ What is the fastest mean average/sec people have seen, am I hitting some limits of gatling here?

Depends on many things, one being payload size. My own personal record is ~40k rps with very small payloads (~20 bytes).

This was definitely helpful. With the existing tests above, I did another run bypassing all the haproxy stuff, which dragged me up to 8500 qps, which is a little better. At this point I definitely think any slowness is on our end, not Gatling’s. I did some tests with ab as well and got the same ~4500 qps I was getting before.

Good luck, then!

Hi,

Have you tried, by any chance, testing your app without it being “containerized” through Docker?
Its overhead, especially on I/O, is known to be quite small, yet measurable.
It could be interesting to rule out Docker’s influence on your app’s performance :wink:

Cheers,

Pierre

Also, take special care to map heavy-write files directly onto the underlying filesystem, as Docker uses copy-on-write. A typical example is Gatling’s simulation.log file.

I did! I was worried it was going to cause problems, so I stuck to running a single instance on bare metal and did the load testing from the same machine, and got pretty much the same performance. I was definitely worried that Docker was somehow crippling things. I tested it with --net=host as well as without Docker and the performance was pretty much identical. (Which was a relief, as my plan was to dockerize everything we have.)

Hi Stéphane,

Can I ask if the 40K RPS (personal record) was off one Linux box?

Adrian

That was just some simple ping-pong: https://groups.google.com/d/msg/gatling/iF2BfYsqrYg/LWpJUx7HivgJ
Gatling was sitting on my OSX laptop and the server on a Linux desktop. Both with 8 i7 cores.

This was 5 months ago; I suspect I can do better at this game now. :wink: I just reached a 34k rps peak with both Gatling and the server sitting on the same laptop.