Differences in throughput between 1.5.2 and 2.0.0-RC5

Hey All,

Hopefully I’ve just made a stupid error, but I benchmarked Gatling 1.5.2 and 2.0.0-RC5 against the same /ping => pong service and got very different peak throughputs.
The 1.5.2 version peaks at ~23k RPS, while 2.0.0-RC5 peaks at about 7.5k RPS.

Further info:

  1. Done on a MacBook Pro (Java 1.7.0_67), invoked via gatling.sh.
  2. Server on the same box as the load driver.
  3. Both runs were done against a fully warmed-up service.
  4. Yes, we would like to be able to drive more than 20k RPS. We can, and have, done this over the network on a harder workload with Gatling 1.5.2.
  5. I’ve tried tweaking HTTP settings and didn’t manage to get any wins.

The main differences between the two are:

  1. Gatling version
  2. .users(1000) vs .inject(atOnceUsers(1000))

The scripts used on each:

Gatling 1.5.2:

```scala
val httpConf = httpConfig.baseURL("http://localhost:9114")

val scn =
  scenario("Ping Simulation")
    .during(10 seconds)(
      exec(http("ping")
        .get("/ping")
        .check(status.in(Seq(200)))
        .check(bodyString.is("pong\n")))
    )
    .users(1000)
    .protocolConfig(httpConf)

setUp(scn)
```

Gatling 2.0.0-RC5:

```scala
val httpConf = http.baseURL("http://localhost:9114")

val scn = scenario("Ping")
  .during(10 seconds)(
    exec(http("ping")
      .get("/ping")
      .check(status.is(200))
      .check(bodyString.is("pong\n"))))

setUp(scn.inject(atOnceUsers(1000)).protocols(httpConf))
```

What did your gatling.conf file look like in 1.5.2?
Are you sure you weren’t sharing the connection pool amongst virtual users?
Is there any way you can share a reproducer?

Mmmm, I see you’re not setting the Connection header to keep-alive. Is this intended?
I’m starting to wonder if keep-alive was somehow enforced in Gatling 1.

I have some great news!

There was indeed a huge performance regression since Gatling 1 for your use case (it also affected general web-browsing scenarios, in a less obvious way).
The fix was actually very simple (a very stupid, localized mistake): https://github.com/gatling/gatling/issues/2223

On my test, Gatling 2 went from 66% slower to 59% faster than Gatling 1.5!

All of this past year’s hard work has indeed paid off!!! :slight_smile:

Thanks for reporting this.

Cheers,

Stéphane

Just one last thing: my test case is the same as yours: small service replying “pong\n”, Gatling and server on same host, OSX, latest JDK7.
Once Gatling is warm (after 15sec), I get ~33kRPS. :slight_smile:

Awesome! Thanks for digging in. I was trying to grab a snapshot to try out your change, but it looks like the published snapshots are lagging behind the repo’s commits a bit. I’ll follow up early next week if anything goes awry.

Regards,
John

Mistake in our build chain.
Snapshots are back on Sonatype.

Awesome, thanks for fixing that.

I ran the test this morning on my laptop, and peak throughput went from 7.5k RPS with RC5 to 38k RPS on the SNAPSHOT! Thanks again for digging into an issue that could easily have been ignored!

That’s very impressive!

Could you share how you tuned OS X, please?
Is there also any way you could share/explain your ping service? Spray? How is it tuned?

Note that 38k is the peak (i.e. read off the throughput-over-time graph) rather than the mean. The means were in the 27k-29k range.
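A minimal sketch of the peak-versus-mean distinction, assuming you already have per-second request counts (for example, read off the throughput-over-time graph). `ThroughputStats` is a hypothetical helper. The series in `main` is a made-up toy ramp, not data from these runs. Warm-up seconds pull the mean below the peak.

```java
/** Hypothetical helper: peak vs. mean throughput from per-second request counts. */
class ThroughputStats {
    /** Highest single-second count, i.e. the top of the throughput-over-time graph. */
    static long peak(long[] requestsPerSecond) {
        long max = 0;
        for (long r : requestsPerSecond) {
            max = Math.max(max, r);
        }
        return max;
    }

    /** Average over the whole run, so slow warm-up seconds drag it below the peak. */
    static double mean(long[] requestsPerSecond) {
        if (requestsPerSecond.length == 0) {
            return 0.0;
        }
        long total = 0;
        for (long r : requestsPerSecond) {
            total += r;
        }
        return (double) total / requestsPerSecond.length;
    }

    public static void main(String[] args) {
        // Made-up toy series (not measured data): a warm-up ramp, a spike, then a plateau.
        long[] rps = {2000, 6000, 9000, 7000, 6000};
        System.out.println("peak=" + peak(rps) + " mean=" + mean(rps)); // prints peak=9000 mean=6000.0
    }
}
```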

Yeah, here’s my info: MacBook Pro Retina, 2.7 GHz Core i7 (4 cores, 8 virtual cores), 16 GB memory, SSD, OS X 10.9.4.
You’ve seen my simulation. I use the default Gatling config. I don’t think I’ve adjusted much on OS X beyond what you guys suggest for open file descriptors and the like.

The service itself is actually really boring: Jetty 8 plus a plain servlet, not built on any framework, with no SSL. Unfortunately it is tightly bound to our internal service infrastructure, so I can’t usefully extract and post the Jetty part, but here’s the servlet:

```java
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import java.io.IOException;
import java.io.PrintWriter;

/**
 * An HTTP servlet which outputs a {@code text/plain} {@code "pong"} response.
 */
public class PingServlet extends HttpServlet {
    private static final String CONTENT_TYPE = "text/plain";
    private static final String CONTENT = "pong";

    @Override
    protected void doGet(HttpServletRequest req,
                         HttpServletResponse resp) throws ServletException, IOException {
        resp.setStatus(HttpServletResponse.SC_OK);
        // Disable any client/proxy caching so every request actually hits the servlet.
        resp.setHeader("Cache-Control", "must-revalidate,no-cache,no-store");
        resp.setContentType(CONTENT_TYPE);
        final PrintWriter writer = resp.getWriter();
        try {
            // println appends the platform line separator ("\n" on OS X),
            // which is why the simulations check for "pong\n".
            writer.println(CONTENT);
        } finally {
            writer.close();
        }
    }
}
```
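Since the Jetty wiring can’t be shared, the class below is a minimal stand-in that might serve as a reproducer. It is a sketch under several assumptions:

- It uses the JDK’s bundled `com.sun.net.httpserver`, not Jetty 8.
- It binds port 9114 to match the simulations’ `baseURL`.
- It uses a fixed pool of 10 threads as a loose analogue of `maxThreads`.
- The class name `PingServer` is hypothetical.

It avoids lambdas so it also compiles on Java 7.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpHandler;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.Executor;
import java.util.concurrent.Executors;

/** Hypothetical reproducer: answers GET /ping with "pong\n", like PingServlet. */
class PingServer {
    /** Starts the server on {@code port} (0 picks a free port), handling requests on {@code executor}. */
    static HttpServer start(int port, Executor executor) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0); // 0 = default backlog
        server.createContext("/ping", new HttpHandler() {
            @Override
            public void handle(HttpExchange exchange) throws IOException {
                if (!"GET".equals(exchange.getRequestMethod())) {
                    exchange.sendResponseHeaders(405, -1); // -1 = no response body
                    exchange.close();
                    return;
                }
                byte[] body = "pong\n".getBytes(StandardCharsets.UTF_8);
                exchange.getResponseHeaders().set("Content-Type", "text/plain");
                exchange.getResponseHeaders().set("Cache-Control", "must-revalidate,no-cache,no-store");
                exchange.sendResponseHeaders(200, body.length); // known length: sends Content-Length
                try (OutputStream out = exchange.getResponseBody()) {
                    out.write(body);
                }
            }
        });
        server.setExecutor(executor);
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        // Port 9114 matches the simulations; 10 threads loosely mirrors maxThreads (assumption).
        start(9114, Executors.newFixedThreadPool(10));
        System.out.println("Listening on http://localhost:9114/ping");
    }
}
```

The JDK server behaves differently from Jetty under load, so its absolute RPS figures are not comparable. It only reproduces the request/response shape.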

Here are the settings on the embedded Jetty 8 we use (with no real access logging going on):

```scala
val acceptors = Some(1)
val acceptQueueSize = None
val minThreads = Some(3)
val maxThreads = Some(10)
```

If you are using any sort of access or per-request logs, make sure to turn on async flushing / async appenders, or you’ll probably spend a lot of time waiting for disk syncs.
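One common way to get that async behaviour is a queue between the request threads and a single writer thread. This is a hypothetical sketch: `AsyncAccessLog` and its `Sink` interface are illustrative names, not any logging library’s API. Real frameworks ship their own, such as Logback’s `AsyncAppender`. Request threads only call `offer` and never block. The writer drains lines in batches, so the disk sees fewer, larger writes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical async access log: request threads enqueue, one background thread writes. */
class AsyncAccessLog implements AutoCloseable {
    /** Destination for batches of lines, e.g. a buffered file writer (assumption). */
    interface Sink {
        void write(List<String> lines);
    }

    // Sentinel compared by identity, so a real log line "STOP" cannot trigger shutdown.
    private static final String STOP = new String("STOP");

    private final BlockingQueue<String> queue;
    private final AtomicLong dropped = new AtomicLong();
    private final Thread worker;

    AsyncAccessLog(int capacity, final Sink sink) {
        queue = new ArrayBlockingQueue<String>(capacity);
        worker = new Thread(new Runnable() {
            @Override
            public void run() {
                List<String> batch = new ArrayList<String>();
                boolean stop = false;
                while (!stop) {
                    batch.clear();
                    try {
                        batch.add(queue.take()); // wait for at least one line
                    } catch (InterruptedException e) {
                        return;
                    }
                    queue.drainTo(batch); // then grab everything already queued
                    List<String> lines = new ArrayList<String>(batch.size());
                    for (String line : batch) {
                        if (line == STOP) {
                            stop = true;
                        } else {
                            lines.add(line);
                        }
                    }
                    if (!lines.isEmpty()) {
                        sink.write(lines); // the only place that touches the (slow) disk
                    }
                }
            }
        }, "async-access-log");
        worker.setDaemon(true);
        worker.start();
    }

    /** Called on the request path: never blocks, and drops the line if the queue is full. */
    void log(String line) {
        if (!queue.offer(line)) {
            dropped.incrementAndGet();
        }
    }

    long droppedCount() {
        return dropped.get();
    }

    /** Writes out everything enqueued so far, then stops the writer thread. */
    @Override
    public void close() throws InterruptedException {
        queue.put(STOP);
        worker.join();
    }
}
```

Dropping lines when the queue is full is a deliberate choice in this sketch: a slow disk can then never stall request handling. A blocking `put` in `log` would instead push the disk latency back onto the request path.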

We’ve also had some issues with back pressure for async servlets, where the server lets a lot of work into the system and then thrashes on execution contexts. You’ll notice my minThreads and maxThreads above are pretty low, so whatever mechanism Spray or another framework gives you for back pressure, make sure you use it. I tweaked minThreads and maxThreads up to 8/16 and the throughput stayed about the same.
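To illustrate bounded work admission in plain Java, the sketch below uses `java.util.concurrent.ThreadPoolExecutor`, not Jetty’s own thread pool. A bounded queue means excess work is rejected instead of piling up. The 3/10 sizes mirror the settings above. The queue capacity of 100 and the `BoundedPool` name are assumptions for illustration.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: a bounded worker pool that sheds load instead of queueing it without limit. */
class BoundedPool {
    static ThreadPoolExecutor create(int minThreads, int maxThreads, int queueCapacity) {
        return new ThreadPoolExecutor(
                minThreads,                                      // threads kept alive even when idle
                maxThreads,                                      // hard cap on concurrent work
                60, TimeUnit.SECONDS,                            // idle time before extra threads exit
                new ArrayBlockingQueue<Runnable>(queueCapacity), // bounded backlog
                new ThreadPoolExecutor.AbortPolicy());           // throw when saturated
    }

    /** Returns false when the pool is saturated, so the caller can answer e.g. 503 instead. */
    static boolean trySubmit(ThreadPoolExecutor pool, Runnable task) {
        try {
            pool.execute(task);
            return true;
        } catch (RejectedExecutionException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Sizes mirror minThreads=3 / maxThreads=10 above; the queue capacity of 100 is an assumption.
        ThreadPoolExecutor pool = create(3, 10, 100);
        boolean accepted = trySubmit(pool, new Runnable() {
            @Override
            public void run() {
                System.out.println("handled ping");
            }
        });
        System.out.println("accepted=" + accepted);
        pool.shutdown();
    }
}
```

`ThreadPoolExecutor` only grows past its core size once the queue is full. With min 3 / max 10, the extra threads therefore appear only under real overload, and anything beyond the queue plus 10 running tasks is rejected.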

Hi John,

You mention you managed to get around 27k-29k RPS. Do you think this load is sustainable using Gatling on the laptop hardware you described?

Aidy

Does that mean we should use the SNAPSHOT version rather than RC5?

If you can afford to run snapshots, yes.

So we have similar results. :slight_smile:
I also peak at 38k RPS with my Spray app. The mean is about 36k RPS once the JVM is warm (after ~20 s).

There are a lot of problems with the test setup I described. I mostly use it as a smell test to find where certain limits lie under ideal conditions. Things definitely shift once you add the network and the request latency of a real service, and there are a ton of variables that can move these numbers. Your best bet is to use multiple load drivers and service instances and scale throughput that way, rather than relying on a single laptop to get an accurate picture.

Hi Stéphane,

Will this fix also result in a lower CPU usage?

Cheers

Daniel

Absolutely.

OK thanks, back to SNAPSHOT then :slight_smile:

Don’t bother, RC6 will be out within the hour, tops.