Gatling Max throughput and JVM tuning

Hi all,

I was wondering what’s the highest sustained request rate (requests/s) people can get out of a single Gatling instance.

Currently I can get a solid 5K requests/s using a simple load model with only basic JVM heap-size tuning (GC policy and settings are the defaults), running on Ubuntu 14.04 LTS
on an x86_64 machine with 4 CPUs and 8 GB of RAM, using the HotSpot JVM (Java 1.7.0_80) with -Xms4g and -Xmx7g. I also applied the recommended OS tuning
from the Gatling docs (e.g. increasing open file descriptors, ephemeral ports, etc.).
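Concretely, the OS-level tuning I’m referring to amounts to settings along these lines (the values here are illustrative; the exact numbers come from the Gatling docs and my setup may differ):

```
# /etc/security/limits.conf - raise the open-file limit for the load-generator user
*    soft    nofile    300000
*    hard    nofile    300000

# /etc/sysctl.conf - widen the ephemeral port range, recycle sockets faster
net.ipv4.ip_local_port_range = 1024 65535
net.ipv4.tcp_tw_reuse = 1
net.core.somaxconn = 4096
```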

This is pretty much the scenario I’m running:

```scala
import io.gatling.core.Predef._
import io.gatling.http.Predef._
import scala.concurrent.duration._

val scn = scenario(SCENARIO_NAME)
    .group("PAGE")(
      exec(
        http("NAME")
          .httpRequest("GET", HTTP_URL)
          .check(status.is(200))))

  setUp(scn.inject(
    rampUsersPerSec(1.0).to(5000.0)
      .during(30 seconds),
    constantUsersPerSec(5000.0).during(120 seconds),
    rampUsersPerSec(5000.0).to(1.0)
      .during(30 seconds)
  ).protocols(httpProtocol))

```

The GET request is against a simple server written in Go, which returns {"message": "Hello World"} as the response.

Here’s the data from the report: [report screenshot]

Now, when I ramp up the same scenario to about 7,500 req/s, it starts out well, but then I see some big hiccups in throughput (most likely GC).

I’m looking to share experiences, best practices, and/or tunings for the JVM GC, to get the most out of a single Gatling instance. Any suggestions?

Hi Carlos,

> most likely GC

Don’t shoot in the dark: enable GC logging and check it.
If that’s indeed the case, tune based on what you see in the logs (it could be that you’re promoting too fast, in which case you might have to tune the young generation size or the survivor ratio).
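For reference, on a JDK 7/8 HotSpot JVM that means flags along these lines (the sizes are illustrative examples, not recommendations):

```
# Enable GC logging first
-verbose:gc
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-XX:+PrintTenuringDistribution    # object ages in survivor spaces, good for spotting premature promotion
-Xloggc:gc.log

# Then, only if the logs confirm premature promotion, knobs like:
-Xmn2g                       # explicit young generation size
-XX:SurvivorRatio=6          # eden-to-survivor-space sizing
-XX:MaxTenuringThreshold=8   # how many collections before promotion
```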
The Friends of jClarity group is definitely the place for all things GC, but don’t go ask there without putting in some effort first.

You can also try upgrading to the latest JDK 1.8.0 and switching from CMS to G1.
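Switching collectors is just a couple of flags (the pause target below is an illustrative value, not a recommendation):

```
# JDK 8: use G1 instead of the default/CMS collector
-XX:+UseG1GC
-XX:MaxGCPauseMillis=200    # soft pause-time goal, not a guarantee
```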

Please send feedback once you have results, I’m sure everyone would be interested.

Cheers,


Hi Carlos,

If you can push the Go app to GitHub, I will also run some tests on cloud instances, including JDK 8 with G1. A table of how you intend to present your results would also be great.

Aidy

True, I know I’m kind of shooting in the dark by trying to infer causes just from the throughput pattern, but it’s a hypothesis I have to confirm or deny. :slight_smile:

I decided to come to the Gatling community first, to make sure I’ve at least followed the current recommended best practices before I invest a lot of time in tuning and profiling.

Thanks for the jClarity link, I wasn’t aware such a group existed.

How about we start a Google Docs spreadsheet to document this effort and the different tunings?

We should probably also come up with different scenarios, not just the simple open-model GET requests.

I uploaded the simulation I’m using to GitHub: https://github.com/meteorfox/gatling-benchmarking
Here’s the server I’m using for testing: https://github.com/meteorfox/HelloGo

Here’s the spreadsheet: https://docs.google.com/spreadsheets/d/1Cnds2jz_EYXvuTJAaJzoS8i4hqrY2TukLbHwmHwinzI/edit?usp=sharing

> True, while I know I'm kind of shooting in the dark by trying to infer causes just on the pattern of throughput, it's more of a hypothesis for me which I have to confirm or deny. :slight_smile:

Then it does look like GC pauses :slight_smile:
But as with all GC tuning, there's no silver bullet.
As your virtual users have a very short lifespan, I really suspect premature promotion (i.e. the young generation being too small).

> I decided to go to first to the Gatling community to make sure I've done at least the current recommended best practices first, before I invest a lot of time in tuning and profiling.

We currently target JDK 7, so we can't default to G1 (it seems you need at least 1.8.0_40 to get acceptable results).
We'll probably switch to G1 when we drop JDK 7 support, probably in Gatling 2.3.
There's a chance that G1 becomes the default in JDK 9.

> Thanks for jClarity link, I wasn't aware such group existed.

Kirk Pepperdine, Charlie Hunt, Gil Tene, Martijn Verburg...
I also recommend the Mechanical Sympathy group.

Oh, and if you really want to have fun, you can even use Zing on AWS: https://aws.amazon.com/marketplace/pp/B00M25ZRMC/ref=srh_res_product_title?ie=UTF8&sr=0-3&qid=1430495874599

At least I knew about Mechanical Sympathy, Martin Thompson, and Gil Tene. Plus I own Charlie Hunt’s Java Performance book (great book). :slight_smile:

About Java 7: as of today, Java 7 is EOL (public updates have ended). http://www.oracle.com/technetwork/java/javase/eol-135779.html#Java6-end-public-updates

> About Java 7, since Today, Java 7 is already EOL (end-of-life).
> http://www.oracle.com/technetwork/java/javase/eol-135779.html#Java6-end-public-updates

I know, but tons of people out there don't upgrade. We'll drop JDK 7 support
in about 6 months, I guess.

I’m already over 15,000 req/s, and waiting on the 30,000 req/s run.

https://docs.google.com/spreadsheets/d/1Cnds2jz_EYXvuTJAaJzoS8i4hqrY2TukLbHwmHwinzI/edit#gid=391094691

So far it’s been working great. I’m seeing some jitter at the higher request rates, but it still looks pretty solid to me.

My current hypothesis is that the big hiccups I was seeing earlier were caused by slower response times: they made active users and pending requests accumulate very quickly, putting a lot of pressure on the GC.
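A quick back-of-the-envelope with Little’s law (concurrency = arrival rate × time in system) shows why; the response times below are made-up examples:

```
concurrency = arrival rate × response time            (Little's law)

5,000 users/s ×  20 ms responses = 5,000 × 0.020 =   100 concurrent users
5,000 users/s × 500 ms responses = 5,000 × 0.500 = 2,500 concurrent users
```

So even a modest slowdown multiplies the number of live users (and their session state) the JVM has to hold, which in turn feeds back into GC pressure.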

This is a different environment from the one where I ran the test above, with better networking and CPUs.

The results for 30K req/s are in, and this run shows signs of hiccups in the request rate. Interestingly, the hiccups are correlated with the large response times, which might add evidence to my hypothesis above (accumulating pending requests/users). At this point I should probably start looking at the GC logs.

By the way, for those interested: I configured my server to listen on all network interfaces. This increases the number of usable ephemeral ports, since the ~64K limit applies per (source IP, destination IP) pair rather than globally. In my case the server has a public and a private network, each with a different IP, so I configured Gatling to load balance across these 2 IP addresses (using baseURLs()).
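In the protocol configuration that looks roughly like this (Gatling 2.x; the IPs and port are placeholders):

```scala
// Round-robins connections across both server IPs, doubling the
// usable pool of ephemeral (source port, destination IP) tuples.
val httpProtocol = http
  .baseURLs("http://10.0.0.5:8000", "http://192.0.2.10:8000")
```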

Also, I raised the system-wide file descriptor limit to 300,000 rather than 65,535. You can have far more file descriptors than 64K; the limit depends on available memory, not just on the number of connections.