Problems running Gatling with Amazon ELB

Hi Nadine,

I run my Gatling tests on EC2, and for real-time console metrics I’ve replaced the awk script with a Python script.

https://github.com/BBC/gatling-load-tests#real-time-metrics-optional

Aidy

Nice. My Python skills are a little rusty, but it would be a welcome diversion to see what you’ve done. I glanced through a few of your Simulation classes and saw that you have set up many of the scenarios with one request per user.

Have you been able to hit your target of 750 RPS with this kind of setup?

I’m in a similar kind of situation and am currently experimenting with adding more requests per user to see how well it runs.
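
For illustration, a one-request-per-user setup driven at roughly 750 RPS might look like the sketch below, written against the Gatling 2-era Scala DSL (the base URL, path, and durations are placeholders, not anyone’s actual project; newer Gatling versions spell baseURL as baseUrl):

    import io.gatling.core.Predef._
    import io.gatling.http.Predef._
    import scala.concurrent.duration._

    // Minimal sketch of a one-request-per-user scenario driven at ~750 RPS.
    // Each virtual user performs exactly one request and then terminates,
    // so the request rate is set entirely by the injection profile.
    class SingleRequestPerUserSimulation extends Simulation {

      val httpProtocol = http.baseURL("http://my-service.example.com") // placeholder

      val scn = scenario("One request per user")
        .exec(http("get resource").get("/resource"))

      setUp(
        scn.inject(constantUsersPerSec(750) during (5.minutes))
      ).protocols(httpProtocol)
    }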

@Marius:
Were you ever able to get your script to run at 1000 RPS?

If so, what was your final resolution for this problem?

Hi Nadine,

The only way I found was running each user in a loop. Having one request per user did not work with ELB.
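
As a rough sketch of what Marius describes (base URL, user count, and pacing are made up for illustration): each virtual user is started once and keeps issuing requests over its pooled keep-alive connection, instead of a fresh user and connection per request.

    import io.gatling.core.Predef._
    import io.gatling.http.Predef._
    import scala.concurrent.duration._

    // Sketch of the loop-per-user approach: a fixed pool of users is started
    // once, and each user keeps issuing requests over its pooled keep-alive
    // connection for the duration of the test.
    class LoopingUsersSimulation extends Simulation {

      val httpProtocol = http.baseURL("http://my-service.example.com") // placeholder

      val scn = scenario("Looping users")
        .during(10.minutes) {
          exec(http("get resource").get("/resource"))
            .pause(50.milliseconds) // rough pacing; tune to hit the target RPS
        }

      setUp(
        scn.inject(atOnceUsers(100)) // 100 users looping for the whole test
      ).protocols(httpProtocol)
    }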

Hi Marius,

I’d love to hear how you deal in your application with what look to me like very serious shortcomings in ELB.
I mean that having the ELB frequently change IPs:

  • breaks DNS caching, so clients have to disable it, which adds overhead (all the more since most DNS resolution implementations, such as the standard Java one, are blocking); a JVM-level sketch follows below

  • breaks connection pooling, so clients have to disable keep-alive

Cheers,

Stéphane
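
On the DNS point: the standard JVM knob for this is the networkaddress.cache.ttl security property. A minimal sketch of relaxing client-side DNS caching so newly rotated ELB IPs get picked up (the values are illustrative, and this is plain JVM configuration, not Gatling-specific):

    import java.security.Security

    // Lower the JVM-wide DNS cache TTL so successful lookups of the ELB's
    // CNAME are re-resolved frequently and newly rotated ELB IPs get picked up.
    // "0" disables positive caching entirely; a small value such as "60" is a
    // common compromise. This must run before the first name lookup.
    object DnsCacheSettings {
      def relaxDnsCaching(ttlSeconds: String = "60"): Unit = {
        Security.setProperty("networkaddress.cache.ttl", ttlSeconds)
        // Optionally also cap caching of failed lookups.
        Security.setProperty("networkaddress.cache.negative.ttl", "10")
      }
    }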

Hi Marius,
I have an update.

This week I made a change in my scripts that dramatically increased the traffic I can send to my service that sits behind an ELB.
I don’t have any special settings in my gatling.conf file, nothing has been configured in the Service Under Test, and my scenarios are designed to send 1 request per user.

The only thing I changed was to add a line to my httpProtocol definition — .shareConnection.
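
For readers following along, the protocol change looks roughly like the sketch below (base URL and injection rate are placeholders; note that in current Gatling versions the option is spelled shareConnections, plural):

    import io.gatling.core.Predef._
    import io.gatling.http.Predef._
    import scala.concurrent.duration._

    // Sketch of the change described above: with a shared connection pool,
    // single-request virtual users reuse warm keep-alive connections to the
    // ELB instead of each opening (and closing) their own.
    class SharedConnectionsSimulation extends Simulation {

      val httpProtocol = http
        .baseURL("http://my-service-behind-elb.example.com") // placeholder
        .shareConnections

      val scn = scenario("One request per user, shared pool")
        .exec(http("get resource").get("/resource"))

      setUp(
        scn.inject(constantUsersPerSec(1000) during (5.minutes))
      ).protocols(httpProtocol)
    }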

After doing this, I was able to routinely send 1000+ requests/sec to the service without any HTTP errors whatsoever.
I’ve found that around 1200 requests/sec I might start to see some HTTP 500 or 503/504 errors, but there have not been any errors that indicate network card saturation on the client machine running the scripts (which for me is Jenkins).

The workload contains a few requests that use a .repeat or .pause construct, but the majority do not.
It’s basically lots of single-request users.
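
A hypothetical shape for those few multi-request users, again with placeholder names and timings, could be:

    import io.gatling.core.Predef._
    import io.gatling.http.Predef._
    import scala.concurrent.duration._

    // Illustrative shape of the few multi-request users mentioned above:
    // each such user repeats one request a handful of times with a short
    // pause, while the bulk of the workload stays single-request.
    object MultiRequestUsers {
      val multiRequestUser = scenario("Occasional multi-request user")
        .repeat(5) {
          exec(http("get resource").get("/resource"))
            .pause(1.second)
        }
    }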