What is the Max Load for Gatling?

Hi,

I am trying to simulate a scenario like this:

Max Load: 3,000,000 RPM (requests per minute)

Scaling out:
I used 12 machines; each machine throws 250,000 RPM.

Script:

setUp(
  IpAddress.inject(rampUsers(250000) over (60 seconds))
).protocols(httpConf)

I am following this doc:
http://gatling.io/docs/2.1.7/general/simulation_setup.html

But I saw a couple of errors with Gatling:
"java.net.ConnectException: Cannot assign requested address"

We reviewed the logs of our application with Splunk and our services don't show any problem. The error was caused by Gatling, and the requests never reached our server.

I think that Gatling does not support that load, or maybe I need to use another injection model.

I wonder if someone has tested with this load and was able to finish the test OK?

Search the documentation for OS tuning, and then plan on having a bigger Gatling cluster. The problem has to do with local TCP/IP ports: when generating load at that kind of volume, ports become the scarce resource.
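To put rough numbers on it (these are typical Linux defaults, so treat them as assumptions rather than values from your machines): a closed outgoing socket holds its local port for roughly a minute in TIME_WAIT, so one load generator can only churn through so many new connections per second. A back-of-the-envelope sketch:

object PortBudget extends App {
  // Assumptions: default ip_local_port_range (~28k usable ports) and ~60 s
  // before a closed socket's port becomes reusable (TIME_WAIT).
  val ephemeralPorts = 60999 - 32768
  val portHoldSeconds = 60
  val newConnectionsPerSecond = ephemeralPorts / portHoldSeconds
  println(s"~$newConnectionsPerSecond new connections/s per source IP and destination")
  // ~470 connections/s, far below the ~4,166 users/s that
  // rampUsers(250000) over (60 seconds) asks of each machine.
}

Tuning the port range and the timeouts buys you some headroom, but more injector machines are the safer bet.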

Hi,

Before running my test I changed the ulimit:

I checked that each machine had:

$ ulimit -n
65535

I did that with the following steps:

Step 1: open /etc/sysctl.conf (vi /etc/sysctl.conf), add fs.file-max = 65536 at the end of the file, then save and exit.

Step 2: open /etc/security/limits.conf (vi /etc/security/limits.conf) and add the lines below:

* soft nproc 65535
* hard nproc 65535
* soft nofile 65535
* hard nofile 65535

I read this http://gatling.io/docs/2.0.0-RC2/general/operations.html#os-tuning

So I need to add this

Mac OS/X
On Mac you need to run the following commands in order to unbuckle the belts:
$ sudo sysctl -w kern.maxfilesperproc=300000
$ sudo sysctl -w kern.maxfiles=300000
$ sudo sysctl -w net.inet.ip.portrange.first=1024
You may also increase your ephemeral port range or tune your TCP timeout so that they expire faster.

I will try to run my test again with these changes.

Do you think I need to add any other configuration?

To generate this much load, you should use multiple machines. Even if you can open up enough ports, network bandwidth could skew your results.

We are scaling out the test: we are using 12 machines and each machine will throw 250,000 requests.

Do you recommend using more machines and lowering the load per machine?

I have not worked with injection models this huge, but with my limited exposure to high load, IMO your injection model is a little aggressive.

Based on your injection model, you are injecting about 4,166 users per second. Instead, you can ease up the ramp-up time and create a few rendezvous points to capture your response times that way. You may have to experiment a bit here. You could start off with 250,000 users over 5 minutes and see if you still get the Java connection errors.
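In the Gatling DSL that would be something like the sketch below, reusing the IpAddress scenario and httpConf from your script (the numbers are only a starting point to experiment with):

setUp(
  // roughly 833 users/s per machine instead of ~4,166
  IpAddress.inject(rampUsers(250000) over (300 seconds))
).protocols(httpConf)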

If you are starved for sockets, you should look into the ephemeral port range as well as the TIME_WAIT settings (on Linux that would be tcp_fin_timeout and tcp_tw_recycle, specifically).

On the servers receiving traffic it may also be a good idea to tune those settings, to fit your network infrastructure and stack setup. The defaults are quite conservative.

I tested again with 10 machines. This should throw 500K RPM during 5 minutes → 2,500,000 requests.

setUp(
  test1.inject(constantUsersPerSec(833) during (500 seconds))
).protocols(httpConf)

and I got the same error. :(

Also I followed the instructions from http://gatling.io/docs/2.0.0-RC2/general/operations.html#os-tuning

This is my configuration:

ulimit -n → 65535
/proc/sys/net/ipv4/tcp_fin_timeout → 30
/proc/sys/net/ipv4/ip_local_port_range → 32768 65535

I also changed:

/etc/pam.d/sshd: I added "session required pam_limits.so"
/etc/ssh/sshd_config: I set "UseLogin yes"

What am I doing wrong?

You’re trying to open too many concurrent connections, or trying to open them at a rate that’s too fast so the OS doesn’t have time to recycle ports.

That’s a limit on the OS and hardware, and even tuning has some limits.

Then, have you properly considered your injection profile? Do you really need that many different users, resulting in that many different connections (as each user has its own connections)?

We need to simulate 500K rpm during 5 minutes, which produces 2,500,000 requests (500,000 rpm ÷ 60 ≈ 8,333 requests per second in total, i.e. about 833 per second on each of our 10 machines).

As you can see in this thread, we tried:

  1. test1.inject(rampUsers(8333) over (300 seconds))

  2. test1.inject(constantUsersPerSec(833) during (500 seconds))

I saw that there is a throttling option:
http://gatling.io/docs/2.0.0-RC2/general/simulation_setup.html?highlight=inject#throttling

Do you think that this option could work?

setUp(
  test1.inject(atOnceUsers(200))
).throttle(
  reachRps(833) in (10 seconds),
  holdFor(5 minutes)
)

Thanks in advance.
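PS: from my reading of the throttling docs, throttle only caps the request rate, so those 200 users would also need to keep sending requests for the whole 5 minutes. This is the complete simulation I have in mind; the baseURL and the request are placeholders for our real ones:

import io.gatling.core.Predef._
import io.gatling.http.Predef._
import scala.concurrent.duration._

class ThrottledSimulation extends Simulation {

  val httpConf = http.baseURL("http://our-service.example.com") // placeholder

  // Users loop forever so the throttle always has requests to shape;
  // maxDuration is what eventually stops the run.
  val test1 = scenario("Throttled test").forever(
    exec(http("my request").get("/")) // placeholder request
  )

  setUp(
    test1.inject(atOnceUsers(200))
  ).throttle(
    reachRps(833) in (10 seconds), // per injector; 10 machines ≈ 500K rpm overall
    holdFor(5 minutes)
  ).maxDuration(6 minutes)
   .protocols(httpConf)
}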

“We need to simulate 500K rpm during 5 min.”

No, this is not a proper requirement. It’s incomplete.
A load profile is a function of 2 parameters:

  • the rpm you throw at it
  • the number of connections you open (and close, and open again)

This second parameter is very important. Getting 500K rpm while opening and closing a new connection on every request, and getting it with a shared connection pool, are very different things.

Some tools only let you do the latter, which, of course, gives better results, as it puts far less stress on both the system under test and the load test tool. BUT this is very unrealistic behavior in most cases. If you want to simulate web browsers, each of them would have its own connection pool that it doesn't share with the others.

So, back to my original question: in the real world, how do user-agents (browsers, remote applications) connect to your system?

  • a swarm of web browsers, each of them sending only one request and then never connecting again? In that case, you want many virtual users and a scenario with one single request.
  • a swarm of web browsers, each of them sending many requests? In that case, you want many virtual users and a scenario with many requests, and maybe loops.
  • a few client applications that support keep-alive? In that case, you want just a few virtual users running concurrent requests and sharing the connection pool (see the Gatling HTTP protocol configuration, and the sketch below).

If you're in one of the first 2 cases and you still can't open enough connections, I'm afraid you'll still need more servers. It's really a matter of OS and hardware, not a limit of the load test tool.

If you share the connection pool, you'll probably be able to get the rpm you want, but you'll probably be testing in unrealistic conditions.
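To make the difference concrete, here is a sketch of the two set-ups in Gatling 2 syntax (the baseURL is a placeholder, and test1 stands for a scenario that loops over your requests):

// Default behavior: each virtual user opens its own connections,
// like a swarm of independent browsers.
val perUserConf = http.baseURL("http://your-service.example.com")

// Shared pool: all virtual users reuse the same keep-alive connections,
// like a handful of long-lived client applications.
val sharedConf = http
  .baseURL("http://your-service.example.com")
  .shareConnections

setUp(
  // just a few virtual users, each looping over a scenario with many requests
  test1.inject(atOnceUsers(50))
).protocols(sharedConf)

Sharing the pool is much easier on ports, but only realistic if your real clients actually behave that way.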