Hi,
I work for a major media company, and we have applications that handle a high rate of requests. There are numerous back-end services, and I am currently load testing one of them.
I have been asked to test a service at up to 750 requests per second (rps), and my plan was to ramp up to that rate:
rampUsersPerSec(10) to(750)
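For context, this is roughly how the injection is wired into the simulation (Gatling 2.x style syntax; the 10-minute ramp duration and the endpoint below are placeholders, not the real values):

import io.gatling.core.Predef._
import io.gatling.http.Predef._
import scala.concurrent.duration._

class BackendLoadTest extends Simulation {

  // Placeholder protocol config; the real service URL is different
  val httpConf = http.baseURL("http://my-service:8080")

  // Single representative request; the real scenario has more steps
  val scn = scenario("backend service")
    .exec(http("request").get("/resource"))

  setUp(
    scn.inject(rampUsersPerSec(10) to 750 during (10 minutes))
  ).protocols(httpConf)
}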
However, I am told I am running out of ephemeral ports, even after following the Gatling operations page and increasing the available port range:
# more ports for testing
sudo sysctl -w net.ipv4.ip_local_port_range="1025 65535"
I am now not sure how best to continue with this.
Many Thanks
Aidy
Out of curiosity, do you have keep-alive turned on in the requests? And before the scenario dies, how many requests had you successfully completed?
Hi John,
Keep-alive is not explicitly specified, but looking through the requests in debug mode, it seems to be on by default. I probably got a few hundred requests per second without caching, but it is difficult to tell because those metrics are not yet presented in the summary.
Aidy
So, any chance you had 64,000 (ish) open connections, and ran out of port numbers? How long had your scenario been running, and how many successful connections had it made when things blew up?
If the number is well below 64,000 connections in a relatively short period, then I would suspect the steps taken to “unleash” the OS have not taken effect, and I would research starting there.
We established in the other thread:
25719 TIME_WAIT
So, based on that alone, the limit isn't breached.
Short answer: consider reducing the tcp_fin_timeout Linux setting, at least to test whether that helps; there are some risks with that method, though.
Long answer:
In the defect, Stéphane was sense-checking that your workload model was OK.
I think the first question is: what is the workload of this service, in plain English?
We know it is several hundred rps, but currently nothing about the clients of the service, for example whether each of them makes several requests or just one. This will affect the connection throughput.
The TIME_WAIT sockets are just a result of the high connection throughput. Normally, in the live system, this would not present itself as a problem, because all those clients would likely be on separate machines.
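To make that concrete, here is a rough sketch of the two client shapes (same Gatling imports as the earlier sketch; the endpoint, pauses and repeat count are made up). A one-shot client opens a new connection per request, while a client that makes several requests reuses its keep-alive connection, so the connection rate is only a fraction of the request rate:

// One request per user: every virtual user opens (and then closes) its own connection
val oneShotClient = scenario("one request per client")
  .exec(http("single").get("/resource"))

// Several requests per user: the follow-up requests reuse the keep-alive connection
val sessionClient = scenario("several requests per client")
  .exec(http("first").get("/resource"))
  .pause(1) // 1 second think time
  .repeat(4) {
    exec(http("follow-up").get("/resource")).pause(1)
  }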
If the workload is well defined and the scripts reflect it, and there is still an issue with ports, then look into running two or more load injectors, for example.
If you follow my suggestion on graphite/netcat/awk, then multiple injectors can also easily be reported on in real time.
Alex wrote,
Short answer: consider reducing the tcp_fin_timeout linux setting
I have reduced tcp_fin_timeout from 60 to 30 ('echo 30 | sudo tee /proc/sys/net/ipv4/tcp_fin_timeout'), which has been successful in this case.
You are correct again, however, in that I also need to model my simulation better to reflect real-world activity (currently it just gradually ramps up).
I have also seen the new graphite documentation and that will be the next thing I look into.
Many Thanks
Aidy
Aidy, Alex: I did my best to explain open and close models and how they impact connections: https://github.com/gatling/gatling/issues/2270
I hope things will get clearer with that.
Alex, Stéphane,
To implement a closed model load and still achieve X requests per second, could I not loop on atOnceUsers() and increment the integer that is passed into it? Would this be any different from sharing the connection pool?
Thanks
Aidy
To implement a closed model load and still achieve X requests per second,
could I not loop on atOnceUsers() and increment the integer that is passed
into it?
WDYM? You can use rampUsers.
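To illustrate (the numbers are arbitrary, and the imports are the same as in the earlier sketch): a closed model is a fixed population of users that loop, so the request rate falls out of the population size and the pauses, not out of an injection rate.

// Closed model sketch: a fixed pool of users, each looping forever
val scn = scenario("closed model")
  .forever {
    exec(http("request").get("/resource"))
      .pause(1) // think time; drives the per-user request rate
  }

setUp(
  scn.inject(rampUsers(500) over (60 seconds)) // ramp the population in, then it stays constant
).protocols(httpConf)
  .maxDuration(10 minutes)                     // forever loops need an explicit stop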
Would this be any different from sharing the connection pool?
Yes, if you have pauses.
By default, users have their own connections, so those are idle during
pauses.
If you share the connection pool, connections are usually never idle, and you end up using fewer connections to get the same rps.
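In Gatling 2 that is just a protocol option, something like this (the baseURL is a placeholder):

// Sharing the connection pool across virtual users
val httpConf = http
  .baseURL("http://my-service:8080") // placeholder
  .shareConnections                  // one shared pool instead of one pool per virtual user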