keep-alive and large number of users

Martin_Kovacik · February 19, 2015, 10:05am

Hello,

I’m testing a very simple scenario using Gatling 2.1.4 with 3000 users (with 20-second pauses). When keep-alive is enabled on both the Gatling side and the server side (15-second timeout on the server side), I’m getting timeouts, bad response times and, I think, generally inaccurate results.

In exactly the same situation, with 3000 users but keep-alive disabled on either the Gatling side or the server side, the response times are 10x better and there are no timeouts.

I suspect the problem is the connection pooling used for keep-alive connections. Is it possible to change the size of the connection pool?

Martin

slandelle · February 19, 2015, 10:48am

Not a Gatling issue: your system under test can’t deal with 3,000 concurrent open connections. Gatling doesn’t have such limitations (it’s non-blocking, except for DNS resolution, which relies on the standard JDK implementation and is blocking), as long as the OS is fine (typically file descriptor limits).
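
As a rough illustration of the file descriptor point above, this Python sketch compares the per-process open-file limit with the number of sockets a test would need. The fd_headroom helper and the reserve of 256 descriptors are assumptions made for this example, not something Gatling provides; resource.getrlimit is the standard library call on Unix-like systems.

```python
import resource


def fd_headroom(required_connections, soft_limit, reserve=256):
    """Descriptors left over after opening `required_connections` sockets.

    `reserve` is an arbitrary allowance (an assumption) for log files,
    jars, pipes and other descriptors the process already holds.
    """
    return soft_limit - reserve - required_connections


# Soft and hard limits on open files for the current process (Unix only).
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)

if soft == resource.RLIM_INFINITY:
    print("no soft limit on open files")
else:
    left = fd_headroom(3000, soft)
    status = "enough" if left >= 0 else "NOT enough"
    print(f"soft limit {soft}: {status} for 3000 connections ({left} spare)")
```

A negative result means the limit would have to be raised (for example with ulimit -n) before 3000 connections can be opened from that process. The same check applies on the server side.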

Martin_Kovacik · February 19, 2015, 7:05pm

I think my system is fine: I have no problem opening 3000 connections (and keeping them open). For example, this is wrk using 3000 keep-alive connections. netstat -tnp shows 3000+ open connections, which is 2000 more than with Gatling running 3000 users with 20s pauses.

wrk -c3000 -t100 -d 60 --latency http://192.168.1.51
Running 1m test @ http://192.168.1.51
  100 threads and 3000 connections
  Thread Stats   Avg        Stdev     Max        +/- Stdev
    Latency      541.67ms   40.16ms   708.17ms   91.16%
    Req/Sec      55.73      22.98     169.00     75.10%
  Latency Distribution
    50%  533.97ms
    75%  546.50ms
    90%  560.47ms
    99%  700.18ms
  224821 requests in 1.00m, 1.57GB read
  Socket errors: connect 0, read 0, write 0, timeout 29544
Requests/sec: 3745.35
Transfer/sec: 26.70MB

Also, the problem gets even worse with longer server-side keep-alive timeouts.
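
The wrk figures above can be cross-checked with a little arithmetic. This is only a sketch using the numbers printed in the output; the helper names are made up for illustration, and how wrk counts a "timeout" is not described in this thread, so the ratio is indicative at best.

```python
# Figures copied from the wrk output above.
REQUESTS_TOTAL = 224821
TIMEOUTS = 29544
REPORTED_RPS = 3745.35


def requests_per_timeout(requests, timeouts):
    """Completed requests for each timeout event that wrk reported."""
    return requests / timeouts


def implied_duration_s(requests, rps):
    """Run duration implied by the request count and the reported rate."""
    return requests / rps


ratio = requests_per_timeout(REQUESTS_TOTAL, TIMEOUTS)
duration = implied_duration_s(REQUESTS_TOTAL, REPORTED_RPS)
print(f"about one timeout per {ratio:.1f} completed requests")
print(f"implied run duration: {duration:.1f}s")
```

The run reports no connect, read or write errors, but the timeout counter is not zero, so it is worth looking at before concluding that the two tools see the same server behaviour.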

slandelle · February 19, 2015, 7:25pm

From what I see, the usage is a bit different: you ask wrk to keep 3000 connections open even when you don’t use them, so it probably reconnects in the background (so basically, your keep-alive timeout doesn’t matter much). With Gatling, you have 3000 users that might need to reconnect when trying to perform a request after a pause, and the response time accounts for that reconnection. I also wonder whether wrk reports how often it had to retry connecting because some attempts failed. Just wondering, I’m not familiar enough with wrk internals.

If you can share a reproducer, I could investigate.
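
The reconnection reasoning above can be sketched as a back-of-the-envelope model. It assumes each user sends one request per pause cycle, that response time is negligible next to the pause, and that the server closes an idle connection exactly at its keep-alive timeout. The function names are hypothetical, and this is a simplification, not a description of how Gatling manages its pool.

```python
def connection_survives_pause(pause_s, server_keepalive_s):
    """True if an idle keep-alive connection is still open when the pause ends."""
    return pause_s < server_keepalive_s


def reconnects_per_second(users, pause_s, server_keepalive_s):
    """Rough steady-state rate of new TCP connections under the assumptions above."""
    if connection_survives_pause(pause_s, server_keepalive_s):
        return 0.0  # every request can reuse its pooled connection
    return users / pause_s  # every request after a pause opens a new connection


# Numbers from this thread: 3000 users, 20s pauses, 15s server keep-alive.
print(connection_survives_pause(20, 15))    # False
print(reconnects_per_second(3000, 20, 15))  # 150.0
```

With a 20s pause and a 15s server timeout, the server has already closed the connection by the time each user comes back, so under these assumptions every request pays for a new connection, and that cost is included in the measured response time.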
