Hi all,
I have followed all the advice at http://gatling.io/docs/current/general/operations/ and can confirm that my sysctl values match the ones listed there:
net.ipv4.tcp_max_syn_backlog = 40000
net.core.somaxconn = 40000
net.core.wmem_default = 8388608
net.core.rmem_default = 8388608
net.ipv4.tcp_sack = 1
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_fin_timeout = 15
net.ipv4.tcp_keepalive_intvl = 30
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_moderate_rcvbuf = 1
net.core.rmem_max = 134217728
net.core.wmem_max = 134217728
net.ipv4.tcp_mem = 134217728 134217728 134217728
net.ipv4.tcp_rmem = 4096 277750 134217728
net.ipv4.tcp_wmem = 4096 277750 134217728
net.core.netdev_max_backlog = 300000
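For anyone wanting to double-check values like these, here is a minimal Python sketch (an illustration, not something from the Gatling docs) that reads them straight from /proc/sys and reports mismatches. The `EXPECTED` dict is a hand-copied subset of the list above, and the function names are hypothetical. Since the external services run in Docker containers, note that many net.* settings are per network namespace, so values on the host and inside a container can differ.

```python
from pathlib import Path

# Hypothetical subset of the values listed above; add the rest as needed.
EXPECTED = {
    "net.ipv4.tcp_max_syn_backlog": "40000",
    "net.core.somaxconn": "40000",
    "net.ipv4.tcp_tw_reuse": "1",
    "net.ipv4.tcp_rmem": "4096 277750 134217728",
    "net.core.netdev_max_backlog": "300000",
}


def sysctl_path(key: str) -> Path:
    """Map a dotted sysctl key such as net.core.somaxconn to its file under /proc/sys."""
    return Path("/proc/sys") / key.replace(".", "/")


def read_sysctl(key: str) -> str:
    # Multi-value entries (tcp_rmem, tcp_mem, ...) are tab-separated; normalise to single spaces.
    return " ".join(sysctl_path(key).read_text().split())


def find_mismatches(expected: dict[str, str]) -> dict[str, tuple[str, str]]:
    """Return {key: (expected, actual)} for every value that differs or cannot be read."""
    mismatches = {}
    for key, want in expected.items():
        try:
            got = read_sysctl(key)
        except OSError as exc:
            got = f"<unreadable: {exc.strerror}>"
        if got != want:
            mismatches[key] = (want, got)
    return mismatches


if __name__ == "__main__":
    for key, (want, got) in find_mismatches(EXPECTED).items():
        print(f"{key}: expected {want!r}, got {got!r}")
```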
and that my ulimits are:
~ ulimit -a
core file size (blocks, -c) unlimited
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 249580
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 65535
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 249580
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
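`ulimit -a` reports the limits of the shell that ran it; a process started another way (a service manager, an IDE, a container) can end up with different limits. On Linux, /proc/<pid>/limits shows what the kernel actually applies to a running process. A minimal sketch that extracts the open-files limit from it (the function names are hypothetical):

```python
from pathlib import Path


def parse_max_open_files(limits_text: str) -> tuple[str, str]:
    """Return the (soft, hard) 'Max open files' limits from /proc/<pid>/limits text."""
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Columns: 'Max open files', soft limit, hard limit, units ('files').
            fields = line.split()
            return fields[3], fields[4]
    raise ValueError("no 'Max open files' line found")


def max_open_files(pid: str = "self") -> tuple[str, str]:
    """Read the open-files limit the kernel applies to a running process (Linux only)."""
    return parse_max_open_files(Path(f"/proc/{pid}/limits").read_text())
```

Usage: `max_open_files("12345")`, where 12345 is a placeholder for the PID of the Gatling JVM (for example as found with `pgrep java`).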
However, when running Gatling against my application, I start getting connection errors just above 3k requests/second. This is typical once I go past 3k:
[info] ---- Errors --------------------------------------------------------------------
[info] > j.n.ConnectException: Failed to open a socket. 99110 (60.30%)
[info] > j.u.c.TimeoutException: Request timeout to localhost/127.0.0.1 34100 (20.75%)
[info] :9000 after 60000 ms
[info] > j.n.ConnectException: Cannot assign requested address: localho 31039 (18.88%)
[info] st/127.0.0.1:9000
[info] > j.n.ConnectException: connection timed out: localhost/127.0.0. 110 ( 0.07%)
[info] 1:9000
[info] ================================================================================
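Judging by the output above, the console wraps long error messages at a fixed width, even mid-word ("localho" / "st"). A small Python sketch (hypothetical helper, not part of Gatling) that re-joins the wrapped messages and totals the counts:

```python
import re

# A top-level error line: "> message count (pct%)"; the message may continue on following lines.
ERROR_LINE = re.compile(r"^\[info\] > (.*?)\s+(\d+) \(\s*([\d.]+)%\)$")


def parse_gatling_errors(report: str) -> list[tuple[str, int, float]]:
    """Return (message, count, percent) for each error, re-joining wrapped messages."""
    errors: list[list] = []
    for raw in report.splitlines():
        line = raw.rstrip()
        match = ERROR_LINE.match(line)
        if match:
            message, count, pct = match.groups()
            errors.append([message, int(count), float(pct)])
        elif errors and line.startswith("[info] ") and not line.startswith(("[info] ---", "[info] ===")):
            # Wrapped continuation: the split can fall mid-word, so append without a space.
            errors[-1][0] += line[len("[info] "):].strip()
    return [tuple(e) for e in errors]


REPORT = """\
[info] ---- Errors --------------------------------------------------------------------
[info] > j.n.ConnectException: Failed to open a socket. 99110 (60.30%)
[info] > j.u.c.TimeoutException: Request timeout to localhost/127.0.0.1 34100 (20.75%)
[info] :9000 after 60000 ms
[info] > j.n.ConnectException: Cannot assign requested address: localho 31039 (18.88%)
[info] st/127.0.0.1:9000
[info] > j.n.ConnectException: connection timed out: localhost/127.0.0. 110 ( 0.07%)
[info] 1:9000
[info] ================================================================================"""

errors = parse_gatling_errors(REPORT)
for message, count, pct in errors:
    print(f"{count:>6} ({pct:5.2f}%)  {message}")
print("total:", sum(count for _, count, _ in errors))  # total: 164359
```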
The server doesn’t seem particularly stressed, but every bind operation for additional user requests fails.
I also tried using the IPv4 properties, which made no difference.
Is there anything else I can do to get past this limit?
When the box starts to get stressed, /proc/sys/fs/file-nr looks like this:
140384 0 6374169
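The three fields of /proc/sys/fs/file-nr are, per the kernel's /proc/sys/fs documentation, the number of allocated file handles, the number allocated but unused (Linux 2.6 and later always report 0 here), and the system-wide maximum (fs.file-max). This is a system-wide count, separate from the per-process `ulimit -n`. A small sketch that parses the reading above (the names are illustrative):

```python
from typing import NamedTuple


class FileNr(NamedTuple):
    allocated: int  # file handles currently allocated, system-wide
    free: int       # allocated but unused; reported as 0 on Linux 2.6 and later
    maximum: int    # the system-wide limit, i.e. fs.file-max


def parse_file_nr(text: str) -> FileNr:
    """Parse the three whitespace-separated fields of /proc/sys/fs/file-nr."""
    allocated, free, maximum = (int(field) for field in text.split())
    return FileNr(allocated, free, maximum)


nr = parse_file_nr("140384 0 6374169")  # the reading posted above
print(f"{nr.allocated / nr.maximum:.1%} of fs.file-max in use")  # 2.2% of fs.file-max in use
```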
At the same time, /proc/net/sockstat looks like this:
sockets: used 127172
TCP: inuse 56387 orphan 0 tw 90 alloc 126709 mem 60821
UDP: inuse 4 mem 10
UDPLITE: inuse 0
RAW: inuse 0
FRAG: inuse 0 memory 0
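A sketch for reading /proc/net/sockstat programmatically (the parser is a hypothetical helper). One detail that helps with interpretation: the TCP `mem` field is counted in pages, the same unit as the net.ipv4.tcp_mem thresholds, so with 4 KiB pages the 60821 above is roughly 238 MiB.

```python
import mmap


def parse_sockstat(text: str) -> dict[str, dict[str, int]]:
    """Turn /proc/net/sockstat lines into {label: {field: value}}."""
    stats = {}
    for line in text.strip().splitlines():
        label, _, rest = line.partition(":")
        tokens = rest.split()
        # After the label, tokens alternate: name value name value ...
        stats[label] = {name: int(value) for name, value in zip(tokens[::2], tokens[1::2])}
    return stats


SAMPLE = """\
sockets: used 127172
TCP: inuse 56387 orphan 0 tw 90 alloc 126709 mem 60821
UDP: inuse 4 mem 10
UDPLITE: inuse 0
RAW: inuse 0
FRAG: inuse 0 memory 0"""

stats = parse_sockstat(SAMPLE)
tcp = stats["TCP"]
# TCP 'mem' is in pages; convert using this machine's page size.
tcp_mem_mib = tcp["mem"] * mmap.PAGESIZE / 2**20
print(f"TCP inuse={tcp['inuse']} alloc={tcp['alloc']} time_wait={tcp['tw']} mem~{tcp_mem_mib:.0f} MiB")
```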
What should I be looking to increase? I’m not sure how to interpret these numbers. I understand that the Gatling values set the network “backlog” to 40000, but I don’t know what that means in terms of the absolute maximum number of connections.
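For reference, the listen(2) man page describes the backlog as the length of the queue of fully established connections waiting to be accept()ed on one listening socket, silently truncated to net.core.somaxconn; the queue of half-open (SYN) connections is bounded separately by net.ipv4.tcp_max_syn_backlog. A minimal sketch under that reading: a localhost server listening with a backlog of 1 can still hold many connections open at once, as long as it keeps accepting. Both helper names are hypothetical.

```python
import socket


def effective_backlog(requested: int, somaxconn: int) -> int:
    """listen(2): a backlog above net.core.somaxconn is silently truncated to it."""
    return min(requested, somaxconn)


def hold_connections(n: int, backlog: int = 1) -> int:
    """Keep n localhost connections open at once against a server with a tiny listen() backlog."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.settimeout(5)
    server.bind(("127.0.0.1", 0))  # port 0: let the kernel choose a free port
    server.listen(backlog)
    port = server.getsockname()[1]
    open_socks = []
    try:
        for _ in range(n):
            open_socks.append(socket.create_connection(("127.0.0.1", port), timeout=5))
            conn, _ = server.accept()  # empty the accept queue before the next connect
            open_socks.append(conn)
        # Each connection occupies one client and one server socket, all still open here.
        return len(open_socks) // 2
    finally:
        for s in open_socks:
            s.close()
        server.close()


print(effective_backlog(100_000, 40_000))  # 40000
print(hold_connections(200, backlog=1))    # 200
```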
BTW, the app does open a lot of connections, and I am hosting the “external” services on the same box (in Docker containers), so I wouldn’t be surprised if every user connection resulted in another 10 or 20 connections being opened somewhere on the box.
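A back-of-the-envelope sketch of what that fan-out could mean at 3k requests/second. It assumes every user request opens one fresh connection plus 10 or 20 internal ones (no keep-alive or pooling), and uses Little's law (average number open = arrival rate × average lifetime) with a made-up placeholder lifetime; substitute measured values.

```python
def new_connections_per_second(user_rps: float, fanout: int) -> float:
    """Assumes each user request opens one fresh connection plus `fanout` internal ones (no pooling)."""
    return user_rps * (1 + fanout)


def concurrent_connections(rate_per_s: float, mean_lifetime_s: float) -> float:
    """Little's law: average number open at once = arrival rate x average time each stays open."""
    return rate_per_s * mean_lifetime_s


for fanout in (10, 20):
    rate = new_connections_per_second(3000, fanout)
    # mean_lifetime_s=0.5 is a hypothetical placeholder, not a measured value.
    print(f"fan-out {fanout}: {rate:,.0f} new connections/s, ~{concurrent_connections(rate, 0.5):,.0f} open")
```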
Best regards,
Sam