Using the recommended OS tuning parameters but still hitting "Failed to open a socket"

Hi all,

I have followed all the advice at http://gatling.io/docs/current/general/operations/ and can confirm that my sysctl values are as written there, i.e.

net.ipv4.tcp_max_syn_backlog = 40000
net.core.somaxconn = 40000
net.core.wmem_default = 8388608
net.core.rmem_default = 8388608
net.ipv4.tcp_sack = 1
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_fin_timeout = 15
net.ipv4.tcp_keepalive_intvl = 30
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_moderate_rcvbuf = 1
net.core.rmem_max = 134217728
net.core.wmem_max = 134217728
net.ipv4.tcp_mem = 134217728 134217728 134217728
net.ipv4.tcp_rmem = 4096 277750 134217728
net.ipv4.tcp_wmem = 4096 277750 134217728
net.core.netdev_max_backlog = 300000

and that my ulimits are

~ ulimit -a
core file size (blocks, -c) unlimited
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 249580
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 65535
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 249580
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
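
For completeness, both sets of values were applied persistently rather than per-session, roughly like this (the file locations may differ on your distro, and the limits.conf entries assume pam_limits is in effect):

# kernel settings: append the sysctl lines above to /etc/sysctl.conf
# (or a file under /etc/sysctl.d/), then reload without rebooting
sudo sysctl -p

# spot-check a single value
sysctl net.core.somaxconn

# open-files limit: add to /etc/security/limits.conf
# *    soft    nofile    65535
# *    hard    nofile    65535

# verify from a fresh login shell
ulimit -n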

However, when running Gatling against my application, I start to see connection problems just past 3k requests/second. This is typical output beyond that rate:

[info] ---- Errors --------------------------------------------------------------------
[info] > j.n.ConnectException: Failed to open a socket. 99110 (60.30%)
[info] > j.u.c.TimeoutException: Request timeout to localhost/127.0.0.1 34100 (20.75%)
[info] :9000 after 60000 ms
[info] > j.n.ConnectException: Cannot assign requested address: localho 31039 (18.88%)
[info] st/127.0.0.1:9000
[info] > j.n.ConnectException: connection timed out: localhost/127.0.0. 110 ( 0.07%)
[info] 1:9000
[info] ================================================================================

The server doesn’t seem to be particularly stressed, but the bind operations needed to add more user connections are failing.

I also tried using the IPv4 properties, but it made no difference.

Is there anything else I can do to get past this limit?

I can watch /proc/sys/fs/file-nr when the box starts to get stressed, and it looks like this:

140384 0 6374169

I can also watch /proc/net/sockstat, and it looks like this:

sockets: used 127172
TCP: inuse 56387 orphan 0 tw 90 alloc 126709 mem 60821
UDP: inuse 4 mem 10
UDPLITE: inuse 0
RAW: inuse 0
FRAG: inuse 0 memory 0
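
(In case it helps anyone reproduce this, I’m just polling both files while the test runs, along these lines:)

# refresh both counters every second during the run
watch -n 1 'cat /proc/sys/fs/file-nr /proc/net/sockstat'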

What should I be looking at increasing? I’m not really sure how to interpret these numbers. I understand the Gatling-recommended values set the network “backlog” to 40000, but I’m not sure what that means in terms of the absolute maximum number of connections.

BTW, the app does open a lot of connections, and I am hosting the “external” services on the same box (in Docker containers), so I wouldn’t be surprised if every user connection resulted in another 10 or 20 connections being made somewhere on the box.
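
(A rough way to confirm that fan-out would be grouping established connections by peer; the exact ss incantation below is my best guess rather than something I’ve polished:)

# count established connections per remote endpoint, busiest first
ss -tan state established | awk 'NR>1 {print $4}' | sort | uniq -c | sort -rn | head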

Best regards,
Sam

Hey Sam,

No idea what your use case is.
It looks like you’re throwing tons of distinct virtual users at it and running out of ephemeral ports.
As you have net.ipv4.tcp_tw_reuse enabled, I don’t expect tons of connections stuck in TIME_WAIT.
So my best guess (ss is your friend here) is that all those connections are alive, and you need to scale out or add some IP aliases.
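
Something along these lines should show whether they’re genuinely alive (the interface name and alias address below are placeholders for your setup):

# how many sockets sit in each TCP state right now
ss -tan | awk 'NR>1 {print $1}' | sort | uniq -c

# how many connections target the system under test
ss -tan 'dport = :9000' | wc -l

# if they really are all alive, add an IP alias to get another ephemeral port range
sudo ip addr add 10.0.0.2/24 dev eth0

If memory serves, you can then point Gatling at the alias via the HTTP protocol’s localAddress option.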

Cheers,

Thank you for replying Stéphane,

My use case is:

  • I have a single webserver
  • I have several third-party services in the background
  • a virtual user does a GET to the webserver; the webserver then hits the third-party services with various combinations of GETs and PUTs
  • both the webserver and the third-party services run inside native Docker containers

Nothing should be creating extra listener ports… the webserver and each third-party service operate off a single port.

However, you’re quite right about ephemeral ports. It’s been so long since I studied TCP that I’d forgotten that each fresh client connection consumes an ephemeral port.
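
(For anyone else who lands here: the ephemeral range the kernel hands out is visible and tunable; the wider value below is just an illustration, not something I’ve benchmarked:)

# show the ephemeral port range used for outgoing connections
cat /proc/sys/net/ipv4/ip_local_port_range

# widen it if that range turns out to be the bottleneck (example value only)
sudo sysctl -w net.ipv4.ip_local_port_range="1025 65535"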

This exercise has revealed that one part of the app is not using connection pooling, so it has been generating (lots of!) fresh connections on every incoming user request. Thank you, I’ll work on fixing that ASAP :smiley:

Glad it helped :slight_smile: