AsyncHandler - Request xxx failed : Too many open files???

Hey guys!

How are you doing?
I’m having a strange problem…
I’m running a simulation with a constant rate of 100 users per second for 120 seconds. After maybe 40 seconds, every request starts failing systematically with the following error:

12:25:28.169 [WARN ] i.g.h.a.AsyncHandler - Request ‘echo service’ failed: Demasiados archivos abiertos
12:25:28.170 [WARN ] i.g.h.a.AsyncHandlerActor - Request ‘echo service’ failed : Demasiados archivos abiertos

(where “Demasiados archivos abiertos” means “Too many open files”)

Do you know what is happening here? I’m running on Ubuntu 13.10 using Java 1.7.0_45.

Thanks!!

https://github.com/excilys/gatling/wiki/HTTP#wiki-tuning

As you are an Ubuntu user, you may want to read this: http://docs.basho.com/riak/latest/ops/tuning/open-files-limit/#Linux

However, how can that be happening? The scenario I’m running only sends 100 requests per second, and all of them finish almost immediately. The “running” column shows 0 pretty much all the time, so the number of open files shouldn’t keep growing. It looks like something remains open. Could it be because of the FIN_WAIT state of the socket?

In fact it’s pretty high already:

alex@alex:/etc$ sysctl fs.file-max
fs.file-max = 782747

And what does “ulimit -n” return?
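
The two numbers measure different things: fs.file-max is the system-wide ceiling on open file handles, while ulimit -n is the per-process limit inherited from the shell. A minimal sketch to print them side by side, assuming Linux with /proc mounted (the variable names are only for illustration):

```shell
# System-wide ceiling (same value that `sysctl fs.file-max` reports).
# Falls back to "unknown" if /proc is not available.
sys_max=$(cat /proc/sys/fs/file-max 2>/dev/null || echo unknown)

# Per-process limits for this shell and anything launched from it.
soft=$(ulimit -Sn)   # the limit that is actually enforced
hard=$(ulimit -Hn)   # the ceiling the soft limit may be raised to

echo "system-wide fs.file-max : $sys_max"
echo "per-process soft limit  : $soft"
echo "per-process hard limit  : $hard"
```

Run it from the same shell (and as the same user) that launches Gatling, since limits can differ per user and per session.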

my 2 cents:

  • You start 100 new users/sec. Even if your scenario is very short, connections don’t get closed from the client side, even when the user is done. That’s how browsers behave too.
  • You don’t have a proper idle keep-alive timeout on your server (5 or 10 sec is a common value), so the server doesn’t close them either, hence the too many open files.
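
If that is what is going on, the descriptor count of the process should climb steadily during the run. A rough sketch for watching it, assuming Linux with /proc; TARGET_PID is a hypothetical variable holding the pid of the Gatling JVM (or of the server), defaulting to the current shell:

```shell
# Pid to inspect; TARGET_PID is an illustrative name, not a Gatling setting.
pid="${TARGET_PID:-$$}"

# Every open descriptor of a process appears as an entry in /proc/<pid>/fd.
count_fds() {
  ls "/proc/$1/fd" 2>/dev/null | wc -l
}

# Socket descriptors are symlinks whose target looks like "socket:[inode]".
count_sockets() {
  ls -l "/proc/$1/fd" 2>/dev/null | grep -c 'socket:'
}

echo "open descriptors: $(count_fds "$pid")"
echo "of which sockets: $(count_sockets "$pid")"
```

Running this every few seconds and comparing the result against `ulimit -n` shows how close the process is to the limit.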

What kind of client behavior do you want to mimic? Many browsers or just a few bashing clients?

Awesomeness.

I thought the connection was closing but FIN_WAIT made the socket count as still open.
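
For anyone who wants to check the FIN_WAIT theory on their own machine, here is a rough sketch that tallies TCP sockets by state straight from /proc/net/tcp and /proc/net/tcp6 (Linux only). The function name is made up for this example; the state codes are the kernel's hex values from the "st" column:

```shell
# Tally TCP sockets per state. Reads the given files, or stdin if none given.
tcp_state_counts() {
  awk '
    BEGIN {
      name["01"] = "ESTABLISHED"; name["04"] = "FIN_WAIT1"
      name["05"] = "FIN_WAIT2";   name["06"] = "TIME_WAIT"
      name["08"] = "CLOSE_WAIT";  name["0A"] = "LISTEN"
    }
    # Field 4 is the two-digit hex state; header lines do not match.
    $4 ~ /^[0-9A-F][0-9A-F]$/ {
      s = ($4 in name) ? name[$4] : "other(" $4 ")"
      n[s]++
    }
    END { for (s in n) print s, n[s] }
  ' "$@"
}

# Include tcp6 as well: JVM sockets often show up there on dual-stack hosts.
cat /proc/net/tcp /proc/net/tcp6 2>/dev/null | tcp_state_counts
```

The output order is not fixed. A large ESTABLISHED count long after the users have finished points at idle keep-alive connections rather than sockets stuck in FIN_WAIT.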

ulimit -Hn returned 4096, and the app started having problems as it approached 4000 open files. So it all made sense. I modified /etc/security/limits.conf and added

  * hard nofile 65535

Rebooted, and now:
ulimit -Hn = 65535
ulimit -Sn = 1024
ulimit -n = 1024

Setting the hard limit to 65535 was enough. I don’t understand how the soft limit limits anything, if everything works just by changing the hard limit.
I’m simulating many clients (particularly smartphones), that’s why I opened different users.
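
On the soft limit question: the soft limit is the one that is enforced, but any process may raise its own soft limit up to the hard limit without root. One plausible explanation, which is an assumption and was not confirmed in this thread, is that the JVM does this for itself at startup (HotSpot has a MaxFDLimit option for that purpose), so only the hard limit ends up mattering. The sketch below just demonstrates the mechanism in a shell; the function name is made up:

```shell
# Raise the soft nofile limit to the hard limit and print the new soft limit.
# The body runs in a subshell, so the calling shell keeps its own limits.
raise_soft_to_hard() (
  ulimit -Sn "$(ulimit -Hn)" && ulimit -Sn
)

echo "soft limit here           : $(ulimit -Sn)"
echo "hard limit here           : $(ulimit -Hn)"
echo "soft limit after raising  : $(raise_soft_to_hard)"
echo "soft limit here, unchanged: $(ulimit -Sn)"
```

With the values reported above, the third line would print 65535 while the first and last still print 1024.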

I also found that I had spray.can.server.idle-timeout set to 65 seconds, left over from an earlier attempt to work around an ugly timeout I was having. I’ll remove the property so connections don’t linger.

Thanks guys!

Great that it all makes sense.
Have fun!