I think this may also be a problem resulting from recent 2.0 changes.
So, everything is compiling and will run.
However, I’m configured to share connections (because I’m not really simulating unique users), and I’m configured to use a max number of connections.*
Right now I’m testing against our servers heartbeat service - the service used by our ELB to determine whether the service is there and responsive.
I’m getting responses back, and everything reports OK …
… until the number of requests I’ve made equals the max number of connections.
So, for example, I have 10K total connections and I make 500 requests per second. After 20s, I begin getting exception spam because there aren’t enough connections left to run new user simulations.
This used to work before. My belief was that once I got an OK, the connection that had been used to generate that OK was freed for a new user.
It doesn’t appear to be working that way right now. Is there a new setting that was introduced? Can someone confirm that they aren’t seeing the same thing on their machine?
I’ll follow up with a stripped-down example of what I’m doing.
* When I began using Gatling last year, I realized that as latency increased, more connections would get spawned to backlog users, so I could never get an honest sense of throughput and server performance. I began to use max connections so that when that number was exceeded, I would know there was some kind of performance problem: either the server was underperforming, or there were latency hiccups, or I needed more connections to accommodate the latency. It gives me a better sense of what kind of performance I’m really talking about. E.g. our server can handle 10K connections, but it can only handle 6K requests through that pool of connections. I pair these observations with CPU and memory usage on the server.
How long are your requests?
I mean that if you send 500 req/sec but your requests last 3 sec, you might need 1500 connections.
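That sizing rule is just Little’s law: concurrent connections needed ≈ arrival rate × mean request duration. A quick sketch with illustrative numbers (not measurements from any real test):

```scala
// Little's law: concurrent connections ≈ request rate × mean request duration.
// All numbers below are illustrative, not from a real test run.
object ConnectionEstimate {
  def connectionsNeeded(requestsPerSecond: Double, meanDurationSeconds: Double): Double =
    requestsPerSecond * meanDurationSeconds

  def main(args: Array[String]): Unit = {
    // 500 req/s with 3 s requests needs ~1500 open connections
    println(connectionsNeeded(500, 3.0)) // 1500.0
    // 500 req/s with 500 ms requests needs only ~250
    println(connectionsNeeded(500, 0.5)) // 250.0
  }
}
```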
Then, is it AsyncHttpClient that complains, or the OS (file descriptors starvation)?
The requests all complete in < 500 ms.
The number of running requests is always reported as < 50, and completed requests register as OK.
Attached, you’ll find my pom (named run.xml, with my JVM args), my gatling.conf, and my Demo class, which recreates the problem identically.
It looks like it’s Akka that’s complaining. The stack trace that prints out once I run out of connections is:
[ERROR] [03/31/2014 14:53:28.409] [GatlingSystem-akka.actor.default-dispatcher-19] [akka://GatlingSystem/user/$d] Too many connections 10001
java.io.IOException: Too many connections 10001
demo.tgz (3.99 KB)
We have started cleaning up the gatling.conf file.
Many options could be set both on the protocol and in this file; having too many ways to do the same thing was causing confusion.
This has to be set up on the protocol now: https://github.com/excilys/gatling/blob/master/src/sphinx/http/http_protocol.rst#connections-sharing
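If I’m reading that doc right, connection sharing now lives on the HTTP protocol builder rather than in gatling.conf. A minimal sketch of what that looks like (the base URL, path, and injection profile are placeholders; method names follow the Gatling 2.0 API and may differ slightly in older snapshots):

```scala
import scala.concurrent.duration._

import io.gatling.core.Predef._
import io.gatling.http.Predef._

class HeartbeatSimulation extends Simulation {

  // Connection sharing is now configured on the protocol, not in gatling.conf.
  // "http://localhost:8080" and "/heartbeat" are placeholders for this sketch.
  val httpProtocol = http
    .baseURL("http://localhost:8080")
    .shareConnections // one connection pool shared across all virtual users

  val scn = scenario("Heartbeat")
    .exec(http("heartbeat").get("/heartbeat"))

  setUp(scn.inject(constantUsersPerSec(500) during (60 seconds)))
    .protocols(httpProtocol)
}
```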
I think we have to find a way to warn users of removed properties, let me think about it.
One option could be requiring a version to be set in each config file.
If the version number is missing or wrong, abort the simulation and print a message pointing the user to the migration notes.
Having an easy way of validating your config would be awesome, too. Or even migrating your config to a new version.
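The version-gate idea could be sketched roughly like this (the `version` key and expected value are hypothetical, purely to illustrate the check, and a real implementation would go through the config library rather than a regex):

```scala
// Sketch of a config version gate. The "version" key and the expected
// value "2.0" are hypothetical, not an actual gatling.conf property.
object ConfigVersionCheck {
  val ExpectedVersion = "2.0"

  // Returns Some(errorMessage) if the version key is missing or wrong,
  // None if the config file declares the expected version.
  def validate(configText: String): Option[String] = {
    val VersionLine = """\s*version\s*=\s*"?([^"\s]+)"?\s*""".r
    configText.split("\n").collectFirst { case VersionLine(v) => v } match {
      case None =>
        Some("gatling.conf declares no version; see the migration notes")
      case Some(v) if v != ExpectedVersion =>
        Some(s"gatling.conf version $v != $ExpectedVersion; see the migration notes")
      case _ => None
    }
  }
}
```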
We’ll see about that in 2M5, as we intend to split the configuration into multiple files, so that Gatling gets more modular and plugins can use the same configuration facilities.
Just 'cause I thought it’d be fun to mention: I’ve got Gatling generating 72K requests per second. That’s about 4.3 million requests per minute.
Could you share some details, please?
How many Gatling instances?
Do you share connections?
How many concurrent connections?
Mean response time?
Mean response payload size?
What does the system under test look like?
One last question: did you notice some performance improvements in Gatling 2 snapshots in the last months?