I was about to shut things down when I noticed the DataPower hiccup seemed to occur again, resulting in request timeouts. In this version the timeouts are followed by this error:
20:49:23.535 [WARN ] i.g.h.a.AsyncHandlerActor - Request ‘publishConfirmation’ failed: java.util.concurrent.TimeoutException: Request timed out to .com/:443 of 60000 ms
[ERROR] [08/25/2014 20:49:23.583] [GatlingSystem-akka.actor.default-dispatcher-3] [TaskInvocation] null
java.lang.NullPointerException
at com.ning.http.client.providers.netty.channel.pool.DefaultChannelPool.removeAll(DefaultChannelPool.java:280)
at com.ning.http.client.providers.netty.channel.ChannelManager.removeAll(ChannelManager.java:267)
at com.ning.http.client.providers.netty.channel.ChannelManager.closeChannel(ChannelManager.java:318)
at com.ning.http.client.providers.netty.request.NettyRequestSender.abort(NettyRequestSender.java:406)
at com.ning.http.client.providers.netty.request.timeout.TimeoutTimerTask.expire(TimeoutTimerTask.java:40)
at com.ning.http.client.providers.netty.request.timeout.RequestTimeoutTimerTask.run(RequestTimeoutTimerTask.java:45)
at io.gatling.http.ahc.AkkaNettyTimer$$anonfun$1.apply$mcV$sp(AkkaNettyTimer.scala:55)
at akka.actor.Scheduler$$anon$7.run(Scheduler.scala:117)
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
This stack trace appears in the logs when the request timeouts occur. I’m not sure if it’s caused by Gatling, the SUT, or a combination of the two. Do you need additional information, or should I wait for the new snapshot version?
My Maven experience is limited, I’m afraid. What I did was remove all Gatling artifacts from my local repo and then rebuild my project. This is what my POM looks like:
I cleaned up as you suggested, and now the stack trace is printed again:
[ERROR] [08/26/2014 17:35:02.683] [GatlingSystem-akka.actor.default-dispatcher-4] [TaskInvocation] null
java.lang.NullPointerException
at com.ning.http.client.providers.netty.channel.pool.DefaultChannelPool.removeAll(DefaultChannelPool.java:280)
at com.ning.http.client.providers.netty.channel.ChannelManager.removeAll(ChannelManager.java:267)
at com.ning.http.client.providers.netty.channel.ChannelManager.closeChannel(ChannelManager.java:318)
at com.ning.http.client.providers.netty.request.NettyRequestSender.abort(NettyRequestSender.java:406)
at com.ning.http.client.providers.netty.request.timeout.TimeoutTimerTask.expire(TimeoutTimerTask.java:40)
at com.ning.http.client.providers.netty.request.timeout.RequestTimeoutTimerTask.run(RequestTimeoutTimerTask.java:45)
at io.gatling.http.ahc.AkkaNettyTimer$$anonfun$1.apply$mcV$sp(AkkaNettyTimer.scala:55)
at akka.actor.Scheduler$$anon$7.run(Scheduler.scala:117)
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
I figured out what the problem was: the Maven metadata was cached in our local Maven proxy. I managed to build the snapshot version with the correct async-http-client version, and the stack traces have disappeared. I still see the Old Gen space of the heap growing when timeouts occur, though, but I think you were still investigating that, correct?
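For anyone hitting the same cached-metadata problem, the cleanup boils down to something like this; the repository path and build command are assumptions based on a default Maven setup:

```shell
# Remove stale cached Gatling snapshot artifacts from the default local
# Maven repository (path is an assumption; adjust for a custom repo location).
rm -rf "$HOME/.m2/repository/io/gatling"

# Then rebuild with -U, which forces Maven to re-check remote repositories
# for updated snapshot metadata instead of trusting cached copies:
# mvn clean package -U
```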
I was suspecting as much; I should have mentioned it, sorry.
I’ll investigate the Old Gen heap issue as soon as I can. It’s probably related to the AHC connection pool.
If you can get me a heap dump, it could help.
Then, is the heap usage still an issue for your test?
I was also wondering: what are you testing exactly? I mean, what do your virtual users do? Do they perform multiple requests, or are you just hitting some REST webservice? If the latter, you could simply disable connection pooling.
> I was suspecting as much; I should have mentioned it, sorry.
No problem, this is a good way for me to learn more about Maven.
> I’ll investigate the Old Gen heap issue as soon as I can. It’s probably related to the AHC connection pool.
> If you can get me a heap dump, it could help.
I’ve started a new test and will create a heap dump as soon as a hiccup has occurred and I see Old Gen growing.
> Then, is the heap usage still an issue for your test?
For now: no. Based on the “closed system” test results the application has gone live. I prefer the “open system” simulation, though, and would really like to test all our applications that way. In order to replace LoadRunner as the standard load-test tool in our team, a tool should be able to run 48-hour endurance tests under “high load” using a reasonable amount of resources, and should be able to survive SUT hiccups during those tests. Since I love working with Gatling, after being condemned to using LoadRunner for almost 10 years now, I hope Gatling will meet our requirements and we will migrate more of our scripts to Gatling.
> I was also wondering: what are you testing exactly? I mean, what do your virtual users do? Do they perform multiple requests, or are you just hitting some REST webservice? If the latter, you could simply disable connection pooling.
The virtual users each do only one SOAP webservice call, so I’ll try disabling connection pooling. Just to be sure, should I use these settings?
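As a rough sketch, disabling pooling in Gatling 2 would go in the AHC section of gatling.conf; the exact key names below are an assumption rather than something confirmed in this thread:

```
gatling {
  http {
    ahc {
      # Assumed keys: disable HTTP connection pooling / keep-alive reuse
      allowPoolingConnections = false
      allowPoolingSslConnections = false
    }
  }
}
```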
> I've started a new test and will create a heap dump as soon as a hiccup has occurred and I see Old Gen growing.
Thanks!
> Then, is the heap usage still an issue for your test?
> For now: no. Based on the "closed system" test results the application has gone live. I prefer the "open system" simulation, though, and would really like to test all our applications that way. In order to replace LoadRunner as the standard load-test tool in our team, a tool should be able to run 48-hour endurance tests under "high load" using a reasonable amount of resources, and should be able to survive SUT hiccups during those tests. Since I love working with Gatling, after being condemned to using LoadRunner for almost 10 years now, I hope Gatling will meet our requirements and we will migrate more of our scripts to Gatling.
Great!
The good thing is that I now know exactly how to fix this for good:
> The virtual users each do only one SOAP webservice call, so I'll try disabling connection pooling. Just to be sure, should I use these settings?
Yeah.
But then, how many clients use your SOAP webservice?
If you're not trying to simulate some browser traffic, you have to carefully consider your connection model.
The way you modeled it until now: you open 150 new connections per second, and let your server decide when to close them.
If you disable connection pooling: you still open 150 new connections per second, but Gatling will forcefully close them after each request (no keep-alive).
Is that really the behavior you want? How many alive connections do you expect?
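To put rough numbers on that last question: by Little's law, the expected number of open connections is the connection arrival rate times the average connection lifetime. A small self-contained Scala sketch, where the response time and keep-alive duration are assumptions for illustration only:

```scala
// Little's-law estimate of concurrently open connections under the two
// models discussed above. All numbers are illustrative assumptions.
object ConnectionEstimate extends App {
  val newConnectionsPerSec = 150.0 // open-model arrival rate from the scenario

  // Without pooling, a connection lives roughly one request/response cycle.
  val meanResponseTimeSec = 0.2 // assumption
  // With pooling, it lives until the server's keep-alive timeout closes it.
  val keepAliveSec = 30.0 // assumption

  val withoutPooling = newConnectionsPerSec * meanResponseTimeSec
  val withPooling    = newConnectionsPerSec * keepAliveSec

  println(f"open connections, pooling disabled: $withoutPooling%.0f")
  println(f"open connections, keep-alive reuse: $withPooling%.0f")
}
```

Under these assumptions the difference is roughly 30 versus 4500 simultaneously open connections, which is why the expected number matters before choosing a model.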
> I’ve started a new test and will create a heap dump as soon as a hiccup has occurred and I see Old Gen growing.
> Thanks!
It’s in my Dropbox!
> Then, is the heap usage still an issue for your test?
> For now: no. Based on the “closed system” test results the application has gone live. I prefer the “open system” simulation, though, and would really like to test all our applications that way. In order to replace LoadRunner as the standard load-test tool in our team, a tool should be able to run 48-hour endurance tests under “high load” using a reasonable amount of resources, and should be able to survive SUT hiccups during those tests. Since I love working with Gatling, after being condemned to using LoadRunner for almost 10 years now, I hope Gatling will meet our requirements and we will migrate more of our scripts to Gatling.
> Yeah.
> But then, how many clients use your SOAP webservice?
> If you’re not trying to simulate some browser traffic, you have to carefully consider your connection model.
> The way you modeled it until now: you open 150 new connections per second, and let your server decide when to close them.
> If you disable connection pooling: you still open 150 new connections per second, but Gatling will forcefully close them after each request (no keep-alive).
> Is that really the behavior you want? How many alive connections do you expect?
I just ran with connection pooling disabled and I see the same behavior; I’ll upload a heap dump for that run as well, tomorrow. For the next test I’ll indeed discuss and investigate what the expected number of connections will be. But is there a way in Gatling to limit the number of connections used and still emulate an “open system”, in other words, put a load on the SUT that is not influenced by the performance of the SUT?