2.0.0-RC2 Increasing heap size issue

I was about to close down when I noticed that the DataPower hiccup seemed to occur, resulting in request timeouts. In this version the timeouts are followed by this error:

20:49:23.535 [WARN ] i.g.h.a.AsyncHandlerActor - Request ‘publishConfirmation’ failed: java.util.concurrent.TimeoutException: Request timed out to .com/:443 of 60000 ms
[ERROR] [08/25/2014 20:49:23.583] [GatlingSystem-akka.actor.default-dispatcher-3] [TaskInvocation] null
java.lang.NullPointerException
  at com.ning.http.client.providers.netty.channel.pool.DefaultChannelPool.removeAll(DefaultChannelPool.java:280)
  at com.ning.http.client.providers.netty.channel.ChannelManager.removeAll(ChannelManager.java:267)
  at com.ning.http.client.providers.netty.channel.ChannelManager.closeChannel(ChannelManager.java:318)
  at com.ning.http.client.providers.netty.request.NettyRequestSender.abort(NettyRequestSender.java:406)
  at com.ning.http.client.providers.netty.request.timeout.TimeoutTimerTask.expire(TimeoutTimerTask.java:40)
  at com.ning.http.client.providers.netty.request.timeout.RequestTimeoutTimerTask.run(RequestTimeoutTimerTask.java:45)
  at io.gatling.http.ahc.AkkaNettyTimer$$anonfun$1.apply$mcV$sp(AkkaNettyTimer.scala:55)
  at akka.actor.Scheduler$$anon$7.run(Scheduler.scala:117)
  at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41)
  at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
  at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
  at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
  at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
  at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

Cheers

Daniel

Damn, looks like I did something stupid.

The heap dump is in my Dropbox.

Your heap dump is only 10 MB, so I guess the 3 GB you see at startup is your Xss, isn’t it?

Yes, you are right; it has been a long day :) I’ve stopped the test, BTW. The server seems to have broken down completely now…

Stephane,

I gave the snapshot version another go today, but the results don’t look good: even under low load the test fails completely when the request timeouts occur, with lots of

“[ERROR] [08/26/2014 14:16:30.458] [GatlingSystem-akka.actor.default-dispatcher-12] [TaskInvocation] null java.lang.NullPointerException”

in the logs. I’m not sure if it’s caused by Gatling or the SUT, or a combination of the two :) Do you need additional information, or should I wait for the new snapshot version?

cheers

Daniel

Did you grab a new snapshot?
I thought I had this NPE fixed.

What does the stacktrace look like?

I grabbed it this morning, should I try fetching a new one?

There are no stack traces in my console; should I look somewhere else?

I’m trying to reproduce.
Will let you know.

I haven’t been able to reproduce so far.
Are you sure you don’t have some old jars still in your classpath, like multiple versions of AHC?

Stephane,

My Maven experience is limited, I’m afraid. What I did was remove all Gatling artifacts from my local repo and then rebuild my project. This is what my POM looks like:

<?xml version="1.0" encoding="UTF-8"?>


4.0.0

cis-gatling
com.klm.cis
1.0.0-SNAPSHOT

sonatype Sonatype OSS https://oss.sonatype.org/content/groups/public never true
sonatype Sonatype OSS https://oss.sonatype.org/content/groups/public true

2.10.4-RC3
yyyyMMdd_HHmm
UTF-8
2.0.0-SNAPSHOT
3.1.6

org.scala-lang scala-library ${scala.version}
com.klm.gatling gatling-klm 1.0.1-SNAPSHOT

src/test/scala

net.alchim31.maven scala-maven-plugin ${scala-maven-plugin.version}
io.gatling gatling-maven-plugin ${gatling-maven-plugin.version}

And for gatling-klm

<?xml version="1.0" encoding="UTF-8"?>


4.0.0

com.klm.gatling
gatling-klm
1.0.1-SNAPSHOT

excilys Excilys Repository http://repository.excilys.com/content/groups/public never false
sonatype-snapshots Sonatype Snapshot Repository http://oss.sonatype.org/content/repositories/snapshots/ true

1.7
1.7
2.10.4-RC3
UTF-8
2.0.0-SNAPSHOT
2.0.0-SNAPSHOT
3.1.6

io.gatling gatling-app ${gatling.version}
io.gatling gatling-recorder ${gatling.version}
io.gatling.highcharts gatling-charts-highcharts ${gatling.version}
org.scala-lang scala-library ${scala.version}
org.scalaj scalaj-http_2.10 0.3.14
io.gatling.highcharts gatling-charts-highcharts
io.gatling gatling-app
io.gatling gatling-recorder
org.scalaj scalaj-time_2.10.2 0.7
commons-codec commons-codec 1.8
junit junit 4.11 test

src/main/scala
src/test/scala

net.alchim31.maven scala-maven-plugin compile testCompile

Cheers

Daniel

Some clean up:

  • upgrade scala from 2.10.4-RC3 to 2.10.4, it looks good to me.

  • drop Excilys Repository

  • upgrade gatling-maven-plugin to 2.0.0-RC2

You did well to remove the local artifacts, as we still have problems with Maven metadata on Sonatype.

Still investigating.

I cleaned up as you suggested and now the stacktrace is printed again:

[ERROR] [08/26/2014 17:35:02.683] [GatlingSystem-akka.actor.default-dispatcher-4] [TaskInvocation] null
java.lang.NullPointerException
  at com.ning.http.client.providers.netty.channel.pool.DefaultChannelPool.removeAll(DefaultChannelPool.java:280)
  at com.ning.http.client.providers.netty.channel.ChannelManager.removeAll(ChannelManager.java:267)
  at com.ning.http.client.providers.netty.channel.ChannelManager.closeChannel(ChannelManager.java:318)
  at com.ning.http.client.providers.netty.request.NettyRequestSender.abort(NettyRequestSender.java:406)
  at com.ning.http.client.providers.netty.request.timeout.TimeoutTimerTask.expire(TimeoutTimerTask.java:40)
  at com.ning.http.client.providers.netty.request.timeout.RequestTimeoutTimerTask.run(RequestTimeoutTimerTask.java:45)
  at io.gatling.http.ahc.AkkaNettyTimer$$anonfun$1.apply$mcV$sp(AkkaNettyTimer.scala:55)
  at akka.actor.Scheduler$$anon$7.run(Scheduler.scala:117)
  at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41)
  at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
  at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
  at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
  at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
  at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

So just like I thought, your libs are not up-to-date.

The stack trace shows an NPE on line 280 in DefaultChannelPool.
But the current Gatling snapshot uses async-http-client 1.9.0-BETA9, and in that version there’s no code on that line.

And I checked that the current bundle on Sonatype indeed ships async-http-client 1.9.0-BETA9.

You seem to be still using async-http-client-1.9.0-BETA8.
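
One quick way to confirm which jar a class such as DefaultChannelPool is really loaded from is to ask the JVM. The sketch below is not from the thread: `JarLocator` and `locate` are made-up names, and it relies only on standard JDK reflection calls. It would need to run with the same classpath as the simulation to be meaningful.

```java
import java.security.CodeSource;

final class JarLocator {

    /** Returns where the named class was loaded from, or a short reason why that is unknown. */
    static String locate(String className) {
        try {
            // initialize=false: we only want the location, not to run static initializers
            Class<?> clazz = Class.forName(className, false, JarLocator.class.getClassLoader());
            CodeSource source = clazz.getProtectionDomain().getCodeSource();
            // Classes loaded by the bootstrap loader (JDK classes) report no code source
            if (source == null || source.getLocation() == null) {
                return "no code source (JDK class)";
            }
            return source.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return "not on classpath";
        }
    }

    public static void main(String[] args) {
        // Class name copied from the stack trace in this thread
        String ahcClass = "com.ning.http.client.providers.netty.channel.pool.DefaultChannelPool";
        System.out.println(ahcClass + " -> " + locate(ahcClass));
    }
}
```

If the printed location points at an unexpected jar version, the dependency resolution (local repo or proxy cache) is the place to look.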

Hi Stephane,

I figured out what the problem was: the Maven metadata was cached in our local Maven proxy. I managed to build the snapshot version with the correct async-http-client version, and the stack traces have disappeared. I still see the Old Gen space of the heap growing when timeouts occur, though. I think you were still investigating this, correct?

Cheers

Daniel

I suspected as much; I should have mentioned it, sorry.

I’ll investigate the Old Gen heap issue as soon as I can. It’s probably related to the AHC connection pool.
If you can get me a heap dump, it could help.

Then, is the heap usage still an issue for your test?

I was also wondering: what are you testing exactly? I mean, what do your virtual users do? Do they perform multiple requests, or are you just hitting some REST web service? If the latter, you could simply disable connection pooling.

Cheers

I suspected as much; I should have mentioned it, sorry.

No problem, this is a good way for me to learn more about Maven :)

I’ll investigate the Old Gen heap issue as soon as I can. It’s probably related to the AHC connection pool.
If you can get me a heap dump, it could help.

I’ve started a new test and will create a heap dump as soon as a hiccup has occurred and I see Old Gen growing.

Then, is the heap usage still an issue for your test?

For now: no. Based on the “closed system” test results, the application has gone live. I prefer the “open system” simulation, though, and would really like to test all our applications that way. To replace LoadRunner as the standard load test tool in our team, a tool should be able to run 48-hour endurance tests under “high load” using a reasonable amount of resources, and should survive SUT hiccups during those tests. Since I love working with Gatling, after being condemned to using LoadRunner for almost 10 years now, I hope Gatling will meet our requirements and we will migrate more of our scripts to it.

I was also wondering: what are you testing exactly? I mean, what do your virtual users do? Do they perform multiple requests, or are you just hitting some REST web service? If the latter, you could simply disable connection pooling.

The virtual users each make only one SOAP web service call, so I’ll try disabling connection pooling. Just to be sure, should I use these settings?

I've started a new test and will create a heap dump as soon as a hiccup
has occurred and I see Old Gen growing.

Thanks!

Then, is the heap usage still an issue for your test?

For now: no. Based on the "closed system" test results, the application has
gone live. I prefer the "open system" simulation, though, and would really
like to test all our applications that way. To replace LoadRunner as the
standard load test tool in our team, a tool should be able to run 48-hour
endurance tests under "high load" using a reasonable amount of resources,
and should survive SUT hiccups during those tests. Since I love working
with Gatling, after being condemned to using LoadRunner for almost 10
years now, I hope Gatling will meet our requirements and we will migrate
more of our scripts to it.

Great!
The good thing is that I know exactly how to fix this for good:
https://github.com/AsyncHttpClient/async-http-client/issues/679

The virtual users each make only one SOAP web service call, so I'll try
disabling connection pooling. Just to be sure, should I use these
settings?

allowPoolingConnections = false
allowPoolingSslConnections = false

Yeah.
But then, how many clients use your SOAP webservice?
If you're not trying to simulate some browser traffic, you have to
carefully consider your connection model.

The way you modeled it until now, you open 150 new connections per second
and let your server decide when to close them.
If you disable connection pooling, you still open 150 new connections per
second, but Gatling will forcefully close them after each request (no
keep-alive).

Is that really the behavior you want? How many alive connections do you
expect?
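
To put a rough number on how many requests such a model keeps open, Little's law gives: requests in flight ≈ arrival rate × time each request stays open. The sketch below is an illustration, not something from the thread. It uses the 150 per second mentioned above and the 60000 ms request timeout from the log; the 200 ms healthy response time is a made-up figure, and the hiccup case assumes every request hangs until it times out. `InFlightEstimate` and `inFlight` are hypothetical names.

```java
final class InFlightEstimate {

    /** Little's law: average number in the system = arrival rate x time spent in the system. */
    static long inFlight(double arrivalsPerSecond, long timeInSystemMillis) {
        return Math.round(arrivalsPerSecond * timeInSystemMillis / 1000.0);
    }

    public static void main(String[] args) {
        double usersPerSecond = 150.0;   // injection rate mentioned in the thread
        long requestTimeoutMs = 60_000;  // request timeout from the log line
        long healthyResponseMs = 200;    // hypothetical response time of a healthy SUT

        System.out.println("healthy: " + inFlight(usersPerSecond, healthyResponseMs));
        System.out.println("hiccup:  " + inFlight(usersPerSecond, requestTimeoutMs));
    }
}
```

Under these assumptions the open model goes from about 30 pending requests to about 9000 during a hiccup, each holding a connection and its per-request state until the timeout fires.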

I’ve started a new test and will create a heap dump as soon as a hiccup has occurred and I see Old Gen growing.

Thanks!

It’s in my Dropbox!


I just ran with connection pooling disabled and see the same behavior; I’ll upload a heap dump for that run as well tomorrow. For the next test I’ll indeed discuss and investigate what the expected number of connections will be. But is there a way in Gatling to limit the number of connections used and still emulate an “open system”, in other words, put a load on the SUT that is not influenced by the performance of the SUT?

Good night

Daniel
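
The difference between the two models in the question above can be sketched with simple arithmetic. This is only an illustration of the concepts and says nothing about which settings Gatling offers; `WorkloadModels` and its methods are made-up names, and the 150 users, 0.5 s response time and 0.5 s think time are hypothetical figures.

```java
final class WorkloadModels {

    /** Closed model: a fixed pool of users, each waiting for its response before sending again. */
    static double closedThroughput(int users, double responseSeconds, double thinkSeconds) {
        return users / (responseSeconds + thinkSeconds);
    }

    /** Open model: arrivals follow the injection profile; the response time is deliberately ignored. */
    static double openThroughput(double arrivalsPerSecond, double responseSeconds) {
        return arrivalsPerSecond;
    }

    public static void main(String[] args) {
        double[] responseTimes = {0.5, 59.5}; // healthy SUT vs. SUT close to the 60 s timeout
        for (double response : responseTimes) {
            System.out.println("response " + response + " s:"
                    + " closed=" + closedThroughput(150, response, 0.5) + " req/s,"
                    + " open=" + openThroughput(150.0, response) + " req/s");
        }
    }
}
```

In the closed case the offered load drops from 150 to 2.5 requests per second as the SUT slows down, which is why any hard cap on users or connections makes the test partly closed: once the cap is reached, the SUT's speed decides how much load it receives.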

The other heap dump is in my Dropbox too.

Cheers

Daniel