Gatling scenario never ends for some users

Hi,
I recently ran into a weird problem with a Gatling simulation scenario for testing an eCommerce site.
Basically, it happens that sometimes (quite often recently) the simulation doesn’t finish properly: a few users (1, 2, sometimes 4) are still shown as active, but I can’t see any more requests in the simulation.log. So I took a thread dump on the Gatling injector server and noticed that there is roughly one thread for each of the active users that doesn’t complete its navigation, with the following pattern:

at scala.runtime.ScalaRunTime$.hash(ScalaRunTime.scala:206)
at scala.collection.immutable.HashMap.elemHashCode(HashMap.scala:80)
at scala.collection.immutable.HashMap.computeHash(HashMap.scala:89)
at scala.collection.immutable.HashMap.get(HashMap.scala:54)
at scala.collection.GenMapLike$$anonfun$liftedTree1$1$1.apply(GenMapLike.scala:120)
at scala.collection.GenMapLike$$anonfun$liftedTree1$1$1.apply(GenMapLike.scala:119)
at scala.collection.Iterator$class.forall(Iterator.scala:905)
at scala.collection.AbstractIterator.forall(Iterator.scala:1336)
at scala.collection.IterableLike$class.forall(IterableLike.scala:75)
at scala.collection.AbstractIterable.forall(Iterable.scala:54)
at scala.collection.GenMapLike$class.liftedTree1$1(GenMapLike.scala:119)
at scala.collection.GenMapLike$class.equals(GenMapLike.scala:118)
at scala.collection.AbstractMap.equals(Map.scala:59)
at io.gatling.http.cache.Cookies.equals(Cookies.scala:33)
at io.gatling.http.cache.PermanentRedirectCacheKey.equals(PermanentRedirectCacheSupport.scala:33)
at scala.collection.immutable.Map$Map1.get(Map.scala:107)
at io.gatling.core.util.cache.Cache.get(Cache.scala:79)
at io.gatling.core.util.cache.SessionCacheHandler.io$gatling$core$util$cache$SessionCacheHandler$$$anonfun$1(SessionCacheHandler.scala:64)
at io.gatling.core.util.cache.SessionCacheHandler$$Lambda$323/1463745981.apply(Unknown Source)
at scala.Option.flatMap(Option.scala:171)
at io.gatling.core.util.cache.SessionCacheHandler.getEntry(SessionCacheHandler.scala:64)
at io.gatling.http.cache.PermanentRedirectCacheSupport$class.permanentRedirect1$1(PermanentRedirectCacheSupport.scala:55)
at io.gatling.http.cache.PermanentRedirectCacheSupport$class.io$gatling$http$cache$PermanentRedirectCacheSupport$$permanentRedirect(PermanentRedirectCacheSupport.scala:64)
at io.gatling.http.cache.PermanentRedirectCacheSupport$class.applyPermanentRedirect(PermanentRedirectCacheSupport.scala:75)
at io.gatling.http.cache.HttpCaches.applyPermanentRedirect(HttpCaches.scala:24)
at io.gatling.http.action.sync.HttpTx$.startWithCache(HttpTx.scala:55)
at io.gatling.http.action.sync.HttpTx$.start(HttpTx.scala:94)
at io.gatling.http.ahc.ResponseProcessor.redirect$1(ResponseProcessor.scala:274)
at io.gatling.http.ahc.ResponseProcessor.processResponse(ResponseProcessor.scala:317)
at io.gatling.http.ahc.ResponseProcessor.onCompleted(ResponseProcessor.scala:66)
at io.gatling.http.ahc.AsyncHandler.io$gatling$http$ahc$AsyncHandler$$$anonfun$1(AsyncHandler.scala:136)
at io.gatling.http.ahc.AsyncHandler.io$gatling$http$ahc$AsyncHandler$$$anonfun$1$adapted(AsyncHandler.scala:134)
at io.gatling.http.ahc.AsyncHandler$$Lambda$331/434896179.apply(Unknown Source)
at io.gatling.http.ahc.AsyncHandler.withResponse(AsyncHandler.scala:127)
at io.gatling.http.ahc.AsyncHandler.onCompleted(AsyncHandler.scala:134)
at io.gatling.http.ahc.AsyncHandler.onCompleted(AsyncHandler.scala:47)
at org.asynchttpclient.netty.NettyResponseFuture.getContent(NettyResponseFuture.java:189)
at org.asynchttpclient.netty.NettyResponseFuture.done(NettyResponseFuture.java:223)
at org.asynchttpclient.netty.handler.HttpHandler.finishUpdate(HttpHandler.java:58)
at org.asynchttpclient.netty.handler.HttpHandler.handleChunk(HttpHandler.java:159)
at org.asynchttpclient.netty.handler.HttpHandler.handleRead(HttpHandler.java:187)
at org.asynchttpclient.netty.handler.AsyncHttpClientHandler.channelRead(AsyncHttpClientHandler.java:76)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:357)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:343)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:336)
at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:357)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:343)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:336)
at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:357)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:343)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:336)
at io.netty.channel.CombinedChannelDuplexHandler$DelegatingChannelHandlerContext.fireChannelRead(CombinedChannelDuplexHandler.java:435)
at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:293)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:267)
at io.netty.channel.CombinedChannelDuplexHandler.channelRead(CombinedChannelDuplexHandler.java:250)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:357)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:343)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:336)
at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1228)
at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1039)
at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:411)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:248)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:357)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:343)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:336)
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1294)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:357)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:343)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:911)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:643)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:566)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:480)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:442)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:131)
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)
at java.lang.Thread.run(Thread.java:748)

From the stack trace I guessed it might be a problem related to infinite redirects, so I also added .maxRedirects(n) on the HTTP protocol to avoid an infinite redirect loop, but the problem is still there. And, by the way, I found no trace in the log that many redirects were issued, particularly toward the end of the tests when only the 1 or 2 active users were left.
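
In case it helps, this is roughly how I added it (a minimal sketch of my 2.x protocol configuration; the base URL and the value of n are placeholders, not my real setup):

import io.gatling.core.Predef._
import io.gatling.http.Predef._

// Cap the number of redirects Gatling will follow for a single request,
// so a redirect loop fails the request instead of spinning forever.
val httpProtocol = http
  .baseURL("https://shop.example.com") // placeholder, not the real site
  .maxRedirects(5)                     // the "n" mentioned above; any small cap works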

Do you have any idea what the problem could be?

When the issue starts to happen I also see a huge spike on the CPU of the Gatling servers, which then stays up until the end of the test.

gatlingInjectorCPU.png

You can see that around 19:00 the issue happens (just 1 user remains active at the end of the test), and on a server with just 2 CPUs almost 50% of the CPU (1 core) was fully used. This is probably due to the thread that we still see RUNNABLE at the end of the test (as if it were consuming a whole core in an infinite loop or something similar).

Has anyone experienced a similar problem?

Oh sorry, I’m on gatling 2.5

Hi,

Oh sorry, I’m on gatling 2.5

Nope. There’s no 2.5

Could you please upgrade to 3.0.0-RC3 and check if your issue is fixed?

Stéphane Landelle

GatlingCorp CTO
slandelle@gatling.io


Apologies… it was 2.2.5.

Regarding the migration, I can give it a try, but probably not for the engagement with this customer… I won’t have enough time.

Thanks anyway

Gatling 2.2.5 was released in April 2017…
The latest stable is 2.3.1, and it will reach end of life in 2 weeks.

Hi Stéphane,
since you gave me the same advice for both problems (the never-ending simulation and the timeouts on requests that never connected), I convinced the customer to let me invest some time in the upgrade. It was less painful than expected and the tests are now running on Gatling 3.0. Regarding the problem of the scenario never ending, I’ve found this promising error in the new report:

Invalid redirect to the same request

This was probably causing the infinite loop. Now my problem is to find out which requests those are… on Gatling 2.2.5 I had 2 options for this (rough sketches after the list below):

  • I enriched the simulation log with more information using extraInfoExtractor. But this no longer seems to be an option, so now if I grep the simulation log for the error I don’t see the URL that was called.

  • there were a couple of loggers, io.gatling.http.ahc and io.gatling.http.response, that when set to TRACE or DEBUG level logged the request and response bodies of all calls / failed calls.
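
For context, this is roughly what I had on 2.2.5 (sketches from memory, so exact field names may be slightly off; the base URL is a placeholder):

import io.gatling.core.Predef._
import io.gatling.http.Predef._

// Gatling 2.x only: enrich simulation.log with the URL and status of each request
// via extraInfoExtractor (no longer available in 3.0, hence my question).
val httpProtocolWithExtraInfo = http
  .baseURL("https://shop.example.com") // placeholder
  .extraInfoExtractor { extraInfo =>
    List(extraInfo.request.getUrl, extraInfo.response.statusCode.getOrElse(-1))
  }

and in logback.xml I enabled the 2.x loggers mentioned above:

<!-- 2.x logger names: TRACE logs all requests/responses, DEBUG only the failed ones -->
<logger name="io.gatling.http.ahc" level="TRACE" />
<logger name="io.gatling.http.response" level="TRACE" />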

Can you advise how to achieve something similar with version 3.0?

Thanks a lot

  • there were a couple of loggers, io.gatling.http.ahc and io.gatling.http.response, that when set to TRACE or DEBUG level logged the request and response bodies of all calls / failed calls.

https://github.com/gatling/gatling/blob/master/gatling-core/src/main/resources/logback.dummy#L11-L13

Stéphane Landelle

GatlingCorp CTO
slandelle@gatling.io

Thanks a lot Stéphane,
that was what I was looking for.