[Gatling 2.0.0-M3] Actor AsyncHandlerActor doesn't support message ReceiveTimeout

Hi!

When an HTTP request times out I get the error below. Is this the correct behavior? Shouldn’t AsyncHandlerActor handle ReceiveTimeout, since it sets a receive timeout in preStart (using configuration.http.ahc.requestTimeOutInMs)? I think this might be why my scenario sometimes never completes.

[ERROR] [07/09/2013 14:00:08.180] [GatlingSystem-akka.actor.default-dispatcher-14] [akka://GatlingSystem/user/$V/$f] Actor AsyncHandlerActor doesn't support message ReceiveTimeout
java.lang.IllegalArgumentException: Actor AsyncHandlerActor doesn't support message ReceiveTimeout
at io.gatling.core.action.BaseActor.unhandled(Actions.scala:30)
at akka.actor.ActorCell$$anonfun$receiveMessage$1.apply(ActorCell.scala:425)
at akka.actor.ActorCell$$anonfun$receiveMessage$1.apply(ActorCell.scala:425)
at io.gatling.http.ahc.AsyncHandlerActor$$anonfun$receive$1.applyOrElse(AsyncHandlerActor.scala:54)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:425)

From the source of AsyncHandlerActor (note that neither AsyncHandlerActor nor BaseActor handles timeout messages):

class AsyncHandlerActor extends BaseActor {

  override def preStart {
    context.setReceiveTimeout(AsyncHandlerActor.timeout)
  }

  def receive = {
    case OnCompleted(task, response)               => processResponse(task, response)
    case OnThrowable(task, response, errorMessage) => ko(task, task.session, response, errorMessage)
  }
}
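
Just to illustrate what I would have expected (a sketch only, not a proposed patch, and I'm only guessing at what the timeout case should actually do), something like an explicit case inside AsyncHandlerActor's receive:

import akka.actor.ReceiveTimeout

// sketch: handle the timeout registered in preStart instead of letting it
// fall through to unhandled()
def receive = {
  case OnCompleted(task, response)               => processResponse(task, response)
  case OnThrowable(task, response, errorMessage) => ko(task, task.session, response, errorMessage)
  case ReceiveTimeout                            => // e.g. fail the pending request so the user isn't lost
}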

Thank you and best regards,
Frode

Hi Frode,

There are actually 2 (major) bugs:

Do you think you could try those fixes (either you build it yourself, or I upload a package)?

I’ll talk with the other committers, but I think we’ll release an M3a or something…

Cheers,

Stéphane

Thank you Stéphane!

I built it from master and it works perfectly. Timeouts are reported correctly as:
java.util.concurrent.TimeoutException: Request reached time out of 120000 ms after 120724 ms

And the non-termination issue is gone.

Great job! :slight_smile:

Regards
Frode

Glad to hear, and thanks a lot for your feedback!
We’ll release an M3a tomorrow, as those bugs are really blocking ones.

Cheers,

Stéphane

Hi there,

Even after upgrading to M3a, some of our simulations are not ending. Could it be due to either of these issues:

javax.net.ssl.SSLException: Received fatal alert: handshake_failure
at sun.security.ssl.Alerts.getSSLException(Alerts.java:208) ~[na:1.7.0_04]
at sun.security.ssl.SSLEngineImpl.fatal(SSLEngineImpl.java:1639) ~[na:1.7.0_04]
at sun.security.ssl.SSLEngineImpl.fatal(SSLEngineImpl.java:1607) ~[na:1.7.0_04]
at sun.security.ssl.SSLEngineImpl.recvAlert(SSLEngineImpl.java:1776) ~[na:1.7.0_04]
at sun.security.ssl.SSLEngineImpl.readRecord(SSLEngineImpl.java:1080) ~[na:1.7.0_04]
at sun.security.ssl.SSLEngineImpl.readNetRecord(SSLEngineImpl.java:884) ~[na:1.7.0_04]
at sun.security.ssl.SSLEngineImpl.unwrap(SSLEngineImpl.java:758) ~[na:1.7.0_04]
at javax.net.ssl.SSLEngine.unwrap(SSLEngine.java:624) ~[na:1.7.0_04]
at org.jboss.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1225) [netty-3.6.6.Final.jar:na]
Wrapped by: java.net.ConnectException: Received fatal alert: handshake_failure to https://whereever
at com.ning.http.client.providers.netty.NettyConnectListener.operationComplete(NettyConnectListener.java:103) ~[async-http-client-1.7.19.20130706.jar:na]
at org.jboss.netty.channel.DefaultChannelFuture.notifyListener(DefaultChannelFuture.java:427) [netty-3.6.6.Final.jar:na]
at org.jboss.netty.channel.DefaultChannelFuture.notifyListeners(DefaultChannelFuture.java:413) [netty-3.6.6.Final.jar:na]
at org.jboss.netty.channel.DefaultChannelFuture.setFailure(DefaultChannelFuture.java:380) [netty-3.6.6.Final.jar:na]
at org.jboss.netty.handler.ssl.SslHandler.setHandshakeFailure(SslHandler.java:1417) [netty-3.6.6.Final.jar:na]
at org.jboss.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1293) [netty-3.6.6.Final.jar:na]
at org.jboss.netty.handler.ssl.SslHandler.decode(SslHandler.java:913) [netty-3.6.6.Final.jar:na]

11:13:58.048 [WARN ] i.g.h.a.AsyncHandler - Request ‘Whatever’ failed
java.lang.NullPointerException: null
at org.jboss.netty.handler.codec.http.HttpClientCodec$Decoder.isContentAlwaysEmpty(HttpClientCodec.java:179) ~[netty-3.6.6.Final.jar:na]
at org.jboss.netty.handler.codec.http.HttpMessageDecoder.readHeaders(HttpMessageDecoder.java:514) ~[netty-3.6.6.Final.jar:na]
at org.jboss.netty.handler.codec.http.HttpMessageDecoder.decode(HttpMessageDecoder.java:193) ~[netty-3.6.6.Final.jar:na]
at org.jboss.netty.handler.codec.http.HttpClientCodec$Decoder.decode(HttpClientCodec.java:143) ~[netty-3.6.6.Final.jar:na]

Looks like a nasty one...

Having the simulation not ending means that some users are lost along the workflow and don't reach the end that would make the engine stop as expected. Could you double check that you use M3a (sorry if that's a stupid request)?

Regarding the handshake_failure, I would expect to see it also logged by the AsyncHandler, is that the case?
Regarding the NPE, it's logged by the AsyncHandler and should be handled gracefully.

Do you see those errors in simulation.log too?

I've deployed a custom gatling-http jar into Excilys' repo. Could you try it out, please? The version is 2.0.0-M3temp1 (force it with a dependencyManagement entry, for example).

Unfortunately I’m also still having non-terminating simulations. It appears to be caused by an exception thrown from a bodyString.transform() block. I think you will be able to reproduce it using something like:
bodyString.transform(_ => throw new Exception("hello"))

I just wrapped my code in a Try().toOption so it’s not a big deal, but maybe it should be handled in Gatling as well.
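
For reference, the check now looks roughly like this (a sketch only: the toInt stands in for my real parsing code, and the exact transform signature may differ between milestones):

import io.gatling.core.Predef._
import io.gatling.http.Predef._

import scala.util.Try

// sketch: wrap whatever may throw inside transform in Try(...).toOption,
// so a failure yields None instead of an exception that loses the user
val safeCheck = bodyString
  .transform(body => Try(body.trim.toInt).toOption) // body.trim.toInt stands in for the real parsing
  .saveAs("parsedValue")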

Regards,
Frode

Thanks Frode, I’ll try to reproduce ASAP.
What’s the exception you get exactly?

Oh, I think I misunderstood.
Do you mean that you did something custom with transform that crashed with an Exception, and that caused the message to be lost?

Yes. That is exactly what I’m doing. I’m using spray-json to parse the response into case classes and store them in the session. So if a parsing exception occurs, that user never finishes. Maybe it should be my responsibility to prevent this :slight_smile:

Frode

Looks like a nasty one…

Having the simulation not ending means that some users are lost along the workflow and don’t reach the end that would make the engine stop as expected. Could you double check that you use M3a (sorry if that’s a stupid request)?

Double checked. Using correct version.

Regarding the handshake_failure, I would expect to see it also logged by the AsyncHandler, is that the case?

Yes, seems like it:

11:59:33.524 [WARN ] i.g.h.a.AsyncHandler - Request ‘Whatever’ failed
javax.net.ssl.SSLException: Received fatal alert: handshake_failure
at sun.security.ssl.Alerts.getSSLException(Alerts.java:208) ~[na:1.7.0_04]
at sun.security.ssl.SSLEngineImpl.fatal(SSLEngineImpl.java:1639) ~[na:1.7.0_04]
at sun.security.ssl.SSLEngineImpl.fatal(SSLEngineImpl.java:1607) ~[na:1.7.0_04]
at sun.security.ssl.SSLEngineImpl.recvAlert(SSLEngineImpl.java:1776) ~[na:1.7.0_04]
at sun.security.ssl.SSLEngineImpl.readRecord(SSLEngineImpl.java:1080) ~[na:1.7.0_04]
at sun.security.ssl.SSLEngineImpl.readNetRecord(SSLEngineImpl.java:884) ~[na:1.7.0_04]
at sun.security.ssl.SSLEngineImpl.unwrap(SSLEngineImpl.java:758) ~[na:1.7.0_04]
at javax.net.ssl.SSLEngine.unwrap(SSLEngine.java:624) ~[na:1.7.0_04]
at org.jboss.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1225) [netty-3.6.6.Final.jar:na]
Wrapped by: java.net.ConnectException: Received fatal

Regarding the NPE, it’s logged by the AsyncHandler and should be handled gracefully.

Do you see those errors in simulation.log too?

REQUEST Start dokumentinnsending og last opp dokumenter 47 Last opp dokument 1373450373158 1373450373158 1373450373158 1373450373235 KO Received fatal alert: handshake_failure to https://e34jbsl00680.devillo.no:8443/henvendelse/services/domene.Brukerdialog/OppdatereHenvendelsesBehandlingService_v1

REQUEST Start dokumentinnsending og last opp dokumenter 290 Last opp dokument 1373450538407 1373450538407 1373450538407 1373450538408 KO java.lang.NullPointerException

I’ve deployed a custom gatling-http jar into Excilys’ repo. Could you try it out, please? The version is 2.0.0-M3temp1 (force it with a dependencyManagement entry, for example).

I’m on it. The thing is, it’s an intermittent error, so it might take some time to reproduce. I’ll keep you posted.

Reproduced with the M3temp1 dependency. The Maven plugin reports 1 running user and the simulation never finishes.

The only “fancy” things we’re doing are:

  • checks with regexes
  • injecting a custom SAML assertion into the session like so:

val injectSamlAssertion = (session: Session) => {
  session("id").validate[String].map(id =>
    session.setAll(Map("id" → id, "saml" → saml.generateSignedAssertion(id)))
  )
}

exec(injectSamlAssertion)

Stefan, this is a long shot, but I have experienced the same NPE and we are probably working in similar environments :wink:
Are those multipart uploads?
Is BigIp involved?

Does it help if you set .connection("close") in the httpProtocolConfig?

I have wasted two days trying to figure out some weird issue with multipart, BigIp and Connection: keep-alive. It seems that BigIp sends additional response data on the same connection after the regular response. This causes the next request to fail. And since the buffer then contains response data without any HTTP version and status code (just HTML), I get the NPE that you mention. Either a pause or Connection: close seems to work around it.
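
In protocol-config terms, the workaround is roughly this (a sketch; the baseURL is just a placeholder, and I may be misremembering the exact M3 builder names):

import io.gatling.core.Predef._
import io.gatling.http.Predef._

// sketch: force Connection: close so every request gets a fresh connection,
// instead of reusing a keep-alive connection that BigIp has polluted with
// leftover response data (the baseURL below is a placeholder)
val httpProtocolConfig = http
  .baseURL("https://whereever")
  .connection("close")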

Regards,
Frode

I’ve pushed:

@Frode: regarding your BigIp problem, did you check this with something like Charles or Wireshark? I’d hate this to be a kind of buffer overflow in Gatling/AHC/Netty…

Yes, multipart uploads and loads of them, but currently no BigIp.
Since we’re using a crappy app server and are terminating SSL in that same crappy app server, I don’t want to close the connections but rather shareConnection() in order to reuse them (the SSL handshake is quite heavy and kills our CPU).
We saw some HTTP 505s earlier, but then we introduced some pauses to get rid of the problem. We still pause after every multipart request, actually.
Could it be that AHC is not flushing the buffer or something? Could it be that the server mixes up the requests? Request boundaries not OK or something?

@Stefan, with M3temp1, any message in the logs that would look like “AsyncHandlerActor crashed on message”?

I’ve pushed:

Thank you!

@Frode: regarding your BigIp problem, did you check this with something like Charles or Wireshark? I’d hate this to be a kind of buffer overflow in Gatling/AHC/Netty…

I’m signing off now, but I’ll describe it in detail tomorrow morning!

Have a nice day!
Frode

Thanks a lot for your help!

No, I only see stuff like

15:01:07.498 [WARN ] i.g.h.a.AsyncHandler - Request ‘Opprett dokumentbehandling’ failed for user 122
java.lang.NullPointerException: null
at org.jboss.netty.handler.codec.http.HttpClientCodec$Decoder.isContentAlwaysEmpty(HttpClientCodec.java:179) ~[netty-3.6.6.Final.jar:na]

Can’t find any references to “AsyncHandlerActor” in my logs.

Cheers

Stefan
