Amazon Elastic Load Balancer (ELB) and HTTP chunked response

Hi,

Just to share a problem that we encountered when running Gatling against an ELB.

There is a known issue in ELB:
“If the client cancels an HTTP request that was initiated with a Transfer-Encoding: chunked header, there is a known issue where the load balancer forwards the request to the instance even though the client canceled the request. This can cause backend errors.”

This creates a client-side error in Gatling because, by default, Gatling doesn’t accumulate response chunks unless a check is defined on the response.

If you only check the HTTP response status code and the response is chunked, you will get an error like:

16:46:08 15:46:08.458 [DEBUG] i.g.h.a.AsyncHandler - Request ‘Some request’ failed for user 2131514281386885109-0
16:46:08 java.lang.IllegalArgumentException: invalid version format: 0
16:46:08 at org.jboss.netty.handler.codec.http.HttpVersion.<init>(HttpVersion.java:94) ~[netty-3.10.4.Final.jar:na]
16:46:08 at org.jboss.netty.handler.codec.http.HttpVersion.valueOf(HttpVersion.java:62) ~[netty-3.10.4.Final.jar:na]
16:46:08 at org.jboss.netty.handler.codec.http.HttpResponseDecoder.createMessage(HttpResponseDecoder.java:104) ~[netty-3.10.4.Final.jar:na]
16:46:08 at org.jboss.netty.handler.codec.http.HttpMessageDecoder.decode(HttpMessageDecoder.java:191) ~[netty-3.10.4.Final.jar:na]
16:46:08 at org.jboss.netty.handler.codec.http.HttpClientCodec$Decoder.decode(HttpClientCodec.java:143) ~[netty-3.10.4.Final.jar:na]
16:46:08 at org.jboss.netty.handler.codec.http.HttpClientCodec$Decoder.decode(HttpClientCodec.java:127) ~[netty-3.10.4.Final.jar:na]
16:46:08 at org.jboss.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:500) ~[netty-3.10.4.Final.jar:na]
16:46:08 at org.jboss.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:435) ~[netty-3.10.4.Final.jar:na]
16:46:08 at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) ~[netty-3.10.4.Final.jar:na]
16:46:08 at org.jboss.netty.handler.codec.http.HttpClientCodec.handleUpstream(HttpClientCodec.java:92) ~[netty-3.10.4.Final.jar:na]
16:46:08 at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) [netty-3.10.4.Final.jar:na]
16:46:08 at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559) [netty-3.10.4.Final.jar:na]
16:46:08 at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268) [netty-3.10.4.Final.jar:na]
16:46:08 at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255) [netty-3.10.4.Final.jar:na]
16:46:08 at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88) [netty-3.10.4.Final.jar:na]
16:46:08 at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108) [netty-3.10.4.Final.jar:na]
16:46:08 at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337) [netty-3.10.4.Final.jar:na]
16:46:08 at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89) [netty-3.10.4.Final.jar:na]
16:46:08 at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178) [netty-3.10.4.Final.jar:na]
16:46:08 at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) [netty-3.10.4.Final.jar:na]
16:46:08 at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42) [netty-3.10.4.Final.jar:na]
16:46:08 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_60]
16:46:08 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_60]
16:46:08 at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60]
16:46:08 15:46:08.460 [WARN ] i.g.h.a.AsyncHandlerActor - Request ‘Some request’ failed: java.lang.IllegalArgumentException: invalid version format: 0

Here “0” is the end-of-chunk marker sent by the ELB, which Gatling is not expecting: it gets parsed as the start of a new HTTP response and does not match the expected HTTP version header.
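For illustration only (this is a made-up response, not the actual capture from this test), a chunked response terminates with a zero-length chunk; when that terminator is read out of context, the lone “0” gets parsed as the status line of a new HTTP response:

HTTP/1.1 201 Created
Transfer-Encoding: chunked

1c
{"id":42,"status":"created"}
0

Each line above ends with \r\n; the final “0” chunk followed by an empty line (“0\r\n\r\n”) is the normal end-of-body marker, and it is exactly what shows up as “invalid version format: 0” when the client treats it as the start of a new response.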

Note that this can be an intermittent problem, since the server may or may not decide to chunk a given response.

The solution is (as described in the Gatling documentation) to add a check on the response body, or to use the disableResponseChunksDiscarding or shareConnections options.
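For reference, here is a minimal sketch of how those options plug into a Gatling 2 simulation; the base URL and path are placeholders, and (as the follow-ups below show) these options turn out to only mitigate the issue rather than fix it:

import io.gatling.core.Predef._
import io.gatling.http.Predef._

class ElbChunkedSimulation extends Simulation {

  val httpConf = http
    .baseURL("http://my-elb.example.com") // placeholder endpoint behind the ELB
    .disableResponseChunksDiscarding      // keep response chunks instead of discarding them
    .shareConnections                     // all virtual users share one connection pool

  val scn = scenario("Chunked response").exec(
    http("Some request")
      .get("/resource")                         // placeholder path
      .check(status.in(201), bodyString.exists) // the body check forces Gatling to accumulate the chunks
  )

  setUp(scn.inject(atOnceUsers(1)).protocols(httpConf))
}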

It might be a good idea to emphasize this case in the documentation.

Keep up the good work

Regards

ben

Hi,

I don’t think it has anything to do with chunk discarding or sharing the connection pool.
I think it has to do with connection pooling in general.

Could you please provide more details?

  • Is it the request that used Transfer-Encoding: chunked, or the response?
  • Does disabling connection pooling solve the issue?
  • How does the cancel happen? A timeout?

Cheers,

The simulation sends the same kind of request (non-chunked) many times without errors; at some point the Tomcat server decides to chunk the response and I get the “invalid version format” error.
The chunked response spans 2 TCP packets, and the error occurs on the second one, which contains the end of the chunk (“0\r\n”).
The error happens only when using an ELB.

So far, adding .disableResponseChunksDiscarding to .check(status.in(201)) seems to fix the problem.

I will give shareConnections a try, but in my simulation there is only one user on this part.

Gatling marks the request as KO with “java.lang.IllegalArgumentException: invalid version format: 0” and continues using another connection, AFAIU.
I can send you the tcpdump if needed.

ben

Does this happen because the ELB keeps sending chunks from the previous request after the connection was closed and the port was eventually reused for a new one after the TCP wait?

Setting a check or disabling chunk discarding has nothing to do with the issue. At best, it has some influence on some race condition.
Gatling works downstream of Netty.

OK, my conclusion was wrong: disableResponseChunksDiscarding and checking the response body DO NOT fix the problem.
They do improve the situation, but the problem is still there; I just reproduced the error again. Sorry for the noise.

I am testing the shareConnections solution and will let you know (after much more shooting) if it works.

After more testing:

  • shareConnections, like disableResponseChunksDiscarding, improves the situation but doesn’t fix the problem.
  • Disabling connection pooling (gatling.conf allowPoolingConnections = false, sketched below) does fix the problem.
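
For reference, a minimal gatling.conf excerpt for this; the exact nesting of the ahc section may vary between Gatling 2.x versions, so check the gatling-defaults.conf that ships with your version:

gatling {
  http {
    ahc {
      allowPoolingConnections = false # open a fresh connection for every request (no keep-alive reuse)
    }
  }
}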

I will try to provide a procedure to reproduce the problem.
Regards

ben

I suspected as much.
Your issue is that you’re reading a chunk that comes from the previous request on this socket, so the client can’t find the expected response status line.

It can be:

  • a server/network issue, with a lingering/duplicate packet
  • a JDK/Netty/AHC issue, with the connection being offered to the pool too soon

A reproducer will tell.

Hi,

In the end, Amazon has acknowledged an ELB bug.
The ELB HTTP listener rewrites the chunked transfer-encoded response into a single-packet, Content-Length response.
They were able to reproduce the problem: the ELB generates a packet with an invalid Content-Length response, which triggers the Netty/Gatling error.

Possible workarounds for using Gatling against an ELB (until it is fixed) are:

  • Change the ELB configuration to use a TCP listener instead of an HTTP listener (not possible if you need affinity)
  • Don’t use client keep-alive (gatling.conf allowPoolingConnections = false)
  • Don’t use chunked responses on the server

Thanks Stéphane for your support

ben

Thanks a lot for the feedback.
Could you please let us know once the ELB issue is solved?

Hi guys, is there any news about this? Do you happen to have the Amazon case number? We’ve experienced the same issue and struggled for a whole week before hitting the same wall. We use OkHttp, and it fails parsing an HTTP response whose first line is “0”.

Thanks in advance.