Amazon Elastic Load Balancer (ELB) and HTTP chunked response

Hi,

Just to share a problem that we encountered when running Gatling against an ELB.

There is a known issue in ELB:
“If the client cancels an HTTP request that was initiated with a Transfer-Encoding: chunked header, there is a known issue where the load balancer forwards the request to the instance even though the client canceled the request. This can cause backend errors.”

This creates a client-side error in Gatling because, by default, Gatling doesn’t accumulate response chunks unless a check is defined on the response.

If you only check the HTTP response status code and the response is chunked, you will get an error like:

16:46:08 15:46:08.458 [DEBUG] i.g.h.a.AsyncHandler - Request ‘Some request’ failed for user 2131514281386885109-0
16:46:08 java.lang.IllegalArgumentException: invalid version format: 0
16:46:08 at org.jboss.netty.handler.codec.http.HttpVersion.<init>(HttpVersion.java:94) ~[netty-3.10.4.Final.jar:na]
16:46:08 at org.jboss.netty.handler.codec.http.HttpVersion.valueOf(HttpVersion.java:62) ~[netty-3.10.4.Final.jar:na]
16:46:08 at org.jboss.netty.handler.codec.http.HttpResponseDecoder.createMessage(HttpResponseDecoder.java:104) ~[netty-3.10.4.Final.jar:na]
16:46:08 at org.jboss.netty.handler.codec.http.HttpMessageDecoder.decode(HttpMessageDecoder.java:191) ~[netty-3.10.4.Final.jar:na]
16:46:08 at org.jboss.netty.handler.codec.http.HttpClientCodec$Decoder.decode(HttpClientCodec.java:143) ~[netty-3.10.4.Final.jar:na]
16:46:08 at org.jboss.netty.handler.codec.http.HttpClientCodec$Decoder.decode(HttpClientCodec.java:127) ~[netty-3.10.4.Final.jar:na]
16:46:08 at org.jboss.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:500) ~[netty-3.10.4.Final.jar:na]
16:46:08 at org.jboss.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:435) ~[netty-3.10.4.Final.jar:na]
16:46:08 at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) ~[netty-3.10.4.Final.jar:na]
16:46:08 at org.jboss.netty.handler.codec.http.HttpClientCodec.handleUpstream(HttpClientCodec.java:92) ~[netty-3.10.4.Final.jar:na]
16:46:08 at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) [netty-3.10.4.Final.jar:na]
16:46:08 at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559) [netty-3.10.4.Final.jar:na]
16:46:08 at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268) [netty-3.10.4.Final.jar:na]
16:46:08 at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255) [netty-3.10.4.Final.jar:na]
16:46:08 at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88) [netty-3.10.4.Final.jar:na]
16:46:08 at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108) [netty-3.10.4.Final.jar:na]
16:46:08 at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337) [netty-3.10.4.Final.jar:na]
16:46:08 at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89) [netty-3.10.4.Final.jar:na]
16:46:08 at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178) [netty-3.10.4.Final.jar:na]
16:46:08 at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) [netty-3.10.4.Final.jar:na]
16:46:08 at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42) [netty-3.10.4.Final.jar:na]
16:46:08 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_60]
16:46:08 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_60]
16:46:08 at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60]
16:46:08 15:46:08.460 [WARN ] i.g.h.a.AsyncHandlerActor - Request ‘Some request’ failed: java.lang.IllegalArgumentException: invalid version format: 0

Here “0” is the end-of-chunk marker sent by the ELB, which Gatling is not expecting: it gets parsed as the start of a new HTTP response and does not match the expected HTTP version header.
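For illustration only (this is a made-up response, not the actual capture from this test), a chunked response terminates with a zero-length chunk; when that terminator is read out of context, the lone “0” gets parsed as the status line of a new HTTP response:

HTTP/1.1 201 Created
Transfer-Encoding: chunked

1c
{"id":42,"status":"created"}
0

Each line above ends with \r\n; the final “0” chunk followed by an empty line (“0\r\n\r\n”) is the normal end-of-body marker, and it is exactly what shows up as “invalid version format: 0” when the client treats it as the start of a new response.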

Note that this can be an intermittent problem, since the server may or may not decide to chunk a given response.

The solution is (as described in the Gatling documentation) to add a check on the response body, or to use the disableResponseChunksDiscarding or shareConnections options.
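For reference, here is a minimal sketch of how those options plug into a Gatling 2 simulation; the base URL and path are placeholders, and (as the follow-ups below show) these options turn out to only mitigate the issue rather than fix it:

import io.gatling.core.Predef._
import io.gatling.http.Predef._

class ElbChunkedSimulation extends Simulation {

  val httpConf = http
    .baseURL("http://my-elb.example.com") // placeholder endpoint behind the ELB
    .disableResponseChunksDiscarding      // keep response chunks instead of discarding them
    .shareConnections                     // all virtual users share one connection pool

  val scn = scenario("Chunked response").exec(
    http("Some request")
      .get("/resource")                         // placeholder path
      .check(status.in(201), bodyString.exists) // the body check forces Gatling to accumulate the chunks
  )

  setUp(scn.inject(atOnceUsers(1)).protocols(httpConf))
}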

It might be a good idea to emphasize this case in the documentation.

Keep up the good work

Regards

ben

Hi,

I don’t think it has anything to do with chunk discarding or sharing the connection pool.
I think it has to do with connection pooling in general.

Could you please provide more details?

  • Is it the request that used Transfer-Encoding: chunked, or the response?
  • Does disabling connection pooling solve the issue?
  • How does the cancel happen? A timeout?

Cheers,

The simulation sends the same kind of request (non-chunked) many times without errors; at some point the Tomcat server decides to chunk the response and I get the “invalid version format” error.
The chunked response spans 2 TCP packets, and the error occurs on the second one, which contains the end of the chunk (“0\r\n”).
The error happens only when using an ELB.

So far, adding .disableResponseChunksDiscarding to .check(status.in(201)) seems to fix the problem.

I will give shareConnections a try, but in my simulation there is only one user on this part.

Gatling marks the request as KO with “java.lang.IllegalArgumentException: invalid version format: 0” and continues using another connection, AFAIU.
I can send you the tcpdump if needed.

ben

Does this happen because the ELB keeps sending chunks from the previous request after the connection was closed and the port was eventually reused for a new one after the TCP wait?

Setting a check or disabling chunk discarding has nothing to do with the issue. At best, it has some influence on some race condition.
Gatling works downstream of Netty.

OK, my conclusion was wrong: disableResponseChunksDiscarding and checking the response body DO NOT fix the problem.
They do improve the situation, but the problem is still there; I just reproduced the error again. Sorry for the noise.

I am testing the shareConnections solution and will let you know (after much more shooting) if it works.

After more testing:

  • shareConnections, like disableResponseChunksDiscarding, improves the situation but doesn’t fix the problem.
  • Disabling connection pooling (gatling.conf allowPoolingConnections = false, sketched below) does fix the problem.
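
For reference, a minimal gatling.conf excerpt for this; the exact nesting of the ahc section may vary between Gatling 2.x versions, so check the gatling-defaults.conf that ships with your version:

gatling {
  http {
    ahc {
      allowPoolingConnections = false # open a fresh connection for every request (no keep-alive reuse)
    }
  }
}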

I will try to provide a procedure to reproduce the problem.
Regards

ben

I suspected as much.
Your issue is that you’re reading a chunk that comes from the previous request on this socket, so the client can’t find the expected response status line.

It can be:

  • a server/network issue, with a lingering/duplicate packet
  • a JDK/Netty/AHC issue, with the connection being offered to the pool too soon

A reproducer will tell.

Hi,

In the end, Amazon has acknowledged an ELB bug.
The ELB HTTP listener rewrites the chunked transfer-encoded response into a single-packet, Content-Length response.
They were able to reproduce the problem: the ELB generates a packet with an invalid Content-Length response, which triggers the Netty/Gatling error.

Possible workarounds for using Gatling against an ELB (until it is fixed) are:

  • Change the ELB configuration to use a TCP listener instead of an HTTP listener (not possible if you need affinity)
  • Don’t use client keep-alive (gatling.conf allowPoolingConnections = false)
  • Don’t use chunked responses on the server

Thanks Stéphane for your support

ben

Thanks a lot for the feedback.
Could you please let us know once the ELB issue is solved?

Hi guys, is there any news about this? Do you happen to have the Amazon case number? We’ve experienced the same issue and struggled for a whole week before hitting the same wall. We use OkHttp, and it fails parsing an HTTP response whose first line is “0”.

Thanks in advance.