Does the Netty HTTP client have a limit on response body size?

Hi,

We have a very large application database, and some responses without pagination are very big (about 15 MB).
The scenario works for small responses, but for the big response a “Remotely Closed” error is thrown.

I have increased a few Gatling parameters to avoid the connection being closed by a timeout:

connectionTimeout = 1800000 # timeout when establishing a connection
idleConnectionInPoolTimeoutInMs = 1800000 # timeout when a connection stays unused in the pool
idleConnectionTimeoutInMs = 1800000 # timeout when a used connection stays idle
maxRetry = 4 # number of times that a request should be tried again
requestTimeoutInMs = 1800000 # timeout of the requests
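For context, the scenario itself is trivial; stripped down it is roughly this, in Gatling’s Scala DSL (the endpoint path here is made up):

val scn = scenario("big response")
  .pause(10, 30) // random pause, in seconds
  .exec(
    http("request_2")
      .get("/report/full") // unpaginated endpoint, ~15 MB response
      .check(status.is(200)))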

The connection is not actually closed by the remote server, because the same scenario works in a web browser, although it takes more than a minute.
So I suspect the Netty client is causing this.

Thanks in advance,
Ievgen

The fact that you need to increase that timeout just to get the page to load already screams “badly designed, doomed to fail in production, go back to start, do not pass boardwalk”.

Even if these responses never leave the company, there is no excuse for unpaginated responses of that size. Put any kind of serious load on that and any non-tuned application server is bound to hit all kinds of resource limits (connections, memory, bandwidth …) really, really quickly.

If I were to find an application written like that, the first thing I’d do is write a quick application impact study detailing exactly how much of a failure it is and why. Single-user response times above 8 seconds already fail even the least stringent of the company-wide requirements on web page response times, without performing a single load test, and pointing that out would be our first priority before we started doing tests that are likely to be unnecessary.

(The actual requirement is having 99% of the end user responses in less than 8 seconds (including rendering!), for pages in the ‘heavy’ category. Normal pages have that requirement lowered to 3 seconds, and frankly, lowering that further to 1 second is not unreasonable.)
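For what it’s worth, that kind of requirement can be encoded straight into the test so it fails loudly; in today’s Gatling DSL (not the 1.x version used in this thread) it would look roughly like this:

setUp(scn.inject(atOnceUsers(1)))
  .assertions(
    global.responseTime.percentile(99).lt(8000)) // ‘heavy’ page budget; 3000 for normal pages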

So, think hard before you continue down this path. Is the customer better served by jumping through hoops to get something of a load test going, or is he better served by you telling the developers that this will never pass whatever requirements your customer has, regardless of how much testing you do?

Yeah, I know that! We have a page load standard of no more than 7 seconds for heavy pages.
But this one is an exception, so that all the data can be fetched in one request.

On Friday, November 22, 2013 at 13:15:06 UTC+2, Floris Kraak wrote:

Hi,

No, there’s no limit on the response body.

Is the problem systematic or random? Does it happen under load, or even with just one user? Does it also happen when the user’s connection pool is empty (more than a 5 s pause before performing the request; IIRC, your keep-alive is set to 5 s)?
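If the pool is implicated, a quick experiment would be to disable connection pooling entirely so that every request opens a fresh connection. AHC 1.7 has a switch for this in the same config section as the timeouts above (double-check the exact key for your version):

allowPoolingConnection = false # open a fresh connection per request instead of reusing pooled ones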

Stéphane

If the client really wants that much data in one response, why not deliver it as a file download rather than some really heavy web page? This sounds like something that could also be put in a large Excel file or a PDF of some kind.
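For what it’s worth, flagging the response as a download is just a header; something along these lines (content type and filename are made up):

HTTP/1.1 200 OK
Content-Type: application/vnd.ms-excel
Content-Disposition: attachment; filename="full-report.xls"

The browser then streams the body to disk instead of trying to render a 15 MB page.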

I stand by my assertion that this kind of design is bound to fail. Pagination wasn’t invented for no reason…
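And consuming a paginated endpoint from the load test side is only a few lines; a sketch in the modern Gatling DSL (the page/size parameter names are invented):

repeat(30, "page") {
  exec(
    http("report page")
      .get("/report")
      .queryParam("page", "#{page}")
      .queryParam("size", "500"))
}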

The problem is permanent, even with one user.
Before the request I have a random pause in the range of 10–30 seconds.
Keep-Alive is 5 seconds.

On Friday, November 22, 2013 at 13:46:47 UTC+2, Stéphane Landelle wrote:

Could you please provide a Wireshark dump? Or, even better, a test case or private access so we can reproduce.

Sorry, unfortunately not this time.
My home lab cannot accommodate such an environment, and the real one is running at work.
But here we have strict rules.

There is nothing special in the tcpdump. The request is handled by Apache and I’m able to see it in the log (as opposed to last time, when there was no trace of the requests).
In the Apache log I see a 200 response code, which means the request was handled and a response was provided.
But Gatling somehow considers it failed, because I can see it retrying the same request the number of times specified in the maxRetry property.

On Friday, November 22, 2013 at 15:09:38 UTC+2, Stéphane Landelle wrote:

The status code is the first response element sent on the wire. Getting a 200 means that the server could successfully compute the response and will start writing the expected response body. It’s during this phase that the connection is being closed. The questions are why, and by whom?
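To illustrate, on the wire the exchange looks roughly like this (content type and chunk sizes are hypothetical):

HTTP/1.1 200 OK                <- status line goes first; this is the 200 the Apache log shows
Content-Type: application/json
Transfer-Encoding: chunked

4000
{"rows":[ ...                  <- ~15 MB of body still streaming; the close that
...                               Gatling reports happens somewhere in here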

Yeah, and I think Gatling causes this, because from a browser it takes a lot of time but the response is read completely.
The only question is why Gatling considers the response failed.

Maybe I can enable full logging of the client that Gatling uses and trace this issue down?

On Friday, November 22, 2013 at 15:30:15 UTC+2, Stéphane Landelle wrote:

Gatling considers the response failed because AsyncHttpClient threw an exception while still reading the response, because the socket was closed.
The root cause could be anywhere: your infrastructure, your NIC, your OS, your JDK, Netty… In any case, it happens in a different layer than the client one, so logging there won’t help. A Wireshark dump could, maybe.
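If capturing on the injector machine is possible, something like this would do (the interface name is a guess; adjust for your box):

tcpdump -i eth0 -s 0 -w big-response.pcap host app01.projects.local and port 80

The resulting .pcap file opens in Wireshark, which will show which side sent the FIN or RST that closed the connection.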

That’s really something I can’t investigate without getting my hands on a test case and being able to reproduce, sorry.

Not entirely true: having client logging may/should/could reveal how much of the data was received before the socket was closed, which would then provide some hints as to what might be going on here.

In theory, you’re right. The thing is, I’m pretty sure there’s currently no debug logging on response chunks.
But I could be wrong, and there’s no harm in giving it a try. @Ievgen, please lower the logging level to trace in gatling.conf.
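Depending on the Gatling version, the logging setup may live in conf/logback.xml instead; judging from the stack trace below, the loggers of interest would be these (exact placement may differ):

<logger name="com.ning.http" level="TRACE"/>
<logger name="org.jboss.netty" level="TRACE"/>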

I suspected as much. Hence the ‘may’.

LoadRunner tends to be a fair bit lower-level. Without intervening layers, logging exactly what was received is quite a bit easier.

Here is all I got in the console log:

17:41:07.223 [WARN ] i.g.h.a.AsyncHandler - Request ‘request_2 Redirect 2’ failed
java.io.IOException: Remotely Closed [id: 0xf084adf8, /127.0.0.1:57951 :> app01.projects.local/192.168.241.58:80]
at com.ning.http.client.providers.netty.NettyAsyncHttpProvider.channelClosed(NettyAsyncHttpProvider.java:1388) [async-http-client-1.7.19.20130706.jar:na]
at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:88) [netty-3.6.6.Final.jar:na]
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) [netty-3.6.6.Final.jar:na]
at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791) [netty-3.6.6.Final.jar:na]
at org.jboss.netty.handler.stream.ChunkedWriteHandler.handleUpstream(ChunkedWriteHandler.java:142) [netty-3.6.6.Final.jar:na]
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) [netty-3.6.6.Final.jar:na]
at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791) [netty-3.6.6.Final.jar:na]
at org.jboss.netty.channel.SimpleChannelUpstreamHandler.channelClosed(SimpleChannelUpstreamHandler.java:225) [netty-3.6.6.Final.jar:na]
at org.jboss.netty.handler.codec.http.HttpContentDecoder.channelClosed(HttpContentDecoder.java:147) [netty-3.6.6.Final.jar:na]
at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:88) [netty-3.6.6.Final.jar:na]
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) [netty-3.6.6.Final.jar:na]
at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791) [netty-3.6.6.Final.jar:na]
at org.jboss.netty.handler.codec.replay.ReplayingDecoder.cleanup(ReplayingDecoder.java:570) [netty-3.6.6.Final.jar:na]
at org.jboss.netty.handler.codec.frame.FrameDecoder.channelClosed(FrameDecoder.java:371) [netty-3.6.6.Final.jar:na]
at org.jboss.netty.handler.codec.http.HttpClientCodec$Decoder.channelClosed(HttpClientCodec.java:221) [netty-3.6.6.Final.jar:na]
at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:88) [netty-3.6.6.Final.jar:na]
at org.jboss.netty.handler.codec.http.HttpClientCodec.handleUpstream(HttpClientCodec.java:92) [netty-3.6.6.Final.jar:na]
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) [netty-3.6.6.Final.jar:na]
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559) [netty-3.6.6.Final.jar:na]
at org.jboss.netty.channel.Channels.fireChannelClosed(Channels.java:468) [netty-3.6.6.Final.jar:na]
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.close(AbstractNioWorker.java:376) [netty-3.6.6.Final.jar:na]
at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:93) [netty-3.6.6.Final.jar:na]
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:109) [netty-3.6.6.Final.jar:na]
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312) [netty-3.6.6.Final.jar:na]
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90) [netty-3.6.6.Final.jar:na]
at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178) [netty-3.6.6.Final.jar:na]
at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) [netty-3.6.6.Final.jar:na]
at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42) [netty-3.6.6.Final.jar:na]
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [na:1.7.0_21]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [na:1.7.0_21]
at java.lang.Thread.run(Unknown Source) [na:1.7.0_21]
17:41:07.251 [WARN ] i.g.h.a.AsyncHandlerActor - Request ‘request_2 Redirect 2’ failed : Remotely Closed [id: 0xf084adf8, /127.0.0.1:57951 :> app01.projects.local/192.168.241.58:80]
17:41:07.267 [TRACE] i.g.h.a.AsyncHandlerActor -

Request:
request_2 Redirect 2: KO Remotely Closed [id: 0xf084adf8, /127.0.0.1:57951 :> app01.projects.local/192.168.241.58:80]

As I expected; you’d have a better chance with a Wireshark dump.

Hi,

I found some time today to try to debug with tcpdump.
I noticed that the host where Gatling is running received the complete response. So it looks like Gatling considers it failed anyway.

On Sunday, November 24, 2013 at 19:23:42 UTC+2, Stéphane Landelle wrote:

Hi,

Do you have a stack trace on the Gatling side?

The stack trace is the same as the one I provided in a reply above: the connection is remotely closed.

On Tuesday, November 26, 2013 at 12:36:49 UTC+2, Stéphane Landelle wrote:

Could you try upgrading to AHC 1.7.21 and Netty 3.8.0, please?
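(If you run the Gatling bundle, swapping the corresponding jars in the lib/ directory should be enough; with an sbt-based build it would be roughly:

libraryDependencies ++= Seq(
  "com.ning" % "async-http-client" % "1.7.21",
  "io.netty"  % "netty"            % "3.8.0.Final"
)

Those are the published Maven coordinates for both libraries.)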