Strange WS connection timeouts when ramping users

Constantin_Gerstberg · February 29, 2016, 5:41pm

Hi,

i’ve run into a (to me) strange problem while load-testing the WebSocket endpoint of a SockJs-server:

Whenever i set the scenario to spawn all users at once, everything works fine (there has been only one rare case in which the server did not accept any connection at all).

However, when spawning 10% of the user immediately and ramping up the rest over time, in most (~50%-80%) of the runs, 3-8% of the ws-open requests time-out and subsequent actions for the involved connections fail.

Some additional observations:

Which connections are affected, seems rather random
It seems to happen less likely when the ramp duration is <= 5 seconds
The server does not report any errors (unless they are buried within the debug log …)
The client reports a timeout of the HTTP-Upgrade request after ~60 seconds
Disabling the firewall on my machine did not change anything
I regularly see gatling.http.action.ws.CheckTimeout’s while running the simulation, though their numbers does not match the one of failed connections (see below)
I’ve run multiple successful load-tests using 500 users for ~3 mins resulting in ~5k requests/s, so i do not think file handles or the like at the server side are causing the issue
I’ve run a mostly equivalent load-test using HTTP request-response instead of WS messages and did not have any problems in about 20 runs
Here’s the simulation code (client and server simply ping-pong mostly the same json content):

`

val numAtOnceUsers = 10
val numRampUsers = 90
val rampUpDuration = 9.seconds

val numberOfMessagesSentPerUser = 10
val pingPongTimeout = 2.seconds
val timeBetweenPings = 0.1.seconds

val data = Data(numberOfTargetIds = 1, messages = Set("foo"))

val httpConf = http
  .wsBaseURL("ws://localhost:9000")

val incrementalUserId = new AtomicInteger

val scn = scenario("WsLoadSimulation")
  .exec(
    _.set("clientId", incrementalUserId.getAndIncrement())
  )
  .exec(ws("Open WebSocket")
    .open("/session/websocket?clientId=${clientId}")
  )
  .repeat(numberOfMessagesSentPerUser, "i") {
    feed(data.pingPongFeeder, Feeds.oneExpression)
      .exec(
        ws("Send Ping Message")
          .sendText("${ping}")
          .check((wsAwait within pingPongTimeout until 1).message is "${pong}")
      )
      .pause(timeBetweenPings)
  }
  .exec(ws("Close WebSocket")
    .close
  )

setUp(
  scn.inject(
    atOnceUsers(numAtOnceUsers),
    rampUsers(numRampUsers) over rampUpDuration
  )
) protocols httpConf

`

Environment

Gatling 2.1.7
play2-sockjs 0.5.0-SNAPSHOT
Play framework 2.5.0-M2
I cannot change the Play version at this point because i cannot downgrade the play2-sockjs (plugin) version. Given that everything works fine when not ramping up users, i do not think the server is to blame, too.

CheckTimeout example log:

INFO] [02/29/2016 18:32:20.012] [GatlingSystem-akka.actor.default-dispatcher-9] [akka://GatlingSystem/deadLetters] Message [io.gatling.http.action.ws.CheckTimeout] from Actor[akka://GatlingSystem/deadLetters] to Actor[akka://GatlingSystem/deadLetters] was not delivered. [9] dead letters encountered. This logging can be turned off or adjusted with configuration settings ‘akka.log-dead-letters’ and ‘akka.log-dead-letters-during-shutdown’.

Netty timeout log

`
18:36:21.308 [DEBUG] c.n.h.c.p.n.r.t.TimeoutTimerTask - Request timed out to localhost/127.0.0.1:9000 of 60000 ms for NettyResponseFuture{currentRetry=0,

isDone=false,

isCancelled=false,

asyncHandler=com.ning.http.client.ws.WebSocketUpgradeHandler@1ae11f07,

nettyRequest=com.ning.http.client.providers.netty.request.NettyRequest@5fe3923f,

content=null,

uri=ws://localhost:9000/session/websocket?clientId=64,

keepAlive=true,

httpHeaders=null,

exEx=null,

redirectCount=0,

timeoutsHolder=com.ning.http.client.providers.netty.request.timeout.TimeoutsHolder@402ef78,

inAuth=false,

statusReceived=false,

touch=49009365} after 60007 ms

18:36:21.308 [DEBUG] c.n.h.c.p.n.c.ChannelManager - Closing Channel [id: 0x529af2fe, /127.0.0.1:64137 => localhost/127.0.0.1:9000]

18:36:21.308 [DEBUG] c.n.h.c.p.n.r.NettyRequestSender - Aborting Future NettyResponseFuture{currentRetry=0,

isDone=false,

isCancelled=false,

asyncHandler=com.ning.http.client.ws.WebSocketUpgradeHandler@1ae11f07,

nettyRequest=com.ning.http.client.providers.netty.request.NettyRequest@5fe3923f,

content=null,

uri=ws://localhost:9000/session/websocket?clientId=64,

keepAlive=true,

httpHeaders=null,

exEx=null,

redirectCount=0,

timeoutsHolder=com.ning.http.client.providers.netty.request.timeout.TimeoutsHolder@402ef78,

inAuth=false,

statusReceived=false,

touch=49009365}

18:36:21.308 [DEBUG] c.n.h.c.p.n.h.Processor - Channel Closed: [id: 0x529af2fe, /127.0.0.1:64137 :> localhost/127.0.0.1:9000] with attribute INSTANCE

18:36:21.313 [DEBUG] c.n.h.c.p.n.r.NettyRequestSender - Request timed out to localhost/127.0.0.1:9000 of 60000 ms

java.util.concurrent.TimeoutException: Request timed out to localhost/127.0.0.1:9000 of 60000 ms

at com.ning.http.client.providers.netty.request.timeout.TimeoutTimerTask.expire(TimeoutTimerTask.java:47) [async-http-client-1.9.30.jar:na]

at com.ning.http.client.providers.netty.request.timeout.RequestTimeoutTimerTask.run(RequestTimeoutTimerTask.java:48) [async-http-client-1.9.30.jar:na]

at org.jboss.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:556) [netty-3.10.4.Final.jar:na]

at org.jboss.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:632) [netty-3.10.4.Final.jar:na]

at org.jboss.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:369) [netty-3.10.4.Final.jar:na]

at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) [netty-3.10.4.Final.jar:na]

at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60]

18:36:21.313 [DEBUG] i.g.h.a.w.WsActor - Websocket ‘gatling.http.webSocket’ failed to open: Request timed out to localhost/127.0.0.1:9000 of 60000 ms

`

slandelle · February 29, 2016, 5:59pm

Hi,

Frankly, the only way to investigate this is to provide a sample application to reproduce.

Cheers,

Constantin_Gerstberg · March 2, 2016, 10:58am

Hi,

thanks for your quick response.
Took me a while to get feedback on this but unfortunately, i cannot provide the code for now.

Regards,
Constantin

Topic		Replies	Views
Gatling reports incorrect performance results for Websocket app Gatling (Open-Source)	2	120	February 26, 2016
ConnectTimeoutException: connection timed out: Gatling (Open-Source)	2	447	March 20, 2018
Actual WebSocket connections were shown more than expected. Gatling (Open-Source)	0	98	May 4, 2019
Connection timed out Gatling (Open-Source)	3	199	February 19, 2015
How many concurrent WebSocket connections are possible in gatling? For me it is failing to make event 50 concurrent connections. Please help. Gatling (Open-Source)	3	218	May 5, 2019

Strange WS connection timeouts when ramping users

Related topics