Hi,
i’ve run into a (to me) strange problem while load-testing the WebSocket endpoint of a SockJs-server:
Whenever i set the scenario to spawn all users at once, everything works fine (there has been only one rare case in which the server did not accept any connection at all).
However, when spawning 10% of the user immediately and ramping up the rest over time, in most (~50%-80%) of the runs, 3-8% of the ws-open requests time-out and subsequent actions for the involved connections fail.
Some additional observations:
-
Which connections are affected, seems rather random
-
It seems to happen less likely when the ramp duration is <= 5 seconds
-
The server does not report any errors (unless they are buried within the debug log …)
-
The client reports a timeout of the HTTP-Upgrade request after ~60 seconds
-
Disabling the firewall on my machine did not change anything
-
I regularly see gatling.http.action.ws.CheckTimeout’s while running the simulation, though their numbers does not match the one of failed connections (see below)
-
I’ve run multiple successful load-tests using 500 users for ~3 mins resulting in ~5k requests/s, so i do not think file handles or the like at the server side are causing the issue
-
I’ve run a mostly equivalent load-test using HTTP request-response instead of WS messages and did not have any problems in about 20 runs
Here’s the simulation code (client and server simply ping-pong mostly the same json content):
`
val numAtOnceUsers = 10
val numRampUsers = 90
val rampUpDuration = 9.seconds
val numberOfMessagesSentPerUser = 10
val pingPongTimeout = 2.seconds
val timeBetweenPings = 0.1.seconds
val data = Data(numberOfTargetIds = 1, messages = Set("foo"))
val httpConf = http
.wsBaseURL("ws://localhost:9000")
val incrementalUserId = new AtomicInteger
val scn = scenario("WsLoadSimulation")
.exec(
_.set("clientId", incrementalUserId.getAndIncrement())
)
.exec(ws("Open WebSocket")
.open("/session/websocket?clientId=${clientId}")
)
.repeat(numberOfMessagesSentPerUser, "i") {
feed(data.pingPongFeeder, Feeds.oneExpression)
.exec(
ws("Send Ping Message")
.sendText("${ping}")
.check((wsAwait within pingPongTimeout until 1).message is "${pong}")
)
.pause(timeBetweenPings)
}
.exec(ws("Close WebSocket")
.close
)
setUp(
scn.inject(
atOnceUsers(numAtOnceUsers),
rampUsers(numRampUsers) over rampUpDuration
)
) protocols httpConf
`
Environment
- Gatling 2.1.7
- play2-sockjs 0.5.0-SNAPSHOT
- Play framework 2.5.0-M2
I cannot change the Play version at this point because i cannot downgrade the play2-sockjs (plugin) version. Given that everything works fine when not ramping up users, i do not think the server is to blame, too.
CheckTimeout example log:
INFO] [02/29/2016 18:32:20.012] [GatlingSystem-akka.actor.default-dispatcher-9] [akka://GatlingSystem/deadLetters] Message [io.gatling.http.action.ws.CheckTimeout] from Actor[akka://GatlingSystem/deadLetters] to Actor[akka://GatlingSystem/deadLetters] was not delivered. [9] dead letters encountered. This logging can be turned off or adjusted with configuration settings ‘akka.log-dead-letters’ and ‘akka.log-dead-letters-during-shutdown’.
Netty timeout log
`
18:36:21.308 [DEBUG] c.n.h.c.p.n.r.t.TimeoutTimerTask - Request timed out to localhost/127.0.0.1:9000 of 60000 ms for NettyResponseFuture{currentRetry=0,
isDone=false,
isCancelled=false,
asyncHandler=com.ning.http.client.ws.WebSocketUpgradeHandler@1ae11f07,
nettyRequest=com.ning.http.client.providers.netty.request.NettyRequest@5fe3923f,
content=null,
uri=ws://localhost:9000/session/websocket?clientId=64,
keepAlive=true,
httpHeaders=null,
exEx=null,
redirectCount=0,
timeoutsHolder=com.ning.http.client.providers.netty.request.timeout.TimeoutsHolder@402ef78,
inAuth=false,
statusReceived=false,
touch=49009365} after 60007 ms
18:36:21.308 [DEBUG] c.n.h.c.p.n.c.ChannelManager - Closing Channel [id: 0x529af2fe, /127.0.0.1:64137 => localhost/127.0.0.1:9000]
18:36:21.308 [DEBUG] c.n.h.c.p.n.r.NettyRequestSender - Aborting Future NettyResponseFuture{currentRetry=0,
isDone=false,
isCancelled=false,
asyncHandler=com.ning.http.client.ws.WebSocketUpgradeHandler@1ae11f07,
nettyRequest=com.ning.http.client.providers.netty.request.NettyRequest@5fe3923f,
content=null,
uri=ws://localhost:9000/session/websocket?clientId=64,
keepAlive=true,
httpHeaders=null,
exEx=null,
redirectCount=0,
timeoutsHolder=com.ning.http.client.providers.netty.request.timeout.TimeoutsHolder@402ef78,
inAuth=false,
statusReceived=false,
touch=49009365}
18:36:21.308 [DEBUG] c.n.h.c.p.n.h.Processor - Channel Closed: [id: 0x529af2fe, /127.0.0.1:64137 :> localhost/127.0.0.1:9000] with attribute INSTANCE
18:36:21.313 [DEBUG] c.n.h.c.p.n.r.NettyRequestSender - Request timed out to localhost/127.0.0.1:9000 of 60000 ms
java.util.concurrent.TimeoutException: Request timed out to localhost/127.0.0.1:9000 of 60000 ms
at com.ning.http.client.providers.netty.request.timeout.TimeoutTimerTask.expire(TimeoutTimerTask.java:47) [async-http-client-1.9.30.jar:na]
at com.ning.http.client.providers.netty.request.timeout.RequestTimeoutTimerTask.run(RequestTimeoutTimerTask.java:48) [async-http-client-1.9.30.jar:na]
at org.jboss.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:556) [netty-3.10.4.Final.jar:na]
at org.jboss.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:632) [netty-3.10.4.Final.jar:na]
at org.jboss.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:369) [netty-3.10.4.Final.jar:na]
at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) [netty-3.10.4.Final.jar:na]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60]
18:36:21.313 [DEBUG] i.g.h.a.w.WsActor - Websocket ‘gatling.http.webSocket’ failed to open: Request timed out to localhost/127.0.0.1:9000 of 60000 ms
`