Problem with first message on websocket

Hi,

We’re struggling with a strange problem with websocket requests on version 2.0.0-SNAPSHOT.

If the server sends a message right after the connection opening, sometimes Gatling seems to never receive this message.

We tried with vert.x and node.js, the vert.x code:

`

vertx.createHttpServer().websocketHandler { ws →
ws.writeTextFrame(“ack”)// send a message just after the opened connection
println “websocket connected and ack sent”
}.listen(8080, “localhost”)

`

The gatling script:

`

class WebsocketAck extends Simulation {

val httpConf = http.wsBaseURL(“ws://localhost:8080”)

val scn = scenario(“WebSocket”)
.repeat(100) {
exec(ws(“Connect WS”).open("/"))
.exec(
ws(“Receive ack”)
.check(wsAwait.within(5 seconds).until(1))
)
.exec(ws(“Close WS”).close)
}

setUp(scn.inject(atOnceUsers(1)).protocols(httpConf))
}

`

Often the first run successes but not the others. When we add a timer before to send the ack message, the problem disappears:

`

vertx.createHttpServer().websocketHandler { ws →
vertx.setTimer(2000) {ws.writeTextFrame(“ack”)} // schedule the send at 2s
println “websocket connected and ack sent”
}.listen(8080, “localhost”)

`

Do you know if it’s a problem with our scripts or if it’s a deeper problem with gatling (may be gatling reads messages at the start of the connection and the listener is added too late) ?

Thank you for your help

Damn!!!
I’m 99% sure that you’re being bitten by: https://github.com/netty/netty/issues/2179

There’s a good chance that this issue has been incidentally fixed in Netty 4 but it’s definitively here in Netty 3.
The problem is that Norman Maurer is only willing to investigate this if I can provide a pure Netty based test case (not through AHC) and I just couldn’t find time until now.

Do you thing you could lend a hand and provide me with a server so I can reproduce, build the client part of the test case and add to the issue, please?

thank you for your fast answer.

I’m not really an expert in Netty (it’s why I use great frameworks like gatling and vert.x :slight_smile: ) but I will try to go deeper for the full Netty test case.

When I use vert.x as a client for websocket, I don’t have the problem and vert.x uses Netty 4, so that seems to be another clue that Netty 3.9 is bugged.

For interested persons, temporary I use a workaround with a proxy which adds latencies on the first message, it’s far to be ideal:

`

import org.vertx.groovy.core.streams.Pump

vertx.createHttpServer().websocketHandler { wsClient → // websocket from client is opened
vertx.createHttpClient(host: “localhost”, port: 8080).connectWebsocket(wsClient.path) { wsServer →
Pump.createPump(wsClient, wsServer) // forward client websocket requests to the server

boolean firstMessage = true

wsServer.dataHandler { buffer →
if (firstMessage) {
// add 100ms of latency for the first message
vertx.setTimer(100) { wsClient.writeTextFrame(buffer.toString()) }
firstMessage = false
} else {
wsClient.writeTextFrame(buffer.toString())
}
}
}
}.listen(9090, “localhost”)

`

You run it like that:

$ vertx run server.groovy

I should be able to build the client side with raw Netty.
It’s just that I don’t have a convenient websocket app so I can reproduce the issue. I’m used to Spray and Jetty, but I’m not sure those would behave the same.

And I don’t currently have time to play with Vert.x, even if I would love to. If you could just drop me a simple maven project where I just have to launch some class so I get a running websocket server, it would be great!

I joined the server.

`
$ mvn package

$ java -jar target/echo-1.0-SNAPSHOT-jar-with-dependencies.jar

log4j:WARN No appenders could be found for logger (io.netty.util.internal.logging.InternalLoggerFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See Apache log4j 1.2 - Frequently Asked Technical Questions for more info.
server started at ws://localhost:9090

`

The server return a “ack” message right after the connection opening. During my tests that failed 90% of cases.
If you query “ws://localhost:9090/delay”, a delay of 2 seconds is added and there is no more error.

Say me if you have problems for running the server.

ack-ws-server.tar.gz (1.56 KB)

That’s great, thanks a lot!
I’ll work on this on monday.

Actually, there were 2 problems:

  • We don’t buffer incoming messages and we didn’t provide a way to set up checks on open, so incoming messages could be lost. Fixed: https://github.com/excilys/gatling/issues/2039
  • The Netty issue I was referring to. Thanks to your sample, I’m able to reproduce (get ~1% lost messages) with AsyncHttpClient. Next step for me is to build a pure Netty client test case and hassle Norman to get this fixed…
    Stay tuned

Test case done: https://github.com/netty/netty/issues/2179#issuecomment-50352114

Thanks a lot.

The check on close method was definitely a solution we thought about.

With this check, I fall from 97% of errors to 6%. But all my tests are in the same machine and the server just answers the ack, I will test in a more real environment.

But I’m still interested in a fix on the netty part of the bug.

Well, now it’s in the hands of the Netty guys… Not much I can do except ship the fix when they’re done :frowning: