Unexpected double-executions of a request, one of which succeeds and one of which times out

Hi,

I am using gatling (3.0.0 stable) for the first time. I’m fairly happy that my simulations are running ok and producing sensible statistics, except that sometimes I appear to get get two executions of a given request, one of which succeeds and one of which times out. I’m not clear whether this is a realistic behavior, and I am misunderstanding how gatling interacts with my service, or whether it is some sort of bug in gatling.

For example, I have a request called “moc” which appears exactly once in my one and only scenario. When I run with 50 users, I expect to see a total of 50 executions of this request. Instead, the report statistics list 97 executions, of which 50 are OK and 47 are KO. All the failures are due to a RequestTimeoutException after 60000 ms. Looking in the simulation log, I can see two entries for moc for each of 47 of the users. Both entries for a given user have identical start times, with the first succeeding after a few seconds and the second timing out after 60 seconds. Like this for users 1 and 2 (other requests not shown):

REQUEST 1 moc 1540870329502 1540870334553 OK

REQUEST 1 moc 1540870329502 1540870389503 KO i.g.h.c.i.RequestTimeoutException: Request timeout to … after 60000 ms

REQUEST 2 moc 1540870449092 1540870450545 OK

REQUEST 2 moc 1540870449092 1540870509093 KO i.g.h.c.i.RequestTimeoutException: Request timeout to … after 60000 ms

It is not clear whether two requests are actually being made, or whether there is just one request but gatling is not properly registering the successful response and so is also recording the request as failed after the timeout.

This doesn’t happen for most of the other types of request. The “moc” request is notable in that it is slow to handle (typically at least one second including round-trip latency of around 300 ms between my desk in Sydney and my test server in the UK, and sometimes several seconds) but other than that I can see nothing special about it. The few moc requests that succeed without a paired failure are just as slow as the others.

My service is pretty non-standard, being a layer 4 load balancer (http://gobetween.io/) sitting in front of some C++ processes built around simple-web-server (https://github.com/eidheim/Simple-Web-Server), all running on Windows, so perhaps unexpected behavior is to be, um, expected.

If anyone can explain this, I would be very grateful. Alternatively, if anyone can tell me how to exclude these timeout cases from my statistics, that would be useful too.

Thanks very much.

Roger Austin

Hi,

Is there any way you can share a way to reproduce on our side, maybe privately?

Cheers,

Thanks for you offer, Stéphane.

However, I have identified the problem by a process of elimination.

The issue appears to be that my recorded session includes some websocket upgrade requests. Gatling records these requests like this:

val headers_116 = Map(
“Accept” → “text/html,application/xhtml+xml,application/xml;q=0.9,/;q=0.8”,
“Accept-Encoding” → “gzip, deflate, br”,
“Connection” → “keep-alive, Upgrade”,
“Origin” → “https://”,
“Pragma” → “no-cache”,
“Sec-WebSocket-Extensions” → “permessage-deflate”,
“Sec-WebSocket-Key” → “29s/erceIP3DO/TX6aEMeA==”,
“Sec-WebSocket-Version” → “13”,
“Upgrade” → “websocket”)

http(“request_116”)
.get("/iws-local/socket")
.headers(headers_116)
.check(status.is(101))

However, looking at the documentation (https://gatling.io/docs/3.0/http/websocket) it is clear that websocket requests need to use the ws method, presumably because they work with connections in a very different way from http.

My guess is that gatling is (sometimes) attempting to reuse a connection that has been appropriated by the websocket upgrade, in order to do a subsequent http request, and is then getting confused about what is happening on that connection.

When I remove these websocket upgrade requests, the problem goes away.

Presumably it would be easy for the recorder to recognize the “Upgrade:websocket” header and do something different with such requests, but it is not clear what that something different should be. Perhaps it could emit, say:

ws(“request_116”)
.connect("/iws-local/socket")

And also put an appropriate entry in the httpProtocol value

.wsBaseUrl(“wss://”)

It would then be up to the developer to build on this to code websocket scenarios by hand.

Cheers,
Roger

Yes, you guessed correctly.

Please see: https://github.com/gatling/gatling/issues/3594 and https://github.com/gatling/gatling/issues/3595

Thanks!