I am using gatling (3.0.0 stable) for the first time. I’m fairly happy that my simulations are running ok and producing sensible statistics, except that sometimes I appear to get get two executions of a given request, one of which succeeds and one of which times out. I’m not clear whether this is a realistic behavior, and I am misunderstanding how gatling interacts with my service, or whether it is some sort of bug in gatling.
For example, I have a request called “moc” which appears exactly once in my one and only scenario. When I run with 50 users, I expect to see a total of 50 executions of this request. Instead, the report statistics list 97 executions, of which 50 are OK and 47 are KO. All the failures are due to a RequestTimeoutException after 60000 ms. Looking in the simulation log, I can see two entries for moc for each of 47 of the users. Both entries for a given user have identical start times, with the first succeeding after a few seconds and the second timing out after 60 seconds. Like this for users 1 and 2 (other requests not shown):
REQUEST 1 moc 1540870329502 1540870334553 OK
REQUEST 1 moc 1540870329502 1540870389503 KO i.g.h.c.i.RequestTimeoutException: Request timeout to … after 60000 ms
REQUEST 2 moc 1540870449092 1540870450545 OK
REQUEST 2 moc 1540870449092 1540870509093 KO i.g.h.c.i.RequestTimeoutException: Request timeout to … after 60000 ms
It is not clear whether two requests are actually being made, or whether there is just one request but gatling is not properly registering the successful response and so is also recording the request as failed after the timeout.
This doesn’t happen for most of the other types of request. The “moc” request is notable in that it is slow to handle (typically at least one second including round-trip latency of around 300 ms between my desk in Sydney and my test server in the UK, and sometimes several seconds) but other than that I can see nothing special about it. The few moc requests that succeed without a paired failure are just as slow as the others.
My service is pretty non-standard, being a layer 4 load balancer (http://gobetween.io/) sitting in front of some C++ processes built around simple-web-server (https://github.com/eidheim/Simple-Web-Server), all running on Windows, so perhaps unexpected behavior is to be, um, expected.
If anyone can explain this, I would be very grateful. Alternatively, if anyone can tell me how to exclude these timeout cases from my statistics, that would be useful too.
Thanks very much.