When we run the same simulation on Gatling v2.1.7 with the same (or even much higher) load, on the same machine(s), we do not get these 401 responses.
The requests that get these responses are heartbeat requests sent once every 2 seconds per user.
The users are pre-authenticated, and all requests from that point on are sent with OAuth.
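To make the setup easier to picture, here is a minimal sketch of the heartbeat part of such a scenario in the Gatling 2.2 DSL. It is not our actual simulation: the class name, base URL, `/heartbeat` path, `accessToken` session attribute, Bearer-style header, and the injection profile are all placeholder assumptions.

```scala
import io.gatling.core.Predef._
import io.gatling.http.Predef._
import scala.concurrent.duration._

// Hypothetical simulation; every name and value below is a placeholder.
class HeartbeatSketchSimulation extends Simulation {

  // Assumed base URL of the system under test.
  val httpConf = http.baseURL("http://localhost:8080")

  val scn = scenario("Heartbeat sketch")
    // Stand-in for pre-authentication: a token is placed in the session up front.
    .exec(session => session.set("accessToken", "placeholder-token"))
    .forever {
      exec(
        http("heartbeat")
          .get("/heartbeat") // assumed endpoint
          // Assumes a Bearer-style OAuth header; the real scheme may differ.
          .header("Authorization", "Bearer ${accessToken}")
          .check(status.is(200)) // a 401 would be reported as a failed request
      ).pause(2.seconds) // one heartbeat every 2 seconds per user
    }

  // Ramp past the ~510 concurrent users mark; forever loops are cut off by maxDuration.
  setUp(scn.inject(rampUsers(600) over (5.minutes)))
    .protocols(httpConf)
    .maxDuration(10.minutes)
}
```

With a shape like this, each user keeps looping until `maxDuration`, so the number of concurrent users simply follows the ramp.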
This is also not a result of load on the Linux worker machine: the load average on the machine is around 0.3 at 500 concurrent users.
To verify this, I ran three copies of the same simulation on the same machine as separate processes at the same time, reaching a combined 1500 concurrent users, and did not get any 401 responses.
Nor is it related to the ramp-up duration. I can load the users much faster or much slower and still hit this problem on reaching ~510 concurrent users in this scenario of the same simulation.
This reproduces every time on both Gatling v2.2.2 and v2.2.3.
In addition, the requests that fail look exactly the same as the ones that return the expected response, and nothing in the scenario happens at ~500 users that could cause a backend change resulting in different behavior. Even if there were such a thing, as I said before, this works without a problem under a much higher load on Gatling v2.1.7.
We are not sure what else could have changed. Possibly Gatling configuration defaults, or something else.
Below are two example requests: the first returned the expected response, and the second (sent only a couple of seconds later) occurred just after the 510 concurrent users mark was reached.
Please help urgently.
Any assistance would be greatly appreciated.
Non-failing request: