I found some really strange GraphiteDataWriter behavior in Gatling 2.1.2. It seems to fail for certain simulation setups, specifically simulations with a long duration.
If I use this setup:
setUp(
  Scenarios.scn.inject(
    rampUsersPerSec(1.0) to (200.0) during (1800),
    constantUsersPerSec(200.0) during (86400))
    .exponentialPauses
).protocols(httpProtocol)
the GraphiteDataWriter works, but if I double the duration of the constant phase:
setUp(
  Scenarios.scn.inject(
    rampUsersPerSec(1.0) to (200.0) during (1800),
    constantUsersPerSec(200.0) during (172800))
    .exponentialPauses
).protocols(httpProtocol)
it pushes a few datapoints at the beginning of the scenario and then stops.
I can reproduce the issue, so it doesn't seem to be a coincidence. With other scripts I was able to do 48-hour runs without problems, so it seems to be a combination of simulation load and duration. Could it be something memory related?
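For a sense of scale, here is a back-of-the-envelope count of the virtual users each profile injects. It is only a sketch: the object and method names are made up for illustration, and it assumes the ramp is linear, so its average rate is the midpoint of the start and end rates.

```scala
object InjectedUsers {
  // Users injected by a ramp, assuming it is linear: average rate times duration.
  def ramp(fromRate: Double, toRate: Double, seconds: Long): Double =
    (fromRate + toRate) / 2 * seconds

  // Users injected at a constant rate: rate times duration.
  def constant(rate: Double, seconds: Long): Double =
    rate * seconds

  def main(args: Array[String]): Unit = {
    val rampUsers = ramp(1.0, 200.0, 1800)
    val shortRun  = rampUsers + constant(200.0, 86400)
    val longRun   = rampUsers + constant(200.0, 172800)
    println(f"86400 s profile:  $shortRun%.0f users")
    println(f"172800 s profile: $longRun%.0f users")
  }
}
```

Under that assumption the first profile injects roughly 17.5 million users and the second roughly 34.7 million, so anything that keeps per-user state around for the whole run would need about twice as much memory in the longer setup.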
Weird, I fail to see how run duration and Graphite could be related.
Your issue could be related to the supervisor strategy we introduced: if the GraphiteDataWriter fails to reconnect to Graphite more than 5 times in 5 seconds, it stops. Are you sure you don’t have connectivity issues between the Gatling host and the Graphite one?
That was my first guess as well, but it works fine if I change the setup to half the duration. I tried it dozens of times, and as soon as I increase the duration it fails after sending just a few data points.
I have made a Wireshark capture of both setups and I don't see any connection resets or errors. The client just stops sending packets after a minute or so in the 48 hour setup, really strange. When I compare the captured streams I can't see any obvious differences. Would you like me to share the capture files?
The issue was actually that virtual user scheduling was not properly lazy. As a consequence, the heap filled up quickly under such a heavy and long load, causing continuous GC that froze the JVM until it finally died with an OOM (I guess you killed the JVM before that happened).
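To illustrate the difference between eager and lazy scheduling, here is a minimal sketch in plain Scala. It is not Gatling's actual implementation: `UserStart` and `constantRate` are hypothetical names, and the even spacing of start times is an assumption made to keep the example small.

```scala
import scala.concurrent.duration._

object LazySchedulingSketch {
  // Hypothetical stand-in for one scheduled virtual-user start.
  final case class UserStart(id: Long, at: FiniteDuration)

  // Lazy schedule: each start time is computed only when the consumer
  // asks for the next element, so memory use stays constant.
  def constantRate(usersPerSec: Int, durationSec: Long): Iterator[UserStart] = {
    val total     = usersPerSec.toLong * durationSec
    val stepNanos = 1000000000L / usersPerSec
    Iterator
      .iterate(0L)(_ + 1)
      .takeWhile(_ < total)
      .map(i => UserStart(i, (i * stepNanos).nanos))
  }

  def main(args: Array[String]): Unit = {
    // Only the three requested elements are ever built here.
    constantRate(200, 172800L).take(3).foreach(println)

    // An eager variant such as constantRate(200, 172800L).toVector would
    // allocate all 34,560,000 entries up front and keep them on the heap.
  }
}
```

With the iterator, the cost of a long run is paid one user at a time. Materializing the whole schedule up front makes heap usage grow with rate times duration, which matches the symptom of a setup that works at 86400 seconds and stalls at 172800.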