Strange GraphiteDataWriter behaviour in Gatling 2.1.2

Hi Guys,

I found some really strange GraphiteDataWriter behavior in Gatling-2.1.2. It seems to fail with certain simulation setups, specifically long-running ones.

If I use this setup:

    setUp(
      Scenarios.scn
        .inject(
          rampUsersPerSec(1.0) to (200.0) during (1800),
          constantUsersPerSec(200.0) during (86400))
        .exponentialPauses
    ).protocols(httpProtocol)

the GraphiteDataWriter works, but if I change duration to this:

    setUp(
      Scenarios.scn
        .inject(
          rampUsersPerSec(1.0) to (200.0) during (1800),
          constantUsersPerSec(200.0) during (172800))
        .exponentialPauses
    ).protocols(httpProtocol)

it pushes a few datapoints at the beginning of the scenario and then stops.

I can reproduce the issue, so it doesn't seem to be a coincidence. With other scripts I was able to do 48-hour runs without problems, so it seems to be a combination of simulation load and duration. Could it be something memory related?
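For a sense of scale, here is a rough back-of-the-envelope sketch of how many virtual users each of the two profiles injects in total. It assumes the ramp is linear (so its average rate is the midpoint of the two rates) and that the durations are in seconds; `totalUsers` is a hypothetical helper, not part of Gatling.

```scala
object InjectionTotals {
  // Hypothetical helper: users injected by a linear ramp followed by a constant rate.
  def totalUsers(rampFrom: Double, rampTo: Double, rampSeconds: Long,
                 constantRate: Double, constantSeconds: Long): Double = {
    val rampUsers = (rampFrom + rampTo) / 2 * rampSeconds // average rate * duration
    val constantUsers = constantRate * constantSeconds
    rampUsers + constantUsers
  }

  // The two setups from this post: same ramp, 24 h versus 48 h plateau.
  val dayRun: Double = totalUsers(1.0, 200.0, 1800, 200.0, 86400)
  val twoDayRun: Double = totalUsers(1.0, 200.0, 1800, 200.0, 172800)

  def main(args: Array[String]): Unit = {
    println(f"24 h plateau: $dayRun%.0f users")
    println(f"48 h plateau: $twoDayRun%.0f users")
  }
}
```

Under these assumptions the longer run injects about 34.7 million users in total, against about 17.5 million for the shorter one.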

My GraphiteDataWriter configuration looks like this:

    graphite {
      light = false
      host = "host"
      port = 2113
      protocol = "tcp"
      rootPathPrefix = "gatling2"
      writeInterval = 10
      #bucketWidth = 100
      #bufferSize = 8192
    }

cheers

Daniel

Weird, I fail to see how run duration and graphite could be related.

Your issue could be related to the supervisor strategy we introduced: if the GraphiteDataWriter fails to reconnect to Graphite more than 5 times in 5 seconds, it stops. Are you sure you don’t have connectivity issues between the Gatling host and the Graphite one?
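As a rough illustration of that kind of rule, here is a minimal sketch of a "more than N failures within a time window means stop" check. It is a hypothetical model written in plain Scala, not the actual Gatling or Akka code; `FailureBudget` and `recordFailure` are made-up names, and a sliding window is assumed.

```scala
// Hypothetical model of a "max failures within a time window" rule.
final class FailureBudget(maxFailures: Int, windowMillis: Long) {
  // Timestamps (ms) of the failures still inside the window.
  private var failures = Vector.empty[Long]

  // Records a failure at `nowMillis`; returns true while retrying is still allowed.
  def recordFailure(nowMillis: Long): Boolean = {
    failures = (failures :+ nowMillis).filter(_ > nowMillis - windowMillis)
    failures.size <= maxFailures
  }
}

object FailureBudgetDemo {
  def main(args: Array[String]): Unit = {
    val budget = new FailureBudget(maxFailures = 5, windowMillis = 5000)
    // Six failures 100 ms apart: the sixth one exceeds the budget.
    val outcomes = (0 until 6).map(i => budget.recordFailure(i * 100L))
    println(outcomes)
  }
}
```

With such a rule, occasional failures spread over time never stop the writer; only a burst of failures inside the window does.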

That was my first guess as well, but it works fine if I change the setup to half the duration. I tried it dozens of times, and as soon as I increase the duration it fails after sending just a few data points.

But would there be a difference in the initial calls to Graphite in a simulation with a longer duration, somehow causing the TCP connection to fail?

Absolutely not, that’s why it’s so weird.

Except for Graphite, are you sure that the HTTP requests are sent as expected (with expected load)?

Hi Stephane,

I have made a Wireshark capture of both setups and I don't see any connection resets or errors. The client just stops sending packets after a minute or so in the 48 hour setup, really strange. When I compare the captured streams I can't see any obvious differences. Would you like me to share the capture files?

Cheers

Daniel

Hi Daniel,

Found it (and fixed it): https://github.com/gatling/gatling/issues/2502

The issue was actually that virtual user scheduling was not properly lazy. As a consequence, the heap filled up quickly under such a heavy and long load, causing constant GC that froze the JVM until it finally died with an OOM (I guess you killed the JVM before that happened).
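To illustrate the idea (a minimal sketch only, not Gatling's actual scheduler): if the start time of every virtual user is computed up front, memory grows with the total number of users, whereas a lazy `Iterator` produces each start time only when it is needed. The names `eagerStartTimes` and `lazyStartTimes` are hypothetical, and a constant injection rate is assumed.

```scala
object LazyScheduling {
  // Eager: materialises one start offset (in ms) per user, so memory is O(users).
  def eagerStartTimes(usersPerSec: Int, seconds: Int): Vector[Long] =
    Vector.tabulate(usersPerSec * seconds)(i => i * 1000L / usersPerSec)

  // Lazy: the Iterator computes each offset on demand, so memory stays constant.
  def lazyStartTimes(usersPerSec: Int, seconds: Int): Iterator[Long] =
    Iterator.range(0, usersPerSec * seconds).map(i => i * 1000L / usersPerSec)

  def main(args: Array[String]): Unit = {
    // 200 users/sec for 48 hours is 34,560,000 offsets, consumed one at a time here.
    val schedule = lazyStartTimes(200, 172800)
    println(schedule.take(3).toList)
  }
}
```

Calling `eagerStartTimes(200, 172800)` instead would allocate all 34,560,000 offsets before the first user starts, which is the kind of heap pressure described above.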

Thanks a lot for reporting!
Cheers,

Stéphane

Nice!

I was starting to worry it was me doing something wrong :) Is release 2.1.3 still due for today?

Cheers

Daniel

Hi Daniel,

We’ll release 2.1.3 no later than tomorrow.
As always, we’ll announce on the ML when it’s available ;)

Cheers,

Pierre