2.2 Bug Report - Something appears wrong with user injection

Here is where my running simulation stands, as of its latest periodic update:

The total simulation was supposed to take about 310 seconds. As of this moment, it has been running for 1210s, and some scenarios still have waiting users.

Which version are you using, exactly?
And can you come up with a simpler reproducer, please?

```scala
import scala.concurrent.duration._
import io.gatling.core.Predef._
import io.gatling.http.Predef._
import io.gatling.core.structure.PopulationBuilder

class InjectionProblem extends Simulation {

  val perHour: Double = 60.0
  val from: Double = 1.0 / 3600.0
  val to: Double = perHour / 3600.0

  setUp(
    scenario("Simple Test")
      .exec(
        http("Simple Request")
          .get("http://localhost/")
      )
      .inject(
        rampUsersPerSec(from) to (to) during (10 seconds) randomized,
        rampUsersPerSec(to) to (to) during (300 seconds) randomized
      )
  )
  .protocols(
    http
      .acceptCharsetHeader("ISO-8859-1,utf-8;q=0.7,*;q=0.7")
      .acceptHeader("text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8")
      .acceptEncodingHeader("gzip, deflate")
      .acceptLanguageHeader("fr,fr-fr;q=0.8,en-us;q=0.5,en;q=0.3")
  )
}

object Config {
  // …
  val httpConf = http
}
```

I was working on an older build. I just re-tested on the latest build, and at least this time it stopped, albeit about 12 seconds early. But it also stopped with the report saying only 37% complete.

> `rampUsersPerSec( 0.00028 ) to ( 0.017 ) during ( 10 seconds )`

I can’t say rampUsersPerSec was really designed with super low double values in mind.

> `rampUsersPerSec( 0.017 ) to ( 0.017 ) during ( 300 seconds )`

I have to check how rampUsersPerSec behaves when from and to are equal, but this sure is weird.

rampUsersPerSec currently accepts doubles, including values less than 1. I’m afraid this causes double precision issues (and we don’t want to go with BigDecimal because of the overhead).

If the issue really is double precision, I’m afraid we’ll have to enforce ints instead of doubles.
Obviously, what you’re doing here is not load testing. What you actually want is “1 user per minute” or “1 user per hour”, and that could be a new DSL element.
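Sketching the idea, such helpers could sit on top of the existing DSL; note that `LowRates` and its methods are hypothetical, not an existing Gatling API:

```scala
// Hypothetical helpers, NOT part of the Gatling DSL: express low rates
// in human terms and convert them to the fractional per-second values
// that rampUsersPerSec already accepts.
object LowRates {
  def usersPerMinute(n: Double): Double = n / 60.0
  def usersPerHour(n: Double): Double   = n / 3600.0
}

// usage with the reproducer above:
//   rampUsersPerSec(LowRates.usersPerHour(1)) to (LowRates.usersPerHour(60)) during (10 seconds)
```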

The from and to values being the same was a work-around: constantUsersPerSec didn’t work.

To put some context around what I am trying to do:

We have a RESTful service with a dozen or more active clients, and each client has several specific use cases, which means that, when all is said and done, the simulation may include 50-100 concurrent scenarios. We need to be able to play “what-if”: what happens to the rest of the system if client X suddenly does 10x the volume? Or what happens when we add some new behavior to some client and ramp it up to expected volumes? Does the system keep up?

To be able to simulate this, I broke each use case into its own scenario, with its own injection profile, and merged them all into one simulation. Then we have a global multiplier so we can apply a universal increase to the transaction rates. Currently, we run with 1x, 2x, 3x, 5x, and 10x, so we can prove to the business that our software is able to scale.
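The multiplier itself is simple; here is a minimal sketch of the shape (the property name `rateMultiplier` and the base rate are invented for illustration):

```scala
// Global multiplier sketch: scale every scenario's rate uniformly.
// Assumption: the multiplier arrives as a JVM system property.
val multiplier: Double = sys.props.getOrElse("rateMultiplier", "1.0").toDouble

val baseRate = 10.0 / 60.0          // e.g. 10 users/minute from production logs
val scaledRate = baseRate * multiplier

// each injection profile then uses the scaled rate, e.g.:
//   rampUsersPerSec(1.0 / 60.0) to (scaledRate) during (30 seconds)
```

Running with `-DrateMultiplier=5` would then give the 5x case.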

The injection profiles of each scenario are formed based on the results of analyzing production logs. So in some cases, yes, the number of transactions per second may be pretty low. But sometimes those transactions are pretty heavy, so I can’t just omit them. And I really need one formula for all of them, to keep the code simple. I really don’t want a different DSL for high-volume, and another for low-volume. Imagine what my code would look like then…

Honestly, I’m okay with the precision of a Double; precision isn’t the problem… It appears to work as intended, at least in rampUsersPerSec(). I would ask that you keep small fractions in mind so that it continues to work in the future, though… :slight_smile:

The issue is that not all of them were actually injected. Or, as you saw before, sometimes there were too many injected (apparently). So, it reports that it will inject (as an example) 50 users, but it only manages to inject 35 before the end of the scenario. That may be because of the “randomized” parameter to the injection profile. That is the bug that I am reporting. I assume that if the system thinks it is going to inject 50 users, it will manage to get all 50 users injected before the anticipated end time of the scenario. That is a fair assumption, yes?

Play with the perHour value and see what you get. In my experience, it pretty consistently fails to inject all of the requests before the end of the simulation.

The xxxUsersPerSec implementation is simpler than you seem to think: it’s just a linear interpolation. It doesn’t truncate and pile up the fractional rest until it reaches 1. That’s probably feasible, if you feel like contributing… :slight_smile:
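For what it’s worth, a carry-the-remainder version of that interpolation might look like this; it is a sketch of the idea being discussed, not Gatling’s actual injector code:

```scala
// Per-second schedule for a ramp that carries the fractional rest
// forward instead of truncating it, so rates below 1 user/sec still
// inject their users eventually.
def rampWithCarry(from: Double, to: Double, durationSec: Int): Seq[Int] = {
  var carry = 0.0
  (0 until durationSec).map { t =>
    val rate = from + (to - from) * t / durationSec // linear interpolation
    val exact = rate + carry
    val users = exact.toInt // whole users to start this second
    carry = exact - users   // fractional rest piles up until it reaches 1
    users
  }
}

// e.g. rampWithCarry(1.0 / 60.0, 1.0 / 60.0, 600).sum is ~10:
// roughly one user per minute for ten minutes
```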

Then, are you sure you want ~50 scenarios for this, and not ~50 chains instead, with a randomSwitch in front of them? Gatling reports are not really suited to such a high number of scenarios.
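The suggested shape would be something like this (the weights, names, and URLs are made up for illustration):

```scala
// One scenario, many chains: randomSwitch picks a chain per virtual user
// according to the weights, instead of one scenario per use case.
val allClients = scenario("All Clients")
  .randomSwitch(
    60.0 -> exec(http("client A use case").get("/clientA")),
    30.0 -> exec(http("client B use case").get("/clientB")),
    10.0 -> exec(http("client C use case").get("/clientC"))
  )
```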

The >100% issue is caused by https://github.com/gatling/gatling/issues/2437. I’m considering reverting it… :frowning:

> The xxxUsersPerSec implementation is simpler than you seem to think: it’s just a linear interpolation. It doesn’t truncate and pile up the fractional rest until it reaches 1. That’s probably feasible, if you feel like contributing… :slight_smile:

Not sure it’s worth the effort. And neither are you, or you would have implemented it already… :slight_smile: But we could, if we really wanted to.

I looked at it. I don’t see any calculations that might induce rounding errors at small rates. That’s good.

Since I’m using randomized, I also looked at the PoissonInjector. Wow, that was hard to wrap my head around. I ended up translating it into an Excel spreadsheet so I could visualize the outcome. I expected the number of users to be fixed and only the durations between them to be randomized; I didn’t expect the actual user counts to be randomized as well. But I like it. And again, I don’t see any calculations that might inadvertently introduce rounding problems, such as doing a division before doing a multiplication. So it looks good to me.
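For anyone else trying to visualize it: the process boils down to exponentially distributed inter-arrival gaps. Here is my reconstruction of the idea (not the PoissonInjector source):

```scala
import scala.util.Random

// Poisson arrival process: gaps between users are exponentially
// distributed, so the user *count* in a fixed window is itself random,
// not just the spacing between users.
def poissonArrivals(ratePerSec: Double, durationSec: Double): Seq[Double] = {
  val rng = new Random()
  Iterator
    .iterate(0.0)(t => t - math.log(1.0 - rng.nextDouble()) / ratePerSec)
    .drop(1)                     // skip the seed value t = 0
    .takeWhile(_ < durationSec)  // keep arrivals inside the window
    .toList
}
```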

> Then, are you sure you want ~50 scenarios for this, and not ~50 chains instead, with a randomSwitch in front of them? Gatling reports are not really suited to such a high number of scenarios.

That may be true. And eventually, I do intend to reduce the number of scenarios. For now, I’m building a “naive model” that just uses a separate injector per service, so I can control the rate at which everything comes in. Later, as time allows, I’ll actually analyze the production traffic and construct a client simulation that emulates the way the clients actually work.

As for the reporting, I tried to make things better by forcing each client’s requests into a group named after the client, with a sub-group for each usage scenario. I am still building this, so I have not yet been able to run them all concurrently to see how it looks. We shall see how it turns out.
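That structure looks roughly like this (all names and paths are placeholders):

```scala
// One outer group per client, a nested group per usage scenario, so the
// report rolls requests up by client. Names and URLs are placeholders.
val clientX = scenario("Client X")
  .group("Client X") {
    exec(http("auth").get("/x/auth"))
      .group("Use Case 1") {
        exec(http("request 1").get("/x/usecase1"))
      }
  }
```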

> The >100% issue is caused by https://github.com/gatling/gatling/issues/2437. I’m considering reverting it… :frowning:

And the <100% at scenario termination?

I think I see something that might shed light on the problem. What follows is the last screen update before my scenario completed. Notice the number of “done” users across scenarios…

> I looked at it. I don't see any calculations that might induce rounding errors at small rates. That's good.

Yeah, but as long as you try to inject < 0.5 usersPerSec, Gatling will round it to 0, not 1 user every 2 seconds.
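In other words (a toy illustration of that rounding, not the actual injector code):

```scala
// A rate below 0.5 users/sec rounds to zero whole users for the tick,
// so the fractional user is dropped rather than deferred.
val rate = 10.0 / 60.0                  // ≈ 0.167 users/sec
val injectedThisTick = math.round(rate) // 0: rounds down, the user is lost
```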

> The >100% issue is caused by https://github.com/gatling/gatling/issues/2437. I'm considering reverting it... :frowning:

Reverted

> And the <100% at scenario termination?

Same thing. The original implementation used a standard Random, so it could be replayed and we could compute the exact number of users beforehand. But Random is synchronized. I tried to switch to ThreadLocalRandom, but then you lose the reproducibility. It would require adapting the ConsoleDataWriter algorithm. Giving up for now...
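The trade-off in a nutshell (an illustration, not the Gatling code):

```scala
import java.util.Random
import java.util.concurrent.ThreadLocalRandom

// A seeded Random can be replayed: same seed, same draws, so the exact
// user count can be computed ahead of time. But it is synchronized.
val replayable = new Random(42L)

// ThreadLocalRandom avoids the lock, but it cannot be seeded (setSeed
// throws UnsupportedOperationException), so the draws are not replayable.
val fast = ThreadLocalRandom.current()
```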

Is that because of the use of integer math? What if all the variables were Doubles throughout?

Actually, I’m testing it, and I’m not seeing what you describe. Not exactly, anyway. Take the following code:

```scala
import scala.concurrent.duration._
import io.gatling.core.Predef._
import io.gatling.http.Predef._

class InjectionProblem extends Simulation {

  val rampTime = 30
  val duration = 600

  val r = http("Simple Request").get("http://localhost/")

  setUp(
    scenario("one").group("one") { exec(r) }.inject(
      rampUsersPerSec(1.0 / 60.0) to (10.0 / 60.0) during (rampTime seconds),
      rampUsersPerSec(10.0 / 60.0) to (10.0 / 60.0) during (duration seconds)
    ),
    scenario("two").group("two") { exec(r) }.inject(
      rampUsersPerSec(1.0 / 60.0) to (20.0 / 60.0) during (rampTime seconds),
      rampUsersPerSec(20.0 / 60.0) to (20.0 / 60.0) during (duration seconds)
    ),
    scenario("three").group("three") { exec(r) }.inject(
      rampUsersPerSec(1.0 / 60.0) to (30.0 / 60.0) during (rampTime seconds),
      rampUsersPerSec(30.0 / 60.0) to (30.0 / 60.0) during (duration seconds)
    )
  )
  .protocols(Config.httpConf)
}
```

When I execute it, at first it works great. Then, about a third of the way through the scenario execution time, scenario three stops injecting. Halfway through, scenario two stops injecting. Scenario one, which is 0.1666 users per second, or one user every 6 seconds, keeps going at approximately the right rate, and injects its last user at approximately the right time.

More research:

I tested with 600/minute, 800/minute, and 1000/minute injection rates (all at the same time), and it worked fine. I added another scenario at 10/minute, and that scenario worked, but the other three stopped injecting at about 14%.

I tested without the .randomized switch, running it four times. Each run injected nearly the same number of users before the scenarios stopped, but there was slight variance from run to run.

Here is the code I used to test:

```scala
import scala.concurrent.duration._
import io.gatling.core.Predef._
import io.gatling.http.Predef._
import io.gatling.core.structure.PopulationBuilder
import com.cigna.common._

class InjectionProblem extends Simulation {

  val rampTime = 30
  val duration = 600

  val r = http("Simple Request").get("http://localhost/")

  // one population per target rate, named after its users-per-minute value
  val list: List[PopulationBuilder] = List(10, 600, 800, 1000).map { n =>
    val x: Double = n // * 5
    val name: String = "%03.0f".format(x)
    scenario(name).group(name) { exec(r) }.inject(
      rampUsersPerSec(1.0 / 60.0) to (x / 60.0) during (rampTime seconds),
      rampUsersPerSec(x / 60.0) to (x / 60.0) during (duration seconds)
    )
  }

  setUp(list).protocols(Config.httpConf)
}
```

I tackled this regression.
Thanks!

Confirmed. That’s awesome, thanks!