Clarification on constantUsersPerSec and throttling

Hello, I am very new to Gatling and I am trying to simulate 500 requests per second for writes and selects to Cassandra:

setUp(
  scnInsert.inject(constantUsersPerSec(500) during (30 seconds)).protocols(cqlConfig),
  scnSelect.inject(constantUsersPerSec(500) during (30 seconds)).protocols(cqlConfig)
)
I thought inject(constantUsersPerSec(500) during (30 seconds)) would be enough to hit this RPS. The attached image is almost the same for selects.
However, I saw from the reports that the rate looks a bit "unstable". Sometimes it does not hit the target of 500 requests per second, and sometimes it goes way above it (see attached).
My question is: why does it sometimes fall well below the target, and why does it go above it? (Note: I am running this locally on my PC.)

So I have been reading online about throttling, and I am a bit confused about it. Why would I need a throttle on top of constantUsersPerSec?
Is there any point in, for instance, specifying a higher constantUsersPerSec, e.g. 100, and having a throttle of 50 RPS?
Why would I not just specify constantUsersPerSec(100)?
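For reference, the throttling I have been reading about looks roughly like this (my sketch, reusing the scenario and protocol names from above; the numbers are just examples):

setUp(
  scnInsert.inject(constantUsersPerSec(100) during (30 seconds))
).throttle(
  reachRps(50) in (10 seconds),  // ramp the request cap up to 50 requests/sec
  holdFor(20 seconds)            // then hold the cap there
).protocols(cqlConfig)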

Thank you in advance,
Aristoula

Imagine a factory where workers are scheduled to start working every 5 seconds. There is a big line of people standing at the time clock, and every 5 seconds, one of them punches in. As soon as they clock in, they get to work doing their job…

When you say constantUsersPerSec(), you are simulating that same situation. Every so many milliseconds, a new virtual user is started.

However, the time required to complete the scenario will vary from user to user. It might take longer to resolve a DNS name for some users than others. It might take longer to establish a TCP/IP connection for some connections than others, depending on the congestion of the network. If the network is really congested, some requests might have to re-send some packets, slowing everything down. And that’s not even counting the fact that the server has to do work with every request, and how long that work takes may vary from request to request.

Now, if you have a new virtual user starting every 2 milliseconds, but the user takes 150-200 milliseconds to do its work, then the number of active users will vary over time, as will the requests per second. There will be periods when lots of users all complete in the same second, hence the RPS spikes.
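If it helps to see the effect, here is a small stand-alone Scala sketch (not Gatling; every number, including the simulated server stall, is a made-up assumption) that buckets completions into whole seconds the way a report does:

import scala.util.Random

// Toy model: users start at a perfectly fixed rate, but response times
// vary and the server periodically stalls. Bucketing the completions
// into whole seconds reproduces the jagged RPS graph.
object JaggedRps extends App {
  val rng          = new Random(42)
  val arrivalGapMs = 2L          // one new virtual user every 2 ms ≈ 500 users/sec
  val totalUsers   = 500 * 30    // 30 seconds of injection

  def completionTime(startMs: Long): Long = {
    val raw = startMs + 150 + rng.nextInt(51)  // base service time: 150-200 ms
    // Pretend the server stalls for the last 600 ms of every 5-second
    // window and releases those responses in one burst when it recovers.
    val phase = raw % 5000
    if (phase >= 4400) raw + (5000 - phase) else raw
  }

  val completions = (0 until totalUsers).map(i => completionTime(i * arrivalGapMs))

  val perSecond = completions.groupBy(_ / 1000).toSeq.sortBy(_._1)
  perSecond.foreach { case (sec, xs) => println(f"second $sec%2d: ${xs.size}%4d responses") }

  // The overall rate is total requests / total seconds: still ~500 RPS,
  // even though individual seconds spike and dip around that value.
  println(f"overall: ${completions.size.toDouble / perSecond.size}%.1f rps")
}

With a perfectly steady server the buckets come out flat at roughly 500; it is the burst release that produces the spikes, with matching dips just before them.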

As for why there are not an equal number of periods below 500 RPS: that's a trick of math. The average RPS is not the average of the individual RPS values, which is what the question implies; it is total requests divided by total seconds. If you calculate it that way, I imagine you will find that the overall RPS is very close to 500, assuming the server is not overloaded.

Now, if the workload does not vary much, then your graph should eventually level off and be mostly flat. The more variation in the time required to complete the task, the more jagged the graph will be. So the fact that you see such a jagged graph may actually be an indicator of a problem with the service being tested.

My suggestion is to do a smooth ramp test. Ramp from 0 to 500 RPS over a long period (like, an hour). Then sustain it at 500 RPS for another hour. Then look at the graph. The graph will clearly indicate when the server begins to be overloaded. When you test above and beyond the saturation point, you are bound to see anomalous results. If that saturation point is lower than your production targets, then you need to either fix a performance issue, or scale out your hardware.
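In code, that could look something like the following (a sketch reusing your scenario and protocol names; adjust the durations as needed):

setUp(
  scnInsert.inject(
    rampUsersPerSec(1) to 500 during (1 hour),   // smooth ramp up to the target
    constantUsersPerSec(500) during (1 hour)     // then sustain it
  ).protocols(cqlConfig)
)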

Hope that helps.

Thank you very much for the very detailed explanation and the suggestion. We will definitely apply the ramp, and when we start seeing this jagged pattern we will know that we are reaching our limits.

So this jagged pattern (attached screenshot above, for inserts) is happening when inserts and selects run in parallel:

setUp(
  scnInsert.inject(constantUsersPerSec(500) during (30 seconds)).protocols(cqlConfig),
  scnSelect.inject(constantUsersPerSec(500) during (30 seconds)).protocols(cqlConfig)
)

Now, for the case where I simulate a user doing an insert and then a select on the same record, the RPS graph is flatter.

val scn = scenario("Two statements").repeat(1) {
  feed(feeder)
    .exec(cql("prepared INSERT")
      .execute(insert)
      .withParams("${id}", "${str}")
      .consistencyLevel(ConsistencyLevel.ANY))
    .exec(cql("simple SELECT")
      .execute("SELECT * FROM test_table WHERE id = '${id}'")
      .check(rowCount.is(1)))
}

setUp(scn.inject(constantUsersPerSec(500) during (30 seconds))).protocols(cqlConfig)

Attached is a screenshot for the sequential scenario (inserts followed by selects).

95th percentile for insert (parallel scenario): 9 ms

95th percentile for insert (sequential scenario): 5 ms

Is there anything obvious I am missing? I can't explain why this is happening.

If you have two users trying to operate on the same record, whichever one arrived later has to wait for the first one to complete before the second one can begin. If you want to avoid that, make sure you have enough records that every virtual user is operating on a different record. If you can make that happen, I’d do that anyway, as it allows you to blow past any database record caches, or even operating system disk caches, and you can measure what real-world worst-case (non-error) behavior looks like.
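For example, something along these lines gives every virtual user its own id, so no two users ever touch the same row (a sketch; the column names are just the ones from your script):

import java.util.UUID
import scala.util.Random

// Every virtual user draws a fresh, unique id, so the inserts and selects
// never contend on the same record (and caches can't flatter the results).
val feeder = Iterator.continually(Map(
  "id"  -> UUID.randomUUID().toString,
  "str" -> Random.alphanumeric.take(20).mkString
))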