Imagine a factory where workers are scheduled to start working every 5 seconds. There is a big line of people standing at the time clock, and every 5 seconds, one of them punches in. As soon as they clock in, they get to work doing their job…
When you use “constantUsersPerSec()”, you are simulating that same situation. Every so many milliseconds, a new virtual user is started.
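As a sketch of what that looks like, assuming Gatling’s Scala DSL (where `scn` and `httpProtocol` are placeholders for your own scenario and protocol config):

```scala
// Open injection model: one new virtual user every 2 ms (500 users/sec),
// no matter how long each user takes to finish its scenario.
// "scn" and "httpProtocol" stand in for your existing definitions.
setUp(
  scn.inject(
    constantUsersPerSec(500).during(10.minutes)
  ).protocols(httpProtocol)
)
```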
However, the time required to complete the scenario will vary from user to user. DNS resolution might take longer for some users than for others. Establishing a TCP/IP connection might take longer for some connections than for others, depending on network congestion. If the network is really congested, some requests might have to resend packets, slowing everything down. And that’s not even counting the work the server has to do for every request, which may itself take a different amount of time from request to request.
Now, if you have a new virtual user starting every 2 milliseconds, but the user takes 150-200 milliseconds to do its work, then the number of active users will vary over time, as will the requests per second. There will be periods when lots of users all complete in the same second, hence the RPS spikes.
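You can see this effect with a toy simulation (plain Python, not Gatling): start a user every 2 ms, give each one a random 150–200 ms of “work,” and count how many finish in each one-second bucket. The completion counts hover around 500 per second but are not flat.

```python
import random

random.seed(42)

INJECTION_INTERVAL_MS = 2          # one new user every 2 ms -> 500 users/sec
DURATION_S = 10                    # simulate 10 seconds of injection
N_USERS = DURATION_S * 1000 // INJECTION_INTERVAL_MS

# Each user finishes 150-200 ms after it starts (variable "work" time).
completions_per_sec = {}
for i in range(N_USERS):
    start_ms = i * INJECTION_INTERVAL_MS
    finish_ms = start_ms + random.uniform(150, 200)
    sec = int(finish_ms // 1000)
    completions_per_sec[sec] = completions_per_sec.get(sec, 0) + 1

counts = [completions_per_sec[s] for s in sorted(completions_per_sec)]
print(counts)  # per-second completion counts: near 500, but jagged
```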
As for why there is not an equal number of periods below 500 RPS: that’s a quirk of the math. The average RPS is not the average of the individual per-second RPS values, which is what the question implies. It’s total requests divided by total seconds. If you calculate it that way, I imagine you will probably find that the overall RPS is very close to 500 RPS, assuming that the server is not overloaded.
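Here’s a made-up four-second example showing why spikes and dips need not come in equal numbers: three seconds run above 500 and only one below, yet total requests divided by total seconds is exactly 500.

```python
# Hypothetical per-second request counts: three seconds above 500,
# only one below, yet the overall rate works out to exactly 500 RPS.
per_second = [520, 520, 520, 440]

overall_rps = sum(per_second) / len(per_second)   # total requests / total seconds
above = sum(1 for r in per_second if r > 500)
below = sum(1 for r in per_second if r < 500)

print(overall_rps)    # 500.0
print(above, below)   # 3 1
```

One large dip can offset several small spikes (or vice versa), so counting the seconds on each side of 500 tells you nothing by itself.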
Now, if the workload does not vary much, then your graph should eventually level off and be mostly flat. The more variation in the time required to complete the task, the more jagged the graph will be. So the fact that you see such a jagged graph may actually be an indicator of a problem with the service being tested.
My suggestion is to do a smooth ramp test. Ramp from 0 to 500 RPS over a long period (like, an hour). Then sustain it at 500 RPS for another hour. Then look at the graph. The graph will clearly indicate when the server begins to be overloaded. When you test above and beyond the saturation point, you are bound to see anomalous results. If that saturation point is lower than your production targets, then you need to either fix a performance issue, or scale out your hardware.
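The ramp-then-sustain profile above might look like this, again assuming Gatling’s Scala DSL (`scn` and `httpProtocol` are placeholders for your own scenario and protocol config):

```scala
// Smooth ramp from 0 to 500 users/sec over an hour to find the
// saturation point, then hold at 500 users/sec for another hour.
// "scn" and "httpProtocol" stand in for your existing definitions.
setUp(
  scn.inject(
    rampUsersPerSec(0).to(500).during(1.hour),
    constantUsersPerSec(500).during(1.hour)
  ).protocols(httpProtocol)
)
```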
Hope that helps.