Look back at the code I sent before. Notice the loop. Because of that loop, your virtual user will never exit (unless forced to do so by maxDuration). Alternatively, you can replace the “forever” with “during( time )” - whatever works for you.
So first you start up one user. That one user will step through your scenario. When it finishes, it will loop back and do it again. There will never be more than one active user. Then you ramp up a second user. Now there are two that are doing the action at the same time, but each will be in a different part of the process. Repeat until you have 50 concurrent users.
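The loop-plus-ramp pattern above can be sketched with Gatling's Java DSL. This is only a sketch: the class name, scenario name, base URL, endpoints, pauses, and durations are all placeholders you would replace with your own, and it needs the Gatling dependency on the classpath to run.

```java
import static io.gatling.javaapi.core.CoreDsl.*;
import static io.gatling.javaapi.http.HttpDsl.*;

import io.gatling.javaapi.core.ScenarioBuilder;
import io.gatling.javaapi.core.Simulation;
import io.gatling.javaapi.http.HttpProtocolBuilder;
import java.time.Duration;

public class LoopingRampSimulation extends Simulation {

  // Placeholder target; point this at your own system under test.
  HttpProtocolBuilder httpProtocol = http.baseUrl("http://localhost:8080");

  // forever() makes each virtual user repeat the scenario until something
  // external stops it - here, the maxDuration below. Swapping forever()
  // for during(...) would give each user a fixed looping time instead.
  ScenarioBuilder scn = scenario("looping user").forever().on(
      exec(http("step 1").get("/step1")).pause(2)
          .exec(http("step 2").get("/step2")).pause(2)
  );

  {
    setUp(
        // Start users gradually; once all 50 are running they stay active,
        // each in a different part of the scenario at any given moment.
        scn.injectOpen(rampUsers(50).during(Duration.ofMinutes(5)))
    ).protocols(httpProtocol)
     .maxDuration(Duration.ofMinutes(30)); // hard stop for the endless loops
  }
}
```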
Let’s do some math. With 50 concurrent users, a 4-transaction scenario that takes 9-10 seconds per iteration (pauses included) works out to roughly 20-22 transactions per second. Unless your system is badly misconfigured, it ought to be able to do that without even breathing hard. If the system can’t do at least 100-200 TPS, then something is probably not optimized.
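As a back-of-the-envelope check of those numbers (50 looping users, 4 transactions per iteration, 9-10 seconds per iteration):

```java
public class ThroughputMath {
  public static void main(String[] args) {
    int users = 50;
    int txPerIteration = 4;
    // All 50 users complete one iteration every 9-10 seconds,
    // so 200 transactions happen per iteration window.
    double txTotal = users * txPerIteration;
    double tpsSlow = txTotal / 10.0; // 10-second iterations -> 20.0 TPS
    double tpsFast = txTotal / 9.0;  // 9-second iterations  -> ~22.2 TPS
    System.out.printf("%.1f to %.1f transactions per second%n", tpsSlow, tpsFast);
  }
}
```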
What typically happens with a resource-constrained system is that as you give it more work to do, it takes longer to get each piece of work done, but the overall throughput remains relatively constant. So if you have 100 users trying to do 100 things per second, and that happens to be the limit of what the system can do, then if you give it 200 users trying to do 200 things, it will happily do it - but instead of being able to process each request in a second, it will take 2 seconds each. The end result is still 100 transactions per second.
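That relationship between concurrency, response time, and throughput can be illustrated with a couple of lines of arithmetic (the numbers are the hypothetical ones from above):

```java
public class SaturationMath {
  // Little's law rearranged: throughput = concurrency / response time.
  static double throughput(int concurrentUsers, double secondsPerRequest) {
    return concurrentUsers / secondsPerRequest;
  }

  public static void main(String[] args) {
    // At the limit: 100 users, 1 s per request -> 100 TPS.
    System.out.println(throughput(100, 1.0));
    // Double the load: 200 users, but requests now take 2 s -> still 100 TPS.
    System.out.println(throughput(200, 2.0));
  }
}
```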
My suggestion is: ramp up your scenario (with the “forever” loop) from 1 user to, say, 1000 users, at a rate of 1 user per minute. That will take about 17 hours (1000 minutes ≈ 16.7 hours), so set maxDuration to 17 hours. Maybe do it over the weekend. Then look at the graphs that Gatling produces. Unless your system is insanely performant, you should see responses per second level off at some point. When that happens, look at how many users were active at once. That is your maximum concurrent user load before response times start increasing. Do this exercise, and you will see what I mean.
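In the Java DSL that ramp might look like the following. This is a fragment meant to sit inside a Simulation class; `scn` stands for your looping scenario, and the numbers are simply the ones suggested above.

```java
// Add one user per minute for 1000 minutes (~16.7 hours); maxDuration
// then hard-stops whatever is still looping at the 17-hour mark.
setUp(
    scn.injectOpen(rampUsers(1000).during(Duration.ofMinutes(1000)))
).maxDuration(Duration.ofHours(17));
```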
Armed with that, you can do a long-running scenario where you ramp from 1 to X users over X-1 seconds, and then let it run for a few hours. This gives it time to “bake” and get through a GC run or two (if that applies to your application).
Before you do that run, I suggest tweaking the Gatling configuration so that the first report bucket boundary is 1 second and the second is 3 seconds. That way you will be able to tell whether 95% of responses were under 1 second, and how many took more than 3 seconds.
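If memory serves, those bucket boundaries live in gatling.conf under the charting indicators section, expressed in milliseconds - verify the exact keys against your Gatling version:

```
gatling {
  charting {
    indicators {
      lowerBound = 1000   # first boundary: responses under 1 s
      higherBound = 3000  # second boundary: responses over 3 s
    }
  }
}
```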