I have been tasked with identifying request spikes when 100,000 clients each make a request at a random time during a 2-hour window.
The average load is easy to test, but statistics dictate that there is a very high probability of seeing peaks much higher than the average with truly random clients.
I am using Gatling 1.3.x
My approach is to ramp all clients immediately, but have each client make a random pause between 0 and 2 hours as the first step in the test.
val scn = scenario("Pickup data from " + users + " users, each with one random pickup over a period of " + period + " seconds")
  .pause(0 seconds, Duration.create(period, TimeUnit.SECONDS))
  // ...the rest of the scenario here
// We ramp all the users at once, then each waits randomly in parallel
Now, it seems that I am running out of HTTP connections pretty fast.
- When does Gatling actually open an HTTP connection? Is it before the initial pause? I was hoping it would be at the first POST.
- I thought HTTP connections were shared in Gatling 1.x; should I expect 100,000 parallel HTTP connections in my test?
Any alternative approaches are welcome.
-Karl Ivar Dahl
First of all, 1.3 is pretty old (one year); it would be a good idea to upgrade. We won’t investigate possible issues in such old versions: we have fixed many things since, and the same goes for dependencies such as Netty and AsyncHttpClient.
Yes, before 1.5, connections were always shared.
Connections are opened when needed, not preemptively.
Connections are automatically closed if they stay idle for too long.
So no, you shouldn’t expect 100,000 parallel connections.
I’m aware this is an old build, but since I have had no problems with it so far, I am holding out until 2.0.
I have looked closer at the issue myself, and you are absolutely correct.
I bumped the heap space to 1200m, and this solved the immediate problem.
So while Gatling uses few threads, a significant amount of resources is still needed for 100,000 parallel clients (I’m not complaining, I’m impressed!).
I would like to propose more explicit support for running a “true random” distribution of scenarios over a period of time. Poisson statistics suggest this will reveal a lot of gotchas on systems that have already been load tested; for example, the attached Gatling report for 1000 random requests shows periods with double the expected average load.
Will there be a more obvious way to run such a test in Gatling 2?
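To make the effect concrete, here is a small plain-Scala sketch (my illustration, not Gatling code; the numbers mirror the 1000-request example above). It draws request times uniformly over a 2-hour window, buckets them per minute, and compares the busiest minute to the average:

```scala
import scala.util.Random

// Illustration: 1000 requests at uniformly random times over 2 hours.
// The busiest minute typically carries roughly double the average load.
object RandomLoadPeaks {
  def main(args: Array[String]): Unit = {
    val requests = 1000
    val periodSeconds = 2 * 3600
    val rng = new Random(42) // fixed seed so the run is repeatable

    // Count how many requests land in each one-minute bucket
    val counts = new Array[Int](periodSeconds / 60)
    for (_ <- 1 to requests)
      counts(rng.nextInt(periodSeconds) / 60) += 1

    val average = requests.toDouble / counts.length
    println(f"average per minute: $average%.1f, busiest minute: ${counts.max}")
  }
}
```

With 1000 requests over 120 minutes the average is about 8.3 per minute, while the peak minute in a run like this usually lands well into the teens, matching the “double the average” observation.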
Glad you could fix your problem.
Generally speaking, connections cost memory. This is even more true with HTTPS.
In your case, as you share connections, this might not be the real problem. It’s hard to tell without profiling your scenario; it could be the amount of data saved in the sessions.
Anyway, we’ve done some interesting improvements there in Gatling 2.
Regarding pause distribution, Gatling has had exponentially distributed pauses since 1.2.0, but it involves using a different keyword, pauseExp instead of pause: https://github.com/excilys/gatling/wiki/Structure-Elements#wiki-pause
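As a hedged sketch (based on the wiki page above; the exact 1.x signature may differ), the uniform pause in the original scenario would become something like this. Passing a mean of period/2 is my assumption, chosen because a uniform pause over [0, period] also has mean period/2, so the average behavior stays the same:

```scala
// Sketch only: replace the uniform 0..period pause with an exponentially
// distributed pause whose mean you pass to pauseExp.
val scn = scenario("Pickup data, exponentially distributed pauses")
  .pauseExp(Duration.create(period / 2, TimeUnit.SECONDS)) // mean pause
  // ...the rest of the scenario here
```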
Hm, there is no attachment
It’s not really standard to test a system like this. Normally there is always a ramp-up/break test first, with a gradual ramp-up to some point well above the expected maximum peak load, to find out where the limits of the system are, before doing more elaborate things.
This particular scenario will start quite suddenly, and if the system under test doesn’t handle it, you have no real way of determining which resource limits you are running up against. Sure, your test may match reality better, but its predictive value is probably going to be limited to a simple yes/no answer to the question “can the system handle this?”
No attachment? Hmm, I can see the attachment in my browser. I’ll attach the image here as well. Maybe your browser has problems with PNG files?
Anyway, I do “warm up” my system before this test, but removed that part for the example. I believe a gradual ramp is not so important here, as the average load is well below the maximum. The interesting point is that with randomly distributed requests there is a very high probability of peaks that exceed the maximum, and for some test scenarios this is probably a more realistic test.
The actual scenario is 100,000 mobile phones that I know will retrieve a QR code over the span of 2 hours. The actual pickup time is randomized within this interval on the phone.
I skimmed the wiki article on exponential distribution, but don’t quite understand how I can use pauseExp to achieve the same effect.
Should the pauseExp have a mean of 1 hour?
Do I still have to ramp all clients at once, or should I ramp up on the average load?
Thanks for the interest,
No, it was just the tablet being too good at hiding little icons
As for the spikes you’re predicting - if you already know what the system can handle, and what resource is going to be the limiting factor, then this type of test could be used to confirm the theory. But you should already know what will happen.
However, it does raise the question of what kind of clients we are looking at here. Real humans respond differently to response-time delays than programs do. If a sudden spike could cause the clients to retry, or humans to hit reload, then the resulting follow-on load could take down something that looked fine in your test scenario.
As an aside: what exactly is measured in that ‘transactions’ graph? I haven’t seen anything like LoadRunner transactions in Gatling scripts, so what are we measuring there?
2013/11/8 Karl Ivar Dahl <email@example.com>
You are correct; I have performed a test with a ramp-up that ultimately takes down the system. I know this happens at approximately double the average load I am now testing. What I am looking for now is to see whether the peaks at medium load will be enough to cause errors, or whether the peaks are so short that the system recovers and pending requests are served before timeouts occur.
Human behavior is difficult to predict; I have taken this into account with 3 retries on failure with a pause between them (which probably should be pauseExp instead of pause).
So an unexpected peak load can definitely cause large ripples in the system due to these retries; that is what makes this “medium-stress” test so interesting.
The graph is generated by Gatling 1.3.x.
I believe Transactions are the number of received responses over time, where requests over time are outgoing requests.
So in Gatling, transactions really mean ‘responses’, whereas in LR a transaction is a set of requests related to a single user action.
Groups aren’t used much, are they? Is there documentation on the subject?
Yes, you pass the mean. But of course, you have to be sure that your users’ behavior can be simulated this way.
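One caveat worth noting (my observation, not from the thread): an exponential pause with a 1-hour mean is not the same as a uniform pick over the 2-hour window. Starts cluster early, and a fraction of them (e^-2, roughly 13.5%) would fall beyond the 2 hours. A quick plain-Scala check, sampling the exponential with the standard inverse-transform trick:

```scala
import scala.util.Random

// Compare exponential start times (mean 1 h) against a 2-hour window.
object PauseShapes {
  def main(args: Array[String]): Unit = {
    val rng = new Random(7)
    val n = 100000
    val meanSeconds = 3600.0
    // Inverse-transform sampling: -mean * ln(U) is exponentially distributed
    val starts = Seq.fill(n)(-meanSeconds * math.log(rng.nextDouble()))
    val inside = starts.count(_ <= 7200.0)
    println(f"exponential starts inside the 2-hour window: ${100.0 * inside / n}%.1f%%")
    // A uniform pause over [0, 2 h] would put 100% inside the window.
  }
}
```

So whether pauseExp matches the phones’ behavior depends on whether the app really picks uniformly, as described later in the thread.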
Of course, the users are always very mean to my system
Jokes aside, in this case I know the clients pick a random time in the 2-hour interval. The client here is an app that takes this initiative without user intervention (a background scheduled job).
IIRC, JMeter (or maybe JMeter plugins) uses the same term.
But I agree, I don’t like it, and finding a better name for Gatling 2 would be a good idea. “Responses”, as you suggest, sounds like a good one.
However, I don’t like LR’s meaning of “transaction” either
“transaction” has a meaning of ACIDity that has nothing to do with a sequence of requests.
Maybe that’s me again being more of a developer than a tester, but we have to reserve this term for future contexts where it would really make sense, like JDBC support.
Myeah, the LR naming has flaws too. I don’t claim it’s perfect. Just that having a name mean two different things for two different tools brings confusion.
Group as a name isn’t great either. A LR vuser group is a set of users all running the same script. What gatling calls a scenario, I believe. Whereas a LR test scenario includes one or more vuser groups that may or may not run concurrently. (But likely do…)
It seems to me there isn’t much overlap between the communities, or there would have been far fewer naming conflicts.
Finding proper meaningful names is surely something difficult.