How to achieve a certain load using Gatling?

I have my scenario set up and now I want to achieve a load of 10,000 requests per second. How do I do that? How do I ramp up and how many users should I use?
I’m not exactly sure how the users and ramp features work…

I know Gatling does a lot of the calculating for you, but the basic math isn’t hard.

Let’s say 1 user needs 60 seconds on average to complete your scenario, and that it will perform 1 hit in that time.
In order to put 1 hit per second on your server in that case you need 60 users. Agreed?

If that same user does 10 hits, it only takes 60/10 = 6 users to achieve the same result. If you want to put 10,000 requests/sec on that same site using the same scenario, it will take 60,000 users.

Plug in your own numbers here.
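(If it helps, here is the same arithmetic as a throwaway Scala snippet; plain Scala, not Gatling DSL, and the numbers are just the example figures above.)

```scala
// users needed = target rps * iteration duration / hits per iteration
def usersNeeded(targetRps: Double, iterationSeconds: Double, hitsPerIteration: Double): Double =
  targetRps * iterationSeconds / hitsPerIteration

usersNeeded(1, 60, 1)       // 60.0    -> 60 users for 1 hit/sec at 1 hit per 60s iteration
usersNeeded(1, 60, 10)      // 6.0     -> 6 users when each iteration does 10 hits
usersNeeded(10000, 60, 10)  // 60000.0 -> 60,000 users for 10,000 requests/sec
```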

As for ramp-up: it depends on what type of test you’re doing. If you know the limits of your application well, about 30 minutes usually works for me. But if you do not, or you think it’s likely you will reach the limits of the application, a break test will need to be performed first to get a ballpark figure. In that case taking it slow is advisable, because fast ramping makes it harder to see when exactly the first bottleneck is reached.
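For the Gatling side of it, a ramp like that could look roughly like the sketch below. This is a Gatling 2.x-style sketch with made-up names (base URL, scenario); your real scenario would be the 60-second, 10-request flow discussed above.

```scala
import io.gatling.core.Predef._
import io.gatling.http.Predef._
import scala.concurrent.duration._

class RampSketch extends Simulation {

  val httpConf = http.baseURL("http://system-under-test") // placeholder base URL

  // Stand-in for your real ~60-second, 10-request scenario.
  val scn = scenario("My scenario").exec(http("home").get("/"))

  setUp(
    scn.inject(
      rampUsersPerSec(10) to 1000 during (30.minutes), // slow ramp: easier to see where the first bottleneck is
      constantUsersPerSec(1000) during (1.hour)        // ~1,000 new users/sec * 10 requests each ≈ 10,000 req/sec
    )
  ).protocols(httpConf)
}
```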

Now… 10,000 hits/sec is quite a lot of load. Are you sure that is realistic? Do you feel the infrastructure you are using is supposed to handle that much?

@Floris: thanks for the help!

Hi Floris, Lira,

It may be worth considering whether concurrent users is the best way to model this.

Gatling allows us to drive throughput at the user level. For example, it can inject 2,000 users per second; if each performs 5 requests on average, that produces the 10k rps.
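In injection-DSL terms that would look something like this (a Gatling 2.x-style sketch; `scn` stands for a scenario that performs about 5 requests per user, with imports and protocol config as in any Simulation):

```scala
setUp(
  scn.inject(
    constantUsersPerSec(2000) during (20.minutes) // 2,000 arrivals/sec * 5 requests each ≈ 10,000 req/sec
  )
).protocols(httpConf)
```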

You need to step back and ask a few questions before determining whether user throughput or user concurrency is the right model:

Is my system an administrative type system like a call center with a fixed number of users?

If the site responds slowly, does this block or delay the arrival of the next user session to your site?

What is the average session length in pages?

If the answers are: no, no and <7 then user throughput may be the best model for your site.

Thanks
Alex

The term ‘users per second’ doesn’t really exist unless you’re talking about the ramp-up stage.
‘sessions per second’, or ‘hits per second’… now that we can talk about.

A ‘user’ really boils down to a thread running a load test script, in technical terms. That maps well onto the reality that many sites use session management to keep track of people’s browsers making requests on the site.

My main customer right now has about 1.8 million unique visitors logging into the site daily… This is definitely not a ‘call center’ with ‘a fixed number of users’. Nor do we use 1.8 million virtual users to model the load on the site. What we do do is analysis: How long on average is a user logged in? What does that mean for the lifetime of the session cookie?

If a user spends on average 8 minutes on a certain part of the site, and we know that the first page* is requested 40 times per second during peak hours, then we need 8*60 (seconds) * 40 (hits/sec) = 19,200 threads to model the load on the site.

Furthermore, in order to model the reality that for the vast majority of hits on that page the session has expired or the user has logged out, we need 38 minutes / 8 minutes (session time) = 4.75 unique accounts per thread: 91,200 accounts, or 96,000 to account for a limitation in the test tool, which hands out a fixed number of accounts to each virtual user up front.
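(The same sums as a throwaway Scala snippet, with the figures from the example above; the 38 minutes is taken as given.)

```scala
val sessionSeconds    = 8 * 60                                 // average 8 minutes on this part of the site
val firstPageHitsSec  = 40.0                                   // peak hits/sec on the first page
val threads           = firstPageHitsSec * sessionSeconds      // 19,200 virtual users

val accountsPerThread = 38.0 / 8.0                             // 4.75 unique accounts per thread
val accountsNeeded    = threads * accountsPerThread            // 91,200 accounts
val accountsHandedOut = threads * math.ceil(accountsPerThread) // 96,000: the tool hands out whole accounts per user
```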

Now as you may (correctly) notice, the term ‘user’ in most of the load test tools is actually a bit of a simplification. It’s not even going to remotely match the actual number of people using the site. And it does happen that inexperienced people confuse the ‘so many users’ terminology in the load test tool with the number of unique visitors, and then set things up to use, say, 1.8 million users rather than the 20,000 or so that is needed to accurately model peak load.

However … from the mathematical point of view, I don’t really see how you could do away with the underlying concept of using multiple threads to drive the load. And any syntactic sugar you overlay on top of that will always hide details you may need access to in order to get your load modeled correctly.

… and a few more things:

  • Session length in pages doesn’t even factor into these equations. How many hits are done in a session isn’t even relevant - the only thing that matters is how many hits/sec you are doing in the first step (which should carry most of the load), and how long the session stays active.

  • If you set your script/scenario up correctly, the site becoming slower will not affect the load, until it starts overrunning the average session length in the above calculations.
    At that point it becomes important to have some kind of notification mechanism to tell you that the load numbers are going to be off by some significant fraction. Some tools / scheduler options will try to increase the load (threads) to compensate for increasing response time, but that will never work: if the response time increases, the target site is already overloaded, and adding more threads will just add more response time rather than doing anything constructive.

So basically everything you said up there is missing the mark. Sorry :wink:

The term ‘users per second’ doesn’t really exist unless you’re talking about the ramp-up stage.
‘sessions per second’, or ‘hits per second’… now that we can talk about.
A ‘user’ really boils down to a thread running a load test script, in technical terms.

With Gatling, virtual users and sessions are basically the same thing, and a Gatling user doesn’t boil down to a thread.

Of course you need threads to achieve concurrency. But basically you can’t have more concurrent threads (i.e. running at the same time) than you have cores. Above that number, you’re actually time-slicing.

Gatling is message-oriented, meaning that virtual users are just lightweight messages, not heavyweight threads.
They don’t block threads, either while pausing (they are actually scheduled) or while waiting for a response (they are actually registered on a non-blocking IO callback).

Hi Floris,

Thanks for this - it’s a good discussion point.

2 points or parts to this:

1. User implemented as a thread.

So Gatling, another tool called Iago (“i” at the start), and an internal tool I developed at my last job all model users as messages or tasks passed around a thread pool. This, plus async Netty underneath, means that given the right conditions (an extreme example, to make the point) we could run a test with 1 million users and only a handful of threads in the load injector.

(I first looked at Gatling while I was evaluating AHC/Netty for my internal tool, and I missed that it used Scala actors rather than threads.)

The reason I was writing that tool was that I got tired of load testing engagements where the load generators got overloaded, or where more load generators were needed than we had servers in the SUT - “there must be a better way” kind of logic.

2. How to model the test with users / sessions / average session length / pacing calculations, etc.

“As Carl Sagan once said, “Extraordinary claims require extraordinary evidence.” … The more disruptive, shocking, or expensive your conclusions and recommendations are, the more backup data you need and the more effort you want to expend in making an airtight case.”

So I will need to spend some time to explain this clearly to you.
In the meantime, I would recommend you watch the following:
https://www.youtube.com/watch?v=99RABfKNfcY#t=935

As for being off the mark - I wasn’t saying what you had proposed was wrong (although it may have appeared that way). What I asked were some questions to determine how to model this system. Depending on the workload you may, given Gatling’s capabilities, choose a ramped then fixed number of users looping with pacing, or inject the users at the rate they would arrive at the system. I think you can do it both ways regardless of the answers, but I would argue that some systems lend themselves to one injection method and some to the other. This may be new to many people, but it’s not new, and it predates when I came across it in my perf testing engagements.
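To make the two options concrete, here is a rough Gatling 2.x-style sketch of both injection styles (`steps` is just a stand-in for the real request chain; none of these names come from this thread):

```scala
// `steps` stands in for your real request chain.
val steps = exec(http("home").get("/"))

// (a) Closed model: a ramped, then fixed, pool of users looping with pacing.
val closed = scenario("closed model")
  .during(1.hour) {
    pace(60.seconds)   // fixed cadence per iteration, whatever the response times do
      .exec(steps)
  }

// (b) Open model: inject users at the rate they would arrive at the system.
val open = scenario("open model").exec(steps)

setUp(
  closed.inject(rampUsers(600) over (10.minutes))         // (a) 600 users looping every 60s ≈ 10 sessions/sec
  // open.inject(constantUsersPerSec(10) during (1.hour)) // (b) the same 10 sessions/sec, driven as arrivals
)
```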

Thanks
Alex

By “heavyweight threads” you mean “heavyweight Java threads”, I presume.

Because Linux kernel threads (processes, really - see http://linuxprograms.wordpress.com/2007/12/19/linux-kernel-support-for-threads-light-weight-processe/ ) are actually fairly lightweight. I’ve heard of people running hundreds of thousands of processes on a single machine without much overhead. The JVM is what causes a lot of the inefficiency here.

And really, to me, the fact that Gatling uses ‘messages’ rather than ‘threads’ is just an implementation detail. In practice it’s very much the same thing: a series of sequential steps performed by a single entity, which is almost entirely self-contained and maintains its own state.

Conceptually I will call such a thing a ‘thread’, even though the purist will call it an actor, the processor only sees a thread, the kernel only sees a Java process, the JVM only sees a Java thread, and Gatling calls it a ‘user’.

See? Many guises, same thing.

  1. I’ve never had problems with overloaded load generators unless we’re talking about really heavyweight load test scripts - say, Citrix or TruClient.
    Whether you use threads, processes, actors, or purple wombats is kind of irrelevant. The math is the same. Session length, throughput, latency, response time - all of those are merely variables in the equation.

  2. OK. That movie… Sorry, I can’t watch that at work and haven’t gotten around to it at home. And your ‘explaining this clearly to me’… well, I guess I’ll have to wait with bated breath.

I don’t see what your Carl Sagan quote has to do with this. None of the things I’ve said are extraordinary in any way, shape, or form. Calculating how much load your script will put onto the system under test is not rocket science. Just a matter of big spreadsheets and some carefully architected load test scripts.

  3. Now we’re talking about how to do performance testing.

Well, it depends a bit on what your goals are and what questions you mean to get answers to.

One thing I always ask new people to read when they join our team is “Thinking Clearly About Performance”: http://queue.acm.org/detail.cfm?id=1854041

Reading that will give you a starting point, a methodology that you can apply to how you design certain types of test scenarios.

One of the things I’ve found over my years as a performance engineer is that there really is little point in gunning for maximum realism when it comes to ramping up the load.
All systems have a breaking point, and all you’re interested in when you’re doing tests where the ramp-up matters is when that point is reached, and why that point is reached. What exactly the curve looked like isn’t really all that important. Nor is what happens afterwards, unless for some reason you are specifically interested in that.

Sorry to necro this thread, but since my talk was referenced I thought I might take a second to give my take on why Little’s Law is useful but not sufficient here, and why requests per second or arrival rate is the method most sites should be using to specify the amount of load to simulate. The tl;dr is that you should be using arrival rate because that’s how you’ll actually measure things, and it’s useful to have a direct, unchanging correlation between what you measure in production and what you test.

First of all, yes, you’re right that the equation is simple, and it’s fairly profound and useful that it works even in the presence of all the potential moving parts a web application has. Great. Unfortunately it describes a universe at steady state, and it’s rare that the services we test are performing at a steady state. Even in the case of completely artificial traffic, we have the huge variable of new code to deal with. Of course new code is exactly what we’re usually worried about in performance testing, so it may seem odd to call that out as a problem for Little’s Law, so maybe a real life example will help:

  1. Let’s say I have a dead simple web service that exposes one RPC to the world and I initially test it across a range of expected inputs and load shapes. I arrive at the conclusion that we can handle 5,000 concurrent users with an average latency of 50ms and a p99 of 150ms. If we’re lucky this will actually meet or exceed our expected performance and we can go back to figuring out how to build the perfect smoothie in the company cafeteria.

  2. Engineers being engineers, they go and change something and want to know if it’s going to work before they push to production. (At least that’s the scenario that would happen in the ideal world!) We re-run our full range of tests and come back with being able to handle 5,000 concurrent users with an average latency of 75ms and the p99 is a more worrisome 500ms. Someone decides to push to production anyway because our requirement was <100ms average and the site immediately falls over. What happened?

It turns out that each concurrent user is usually modeled with a thread, lightweight or otherwise. By thread here I mean simply the idea of a bounded unit of work which proceeds serially. Threads are a great abstraction for dealing with executing code, where serial execution is generally expected, but they’re inadequate for modeling user behavior, which in the case of a web application is often mediated by a browser and is by no means serial. So, back to our real-world case here: when the time interface someone stuck in front of this simple time service suddenly got noticeably slower for a small (< 10%, let’s say) percentage of the actual users, they got impatient and hit reload in their browser. That causes a disproportionate increase in your traffic, slowing things down for other people, and then they start hitting the oh-so-convenient reload button because “it feels slow”. You may still only have 5,000 actual humans using the site, but they are suddenly generating many more requests. Fundamentally they represent the “open” part of an open system, i.e. they generate work independent of your site’s ability to handle it.

There’s a pretty serious problem buried in there, which is that most consumer-facing websites do not measure their load in concurrent users! This is a problem because you actually need to have conversations with people where you don’t sound like you’re from Mars and they’re from Venus (or vice versa). If you’re a router, would you rather specify the load you can handle in concurrent users or packets per second? The backplane is generally measured in Gbps, you have disks that like to talk about IOPS, databases that can handle so many transactions, and so on. The systems world thinks in terms of work per unit time, and while you may have some intuition about a given user’s hopes and dreams, what you really see from them is HTTP requests, which thankfully are easier to model. IOW, model things in terms of RPS and you won’t be surprised when your site falls over because it exceeded the expected RPS by a factor of two.

Yes, you can translate concurrent users into arrival rate – for a fixed latency / response time. Unfortunately we are constantly changing the latency/response time with every release. That’s the last thing we want to ‘fix’ in our equation, because it’s exactly the thing that changes. You can run a battery of tests to determine the new response curve for your site and have convenient charts to help you translate back and forth, or you can simplify your life and talk about RPS / arrival rate – which has the significant benefit of actually being directly measurable in production.
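To make that concrete, here is Little’s law in a throwaway Scala snippet; the 1-second think time per request is an assumption purely to illustrate the point, only the latencies come from the example above:

```scala
// L = lambda * W  =>  lambda = L / W
def arrivalRate(concurrentUsers: Double, secondsPerRequestCycle: Double): Double =
  concurrentUsers / secondsPerRequestCycle

arrivalRate(5000, 1.0 + 0.050) // ~4762 req/sec at 50ms average latency
arrivalRate(5000, 1.0 + 0.075) // ~4651 req/sec at 75ms -- same "concurrent users", different load
```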

Bonus round: if arrival rate is what people should be using to simulate load, why aren’t people doing it? The answer is that their tools don’t make this easy, and we’re a fairly conservative (aka lazy) crowd. The tools don’t make it easy because they all use threads to model concurrent users, and if you try to specify load in terms of arrival rate, the number of threads required is potentially unbounded. There’s a simple solution to this, but it requires the load generator to maintain a queue of work, instead of having the simplifying assumption that each thread is an independent unit of work. I also think it may have something to do with the fact that the business folks who used to buy this sort of software can easily understand “virtual users”, and selling 10,000 virtual users is an awesome business model when your customers are on the internet and might get flooded with 100K users on a peak day. Also, many performance engineers are, sadly, engineers in name only and don’t actually understand the systems they test except through the medium of pretty graphs. For them virtual users is probably a comforting abstraction…
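For what a “queue of work” load generator means in the small, here is a toy JDK/Scala sketch; it is only an illustration of the idea, not how Gatling (or any other particular tool) is implemented:

```scala
import java.util.concurrent.{Executors, TimeUnit}

object ArrivalRateSketch extends App {
  val targetRps = 100
  val scheduler = Executors.newSingleThreadScheduledExecutor()
  val workers   = Executors.newFixedThreadPool(8) // a handful of threads, whatever the rate

  val fireOneRequest = new Runnable {
    def run(): Unit = workers.submit(new Runnable {
      def run(): Unit = { /* fire one HTTP request here */ }
    })
  }

  // One new unit of work every 1/targetRps seconds, independent of how slow responses get.
  scheduler.scheduleAtFixedRate(fireOneRequest, 0, 1000000L / targetRps, TimeUnit.MICROSECONDS)

  Thread.sleep(10000) // run for 10 seconds, then stop
  scheduler.shutdown(); workers.shutdown()
}
```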

Hi James,

Thanks for your insights.

I mostly agree.
Yet, as I was about to reply to Floris in another thread, there are still some use cases where thinking in RPS only is not sufficient.
For a long time, developers have seen server memory as a kind of unlimited silver-bullet storage where you could keep state that should have been in the client (but we didn’t have all the JavaScript technologies back then) or in the database (let’s store all these rows in the user’s server session, SQL paging is hard and we’ll save some database roundtrips this way, yay!).
This was and is very, very wrong. Still, many legacy systems are built this way, and when testing this kind of system you have to account for how many user sessions you’ll have on the server.
I’ve seen people having issues with their tool not being able to generate enough virtual users, and trying to compensate by lowering think time in order to achieve the same global RPS.

Then again, modern systems should be as stateless as possible, and with those, I agree that the number of concurrent virtual users shouldn’t be part of the equation.

Cheers,

Stéphane

I don’t disagree at all that you need to model users, but don’t do so at the level of the load-generating architecture. That can/should be a higher-level concern, although I’d personally refer to it as session management instead of the over-broad “virtual user” concept.

Keep in mind that I haven’t worked on a site that actually used sessions in years, because they don’t scale to the level of traffic I’ve been concerned with for a while now. So maybe there’s something in there that is worth driving all the way down to which load generator you pick, but I would be very skeptical of that absent some strong proof.

Keep in mind that I haven’t worked on a site that actually used sessions in years, because they don’t scale to the level of traffic I’ve been concerned with for a while now.

Lucky you…

Hi Stéphane,

It’s a good point.

If I work that logic through, though, “users per second” injection can only produce an equal or greater number of unique user sessions (objects) than static closed-loop vusers, which is most easily seen when the SUT starts to slow down significantly.

The number of session objects in memory at any one time is typically a function of:

  1. the arrival rate of new users, or of whatever request triggers session object creation,
  2. the total duration the user interacts with the system, and
  3. the idle timeout of the session object container.

There is no concurrent-user input parameter there.
The first one is most easily / naturally modelled by an open workload.
The other two are not related to whether you choose open or closed.
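Put as a back-of-the-envelope formula (plain Scala; the numbers are made up for illustration, not taken from anyone’s system):

```scala
// live sessions ≈ arrival rate * (active time + idle timeout)
def liveSessions(arrivalsPerSec: Double, activeSeconds: Double, idleTimeoutSeconds: Double): Double =
  arrivalsPerSec * (activeSeconds + idleTimeoutSeconds)

liveSessions(40, 8 * 60, 20 * 60) // 40 arrivals/sec, 8 min active, 20 min idle timeout ≈ 67,200 session objects
```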

However, that is not to say that you should not validate that the test produces the number of session objects you expect from production measurements or other sources.

Given you have designed the DSL so that the scenarios are orthogonal to the injection method, it should be easy to demonstrate this (or prove it wrong).

So if anything, consideration of session objects leads to modelling an open workload, to guarantee that the rate of session object creation stays uncoordinated with the system being tested.

WDYT?

thanks,
Alex

I’m not sure why there appears to be an impression that I am advocating using the “virtual users” fallacy as an input into your load calculations.

Quite the reverse: I believe the number of ‘virtual users’ is an end product of your calculations: something that you use to achieve the number of requests per second that you need to put on various pieces of the system under test, but that is otherwise not really all that relevant.

A few notes, though.

  • Using ‘queues’ as a model for your load generator seems a bit like putting the cart before the horse.

Sure, it’s possible, and if you’re entirely stateless I’m sure you can achieve a lot of throughput that way. But the reason you’re doing that in the first place is that a) threads are heavy and slow (except that they aren’t), and b) the virtual user model for driving the load is flawed (except that that is only part of the truth!).

Let’s not forget that the way your operating system’s scheduler works, with time-slicing, means that a thread is fundamentally a process, which is fundamentally a queue of work units that gets interleaved with other units of work from other processes on a fixed number of processor threads.

So what you’re actually saying on a fundamental level is that you feel that your implementation of queues is better than the OS’s implementation of queues :wink:

And maybe you are right. But maybe you aren’t. I know the Linux kernel has been capable of running quite impressive numbers of concurrent threads in the past, and there is no reason why it should not be able to do so again. So performance should not be the only reason to go to a queue-based model.

The other point is that if you drop the threaded model, you will still somehow need the ability to keep track of various bits of state across requests. A login button still needs you to keep track of things like session cookies and POST data, for example. If the site you’re testing uses sessions, you need to simulate that behaviour somehow. Which means that at the end of the day you’re still doing something that looks an awful lot like ‘threads’, even though you’re calling it ‘queues’ or ‘messages’ or whatever other fancy term you’re going to attach to your internal per-session state machinery.
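For what it’s worth, this is roughly what that per-‘user’ state looks like in a Gatling 2.x-style script; a sketch only, with made-up endpoints, a made-up csrf field, and a `username` attribute assumed to come from a feeder:

```scala
// Cookies are tracked per virtual user automatically; anything you save with saveAs
// lands in that user's Session, not in any shared place.
val login = exec(
    http("login page")
      .get("/login")
      .check(regex("""name="csrf" value="([^"]+)""").saveAs("csrf")) // stash the token for this user only
  ).exec(
    http("submit credentials")
      .post("/login")
      .formParam("username", "${username}") // per-user value, e.g. fed from a csv feeder
      .formParam("csrf", "${csrf}")         // the value this same user saved on the previous request
  )
```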

So what I am basically saying here is: how you implement things from a technical standpoint should not be conflated with the model you use to create your simulation of the real-world user. Threads are a useful abstraction.

Taking that further, I maintain that ‘the virtual user’ actually is a useful abstraction for something that is happening in the real world. There is a user. And he is doing a bunch of things on your site, and he is doing them in sequence.

That one of those things may be visiting a page that causes the browser to spawn 30 threads trying to fetch all the resources - that is something the load test tool should be able to figure out and just ‘Do The Right Thing’ in response. Even if that means spawning another 30 threads. Or generating 30 messages. Or whatever mechanism you’re using to make things happen concurrently.

And that one of those things happens to be hitting a reload button that causes the browser to resubmit a request your site is still processing?
Well, that, bluntly, is a problem for the load test tool to figure out. The person who writes the script should be able to just say “hey, do this request, and if it doesn’t finish within X time, resubmit it!”, or perhaps even “if any of the requests take longer than X time, resubmit them!”.

There is no fundamental reason why the high-level model of the load can’t be single-threaded and sequential. Except that ‘emulating browsers is hard’.

Well, welcome to the real world.

  • Dealing with a changing environment.

One of the things I noticed in James’s email was that he objects to using threads because ‘latency’ and ‘response time’ influence the amount of load you’re generating, and he puts the blame for that squarely on the shoulders of the thread-per-user model.

One thing I have been doing (for years now) is to set a semi-random, fixed amount of session pacing on my LoadRunner users. This means that every virtual user will always take (on average) the same amount of time to restart its iteration, (almost) independently of what the site’s response times are doing, and completely independently of the number of requests being performed in that time.
This takes away the impact of changing response times and makes load calculations (using RPS or arrival rate as the input) very much simpler. (And it correlates nicely with the average user’s session duration, now that I mention it.)
Not to mention: it’s static. The arrival rate of users on the site doesn’t change nearly as often as the response times might, and since I’ve taken away the latter variable I can leave the number of virtual users in a test unchanged across releases.
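In Gatling terms, the LoadRunner pacing described above maps roughly onto the `pace` element; a sketch, assuming the 2.x DSL, with `steps` as a placeholder request chain:

```scala
val pacedUser = scenario("paced user")
  .forever {
    pace(7.minutes, 9.minutes) // semi-random pacing around an 8-minute average
      .exec(steps)             // however slow the site gets, iterations restart on this cadence
  }
```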

  • Using ‘arrival rate’ to model the load

Yes. That’s how everyone should do their load calculations. Every number should be in terms of requests / sec, for as long as possible. The conversion to ‘users’ only happens when I get to the point where we have decided on a session length and need to start running tests. :wink:

But please don’t conflate that with “we should do away with the concept of a virtual user entirely”.
Because that, imho, is simply the wrong way to go about it.