Gatling scaling

I have had a case where we could not get really powerful hardware for load generation, so I needed several less powerful machines to generate the load I needed.
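For what it’s worth, the usual way to do this with open-source Gatling is to run the same simulation on every generator machine and give each machine a share of the injection profile, then pull the results together afterwards (if I remember correctly Gatling can build a report from existing simulation logs with its reports-only mode, but check your version). A rough sketch, assuming a Gatling 3-style DSL; the `generators` and `totalUsersPerSec` system properties and the URL are made up for illustration:

```scala
import io.gatling.core.Predef._
import io.gatling.http.Predef._
import scala.concurrent.duration._

// Sketch: the same simulation runs on every generator machine, but each machine
// only injects its share of the total load. The "generators" and "totalUsersPerSec"
// system properties and the base URL are made up for this example.
class DistributedShareSimulation extends Simulation {

  private val generators       = System.getProperty("generators", "1").toInt
  private val totalUsersPerSec = System.getProperty("totalUsersPerSec", "100").toDouble
  private val myShare          = totalUsersPerSec / generators

  private val httpProtocol = http.baseUrl("http://system-under-test.example.com")

  private val scn = scenario("front page")
    .exec(http("home").get("/"))

  setUp(
    scn.inject(constantUsersPerSec(myShare) during (10.minutes))
  ).protocols(httpProtocol)
}
```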

Or as in consuming lots of CPU because you have a crappy NIC?
Or even bandwidth? (most datacenters use 1 Gbps or 10 Gbps for internal links these days I believe)

There are several possible reasons to scale out beyond “generators have limited resources”.

  1. Load balancers.
    If you have a load balancer set to be IP-sticky then you’ll need, at a minimum, more than one IP just to make sure the load gets spread across the backend machines properly. This can be done with IP spoofing, but often it’s simpler (and less likely to upset network security people) to just have more than one generator. (A single-machine alternative is sketched a little further down.)

  2. More complicated protocols require quite a lot more resources / CPU power.
    Ajax/Trueclient, Citrix, “web services”, …

This may not yet apply to gatling, but I guarantee you, once you start making the client simulation closer to what modern browsers do you’re going to need more horsepower.

The difference between hitting URLs with raw data and simulating clicks on buttons can actually be quite large in terms of resource consumption, especially if that means running the half a million lines of JavaScript certain sites dump into your browser nowadays.
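Coming back to the IP-stickiness point in item 1: if adding generators really isn’t an option, one single-machine alternative is to give the generator several IP aliases on its NIC and bind each Gatling process to one of them. A minimal sketch, assuming your Gatling version exposes `localAddress` on the HTTP protocol; the `sourceIp` property, the addresses and the URL are placeholders:

```scala
import io.gatling.core.Predef._
import io.gatling.http.Predef._

// Sketch: each Gatling process on the box is started with -DsourceIp=<one of the
// NIC aliases>, so an IP-sticky load balancer sees several distinct clients.
// Addresses and URL are placeholders; localAddress is assumed to exist in your
// Gatling version (newer versions also have localAddresses/useAllLocalAddresses).
class MultiSourceIpSimulation extends Simulation {

  private val sourceIp = System.getProperty("sourceIp", "10.0.0.11")

  private val httpProtocol = http
    .baseUrl("http://system-under-test.example.com")
    .localAddress(sourceIp)

  private val scn = scenario("sticky LB spread")
    .exec(http("home").get("/"))

  setUp(scn.inject(atOnceUsers(30))).protocols(httpProtocol)
}
```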

  3. Network bandwidth.
    As Stéphane points out, if you are going to send really large amounts of data you can exceed the total capacity of the NIC. This is really a variant of the “generators have limited resources” theme. (Some rough arithmetic on this follows just after this list.)
    It happens less and less though. :wink:

  4. Separation of responsibilities.
    Separating the generator from the controller, the controller from the scheduler, the scheduler from the user interface etc. is quite simply good engineering.
    If you want to offer your tester live views of what is going on inside the test, you don’t want the GUI code to be running in the same JVM as the generator itself.
    Nor do you want multiple users to be able to stomp all over each other’s tests or analysis by accident - while you do want to offer a view of who ran what tests when. With a single monolithic app you simply can’t do that.
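To put some (entirely made-up) numbers on the bandwidth point in item 3, the back-of-the-envelope check is simple enough to do before the test, e.g. in a Scala REPL:

```scala
// Back-of-the-envelope NIC check, with made-up numbers:
// a 1 Gbps link carries at most ~125 MB/s of payload (ignoring protocol overhead).
val linkBytesPerSec      = 1000L * 1000 * 1000 / 8        // ~125,000,000 bytes/s
val avgResponseBytes     = 200L * 1024                    // assume 200 KB per response
val targetRequestsPerSec = 500L                           // assumed target throughput

val neededBytesPerSec = avgResponseBytes * targetRequestsPerSec       // ~102 MB/s
val utilisation       = neededBytesPerSec.toDouble / linkBytesPerSec  // ~0.82 - close to saturating one NIC
```

At roughly 80% of a 1 Gbps NIC you would already want a second generator - or at least a bigger NIC - just to keep the measurement honest.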

“Scaling out” really is just a tiny subset of the things you can start doing once you have a properly architected set of components that talk to each other, rather than a monolithic application.

LR has 5 components:

  • Generator (LR agents)

  • Controller (controls a single test and is used by one user at a time; can be used standalone to get a live view of whatever test you’re running)

  • Scripting interface (vugen, desktop app)

  • Analysis tool (also a desktop app)

  • ALM - aka “Performance Center” - a web interface usable by multiple people to run multiple tests simultaneously using any number of controllers and generators. Also offers live test views and tracks all the test assets (scripts, scenarios and whatnot)

The latter thing is pretty complicated stuff but it’s really just a management interface around the first three items.

As for “why gatling”…
Well. LR is pretty good stuff … but it’s proprietary, expensive, stagnant, buggy, and inflexible. It doesn’t support the whole agile model our customers want to move to very well, and adding the things we need to that means waiting for the supplier to start building the required functionality (and paying through the nose for it).

Never mind the bugs :wink:

… and I completely forgot item #5

  5. Measurement quality.

Quite simply put: if you are going to use just one generator, you can’t be 100% sure whether a bad response time is the result of the system under test being slow or of the generator itself being slow. Especially when generators are virtualized it isn’t necessarily guaranteed that the generator has all of the resources on the physical hardware to itself. Nor can it be guaranteed that all of the components in that system are problem-free. A faulty NIC can cause very large problems, and it would be very easy to blame those on the system under test.

I’ve had some pretty interesting discussions with certain senior load testers over on the LR boards some time back. James Pulley, for instance, believes that all test farms should have at least 5 generators:

  • 2 regular load generators

  • 1 generator that runs only a fraction of the load, to measure the impact of the test load on response times for the first two machines (a Gatling sketch of this probe idea follows below).

  • 1 generator out of order or in maintenance.

  • 1 generator ‘spare’

That’s a bit on the extreme side, though. :wink:
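Extreme or not, the low-load probe is easy to approximate with Gatling itself: run the same scenario everywhere, but start one machine with only a trickle of users and compare its response times with those reported by the heavily loaded generators. A rough sketch; the `probe` property, the rates and the URL are illustrative assumptions, and the DSL assumes a recent Gatling version:

```scala
import io.gatling.core.Predef._
import io.gatling.http.Predef._
import scala.concurrent.duration._

// Sketch of the "low-load probe" idea: every machine runs the same scenario, but
// the box started with -Dprobe=true injects only a trickle of users. If the heavily
// loaded generators report much worse times than the probe for the same requests,
// suspect the generators rather than the system under test. The "probe" property,
// the rates and the URL are all illustrative assumptions.
class ProbeAwareSimulation extends Simulation {

  private val isProbe = java.lang.Boolean.getBoolean("probe")

  private val httpProtocol = http.baseUrl("http://system-under-test.example.com")

  private val scn = scenario(if (isProbe) "probe" else "main load")
    .exec(http("home").get("/"))

  private val injection =
    if (isProbe) constantUsersPerSec(0.2) during (30.minutes) // a trickle, measuring "true" response times
    else         constantUsersPerSec(50)  during (30.minutes) // the actual test load

  setUp(scn.inject(injection)).protocols(httpProtocol)
}
```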

+1 on all these points

Quote of the day: “Floris Kraak knows his performance test stuff!”

Agreed, I learned something. I can use these points to reinforce my request for more load generators :slight_smile:

  2. More complicated protocols require quite a lot more resources / CPU power.
    Ajax/Trueclient, Citrix, “web services”, …

This may not *yet* apply to gatling, but I guarantee you, once you start making the client simulation closer to what modern browsers do you’re going to need more horsepower.

The difference between hitting URLs with raw data and simulating clicks on buttons can actually be quite large in terms of resource consumption, especially if that means running the half a million lines of JavaScript certain sites dump into your browser nowadays.

Agreed on all points except this. If I start making the client simulation closer to what browsers do, I might as well use Selenium (and then I’m *really* going to need more horsepower). If I need a load testing tool to figure out that my JS is slow, I’ve used a cannon to crack a nut; this is a job for profiling tools in the browser.

I agree on most of your points, but I have some comments

There are several possible reasons to scale out beyond “generators have limited resources”.

  1. Load balancers.
    If you have a load balancer set to be IP-sticky then you’ll need, at a minimum, more than one IP just to make sure the load gets spread across the backend machines properly. This can be done with IP spoofing, but often it’s simpler (and less likely to upset network security people) to just have more than one generator.

Makes sense. However, most of the time, a faulty load balancer configuration will end up routing all traffic from a single IP to one node. At the same time, it can easily be verified that all servers are being hit. I don’t know about you guys, but we tend to use BIG-IP LTM in our projects, and seldom have to modify the config. In addition, I believe one should be able to trust commercial-grade enterprise hardware/software, considering that they perform lots of testing on their side too (using IXIA and the like).
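On “it can easily be verified that all servers are being hit”: if the backends (or the load balancer) add a header identifying the node that answered, that verification can be part of the test itself. A sketch, assuming a recent Gatling DSL; `X-Served-By` is a made-up header name, so substitute whatever your setup actually exposes:

```scala
import io.gatling.core.Predef._
import io.gatling.http.Predef._

// Sketch: record which backend node answered each request, assuming the servers
// (or the LB itself) add an identifying response header. "X-Served-By" is a
// made-up header name; with the default check the request is marked KO if the
// header is missing.
class LbSpreadCheckSimulation extends Simulation {

  private val httpProtocol = http.baseUrl("http://system-under-test.example.com")

  private val scn = scenario("LB spread check")
    .exec(
      http("home")
        .get("/")
        .check(header("X-Served-By").saveAs("backendNode"))
    )
    .exec { session =>
      // Debug aid only: in a real test you would aggregate these instead of printing.
      println(s"served by: ${session("backendNode").asOption[String].getOrElse("unknown")}")
      session
    }

  setUp(scn.inject(atOnceUsers(10))).protocols(httpProtocol)
}
```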

  2. More complicated protocols require quite a lot more resources / CPU power.
    Ajax/Trueclient, Citrix, “web services”, …

This may not yet apply to gatling, but I guarantee you, once you start making the client simulation closer to what modern browsers do you’re going to need more horsepower.

The difference between hitting URLs with raw data and simulating clicks on buttons can actually be quite large in terms of resource consumption, especially if that means running the half a million lines of JavaScript certain sites dump into your browser nowadays.

Well, in my opinion, going down this pathway will give you trouble. Until now, people have been focusing on writing applications that are unit testable. I believe it’s about time people start writing apps that are performance testable - and drop using application frameworks that mask the HTTP protocol. If you’re using JSF or Wicket or all kinds of portal systems, you’ll get in trouble. IMHO, REST-based (single page) applications are the way forward from a performance testing perspective.
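That is also what makes such applications pleasant to script against: the test drives the same JSON endpoints the page uses, with no framework state to reverse-engineer. A hedged sketch, assuming a recent Gatling DSL; the endpoints, payload and token handling are all invented:

```scala
import io.gatling.core.Predef._
import io.gatling.http.Predef._
import scala.concurrent.duration._

// Sketch: load testing a REST-style single page app by driving its JSON API
// directly, instead of simulating a browser. Endpoint paths, field names and
// the token handling are all invented for illustration.
class RestApiSimulation extends Simulation {

  private val httpProtocol = http
    .baseUrl("http://api.example.com")
    .acceptHeader("application/json")

  private val scn = scenario("REST user journey")
    .exec(
      http("login")
        .post("/auth/login")
        .body(StringBody("""{"user":"demo","password":"demo"}""")).asJson
        .check(jsonPath("$.token").saveAs("token")) // assumes the API returns a token field
    )
    .exec(
      http("list orders")
        .get("/orders")
        .header("Authorization", "Bearer ${token}") // Gatling EL substitutes the saved token
    )

  setUp(scn.inject(rampUsers(50) during (2.minutes))).protocols(httpProtocol)
}
```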

  3. Network bandwidth.
    As Stéphane points out, if you are going to send really large amounts of data you can exceed the total capacity of the NIC. This is really a variant of the “generators have limited resources” theme.
    It happens less and less though. :wink:

I’ve never had any issues during performance testing of regular web applications. Streaming and the like might be a different story though.

  4. Separation of responsibilities.
    Separating the generator from the controller, the controller from the scheduler, the scheduler from the user interface etc. is quite simply good engineering.

If you want to offer your tester live views of what is going on inside the test you don’t want the gui code to be running in the same JVM as the generator itself.

Well, I believe it is a trade-off. If your GUI uses, let’s say, reactive programming techniques, then you should be OK.

Nor do you want multiple users to be able to stomp all over each other’s tests or analysis by accident - while you do want to offer a view of who ran what tests when. With a single monolithic app you simply can’t do that.

Agreed, however from my experience, most shops only have a few performance testers, and those guys should be able to talk to each other.

“Scaling out” really is just a tiny subset of the things you can start doing once you have a properly architected set of components that talk to each other, rather than a monolithic application.

Agreed.

  5. Measurement quality.

Quite simply put: if you are going to use just one generator, you can’t be 100% sure whether a bad response time is the result of the system under test being slow or of the generator itself being slow. Especially when generators are virtualized it isn’t necessarily guaranteed that the generator has all of the resources on the physical hardware to itself. Nor can it be guaranteed that all of the components in that system are problem-free. A faulty NIC can cause very large problems, and it would be very easy to blame those on the system under test.

I’ve had some pretty interesting discussions with certain senior load testers over on the LR boards some time back. James Pulley, for instance, believes that all test farms should have at least 5 generators:

  • 2 regular load generators

  • 1 generator that runs only a fraction of the load, to measure the impact of the test load on response times for the first two machines.

  • 1 generator out of order or in maintenance.

  • 1 generator ‘spare’

That’s a bit on the extreme side, though. :wink:

Agreed. Faulty servers are troublesome.

LR has 5 components:

  • Generator (LR agents)

  • Controller (controls a single test and is used by one user at a time; can be used standalone to get a live view of whatever test you’re running)

  • Scripting interface (vugen, desktop app)

  • Analysis tool (also a desktop app)

  • ALM - aka “Performance Center” - a web interface usable by multiple people to run multiple tests simultaneously using any number of controllers and generators. Also offers live test views and tracks all the test assets (scripts, scenarios and whatnot)

The latter thing is pretty complicated stuff but it’s really just a management interface around the first three items.

I had to use a performance center setup recently. It’s a nasty beast, and at least my installation was full of bugs and caused me lots of trouble. Mega hard to debug what was going on, too.
I’d prefer a simpler architecture with one controller and a bunch of generators. Fewer components, less trouble.

Of course, it would be really nice to have all the features you mention. It comes at a price though, and that price is having fewer other features.

I also said “Citrix”, remember? Things like that (or RDP, as another example) tend to be quite heavy.

Anyway, to get back to your point: HP has been trying to do that for years, with various degrees of success. Remember that “Ajax protocol” I was talking about as an example of something with horrible performance? That was one attempt.
The latest attempt has been TrueClient - and that’s pretty much an embedded Firefox instance used as a load generation tool.

For some reason there is this idea floating around in upper management layers that performance testing has to be cheap, done by the cheapest people, using smart tools that do all the thinking for them. HP has been taking steps towards unifying their functional test tool with loadrunner over the years, precisely for this reason.

Personally, I don’t really agree. Performance is and always has been a complicated game that requires knowledgeable people. But that opinion isn’t shared by everyone.

I agree on most of your points, but I have some comments

There are several possible reasons to scale out beyond “generators have limited resources”.

  1. Load balancers.
    If you have a load balancer set to be IP-sticky then you’ll need, at a minimum, more than one IP just to make sure the load gets spread across the backend machines properly. This can be done with IP spoofing, but often it’s simpler (and less likely to upset network security people) to just have more than one generator.

Makes sense. However, most of the time, a faulty load balancer configuration will end up routing all traffic from a single IP to one node. At the same time, it can easily be verified that all servers are being hit. I don’t know about you guys, but we tend to use BIG-IP LTM in our projects, and seldom have to modify the config. In addition, I believe one should be able to trust commercial-grade enterprise hardware/software, considering that they perform lots of testing on their side too (using IXIA and the like).

It’s been years since I last had to do this type of testing, but it does happen that when a big institution buys half a million worth of hardware, they want someone to verify with certainty that that hardware performs its job correctly.

You’re correct, though. Aside from the odd configuration error, problems with network hardware are very rare. I’ve only really seen it happen once, with a slightly cheaper Cisco router that had the “quality of service” configuration enabled without configuring it, and traffic coming in at 100 Mbit on one side with a 1 Gb interface on the other end. File transfers larger than about 60 KB would somehow end up getting queued for ages without dropping packets, causing very odd behaviour (and breaking TCP flow control in the process.)

  2. More complicated protocols require quite a lot more resources / CPU power.
    Ajax/Trueclient, Citrix, “web services”, …

This may not *yet* apply to gatling, but I guarantee you, once you start making the client simulation closer to what modern browsers do you’re going to need more horsepower.

The difference between hitting URLs with raw data and simulating clicks on buttons can actually be quite large in terms of resource consumption, especially if that means running the half a million lines of JavaScript certain sites dump into your browser nowadays.

Well, in my opinion, going down this pathway will give you trouble. Until now, people have been focusing on writing applications that are unit testable. I believe it’s about time people start writing apps that are performance testable - and drop using application frameworks that mask the HTTP protocol. If you’re using JSF or Wicket or all kinds of portal systems, you’ll get in trouble. IMHO, REST-based (single page) applications are the way forward from a performance testing perspective.

Honestly, I have this standing policy that I will not allow myself to get involved in tests using Citrix, RDP, RMI, or anything else that doesn’t look at least vaguely like text-based HTTP traffic. That has served me well over the years, as practically every performance test using loadrunner I’ve seen so far using any of the above protocols has been a spectacular failure.

In other words: I completely agree, but that doesn’t stop dumb management folks from purchasing software from outside sources that breaks that rule (and invariably performs poorly ..).

(I might make an exception for something modern, extremely well documented and open, like Google Protocol Buffers - but even then I would seriously consider switching away from loadrunner for such a test - the language choice just doesn’t agree with such a project ..)

  3. Network bandwidth.
    As Stéphane points out, if you are going to send really large amounts of data you can exceed the total capacity of the NIC. This is really a variant of the “generators have limited resources” theme.
    It happens less and less though. :wink:

I’ve never had any issues during performance testing of regular web applications. Streaming and the like might be a different story though.

It gets rarer as the network bandwidth increases.
We do have to test things with limited bandwidth and varying degrees of latency though, due to the prevalence of wifi and mobile networks nowadays (not to mention the existence of offices in Australia.)
But that doesn’t really affect how many generators you use ..

  4. Separation of responsibilities.
    Separating the generator from the controller, the controller from the scheduler, the scheduler from the user interface etc. is quite simply good engineering.

If you want to offer your tester live views of what is going on inside the test you don’t want the GUI code to be running in the same JVM as the generator itself.

Well, I believe it is a trade-off. If your GUI uses, let’s say, reactive programming techniques, then you should be OK.

I would be extremely careful with that. Since gatling is running inside a JVM, garbage collection pauses will occur to deal with the extra data required by that fancy GUI you want to use.
Garbage collection pauses that, in my view, can already have too much influence on your test results - the system under test often uses software in a JVM as well, and we really do have an interest in what the impact of GC collections is on that end.

Nor do you want multiple users to be able to stomp all over each other’s tests or analysis by accident - while you *do* want to offer a view of who ran what tests when. With a single monolithic app you simply can’t do that.

Agreed, however from my experience, most shops only have a few performance testers, and those guys should be able to talk to each other.

True. The organisation I work for is a bit of an exception.
For most people a simple generator/controller split should be good enough.

“Scaling out” really is just a tiny subset of the things you can start doing once you have a properly architected set of components that talk to each other, rather than a monolithic application.

Agreed.

  5. Measurement quality.

Quite simply put: if you are going to use just one generator, you can’t be 100% sure whether a bad response time is the result of the system under test being slow or of *the generator* being slow. Especially when generators are virtualized it isn’t necessarily guaranteed that the generator has all of the resources on the physical hardware to itself. Nor can it be guaranteed that all of the components in that system are problem-free. A faulty NIC can cause very large problems, and it would be very easy to blame those on the system under test.

I’ve had some pretty interesting discussions with certain senior load testers over on the LR boards some time back. James Pulley, for instance, believes that all test farms should have at least 5 generators:

  • 2 regular load generators

  • 1 generator that runs only a fraction of the load, to measure the impact of the test load on response times for the first two machines.

  • 1 generator out of order or in maintenance.

  • 1 generator ‘spare’

That’s a bit on the extreme side, though. :wink:

Agreed. Faulty servers are troublesome.

It doesn’t have to be the server’s fault. A flaw in the test script could overload the generator, too. Or another VM, intruding on the generator’s CPU/IO/memory/… resources. Or a backup that starts running. Or .. any of a million things that can happen that isn’t necessarily the fault of faulty hardware. And it doesn’t even have to be very blatantly obvious, either.

LR has 5 components:

  • Generator (LR agents)

  • Controller (controls a single test and is used by one user at a time; can be used standalone to get a live view of whatever test you’re running)

  • Scripting interface (vugen, desktop app)

  • Analysis tool (also a desktop app)

  • ALM - aka “Performance Center” - a web interface usable by multiple people to run multiple tests simultaneously using any number of controllers and generators. Also offers live test views and tracks all the test assets (scripts, scenarios and whatnot)

The latter thing is pretty complicated stuff but it’s really just a management interface around the first three items.

I had to use a performance center setup recently. It’s a nasty beast, and at least my installation was full of bugs and caused me lots of trouble. Mega hard to debug what was going on, too.
I’d prefer a simpler architecture with one controller and a bunch of generators. Fewer components, less trouble.

Yeah. Quite nasty. Poorly architected too, in some ways - that applet-based GUI they’re using, for instance, uses only a single thread for both GUI processing and getting updates back from the server, so sometimes keystrokes or clicks simply drop into a black hole while it’s waiting for the server to reply. And that’s just *one* of the half a million nasty little issues waiting to bite you in the rear.
I don’t envy the people who have to maintain that thing. I really don’t.

Of course, it would be really nice to have all the features you mention. It comes at a price though, and that price is having fewer other features.

That’s just a matter of investing more time ..
But I really wouldn’t start trying to rebuild ALM entirely either. Just the generator/controller split would be a very good first step.

I also said “Citrix”, remember? Things like that (or RDP, as another example) tend to be quite heavy.

Anyway, to get back to your point: HP has been trying to do that for years, with various degrees of success. Remember that “Ajax protocol” I was talking about as an example of something with horrible performance? That was one attempt.
The latest attempt has been TrueClient - and that’s pretty much an embedded Firefox instance used as a load generation tool.

For some reason there is this idea floating around in upper management layers that performance testing has to be cheap, done by the cheapest people, using smart tools that do all the thinking for them. HP has been taking steps towards unifying their functional test tool with loadrunner over the years, precisely for this reason.

Cheap: yes, why not? The “more expensive == better” mantra is long dead.
Cheapest people: cheapest is not necessarily correlated with less bright; even if that were the case, where’s the knowledge transfer from the not-so-cheap people? This is an exercise for HR and those ultimately responsible for making the hiring decisions, after interviewing.
Smart tools: a tool is only as good as the person who interprets it, smart or otherwise.

Personally, I don’t really agree. Performance is and always has been a complicated game that requires knowledgeable people. But that opinion isn’t shared by everyone.

A cynic might take this as a statement to satisfy one’s salary requirements - but that opinion isn’t shared by everyone :wink:

I think the current discrepancy between salary levels in different countries is pretty destructive in the long run.

One of the effects of that is that it causes work to drift to cheaper countries even when the more expensive people back home are better qualified and/or more experienced. Sure, there are lots of bright people over in India; but there was a time when all that management over here looked at was the difference in compensation (about a factor of 5, I believe) and ignored everything else, including not just experience but things like cultural differences and simple time-zone lag as well.

Indian culture has always been very much a “yes sir!” culture - while a good performance engineer is a critical thinker, capable of grasping the entire architectural picture and calling the architects a bunch of idiots if necessary.

That perhaps may be changing - time and experience will give these people the necessary education at some point - but especially in the heyday of outsourcing that just wasn’t a requirement.

I understand your cynicism. There are plenty of not-so-qualified people over here, too.
But my point is that cost should not be the only factor. And the thought that you can ‘dumb down’ performance testers by making the tools smarter is, IMHO, really misguided.

That has long since come back to haunt a lot of companies, in places as simple as call centres. I totally agree with cost not being the only factor; however, it is squarely “our” fault for not tailoring the message to the right audience, i.e. for failing to get across that the initial savings will be eroded at an exponential pace once the complexity of a particular project widens (one can argue that letting the complexity widen is a bad thing, but that’s a topic for another discussion). My approach to this is to demo a “smart” tool (of any sort) to someone who is only semi-literate technically and ask him to draw conclusions. The results often speak for themselves.

I disagree on one particular point:

It’s “our” fault that we let people with zero knowledge of IT run IT shops.

It’s “our” fault that we allow management level salaries to balloon to ridiculous proportions, causing the profession to attract the greediest people in our society, rather than the best.

We’ve communicated the message often enough, but the fact is that many of the people making the decisions simply have no clue about what this job entails. Letting them get into that position in the first place is the problem. Not the content of the message.