Dynamic Control of Concurrent Users

Hi All,

I’m wondering if there is a possibility of introducing a new way of ramping up load in Gatling. Before Gatling, my background was at Ticketmaster (I’m a .NET software developer), at a time when they needed to performance test their software for an Olympics event. We used another company’s product that had a very rich dashboard experience when running load/performance tests.

The key thing I’ve found from that experience, coming to Gatling, is the way the load is introduced. I see there is a way to inject users over time and to throttle individual HTTP requests, but in my experience the driver for throttling is ‘users’. Whenever I was running a performance test, or talking to the business about load, the conversation was always driven by the number of concurrent users loaded onto a server farm. I was surprised that Gatling doesn’t let me throttle based on concurrent users, because once a system begins to stress, requests start queuing on a server, slowly or quickly, eventually causing a crash; it’s at this point that you learn exactly how many concurrent users your infrastructure can manage. Even after realising you can only handle a certain number of users, you may introduce a queuing product that gives website visitors a ticket, a time in a queue, etc., and again the thinking is all about concurrent users on the web servers.

I would love the ability to say “always keep concurrent users at 100”, and then to define a ramping injection with a new setting of, say, 200 max concurrent users. It would be even nicer if I could drive the ramp by concurrent users, stating the following:

Ramp up to 100 concurrent users over 5 minutes.
Ramp up to 200 concurrent users immediately.

The second feature I would like is the ability to dynamically adjust this ramp-up *during* a test. Once I’ve established a ramp-up in code, I find that I’m constantly stopping and rerunning with a new ramp-up to find my server’s ‘sweet spot’. I would love the ability to kick the test off and control the load in real time, so that I can slowly increase the load by X amount and monitor server metrics, let things stabilise, maybe even tweak something, then slowly turn the load up again, or decrease it if something happened environmentally.

Has this been considered before? I might be outside the bounds of what the product is intended for. I’d love to hear your feedback.

Thanks.

Hi Martin,

What you’re referring to is a closed model.

It’s true that Gatling doesn’t emphasize such a model, as it usually doesn’t match any behavior you would actually experience on your system.
What you usually experience is an open model, where new users keep on coming and hitting your website, whatever the number of users already connected, and no matter how much your website starts lagging because it can’t handle the load.

You can use a simple closed model with Gatling, but nothing as elaborate as what you describe, at least for now.
There’s no plan at the moment; we could change our minds if there were real community demand for it, but still, IMHO, using such a model is wrong. I’m surprised that’s how things are done at Ticketmaster. People I know working for ticketing companies in Europe definitely don’t do that.
I’d love to hear why you guys went with such a model. Is it because the tool you used emphasizes it, or was it really intended?

Cheers,

Hi Stéphane,

Ticketmaster varies greatly from team to team, with different Agile practices and tools used; it depends entirely on the individuals’ skill levels. The team I was in was new to performance testing and was influenced greatly by the consultants from the product vendor they chose (am I allowed to mention its name here?). I didn’t know of any other teams in the US that were doing performance testing and using anything else. Our architect at the time came across Gatling and asked us to look into it to save on the crazy costs of the ‘other’ product.

When the consultants showed us the product, I remember it being highly customisable, with different views on bandwidth, transactions (HTTP requests), and users. Thinking back now, I think it became the norm to think of a ‘user’ because we judged our error rate on successful purchases of tickets, which was a complex scenario. A successful purchase may have taken 50 HTTP requests, and we would fail the entire scenario if a single one failed. Of course, counting errors by HTTP request would look much nicer on a report, e.g. ‘0.0001% of HTTP transactions failed’ as opposed to ‘1% of user scenarios failed’. It was quite difficult to work out how best to sell the report to non-technical people, and what we found was that using ‘users’ and ‘scenarios’ was far easier than describing HTTP transactions, the reasons why some would fail, and what the possible user responses would be.

So perhaps the tool we were using was used in other ways by different companies, but that’s how we came to use it. Perhaps it was wrong? I’m not sure. It was my first experience of performance testing.

I completely understand what you mean by closed and open models, and yes, I do imagine that the majority of sites fit the open model you described. However, ticketing websites are better suited to a closed model with queueing systems, just as you would experience in real life when buying tickets at a stadium or somewhere similar. Some of the proxies were built with dynamic queues that could increase and decrease capacity if part of the infrastructure became laggy, or if nodes needed to be taken out of a cluster, repaired, and placed back in. You wouldn’t stop the entire system; you would just decrease its capacity, i.e. how many ‘users’ you allow into the system at one time. And even for this, users were marked with cookies so they could be counted; the systems were not monitoring how many HTTP requests per second or minute a single user was making. So perhaps I’m user-biased from multiple influences :).

Hi Martin,

> Ticketmaster varies greatly from team to team, with different Agile
> practices and tools used; it depends entirely on the individuals’ skill
> levels. The team I was in was new to performance testing and was
> influenced greatly by the consultants from the product vendor they chose
> (am I allowed to mention its name here?).

We can play riddles. Does it start with an L or an S?

> I didn't know of any other teams in the US that were doing performance
> testing and using anything else. Our architect at the time came across
> Gatling and asked us to look into it to save on the crazy costs of the
> 'other' product.

I'll go with an L.

> When the consultants showed us the product, I remember it being highly
> customisable, with different views on bandwidth, transactions (HTTP
> requests), and users. Thinking back now, I think it became the norm to
> think of a 'user' because we judged our error rate on successful
> purchases of tickets, which was a complex scenario. A successful purchase
> may have taken 50 HTTP requests, and we would fail the entire scenario if
> a single one failed.

What you call a transaction, we call a group
<http://gatling.io/docs/2.1.5/general/scenario.html?highlight=group#groups-definition>
in Gatling.
In order to fail the test, you use assertions
<http://gatling.io/docs/2.1.5/general/assertions.html?highlight=assertion>.
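For instance, a minimal sketch combining the two, assuming the Gatling 2.x Scala DSL; the scenario name, request names, and URLs are placeholders, not your actual flow:

```scala
import io.gatling.core.Predef._
import io.gatling.http.Predef._
import scala.concurrent.duration._

class PurchaseSimulation extends Simulation {

  val httpProtocol = http.baseURL("http://example.com")

  // The whole purchase flow is wrapped in a group, so the report
  // aggregates its many requests as one "transaction".
  val scn = scenario("Buyer")
    .group("Purchase") {
      exec(http("select seat").get("/seats"))
        .exec(http("checkout").post("/checkout"))
    }

  setUp(scn.inject(rampUsers(100) over (5.minutes)))
    .protocols(httpProtocol)
    // Fail the whole run if more than 1% of requests fail.
    .assertions(global.failedRequests.percent.lessThan(1))
}
```

This keeps the "0.0001% of requests vs 1% of scenarios" distinction visible: the report shows both per-request stats and per-group stats, while the assertion gates the run.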

> Of course, counting errors by HTTP request would look much nicer on a
> report, e.g. '0.0001% of HTTP transactions failed' as opposed to '1% of
> user scenarios failed'. It was quite difficult to work out how best to
> sell the report to non-technical people, and what we found was that
> using 'users' and 'scenarios' was far easier than describing HTTP
> transactions, the reasons why some would fail, and what the possible
> user responses would be.
>
> So perhaps the tool we were using was used in other ways by different
> companies, but that's how we came to use it. Perhaps it was wrong? I'm
> not sure. It was my first experience of performance testing.

I guess both make sense (and you can have both at the same time):

   - business people would want to know how many sales the system could
   perform
   - IT people would want to know where to investigate failures and slow
   response times

> I completely understand what you mean by closed and open models, and
> yes, I do imagine that the majority of sites fit the open model you
> described. However, ticketing websites are better suited to a closed
> model with queueing systems, just as you would experience in real life
> when buying tickets at a stadium or somewhere similar. Some of the
> proxies were built with dynamic queues that could increase and decrease
> capacity if part of the infrastructure became laggy, or if nodes needed
> to be taken out of a cluster, repaired, and placed back in. You wouldn't
> stop the entire system; you would just decrease its capacity, i.e. how
> many 'users' you allow into the system at one time. And even for this,
> users were marked with cookies so they could be counted; the systems
> were not monitoring how many HTTP requests per second or minute a single
> user was making. So perhaps I'm user-biased from multiple influences :).

That's interesting. I think your approach is quite unique. I'm aware of a
ticketing company in Europe, and I'm sure they don't work that way.
The typical use case for a closed model is call centers.

So you're right: as your system uses a closed model, that's what you have
to use for load testing.

With Gatling, for now, you'll have to "recycle" the virtual users: wrap
your scenario in a loop, and first clear all of the virtual user's state
(except the loop index). Perfectly doable.
We might add first-class support for the closed model if there were
enough community demand for it, e.g. Ticketmaster adopting Gatling :wink:
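A rough sketch of that recycling workaround, assuming the Gatling 2.x Scala DSL; the scenario body, duration, and user count are placeholders:

```scala
import io.gatling.core.Predef._
import io.gatling.http.Predef._
import scala.concurrent.duration._

class RecycledUsersSimulation extends Simulation {

  val httpProtocol = http.baseURL("http://example.com")

  val purchase = exec(http("home").get("/"))

  // A fixed pool of virtual users, each looping for the whole test:
  // when an iteration ends, the same user starts over, so the number
  // of concurrent users stays constant -- a basic closed model.
  val scn = scenario("Recycled")
    .during(10.minutes) {
      exec(purchase)
        .exec(flushSessionCookies) // wipe cookies...
        .exec(flushHttpCache)      // ...and cached resources, so each
                                   // pass behaves like a fresh user
    }

  setUp(scn.inject(atOnceUsers(100)).protocols(httpProtocol))
}
```

Injecting all 100 users at once and looping them is what keeps concurrency fixed; varying that number between runs is still manual, which is exactly the limitation discussed above.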

Cheers,

*Stéphane Landelle*
*Lead developer*
slandelle@gatling.io
