Capacity Testing: stop load when thresholds are out of range

Hello,

I read the old topic “Need a special scenario configuration for Capacity Planning benchmarks”
and have a similar need, with some differences.

I plan to load the system slowly (as in the topic above), but I don’t have exact requirements for the peak.
I’m going to run the test in very similar environments.

What I’d like to have is:

  • Get metrics from JMX and check whether a threshold is exceeded (e.g. CPU is loaded over 80% of the time, the garbage collector is taking most of the CPU, the heap is too large). Let’s say I have def isThresholdExceeded: Boolean = { /* some calculation here */ }
  • Use this in the scenario as the indicator to stop the whole Simulation.
  • When the simulation is stopped, report the max (or last) number of concurrent users in the simulation. This will be the capacity for me.
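To make the first bullet concrete, here is a minimal sketch of such a check built on the standard JVM MXBeans. It is only an illustration: the names JmxMonitor and Limits are made up for this example, the limits are placeholders, and it reads the MXBeans of the JVM it runs in. A system under test living in a separate (forked) JVM would have to be reached through a remote JMX connection instead.

```scala
import java.lang.management.ManagementFactory

// Hypothetical limits for this sketch; tune them for your own system.
final case class Limits(maxHeapUsedBytes: Long, maxLoadAverage: Double, maxGcTimeFraction: Double)

// Reads the MXBeans of the *local* JVM (assumption: a remote JVM needs a JMX connector).
object JmxMonitor {
  private val memory  = ManagementFactory.getMemoryMXBean
  private val os      = ManagementFactory.getOperatingSystemMXBean
  private val runtime = ManagementFactory.getRuntimeMXBean

  def heapUsedBytes: Long = memory.getHeapMemoryUsage.getUsed

  // Negative when the platform does not report a load average.
  def loadAverage: Double = os.getSystemLoadAverage

  // Fraction of JVM uptime spent in GC so far (cumulative, not a recent window).
  def gcTimeFraction: Double = {
    val beans = ManagementFactory.getGarbageCollectorMXBeans
    var gcMillis = 0L
    var i = 0
    while (i < beans.size) {
      val t = beans.get(i).getCollectionTime // -1 when undefined for this collector
      if (t > 0) gcMillis += t
      i += 1
    }
    val uptime = runtime.getUptime
    if (uptime <= 0) 0.0 else gcMillis.toDouble / uptime
  }

  def isThresholdExceeded(limits: Limits): Boolean =
    heapUsedBytes > limits.maxHeapUsedBytes ||
      loadAverage > limits.maxLoadAverage ||
      gcTimeFraction > limits.maxGcTimeFraction
}
```

Note that the GC figure here is cumulative since JVM start; a "GC is taking most of the CPU right now" check would need to compare two samples taken some seconds apart.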

What I have in mind is that isThresholdExceeded should be an additional user that runs every n seconds to perform the check, and that is somehow able to stop the simulation and get the number of users out.
But I’m a bit stuck on how to write this.
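Independently of Gatling, the "check every n seconds" part could be sketched with a plain scheduled executor. This is an assumption-laden illustration, not Gatling API: ThresholdWatcher is a hypothetical name, and how the tripped flag is turned into "stop the simulation" is left open.

```scala
import java.util.concurrent.{Executors, TimeUnit}
import java.util.concurrent.atomic.AtomicBoolean

// Polls `check` every `periodMillis` ms; once it returns true the flag latches and polling stops.
// If `check` throws, the executor silently cancels further runs, so keep it exception-free.
final class ThresholdWatcher(check: () => Boolean, periodMillis: Long) {
  private val tripped = new AtomicBoolean(false)
  private val scheduler = Executors.newSingleThreadScheduledExecutor()

  private val task = new Runnable {
    def run(): Unit =
      if (check()) {
        tripped.set(true)
        scheduler.shutdown() // cancels the remaining periodic runs
      }
  }

  def start(): Unit = {
    scheduler.scheduleAtFixedRate(task, 0L, periodMillis, TimeUnit.MILLISECONDS)
    ()
  }

  def isTripped: Boolean = tripped.get

  // Call this at the end of the run; the executor thread is not a daemon thread.
  def stop(): Unit = {
    scheduler.shutdownNow()
    ()
  }
}
```

A custom action (or a feeder) could then consult isTripped instead of computing the metrics itself, which keeps the expensive JMX reads off the virtual users' path.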

Note: Gatling 2 M3a API, custom actions for a non-HTTP protocol.

Best Regards,
Dmytro

A few comments:

  1. ‘concurrent users’ is a really poor way to measure load.

Every scenario will be different: slight perturbations in the click flow or in the set of APIs used will change how many ‘concurrent users’ the application can handle. Make your user sessions longer, and the backend will suddenly be able to handle more users without actually being any faster.

The best metric is really ‘user interactions / second’ - something quite similar to, but not necessarily identical to, the number of hits / second on your backend.
And even there, you should make a distinction between different user interactions - not all clicks are born equal.

  2. “Stop whole simulation” means “not testing if the system will recover gracefully after you reach the point of overload”. If this is intentional, fine. But I suspect that it isn’t …

  3. I could talk about JMX and getting metrics from different types of servers but … let’s not.
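The point about session length in the first comment follows from Little's law (average concurrency = arrival rate × average time in the system). A tiny sketch with purely illustrative numbers:

```scala
object LittlesLaw {
  // Little's law: average concurrency = arrival rate * average time in the system.
  def concurrentUsers(arrivalsPerSec: Double, sessionSeconds: Double): Double =
    arrivalsPerSec * sessionSeconds

  // Same arrival rate, different session lengths (illustrative numbers only).
  val shortSessions: Double = concurrentUsers(0.5, 10.0) // 5.0
  val longSessions: Double  = concurrentUsers(0.5, 60.0) // 30.0
}
```

The backend receives new sessions at the same rate in both cases; only the 'concurrent users' figure changes, which is why it says little on its own.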

Floris,

  1. I have custom actions for an actor-based system, and the concurrent-users count corresponds to an important product measurement. It has nothing to do with hits or HTTP.
  2. If you can suggest how, I’d like to have some teardown time once isThresholdExceeded is true, doing nothing for one minute. This would show nicer system trends.
  3. For me it is important to keep the heap size limited: more than N MB means “Houston, we have huge problems” for us. This is a regression that we want to keep an eye on.

This may be important: I’m not loading a live system. The system under test actually lives in sbt as a forked JVM, the test is executed in the same environment “in all meanings”, the tests last 5-7 minutes, and the performance metrics that I collect are extremely important and show low deviation on re-execution. This allows comparing performance metrics and getting a clear answer to “are we getting worse or better” from CI in a short time.

I still need suggestions on how to get what I want. :)

Regards,
Dmytro

  1. So you mean to say “actually there is some action that we run once for each user session that corresponds to the actual load figure we want to look at”.

In a sense this is fair enough, but what I am trying to get across is that thinking about your load model in terms of ‘concurrent users’ tends to be rather misleading, especially if you use that metric to talk about your tests to other people.

  2. Not really. You might call me the resident ‘LoadRunner troll’ who is mostly here to help Gatling mature a little faster. I’m sure Stéphane could answer that question quite elegantly, though.

Floris, Stéphane,

If you have another vision of how to measure an actor-based system and keep it under regression testing, I’m open to ideas.

It is not just one action, though:


val scn =
        scenario("instance creator capacitor")
          .feed(instanceIdFeeder)
          .exec(createInstance(data))
          .exec(waitStatus(10 seconds, "component" -> "active"))

createInstance and waitStatus are custom actions.
As you can see, the entity “Instance” is only created in this scenario. The more Instances in the product, the higher the system load. An Instance is kind of an Actor, and the actors keep communicating with each other.


setUp(
        scn.inject(
          ramp(1 user) over (5 seconds),
          constantRate(0.5 usersPerSec) during (30 minutes)
        )
      )

Thus I’d like to get the max number of concurrent users, or rather the number of running instances at the time the threshold is reached, and then do some teardown.

This test might run longer than the others I have. I can always increase the rate, but overloading the system too quickly would be a bit risky. After some capacity measurements, I could even start the scenario with some “initial capacity” (instances already in the system). I hope I can keep it to 15 minutes at most.

P.S. How do I summon Stéphane to this thread?

Regards,
Dmytro

Floris,

“In a sense this is fair enough, but what I am trying to get across is that thinking about your load model in terms of ‘concurrent users’ tends to be rather misleading, especially if you use that metric to talk about your tests to other people.”

I missed this in my first reply.
I’ll talk to business people in terms of “Common Instances”.

Besides wanting regression testing for this, we also need to know how much each “Instance” costs.
We have EC2 instances (VMs) with an hourly/monthly cost. If we want to support 1 million Instances, how many nodes of type c1.medium do we need?

Let’s say the c1.medium capacity is 10K. (I hope the test’s answer will be near 12K; the 20% is a constant in the calculation, a margin to support heavier instances.)
Then we will need 100 VMs.

Next comes capacity planning…
Let’s pay monthly for 60 VMs, per our projected needs.
40 are left on demand.
What about 50 and 50, 55 and 45, and so on? That gives us preliminary budgeting for the month.
And we provision even more on Black Holidays.
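The arithmetic behind these figures can be written down as a short sketch. The names are made up for illustration, and the reading of the 20% as headroom on top of the planning capacity (12K measured, 10K planned) is an interpretation of the numbers above, not a given.

```scala
object CapacityPlan {
  // Nodes needed to host `targetInstances` when one node holds `perNodeCapacity`.
  def nodesNeeded(targetInstances: Long, perNodeCapacity: Long): Long =
    (targetInstances + perNodeCapacity - 1) / perNodeCapacity // ceiling division

  // Planning capacity derived from a measured one: measured = planned * (1 + margin).
  def plannedCapacity(measured: Long, margin: Double): Long =
    math.round(measured / (1.0 + margin))

  // Candidate (reserved, on-demand) splits of the fleet; each pair sums to `total`.
  def splits(total: Long, reservedOptions: Seq[Long]): Seq[(Long, Long)] =
    reservedOptions.map(reserved => (reserved, total - reserved))
}
```

With the numbers from the post, plannedCapacity(12000, 0.2) gives 10000, nodesNeeded(1000000, 10000) gives 100, and splits(100, Seq(60, 50, 55)) yields the 60/40, 50/50 and 55/45 options. Pricing is deliberately left out, since no costs are given here.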

Certainly, the final calculation will be more complicated. But the preliminary capacity is the most important part of this, and it is what I need to get.
The required test is the beginning of getting proper plans, and as I see it, I’ll be able to customise it incrementally.

Something like this.

Regards,
Dmytro

Hi there,

Finally got some time to read this one (usually, when each one of you two posts something here, things tend to get pretty complex, so the 2 of you together…).

@Dmytro. I’m still not sure if you should build this inside Gatling, or outside (have a kind of daemon process that can decide to kill Gatling).
Anyway, you’ll have to hack.

If you want to shut down Gatling from an actor, you can send a termination message to the controller, just like it’s being done in SingletonFeed when the Feeder is empty.

Regarding the tear down, it depends on what you’re trying to do. If it’s not expensive and doesn’t depend on the ActorSystem, you can register a termination hook on the ActorSystem.

Cheers,

Stéphane

Stéphane,

I probably didn’t fully understand your idea.

Here is what I have:

      // Playing around; the real thresholdFeeder will do actual measurements.
      val fakeThresholdFeeder = Range(0, 5).map(i => Map("fake" -> i)).iterator
      val scn =
        scenario("instance creator capacitor")
          .feed(fakeThresholdFeeder)
          .feed(instanceIdFeeder)
          .exec(createInstance(manifest))

Of course, in the scenario there are more than 5 actors, and the rest complain like this:
https://gist.github.com/dmakhno/43bb4b06b219091b1bb3

Can you please tell me how I should handle these exceptions?

Luckily, the report is built as expected for 5 actors. This seems pretty close to what I need and pretty easy, except for these exceptions.

P.S. … I will share more; right now I’m trying to get the max number of concurrent users from the simulation via code. It seems I should get some global from somewhere.
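One way to get that number without reaching into Gatling internals would be to count it yourself from the custom actions. The sketch below is an assumption, not something Gatling provides: ConcurrencyGauge is a hypothetical helper that createInstance would call on start and a teardown action on exit.

```scala
import java.util.concurrent.atomic.AtomicInteger

// Tracks how many instances are currently active and the highest value seen so far.
object ConcurrencyGauge {
  private val current = new AtomicInteger(0)
  private val peak = new AtomicInteger(0)

  // Call when an instance (virtual user) starts.
  def enter(): Unit = {
    val now = current.incrementAndGet()
    // Raise the peak if needed; retry when another thread updated it concurrently.
    var seen = peak.get
    while (now > seen && !peak.compareAndSet(seen, now)) seen = peak.get
  }

  // Call when an instance finishes or is torn down.
  def exit(): Unit = {
    current.decrementAndGet()
    ()
  }

  def active: Int = current.get
  def max: Int = peak.get
}
```

Reading ConcurrencyGauge.active at the moment the threshold trips gives the "last" figure, and ConcurrencyGauge.max gives the peak.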

Regards,
Dmytro

Hi,

The Iterator you built is finite and only has 5 elements, so of course you get a NoSuchElementException.
What do you want? An infinite one? Then use Iterator.iterate and then map.
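For comparison, a minimal sketch of both variants in plain Scala (the FeederIterators wrapper is only there to make the example self-contained):

```scala
object FeederIterators {
  // Finite: exactly 5 records; calling next() after that throws NoSuchElementException.
  def finite: Iterator[Map[String, Int]] =
    Range(0, 5).map(i => Map("fake" -> i)).iterator

  // Infinite: 0, 1, 2, ... generated lazily, one record per call to next().
  def infinite: Iterator[Map[String, Int]] =
    Iterator.iterate(0)(_ + 1).map(i => Map("fake" -> i))
}
```

If the value itself does not matter, Iterator.continually(Map("fake" -> 0)) would be an equally valid infinite source.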

Hi,

Sorry, I understood your suggestion as playing with the feeder.

Here is my snippet that replaces fakeThresholdFeeder:

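object Monitor {
  // Placeholder: the real check would read the JMX metrics.
  def isThresholdReached: Boolean = ???
}

// The feeder runs dry as soon as the threshold is reached.
class ThresholdFeeder extends Feeder[Int] {
  override def hasNext: Boolean = !Monitor.isThresholdReached
  override def next(): Map[String, Int] = Map(
    "fake" -> 0
  )
}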

And this allows me to stop the running Gatling.
This is what I understood when you mentioned “just like it’s being done in SingletonFeed when the Feeder is empty.”
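The behaviour of the feeder above can be tried outside Gatling with a plain iterator of the same shape. In this sketch Monitor is a stand-in backed by a flag, and trip() is a hypothetical helper that simulates the threshold being reached; it says nothing about how Gatling itself reacts to an empty feeder.

```scala
import java.util.concurrent.atomic.AtomicBoolean

// Stand-in for the real monitor; `trip()` exists only for this sketch.
object Monitor {
  private val reached = new AtomicBoolean(false)
  def isThresholdReached: Boolean = reached.get
  def trip(): Unit = reached.set(true)
}

// Plain iterator with the same record type as the feeder above.
class ThresholdIterator extends Iterator[Map[String, Int]] {
  override def hasNext: Boolean = !Monitor.isThresholdReached
  override def next(): Map[String, Int] =
    if (hasNext) Map("fake" -> 0)
    else throw new NoSuchElementException("threshold reached")
}
```

Throwing from next() once the threshold is reached keeps the usual Iterator contract, so a caller that skips the hasNext check fails loudly instead of silently receiving more records.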

Having a finite fake feeder allowed me to try this quickly.
Am I going the wrong way, and did I misuse your suggestion?

Regards,
Dmytro