constantUsersPerSec vs foreach loop

Let’s say that I have a simple scenario like this one.

    ScenarioBuilder simpleScenario = scenario("Create show and reserve seats")
            .feed(showIdsFeeder)
            .exec(http("create-show")
                    .post("/shows")
                    .body(createShowPayload)
            )
            .foreach(randomSeatNums.asJava(), "seatNum").on(
                    exec(http("reserve-seat")
                            .patch("shows/#{showId}/seats/#{seatNum}")
                            .body(reserveSeatPayload))
            );

    setUp(simpleScenario.injectOpen(constantUsersPerSec(usersPerSec).during(duringSec))
                    .protocols(httpProtocol));

A virtual user creates a Show with 50 seats and then reserves the seats one by one, sequentially.
With usersPerSec = 30 and a randomSeatNums list of length 50, that works out to roughly 30 × 50 = 1500 req/s, and the results are pretty good: 99th percentile = 5 ms.

Of course, this is not a very realistic scenario, mostly because of the foreach loop and the sequential reservations. If I change the flow to something like this:

    ScenarioBuilder reserveSeats = scenario("Reserve seats")
            .feed(reservationsFeeder)
            .exec(http("reserve-seat")
                    .patch("shows/#{showId}/seats/#{seatNum}")
                    .body(reserveSeatPayload));

    setUp(reserveSeats.injectOpen(constantUsersPerSec(requestsPerSec).during(duringSec))
            .protocols(httpProtocol));

and set requestsPerSec = 1500, the results are really bad: 99th = 100 ms and more. I thought this was a problem with the number of connections, because in the first scenario I’m using the same connection to launch 50+1 requests. But even if I change the second scenario and enable the shareConnections option, the 99th percentile is still no smaller than 75 ms.

Correct me if I’m wrong, but in both cases the outcome after each second is pretty much the same: 1500 requests have been launched. I have a feeling that in the second scenario, with constantUsersPerSec(1500), all 1500 requests start at (more or less) the same time, and that’s why I’m getting such bad results. In the first scenario there is a sort of queue per virtual user (the foreach loop) that drastically improves the overall performance.

If that’s the case, then the second scenario is also not very realistic, because in real life the requests would be distributed more randomly across a single-second time window. Am I correct, or am I missing something from the big picture? I wonder how constantUsersPerSec works within a single-second window and whether there is a way to tune it.
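(For reference, the open injection DSL also offers a randomized() variant that spreads user arrival times instead of injecting at a fixed interval; a minimal sketch, assuming the same variables as above:)

    // Illustrative only: same average rate, but virtual users arrive at randomized
    // intervals within the window instead of at a fixed pace.
    setUp(reserveSeats.injectOpen(
                    constantUsersPerSec(requestsPerSec).during(duringSec).randomized())
            .protocols(httpProtocol));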

case 1: constantUsersPerSec(30) + loop

Every second, you’re:

  • opening 30 TCP sockets
  • performing 30 TLS handshakes
  • closing 30 TCP sockets

case 2: constantUsersPerSec(1500), no loop

Every second, you’re:

  • opening 1500 TCP sockets
  • performing 1500 TLS handshakes
  • closing 1500 TCP sockets

If your OS is not properly tuned, you might end up with ephemeral port starvation.

case 3: constantUsersPerSec(1500), no loop, shareConnections

The 1500 users are distributed over the first second, meaning 1-2 users every millisecond.
You’ll most likely end up opening ~500 TCP connections and performing as many TLS handshakes, hence the bad response times.
Once all the connections are open and the system stabilizes, I would expect the response time to be similar.
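For reference, connection sharing is enabled on the HTTP protocol configuration; a minimal sketch, assuming the standard HttpDsl.http static import and a placeholder baseUrl:

    // Illustrative protocol configuration: all virtual users share one connection pool
    // instead of each opening (and closing) their own connections.
    HttpProtocolBuilder httpProtocol = http
            .baseUrl("http://localhost:8080") // placeholder, not the URL from the original test
            .shareConnections();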

The design of your test really depends on what you’re testing. Are you testing internet edges, or services behind a load balancer?


Thanks for the clarification. I’m testing a backend service behind a gateway/load balancer. So case 2 and TLS, in general, can be removed from the equation.

You’ll most likely end up opening ~500 TCP connections

That is still quite a lot; how do you calculate this? Is it configurable?

Once all the connections are open and the system stabilizes, I would expect the response time to be similar.

Unfortunately, after 15 minutes, the 99th percentile is still the same (~75ms). I expected some performance degradation when switching from case 1 to case 3, but the difference is too big IMHO.

I have a feeling that I’m still missing something.

I think it is possible that the server is not able to handle that many connections (~500) at the same time and the requests are queued; maybe some internal pools are too small.

that is still quite a lot, how do you calculate this?

That’s just a rule of thumb. That’s something you can see with Gatling Enterprise.

I’ve tried to reproduce your issue (Java 17, Gatling 3.8.3 with the Java DSL, a test app with a forced response time of 5 ms), and constantUsersPerSec(1500) with shareConnections just worked like a charm.
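A minimal stand-in for such a test app, sketched with only the JDK’s built-in com.sun.net.httpserver (hypothetical, not the actual app used here):

    // Toy HTTP server that answers any request after a forced ~5 ms delay.
    HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
    server.createContext("/", exchange -> {
        try {
            Thread.sleep(5); // forced response time
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        byte[] body = "{}".getBytes(StandardCharsets.UTF_8);
        exchange.sendResponseHeaders(200, body.length);
        exchange.getResponseBody().write(body);
        exchange.close();
    });
    server.setExecutor(Executors.newFixedThreadPool(64)); // allow concurrent requests
    server.start();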

Could you please provide a way to reproduce your issue?

OK, I think I know what is happening. The underlying service is a very simple service described here. Long story short, a cinema show is an aggregate (a persistent actor), so it’s a point of contention at the application level.

With case 3 and a feeder constructed like the one below:

    // One random showId per show that will be created up-front.
    private List<String> showIds = IntStream.range(0, howManyShows)
            .mapToObj(__ -> UUID.randomUUID().toString()).toList();

    // Feeds all seat reservations for the first show, then all for the second, and so on.
    Iterator<Map<String, Object>> reservationsFeeder = showIds.stream()
            .flatMap(showId -> {
                return IntStream.range(0, maxSeats).boxed()
                        .map(seatNum -> Map.<String, Object>of("showId", showId, "seatNum", seatNum));
            })
            .iterator();

there is a 100% chance that the first 50 (maxSeats) requests will hit the same aggregate within a 33 ms time window (50 requests at 1500 req/s ≈ 33 ms). That’s not a very realistic scenario for me; it’s too pessimistic. When I spread the load more randomly with a refactored feeder:

    final int chunkSize = 5;
    final AtomicInteger counter = new AtomicInteger();

    // Group the shows into chunks of `chunkSize`, generate all seat reservations per chunk,
    // and shuffle each batch so requests for the same show are spread across the chunk.
    Iterator<Map<String, Object>> reservationsFeeder = showIds.stream()
            .collect(Collectors.groupingBy(it -> counter.getAndIncrement() / chunkSize))
            .values().stream()
            .flatMap(showIdsChunk -> {
                log.debug("generating new batch of seat reservations for group size: " + showIdsChunk.size());
                List<Map<String, Object>> showReservations = showIdsChunk.stream()
                        .flatMap(showId -> IntStream.range(0, maxSeats).boxed()
                                .map(seatNum -> Map.<String, Object>of("showId", showId, "seatNum", seatNum)))
                        .collect(Collectors.toList());

                java.util.Collections.shuffle(showReservations);

                return showReservations.stream();
            })
            .iterator();

With chunkSize = 5, maxSeats per aggregate = 50, and 1500 req/s, there is only a 20% chance of hitting the same aggregate within a 33 ms window, and the 99th percentile is around 10 ms, vs 5 ms for case 1 (with a 0% chance of hitting the same aggregate).
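The numbers above follow from simple arithmetic; a quick sketch, assuming the variables from the feeders:

    // 50 requests to one show at 1500 req/s span roughly a 33 ms window.
    double windowMs = maxSeats * 1000.0 / requestsPerSec; // 50 * 1000 / 1500 ≈ 33.3 ms
    // Within one shuffled chunk, a given show accounts for 1/chunkSize of the traffic.
    double sameAggregateChance = 1.0 / chunkSize;         // 1 / 5 = 20%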

Case 3 with an adjustable “collision” chance sounds more realistic, but as usual, it depends. If I put a queue solution with partitioning based on showId in front of my service, then case 1 is perfectly fine to simulate the load.

Another dimension is that the case 3 feeder is more problematic for capacity testing, because I need to create all the shows up front so that reservations can be handled correctly. Otherwise, I might face a race condition between the create and reserve requests.
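One way to avoid that race is to create every show before the injection starts; a minimal sketch, assuming the Simulation’s before() hook, the JDK HttpClient, a placeholder base URL, and a hypothetical createShowPayload(showId) helper:

    // Pre-create all shows before Gatling starts injecting reservation requests,
    // so reserve-seat calls never race against show creation.
    @Override
    public void before() {
        HttpClient client = HttpClient.newHttpClient();
        showIds.forEach(showId -> {
            HttpRequest request = HttpRequest.newBuilder(URI.create("http://localhost:8080/shows"))
                    .header("Content-Type", "application/json")
                    .POST(HttpRequest.BodyPublishers.ofString(createShowPayload(showId))) // hypothetical helper
                    .build();
            try {
                client.send(request, HttpResponse.BodyHandlers.ofString());
            } catch (IOException | InterruptedException e) {
                throw new RuntimeException("failed to pre-create show " + showId, e);
            }
        });
    }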