Request/Response logging misses last few requests?

Hi everyone. Thanks in advance for any help you can give.

I’m running a pretty simple gatling scenario - it sends a single request to an HTTP REST API endpoint. That request contains a unique UUID for each request (I use #{randomUuid()} to generate it). The test runs for 7 minutes then ends.

I’ve enabled logging of requests and responses to a file via logback.xml.

When the test run completes, the log contains X requests; however, the backend has always received X+Y (where Y is between 5 and 35 requests).

When I capture the uniquely-generated requestid’s from both the backend and the gatling log, I find that it’s the last Y requests that weren’t logged (it feels like flushing the log to disk is asynchronous and that the JVM is shutting down before doing that.

This prevents my automated tests from being able to compare the logs to the backend to validate functional correctness of all requests.

Is there any way to guarantee the log file gets ALL requests, including the last few sent by the scenario?

Hi,

What’s your Gatling version?

I’ve enabled logging of requests and responses to a file via logback.xml.

Indeed, writes are by default buffered but they are supposed to be flushed on run termination, cf gatling/Gatling.scala at main · gatling/gatling · GitHub

If really needed, you can edit your logback file and disabled buffering. But that might harm performance.

But then, please understand that this log is only meant for debugging and that neither DEBUG nor TRACE levels should be enabled when running a proper load test.

Do you let your run terminate properly? Do you use something like maxDuration to force it to stop early?

Then, do you have the expected number in the HTML reports?

Finally, is there any way to reproduce the behavior you’re describing, as required here?

Hi, slandelle - thanks for taking the time to respond. Apologies for not including a good reproduction on the first post - I assumed I was asking about a “stupid user” problem, not a potential defect. Keep in mind that I’m a novice/new Gatling user, so it’s really possible I’m just doing something obviously incorrect.

Answers to your specific questions:

  1. I’m using 3.9.0.
  2. The precise code is below. In short, I use injectOpen with a rampUsers followed by a constantUsersPerSec. I wrap that in a throttle with a reachRps followed by a holdFor. I allow this to run to completion. (where “completion” means that I get the test run summary output on stdout and the process terminates itself).
  3. The HTML reports return an incorrect number. It usually matches the number of requests in the log, but not always. It never matches the number of requests received by the backend (the backend is always slightly higher).

I’ve stripped a few things out of the reproduction below, but if you replace “XXXXXXX” in the reproduction with my endpoint, this stripped-down version does reproduce the problem (I’ve verified that with 4 runs, all of which “failed” in this way).

package aws.wwcs.sa;

import java.util.*;

import io.gatling.javaapi.core.*;
import io.gatling.javaapi.http.*;
import io.gatling.javaapi.jdbc.*;

import static io.gatling.javaapi.core.CoreDsl.*;
import static io.gatling.javaapi.http.HttpDsl.*;
import static io.gatling.javaapi.jdbc.JdbcDsl.*;

public class RecordedSimulation extends Simulation {
  private String runid = Long.toString(System.currentTimeMillis());

  private HttpProtocolBuilder httpProtocol = http
    .baseUrl("https://XXXXXXXX")
    .inferHtmlResources()
    .acceptHeader("*/*")
    .acceptEncodingHeader("gzip, deflate")
    .contentTypeHeader("text/plain")
    .userAgentHeader("PostmanRuntime/7.30.0");
  
  private Map<CharSequence, String> headers_0 = Map.of("Postman-Token", "18607c1a-f782-458f-a01c-e5bd60df31ba");

  private ScenarioBuilder scn = scenario("RecordedSimulation")
    .exec(
      http("place_order")
        .post("/Prod/order")
        .headers(headers_0)
        .body(StringBody("{\"now\": \"#{currentTimeMillis()}\", \"run\": \""+runid+"\", \"requestid\": \"#{randomUuid()}\" }"))
    );

  {
      System.out.println("Runid: "+runid);
	  setUp(
        scn.injectOpen(
                rampUsersPerSec(10).to(1000).during(120),
                constantUsersPerSec(1000).during(300)
                )
          .throttle(
            reachRps(120).in(120),
            holdFor(300)
        )
      ).protocols(httpProtocol);
  }
}

P.S. - My requirement is that I be able to prove that precisely all requests sent by the load test were handled by the backend (underneath the API endpoint there’s a queue that async’s the work - I’m checking to make sure all the async work happened in the backend - so verifying the responses back to Gatling is insufficient - that would just prove everything got posted to the queue).

To do that, I need some of the data elements that were sent in the requests (in this simplified example, I need the requestid’s that were actually sent).

If there’s a better way to achieve that goal in gatling, I’m all ears to learn the better way!

Do you let your run terminate properly? Do you use something like maxDuration to force it to stop early?

From throttle documentation:

Gatling will automatically interrupt your test at the end of the throttle, just like it does with maxDuration.

So what happens is that your test gets immediately interrupted after 420 seconds. It’s not a graceful interrupt. As a result, it’s expected that some requests might be missing because their responses have not been received and processed at the time of the interrupt.

If you want a graceful shutdown, don’t use maxDuration or throttle.
Instead of throttle, design your injection profile to meet your expected throughput.

        scn.injectOpen(
                rampUsersPerSec(0).to(120).during(120),
                constantUsersPerSec(120).during(300)
                )

Note: I regret having ever implemented throttle.

1 Like

I’m aiming for a target in requests per second, not users per second. “Throttle” was the only way I saw to specify and maintain a request rate rather than a user rate.

Or am I misinterpreting the documentation? Are those two synonymous in Gatling speak?

I would anticipate some variability in the requests per second produced by each user - primarily introduced by variability in response time from the server. So if I hold the user rate constant, I would expect to see the request rate vary (which isn’t representative of this particular server’s load - but I definitely understand why it IS representative of many other scenarios…probably most others).

I’ll give it a shot with this approach and see how much variability I see in rps with a constant ups.

Note: We’ve all implemented features we wish we hadn’t. If throttle is your only one, you’re batting very well.

Let’s elaborate on how throttle works (Gatling - Simulation).

It’s a limit to cap the normally generated throughput.
It’s the equivalent of JMeter’s Ultimate ThreadGroup.
It pushes all the extra traffic in an unbounded queue.
It doesn’t include any notion of fairness among virtual users nor requests.

As a result, it doesn’t make any sense unless your scenario only contains 1 single request. And still, you have to make sure your injection profile generates enough load to reach your target, and not too mach to not fill the unbounded queue and cause an OutOfMemoryError.

And in the case of a 1 request per scenario (1 user = 1 request), you might as well shape with your injection profile.

I’ll give it a shot with this approach and see how much variability I see in rps with a constant ups.

With one request per scenario, absolutely none.