Help understanding how to control page cache usage during load tests with Gatling injectors

Hi There,

I'm currently using Gatling to run load tests against an application.
My setup launches several load injection pods in my k8s infrastructure, and each load injector has the following resources:
Memory Limits: 8GB
CPU Requests: 1 CPU
JVM args: ['-server', '-Xms3072m', '-Xmx3072m']

Basically, my pods are being OOM killed because page cache memory keeps increasing throughout the test until no memory is available.

As we can see, heap usage (heap-usage.png) and the working set (working-set-memory.png) stay under control throughout the test, but page cache memory behaves like this (cache-value.png).

I have already tried applying some customisations to avoid this issue (combined in the sketch below the list):

  • Disable HTTP caching:
        val httpProtocol: HttpProtocolBuilder = http
          .disableCaching
          .disableWarmUp
          .disableFollowRedirect

  • Reset the session at the end of each virtual user run:
        .exec(_.reset)

  • Flush the HTTP cache at the end of each virtual user run:
        .exec(flushHttpCache)

But I got the same results.
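For reference, here is roughly how these pieces fit together in a minimal simulation. This is only a sketch to show where each setting goes; the base URL, request and injection profile are placeholders rather than my real test:

    import io.gatling.core.Predef._
    import io.gatling.http.Predef._
    import io.gatling.http.protocol.HttpProtocolBuilder
    import scala.concurrent.duration._

    class PageCacheTestSimulation extends Simulation {

      // Protocol with all client-side caching disabled, as listed above.
      val httpProtocol: HttpProtocolBuilder = http
        .baseUrl("http://my-app.example.com") // placeholder target
        .disableCaching
        .disableWarmUp
        .disableFollowRedirect

      // Each virtual user runs one request, flushes the HTTP cache and
      // resets its session before terminating.
      val scn = scenario("page cache test")
        .exec(http("home").get("/"))
        .exec(flushHttpCache)
        .exec(_.reset)

      setUp(
        scn.inject(constantUsersPerSec(50).during(10.minutes)) // placeholder profile
      ).protocols(httpProtocol)
    }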

Any idea on what I can tune to avoid this? Maybe some gatling.conf configuration?

Thanks for your help,
Nuno Marcos

Honestly, this looks like a k8s question, not a Gatling one. You should try asking in a k8s group instead.

Note: FrontLine has good k8s support.

Thanks @Stéphane,

Just to confirm: does every cache used or generated by Gatling live inside the JVM?

Thanks,
Nuno

Stéphane LANDELLE <slandelle@gatling.io> wrote on Thursday, 18/03/2021 at 15:36:

Does your failure happen at 5:25 specifically? Your working-set-memory image shows a spike around that time, which makes me wonder whether that is the point of failure. I have run into OOM errors before, but only when the origin (server) slows down and requests accumulate (if you're using an open workload model). Since in my use case each request has its own session, with its own map and objects, that pile-up results in an out-of-memory error.
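To illustrate the difference (the scenario, URL and numbers are made up): with an open model the arrival rate is fixed regardless of how slowly the server responds, while a closed model caps concurrency:

    import io.gatling.core.Predef._
    import io.gatling.http.Predef._
    import scala.concurrent.duration._

    class WorkloadModelSketch extends Simulation {

      val httpProtocol = http.baseUrl("http://my-app.example.com") // placeholder

      val scn = scenario("example").exec(http("home").get("/"))

      // Open workload model: users keep arriving at a fixed rate whatever the
      // server does, so if responses slow down, in-flight users (and their
      // sessions) pile up in the injector's heap.
      // setUp(scn.inject(constantUsersPerSec(100).during(30.minutes))).protocols(httpProtocol)

      // Closed workload model: the number of concurrent users is capped, so a
      // slow server throttles the injector instead of exhausting its memory.
      setUp(
        scn.inject(constantConcurrentUsers(200).during(30.minutes))
      ).protocols(httpProtocol)
    }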

The other issue I've run into was caused by a typo in my script: the users never terminated even after their requests had completed, so each user's session was kept and never removed. By looking at your active-users count you might be able to tell whether something similar is happening to you.
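For illustration only (this is not your script, just the shape of the mistake I mean): a loop that never exits keeps every user, and its session, alive for the whole run, whereas a bounded loop lets users finish and free their sessions:

    import io.gatling.core.Predef._
    import io.gatling.http.Predef._

    class NonTerminatingUsersSketch extends Simulation {

      val httpProtocol = http.baseUrl("http://my-app.example.com") // placeholder

      // Broken: `forever` never lets the user finish, so its session is never freed.
      val leakyScn = scenario("leaky")
        .forever {
          exec(http("home").get("/"))
        }

      // Fixed: a bounded loop lets each user terminate and release its session.
      val boundedScn = scenario("bounded")
        .repeat(10) {
          exec(http("home").get("/"))
        }

      setUp(boundedScn.inject(atOnceUsers(100))).protocols(httpProtocol)
    }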

I'm not saying this is what happened, but I think it would still be interesting to generate reports from your simulation.log to observe the behaviour.

E.g. in the attached graph, 13:23 is when I can tell the server stopped handling the load well; once it started degrading there was no going back. Since my timeout is set to 60 seconds, this in turn took down my Gatling injector due to OOM.