I'm currently using Gatling to load test an application.
My setup launches several load-injector pods in my k8s infrastructure, and each injector has the following resources:
Memory Limits: 8GB
CPU Requests: 1 CPU
JVM args: ['-server', '-Xms3072m', '-Xmx3072m']
Basically, my pods are being OOM-killed: page cache memory keeps growing throughout the test until there is no memory left.
As you can see, heap (heap-usage.png) and working set (working-set-memory.png) stay under control for the whole test, but page cache memory behaves as shown in cache-value.png.
I have already tried to apply some customisations to avoid these issues:
Is your failure at 5:25 specifically? Your working-set-memory image shows a spike at that time, which makes me wonder if that is the point of failure. I have previously encountered OOM errors, but only when the origin (server) slows down, so a huge count of outstanding requests accumulates (if you're using an open workload model). Since in my use case each request has its own session, with its own map and objects, this results in an out-of-memory error.
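For context, this is roughly what the two workload models look like in Gatling's Scala DSL. This is a minimal sketch only: the base URL, rates, and durations are placeholders I made up, not values from your setup.

```scala
import scala.concurrent.duration._
import io.gatling.core.Predef._
import io.gatling.http.Predef._

class WorkloadModels extends Simulation {

  val httpProtocol = http.baseUrl("http://app.example") // placeholder target

  val scn = scenario("browse").exec(http("home").get("/"))

  // Open model: new users arrive at a fixed rate no matter how slowly the
  // server responds, so when it degrades, in-flight requests and their
  // sessions pile up in the injector's memory.
  // setUp(scn.inject(constantUsersPerSec(100).during(10.minutes)))

  // Closed model: concurrency is capped, so a slow server throttles the
  // injector instead of letting sessions accumulate.
  setUp(scn.inject(constantConcurrentUsers(100).during(10.minutes)))
    .protocols(httpProtocol)
}
```

With the closed model, degraded response times show up as reduced throughput rather than as memory growth on the injector.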
The other issue I've encountered before was due to a typo in my script: the users never terminated even after their requests had completed, so each user's session was kept and never released. By looking at your active-users count you might be able to tell whether something similar is happening to you.
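Below is a hedged sketch of that bug class in the Scala DSL; the `done` flag, the `/status` endpoint, and the injection numbers are all hypothetical. The shape is what to look for: a loop whose exit condition is never satisfied keeps every virtual user (and its session) alive until the injector dies.

```scala
import scala.concurrent.duration._
import io.gatling.core.Predef._
import io.gatling.http.Predef._

class NeverEndingUsers extends Simulation {

  val httpProtocol = http.baseUrl("http://app.example") // placeholder target

  // Buggy shape: the "done" flag is set once before the loop and never
  // updated inside it (e.g. because of a typo in the attribute name), so
  // asLongAs never exits and the user's session is never released.
  val leaky = scenario("leaky")
    .exec(_.set("done", false))
    .asLongAs(session => !session("done").as[Boolean]) {
      exec(http("poll").get("/status"))
      // missing: .exec(_.set("done", true)) once the work is finished
    }

  // Fixed shape: bound the loop so users terminate and free their sessions.
  val bounded = scenario("bounded")
    .repeat(10) {
      exec(http("poll").get("/status"))
    }

  setUp(bounded.inject(rampUsers(100).during(1.minute)))
    .protocols(httpProtocol)
}
```

If the active-users chart keeps climbing while the request rate stays flat, that is the signature of the leaky shape.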
I'm not saying this is what happened, but I think it would still be worth generating reports from your simulation.log to observe the behavior (if you're on the open-source bundle, Gatling's reports-only mode, `gatling.sh -ro <results-folder>`, can rebuild them).
E.g. from my attached graph, 13:23 is when I can tell the server was no longer handling the load well; once it started degrading there was no going back. Since my timeout is set to 60 seconds, every slow request was held open for up to a minute, which in turn took down my Gatling injector with an OOM.