OOM using 3.3.1

Hello,
I use Gatling 3.3.1 to do a perf test with a TSV feeder in batch mode. When I run the test, the memory keeps going up until it reaches 100%.

Scala:

object Recall {
  val format = new SimpleDateFormat("yyyy-MM-dd")
  val rnd = new Random
  var flag = rnd.nextInt(5)

  //val feeder1 = jdbcFeeder("jdbc:mysql://172.28.148.31:3306/test", "root", "", "select url, cookie, body from test.test_data limit 1000000;").circular
  val feeder1 = tsv("/cfs/mnt/recsdet/tm/iperf_test/url/test_data_0.csv").batch(20000).circular

  val recall = feed(feeder1)
    .exec(http("Search")
      .get("${url}&forcebot=1")
      //.body(StringBody("""${body}"""))
      .header("Cookie", "${cookie}")
      .header("Content-Type", "application/x-www-form-urlencoded")
      .check(status.is(200)))
}

val httpProtocol = http
  .baseUrl("http://mjq.jd.local")
  .acceptHeader("application/json;q=0.9,*/*;q=0.8")
  .acceptEncodingHeader("gzip, deflate")
  .acceptLanguageHeader("en-US,en;q=0.5")
  .doNotTrackHeader("1")
  .disableFollowRedirect
  .userAgentHeader("Mozilla/5.0 (Windows NT 6.1; WOW64; rv:39.0) Gecko/20100101 Firefox/39.0")

val recall = scenario("recall").exec(Recall.recall)

setUp(
  recall.inject(constantUsersPerSec(3000) during (300000 seconds))
).protocols(httpProtocol)
}

The JVM configuration is: -XX:+UseG1GC -Xmx8G -Xms8G -XX:MaxDirectMemorySize=8G -XX:ParallelGCThreads=10 -XX:MaxGCPauseMillis=350 -XX:InitiatingHeapOccupancyPercent=35 -XX:G1ReservePercent=15 -XX:ConcGCThreads=8 -XX:G1HeapRegionSize=10m -Djava.net.preferIPv4Stack=true -Djava.net.preferIPv6Addresses=false -XX:+OptimizeStringConcat -XX:+ParallelRefProcEnabled

The JDK version is 1.8.0_221.

Is something wrong?

You would have to provide a heap dump.
My best guess is that your system under load can't keep up, and virtual users just pile up in memory.

I tested two scenarios. The first one uses 2000 virtual users: memory usage is only 3% and very stable, CPU ~30%.
The second scenario uses 3k virtual users: the CPU is stable at only 30%+, but the memory keeps growing until 100% and the process is then killed by the OS.

Here is the information.
NMT info (memory usage up to 70%+):

[admin@cfsslave-acf322bf gatling_stt]$ /export/servers/jdk11.0.2/bin/jcmd 85542 VM.native_memory scale=MB
85542:

Native Memory Tracking:

Total: reserved=5756MB, committed=4450MB

  • Java Heap (reserved=4096MB, committed=4096MB)
    (mmap: reserved=4096MB, committed=4096MB)

  • Class (reserved=1067MB, committed=50MB)
    (classes #6712)
    ( instance classes #6308, array classes #404)
    (malloc=1MB #20372)
    (mmap: reserved=1066MB, committed=49MB)
    ( Metadata: )
    ( reserved=42MB, committed=41MB)
    ( used=39MB)
    ( free=2MB)
    ( waste=0MB =0.00%)
    ( Class space:)
    ( reserved=1024MB, committed=8MB)
    ( used=6MB)
    ( free=2MB)
    ( waste=0MB =0.00%)

  • Thread (reserved=77MB, committed=7MB)
    (thread #76)
    (stack: reserved=76MB, committed=6MB)

  • Code (reserved=243MB, committed=24MB)
    (malloc=1MB #6892)
    (mmap: reserved=242MB, committed=23MB)

  • GC (reserved=225MB, committed=225MB)
    (malloc=41MB #34097)
    (mmap: reserved=184MB, committed=184MB)

  • Compiler (reserved=1MB, committed=1MB)
    (malloc=1MB #890)

  • Internal (reserved=2MB, committed=2MB)
    (malloc=2MB #3873)

  • Other (reserved=33MB, committed=33MB)
    (malloc=33MB #82)

  • Symbol (reserved=10MB, committed=10MB)
    (malloc=8MB #72510)
    (arena=2MB #1)

  • Native Memory Tracking (reserved=3MB, committed=3MB)
    (tracking overhead=2MB)

The jmap heap info is attached.
The pmap info is attached.

jmap_heap.txt (1.55 KB)

pmap.txt (35.6 KB)

I did another test. I ran 2 Gatling instances in a Docker container with 2k virtual users. The CPU was at 50%+, memory usage was only 6%+, and it was stable.
I ran 1 Gatling instance in a Docker container with 3k virtual users: the memory keeps growing until 100%.

The jmap output you provided is useless; you have to specify the -dump option: https://docs.oracle.com/javase/7/docs/technotes/tools/share/jmap.html
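For example, something along these lines (the live flag and file name are just placeholders; replace <pid> with the Gatling process id):

jmap -dump:live,format=b,file=heap_dump.hprof <pid>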

Anyway, there's a very good chance your virtual users are piling up in memory because you're hitting a bottleneck: either the target system can't withstand such load, or you're saturating the bandwidth.
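If you want to keep the injector itself from being OOM-killed while you investigate, one option (just a sketch on top of the script above; the duration and thresholds are placeholders) is to cap the run and let assertions fail it early:

setUp(
  recall.inject(constantUsersPerSec(3000) during (300 seconds))
).protocols(httpProtocol)
  .maxDuration(10 minutes)                   // hard stop even if virtual users are still alive
  .assertions(
    global.responseTime.max.lt(5000),        // placeholder: max response time under 5 s
    global.failedRequests.percent.lt(1.0)    // placeholder: less than 1% failed requests
  )

That way the run stops instead of accumulating users until the OS kills the process.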

Here is the dump info:
https://drive.google.com/file/d/1xEP25Y0PHJlo2TDF-rE_0XxNYIjx2PsK/view?usp=sharing

I see the “active” count in the console output is increasing all the time.

Is this the reason that the memory keeps increasing?

Absolutely. Your virtual users are piling up in memory (~1,260,000) because you’re hitting a bottleneck:

  • your injector machine is saturated: 100% CPU or bandwidth
  • your system under load can’t keep up with such load

It is very strange. I ran two Gatling instances in one Docker container with 2k virtual users, and the memory was very stable.
But when I run one Gatling instance in one Docker container with 3k virtual users, the memory keeps growing until 100%.

It’s not “strange”, it’s exactly what I described.

And from what I saw in your heap dump, it’s not 2k virtual users, it’s 2k new virtual users per second.
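To make the difference concrete (a rough sketch of the two injection profiles; the numbers are placeholders):

// open model: 2000 *new* users are started every second, regardless of how many
// are still waiting for a response, so slow responses make users accumulate
recall.inject(constantUsersPerSec(2000) during (10 minutes))

// closed model: at most 2000 users are alive at any time; a new one only starts
// when a previous one finishes, so concurrency (and memory) stays capped
recall.inject(constantConcurrentUsers(2000) during (10 minutes))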

But my system under test received 4K qps.

OK, back to basics!

As per this group’s terms:

Provide a Short, Self Contained, Correct (Compilable), Example (see http://sscce.org/)

I don't get your point.
The system under test works well.
The injector machine is not saturated; the CPU is only at 35%.

I changed my Scala script from "constantUsersPerSec" to "constantConcurrentUsers". With one Gatling instance in a Docker container I can only send 3k qps.
But with two Gatling instances in a Docker container I can send 6k qps in total.
Why can one Gatling instance only send 3k, not 6k?
Is something reaching a bottleneck?

It’s counterproductive to play riddles without being able to reproduce your problem.
And anyway, there are more than 1 million virtual users stuck in the heap dump you provided, and that's something only you can figure out and fix.

Are you in Beijing? I am working in Beijing.

Another question: does the feeder in batch + circular mode have a queue size configuration?

In this mode with a big file (3G+), the stack shows a lot of threads in WAITING status.

“GatlingSystem-akka.actor.default-dispatcher-25” #39 prio=5 os_prio=0 tid=0x0000000004bb4800 nid=0x82b4b waiting on condition [0x00007f3bb7822000]
java.lang.Thread.State: WAITING (parking)

https://github.com/gatling/gatling/issues/3944
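Regarding the batch + circular question above: as far as I know, the only knob is the bufferSize argument to batch itself, which controls how many lines are kept in memory at a time (a sketch; the path and value are placeholders, and I believe the default buffer is 2000 lines):

// batch(n) reads the file in chunks of n lines instead of loading all of it;
// with .circular the file is re-read from the beginning once it is exhausted
val feeder1 = tsv("/path/to/test_data_0.csv")
  .batch(2000)    // the script above uses 20000
  .circular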