Hi,
The problem is an OOM error when using a very large input file for a feeder.
Part of the code is:
val scn = scenario("Capacity").during(1 minute) {
  feed(csv("input.csv").circular)
    .exec(http("request_1")
      .get("${rics}?format=json"))
}
The input.csv file is 1.5GB (with a 100MB file there was no problem).
After starting the simulation, this occurs:
java.lang.OutOfMemoryError: Java heap space
Dumping heap to java_pid58882.hprof …
Heap dump file created [6336544529 bytes in 145.833 secs]
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
…
I tried setting -Xms2512M -Xmx4512M -Xmn200M in gatling.sh instead of the defaults.
How much Java memory should I try to set in order to make this work (the machine has 40GB of free memory)?
Is there any other workaround? The goal is to use all 64 million inputs from the feeder.
Thanks
Honestly, dunno. I guess that’s a matter of trial and error.
What's for sure is that the built-in feeders have to load the full collection into memory in order to support the random() strategy. Still, we use a Vector instead of an Array, so we don't need contiguous memory.
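Conceptually, circular boils down to something like this (a simplified sketch, not Gatling's actual implementation; loadAllRecords is a hypothetical helper standing in for the CSV parsing):

// every record is materialized up front, so heap usage tracks file size
val records: Vector[Map[String, String]] = loadAllRecords("input.csv")
// circular then just cycles over the in-memory collection forever
val circularFeeder: Iterator[Map[String, String]] =
  Iterator.continually(records).flatMap(_.iterator)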
Which version do you use?
I use 2.0.0-RC5.
I've configured Xmx to 12GB and it seemed to work. I'll try to fine-tune and test on other machines.
Mmm, that's really a lot. Would you mind compressing your file and sharing it somewhere, please? There's probably room for improvement.
Question that might sound weird: how many columns does your csv file have?
I can't share the data, but I can describe the file structure:
it consists of 64M rows, each about 20 characters long (they're actually URIs),
and there is just one column.
I just investigated this, and sadly there's nothing we can do about it.
All this overhead is caused by:
- the JVM's internal UTF-16 encoding: 2 bytes per char where your file might use only 1 (ASCII, or UTF-8 where the char falls within the ASCII range)
- object headers: char[], String and String[]
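To put rough numbers on it (a back-of-envelope estimate assuming a 64-bit JVM with compressed oops; exact sizes vary with JVM version):
- 20 chars stored as UTF-16: ~40 bytes of character data
- char[] header and length field: ~16 bytes
- String header and fields: ~24 bytes
That's roughly 80 bytes per value, so 64M rows come to ~5GB before even counting the per-record wrapper objects and the Vector itself, which is consistent with a 1.5GB file needing a 12GB heap.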
The only workaround would be to stream from the file during the simulation, and that might hurt performance, so if you can afford the memory usage, coping with it is probably the better option.
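If you do want to try streaming, here's a minimal sketch (my assumptions: one URI per line, no header row, and the attribute named rics as in your scenario; a custom feeder in Gatling 2 is just an Iterator[Map[String, Any]]):

import io.gatling.core.Predef._
import io.gatling.http.Predef._
import scala.concurrent.duration._
import scala.io.Source

// records are read lazily, line by line, instead of loading all 64M rows up front
val streamingFeeder: Iterator[Map[String, String]] =
  Source.fromFile("input.csv").getLines().map(line => Map("rics" -> line.trim))

val scn = scenario("Capacity").during(1 minute) {
  feed(streamingFeeder)
    .exec(http("request_1")
      .get("${rics}?format=json"))
}

Note that unlike circular, this iterator is consumed only once: when it's exhausted, virtual users will fail on feed.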
Cheers,
Stéphane
Why not split up the file and run it from multiple machines? You can easily aggregate the results afterwards with Gatling (e.g. copy all the simulation.log files into one results folder and regenerate the report in reports-only mode).