Hiccups in throughput during GC Gatling-2.1.4

Hi Guys,

I am currently using Gatling-2.1.4 and am seeing throughput hiccups in my test that I can't account for when looking at the SUT resources monitoring.
In these hiccups it seems there is a congestion of requests for almost a minute and the a burst of requests, as if all congested requests are released at once.

I decided to monitor garbage collection during the test and found that these hiccups seem to occur when ConcurrentMarkSweep is started:

2015-03-11T13:20:29.189+0100: 12240.530: [GC2015-03-11T13:20:29.189+0100: 12240.530: [ParNew: 1699897K->15910K(1887488K), 0.0102969 secs] 2483657K->800713K(2936064K), 0.0105310 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
2015-03-11T13:20:46.831+0100: 12258.174: [GC2015-03-11T13:20:46.831+0100: 12258.174: [ParNew: 1693614K->19629K(1887488K), 0.0108334 secs] 2478417K->805489K(2936064K), 0.0110423 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
2015-03-11T13:21:04.347+0100: 12275.690: [GC2015-03-11T13:21:04.347+0100: 12275.690: [ParNew: 1697453K->23053K(1887488K), 0.0114411 secs] 2483313K->809948K(2936064K), 0.0116610 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
2015-03-11T13:21:04.347+0100: 12275.702: [GC [1 CMS-initial-mark: 786894K(1048576K)] 812213K(2936064K), 0.0181288 secs] [Times: user=0.03 sys=0.00, real=0.03 secs]
2015-03-11T13:21:04.378+0100: 12275.721: [CMS-concurrent-mark-start]
2015-03-11T13:21:05.316+0100: 12276.656: [CMS-concurrent-mark: 0.935/0.935 secs] [Times: user=3.99 sys=0.03, real=0.94 secs]
2015-03-11T13:21:05.316+0100: 12276.656: [CMS-concurrent-preclean-start]
2015-03-11T13:21:05.378+0100: 12276.730: [CMS-concurrent-preclean: 0.074/0.074 secs] [Times: user=0.06 sys=0.00, real=0.06 secs]
2015-03-11T13:21:05.378+0100: 12276.730: [CMS-concurrent-abortable-preclean-start]
CMS: abort preclean due to time 2015-03-11T13:21:10.410+0100: 12281.754: [CMS-concurrent-abortable-preclean: 5.021/5.024 secs] [Times: user=6.00 sys=0.17, real=5.03 secs]
2015-03-11T13:21:10.410+0100: 12281.754: [GC[YG occupancy: 611020 K (1887488 K)]2015-03-11T13:21:10.410+0100: 12281.754: [Rescan (parallel) , 0.0733107 secs]2015-03-11T13:21:10.488+0100: 12281.828: [weak refs processing, 0.6784310 secs]2015-03-11T13:21:11.160+0100: 12282.507: [class unloading, 0.0054061 secs]2015-03-11T13:21:11.160+0100: 12282.512: [scrub symbol table, 0.0052379 secs]2015-03-11T13:21:11.176+0100: 12282.518: [scrub string table, 0.0006159 secs] [1 CMS-remark: 786894K(1048576K)] 1397914K(2936064K), 0.8182159 secs] [Times: user=1.38 sys=0.00, real=0.81 secs]
2015-03-11T13:21:11.222+0100: 12282.573: [CMS-concurrent-sweep-start]
2015-03-11T13:21:11.613+0100: 12282.958: [CMS-concurrent-sweep: 0.385/0.385 secs] [Times: user=1.80 sys=0.16, real=0.39 secs]
2015-03-11T13:21:11.613+0100: 12282.958: [CMS-concurrent-reset-start]
2015-03-11T13:21:11.629+0100: 12282.971: [CMS-concurrent-reset: 0.013/0.013 secs] [Times: user=0.03 sys=0.00, real=0.02 secs]
2015-03-11T13:21:15.098+0100: 12286.439: [GC2015-03-11T13:21:15.098+0100: 12286.440: [ParNew: 1700877K->36521K(1887488K), 0.0208330 secs] 2421269K->758600K(3088140K), 0.0210853 secs] [Times: user=0.20 sys=0.00, real=0.01 secs]
2015-03-11T13:21:21.395+0100: 12292.736: [GC2015-03-11T13:21:21.395+0100: 12292.736: [ParNew: 1714345K->50981K(1887488K), 0.0441100 secs] 2436424K->807146K(3088140K), 0.0443489 secs] [Times: user=0.20 sys=0.00, real=0.03 secs]
2015-03-11T13:21:28.489+0100: 12299.844: [GC2015-03-11T13:21:28.489+0100: 12299.844: [ParNew: 1728805K->69282K(1887488K), 0.0389792 secs] 2484970K->863283K(3088140K), 0.0392125 secs] [Times: user=0.42 sys=0.00, real=0.05 secs]
2015-03-11T13:21:34.223+0100: 12305.564: [GC2015-03-11T13:21:34.223+0100: 12305.564: [ParNew: 1747106K->43411K(1887488K), 0.0513797 secs] 2541107K->880284K(3088140K), 0.0516091 secs] [Times: user=0.41 sys=0.01, real=0.05 secs]
2015-03-11T13:21:41.521+0100: 12312.861: [GC2015-03-11T13:21:41.521+0100: 12312.861: [ParNew: 1721235K->69135K(1887488K), 0.0532480 secs] 2558108K->940436K(3088140K), 0.0534751 secs] [Times: user=0.19 sys=0.06, real=0.05 secs]
2015-03-11T13:21:47.896+0100: 12319.236: [GC2015-03-11T13:21:47.896+0100: 12319.236: [ParNew: 1746959K->70554K(1887488K), 0.0475770 secs] 2618260K->985707K(3088140K), 0.0478112 secs] [Times: user=0.36 sys=0.06, real=0.05 secs]
2015-03-11T13:21:47.943+0100: 12319.285: [GC [1 CMS-initial-mark: 915152K(1200652K)] 1014812K(3088140K), 0.0387782 secs] [Times: user=0.03 sys=0.00, real=0.03 secs]
2015-03-11T13:21:47.974+0100: 12319.324: [CMS-concurrent-mark-start]
2015-03-11T13:21:48.568+0100: 12319.910: [CMS-concurrent-mark: 0.587/0.587 secs] [Times: user=2.97 sys=0.00, real=0.59 secs]
2015-03-11T13:21:48.568+0100: 12319.910: [CMS-concurrent-preclean-start]
2015-03-11T13:21:48.583+0100: 12319.929: [CMS-concurrent-preclean: 0.019/0.019 secs] [Times: user=0.03 sys=0.00, real=0.01 secs]
2015-03-11T13:21:48.583+0100: 12319.929: [CMS-concurrent-abortable-preclean-start]
2015-03-11T13:21:50.865+0100: 12322.209: [CMS-concurrent-abortable-preclean: 2.279/2.280 secs] [Times: user=4.56 sys=0.00, real=2.28 secs]
2015-03-11T13:21:50.865+0100: 12322.210: [GC[YG occupancy: 924339 K (1887488 K)]2015-03-11T13:21:50.865+0100: 12322.210: [Rescan (parallel) , 0.0596033 secs]2015-03-11T13:21:50.927+0100: 12322.269: [weak refs processing, 0.1577407 secs]2015-03-11T13:21:51.083+0100: 12322.427: [class unloading, 0.0052564 secs]2015-03-11T13:21:51.083+0100: 12322.432: [scrub symbol table, 0.0064488 secs]2015-03-11T13:21:51.083+0100: 12322.439: [scrub string table, 0.0008509 secs] [1 CMS-remark: 915152K(1200652K)] 1839492K(3088140K), 0.2502080 secs] [Times: user=0.98 sys=0.00, real=0.25 secs]
2015-03-11T13:21:51.115+0100: 12322.460: [CMS-concurrent-sweep-start]
2015-03-11T13:21:51.880+0100: 12323.235: [CMS-concurrent-sweep: 0.775/0.775 secs] [Times: user=1.72 sys=0.03, real=0.77 secs]
2015-03-11T13:21:51.880+0100: 12323.236: [CMS-concurrent-reset-start]
2015-03-11T13:21:51.896+0100: 12323.247: [CMS-concurrent-reset: 0.011/0.011 secs] [Times: user=0.03 sys=0.00, real=0.02 secs]
2015-03-11T13:21:53.865+0100: 12325.206: [GC2015-03-11T13:21:53.865+0100: 12325.206: [ParNew: 1748378K->49194K(1887488K), 0.0389598 secs] 2391045K->729750K(3088140K), 0.0391217 secs] [Times: user=0.22 sys=0.00, real=0.03 secs]
2015-03-11T13:21:59.381+0100: 12330.721: [GC2015-03-11T13:21:59.381+0100: 12330.721: [ParNew: 1727018K->48462K(1887488K), 0.0337259 secs] 2407574K->765085K(3088140K), 0.0338767 secs] [Times: user=0.22 sys=0.00, real=0.03 secs]
2015-03-11T13:22:04.084+0100: 12335.424: [GC2015-03-11T13:22:04.084+0100: 12335.424: [ParNew: 1726286K->38095K(1887488K), 0.0304618 secs] 2442909K->787772K(3088140K), 0.0306197 secs] [Times: user=0.19 sys=0.00, real=0.02 secs]
2015-03-11T13:22:09.709+0100: 12341.050: [GC2015-03-11T13:22:09.709+0100: 12341.050: [ParNew: 1715919K->58859K(1887488K), 0.0377903 secs] 2465596K->836668K(3088140K), 0.0380217 secs] [Times: user=0.22 sys=0.00, real=0.03 secs]
2015-03-11T13:22:15.819+0100: 12347.160: [GC2015-03-11T13:22:15.819+0100: 12347.160: [ParNew: 1736683K->46222K(1887488K), 0.0304851 secs] 2514492K->857827K(3088140K), 0.0306272 secs] [Times: user=0.14 sys=0.01, real=0.03 secs]
2015-03-11T13:22:21.585+0100: 12352.926: [GC2015-03-11T13:22:21.585+0100: 12352.926: [ParNew: 1724046K->52809K(1887488K), 0.0316830 secs] 2535651K->901235K(3088140K), 0.0318236 secs] [Times: user=0.22 sys=0.00, real=0.03 secs]
2015-03-11T13:22:27.569+0100: 12358.910: [GC2015-03-11T13:22:27.569+0100: 12358.910: [ParNew: 1730633K->54914K(1887488K), 0.0482743 secs] 2579059K->937662K(3088140K), 0.0485085 secs] [Times: user=0.36 sys=0.03, real=0.05 secs]
2015-03-11T13:22:33.148+0100: 12364.488: [GC2015-03-11T13:22:33.148+0100: 12364.488: [ParNew: 1732738K->43311K(1887488K), 0.0415049 secs] 2615486K->961837K(3088140K), 0.0417371 secs] [Times: user=0.17 sys=0.00, real=0.03 secs]
2015-03-11T13:22:33.179+0100: 12364.530: [GC [1 CMS-initial-mark: 918525K(1200652K)] 994134K(3088140K), 0.0358454 secs] [Times: user=0.05 sys=0.00, real=0.05 secs]
2015-03-11T13:22:33.226+0100: 12364.566: [CMS-concurrent-mark-start]
2015-03-11T13:22:34.070+0100: 12365.410: [CMS-concurrent-mark: 0.844/0.844 secs] [Times: user=6.28 sys=0.00, real=0.84 secs]
2015-03-11T13:22:34.070+0100: 12365.410: [CMS-concurrent-preclean-start]
2015-03-11T13:22:34.070+0100: 12365.421: [CMS-concurrent-preclean: 0.011/0.011 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
2015-03-11T13:22:34.070+0100: 12365.422: [CMS-concurrent-abortable-preclean-start]
2015-03-11T13:22:36.085+0100: 12367.440: [GC2015-03-11T13:22:36.085+0100: 12367.441: [ParNew: 1721135K->209664K(1887488K), 0.3272120 secs] 2639661K->1497292K(3321228K), 0.3274374 secs] [Times: user=1.08 sys=0.78, real=0.33 secs]
CMS: abort preclean due to time 2015-03-11T13:22:39.148+0100: 12370.502: [CMS-concurrent-abortable-preclean: 4.683/5.080 secs] [Times: user=28.69 sys=3.08, real=5.08 secs]
2015-03-11T13:22:39.148+0100: 12370.502: [GC[YG occupancy: 988695 K (1887488 K)]2015-03-11T13:22:39.148+0100: 12370.502: [Rescan (parallel) , 0.1891079 secs]2015-03-11T13:22:39.351+0100: 12370.692: [weak refs processing, 0.0292836 secs]2015-03-11T13:22:39.367+0100: 12370.721: [class unloading, 0.0055186 secs]2015-03-11T13:22:39.382+0100: 12370.726: [scrub symbol table, 0.0056457 secs]2015-03-11T13:22:39.382+0100: 12370.732: [scrub string table, 0.0007735 secs] [1 CMS-remark: 1287628K(1433740K)] 2276324K(3321228K), 0.2349840 secs] [Times: user=1.95 sys=0.00, real=0.23 secs]
2015-03-11T13:22:39.382+0100: 12370.738: [CMS-concurrent-sweep-start]
2015-03-11T13:22:40.210+0100: 12371.560: [CMS-concurrent-sweep: 0.822/0.822 secs] [Times: user=2.19 sys=0.13, real=0.83 secs]
2015-03-11T13:22:40.210+0100: 12371.562: [CMS-concurrent-reset-start]
2015-03-11T13:22:40.226+0100: 12371.575: [CMS-concurrent-reset: 0.014/0.014 secs] [Times: user=0.03 sys=0.00, real=0.02 secs]
2015-03-11T13:22:42.679+0100: 12374.025: [GC2015-03-11T13:22:42.679+0100: 12374.025: [ParNew: 1887488K->203573K(1887488K), 0.1599230 secs] 3020180K->1470277K(3775312K), 0.1601034 secs] [Times: user=0.98 sys=0.50, real=0.16 secs]
2015-03-11T13:22:48.789+0100: 12380.141: [GC2015-03-11T13:22:48.789+0100: 12380.141: [ParNew: 1881397K->209664K(1887488K), 0.1307353 secs] 3148101K->1571994K(3775312K), 0.1309102 secs] [Times: user=1.17 sys=0.06, real=0.14 secs]
2015-03-11T13:22:55.914+0100: 12387.267: [GC2015-03-11T13:22:55.914+0100: 12387.267: [ParNew: 1887488K->209664K(1887488K), 0.1987091 secs] 3249818K->1745423K(3775312K), 0.1989306 secs] [Times: user=1.17 sys=0.64, real=0.20 secs]
2015-03-11T13:22:56.117+0100: 12387.466: [GC [1 CMS-initial-mark: 1535759K(1887824K)] 1748583K(3775312K), 0.0456396 secs] [Times: user=0.05 sys=0.00, real=0.05 secs]
2015-03-11T13:22:56.164+0100: 12387.512: [CMS-concurrent-mark-start]
2015-03-11T13:22:56.617+0100: 12387.962: [CMS-concurrent-mark: 0.451/0.451 secs] [Times: user=2.97 sys=0.11, real=0.45 secs]
2015-03-11T13:22:56.617+0100: 12387.962: [CMS-concurrent-preclean-start]
2015-03-11T13:22:56.633+0100: 12387.987: [CMS-concurrent-preclean: 0.023/0.024 secs] [Times: user=0.03 sys=0.00, real=0.02 secs]
2015-03-11T13:22:56.633+0100: 12387.987: [CMS-concurrent-abortable-preclean-start]
2015-03-11T13:23:00.164+0100: 12391.514: [CMS-concurrent-abortable-preclean: 3.522/3.527 secs] [Times: user=11.63 sys=0.45, real=3.53 secs]
2015-03-11T13:23:00.164+0100: 12391.515: [GC[YG occupancy: 1082511 K (1887488 K)]2015-03-11T13:23:00.164+0100: 12391.515: [Rescan (parallel) , 0.0675268 secs]2015-03-11T13:23:00.242+0100: 12391.583: [weak refs processing, 0.0672919 secs]2015-03-11T13:23:00.305+0100: 12391.650: [class unloading, 0.0056284 secs]2015-03-11T13:23:00.305+0100: 12391.656: [scrub symbol table, 0.0072029 secs]2015-03-11T13:23:00.321+0100: 12391.663: [scrub string table, 0.0008552 secs] [1 CMS-remark: 1535759K(1887824K)] 2618271K(3775312K), 0.1582888 secs] [Times: user=0.88 sys=0.00, real=0.16 secs]
2015-03-11T13:23:00.321+0100: 12391.673: [CMS-concurrent-sweep-start]
2015-03-11T13:23:01.336+0100: 12392.689: [CMS-concurrent-sweep: 1.015/1.015 secs] [Times: user=2.53 sys=0.09, real=1.01 secs]
2015-03-11T13:23:01.336+0100: 12392.689: [CMS-concurrent-reset-start]
2015-03-11T13:23:01.352+0100: 12392.698: [CMS-concurrent-reset: 0.009/0.009 secs] [Times: user=0.03 sys=0.00, real=0.02 secs]
2015-03-11T13:23:02.852+0100: 12394.200: [GC2015-03-11T13:23:02.852+0100: 12394.200: [ParNew: 1887488K->164895K(1887488K), 0.0250032 secs] 2336075K->618063K(3775312K), 0.0252421 secs] [Times: user=0.22 sys=0.00, real=0.03 secs]
2015-03-11T13:23:09.040+0100: 12400.383: [GC2015-03-11T13:23:09.040+0100: 12400.384: [ParNew: 1842719K->153674K(1887488K), 0.0539350 secs] 2295887K->668238K(3775312K), 0.0541459 secs] [Times: user=0.61 sys=0.00, real=0.05 secs]
2015-03-11T13:23:25.603+0100: 12416.945: [GC2015-03-11T13:23:25.603+0100: 12416.945: [ParNew: 1831492K->108694K(1887488K), 0.0577164 secs] 2346056K->730536K(3775312K), 0.0579931 secs] [Times: user=0.33 sys=0.00, real=0.05 secs]

My maven-gatling-pluging jvmArgs look like this:

<jvmArg>-Xms3g</jvmArg>
<jvmArg>-Xmx6g</jvmArg>
<jvmArg>-XX:NewSize=2g</jvmArg>
<jvmArg>-XX:+HeapDumpOnOutOfMemoryError</jvmArg>
<jvmArg>-XX:+AggressiveOpts</jvmArg>
<jvmArg>-XX:+OptimizeStringConcat</jvmArg>
<jvmArg>-XX:+UseFastAccessorMethods</jvmArg>
<jvmArg>-XX:+UseParNewGC</jvmArg>
<jvmArg>-XX:+UseConcMarkSweepGC</jvmArg>
<jvmArg>-XX:+CMSParallelRemarkEnabled</jvmArg>
<jvmArg>-XX:+CMSClassUnloadingEnabled</jvmArg>
<jvmArg>-XX:CMSInitiatingOccupancyFraction=75</jvmArg>
<jvmArg>-XX:+UseCMSInitiatingOccupancyOnly</jvmArg>
<jvmArg>-XX:SurvivorRatio=8</jvmArg>
<jvmArg>-XX:MaxTenuringThreshold=1</jvmArg>
<jvmArg>-XX:+PrintGCDetails</jvmArg>
<jvmArg>-XX:+PrintGCDateStamps</jvmArg>
<jvmArg>-Xloggc:gc.log</jvmArg>

In my Gatling.conf:

allowPoolingConnections = false
allowPoolingSslConnections = false

My scenario is a open system simulation, soap calls only scenario:

  .exitBlockOnFail(
      randomSwitch (
         2.0 -> exec( Requests.AcceptPassenger_1A ),
         1.0 -> exec( Requests.UpdatePassengerInformation_1A ),
         6.0 -> exec( Requests.GetLocalDateTime_1A ),
         10.0 -> randomSwitch (
            90.0 -> exec( Requests.GetSeatMap_1A ),
            8.0 -> exec( Requests.GetSeatMap_COD ),
            2.0 -> exec( Requests.GetSeatMap_GAT )
         ),
         2.0 -> randomSwitch (
            90.0 -> exec( Requests.ListDocument_1A ),
            8.0 -> exec( Requests.ListDocument_COD ),
            2.0 -> exec( Requests.ListDocument_GAT )
         ),
         1.0 -> exec( Requests.ProvideBagTag_reprint_1A ),
         12.0 -> exec( Requests.ListOperationalEligibility_1A ),
         10.0 -> randomSwitch (
            91.0 -> exec( Requests.ListPassenger_1A ),
            9.0 -> exec( Requests.ListPassenger_COD )
         ),
         27.0 -> exec( Requests.ProvidePassengerHandlingInformation_1A ),
         9.0 -> exec( Requests.ProvidePassengerInformation_1A ),
         6.0 -> randomSwitch (
            90.0 -> exec( Requests.UpdatePassenger_1A ),
            8.0 -> exec( Requests.UpdatePassenger_COD ),
            2.0 -> exec( Requests.UpdatePassenger_GAT )
         ),
         3.0 -> exec( Requests.ProvideTravelDocuments_1A ),
         3.0 -> exec( Requests.GetEligibilityForMobileBP_AF ),
         3.0 -> exec( Requests.publishConfirmation ),
         1.0 -> exec( Requests.UpdateDocument_1A ),
         2.0 -> exec(Requests.TransferPassenger_1A ),
         2.0 -> randomSwitch (
           60.0 -> exec(Requests.ProvideAlternativeFlights_1A ),
           40.0 -> exec(Requests.ProvideAlternativeFlights_SSCOP_1A )
         )
      )
    )

Could the GC be the cause of the request congestion? Should I try different JVM Heap settings? (I see the same pattern with default GC settings)

PS: I can share the full gc.log with you through my dropbox if necessary.

Cheers

Daniel

There’s a Young collection happening every 40s or so. One was a bit long (0.81s) but well… Nothing in the Old.

I don’t see anything weird here.

Hi Stephane,

This post http://blog.griddynamics.com/2011/06/understanding-gc-pauses-in-jvm-hotspots_02.html describes Concurrent Mark Sweep is an old space garbage collector. The hiccups I see are during the CMS interval that is in the GC logs I posted.

Some other perhaps relevant details I forgot to mention earlier:

- Test is running at 200 tps
- Caching was enabled (by mistake, I'm trying now if disabling caching gives different results)

cheers

Daniel

Hi Daniel,

Sorry, my answer was stupid, I need sleep…

CMS STW phases are initial-mark and remark.

What’s ugly here is that the remark phases take quite some time:

[1 CMS-remark: 786894K(1048576K)] 1397914K(2936064K), 0.8182159 secs] [Times: user=1.38 sys=0.00, real=0.81 secs]

[1 CMS-remark: 915152K(1200652K)] 1839492K(3088140K), 0.2502080 secs] [Times: user=0.98 sys=0.00, real=0.25 secs]

[1 CMS-remark: 1287628K(1433740K)] 2276324K(3321228K), 0.2349840 secs] [Times: user=1.95 sys=0.00, real=0.23 secs]

[1 CMS-remark: 1535759K(1887824K)] 2618271K(3775312K), 0.1582888 secs] [Times: user=0.88 sys=0.00, real=0.16 secs]

I’m not sure how much pre-cleaning aborting is an issue.

The right place for all GC things is definitively the jClarity mailing list, you really should ask there.
As a developper and not a GC tuning rock star like the guys there, I’d first investigate the allocation rate, and if it’s weird, investigate what’s being allocated. Do you do things such as ton of big HTML pages parsing?

Cheers,