Error: "At offset=3230, failed to received stats for injectors List ..."

When I run my test, it ends up with the status Broken after 50 minutes.
This is the error message I see in the logs:
“At offset=3230, failed to received stats for injectors List(test private ip pool jvm 11/XX.XXX.XX.XX): timeout. Stopping”
What could be the issue here with the injectors?
Thanks for any advice.

We can’t tell what happened from these logs alone. We would need the full logs, or the account you used.

This can happen for the following reasons:

  • a network loss lasting more than 10 seconds
  • a load generator crash (you should have more info at the end of the logs)
  • the load generator’s I/O being saturated (check the Load Generator tab for CPU and IO events and the Connections tab for bandwidth)

Regards

These are the logs I have. Do they help?

[17:09:03,299]	Cloning git repository
[17:09:03,304]	Cloning into '/tmp/frontline-8559332366097267989'...
[17:09:04,744]	Start compiling project in '/tmp/frontline-8559332366097267989'
[17:09:04,744]	Start deploying pool test private ip pool jvm 11 with 1 instance
[17:09:22,074]	Pool test private ip pool jvm 11 deployed successfully: (test private ip pool jvm 11/1.2.3.4).
[17:09:23,096]	Compilation completed successfully.
[17:09:23,100]	Collected jar (/tmp/frontline-8559332366097267989/target/id-loadtesting-1.0.0-shaded.jar)
[17:09:23,102]	Packages successfully collected and instances successfully spawned. Proceeding with ssh checks.
[17:09:35,143]	Waiting for ssh to be up on test private ip pool jvm 11/1.2.3.4 on port 22: j.i.IOException: Process 'create injector working directory' exited with 255: ssh: connect to host 1.2.3.4 port 22: Connection refused. 59 remaining tries, please wait.
[17:09:35,673]	Connected over ssh to test private ip pool jvm 11/1.2.3.4.
[17:09:35,673]	All instances could be connected over ssh. Proceeding with Upload.
[17:09:36,218]	All uploads to test private ip pool jvm 11/1.2.3.4 successful.
[17:09:36,218]	All uploads successful. Proceeding with instance checking.
[17:09:37,047]	Instance test private ip pool jvm 11/1.2.3.4 check successful.
[17:09:37,047]	All instances check successful. Proceeding with starting injectors.
[17:09:37,187]	Injector on instance test private ip pool jvm 11/1.2.3.4 successfully booted.
[17:09:37,187]	All injectors successfully booted.
[17:09:38,202]	Couldn't connect over HTTP to test private ip pool jvm 11/1.2.3.4 on port 9999: j.n.ConnectException: Connection refused (Connection refused). 59 remaining tries, please wait.
[17:09:39,222]	Couldn't connect over HTTP to test private ip pool jvm 11/1.2.3.4 on port 9999: j.n.ConnectException: Connection refused (Connection refused). 58 remaining tries, please wait.
[17:09:40,242]	Couldn't connect over HTTP to test private ip pool jvm 11/1.2.3.4 on port 9999: j.n.ConnectException: Connection refused (Connection refused). 57 remaining tries, please wait.
[17:09:40,304]	Connected over HTTP to test private ip pool jvm 11/1.2.3.4.
[17:09:40,304]	All instances could be connected over HTTP. Proceeding with starting.
[17:09:40,310]	Injector test private ip pool jvm 11/1.2.3.4 successfully started.
[17:09:40,310]	All injectors could be started. Proceeding with Running.
[17:09:40,311]	Run 07fa2bbd-4465-45f5-bb38-27681ce9746e: starting injection
[18:03:21,325]	Collecting stats from instance=test private ip pool jvm 11/1.2.3.4 at offset=3221 for offsets=([3220, 3221]) failed: j.n.ConnectException: Connection refused (Connection refused). Retrying.
[18:03:22,322]	Lag detected: Run 07fa2bbd-4465-45f5-bb38-27681ce9746e: requesting test private ip pool jvm 11/1.2.3.4 for offsets=([3220, 3222])
[18:03:22,323]	Collecting stats from instance=test private ip pool jvm 11/1.2.3.4 at offset=3222 for offsets=([3220, 3222]) failed: j.n.ConnectException: Connection refused (Connection refused). Retrying.
[18:03:23,323]	Lag detected: Run 07fa2bbd-4465-45f5-bb38-27681ce9746e: requesting test private ip pool jvm 11/1.2.3.4 for offsets=([3220, 3223])
[18:03:24,322]	Lag detected: Run 07fa2bbd-4465-45f5-bb38-27681ce9746e: requesting test private ip pool jvm 11/1.2.3.4 for offsets=([3220, 3224])
[18:03:25,322]	Lag detected: Run 07fa2bbd-4465-45f5-bb38-27681ce9746e: requesting test private ip pool jvm 11/1.2.3.4 for offsets=([3220, 3225])
[18:03:26,323]	Lag detected: Run 07fa2bbd-4465-45f5-bb38-27681ce9746e: requesting test private ip pool jvm 11/1.2.3.4 for offsets=([3220, 3226])
[18:03:27,323]	Lag detected: Run 07fa2bbd-4465-45f5-bb38-27681ce9746e: requesting test private ip pool jvm 11/1.2.3.4 for offsets=([3220, 3227])
[18:03:28,322]	Lag detected: Run 07fa2bbd-4465-45f5-bb38-27681ce9746e: requesting test private ip pool jvm 11/1.2.3.4 for offsets=([3220, 3228])
[18:03:29,323]	Lag detected: Run 07fa2bbd-4465-45f5-bb38-27681ce9746e: requesting test private ip pool jvm 11/1.2.3.4 for offsets=([3220, 3229])
[18:03:30,323]	Lag detected: Run 07fa2bbd-4465-45f5-bb38-27681ce9746e: requesting test private ip pool jvm 11/1.2.3.4 for offsets=([3220, 3230])
[18:03:31,323]	At offset=3230, failed to received stats for injectors List(test private ip pool jvm 11/1.2.3.4): timeout. Stopping
[18:03:31,323]	Injector on instance test private ip pool jvm 11/1.2.3.4 crashed. Gatling Enterprise will now try to check if the injector process is still running and kill it.
[18:03:41,332]	Failed to access crashed instance test private ip pool jvm 11/1.2.3.4 to get logs: j.u.c.TimeoutException: Task timed out after 10000ms
[18:03:41,875]	test private ip pool jvm 11: instances successfully stopped
[17:09:41,875]	Cleaning build directory /tmp/frontline-8559332366097267989

Ahhh, so that’s a self-hosted installation.

Failed to access crashed instance test

Gatling Enterprise was not even able to SSH into the instance.
This means that either there was a total connectivity loss with the load generator, or the instance was shut down.
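If it helps to narrow things down, here is a minimal, hypothetical diagnostic sketch (Scala, run from the controller host) that probes the two ports Gatling Enterprise uses to reach a load generator in your logs: SSH on 22 and the injector stats endpoint on 9999. The IP is a placeholder.

```scala
// Hypothetical diagnostic sketch: probe the ports Gatling Enterprise uses to
// reach a load generator (SSH on 22, injector stats endpoint on 9999).
// Run it from the controller host; the IP below is a placeholder.
import java.net.{InetSocketAddress, Socket}
import scala.util.{Failure, Success, Try}

object ProbeLoadGenerator {
  val host      = "1.2.3.4" // placeholder: your load generator's private IP
  val timeoutMs = 5000

  def probe(port: Int): String =
    Try {
      val socket = new Socket()
      try socket.connect(new InetSocketAddress(host, port), timeoutMs)
      finally socket.close()
    } match {
      case Success(_) => s"port $port reachable"
      case Failure(e) => s"port $port unreachable: ${e.getMessage}"
    }

  def main(args: Array[String]): Unit =
    Seq(22, 9999).foreach(p => println(probe(p)))
}
```

If both ports stay unreachable while the run is broken, the problem is at the network or instance level rather than in the injector process itself.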

Aha, ok, so this is something that our infrastructure guys should have a look at, right?
It is not related to Gatling itself, right?
How can we prevent this from happening in the future?

so this is something that our infrastructure guys should have a look at, right?

Exactly.

It is not related to Gatling itself, right?

Right.

How can we prevent this from happening in the future?

Sorry, I can’t say; it’s up to your infrastructure people to figure out the root cause. The load generators can cope with a 10-second loss by default.

@mrukavina FYI, another customer had this same issue, and it turned out their load generators were being automatically quarantined by AWS Security Hub.
I recommend checking whether you have something similar in place.
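For example, here is a sketch of one way to check, assuming your load generators run on EC2 and you have the AWS SDK for Java v2 (ec2 module) on the classpath: dump the state and attached security groups of the load generator instance right after a broken run. A stopped instance or a swapped-in quarantine security group would point to that kind of automated remediation. The instance ID is a placeholder.

```scala
// Hypothetical sketch, assuming the load generators run on EC2 and the AWS SDK
// for Java v2 (software.amazon.awssdk:ec2) is on the classpath, with AWS
// credentials available in the environment. It prints the state and security
// groups of one load generator instance so a quarantine-style change is visible.
import software.amazon.awssdk.services.ec2.Ec2Client
import software.amazon.awssdk.services.ec2.model.DescribeInstancesRequest
import scala.jdk.CollectionConverters._

object CheckLoadGeneratorInstance {
  def main(args: Array[String]): Unit = {
    val instanceId = "i-0123456789abcdef0" // placeholder load generator instance ID
    val ec2 = Ec2Client.create()
    try {
      val response = ec2.describeInstances(
        DescribeInstancesRequest.builder().instanceIds(instanceId).build()
      )
      for {
        reservation <- response.reservations().asScala
        instance    <- reservation.instances().asScala
      } {
        println(s"state: ${instance.state().name()}")
        instance.securityGroups().asScala.foreach { sg =>
          println(s"security group: ${sg.groupId()} (${sg.groupName()})")
        }
      }
    } finally ec2.close()
  }
}
```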

Thanks, @slandelle, I will certainly check that with the team. Thanks for the heads-up.