WebSocket actors hang sporadically during the simulation

Samuel_Volin · September 27, 2016, 12:14am

We have a rather large simulation built that relies on the websocket API (which will be rebooted in gatling 3, but for now we’re on 2.2.2). This simulation is fairly big, and runs for a few minutes. Code snippets below have been slightly edited, but are accurate.

We have written our test components as objects so that when we call

`
GetSearchesForUser()

`

It runs from…

`

class GetSearchesForUser extends WebsocketRequest {

  override val destination = "SearchesService"
  override val messageType = "GetSearchesForUser"
  override val data = """{"criteria":null}"""
}
object GetSearchesForUser {
  val request = new GetSearchesForUser()
  def apply() = {
    request()
  }
}

`

Which, in our WebsocketRequest, eventually calls

`

exec(
  ws(s"$messageType").sendText(session => {
    constructRequest(session)
  }).check(
    wsAwait.within(responseTimeout).until(1).regex(
      TrackedResponseRegex
    ).saveAs(s"$messageType")
  )
)

`

When running this simulation with one user, we find that the actor sometimes gets stuck between requests and checks.

`

…

ITSA / OpenSearchPage / GetSearchesForUser (OK=3 KO=0 )
ITSA / OpenSearchPage / GetSearchesForUser Check (OK=3 KO=0 )
ITSA / OpenSearchPage / SettingSaveRequest (OK=1 KO=0 )

---- ITSecurityAnalyst ---------------------------------------------------------
[--------------------------------------------------------------------------] 0%
waiting: 0 / active: 1 / done:0

`

slandelle · September 27, 2016, 5:34am

Can you share a sample that can be used to reproduce?
If not, you can first try to debug WsActor.
If you can’t find a bug there, and the issue is indeed in Gatling and not in your system under test, it means that you’ll have to dig into AsyncHttpClient, or even into Netty.

Samuel_Volin · September 27, 2016, 5:59pm

Unfortunately I cannot provide a sample. This is a rather large product, which is why we went to great lengths to build a massive Gatling simulation.

Good advice looking at WsActor! I turned on debug logging in the logback.xml with
`

`
so I could get some logger.debug statements in WsActor. I ran this a few times with

Just before one actor hung, it printed the following (my commentary follows each ← marker):

`

10:41:57.663 [DEBUG] i.g.h.a.a.w.WsActor - Sending message check on WebSocket 'gatling.http.webSocket': TextMessage(3:::{
"destination":"AuditLogService",
"messageType":"AddAuditLog",
"data":{
"eventType":12,
"userId":-100,
"personId":-100,
"objectType":11,
"objectName":"Multiple Alarms Selected",
"info":"ids: 112,113,114"
},
"method":"post",
"_tracker":"13f70147-74f3-4f75-bbd8-92f3fd0e3c92" ← _tracker looked for in the TrackedResponseRegex
}) ← The request sent before the check gets stuck
10:41:57.663 [DEBUG] i.g.h.a.a.w.WsActor - setCheck blocking=true timeout=2 seconds ← The corresponding AddAuditLog check looking for the TrackedResponseRegex
10:41:57.681 [DEBUG] i.g.h.a.a.w.WsActor - Check on WebSocket 'gatling.http.webSocket' timed out ← old check timing out
10:41:57.799 [DEBUG] i.g.h.a.a.w.WsActor - Received text message on websocket 'gatling.http.webSocket':3:::{"destination":"TaskService","messageType":"ActiveTasks","data": …
… "method":"subscribe","_tracker":"a4904910-4f6a-4450-bb6a-7dfd55ed9686","clientAddr":"10.128.72.85"} ← subscribed response from old request, note the _tracker doesn’t match

`

We have a few requests that will send multiple responses when specified with “method”:“subscribe”. The request for “destination”:“TaskService”,“messageType”:“ActiveTasks” is among them. This request had already received one response, and its check had passed. Its check (defined in WebsocketRequest) uses .until(1), so it expected just one response to satisfy the check. The server will send more responses, but they should be ignored because they don’t pass the new checks.
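
The behavior expected here can be modelled in plain Scala. This is only a sketch of the intent, not Gatling's implementation: `TrackerCheckSketch` and `firstTracked` are hypothetical names, and the tracker test is a simple substring check rather than the real TrackedResponseRegex.

```scala
object TrackerCheckSketch {
  // Hypothetical model of one pending check with until(1): it is satisfied by
  // the first message that carries its own tracker, and every other message
  // (for example a later push from an older subscribe request) is skipped.
  def firstTracked(messages: Seq[String], tracker: String): Option[String] = {
    val needle = "\"_tracker\":\"" + tracker + "\""
    messages.find(_.contains(needle))
  }
}
```

With the two trackers from the log above, a push that carries the ActiveTasks tracker would be skipped by a check waiting on the AddAuditLog tracker.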

We ran this under different configurations where it hung in different places. Whenever the actor was hung, it was after it received another text message from a subscribe response.

Looking at WsActor's onTextMessage, it should be looking at the current check (set in this case by the “destination”:“AuditLogService”,“messageType”:“AddAuditLog” request, which also just expects a response back with a matching _tracker). This leads to a few questions:

L156 implicit val cache = mutable.Map.empty[Any, Any]
What’s up with this? I don’t see a cache being used anywhere.
L158 check.check(message, tx.session) match {
Should compare the current AddAuditLog check to the message, which should be a failure and pattern match to L172. That doesn’t hang the actor, does it? I don’t think so…
L154 tx.check.foreach { check =>
Because we are using wsAwait and no other nonblocking checks, tx.check has only whatever checks haven’t timed out in the last two seconds. This certainly includes the current check just set by AddAuditLog. Is this correct? Could it also still have the passed check from ActiveTasks?
Am I wrong that multiple responses from our subscribe requests are not ignored by the new checks?

Monsieur Landelle, your assistance is wonderful. Thank you for all your help! It is immensely appreciated.

slandelle · September 27, 2016, 6:37pm

Are you sure you’re setting the check directly on the response, and not in a subsequent action?
Otherwise, you might miss the response if it arrives in between.

I’m afraid I won’t be able to help more than that. First I’m swamped. Then, Gatling is our means of making a living.
I hope you’ll understand.

Regards,

Samuel_Volin · September 28, 2016, 4:58pm

I rebuilt gatling multiple times, each time littered with more and more print statements until I found the bottleneck. The regex matcher was taking 4-6 minutes attempting to match the TrackedResponseRegex with an input that wouldn’t match.
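
For anyone chasing a similar hang, one way to make a runaway regex show up as an explicit timeout instead of a silent stall is to feed the matcher a CharSequence that checks a deadline. This is a minimal sketch using only the JDK and the Scala standard library; `BoundedRegex`, `findWithin` and `RegexDeadlineExceeded` are hypothetical names, and nothing here is part of Gatling.

```scala
import java.util.regex.Pattern
import scala.concurrent.duration._

object BoundedRegex {
  // Hypothetical exception type signalling that the deadline passed mid-match.
  final class RegexDeadlineExceeded extends RuntimeException("regex deadline exceeded")

  // CharSequence view that fails once the deadline has passed. The matcher reads
  // its input through charAt, so a long-running match is aborted from inside.
  private final class DeadlineCharSequence(inner: CharSequence, deadlineNanos: Long)
      extends CharSequence {
    override def charAt(index: Int): Char = {
      if (System.nanoTime() - deadlineNanos >= 0) throw new RegexDeadlineExceeded
      inner.charAt(index)
    }
    override def length(): Int = inner.length()
    override def subSequence(start: Int, end: Int): CharSequence =
      new DeadlineCharSequence(inner.subSequence(start, end), deadlineNanos)
    override def toString: String = inner.toString
  }

  // Returns Some(found) if find completed in time, None if the deadline passed.
  def findWithin(pattern: Pattern, input: CharSequence, timeout: FiniteDuration): Option[Boolean] = {
    val guarded = new DeadlineCharSequence(input, System.nanoTime() + timeout.toNanos)
    try Some(pattern.matcher(guarded).find())
    catch { case _: RegexDeadlineExceeded => None }
  }
}
```

The clock is read on every character access, which adds overhead, so this is probably better suited to debugging than to a load run.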

I would leave this advice to anyone seeing their actors “hang” during a simulation: Make sure your regex is optimized for success AND failure cases. This article on optimized regex parsing helped immensely, and the 4-6 minutes running matcher.find turned into microseconds again.
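
To illustrate the failure case, here is a rough sketch. The pattern `backtrackingProne` is invented for illustration and is not the TrackedResponseRegex from this thread; `trackedResponse` and `timedFind` are likewise hypothetical helpers. The idea is that nested quantifiers may backtrack heavily when no match exists, while a quoted literal reduces a failed search to a plain scan of the input.

```scala
import java.util.regex.Pattern

object TrackedResponseSketch {
  // Illustrative only (NOT the poster's regex): nested quantifiers such as
  // (\w+\s*)+ can split a run of word characters in many ways, so a failed
  // match may backtrack through a very large number of combinations.
  val backtrackingProne: Pattern = Pattern.compile("(\\w+\\s*)+\"_tracker\"")

  // Hypothetical replacement: quote the tracker so the whole pattern is a
  // literal, which leaves the engine nothing to backtrack over.
  def trackedResponse(tracker: String): Pattern =
    Pattern.compile(Pattern.quote("\"_tracker\":\"" + tracker + "\""))

  // Times one matcher.find call and returns (matched, elapsed nanoseconds).
  def timedFind(pattern: Pattern, input: CharSequence): (Boolean, Long) = {
    val start = System.nanoTime()
    val matched = pattern.matcher(input).find()
    (matched, System.nanoTime() - start)
  }
}
```

Running `timedFind` with both patterns against a message that does not contain the expected tracker is a quick way to compare how each behaves when it fails.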

Stephane, thanks again for your help. Gatling is wonderful, and your assistance is very gracious.

slandelle · September 28, 2016, 8:45pm

Thanks for your kind words!
I’d be very curious to know more about your use case and your regex.
4-6 minutes to perform a regex??? How did you manage to do that?
And how big is your payload? WebSockets are usually used either with binary payloads, so regex are of no use here, or JSON, but payloads are usually small.
