Good design for a high-throughput plugin

Hello there

At DataStax, we have developed a Gatling plugin so that Gatling can speak CQL (Cassandra Query Language) and benchmark our products. It works well up to a certain injected throughput per Gatling instance. Beyond that point, things go wrong.

When I inject more than 60k users/s, the number of active queries in the console output grows to hundreds of thousands, and no KOs are reported. This should not be possible: beyond a certain number of active CQL queries, the DSE driver rejects new ones immediately. In my tests that limit is 32k. The number of active queries should therefore always stay around 32k, and the number of KOs should grow very quickly.

It seems that some queueing effect is happening before the driver is called.
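
To make that hypothesis concrete, here is a minimal model in plain Java. It is not the plugin's code: the class QueueingEffectSketch, the tick-based loop and all the rates are made up for illustration, and the "driver" is only a counter with an in-flight limit. It compares submitting straight to the driver with going through an unbounded buffer that forwards more slowly than requests arrive.

```java
class QueueingEffectSketch {

    /** Outcome of one simulated run. */
    static final class Result {
        final long active;      // requests started but not finished: buffered + in flight
        final long ko;          // requests rejected by the simulated driver
        final int peakInFlight; // highest in-flight count the driver ever saw

        Result(long active, long ko, int peakInFlight) {
            this.active = active;
            this.ko = ko;
            this.peakInFlight = peakInFlight;
        }
    }

    /**
     * Hypothetical model: each tick the driver completes some requests, the
     * injector adds new ones to an unbounded buffer, and at most
     * forwardPerTick buffered requests are handed to the driver, which
     * rejects anything above its in-flight limit.
     */
    static Result simulate(int ticks, int arrivals, int completions,
                           int limit, int forwardPerTick) {
        long buffered = 0;
        int inFlight = 0;
        long ko = 0;
        int peak = 0;
        for (int t = 0; t < ticks; t++) {
            inFlight -= Math.min(inFlight, completions); // driver finishes work
            buffered += arrivals;                        // injector hands over requests
            long toForward = Math.min(buffered, forwardPerTick);
            for (long i = 0; i < toForward; i++) {
                buffered--;
                if (inFlight < limit) {
                    inFlight++;                          // accepted by the driver
                } else {
                    ko++;                                // rejected immediately
                }
            }
            peak = Math.max(peak, inFlight);
        }
        return new Result(buffered + inFlight, ko, peak);
    }

    public static void main(String[] args) {
        // No intermediate bottleneck: everything reaches the driver in the same tick.
        Result direct = simulate(1000, 60, 30, 32, Integer.MAX_VALUE);
        // Unbounded buffer that forwards only 30 requests per tick.
        Result viaBuffer = simulate(1000, 60, 30, 32, 30);
        System.out.println("direct:     active=" + direct.active + " ko=" + direct.ko);
        System.out.println("via buffer: active=" + viaBuffer.active + " ko=" + viaBuffer.ko);
    }
}
```

With these made-up numbers, the direct run ends with 32 active requests and 29998 KOs, while the buffered run ends with 30030 active requests and 0 KOs, because the driver never sees more than 30 requests in flight. That is the same shape as what the console shows, which is why a buffer ahead of the driver looks like a plausible suspect.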

In a thread on this mailing list, I read that the Gatling HTTP connector is able to send up to 140k ops/s, which suggests that the issue could be in the DSE plugin design. I have ruled out the “not powerful enough machine” hypothesis by using a beefy server with 48 cores and 128 GB RAM.

Is there any documentation on how to write an efficient Gatling plugin? I could not find any, so I mostly looked at other plugins for inspiration.

Now for the core design:

  • The DSE driver is completely asynchronous (based on Netty)
  • The plugin creates an Akka router with nbCores actors
  • Every call to DseRequestAction::execute() delegates the task to that router, to free the Gatling injector actor as fast as possible
  • The same goes for latency recording, in order to free the Netty threads as fast as possible
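
The hand-off described in the list above can be approximated in plain Java to show where a backlog could hide. This is only a sketch under assumptions: a ThreadPoolExecutor with a fixed number of workers and an unbounded LinkedBlockingQueue stands in for the Akka router and its mailboxes (Akka's default mailbox is also unbounded), the class HandOffSketch and its method are hypothetical, and the workers are deliberately blocked to mimic routees that cannot keep up with the injector.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class HandOffSketch {

    /**
     * Submits `requests` tasks to a pool of `workers` threads whose tasks are
     * held at a gate, then reports how many tasks are waiting in the pool's
     * queue. Every submission succeeds; nothing is ever rejected.
     */
    static int backlogAfterSubmitting(int requests, int workers) {
        ThreadPoolExecutor router = new ThreadPoolExecutor(
                workers, workers, 0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<>()); // unbounded, like a default mailbox
        CountDownLatch gate = new CountDownLatch(1);
        try {
            for (int i = 0; i < requests; i++) {
                // Stand-in for "delegate the driver call to the router".
                router.execute(() -> {
                    try {
                        gate.await(); // worker is busy and cannot take more work
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            // The backlog is only visible if it is measured explicitly.
            return router.getQueue().size();
        } finally {
            gate.countDown();  // release the workers so the pool can drain
            router.shutdown(); // queued tasks still run, then threads exit
        }
    }

    public static void main(String[] args) {
        System.out.println("backlog = " + backlogAfterSubmitting(10_000, 4));
    }
}
```

The point of the sketch is that the submitting side gets no feedback at all: with 10,000 requests and 4 busy workers, 9,996 tasks sit in the queue and none has reached the "driver" yet. Logging the equivalent figure in the real plugin (mailbox or queue depth ahead of the driver call) would be one way to confirm or rule out this explanation.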

Do you think this is a reasonable design? Do you have any suggestions as to how I could investigate the issue?

Thanks in advance for your time