LLM inference - completions endpoint

Hello,
My task is to test the performance of an LLM. Along with the total response time, I need to get the time-to-first-token metric. The call is an HTTPS POST, for example to ‘https://api.openai.com/v1/chat/completions’, with the ‘stream’ option added to the request body (which returns the response as chat.completion.chunk JSON pieces, one for each token).
Is there any way to achieve this? Thanks.
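
To make the metric concrete: time-to-first-token here is the delay between sending the request and receiving the first streamed chunk that actually carries content. Below is a minimal sketch of how it could be computed from timestamped stream lines. It assumes OpenAI-style framing (each chunk on a `data: {...}` line, text under `choices[0].delta.content`, and a final `data: [DONE]` line); the function name and the input shape are hypothetical, not part of any tool's API.

```python
import json
from typing import Iterable, Optional, Tuple


def time_to_first_token(
    request_sent_at: float,
    timed_lines: Iterable[Tuple[float, str]],
) -> Optional[float]:
    """Return seconds from request send to the first chunk carrying content.

    timed_lines yields (arrival_timestamp, raw_line) pairs, one per line of
    the streamed response. Returns None if no content chunk ever arrives.
    """
    for arrived_at, line in timed_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # blank separators, comments, other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # assumed end-of-stream marker
            break
        try:
            chunk = json.loads(payload)
        except json.JSONDecodeError:
            continue  # ignore lines that are not valid JSON
        if not isinstance(chunk, dict):
            continue
        choices = chunk.get("choices") or []
        delta = choices[0].get("delta", {}) if choices else {}
        if delta.get("content"):  # skip chunks with empty or missing text
            return arrived_at - request_sent_at
    return None
```

Chunks with an empty `content` (for example one that only sets the role) are skipped on purpose, so the result reflects the first visible token rather than the first byte.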

According to the docs, this looks like SSE (Server-Sent Events).
If so, you can check Gatling’s SSE support.

Thanks for the quick response. However, I need to pass a request body, and it doesn’t look like SseConnectRequestBuilder supports that (I am on version 3.7).

Indeed, I failed to realize that.
At the moment, Gatling’s SSE support only handles GET requests without a body. The reason is that both the specification and the JavaScript EventSource object only allow this.
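
Until that changes, the measurement can be taken outside Gatling for a single request. The following is a rough sketch using only the Python standard library, not a Gatling feature: it POSTs a JSON body, reads the streamed response line by line, and records when the first `data:` event arrives. The function name and the `opener` and `clock` parameters are hypothetical and exist only to keep the sketch testable. Note that it times the first `data:` event, which only approximates time-to-first-token if the first event carries no text.

```python
import json
import time
import urllib.request
from typing import Callable, Dict, Optional, Tuple


def measure_streaming_post(
    url: str,
    body: dict,
    headers: Dict[str, str],
    opener: Callable = urllib.request.urlopen,
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[Optional[float], float]:
    """POST a JSON body; return (time_to_first_data_event, total_time) in seconds."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", **headers},
        method="POST",
    )
    started = clock()
    first_data_at = None
    with opener(request) as response:
        # Iterating the response yields lines as they arrive on the wire.
        for raw_line in response:
            line = raw_line.decode("utf-8").strip()
            is_event = line.startswith("data:") and line != "data: [DONE]"
            if first_data_at is None and is_event:
                first_data_at = clock()
    finished = clock()
    ttft = None if first_data_at is None else first_data_at - started
    return ttft, finished - started
```

A call would look something like `measure_streaming_post(url, {"model": ..., "messages": [...], "stream": True}, {"Authorization": "Bearer ..."})`, with the model name and key supplied by you. This times one request from one client, so it is a spot check rather than a load test.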

Contributions or sponsoring welcome.
