LLM inference - completions endpoint

tcristi74 · October 2, 2023, 8:39pm

Hello,
My task is to test the performance of an LLM. Along with total response time, I need to get the time-to-first-token metric. The call is an HTTPS POST, for example to ‘https://api.openai.com/v1/chat/completions’, with the ‘stream’ option added to the request body (which serves the response back as chat.completion.chunk JSON pieces for each token).
Is there any way to achieve this? Thanks.

slandelle · October 2, 2023, 9:14pm

According to the doc, it looks like this is served as SSE (Server-Sent Events).
If so, you can check Gatling’s SSE support.
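
For reference, a streamed chat completion arrives as SSE `data:` lines, each carrying one chat.completion.chunk JSON object, with a final `data: [DONE]` line. Here is a minimal Python sketch of pulling the text out of such a stream, assuming that framing. The function name `iter_chunk_text` is hypothetical and not part of Gatling or any client library.

```python
import json


def iter_chunk_text(lines):
    """Yield the text delta carried by each SSE `data:` line.

    `lines` is any iterable of str or bytes lines from the response body.
    (Hypothetical helper for illustration only.)
    """
    for raw in lines:
        if isinstance(raw, bytes):
            raw = raw.decode("utf-8")
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank separators, comments and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            # The first chunk often carries only the role, so content may be absent.
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text
```

Joining the yielded pieces gives the full completion text.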

tcristi74 · October 2, 2023, 10:08pm

Thanks for the quick response. However, I need to pass a request body, and it doesn’t look like SseConnectRequestBuilder supports that (I am on version 3.7).

slandelle · October 4, 2023, 8:16pm

Indeed, I failed to realize that.
At the moment, Gatling’s SSE support doesn’t handle anything but a GET request without a body. The reason is that both the specification and the JavaScript EventSource object only allow this.

Contributions or sponsoring welcome.
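
Until then, the two metrics can be collected outside Gatling with any HTTP client that streams the response body. Below is a minimal Python sketch, not a Gatling feature. The name `measure_stream` and its parameters are hypothetical. It assumes the caller records `start` just before sending the POST, and it treats the first `data:` event as the first token, which is an approximation since the first chunk may carry only the role.

```python
import time


def measure_stream(lines, start, clock=time.perf_counter):
    """Return (time_to_first_token, total_time) in seconds.

    `lines` is a lazily read iterable of str or bytes lines from the
    streamed response; `start` is the clock value taken right before
    the request was sent. (Hypothetical helper for illustration only.)
    """
    first = None
    for raw in lines:
        if isinstance(raw, bytes):
            raw = raw.decode("utf-8")
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # not an SSE data event
        if line[len("data:"):].strip() == "[DONE]":
            break  # end-of-stream sentinel
        if first is None:
            first = clock() - start  # first data event seen
    total = clock() - start
    return first, total
```

With the `requests` library, for instance, one would take `start = time.perf_counter()`, call `requests.post(url, json=body, headers=headers, stream=True)`, and pass `response.iter_lines()` as `lines`. The iterable has to be read lazily, otherwise both numbers collapse into the total response time.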

