When measuring the performance of an LLM, requests per second (RPS) is largely irrelevant, since capacity is measured in tokens per second. The task is to figure out how many tokens per second a GPU can support.
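To be explicit about the metric I'm after, it's just the total number of generated tokens divided by the wall-clock duration of the measurement window:

```
\text{tokens/s} = \frac{\sum_i \text{tokens}_i}{t_{\text{end}} - t_{\text{start}}}
```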
I can extract the number of tokens per request from the response body. This number is nondeterministic and can vary slightly from call to call. I am thinking of creating a separate file to track request name, start/end timestamps, and number of tokens per call (a sketch of what I mean is below), but I am curious whether anybody has run into this before and came up with an approach to actually display this in the report. Thanks.
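Something like this is what I have in mind, as a minimal tool-agnostic sketch. The class name, CSV layout, and the `usage.completion_tokens` response field are all illustrative (the field name follows the OpenAI-style response shape; adjust to whatever your API actually returns):

```python
import csv
import threading
import time

# Minimal sketch of the "separate file" idea: a thread-safe logger that
# records request name, start/end timestamps, and token count per call,
# and keeps a running tokens-per-second figure. All names are made up.
class TokenLog:
    def __init__(self, path="token_log.csv"):
        self._lock = threading.Lock()  # load generators are usually concurrent
        self._file = open(path, "w", newline="")
        self._writer = csv.writer(self._file)
        self._writer.writerow(["request_name", "start_ts", "end_ts", "tokens"])
        self._total_tokens = 0
        self._first_start = None
        self._last_end = None

    def record(self, name, start_ts, end_ts, tokens):
        with self._lock:
            self._writer.writerow([name, start_ts, end_ts, tokens])
            self._total_tokens += tokens
            self._first_start = start_ts if self._first_start is None else min(self._first_start, start_ts)
            self._last_end = end_ts if self._last_end is None else max(self._last_end, end_ts)

    def tokens_per_second(self):
        # Aggregate throughput over the whole measurement window so far.
        with self._lock:
            if self._first_start is None or self._last_end <= self._first_start:
                return 0.0
            return self._total_tokens / (self._last_end - self._first_start)

    def close(self):
        with self._lock:
            self._file.close()


# Usage inside a request callback: parse the token count from the
# response body and record it along with the timing.
log = TokenLog()
start = time.time()
body = {"usage": {"completion_tokens": 237}}  # stand-in for a real response
log.record("chat_completion", start, time.time(), body["usage"]["completion_tokens"])
print(f"{log.tokens_per_second():.1f} tokens/s so far")
log.close()
```

The CSV could then be post-processed or joined against the tool's own report. If the load-testing tool supports custom metrics natively, that would obviously be cleaner than a side file.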