LLM - tokens per second (instead of RPS)

When measuring the performance of an LLM, RPS (requests per second) is somewhat irrelevant, since capacity is measured in tokens per second. The task is to figure out how many tokens per second a GPU can support.
I can extract the number of tokens per request from the response body. This number is nondeterministic and can vary slightly from call to call. I am thinking of creating a separate file that tracks the request name, start and end timestamps, and number of tokens for each request, but I am curious whether anybody has run into this before and come up with an approach to actually display this in the report. Thanks.
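
To make the tracking-file idea concrete, here is a minimal sketch in Python of what I have in mind. It is not tied to any particular load-testing tool. The names (`RequestRecord`, `write_records`, `read_records`, `aggregate_tokens_per_second`) and the CSV layout are my own assumptions, and the token counts would come from whatever field the response body exposes. Aggregate throughput here is total tokens divided by the wall-clock window from the first request start to the last request end, which I think is a reasonable definition when requests overlap, though others are possible.

```python
import csv
from dataclasses import dataclass


@dataclass
class RequestRecord:
    """One row of the hypothetical tracking file."""
    name: str      # request name as used in the load test
    start: float   # start timestamp, in seconds
    end: float     # end timestamp, in seconds
    tokens: int    # token count extracted from the response body


def write_records(records, fileobj):
    """Write records as CSV with a header row."""
    writer = csv.writer(fileobj)
    writer.writerow(["name", "start", "end", "tokens"])
    for r in records:
        writer.writerow([r.name, r.start, r.end, r.tokens])


def read_records(fileobj):
    """Read records back from a CSV produced by write_records."""
    reader = csv.DictReader(fileobj)
    return [
        RequestRecord(row["name"], float(row["start"]),
                      float(row["end"]), int(row["tokens"]))
        for row in reader
    ]


def per_request_tokens_per_second(record):
    """Tokens per second for a single request (0.0 if duration is not positive)."""
    duration = record.end - record.start
    return record.tokens / duration if duration > 0 else 0.0


def aggregate_tokens_per_second(records):
    """Total tokens divided by the wall-clock window covering all requests."""
    if not records:
        return 0.0
    window = max(r.end for r in records) - min(r.start for r in records)
    total = sum(r.tokens for r in records)
    return total / window if window > 0 else 0.0
```

With something like this, the per-request values could be summarized after the run (for example as a minimum, mean, and maximum), but it still leaves open how to get the numbers into the tool's own report.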

