Replaying production traffic logs with time awareness between each entry


I’m looking to use Gatling to essentially replay our production logs.
But it would be nice to be able to replay this traffic “as it happened” so to speak.
So the feeder, or something utilising a feeder, would have to take into account the relative time differences between each log/request entry.

I know Gatling doesn’t support this EXACTLY, but would there be some way to approximate this behaviour? Can any load generation tool pull this off?

Thanks :slight_smile:


What you want to do is very complicated.

First, you need to make sure that you have enough user threads to simulate the load. You may allocate 1,000 Gatling virtual users and only ever use 3 of them, but that’s better than not having enough.

Second, you need a feeder that serves the requests from the logs. That feeder has to be smart and not return an entry until it is time: if it has been 8 seconds since scenario start-up, and the next log entry is 9.5 seconds from log start, then sleep for 1.5 seconds before returning it. And it must be thread-safe, handing the “next” entry to exactly one Gatling user.
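To make that concrete, here is a minimal sketch in plain Scala. A Gatling Feeder is just an Iterator[Map[String, Any]], so something like this could be dropped in as one; the ReplayFeeder name and the (url, offsetMillis) entry shape are made up for illustration, not anything from Gatling itself:

```scala
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical sketch of a thread-safe, time-aware feeder.
// Each entry is (url, offset in ms from log start).
final class ReplayFeeder(entries: IndexedSeq[(String, Long)]) extends Iterator[Map[String, Any]] {
  private val cursor = new AtomicInteger(0)
  private val startMillis = System.currentTimeMillis()

  def hasNext: Boolean = cursor.get() < entries.length

  def next(): Map[String, Any] = {
    // getAndIncrement hands each entry to exactly one caller (thread-safe).
    val (url, offsetMillis) = entries(cursor.getAndIncrement())
    val waitMillis = offsetMillis - (System.currentTimeMillis() - startMillis)
    if (waitMillis > 0) Thread.sleep(waitMillis) // block until this entry is due
    Map("url" -> url)
  }
}
```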

If you build it, do share the end result so we can learn from it! :slight_smile:

I think you need some kind of parser that produces 2 things: a feeder of requests and headers (queue strategy) and a throttling profile.
Then, you can only replay idempotent requests (GET and the like).
And then, beware of memory usage if you end up with your whole access log in memory…
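A streaming parser could look something like the sketch below. The line format "&lt;epochMillis&gt; &lt;method&gt; &lt;url&gt;" is an assumption for illustration; feeding it a lazy iterator such as Source.fromFile(path).getLines() is what keeps the whole access log out of memory:

```scala
// Assumed line format: "<epochMillis> <method> <url>".
// Passing a lazy Iterator (e.g. scala.io.Source.fromFile(path).getLines())
// avoids holding the whole access log in memory.
def parseLog(lines: Iterator[String]): Iterator[(Long, String)] =
  lines.flatMap { line =>
    line.split(" ", 3) match {
      case Array(ts, "GET", url) => Some((ts.toLong, url)) // idempotent requests only
      case _                     => None                   // skip POST and friends
    }
  }
```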

I’m trying to work out how best to implement this.

Implementing delays in the feeder, as suggested by John Arrowwood, seems like mixing concerns: a Feeder should just return records from a source, without the side effect of a blocking Thread.sleep.

I’m wondering about using a custom throttler to do this. If I’m using a custom throttler, then I need to start my scenario with atOnceUsers(n), where n is the number of lines in the log file. The throttler is an upper bound, so conceptually wouldn’t it make more sense to have a way of injecting users with delays (to simulate users arriving according to the schedule in the log file) rather than creating n users and then throttling them (according to the log file schedule)?

So I tried writing a custom throttler, which uses scheduler.scheduleOnce to schedule every request according to the log file schedule.

It worked for about 1,000 users (i.e. the first 1,000 requests in the log file), but with atOnceUsers(2000) I started getting errors like:

Ignore message UserMessage

Can't handle RequestMessage … in state Terminated
Can't handle ResponseMessage … in state Terminated

io.gatling.http.ahc.AsyncHandler - Request 'request_csv' failed for user 6740039605054848212-1918 Closed
at com.ning.http.client.providers.netty.request.NettyRequestSender.sendRequest( ~[async-http-client-1.9.20.jar:na]

[ERROR] [05/06/2015 16:57:45.084] [] [ActorSystem(GatlingSystem)] exception while executing timer task
java.lang.IllegalStateException: HttpEngine hasn't been started

IMHO, you didn’t get it right.
One possibility: turn your access log into a Feeder that produces URLs/requests and pause durations.

The downside is that the pause durations will not respect response times. The intervals in the log run from the beginning of one request to the beginning of the next (or end to end), not from the end of one response to the beginning of the next request, which is the gap you actually need to track if you want to faithfully recreate the original behaviour.

It will need to do something like this:

The next request needs to be X milliseconds after the anchor/start time.
It has been Y milliseconds since we started.
X - Y is N
I need to pause for N milliseconds before I do the next request.
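The steps above boil down to one line of arithmetic. A sketch, clamped at zero so a request that is already overdue fires immediately instead of sleeping a negative duration (the function name is made up):

```scala
// N = X - Y: how long to pause before the next request, where X is the
// request's offset from the anchor/start time and Y is the elapsed time.
// Clamped at zero in case we are already running behind schedule.
def pauseBeforeNext(nextOffsetMillis: Long, elapsedMillis: Long): Long =
  math.max(0L, nextOffsetMillis - elapsedMillis)
```

Using the numbers from earlier in the thread: 8 seconds elapsed, next entry at 9.5 seconds, gives a 1.5-second pause.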

I’ve tried another attempt, this time by creating an InjectionStep that returns an Iterator[FiniteDuration]. Each FiniteDuration is the offset of a request from log start.
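Converting the absolute log timestamps into those offsets is straightforward; a sketch (injectionTimes is a hypothetical helper, and it assumes a non-empty, sorted log):

```scala
import scala.concurrent.duration._

// Turn absolute log timestamps (ms) into offsets from the first entry,
// i.e. the sequence of FiniteDurations a custom injection step iterates over.
// Assumes timestampsMillis is non-empty and sorted.
def injectionTimes(timestampsMillis: Seq[Long]): Iterator[FiniteDuration] = {
  val start = timestampsMillis.head
  timestampsMillis.iterator.map(t => (t - start).millis)
}
```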

This works, but it means I’m injecting a new user for each request: my scenario is set up to execute a single request from the access log. However, because each new user opens a new network connection, this quickly builds up ~2,000 established TCP connections and overloads my server, and I start getting Remotely closed exceptions in Gatling. I’ve tried using .shareConnections, but I still end up creating too many network connections for my server (I do notice, however, that I no longer have a whole bunch sitting in TIME_WAIT). maxConnectionsPerHost didn’t seem to have an effect here either.

So I don’t think I’m modelling what I want. I think I want to model a maximum of about 500 users, sending requests on the same connection according to the delays present in the access log. One way, as you’ve pointed out, is creating a feeder that produces URLs and pause durations. But as John Arrowwood mentions, you then need to know the response time of the user’s previous request and subtract it from the next delay, which sounds complicated and error-prone.

Perhaps a better way would be to have a custom throttler handle this? If you pass each request to the throttler, along with its offset from the start of the simulation, the throttler could hold on to it until the right time and then release it. If and when I get time, I might try to implement this.
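A rough sketch of that "hold and release" idea with a plain JDK scheduler, no Gatling throttler API involved (scheduleReplay and fire are made-up names, and the offsets are assumed non-empty):

```scala
import java.util.concurrent.{Executors, TimeUnit}

// Hypothetical sketch: release each request at its offset (ms) from
// simulation start, mirroring a throttler that holds requests until
// the right time. fire(i) stands in for actually sending request i.
def scheduleReplay(offsetsMillis: Seq[Long])(fire: Int => Unit): Unit = {
  val scheduler = Executors.newScheduledThreadPool(1)
  offsetsMillis.zipWithIndex.foreach { case (offset, i) =>
    scheduler.schedule(new Runnable { def run(): Unit = fire(i) }, offset, TimeUnit.MILLISECONDS)
  }
  scheduler.shutdown() // already-scheduled tasks still run to completion
  scheduler.awaitTermination(offsetsMillis.max + 1000, TimeUnit.MILLISECONDS)
}
```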

Yes, you probably want to write a parser that returns both a feeder (for the requests) and a throttle profile.