How to test load of a page and not one call

Hi,

Just started to play with Gatling, great tool! I'm trying to write a
scenario to test our home page. When the call goes out to www.homepage.com,
about 27 HTTP requests get triggered to the CDN server,
image server, app server, etc. I can't figure out how this gets
handled in Gatling.

Thank You,
David

I'm a newbie myself, but does the recorder capture all those requests
for you? If so you could record a session and take a look at the
recorded result to see what happens

I used the recorder today. It recorded 66 calls instead of the 27 or 29
that I need, and it did not record some of the calls that I see in
Firebug or that get fired automatically by other tools, such as
WebLOAD.
I'm hoping there is a better solution to this issue than using the
recorder.

The recorder is just a proxy, so it is strange if it is simply not
recording requests. Maybe your browser was caching stuff from the CDN
and not actually sending those requests when you recorded?

Hi there,

Yes, the recorder is just a proxy, so it records the requests your browser sends.
Make sure you don’t have other panels sending ajax requests (such as your emails getting refreshed, etc).
You can also add filters (ant or regexp) to only capture the requests you’re interested in.
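
As a rough illustration of the regexp option, here is a sketch in plain Scala. It is not the recorder's actual filter configuration: the cdn.homepage.com host and every name in it are made up for the example, and it only shows the idea of keeping URLs that match a whitelist of patterns.

```scala
import scala.util.matching.Regex

object RecorderFilterSketch {
  // Hypothetical whitelist: keep only requests to the site under test.
  // www.homepage.com comes from the thread; cdn.homepage.com is assumed.
  val whitelist: List[Regex] = List(
    """https?://www\.homepage\.com(/.*)?""".r,
    """https?://cdn\.homepage\.com(/.*)?""".r
  )

  // A URL is kept when at least one pattern matches it entirely.
  def keep(url: String): Boolean =
    whitelist.exists(_.pattern.matcher(url).matches())

  // Drop everything the browser sent that is not on the whitelist.
  def filterRecorded(urls: List[String]): List[String] =
    urls.filter(keep)
}
```

With such a whitelist, a background request like a webmail refresh would be dropped while the home page and its CDN assets are kept.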

Cheers,

Steph

2012/5/4 Perryn Fowler <pezlists@gmail.com>

Thank You Steph,

I'm going to use regexes and try to record what I need. How
about .followRedirect?
Would that solve my issue?

Thanks,
David

Hello David,

Hard to tell, as I don’t know exactly what requests are missing and what requests should be here.
Could you please elaborate?

Regarding followRedirect, it means that the engine will automatically reissue requests when receiving a 30X response status code, until getting a non-30X one. So yes, you get fewer requests in your scenario when using followRedirect.
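
To make that loop concrete, here is a minimal sketch in plain Scala. It is not Gatling's implementation: the in-memory "server", the URLs and the maxHops guard are all assumptions made for the example.

```scala
object FollowRedirectSketch {
  // Hypothetical response: a status code and an optional Location header.
  final case class Response(status: Int, location: Option[String])

  // Hypothetical in-memory server, used instead of real HTTP calls.
  val fakeServer: Map[String, Response] = Map(
    "/home"       -> Response(301, Some("/home/")),
    "/home/"      -> Response(302, Some("/index.html")),
    "/index.html" -> Response(200, None)
  )

  // Reissue the request while the status is 30X and return every URL hit.
  // maxHops is an assumed guard against redirect loops.
  @annotation.tailrec
  def follow(url: String, maxHops: Int = 10, visited: List[String] = Nil): List[String] = {
    val hops = visited :+ url
    fakeServer.get(url) match {
      case Some(Response(status, Some(next))) if status >= 300 && status < 400 && maxHops > 0 =>
        follow(next, maxHops - 1, hops)
      case _ => hops
    }
  }
}
```

The scenario only declares the first request ("/home" here); the two extra hops are issued by the loop.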

Cheers,

Stephane

2012/5/4 David T <dtishkoff@gmail.com>

Hi Stephane,

Basically, my main concern is not that some calls were not recorded,
but whether there is a function/method I can use to pull all embedded
resources for the page.

Thank You for helping me with this.
David

The answer is no.

You have to define every request in your scenario, so the best way is to record them with the recorder.
The reason is that in order to do that, Gatling would have to become a real browser: not only parsing HTML, but also executing JavaScript that might trigger ajax calls. That would be too complex and too costly in terms of performance.

Steph

2012/5/5 David T <dtishkoff@gmail.com>

Hi Stephane,

What do you think about supporting extraction and retrieval of just
the embedded resources defined in the html response with no evaluation
of javascript?

Do you know if this approach is much more expensive than:
1. checking the response meets expectations via regex/xpath
2. continue on and issue the expected hard-coded requests

note: I work with David, and this topic came up in a discussion about
creating and maintaining gatling simulations efficiently. Clearly,
there is a runtime cost to fetching embedded resources, but I haven't
quantified it for myself yet. From our team's perspective, the
runtime cost is less important than the maintenance cost of keeping
simulations aligned with the real world, which we generally don't have
control over.

To that end, I was thinking of implementing the equivalent of JMeter's
'Retrieve all embedded resources' feature for http requests:
http://jmeter.apache.org/usermanual/component_reference.html#HTTP_Request

JMeter's implementation retrieves embedded:
* images
* applets
* stylesheets
* external scripts
* frames, iframes
* background images (body, table, TD, TR)
* background sound

If there is not a philosophical or architectural problem with adding
this feature, I would like to attempt it and see if and how it works
out. I understand that gatling is designed to be different from tools
like JMeter and WebLoad and so that if the feature cannot scale and
perform, then it won't make it into the official codebase.

The dsl would look something like:
http("store")
  .get("/store")
  .headers(headers_standard)
  .retrieveEmbeddedResources(List("*.domain.com","*.cdn.domain.com"))
  .check(status.in(List(200 to 210)))

retrieveEmbeddedResources retrieves all embedded resources by default,
but also takes an optional whitelist of regexes for filtering the
resource requests. retrieveEmbeddedResources might also need a
facility for specifying checks.
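
To give an idea of the extraction step, here is a rough sketch in plain Scala, with no JavaScript evaluation. It is not an existing Gatling API: all names are hypothetical, the regexes only cover src and link href attributes, and the whitelist here is a list of regexes applied to the host name.

```scala
import java.net.URI

object EmbeddedResourcesSketch {
  // Approximate patterns: src on img/script/frame/iframe, href on link.
  private val srcAttr =
    """(?i)<(?:img|script|iframe|frame)\b[^>]*?\bsrc\s*=\s*["']([^"']+)["']""".r
  private val linkHref =
    """(?i)<link\b[^>]*?\bhref\s*=\s*["']([^"']+)["']""".r

  // Collect the referenced URLs and resolve them against the page URL.
  def extract(pageUrl: String, html: String): List[String] = {
    val base = URI.create(pageUrl)
    val refs = srcAttr.findAllMatchIn(html).map(_.group(1)) ++
      linkHref.findAllMatchIn(html).map(_.group(1))
    refs.map(ref => base.resolve(ref).toString).toList.distinct
  }

  // An empty whitelist means "retrieve everything".
  def allowed(url: String, hostWhitelist: List[String]): Boolean = {
    val host = Option(URI.create(url).getHost).getOrElse("")
    hostWhitelist.isEmpty || hostWhitelist.exists(p => host.matches(p))
  }

  // Made-up page used only to exercise the sketch.
  val sampleHtml: String =
    """<html><head><link rel="stylesheet" href="/css/site.css">
      |<script src="http://cdn.domain.com/js/app.js"></script></head>
      |<body><img src="img/logo.png"><img src="http://ads.example.org/banner.gif">
      |</body></html>""".stripMargin
}
```

For example, EmbeddedResourcesSketch.extract("http://www.domain.com/store", EmbeddedResourcesSketch.sampleHtml) finds four resources, and filtering them with allowed(_, List(""".*\.domain\.com""")) drops the one served from ads.example.org.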

Regards,
Stephen

Hi Stephen,

First of all, thanks for the pull requests.

Retrieving static resources raises a lot of questions:

  1. Why fetch them? It’s quite common that those are served by a cache or a CDN, so performance might only be an issue from a network or browser perspective.
  2. Real world behavior depends on cache headers and browser cache content. How will your solution behave? Have a conservative approach where users would have an empty cache? Would it support caching and cache expiration?
  3. Gatling’s current workflow is a sequence. It means that we have yet to implement a workflow where resources get fetched with a scatter/gather strategy. See https://github.com/excilys/gatling/issues/431. I have a solution in mind that would spawn actors for each scatter/gather execution. Your solution should be based on this mechanism; otherwise, your static resources will be fetched one after another, and that’s not real-world behavior.
  4. Parsing HTML and CSS would certainly be expensive (many regexps). As it will be an approximation (not supporting resources fetched from JavaScript), why not sample only on the first user (meaning the first user would parse the response and change the actor’s behavior so that future users wouldn’t have to do the parsing)? Quite complex, but definitely the best way.
  5. What would the reports look like? Like with follow redirect support: “Request name, resource 1”, “Request name, resource 2”, etc…?
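
To illustrate the scatter/gather idea from point 3, here is a minimal sketch in plain Scala. It uses Futures on a fixed thread pool rather than the actor-based mechanism I have in mind, and the fetch function is a stand-in that performs no real HTTP call; all names are hypothetical.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object ScatterGatherSketch {
  // Stand-in for a real HTTP call: waits a little and returns a fake status.
  def fetch(url: String): Int = {
    Thread.sleep(10)
    200
  }

  // Scatter: start one Future per resource on a pool of maxConcurrent threads.
  // Gather: wait for all of them and collect url -> status.
  def fetchAll(urls: List[String], maxConcurrent: Int): Map[String, Int] = {
    val pool = Executors.newFixedThreadPool(maxConcurrent)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutor(pool)
    try {
      val scattered = urls.map(url => Future(url -> fetch(url)))
      Await.result(Future.sequence(scattered), 10.seconds).toMap
    } finally pool.shutdown()
  }
}
```

The pool size caps how many resources are in flight at once, which is the difference from fetching them one after another.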

Cheers,

Stephane

2012/5/6 Stephen Kuenzli <stephen.kuenzli@qualimente.com>

Stephane,

You’re welcome for the minor fixes. Thank you for creating such a nice tool. It is a credit to yourself and the team that gatling is maintainable, powerful, efficient, and scalable.

You raise some good questions, and some answers are not clear-cut:

  1. Why fetch embedded assets?
    a) Large assets can sneak into the source repo and content management systems and kill performance. Alternatively, the repo and all CMSes could guard against this problem on submit instead of writing tests to check for the problem.
    b) Some of the embedded assets are actually dynamically generated responses, e.g. a thumbnail rendered for a profile photo.
    c) Some CDNs are better than others, and it is useful to characterize their performance from time to time.

  2. How should caching be handled? I’d say an embedded request would be done as if the cache is empty with no specification of headers or caching behavior. If testing of cache-related headers is important for a particular simulation, it’s probably better to write the simulation that way.

  3. How should the new embedded requests fit into the workflow? I was thinking of appending to the workflow sequence after parsing the response. Scatter/gather is certainly desirable with some level of specifiable concurrency; 4-8 concurrent requests should simulate most modern browsers.

  4. What about optimizing the simulation by sampling the flow for one user and changing future actors’ behavior? That sounds great from a simulation-writer’s perspective, but it seems like it might be a bit much to ask gatling to do this.

  5. What would reports look like? I think the redirect model looks fairly good, but might suggest:
    “Request Name”
    “Request Name - Embedded 1 - <embedded resource uri 1>”
    “Request Name - Embedded 2 - <embedded resource uri 2>”
    “Request Name - Embedded 3 - <embedded resource uri 3>”

Regards,
Stephen

Hi,

For this case the server-side caching should be considered, so that we simulate the real traffic between the client and the server.

At least we need to be able to filter according to caching status.

First call : get all
Second call : get only non cached content

Without this, the HTTP behaviour will be too eager and will not reflect reality.

Still this is an interesting feature to start implementing.

@dbaeli (Salut, BTW) I agree, if we implement this feature, we have to implement a simple caching behavior. Just like in existing cookies implementation, let’s drop timed expiration.
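
A minimal sketch of such a simple cache, in plain Scala: one set of already-fetched URLs per virtual user, with no timed expiration. The names are hypothetical and this is not an existing Gatling component.

```scala
object SimpleCacheSketch {
  // Hypothetical per-user cache: the URLs this user has already fetched.
  // Entries never expire, matching the "drop timed expiration" idea.
  final case class UserCache(seen: Set[String] = Set.empty) {
    // Returns the resources that still need a request and the updated cache.
    def plan(resources: List[String]): (List[String], UserCache) = {
      val toFetch = resources.distinct.filterNot(seen.contains)
      (toFetch, copy(seen = seen ++ toFetch))
    }
  }
}
```

On the first call every resource is fetched; on the second call for the same page nothing is fetched, and only resources that were not seen before trigger new requests.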

Here are the points that still puzzle me:

  • I’m afraid having a limit on the number of concurrent requests used for fetching the page resources won’t be easily feasible (if feasible at all). I’d rather not have this feature at first.
  • I’d rather not have to implement HTML and CSS parsing myself… Any recommendations? neko?

Anyway, this is an interesting feature, but be aware it will take time to implement…

Cheers,

Steph

I don't really have a dog in this discussion since this isn't a
feature I would use but it does seem like there is a big potential
here of the team getting drowned by the complexities of implementing
this. What is the use case? Gatling is a load testing tool and in
general static content like CSS and images generate very little load
on origin servers. If the site has any traffic to speak of this
content should already be served by a CDN in which case adding it to a
load test is meaningless.

If, on the other hand, you are trying to somehow test the speed the
page actually loads then none of the features discussed in this thread
would help with that. Again assuming the page is complex at all there
is most likely AJAX content being loaded after the page is officially
loaded. So next people would be asking for Gatling to execute
javascript as well as parse HTML

So in short I think if load testing is the real goal then static
content from a web page is basically meaningless. If responsiveness
of the page is what you want to test then the team would have to
implement a complete browser within Gatling. I've never seen a tool
do that very well. Even ones that are specifically designed to. I
would rather have the Gatling team spend their time focusing on
improving the load testing features without bloating the tool with a
bunch of features that are outside the scope of load/stress testing.

But again that's mostly me being selfish because I wouldn't make use
of the features discussed in this thread.

Chris

Steph,

re concurrency limits:
It’s reasonable to skip limits in a first attempt; limits are not necessary for the feature to be useful.

re parsing:

nekohtml is probably a good bet, it seems to be maintained and is used by Selenium 2 for their headless webdriver.

JMeter uses/composes sourceforge’s htmlparser (http://htmlparser.sourceforge.net/) for its embedded resource extraction. The sourceforge htmlparser project shows a last update in 2006, so their HTML5 support is probably going to be lacking. The relevant JMeter class looks like HtmlParserHTMLParser: http://svn.apache.org/repos/asf/jmeter/trunk/src/protocol/http/org/apache/jmeter/protocol/http/parser/HtmlParserHTMLParser.java

Stephen

Sorry, I finally found the time to continue with this thread.

I agree with Chris that static resource handling seems to be important mostly for applications that don’t follow classic techniques such as cache headers, server caching, sprites or a CDN.
Tools such as YSlow and SpeedTracer can easily point out such problems, without running a stress test.

I don’t think Gatling will ever be (well, they say “never say never”…) a JavaScript engine, so even if we’re able to parse HTML and CSS (still not quite sure how; handling HTML4, XHTML, HTML5, CSS2 and CSS3 might prove funky), this will be just an approximation, and I’m afraid such a feature will generate numerous issues (“why isn’t my resource fetched?”).

If someone can contribute the parsing engine with sufficient tests, we’ll be able to build the actor stuff on top of it.
Otherwise, I’m afraid our hands are full with other features to implement, such as clustering, server monitoring and database persistence.

Cheers,

Steph

2012/5/10 Stephen Kuenzli <stephen.kuenzli@qualimente.com>