[GATLING 2] SSL handshake_failures

Stefan_Magnus_Landro · June 10, 2013, 10:43am

After upgrading to gatling 2 (the snapshot kindly created by Stephane - 2.0.0.20130605) we’re seeing quite a few SSL handshake failures.
We used to get som in gatling 1.5.1 too, but we get far more now, so it is probably a client issue.

Stefan

12:38:19.294 [WARN ] i.g.h.a.AsyncHandler - Request ‘Service call’ failed
javax.net.ssl.SSLException: Received fatal alert: handshake_failure
at sun.security.ssl.Alerts.getSSLException(Alerts.java:208) ~[na:1.7.0_04]
at sun.security.ssl.SSLEngineImpl.fatal(SSLEngineImpl.java:1639) ~[na:1.7.0_04]
at sun.security.ssl.SSLEngineImpl.fatal(SSLEngineImpl.java:1607) ~[na:1.7.0_04]
at sun.security.ssl.SSLEngineImpl.recvAlert(SSLEngineImpl.java:1776) ~[na:1.7.0_04]
at sun.security.ssl.SSLEngineImpl.readRecord(SSLEngineImpl.java:1080) ~[na:1.7.0_04]
at sun.security.ssl.SSLEngineImpl.readNetRecord(SSLEngineImpl.java:884) ~[na:1.7.0_04]
at sun.security.ssl.SSLEngineImpl.unwrap(SSLEngineImpl.java:758) ~[na:1.7.0_04]
at javax.net.ssl.SSLEngine.unwrap(SSLEngine.java:624) ~[na:1.7.0_04]
at org.jboss.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1225) [netty-3.6.6.Final.jar:na]
Wrapped by: java.net.ConnectException: Received fatal alert: handshake_failure to https://whereever.no/service
at com.ning.http.client.providers.netty.NettyConnectListener.operationComplete(NettyConnectListener.java:103) ~[async-http-client-1.7.18.20130605.jar:na]
at org.jboss.netty.channel.DefaultChannelFuture.notifyListener(DefaultChannelFuture.java:427) [netty-3.6.6.Final.jar:na]
at org.jboss.netty.channel.DefaultChannelFuture.notifyListeners(DefaultChannelFuture.java:413) [netty-3.6.6.Final.jar:na]
at org.jboss.netty.channel.DefaultChannelFuture.setFailure(DefaultChannelFuture.java:380) [netty-3.6.6.Final.jar:na]
at org.jboss.netty.handler.ssl.SslHandler.setHandshakeFailure(SslHandler.java:1417) [netty-3.6.6.Final.jar:na]
at org.jboss.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1293) [netty-3.6.6.Final.jar:na]
at org.jboss.netty.handler.ssl.SslHandler.decode(SslHandler.java:913) [netty-3.6.6.Final.jar:na]
at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:425) [netty-3.6.6.Final.jar:na]
at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303) [netty-3.6.6.Final.jar:na]
at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) [netty-3.6.6.Final.jar:na]
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) [netty-3.6.6.Final.jar:na]
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559) [netty-3.6.6.Final.jar:na]
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268) [netty-3.6.6.Final.jar:na]
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255) [netty-3.6.6.Final.jar:na]
at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88) [netty-3.6.6.Final.jar:na]
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:109) [netty-3.6.6.Final.jar:na]
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312) [netty-3.6.6.Final.jar:na]
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90) [netty-3.6.6.Final.jar:na]
at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178) [netty-3.6.6.Final.jar:na]
at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) [netty-3.6.6.Final.jar:na]
at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42) [netty-3.6.6.Final.jar:na]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) [na:1.7.0_04]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) [na:1.7.0_04]
at java.lang.Thread.run(Thread.java:722) [na:1.7.0_04]

Excilys · June 10, 2013, 12:35pm

As you see with this stacktrace, this is pure regular Netty code. Could you provide some more information about your env so that I can open an issue there, please?
JDK, OS, application under stress and everything you think meaningful.

Excilys · June 10, 2013, 12:43pm

Have you tried upgrading your JDK?

Excilys · June 10, 2013, 12:50pm

Did you pass a custom trust or key manager?

Stefan_Magnus_Landro · June 11, 2013, 7:50am

Hi,

The certificate that is used by the server is issued using a certificate java doesn’t trust by default. We’ve not set up a custom trust store, however I don’t see any warnings - which is kinda weird, right?

We’ve seen this issue both on windows and linux and different versions of java (1.7.0_19 linux and 1.7.05-b06 windows)

Any ideas?

Excilys · June 11, 2013, 9:15am

I think I know why you see more of those: we have disabled AHC’s retry feature by default (see gatling.conf file: https://github.com/excilys/gatling/commit/ce9e33e5f32c9bf3e3c1e5769c310aba94be4a9b).

So I guess the problem was always here but was hidden.

Stefan_Magnus_Landro · June 11, 2013, 10:30am

Yep. That helped. Probably a threading issue.

Excilys · June 11, 2013, 10:49am

So, no longer handshake failures, nor 505, nor slowdowns?

Excilys · June 11, 2013, 11:27am

BTW, do your 505 problems also happen on HTTP?
I realized that zero-copy is disabled over HTTPS.

Stefan_Magnus_Landro · June 11, 2013, 3:09pm

I’ve only seen it happen using https

When I ran through Charles I used http so that might have been the reason why it worked

Stefan

Stefan_Magnus_Landro · June 11, 2013, 3:10pm

No more ssl handshake issues. The 505 thing is a different issue.

Excilys · June 11, 2013, 3:45pm

I’ll have to clean up this part of AHC. Stay tuned.
And thanks for your patience…

Stefan_Magnus_Landro · June 26, 2013, 1:08pm

I’ve been investigating the ssl issue a bit

I’ve looked at the ahc and netty code, but it’s hard to grasp what’s going on.

Apparently one can get ssl debug info using -Djavax.net.debug=ssl,handshake on both client and server.
I’ll give it a shot within the next few days.

Stefan

Excilys · June 27, 2013, 1:33pm

Hi Stefan,

Could you sum up the situation, please?

IIRC, you had 2 different kinds of problems:

requests being “mixed”, bodies being send from a given user actually being sent from another one
SSL handshake issues
I had you try different strategies: passing a file, and passing a byte array read from the file.

What is the outcome of the different combinations?
Are requests being also mixed over HTTP?

Cheers,

Stéphane

Stefan_Magnus_Landro · June 27, 2013, 1:57pm

See inline comments

Hi Stefan,

Could you sum up the situation, please?

IIRC, you had 2 different kinds of problems:

Yes

requests being “mixed”, bodies being send from a given user actually being sent from another one

No exactly right. It seems like the request is not flushed correctly after passing a raw data to http.post. When the connection is reused by the same user in the next request, we get a 505. However, if we introduce a pause of 1 sec between these two requests, we never get any 505s.

SSL handshake issues

We’re seeing quite a few of these, but after we increased the retry count, the situation has improved a lot

I had you try different strategies: passing a file, and passing a byte array read from the file.

What is the outcome of the different combinations?

File->505
Byte array->ok

We ended up introducing the short pause to keep the code clean

Are requests being also mixed over HTTP?

Yes indeed

Cheers,

Stéphane

Sorry for being unclear before. Hth.

Stefan

Stefan_Magnus_Landro · June 27, 2013, 2:00pm

Btw, requests are not being mixed when I pass them through Charles

Excilys · June 27, 2013, 2:05pm

AHC body parts code is eminently complex: strategies are different according to the part type (file or string or bytes) and the protocol (in HTTP, it tries to perform zero-copy and write directly into the socket from the file Channel, but it doesn’t over over HTTP).
I’ll try to clean up all this and come up with a prototype.

Thank you for summing up, I’m doing too many things and started losing track.

Stay tuned,

Stéphane

Stefan_Magnus_Landro · June 28, 2013, 10:07am

hhm, I tried reproducing the ssl handshake issue using a way simpler simulation without multipart requests - but couldn’t. Maybe the 505 issue is related after all? I’ll keep you posted.

Excilys · June 28, 2013, 12:16pm

I just produced a new timestamp: 2.0.0.20130628

You should no longer get the NPE.

Stefan_Magnus_Landro · June 28, 2013, 12:30pm

Perfect. Works like expected. Thanks.

Topic		Replies	Views
Help with Handshake failure!!!! Gatling (Open-Source)	0	126	October 18, 2017
Received fatal alert: handshake_failure using recorder Gatling (Open-Source)	1	148	December 19, 2016
Stress test on https site fails Gatling (Open-Source)	12	162	June 21, 2017
Gatling recorder encounters handshake error, but browser does not Gatling (Open-Source)	3	117	November 15, 2016
Handshake failure while using the Recorder Gatling (Open-Source)	11	182	April 26, 2016

[GATLING 2] SSL handshake_failures

Related topics