Thursday, November 22, 2012

Debugging a Communications Link failure issue for a long MySQL procedure

A long-running procedure executed from JBoss against a MySQL instance fails every 2 hours in the production environment. The reason is explained below.

Three TCP-related parameters in the Linux OS played a role in this.

cat /proc/sys/net/ipv4/tcp_keepalive_time = 7200 seconds
cat /proc/sys/net/ipv4/tcp_keepalive_intvl = 75 seconds
cat /proc/sys/net/ipv4/tcp_keepalive_probes = 9

Description - The first two parameters are expressed in seconds, and the last is a plain count. This means that the keepalive routine waits for two hours (7200 seconds) of idle time on a connection before sending the first keepalive probe, and then resends it every 75 seconds. If no ACK is received for nine consecutive probes, the connection is marked as broken.
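To make the arithmetic concrete, here is a minimal Python sketch that reads the three values and works out how long an idle connection with an unresponsive peer survives before it is marked as broken. The function names and the fallback dictionary are my own for illustration, and the fallback simply assumes the stock values quoted above when the /proc files are not readable.

```python
from pathlib import Path

# Values quoted above; used as a fallback when /proc is not available.
LINUX_DEFAULTS = {
    "tcp_keepalive_time": 7200,   # idle seconds before the first probe
    "tcp_keepalive_intvl": 75,    # seconds between unanswered probes
    "tcp_keepalive_probes": 9,    # unanswered probes before giving up
}


def read_keepalive_settings(base="/proc/sys/net/ipv4"):
    """Read the three keepalive values, falling back to the quoted defaults."""
    settings = {}
    for name, default in LINUX_DEFAULTS.items():
        try:
            settings[name] = int((Path(base) / name).read_text().strip())
        except (OSError, ValueError):
            # File missing or unreadable (for example, not on Linux).
            settings[name] = default
    return settings


def seconds_until_marked_broken(time_s, intvl_s, probes):
    """Idle wait plus one interval per unanswered probe."""
    return time_s + intvl_s * probes


if __name__ == "__main__":
    s = read_keepalive_settings()
    total = seconds_until_marked_broken(
        s["tcp_keepalive_time"],
        s["tcp_keepalive_intvl"],
        s["tcp_keepalive_probes"],
    )
    print(f"Dead peer detected after about {total} seconds of idle time")
```

With the values listed above this works out to 7200 + 75 * 9 = 7875 seconds, i.e. a little over 2 hours and 11 minutes.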


Conclusion - We need to know the default settings in our environment, and of course the procedure itself requires tuning :)
