Wednesday, June 13, 2012

Separation of concerns

I came across a scenario where the login functionality lived on the home page itself. It worked fine for 3+ years in production until we had an outage, caused by issues in the code, that also took the home page down.

Now the login page will be separated from the home page, and it will encourage users to bookmark it. This should also bring the bandwidth requirement down, since regular users would stop going through the home page over time.

Friday, June 8, 2012

Issue with generating random numbers on Linux using SecureRandom

I came across a scenario where, under load, several application servers built up threads and finally crashed. Thread dumps revealed that many threads were stuck on the following stack trace:

"ajp-10.10.48.19-8009-238" daemon prio=10 tid=0x00000000537f7000 nid=0x51c waiting for monitor entry [0x000000004547e000..0x000000004547fc00]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:308)
    - waiting to lock <0x00002aaaf2868f98> (a java.io.BufferedInputStream)
    at sun.security.provider.SeedGenerator$URLSeedGenerator.getSeedByte(SeedGenerator.java:453)
    at sun.security.provider.SeedGenerator.getSeedBytes(SeedGenerator.java:123)
    at sun.security.provider.SeedGenerator.generateSeed(SeedGenerator.java:118)
    at sun.security.provider.SecureRandom.engineGenerateSeed(SecureRandom.java:114)
    at java.security.SecureRandom.generateSeed(SecureRandom.java:495)

We had symptoms similar to NIA. Under load, the API that generates secure random numbers takes a long time. On Linux the seed comes from the kernel's entropy pool, which is fed by events such as keyboard and mouse activity. These are missing on servers, so the API has to wait for enough entropy to produce a distinct random number. The API completes quickly when there is no load. But when many threads request a random number at once, the API slows down and, as the trace shows, the callers queue up on the lock of a single shared input stream.
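One way to observe this behaviour is to time the same call from several threads at once. The following is only a minimal sketch: the class name SeedContentionDemo, the method timeSeedCalls, and the thread and byte counts are made up for illustration and are not part of our application. It calls SecureRandom.generateSeed, the entry point seen in the stack trace, from a small thread pool and reports how long each call took. The numbers depend heavily on the kernel, the JDK version, and the configured seed source.

```java
import java.security.SecureRandom;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical harness, not taken from the application described in this post.
class SeedContentionDemo {

    // Runs `threads` workers that each request `seedBytes` of seed material
    // and returns the wall-clock time of each call in milliseconds.
    static List<Long> timeSeedCalls(int threads, final int seedBytes) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> futures = new ArrayList<Future<Long>>();
            for (int i = 0; i < threads; i++) {
                futures.add(pool.submit(new Callable<Long>() {
                    public Long call() {
                        long start = System.nanoTime();
                        // Same entry point as in the thread dump.
                        new SecureRandom().generateSeed(seedBytes);
                        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
                    }
                }));
            }
            List<Long> millis = new ArrayList<Long>();
            for (Future<Long> f : futures) {
                millis.add(f.get()); // waits for that worker to finish
            }
            return millis;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // 8 threads and 8 bytes are arbitrary values chosen for the sketch.
        System.out.println("per-call ms: " + timeSeedCalls(8, 8));
    }
}
```

Running it once with the default settings and once with the alternative seed source described in this post gives a rough before/after comparison on a given server.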

To stop the API from waiting for a perfect random number and make it complete quickly, we set the following property in the Java options:

java.security.egd=file:/dev/./urandom

The 90th percentile transaction response time went down from 15.1 sec to 1.6 sec by just setting this property for a load of 50 concurrent users.
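Since the setting is easy to lose in a long start-up script, it can help to confirm from inside the JVM which seed source was actually requested. The sketch below assumes the usual lookup order of the Sun provider: the java.security.egd system property first, then the securerandom.source entry of the java.security file. The class and method names are hypothetical.

```java
import java.security.Security;

// Hypothetical helper for checking the seed source setting at runtime.
class EntropySourceCheck {

    // Returns the seed source the JVM was asked to use: the java.security.egd
    // system property if it is set and non-empty, otherwise the
    // securerandom.source security property.
    static String configuredSeedSource() {
        String fromSystemProperty = System.getProperty("java.security.egd");
        if (fromSystemProperty != null && !fromSystemProperty.isEmpty()) {
            return fromSystemProperty;
        }
        String fromSecurityFile = Security.getProperty("securerandom.source");
        return fromSecurityFile != null ? fromSecurityFile : "(not configured)";
    }

    public static void main(String[] args) {
        System.out.println("seed source: " + configuredSeedSource());
    }
}
```

Started with -Djava.security.egd=file:/dev/./urandom, the program should echo that value back. Without the option it prints whatever the JDK's java.security file specifies.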

Here is the complete set of Java parameters:

-Djboss.jvmroute.name=f2m56g2 -Dprogram.name=run.sh -server -javaagent:/AppData/AppDynamics/AppServerAgent/javaagent.jar=uniqueID=f2m56 -Xms4096m -Xmx4096m -XX:NewRatio=3 -Xss256k -XX:+UseParallelGC -XX:ParallelGCThreads=4 -XX:+UseAdaptiveSizePolicy -Dsun.rmi.dgc.client.gcInterval=3600000 -Djboss.partition.name=f2m56 -Dsun.rmi.dgc.server.gcInterval=3600000 -XX:+DisableExplicitGC -XX:PermSize=384m -XX:MaxPermSize=384m -XX:+UseTLAB -Xloggc:ace2_gc.log -Dorg.jboss.logging.Log4jService.catchSystemOut=false -Djava.util.Arrays.useLegacyMergeSort=true -Djava.security.egd=file:/dev/./urandom -Djava.net.preferIPv4Stack=true -Djava.library.path=/jboss-4.2.3.GA/bin/native -Djava.endorsed.dirs=/jboss-4.2.3.GA/lib/endorsed -classpath /jboss-4.2.3.GA/bin/run.jar:/usr/java/jdk1.7.0_55/lib/tools.jar org.jboss.Main -Dorg.jboss.logging.Log4jService.catchSystemOut=false -Dcfg.system.property=/project/GEMSSmbImpl/config/env/common,/project/GEMSSmbImpl/config/env/GEMSCommonEnv.xml,/project/GEMSSmbImpl/config/env/GEMSOnlineEnv.xml,/project/GEMSSmbImpl/config/env/GEMSOnlineCfg.xml,/project/GEMSSmbImpl/config/Gems.properties


This is our environment:
OS version: Red Hat Enterprise Linux Server release 6.4 (Santiago)
JBoss version: jboss-4.2.3.GA
JDK version: jdk1.7.0_55

At a Linux prompt you can list the device to confirm it exists, for example with ls -l /dev/urandom. If your Linux version is different, you may have to find the equivalent urandom path.
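The same check can be done from Java, which also confirms that the user running the application server can read the device. This is just a sketch: the class name UrandomProbe and the 16-byte read size are arbitrary choices for illustration.

```java
import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

// Hypothetical probe for checking that a random device exists and is readable.
class UrandomProbe {

    // Reads exactly `count` bytes from the given device path.
    static byte[] readBytes(String devicePath, int count) throws IOException {
        byte[] buffer = new byte[count];
        DataInputStream in = new DataInputStream(new FileInputStream(devicePath));
        try {
            in.readFully(buffer); // returns only once the buffer is full
        } finally {
            in.close();
        }
        return buffer;
    }

    public static void main(String[] args) throws IOException {
        // Pass another path as the first argument if your system differs.
        String path = args.length > 0 ? args[0] : "/dev/urandom";
        if (!new File(path).exists()) {
            System.out.println(path + " not found; look for an equivalent device");
            return;
        }
        System.out.println("read " + readBytes(path, 16).length + " bytes from " + path);
    }
}
```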

NOTE
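One funny thing. We had to use file:/dev/./urandom. It would not work if we just set the property to file:/dev/urandom. This matches a known quirk of the JDK on Linux, where that exact value is treated specially and seeding still ends up reading from /dev/random.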


Java was waiting on input to the secure random number generator code. On Linux, /dev/random is a “blocking” number generator, meaning that if it doesn’t have enough random data to provide, it simply waits until it does. More background on /dev/random is available here: http://en.wikipedia.org/wiki//dev/random. Keyboard and mouse input, as well as disk activity, can generate the randomness (entropy) needed, but perhaps not fast enough for particular applications. A lack of random data forces the JVM to wait, indefinitely if necessary.

Reference links -
http://tech.top21.de/techblog/20100317-why-java-s-random-number-generation-might-block-my-application-under-linux.html;jsessionid=B76AA3FB494E7E50B732CEB498B61668

http://www.ghidinelli.com/2011/01/11/cold-fusion-slow-start-messagebrokerservlet