Sunday, September 18, 2011

Log slow queries, prioritize & tune them

In my current assignment we collect queries that take more than a second to execute in our production environment We then parse these logs using a perl script to populate unique queries a database table. A charting tool is used to generate top 10 slow schema's and their top ten slow queries to keep our focus.

As a next step we tune the queries on following priority basis:

1. repeatedly run slow
2. examine lot of records
3. fetch lot of records
4. take more time to execute
5. take up locks for long time

We spend some time every day on the reports and many a times we learn new patterns which if attached would help to get rid of many slow queries.

For e.g. we had a release of a common component that several applications use. We found that the slow query count increased one fine day. The common component was actually firing these queries impacting performance of the applications using them. We of course tuned all of them and witnessed the sky scrappers become huts in 2 days :)

String concatenation

There are various ways of implementing String concatenation in Java. Here are three means which one would often hear and ponder on their differences with respect to performance implications of using them

String – Concatenation of two string objects creates a new object internally. Use String objects in places where concatenation requirement is very less. (String object is immutable, i.e. value stored in the String object cannot be changed)

StringBuffer – Is synchronized, which means it is thread safe and hence we can use it to implement threads for your methods

StringBuilder – Added in Java 5. Identical in all respects to StringBuffer except that is NOT synchronized, which means that if multiple threads are accessing it at the same time, there could be trouble (Not thread safe). For single-threaded programs, the most common case, avoiding the overhead of synchronization makes the StringBuilder very slightly faster.

Note: StringBuffer/StringBuilder is mutable.

Review default configuration settings

I learnt that its important to review the default server configuration settings. Some of the examples are:

1. Server log Level, Disable console logging, etc
2. Clients/Threads in Web, Application and Database tiers
3. JVM settings in an JEE Application server
4. Max File Descriptors - OS like Linux.

How do I decide which HTML object I need?

After reviewing many web pages I have come with some guidelines to select an HTML object and its impact on performance.

1. Criteria - If records are less than 5
Object type to use - Radio button/Check box
Example - Male/ Female

2. Criteria - Records rarely increase & is easy to scroll & choose
Object type to use - Drop down
Example - City, State, Country

3. Criteria - Records increase but still in range & limited filters to search
Object type to use - AJAX look up (AJAX call can be triggered after certain no. of characters are typed)
Example - Clinic names, Site names.

4. Criteria -Records are in thousands and can increase also, many filters to search
Object type to use - Use Lookups (popup screen)
Example - Search an item by item id, item name, store name, item type, etc

Web Reports - Some points to check for.

Some of the things one can check in web reports -

1. Mandate filters - To narrow the number of records fetched

2. Put appropriate Constraints – Date field can always be constrained although its period and any constraints on any other field would required help from a functional domain expert

3. A constraint would typically require Functional knowledge

4. Set default for certain filters e.g. Status filter defaulted to Open

5. Paginate depending on the context - need not be same for all screens

6. To reduce navigation show the latest/relevant records first

7. Leverage various Pagination design patterns

Points to ponder before writing a Stored Procedure

I have been in several debates and have read several articles on whether to use Stored procedure. I came up with set of points one should ponder before writing a Stored Procedure.

Basically a stored procedure is -
- Set of SQL statements stored on the server and are pre-compiled.
- Takes in certain arguments and processes that code with those arguments at execution time

Points to ponder before we use Stored Procedure

1. Ease to scale database compared to application tier?
2. Ease to write and maintain?
3. Ease to debug
4. Need for specialized skill set
5. Performance gain
6. Reduction in traffic between application and database tier
7. Reusability & transparency
8. Security

Developer - Query works fast in my environment but is slow in production?

Database perspective matters than the developers. :)

Generate a Query execution plan to understand the execution plan chosen by the database optimizer.

An execution plan (Explain plan) helps to understand:

1. Number of rows examined
2. Use of indexes
3. Missing indexes
4. Type of join
5. Information on sorting, etc

Avoid Full Table Scans

Full table scans are where the database will read the entire table without an index.
Some of the reasons why full table scans are performed:

1. no WHERE clause
2. no index on any field in WHERE clause
3. poor selectivity on an indexed field
4. too many records meet WHERE conditions
5. using SELECT * FROM
6. function used on the indexed column in the query

Tip - Generate Explain plans to understand how database optimizer treats the query

Software Library Management - Do’s & Don’ts

In software projects we make use of various types of software libraries. These libraries can be third party, application server related or project specific. Governance of Library management in such projects is very important. I have compiled a small list of do's and don'ts of software library management.

1. Avoid duplicates jar files
2. Avoid multiple version of same jar files
3. Remove unwanted jars
4. Use latest stable versions of library files
5. Use Server default Library NOT Application library for common shared libraries
6. Form a Change management to govern the library management and its usage.

In one of the projects am going through 250 + jar files to know which files to keep, update or delete. It's difficult as I do not know which jar files are used by which deployed application. After some googl-ing I came across a very handy utility named Jaranalyzer which helps to understand jar dependency. The tool needs to be fed with the folder that contains all the jar files and the output is either an XML or a dot file. The dot file can be passed through an open source graph visualization software to get a graphic representation of the dependency.

URLs to refer -
http://code.google.com/p/jaranalyzer/
http://www.kirkk.com/main/Main/JarAnalyzer#xml
http://www.graphviz.org/