Tuesday, May 3, 2011

Compressing Traces - increasing the time frame captured by trace files

When I enabled traces for a Java server application in a production environment, the traces captured only the last few minutes, and this is not enough to diagnose some types of bugs.

In order to extend the time covered by the traces, I made several changes:
  1. reduce the number of messages being traced, i.e., "less is more": I lowered the log level of many trace messages to the finest level, keeping only the most important messages at the info level. The finest messages can still be used during the development stage but are filtered out in production environments (a short sketch follows this list).
  2. compress the messages before writing them to the log file.
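For illustration, here is a minimal sketch of what change 1 looks like with java.util.logging; the logger name and the messages are made up:

  import java.util.logging.Level;
  import java.util.logging.Logger;

  // Minimal sketch: verbose messages are demoted to FINEST and only the key
  // messages stay at INFO, so production can run with the level set to INFO.
  public class TraceLevels {
      private static final Logger log = Logger.getLogger("com.example.server"); // hypothetical name

      public static void main(String[] args) {
          log.setLevel(Level.INFO);                                // production level
          log.finest("entering handleRequest, queue size=" + 17);  // development-only detail, dropped in production
          log.info("request processed");                           // important message, still traced
      }
  }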

Here are the different compression algorithms I tested:
  1. The Dictionary: when tracing a line, we first split it into substrings and then look each one up in a hash table. If a substring exists, we get back the integer associated with it; otherwise we insert it as a key mapped to a new unique integer. The hash table is also written to a file, which acts as a dictionary, so the trace file contains only integers.
  2. Zip: when tracing a string, we buffer it in memory, and after we have 100 messages we zip them into a byte array. Then we write the array size followed by its content.
  3. The Dictionary & Zip: when tracing a string, we use method 1 to replace the string with integers, then buffer the integers in memory. After we have 1000 integers we zip them into a byte array. Then we write the array size followed by its content (see the sketch after this list).
I ran the same simulation against the server application in order to create identical logs and calculated the compression ratio of each of the methods above.

Here are the results I got:

                        No compression   Method 1   Method 2   Method 3
  Trace file size (KB)  343              65         29         10
  Compression ratio     1                5.27       11.82      34.3


The X-axis is the number of the compression method as described above.
The Y-axis is calculated as (trace file size with no compression) / (trace file size with compression method x).
I plan to post soon:
  1. measurements of the performance degradation of method 3
  2. measurements of how the time coverage improved by reducing the amount of trace messages.
To be continued...

Sunday, July 25, 2010

When sendsignal and Ctrl+Break fail to generate a Java thread dump in Windows

I handled a customer incident where it looked like the application was hung. As our application runs as an NT service, I suggested using sendsignal in order to generate a thread dump file. This worked fine in my local environment, but in the customer's environment the file was not generated. I then sent a batch file that launches the application as a console application instead of an NT service and suggested pressing Ctrl+Break in the application window in order to generate a thread dump file.
This also failed.
The customer was working through a Remote Desktop application. I suspected that the failure was related to the fact that Ctrl+Break was pressed from a remote session, so I suggested doing it on the server's keyboard itself, but this also failed.

Next, I thought of generating the thread dump from inside the application. I googled this but did not find a way. I consulted a colleague of mine, and he told me about com.ibm.jvm.Dump.JavaDump(); see the "IBM JVM 6.0 Diagnostics Guide".
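For reference, here is a minimal sketch of such an in-process trigger, assuming an IBM JVM where com.ibm.jvm.Dump is available; the class name and the trigger condition are hypothetical:

  // Minimal sketch, assuming an IBM JVM (com.ibm.jvm.Dump is IBM-specific).
  // The class name and the condition that triggers the dump are hypothetical.
  public class JavaDumpTrigger {
      public static void dumpIfHung(boolean looksHung) {
          if (looksHung) {
              com.ibm.jvm.Dump.JavaDump(); // writes a javacore file containing the thread stacks
          }
      }
  }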
Before implementing this in our application, I went to consult my team lead about the idea. She thought there must be a more standard way to generate a thread dump, and she did not want to add new functionality (dump generation) that was not necessary for our application's activity. She asked me to try to find out whether such a standard way exists.
She was right: there is such a way, and it worked for the customer.
I found it in the diagnostics guide mentioned above; it involves three steps:
  1. generating a system dump
  2. extracting the JVM info from the system dump
  3. viewing the threads' stacks and monitors
How to generate a system dump in Windows
  1. install "userdump" from the Microsoft site: http://www.microsoft.com/downloads/en/confirmation.aspx?familyId=e089ca41-6a87-40c8-bf69-28ac08570b7e&displayLang=en
  2. run "userdump -p". It will list all running processes with their process IDs in the right column
  3. find the application's process ID
  4. run "userdump pid". This should generate a "process_name.dmp" file in the "C:\kktools\userdump8.1\x86" folder (in case you are using an x86 platform)

How to extract the JVM info from the system dump

  1. cd to C:\kktools\userdump8.1\x86
  2. run "jextract process_name .dmp". this should generate "process_name.dmp.zip" and "process_name.dmp.xml"

How to view the threads' stacks and monitors

  1. copy the "process_name.dmp.zip" to a local folder
  2. open CMD.exe
  3. cd to the local folder
  4. run "jdmpview -zip process_name.dmp.zip"
  5. execute "set logging file file_name"
  6. execute "set logging on"
  7. execute "info thread *"
  8. open the log file in an editor and investigate



By the way, jdmpview has rich functionality; you can read about it in the diagnostics guide.

Sunday, May 30, 2010

How to create Windows dump files with unique names

We wanted to configure our server in a Windows environment so that when one of its processes crashes, it will produce a dump file.

Our first try was Dr. Watson, but we quickly found out that it creates all dump files with the same name, i.e., in case there are several crashes, only the last one will have a dump file. This was unacceptable.

After googling this for a while, I found this useful article that talks about NTSD (NT Symbolic Debugger).
This debugger has several advantages:
  1. it creates a unique dump file for each process that crashes
  2. it does not need a special installation, as it is included in the Windows installation (in C:\WINDOWS\system32)
Note: It might be understood from the article above that the NTSD version that comes with Windows Server 2003 does not support unique dump file names, but it does.

Here is how to set up NTSD:
  1. make sure your C code does not catch exceptions (so the exception is handled by Windows)
  2. open "regedit"
  3. go to "HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\AeDebug"
  4. choose "Debugger"
  5. set it to: "ntsd.exe" -p %ld -e %ld -g -c ".dump /ma /u c:\app.dmp;q"
This will create dump files with names like "app_2010-05-30_08-07-25-406_145C.dmp" in the "c:\" folder.

You can test the configuration by running a CrashMe application.

Friday, May 21, 2010

Field experience: Java Util Logger performance tuning

I needed to run one of our Java servers in a production environment with full traces enabled.
The server would crash under load due to an out-of-memory error. I looked into the heap dump that was produced, using Eclipse Memory Analyzer, and found that the memory outage was due to an overflow in the incoming message queue. I concluded that running with traces hurt the throughput and caused the outage.

First, I wanted to measure the throughput without traces and found that it takes 0.18 ms to process a message. Then I ran with full traces and measured that it took 20 ms to process a message, more than 100 times longer.

My next step was to review the code related to the logger initialization and the formatter. I found that the logger was initialized with a ConsoleHandler (as well as a FileHandler) by default, while our server runs as an NT service and has no console. This code was inserted during the development phase but was not needed in production. After removing this code, I measured 9.32 ms per message, i.e., now it takes only 50 times longer.
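For illustration, here is a minimal sketch of the fix, assuming the handlers were attached to the root logger; this is not the exact production code:

  import java.util.logging.ConsoleHandler;
  import java.util.logging.Handler;
  import java.util.logging.Logger;

  // Minimal sketch: drop any ConsoleHandler so a headless NT service does not
  // pay for console output it can never show.
  public class RemoveConsoleHandler {
      public static void apply() {
          Logger root = Logger.getLogger("");
          for (Handler handler : root.getHandlers()) {
              if (handler instanceof ConsoleHandler) {
                  root.removeHandler(handler);
              }
          }
      }
  }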

Next, I changed the way we format a trace message. In the formatter, we had code that calculated the method name from the call stack. Similarly, we had code that calculated the class name from the logger record. Both were designed to remove from the programmer the burden of writing the class and method names each time he writes a trace message, and to let this information be appended automatically. After removing this code, I measured 1.94 ms per message, i.e., now it takes only 10 times longer. This code change means that the programmers will have to fix all existing messages to include the class and method names, and all future traces should include this information as well. This is a heavy task, as hundreds of trace lines have to be fixed manually, but the impact of this change is a roughly five-fold improvement. If this change enables running with traces in production, then I think it is worth the work.
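For illustration, here is a sketch of the kind of formatter that was removed; it is not our exact formatter, but it shows why inferring the caller is expensive: java.util.logging.LogRecord infers the source class and method by inspecting the call stack when they were not set explicitly by the caller.

  import java.util.logging.Formatter;
  import java.util.logging.LogRecord;

  // Sketch of the expensive pattern: getSourceClassName()/getSourceMethodName()
  // can trigger a stack inspection for every trace line when the caller did not
  // supply them, which is what made formatting so costly.
  public class CallerInferringFormatter extends Formatter {
      @Override
      public String format(LogRecord record) {
          return record.getMillis() + " "
                  + record.getSourceClassName() + "."
                  + record.getSourceMethodName() + ": "
                  + formatMessage(record) + System.getProperty("line.separator");
      }
  }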

Next, I replaced the date calculation that is appended to each trace line with System.currentTimeMillis(), but this change had only a minor impact: 1.69 ms per message.
I had to decide whether the burden of running a special tool to convert the millisecond value to a date and time of day each time a trace is investigated is worth the minor throughput improvement, and I decided not to go with this change.
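For illustration, a small sketch of the trade-off; the format pattern and the message are made up:

  import java.text.SimpleDateFormat;
  import java.util.Date;

  // Sketch of the trade-off: the raw millisecond value is cheap to produce but
  // needs a conversion step before a human can read it, while the formatted
  // date is readable but more expensive to build for every trace line.
  public class TimestampExample {
      public static void main(String[] args) {
          long now = System.currentTimeMillis();
          String rawLine = now + " message";                        // e.g. "1275202045406 message"
          String formattedLine = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS")
                  .format(new Date(now)) + " message";              // e.g. "2010-05-30 08:07:25.406 message"
          System.out.println(rawLine);
          System.out.println(formattedLine);
      }
  }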

Here is a graph that summarizes the results: the blue line depicts the improvement in the average message processing time due to each code change, and the pink line is the average time to process a message without traces.
The vertical axis is the average time it took to process a message, and the horizontal axis is the code change I made:
  1. original code (named "base" in the graph)
  2. removal of tracing to the console (named "console" in the graph)
  3. removal of the class and method names from the formatter (named "class and methods" in the graph)
  4. replacing Date with System.currentTimeMillis (named "currentTime" in the graph)


I conducted a load test against our application with two setups: one with the original logging we had, and the other with a logging version where I removed the logging to the console and the class and method names from the formatter.
The results of the tests show that in the first setup the application was unavailable 100% of the time, while in the second setup it was unavailable only 15% of the time. The maximum response time was ~10 minutes and the average response time was 24.6 seconds.

Friday, May 14, 2010

Java Logger with SimpleFormatter is ten times slower than OutputStream



I compared the time it takes to write a log file with several Java I/O classes and found that Logger takes 10 times longer than OutputStream.

I used OutputStream, RandomAccessFile, and Logger to write a trace file of a million lines. Each line contained 250 characters and a prefix with the writing time. I repeated the measurement and collected the results 7 times.

The graph below summarizes the results: the vertical axis is the time it took to write the file in milliseconds and the horizontal axis is the sample number.
Here are the I/O setups I used:
FileOutputStream #1 - PrintWriter with auto flush enabled
FileOutputStream #2 - PrintWriter with auto flush disabled
RandomAccessFile #1 - RandomAccessFile which writes to the file after each line
RandomAccessFile #2 - RandomAccessFile which stores 10,000 lines in a buffer before writing to the file
RandomAccessFile #3 - RandomAccessFile which stores 200,000 lines in a buffer before writing to the file
Logger #1 - Java util logger (JUL) with SimpleFormatter
Logger #2 - JUL with a Formatter that appends the time to each message
Logger #3 - JUL with a Formatter that appends the time and class name to each message
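To make the setups concrete, here is a rough sketch of how such writers could be wired; the file names, buffer size, and sample line are illustrative and not the exact benchmark code:

  import java.io.FileOutputStream;
  import java.io.PrintWriter;
  import java.io.RandomAccessFile;
  import java.util.logging.FileHandler;
  import java.util.logging.Logger;
  import java.util.logging.SimpleFormatter;

  public class IoSetupsSketch {
      public static void main(String[] args) throws Exception {
          String line = System.currentTimeMillis() + " sample trace line";

          // PrintWriter over a FileOutputStream, auto flush disabled
          PrintWriter writer = new PrintWriter(new FileOutputStream("trace_stream.log"), false);
          writer.println(line);
          writer.close();

          // RandomAccessFile that buffers lines in memory before a single write
          StringBuilder buffer = new StringBuilder();
          for (int i = 0; i < 10000; i++) {
              buffer.append(line).append('\n');
          }
          RandomAccessFile raf = new RandomAccessFile("trace_raf.log", "rw");
          raf.write(buffer.toString().getBytes());
          raf.close();

          // java.util.logging with a SimpleFormatter writing to a file
          Logger logger = Logger.getLogger("trace");
          FileHandler handler = new FileHandler("trace_jul.log");
          handler.setFormatter(new SimpleFormatter());
          logger.addHandler(handler);
          logger.info(line);
      }
  }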

Monday, April 12, 2010

Java Notes API prints "PANIC: Insufficient memory."

Problem:
A Java application using the Java Notes API (for IBM Domino server) printed:

Thread=[1934:0002-193C]
Stack base=0x0032EB34, Stack size = 1248 bytes PANIC: Insufficient memory.

when it was reading from and writing to a large NSF file (~2 GB).

Solution:
Setting the JVM maximum heap to 200 MB with "-Xmx200m".

Why does this resolve the issue?
When launching a JVM without the "-Xmx" parameter, the JVM sets it to a default value, which on the Windows platform is half of the physical memory.
As the machine had 3.5 GB of RAM, this means the JVM tried to reserve 1.7 GB for the process.
The Java Notes API uses the process's shared memory to handle the NSF.
It panicked when it could not allocate more memory, as 1.7 GB of the process's memory space was already reserved by the JVM.

What facts suggested the cause?
The Domino server NSD contained the following:
             COMMIT    RESERVED   TOTAL
  PRIVATE:   25.1M     1.7G       1.7G
  MAPPED:    174.9M    4.5M       179.4M
  IMAGE:     56.1M     0.0K       56.1M
  TOTAL:     256.1M    1.7G       2.0G
which indicated that the memory was reserved by the Java application and not by the Java Notes API.

Wednesday, January 27, 2010

Ant compiles without debug information by default

I wrote a build.xml for a Java project and found out that the application trace contained exception stack traces with "(Unknown Source)" instead of line numbers.
I found out that Ant compiles without debug information by default.

The way to include debug information is to add
debug="true" debuglevel="lines,vars,source"
to the javac task.
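For example, a minimal javac task with debug information enabled might look like this; the source and destination directories are just placeholders:

  <javac srcdir="src" destdir="build/classes"
         debug="true" debuglevel="lines,vars,source"/>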