Tuesday, May 3, 2011

Compress Traces - increasing time frame captured by trace files

When I enabled traces for a Java server application in a production environment, the traces captured only the last few minutes, which is not enough to diagnose some types of bugs.

To extend the time covered by the traces, I made the following changes:
  1. Reduce the number of messages traced, i.e. "less is more": I lowered the log level of many trace messages to the finest level, keeping only the most important messages at info level. The finest messages can still be used during development but are not written in production environments.
  2. Compress the messages before writing them to the log file.
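
Change 1 can be sketched roughly as below. This assumes the application uses java.util.logging (the level names "finest" and "info" suggest it, but that is my assumption); the class, logger name and messages are hypothetical and only illustrate the idea.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical class, logger name and messages, for illustration only.
class TraceLevels {
    static final Logger LOG = Logger.getLogger("server.trace");

    static void handleRequest(String user) {
        // Important event: kept at INFO so it passes a production threshold of INFO.
        LOG.info("request received from " + user);
        // Detailed diagnostics: demoted to FINEST, rejected when the threshold is INFO.
        LOG.log(Level.FINEST, "parsed headers for {0}", user);
    }

    public static void main(String[] args) {
        LOG.setLevel(Level.INFO);   // production-style threshold
        System.out.println("FINEST enabled? " + LOG.isLoggable(Level.FINEST));
        handleRequest("alice");
        LOG.setLevel(Level.FINEST); // development-style threshold
        System.out.println("FINEST enabled? " + LOG.isLoggable(Level.FINEST));
    }
}
```

Note that with java.util.logging the handlers have their own level as well, so to actually see FINEST output in development the handler threshold would also need to be lowered.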

Here are the different compression methods I tested:
  1. The Dictionary: when tracing a line, we first split it into substrings and look each one up in a hash table. If the substring exists, the table returns the integer associated with it; otherwise the substring is inserted as a new key mapped to a new unique integer. The hash table is also written to a file, which acts as a dictionary. The trace file itself contains only integers.
  2. Zip: when tracing a string, we buffer it in memory, and once we have 100 messages we zip them into a byte array. Then we write the array size followed by its content to the trace file.
  3. The Dictionary & Zip: when tracing a string, we use method 1 to replace each substring with an integer and buffer the integers in memory. Once we have 1000 integers we zip them into a byte array. Then we write the array size followed by its content to the trace file.
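
To make method 3 more concrete, here is a minimal sketch; it is not the code I ran. It makes several assumptions: lines are split on single spaces, a "\n" substring marks the end of each line, "zip" means DEFLATE via java.util.zip.DeflaterOutputStream, and the dictionary is stored as (integer, string) pairs. The class name CompressedTrace and all method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.DeflaterOutputStream;

// Hypothetical sketch of method 3 (dictionary + zip); not the original code.
class CompressedTrace {
    private final Map<String, Integer> dictionary = new HashMap<>();
    private final List<Integer> pending = new ArrayList<>();
    private final DataOutputStream traceOut; // receives size-prefixed zipped blocks
    private final DataOutputStream dictOut;  // receives (id, substring) pairs
    private final int batchSize;             // e.g. 1000 integers, as in method 3

    CompressedTrace(OutputStream traceOut, OutputStream dictOut, int batchSize) {
        this.traceOut = new DataOutputStream(traceOut);
        this.dictOut = new DataOutputStream(dictOut);
        this.batchSize = batchSize;
    }

    // Method 1: return the integer for a substring, assigning a new one if unseen.
    int idFor(String token) throws IOException {
        Integer id = dictionary.get(token);
        if (id == null) {
            id = dictionary.size();
            dictionary.put(token, id);
            dictOut.writeInt(id);    // persist the new entry so the trace
            dictOut.writeUTF(token); // can be decoded later
        }
        return id;
    }

    // Assumption: substrings are space-separated words, and "\n" marks end of line.
    void trace(String line) throws IOException {
        for (String token : line.split(" ")) {
            add(idFor(token));
        }
        add(idFor("\n"));
    }

    private void add(int id) throws IOException {
        pending.add(id);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    // Method 2 applied to integers: compress the batch, write its size, then its bytes.
    void flush() throws IOException {
        if (pending.isEmpty()) {
            return;
        }
        ByteArrayOutputStream block = new ByteArrayOutputStream();
        try (DataOutputStream zipped = new DataOutputStream(new DeflaterOutputStream(block))) {
            for (int id : pending) {
                zipped.writeInt(id);
            }
        }
        traceOut.writeInt(block.size());
        block.writeTo(traceOut);
        traceOut.flush();
        dictOut.flush();
        pending.clear();
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream trace = new ByteArrayOutputStream();
        ByteArrayOutputStream dict = new ByteArrayOutputStream();
        CompressedTrace t = new CompressedTrace(trace, dict, 1000);
        for (int i = 0; i < 500; i++) {
            t.trace("connection opened for session " + (i % 10));
        }
        t.flush();
        System.out.println("trace bytes: " + trace.size() + ", dictionary bytes: " + dict.size());
    }
}
```

Calling idFor alone corresponds to method 1, and flush shows the size-prefixed block layout shared by methods 2 and 3. A reader would need the dictionary stream to turn the integers back into text.
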
I ran the same simulation against the server application with each method, so that the logged content was identical, and calculated the compression ratio of each method above.

Here are the results I got:

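|                      | No compression | Method-1 | Method-2 | Method-3 |
|----------------------|----------------|----------|----------|----------|
| Trace file size (KB) | 343            | 65       | 29       | 10       |
| Compression ratio    | 1              | 5.27     | 11.82    | 34.3     |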


In the chart, the X-axis is the number of the compression method as described above.
The Y-axis is the compression ratio, calculated as (size of the trace file with no compression) / (size of the trace file with compression method x).
I plan to post soon:
  1. Measurements of the performance degradation caused by option 3.
  2. Measurements of how much the time coverage improved by reducing the number of trace messages.
To be continued...