Wednesday, June 25, 2008

BTrace == DTrace for Java

Last week, I was trying to nail down a bug in SunWikis that was triggered by some kind of race condition. These kinds of issues are pretty nasty, especially if the application in question is a pretty complex beast, something that Confluence definitely qualifies as.

The debugger was no help because the timing had to be very precise for the issue to occur, and logging was totally useless because of zillions lines of logs that would be difficult to safely filter out. In many cases the information I needed was not present in logs anyway and since my guess was that the bug was coming from 3rd party code, logged data couldn't be easily expanded. DTrace, which I blogged about in my previous post, could have revealed some information, but I think that it would be very difficult to write a good D script that could give me some clues.

While waiting for my dev server to restart, I came across an interesting blog post that caught my eye. It mentioned this thing called BTrace, that I hadn't heard about before. It promised it to be a DTrace aimed at Java apps. With a fair amount of skepticism, I navigated to the BTrace website and started reading. Four things about BTrace earned my interest:
  • It's based on same or very similar principals as DTrace, but specialized for Java apps
  • No restarts necessary, you can observe a running app by connecting to a JVM identified by a PID
  • The BTrace programs are written in a language that is a subset of Java, and heavily based on annotations. This was a bit of a turnoff for me at first, until I found out that the programs don't need to be compiled manually but btrace takes care of that for me. Bravo!
  • I can call DTrace from my BTrace programs if I need to instrument native code or the OS


Armed with all this info and my eyes sparking, I wrote my first BTrace program and it worked as advertised! No restarts, no compilation, no hassle.

A few hours later, I had a program that was observing exactly those parts of Confluence that I needed. All of this while the application was running uninterrupted. I was able to snoop on how my code was interacting with Confluence and Confluence was in turn interacting with Hibernate in a way that I would have never dreamed of. All of this while concurrent events aimed at triggering the bug were happening.

Running BTrace programs is as easy as this:

$ ./bin/btrace \
-classpath hibernate-2.1.8-atlassian.jar:spring-hibernate2-2.0.6.jar \
6266 HibernateSessions.java

Where 6266 is the PID of my app/web server and HibernateSessions.java is my BTrace program.

Parts of the program looks like this (check out the javadoc comments for explanation of the code):

import com.sun.btrace.annotations.*;
import static com.sun.btrace.BTraceUtils.*;
import net.sf.hibernate.HibernateException;
import net.sf.hibernate.collection.PersistentCollection;
import net.sf.hibernate.engine.SessionImplementor;
import org.springframework.orm.hibernate.support.OpenSessionInViewFilter;

@BTrace
public class HibernateSessions {
/**
* A thread local variable used to filter out all the events that we
* are not interested it.
*/
@TLS private static boolean viaSso = false;
private static boolean printStack = false;

/** print a message when a thread enters doFilter method of SsoFilter */
@OnMethod(
clazz="com.sun.dse.wikis.auth.SsoFilter",
method="doFilter"
)
public static void enableProbing() {
println(str(currentThread(), " - Entered SSO Filter"));
viaSso = true;
}

/** print a message when exiting the doFilter method of SsoFilter */
@OnMethod(
clazz="com.sun.dse.wikis.auth.SsoFilter",
method="doFilter",
location=@Location(Kind.RETURN)
)
public static void disableProbing() {
println(str(currentThread(), " - Exited SSO Filter"));
viaSso = false;
}

/**
* print an message with a HibernateException is thrown, with detailed
* info about the current context
*/
@OnMethod(
clazz="net.sf.hibernate.HibernateException",
method=""
)
public static void onthrowreturn(HibernateException self, String s) {
println(str(currentThread(),
strcat(" - HibernateException was thrown: ",
strcat(s, strcat(" | collection: ", str(collection))))));
}

/**
* Print a message when Hibernate is attaching a collection to a session.
* This was crucial info for me to get in order to resolve the issue
*/
@OnMethod(
clazz="net.sf.hibernate.collection.PersistentCollection",
method="setCurrentSession",
location = @Location(Kind.RETURN)
)
public static void setSessionForCollection(boolean returnValue) {
if (returnValue == true) {
println(str(currentThread(),
strcat(" - a collection was attached to a session. Collection:",
str(collection))));
}
}


The result of this was a workaround for my code as well as a pretty detailed bug report that Atlassian confirmed to be a very similar to a confirmed bug that was reported recently. I could hardly achieve this without BTrace or without an in-depth knowledge of how all the Confluence code works.

As great as BTrace is, it is still a beta. If you decide to use it in production environment you should keep that in mind. During the several hours I spent working with BTrace, I experienced one JVM crash/lockup caused by BTrace. I bet that this kind of issues will be ironed out soon, and often this kind of risk is worth undertaking in order to resolve otherwise untraceable bugs.

Since last week, BTrace is part of my toolkit and I already have other mysteries on my ToDo list that I want to shine some light on with BTrace. I applaud the BTrace team for giving us this amazing tool!

19 comments:

alexismp said...

I've always thought that TheAquarium was as much for internal and external audiences ;)
Or does it make sense to differentiate the 2?
Thanks for sharing your experience!
-AlexisMP

Igor Minar said...

In a company like Sun where many things happen in open, there is no separation between the two. All of us are just developers eager to learn something new and fix problems that bother us.

If this project was hidden behind the company firewall, there would be no way for me to find it at the time I needed it the most.

william.louth said...

Sorry but you can do this and much more in other tools without the obvious pain and they will work across practically all platforms and not just Java 6 and one or two unix flavors.

For example JXInsight has 500 extension libraries (think btrace classes) covering most commercial and open source solutions. Most importantly JXInsight provides a comprehensive visualization console that works on a software execution model rather than primitive string sets. I could have deployed one extension in a matter of minutes to find this issue and I would not have needed to even know the actual hibernate class names and method calls.

Igor Minar said...

William,

Based on what do you claim that BTrace runs only on "one or two unix flavors"? AFAIK, it's cross-platform.

I checked out JXInsight, but from the examples I could find, it looks very intrusive and definitely not as easy to pick up and use, which is what I like a lot about BTrace.

I don't mind that BTrace is console only (for now - looks like some GUI is coming as a part of VisualVM).

Don't take me wrong, JXInsight looks interesting, but if I had to debug this issue again and had to decided between BTrace and JXInsight based on the info available online, I'd go with BTrace.

william.louth said...

You decision is understandable....you are just looking for the remote capability to do a simple println(..). Maybe btrace should have been called dprintln().

Lets be honest here btrace is a poor mans AOP solution built on a low level system tool with no concept of a software execution model which is probably a strength in some cases. JXInsight approaches data collection and analysis from a explicit software and system execution modeling perspective with API's not so generic and restrictive and whole fully lacking tool support. JXInsight has been developed over 7 years and has more than 500 extension jars and 4000 system properties so yes you might be inclined to use a simple println solution but I never expected it to solve such urges especially in the context of short term goals.

I find it amazing that you would take the time to look at hibernate sources and create a class that explicitly links into external libraries and not consider creating a aop.xml packaged in a jar but I suppose this is easy when you know AspectJ.

Igor Minar said...

William,

I don't see a reason why you do you need to be so derogatory about BTrace.

Just because your tools have been around for much longer and are much more sophisticated and complex, it doesn't mean that other similar tools can't exist.

Do you think that MySQL AB should have never created MySQL, just because Oracle and IBM were on the market already?

In my opinion BTrace is much more than just println. It's a handy tool that is great for troubleshooting problem even when you need to use it as an afterthought. From what I saw, I can't say the same about JXInsight.

And because it can integrate with DTrace, it can be used to troubleshoot wide variety of issues, even those that spread to the native code, all the way down to the OS kernel. Is that something that JXInsight can do as well?

Let's be honest here. You are just trying to sell your apps, while I'm being excited about a new open source project that has already proved it self to be useful.

william.louth said...

When an engineer has to revert to marketing tactics (open source versus closed source) in an attempt to win an argument I suppose there is no point discussing further but I feel obliged to justify my comments which you incorrectly label "derogatory".

"btrace is dprintln()" or "btrace is a poor mans aop"

Here are the restrictions listed in the dtrace user guide.

can not create new objects.
can not create new arrays.
can not throw exceptions.
can not catch exceptions.
can not make arbitrary instance or static method calls - only the public static methods of com.sun.btrace.BTraceUtils class may be called from a BTrace program.
can not assign to static or instance fields of target program's classes and objects. But, BTrace class can assign to it's own static fields ("trace state" can be mutated).
can not have instance fields and methods. Only static public void returning methods are allowed for a BTrace class. And all fields have to be static.
can not have outer, inner, nested or local classes.
can not have synchronized blocks or synchronized methods.
can not have loops (for, while, do..while)
can not extend arbitrary class (super class has to be java.lang.Object)
can not implement interfaces.
can not contains assert statements.
can not use class literals.

All the btrace scripts I have seen call com.sun.btrace.BTraceUtils.println().

Igor it was YOU that started the name calling referring to complexity and intrusion. Now either you are a smart guy who likes to twist the truth or tell little white lies or none of these. Which one is it?

If there is anybody (over) selling here it is you (SUN) with claims of "extreme innovation" at JavaOne around a btrace and a visual shell that allows developer to place a Swing tab in a Swing tab pane. It is not like you guys at Sunnyvale are completely ignorant of the other tools on the market judging by the weekly stats on my blog.

Now tell me about complexity. Below is the Probes API what part do you not get?
http://www.jinspired.com/products/jxinsight/api/

William

Igor Minar said...

William,

From btrace web: "BTrace - dynamic, safe tracing system for the Java platform"

"dynamic" - no static probes, unless I'm mistaken the API you pointed me to is for static probes.
"safe" - this explains all the restrictions you posted

BTrace is a CLI tool, so using println is a way it can communicate with the user. Except for writing the output to a file, there is no other way for a CLI tool to do that.

By "intrusive" I mean that from what I've seen, you can't use JXInsight on a running app without restarting the JVM. If you can, then you should highlight it on your web.

And I'm not "(over) selling" anything, just sharing my positive experience.

william.louth said...

Igor: "unless I'm mistaken the API you pointed me to is for static probes."

Well that explains your responses. You have not bothered to even read a single post on my blog because if you did you would know that we instrument via AspectJ a fully fledge AOP solution that can easily be extended. At least I read what docs there are related to btrace.

This might come as a big surprise to you and others at Sun but instrumentation and measurement (resource metering) can be indeed be decoupled. We support various types of instrumentation which allows us to profile a distributed request across different hosts, processes and threads. Again read the blogs.

You can be safe and have much more than a CLI or println tool which you at least agreed is the only way btrace can communicate with the user.

Igor: "By "intrusive" I mean that from what I've seen, you can't use JXInsight on a running app without restarting the JVM. If you can, then you should highlight it on your web."

Another insight. Have never worked in a production environment at least one that is managed (ITIL)? No system operator is going to allow anyone (especially a developer) to dynamically load a profiling agent that was not already configured. This is CHANGE REQUEST and the whole application would have to RETESTED and REDEPLOYED.

In my first posting I wanted to point out that other tools that are FREE can and have done a much better job at this over the last few years.

It is frustrating to see Sun waste time on adhoc and short term solutions while not addressing the need to have a proper JSR for both resource metering and runtime diagnostics which I have pushed for with both IBM, Oracle and Sun. Unfortunately it falls on deaf ears at your end because all Sun can see and hear is dtrace and every attempt from Sun at sometime Java like just looks so like dtrace. Why is it so hard for you guys to see that a Java API that addresses the needs of both consumer (management tools) and producer (dynamic agent) available in the runtime and used by "dynamic" agents and tools would benefit us all much more than a back to the past with println? We even offered to donate both our Probes and Diagnostics Open APIs at JavaOne.

Igor Minar said...

You really are good at twisting everything I write.

First of all, I don't speak for Sun. This is my private blog and even though I often touch on work related topics, all the opinions are solely my personal opinions.

Secondly, it seems that you are venting your frustration from a failure at JavaOne on my blog. IMO with open sourcing Java, Sun is letting the community have a bigger impact on the decisions made around Java. Maybe that should be the direction you should take to achieve your goals.

Jaroslav Bachorik said...

I can't help myself but to react here.
William, what I could see lately was you guys from your company stalking any performance related blogs and attacking any solutions which were not yours.

If you don't like BTrace, don't use it. It's as simple as that. You don't need to start flames just to increase the search score for your product in google.

Regards

william.louth said...

Jaroslav, do you not think it best to at least state your own interest before posting your trite.

techzen said...

Hi Igor!

Thanks, this was a very informative post.:)

Regards

Anonymous said...

Igor, nice post. Next time I'm using Java, I'll check it out.

P.S., don't feel the need to respond to obvious trolls like "william.louth".

Cymen said...

Interesting and informative. I'm posting mainly to have another voice saying please ignore those that have a set agenda. It's clear to anyone reading this and it doesn't nothing to help the agenda (in fact, it hurts it but I doubt they would believe that). Thanks for the good read.

Anonymous said...

Hello Igor.

Awesome post; can't wait to try BTrace.

Apparently william is the ceo/cto/cleaning lady/ for that company but based on his professional behavior I wouldn't touch his products with a 10 feet pole.

Anonymous said...

I'm a complete noob trying to use BTrace and haven't figured out how to contact A. Sundararajan -- so I'll ask you.

I have a simple BTrace script logging the getColumn method and it works fine in telling me the argument to the getColumn is "javax.swing.table.DefaultTableColumnMode@1ebf568, 1" -- what I would like to log is the actual contents of the column -- in other words, I'm still one level of indirection from where I want to be.

Here's the file I am using:

package com.sun.btrace.samples;

import com.sun.btrace.annotations.*;
import com.sun.btrace.AnyType;
import static com.sun.btrace.BTraceUtils.*;

@BTrace public class showgetColumn {
@OnMethod(
clazz="javax.swing.table.DefaultTableColumnModel",
method="getColumn"
)
public static void m(AnyType[] args) {
print(strcat("entered ", name(probeClass())));
print(strcat(" method ", probeMethod()));
printArray(args);
}
}


What should I use in place of "printArray(args)" to log the actual strings that appear in the table column?

Thanks!

Igor Minar said...

Hi Anonymous,

check out faban/samples/JdbcQueries.java

Particularly this part (sorry about the formating):
@OnMethod(clazz = "+java.sql.Connection", method = "/prepare.*/", location = @Location(Kind.RETURN))
public static void onPrepareReturn(AnyType arg) {
if (preparingStatement != null) {
print("P"); // Debug Prepared
Statement preparedStatement = (Statement) arg;
put(preparedStatementDescriptions, preparedStatement, preparingStatement);
preparingStatement = null;
}
}

Anonymous said...

Thank you for the link and specifying where to focus my attention.

Now I get to go play with the code!