
Last week, I was trying to nail down a bug in
SunWikis that was triggered by some kind of race condition. These kinds of issues are pretty nasty, especially if the application in question is a pretty complex beast, something that
Confluence definitely qualifies as.
The debugger was no help because the timing had to be very precise for the issue to occur, and logging was totally useless because of zillions lines of logs that would be difficult to safely filter out. In many cases the information I needed was not present in logs anyway and since my guess was that the bug was coming from 3rd party code, logged data couldn't be easily expanded.
DTrace, which I blogged about in
my previous post, could have revealed some information, but I think that it would be very difficult to write a good D script that could give me some clues.
While waiting for my dev server to restart, I came across an
interesting blog post that caught my eye. It mentioned this thing called
BTrace, that I hadn't heard about before. It promised it to be a DTrace aimed at Java apps. With a fair amount of skepticism, I navigated to the BTrace website and started reading. Four things about BTrace earned my interest:
- It's based on same or very similar principals as DTrace, but specialized for Java apps
- No restarts necessary, you can observe a running app by connecting to a JVM identified by a PID
- The BTrace programs are written in a language that is a subset of Java, and heavily based on annotations. This was a bit of a turnoff for me at first, until I found out that the programs don't need to be compiled manually but
btrace
takes care of that for me. Bravo! - I can call DTrace from my BTrace programs if I need to instrument native code or the OS
Armed with all this info and my eyes sparking, I wrote my first BTrace program and it worked as advertised! No restarts, no compilation, no hassle.
A few hours later, I had a program that was observing exactly those parts of Confluence that I needed. All of this while the application was running uninterrupted. I was able to snoop on how my code was interacting with Confluence and Confluence was in turn interacting with Hibernate in a way that I would have never dreamed of. All of this while concurrent events aimed at triggering the bug were happening.
Running BTrace programs is as easy as this:
$ ./bin/btrace \
-classpath hibernate-2.1.8-atlassian.jar:spring-hibernate2-2.0.6.jar \
6266 HibernateSessions.java
Where 6266 is the PID of my app/web server and
HibernateSessions.java
is my BTrace program.
Parts of the program looks like this (check out the javadoc comments for explanation of the code):
import com.sun.btrace.annotations.*;
import static com.sun.btrace.BTraceUtils.*;
import net.sf.hibernate.HibernateException;
import net.sf.hibernate.collection.PersistentCollection;
import net.sf.hibernate.engine.SessionImplementor;
import org.springframework.orm.hibernate.support.OpenSessionInViewFilter;
@BTrace
public class HibernateSessions {
/**
* A thread local variable used to filter out all the events that we
* are not interested it.
*/
@TLS private static boolean viaSso = false;
private static boolean printStack = false;
/** print a message when a thread enters doFilter method of SsoFilter */
@OnMethod(
clazz="com.sun.dse.wikis.auth.SsoFilter",
method="doFilter"
)
public static void enableProbing() {
println(str(currentThread(), " - Entered SSO Filter"));
viaSso = true;
}
/** print a message when exiting the doFilter method of SsoFilter */
@OnMethod(
clazz="com.sun.dse.wikis.auth.SsoFilter",
method="doFilter",
location=@Location(Kind.RETURN)
)
public static void disableProbing() {
println(str(currentThread(), " - Exited SSO Filter"));
viaSso = false;
}
/**
* print an message with a HibernateException is thrown, with detailed
* info about the current context
*/
@OnMethod(
clazz="net.sf.hibernate.HibernateException",
method=""
)
public static void onthrowreturn(HibernateException self, String s) {
println(str(currentThread(),
strcat(" - HibernateException was thrown: ",
strcat(s, strcat(" | collection: ", str(collection))))));
}
/**
* Print a message when Hibernate is attaching a collection to a session.
* This was crucial info for me to get in order to resolve the issue
*/
@OnMethod(
clazz="net.sf.hibernate.collection.PersistentCollection",
method="setCurrentSession",
location = @Location(Kind.RETURN)
)
public static void setSessionForCollection(boolean returnValue) {
if (returnValue == true) {
println(str(currentThread(),
strcat(" - a collection was attached to a session. Collection:",
str(collection))));
}
}
The result of this was a workaround for my code as well as a pretty
detailed bug report that Atlassian confirmed to be a very similar to a confirmed bug that was reported recently. I could hardly achieve this without BTrace or without an in-depth knowledge of how all the Confluence code works.
As great as BTrace is, it is still a beta. If you decide to use it in production environment you should keep that in mind. During the several hours I spent working with BTrace, I experienced one JVM crash/lockup caused by BTrace. I bet that this kind of issues will be ironed out soon, and often this kind of risk is worth undertaking in order to resolve otherwise untraceable bugs.
Since last week, BTrace is part of my toolkit and I already have other mysteries on my ToDo list that I want to shine some light on with BTrace. I applaud the BTrace team for giving us this amazing tool!