Friday, August 10, 2007

SunWikis 1.0.4 released - Is the Outage Really Necessary?

We just went through the first upgrade of wikis.sun.com since the launch a week ago.

The main highlight of the 1.0.4 release was the upgrade to Confluence 2.5.6, which contains many critical security fixes. Other than that, we fixed some UI issues that we found in the previous SunWikis release.

You might wonder why we went offline for a little less than 30 minutes during the upgrade. The answer is that Confluence doesn't support "live" upgrades: as described in the Confluence Cluster Upgrade HowTo, we had to take the site down in order to upgrade safely.

I don't like the idea of having an outage every time we upgrade Confluence. I'll see if we can get Atlassian to help us minimize the down time somehow.

It is usually safe to let nodes running different versions of the software share the same database, provided that the upgrade doesn't change the data or the db schema. This should apply to Confluence as well, and it could be the first step toward minimizing the down time.
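To make the idea concrete, here is a minimal Python sketch of how an upgrade plan could branch on whether a release touches the database. It assumes a hypothetical per-release `modifies_database` flag (Confluence release notes don't provide one today) and a load balancer in front of the cluster nodes; `Release`, `plan_upgrade`, and the step strings are illustrative names, not part of Confluence or of Atlassian's upgrade procedure.

```python
from dataclasses import dataclass


@dataclass
class Release:
    version: str
    # Hypothetical flag: True if the release changes the data or the db schema.
    modifies_database: bool


def plan_upgrade(nodes: list[str], release: Release) -> list[str]:
    """Return an ordered list of (illustrative) steps for upgrading the cluster."""
    if release.modifies_database:
        # Old and new versions must not share the database, so the whole
        # cluster goes offline, as it did for the 1.0.4 upgrade.
        steps = [f"stop {n}" for n in nodes]
        steps += [f"upgrade {n} to {release.version}" for n in nodes]
        steps += [f"start {n}" for n in nodes]
        return steps
    # Database untouched: upgrade one node at a time while the rest keep serving.
    steps = []
    for n in nodes:
        steps += [
            f"remove {n} from load balancer",
            f"upgrade {n} to {release.version}",
            f"add {n} to load balancer",
        ]
    return steps


if __name__ == "__main__":
    # "X.Y.Z" is a placeholder version, not a real Confluence release.
    for step in plan_upgrade(["node1", "node2"], Release("X.Y.Z", modifies_database=False)):
        print(step)
```

The rolling branch never takes every node offline at once, which is exactly what a "live" upgrade would buy us for releases that leave the database alone.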

If Atlassian started distinguishing between upgrades that modify the underlying database and those that don't, and published this information in the Confluence release notes, we could cut the down time from 30 minutes every 2-3 weeks to 30 minutes every 2-3 months.

2-3 months is the usual release cycle for major Confluence versions. Changes to the db schema are expected in such releases, as opposed to minor releases, which IMO should change the db only when necessary to fix a high-priority bug.
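For a rough sense of what that would mean over a year, here is the back-of-the-envelope arithmetic as a short Python sketch. It assumes each outage lasts about 30 minutes, as this one did, and approximates "every 2-3 weeks" as 17-26 upgrades a year and "every 2-3 months" as 4-6 a year; `yearly_downtime_hours` is a hypothetical helper.

```python
OUTAGE_MINUTES = 30  # roughly how long the 1.0.4 upgrade kept the site offline


def yearly_downtime_hours(outages_per_year: float, outage_minutes: float = OUTAGE_MINUTES) -> float:
    """Total hours offline per year, given how many upgrades need an outage."""
    return outages_per_year * outage_minutes / 60


# Every release needs an outage: one every 2-3 weeks is about 17-26 outages a year.
print(round(yearly_downtime_hours(52 / 3), 1), round(yearly_downtime_hours(52 / 2), 1))  # 8.7 13.0
# Only database-changing (major) releases need one: every 2-3 months is 4-6 a year.
print(round(yearly_downtime_hours(12 / 3), 1), round(yearly_downtime_hours(12 / 2), 1))  # 2.0 3.0
```

Under those assumptions, taking the site down only for database-changing releases would shrink yearly downtime from roughly 8.7-13 hours to 2-3 hours.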

Atlassian has been pretty responsive to our requests so far*, so I hope that we can come up with some solution to this problem as well.

* though I haven't heard anything from them yet about our request for a Confluence roadmap
