Monday, October 13, 2008

My Confluence 3.0 Wishlist

Confluence 2.9 was released last month and I've seen references to 2.10 in the Confluence issue tracker, so I expect to see it out in 1-2 months. That makes me think about what's next.

As part of my adventures working on Sun's external wiki, wikis.sun.com, I've been working on Confluence plugins and even the Confluence core code for a year and a half now, adding new features, enhancing existing ones, and very often fixing bugs. Sometimes it was trivial to enhance the code or fix a bug, other times it was not, but what I want to write about today are the things that were not possible at all without irreversibly forking the code.

Confluence 3.0 should be a version that really deserves to have the first digit incremented. Not because marketing said it's time for that, but because the changes in the application are so significant.

I'm sure that Atlassian has lots of ideas about what Confluence 3.0 should look like, but Atlassian guys, in case you start to run out of ideas, here is my wish list:

Fix the Database Schema

Confluence has been in development for years and the database schema definitely shows that. Since the database is the heart of the application, I think it deserves a lot of attention, and a major performance boost could be gained by cleaning it up.

Specific improvements:
  • Establish and in the future enforce naming conventions
  • Replace all the natural foreign keys with surrogate keys, e.g. user name, space key and group name should be replaced with ids in all the referencing tables (this would finally allow CONF-4063 to be implemented)
  • Add caches for the lower function (patch) and maybe counter caches
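
To illustrate the surrogate key point, here is a minimal sketch using Python's sqlite3 module. The table and column names (users, page_permissions, user_id) are made up for the example and are not Confluence's actual schema. The idea is only that when referencing tables store an id instead of the user name, a rename touches a single row.

```python
import sqlite3


def build_db():
    """Create a toy schema (hypothetical, not Confluence's real one)."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE users (
            id INTEGER PRIMARY KEY,          -- surrogate key
            username TEXT NOT NULL UNIQUE    -- natural key, free to change
        );
        CREATE TABLE page_permissions (
            page_id INTEGER NOT NULL,
            user_id INTEGER NOT NULL REFERENCES users(id)
        );
    """)
    db.execute("INSERT INTO users (id, username) VALUES (1, 'alice')")
    db.executemany(
        "INSERT INTO page_permissions (page_id, user_id) VALUES (?, ?)",
        [(10, 1), (11, 1), (12, 1)],
    )
    return db


def rename_user(db, old_name, new_name):
    """Rename a user and return the number of rows that had to change."""
    cur = db.execute(
        "UPDATE users SET username = ? WHERE username = ?",
        (new_name, old_name),
    )
    return cur.rowcount


def pages_for(db, username):
    """List page ids the given user has a permission row for."""
    rows = db.execute(
        """SELECT p.page_id FROM page_permissions p
           JOIN users u ON u.id = p.user_id
           WHERE u.username = ? ORDER BY p.page_id""",
        (username,),
    )
    return [r[0] for r in rows]
```

With natural keys, the same rename would have to update every referencing table; here the permission rows are never touched.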

Rework the Clustering

Clustering is usually supposed to fulfill two functions: scalability and robustness. In the case of Confluence, it is mainly the second that is missing. In fact, I'd go as far as saying that a Confluence cluster is less robust than a single instance of Confluence. Why? Because the way it is implemented makes the entire cluster vulnerable when one node has problems.

I have personally experienced several cluster lock-ups or crashes, usually initiated by a separate Confluence bug whose effect was multiplied by the clustering code. One such bug: CONF-12319

Mike's presentation covers quite a few design goals behind the implementation in Confluence. Clustering can really get ugly and complicated and Mike covered it pretty well. Unfortunately the distributed share part of the clustering makes Confluence prone to problems.

One of the clustering goals that Mike emphasizes in his presentation is that clustering should be "admin-friendly" (low admin overhead and easy setup). While I agree with the low overhead part, the easiness of setup should not compromise the goals which clustering is trying to fulfill in the first place. Clustering is for people who are serious about running Confluence, and as such should be expected to be qualified for the job.

Specific improvements:
  • Either reevaluate the distributed share clustering so that it is super robust, or consider implementing clustering via a centralized share
  • Avoid shutting down the entire cluster when "cluster panic" is detected. A better solution, which avoids unnecessary downtime, would be to shut down all the nodes, except for the nodes properly clustered with the oldest node.
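
The second suggestion can be sketched roughly as follows. This is not how Confluence detects or handles cluster panic; the Node type, its fields, and the definition of "properly clustered" as mutual visibility are all assumptions made for the sake of the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    started_at: float   # lower value means an older node
    peers: frozenset    # names of the nodes this node can currently reach


def nodes_to_shut_down(nodes):
    """Return the names of nodes not properly clustered with the oldest node."""
    if not nodes:
        return set()
    oldest = min(nodes, key=lambda n: (n.started_at, n.name))
    keep = {oldest.name}
    for n in nodes:
        # Assumption: "properly clustered" means both sides see each other.
        if n.name in oldest.peers and oldest.name in n.peers:
            keep.add(n.name)
    return {n.name for n in nodes} - keep
```

With three nodes where only the two oldest can see each other, only the third one would be shut down and the wiki would stay up.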

Clean Up the HTML and CSS Code

The HTML code that comes out of Confluence is horrendous. While the rendered output looks pretty pleasant, looking under the hood (viewing the page source in a browser) is not recommended for pregnant women, men with ED or high cholesterol, or generally anyone over 50.

Some improvements were made in this area in recent releases, but all of them were just minor cosmetic surgery. Confluence really needs major surgery that will bring the HTML code up to current standards. The benefit will be much faster page loads and code that is easier to maintain and enhance.

Specific improvements:
  • Rewrite most of the templates and macros to make them XHTML 1.0 compliant
  • Minify and combine JavaScript and CSS files (CONF-8622)
  • Use image sprites to speed up page loads even further (especially in the rich text editor)
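
For the minify and combine item, a deliberately naive sketch of the idea for CSS is below. It is only an illustration: it ignores string literals and other corner cases that a real minifier has to handle, and the function names are made up.

```python
import re


def minify_css(css):
    """Naive CSS minifier: strips comments and redundant whitespace."""
    css = re.sub(r"/\*.*?\*/", "", css, flags=re.S)  # drop /* ... */ comments
    css = re.sub(r"\s+", " ", css)                   # collapse whitespace runs
    css = re.sub(r"\s*([{};,])\s*", r"\1", css)      # trim around punctuation
    return css.strip()


def combine(stylesheets):
    """Concatenate several stylesheets into one minified payload."""
    return "".join(minify_css(s) for s in stylesheets)
```

Serving one combined file instead of many small ones saves a round trip per file, which is where most of the page load gain would come from.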

Redo the URI Namespace

Human-friendly URIs and URLs are becoming more and more important on today's Internet. Confluence is not doing well in this area.

Specific improvements:
  • /display/MySpace/My+Page - is the /display part really necessary? Can't we do /MySpace/My+Page?
  • /pages/diffpages.action?pageId=2490471&originalId=45714293 - What is this? I don't know. How about: /MyWiki/My+Page/diff/23:22. I think that actually means something. There might be a better format, this is just a thought.
  • In general, I think redoing the URI namespace using REST conventions would be interesting.
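
A rough sketch of how such paths could be parsed is below. Everything in it is an assumption: the parse_path function, reading 23:22 as a pair of version numbers, and the RESERVED set, which stands in for whatever top-level paths the application would need to keep for itself if the /display prefix went away.

```python
from urllib.parse import unquote_plus

# Hypothetical top-level paths the application itself would need to keep.
RESERVED = {"users", "images", "pages"}


def parse_path(path):
    """Parse '/Space/Page+Title[/diff/NEW:OLD]' into a dict."""
    parts = [p for p in path.split("/") if p]
    if len(parts) < 2:
        raise ValueError("expected /<space>/<page>")
    space, title = parts[0], unquote_plus(parts[1])
    if space.lower() in RESERVED:
        raise ValueError(f"space key clashes with reserved path: {space}")
    if len(parts) == 2:
        return {"space": space, "title": title, "action": "view"}
    if len(parts) == 4 and parts[2] == "diff":
        # Assumption: the two numbers are page versions, newer one first.
        new, old = (int(v) for v in parts[3].split(":"))
        return {"space": space, "title": title,
                "action": "diff", "versions": (new, old)}
    raise ValueError(f"unrecognized path: {path}")
```

For example, parse_path("/MyWiki/My+Page/diff/23:22") yields the space MyWiki, the title "My Page" and the version pair (23, 22).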

Improve Atlassian-Renderer

When I was creating some patches for the atlassian-renderer, I was surprised to find that atlassian-renderer, the module responsible for rendering wiki markup into HTML, is full of hardcoded HTML snippets. The main reason why this surprised me is that most of the Confluence code is pluggable, which allows parts of the code to be replaced with a better version without much trouble. This is not the case with the renderer. And this presents two problems: it's not possible to get Confluence to directly render anything other than HTML (PDF and DOC are only derived from the HTML), and it's not possible to use anything other than Confluence markup as the input for the renderer.

The first problem makes me unable to render custom output like DocBook or to improve the PDF output, which is pretty poor.

The second problem means that all the customers who use Confluence are locked in, because all the content created via Confluence is Confluence-specific and can't easily be moved to a different wiki engine when needed.

In my opinion the sooner all major wiki engine developers settle on one wiki markup standard the sooner we will all be better off. This might be especially difficult for Atlassian to swallow and implement, because they standardized on their own markup that they also use in their other products.

An interesting initiative that is gaining a lot of traction is Creole, a standardized wiki markup. Confluence is one of the few major wiki players that doesn't support this initiative.

Specific improvements:
  • Split the current renderer into two pluggable parts: parser and renderer
  • Implement Creole support (CONF-12077)
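
To make the parser and renderer split more concrete, here is a toy sketch. It is not based on the atlassian-renderer code; the function names and the intermediate representation (a list of tuples) are invented for the example, and only headings and plain paragraphs are handled. Two parsers produce the same intermediate form, and any renderer can consume it.

```python
import re
from html import escape

# Intermediate representation: a list of (kind, level, text) tuples.


def parse_confluence(markup):
    """Toy parser for 'hN. Title' headings and plain paragraphs."""
    nodes = []
    for line in markup.splitlines():
        line = line.strip()
        if not line:
            continue
        m = re.match(r"h([1-6])\.\s+(.*)", line)
        if m:
            nodes.append(("heading", int(m.group(1)), m.group(2)))
        else:
            nodes.append(("para", 0, line))
    return nodes


def parse_creole(markup):
    """Toy parser for '= Title =' style headings and plain paragraphs."""
    nodes = []
    for line in markup.splitlines():
        line = line.strip()
        if not line:
            continue
        m = re.match(r"(={1,6})\s*(.*?)\s*=*$", line)
        if m:
            nodes.append(("heading", len(m.group(1)), m.group(2)))
        else:
            nodes.append(("para", 0, line))
    return nodes


def render_html(nodes):
    """Render the intermediate representation as HTML."""
    out = []
    for kind, level, text in nodes:
        tag = f"h{level}" if kind == "heading" else "p"
        out.append(f"<{tag}>{escape(text)}</{tag}>")
    return "\n".join(out)


def render_text(nodes):
    """Plain-text output, standing in for any non-HTML target."""
    return "\n".join(
        text.upper() if kind == "heading" else text
        for kind, _, text in nodes
    )
```

Because both parsers emit the same structure, adding a new input markup or a new output format means writing one new function rather than touching hardcoded snippets throughout the module.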

Improve Developer Documentation

I spent countless hours, especially in the beginning, trying to figure out how Confluence works and how Confluence plugins should be written. I learned some new tricks, and that's the good part. The bad part is that the experience could have been much better if the code contained more Javadoc comments and if the plugin interfaces, and mainly the configuration file format, were better documented.

Specific improvements:
  • Add JavaDoc comments where missing
  • Finally provide complete specification and documentation for the plugin config file (JRA-12183)

Thanks! :-)

That's about as much as I can think of for now. There are probably other things that I missed, and then there are enhancements around security, which I know are already on the roadmap.

I understand that most of the changes above will create incompatibilities with many existing themes and plugins, but hey, Confluence 3 will happen only once EVAR and releases like this are expected to bring major incompatibilities. Data can always be migrated automatically and existing plugins and themes will be migrated when there are people interested in using them and proper migration instructions are provided.

I hope that Confluence 3 will not be a "marketing" release, but instead something really cool that all users and developers will enjoy working with.

10 comments:

Anonymous said...

I like where you're going with this; going to swing back around soon and put my two cents in. At the moment, however, I am buried in code. Good read. I at least wanted to say that for now.

pelegri said...

Plus support for GlassFish; please? - eduard/o

Igor Minar said...

I thought that glassfish was already supported, no?

Unknown said...

Hi,

Looks like you've run in to many of the pain barriers that Adaptavist encounter on a daily basis. Totally agree that 3.0 needs to be a major update with significant improvements to _existing_ functionality (not just a bunch of new features for marketing purposes).

As for Creole - it's 'effing hideous. Since when does "==" instinctively mean "heading one"? I'd rather climb the Eiffel Tower with nothing but the suction power of my own lips, or have my kneecaps stapled together, than have to work with nasty Creole-like syntax.

And as for making HTML output better, that's going to be a very hard task indeed - first, we have the joy of widespread MSIE6 infestations on corporate networks. IE6 can barely render HTML, let alone XHTML (which even modern browsers like Firefox are only just getting to grips with). Then there's all the plugins out there that are written in Java and, uhm, well, Java developers are awesome at server-side code. But not that hot when it comes to the browser... ;)

Anonymous said...

thank u r information

it very useful

Anonymous said...

Hi Igor, I wholeheartedly agree. Confluence is a very good product, but could definitely be better and I think 3.0 is the time to do it. I'll also try and throw some suggestions on here soon.

Peter Raymond said...

Tossing another "agree wholeheartedly" on the pile. Our Enterprise requires us to run a cluster even though Confluence stability dropped through the floor after doing so. Our Enterprise Application support people looked at how clustering is implemented and alternated between laughing and cursing...

Igor Minar said...

@eduard/o You are right. Glassfish is *not* supported. The linked issues are reported against Glassfish v1 and SJSAS8.x. I bet that Glassfish v2 works just fine. I recall seeing some blogs about people who run confluence on glassfish already.

@guy Confluence markup has its own pros and cons. The point is that users (or admins) should have an option to select the markup they want to use. I'm not saying that Confluence markup should be replaced with Creole markup, I think that it should be possible to make them run side by side.

@Peter Raymond :-) I feel for you. Clustering is *very* important for us too to achieve high availability, but with Confluence it's been a painful experience so far.

Charles Miller said...

I'm pretty sure Confluence runs on current versions of Glassfish, but there's a pretty significant investment of effort standing between "we're pretty sure it works" and "we actively support it". We simply don't get enough demand for Glassfish support to justify that investment. (More info: Confluence supported platforms FAQ)

The main focus of the Confluence 2.9/2.10/3.0 trilogy has been on user experience. To respond to Guy, most of the work for these releases has focused around improving existing features -- the basic UI, search, the rich text editor, the plugin system, performance and security -- rather than adding new stuff.

A lot of the suggestions on this list are pretty high up on my personal list of priorities: improving the database schema, the renderer, clustering, developer documentation, all things I agree we need to put time into over the next few releases.

Some we've already been working on: for example we have an implementation for auto-minifying and batching javascript and CSS resources that just didn't get done in time for the 2.10 feature freeze.

Other suggestions are a little harder to make a business case for. It makes sense for us to follow our existing plan that when we work on a piece of the system, we improve the standards compliance of the HTML code as we work on it. It's a lot harder to make a case for overhauling all 535 velocity templates in one fell swoop, just so we can say in the release notes "Confluence: now does exactly what it did before, but without offending your office web standards guy"

Similarly, it's hard to make a business case for cleaning up the URL space for a single release, but there are things we can do to improve it incrementally.

(The "/display" prefix is there because if we didn't have the prefix, the app's root URI namespace would clash with space keys. We'd have to arbitrarily disallow keys that clashed with reserved symbols ("users", "images" etc) and constantly worry that anything new we put into the namespace was going to clash with some existing user's space key)

Igor Minar said...

Charles, thanks for the comments.

My main point is that 3.0 is when you can make major changes that will break a lot of legacy stuff and you'll get away with it.

I'm glad to hear about the code for processing javascript and css.

wrt "standards compliance" your business case can be decreasing the page size (and with it simplifying the DOM), which results in faster page loads and better javascript performance.