Tuesday, July 19, 2011

Large Scale Java Design

I ran into a problem when I reached the point where I wanted to start integrating my append-only database storage engine into the rest of Suneido.

The problem was that the old storage engine was too tightly coupled with the rest of Suneido. Not very good design on my part, but in my defense, the design came from the old C++ code, much of which is 10 or 15 years old.

So how should the code be structured? What's the best way to use Java packages and interfaces to reduce coupling? I looked for the Java equivalent of Large Scale C++ Software Design by John Lakos. (Seeing that it was published in 1996, I guess I don't really have an excuse for the poor structure of my old code!) But there doesn't seem to be an equivalent for Java. (Let me know if you have suggestions.)

There are books like Effective Java but they're mostly about small scale. And there are books about J2EE architecture, but I'm not using J2EE.

Books like Growing Object-Oriented Software (recommended) talk about coupling from the viewpoint of testability, but don't explicitly give guidelines for high-level architecture.

In addition, what I needed wasn't just less coupling. I needed to be able to configure the code to run with either storage engine. For that, there couldn't be any static coupling between the storage engine and the rest of the code.

One really crude way to do this would be to simply rename one of the two storage engine packages to be the name referenced by the rest of the code. That would require changing every source file. Eclipse would do that, but it wouldn't play well with version control. And I would have to manually ensure that the two packages defined the same public methods.

Instead I decided to clean up my architecture. In case it's helpful to anyone else, I'll sketch out what I came up with. Some of it is standard advice, some is a little more specific to what I needed. None of it is particularly original or unique.

First, other packages (i.e. the rest of the system) should only reference interfaces that are defined in a separate interface package.

That means no direct field access; public access has to go through setters and getters.
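As a rough illustration (not the actual Suneido code), here is what those two rules might look like. Everything is in one file so it can be compiled as-is; in a real project the interface and the implementation would sit in separate packages. The names `Table`, `OldTable` and `Client` are made up for the example.

```java
import java.util.HashMap;
import java.util.Map;

// --- would live in the interface package ---
interface Table {
    String name();                    // getter instead of a public field
    String get(int key);
    void put(int key, String value);
}

// --- would live in a storage engine package ---
final class OldTable implements Table {
    private final String name;        // no direct field access from outside
    private final Map<Integer, String> rows = new HashMap<>();

    OldTable(String name) { this.name = name; }

    @Override public String name() { return name; }
    @Override public String get(int key) { return rows.get(key); }
    @Override public void put(int key, String value) { rows.put(key, value); }
}

// --- the rest of the system: only ever mentions the Table interface ---
class Client {
    static String describe(Table table, int key) {
        return table.name() + "[" + key + "] = " + table.get(key);
    }
}

class InterfaceDemo {
    public static void main(String[] args) {
        Table table = new OldTable("stock");
        table.put(1, "widget");
        System.out.println(Client.describe(table, 1));
    }
}
```

Because `Client` only sees `Table`, a different engine's table class could be passed in without touching `Client` at all.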

It also means no public static methods or constructors on the other classes. Instead there is an interface for a "package" object. A singleton instance of the package object is created at startup, and all access to the package starts there. So to use the new storage engine you just have to use its package object. The package object's class is the only public class in the package, and it holds whatever public constructors and static methods remain.
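A minimal sketch of the package object idea, again squeezed into one file. The names (`DatabasePackage`, `OldEnginePackage`, `Database`, `Transaction`) and the file name `example.db` are hypothetical, chosen only for illustration.

```java
// --- would live in the interface package ---
interface Transaction {
    void commit();
    boolean isCommitted();
}

interface Database {
    Transaction begin();
}

// The "package" object: the single entry point to a storage engine.
interface DatabasePackage {
    Database open(String filename);
}

// --- would live in the old engine's package ---
// In a real layout only this class would be declared public.
final class OldEnginePackage implements DatabasePackage {
    static final OldEnginePackage INSTANCE = new OldEnginePackage(); // singleton

    private OldEnginePackage() { }

    @Override public Database open(String filename) {
        // the constructor itself has package scope, so outsiders must come through here
        return new OldDatabase(filename);
    }
}

final class OldDatabase implements Database {
    private final String filename;

    OldDatabase(String filename) { this.filename = filename; }

    String filename() { return filename; }   // package scope, not in the interface

    @Override public Transaction begin() { return new OldTransaction(); }
}

final class OldTransaction implements Transaction {
    private boolean committed = false;

    @Override public void commit() { committed = true; }
    @Override public boolean isCommitted() { return committed; }
}

class PackageObjectDemo {
    public static void main(String[] args) {
        DatabasePackage dbpkg = OldEnginePackage.INSTANCE;  // picked once, at startup
        Transaction t = dbpkg.open("example.db").begin();
        t.commit();
        System.out.println(t.isCommitted());
    }
}
```

After the first line of `main`, nothing mentions the old engine by name, which is the point: swapping engines means swapping that one reference.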

One mistake I made, coming new to Java from C++, was to make methods either private or public. I should have been using package scope a lot more than public. (Which might explain why it's the default in Java!) Now, the only methods that are public are the ones implementing public interfaces.

It's nice that Java (since version 5 / 1.5) allows covariant return types. That is, the public interface can say that a method returns another public interface, but the actual method implementation can declare that it returns the concrete type. This reduces the need for casting within the package.
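For example, something along these lines (a sketch with invented names, `Index`, `Cursor`, `ArrayIndex` and `ArrayCursor`, not the real Suneido classes):

```java
// --- would live in the interface package ---
interface Cursor {
    boolean hasNext();
    int next();
}

interface Index {
    Cursor cursor();     // callers outside the package only see Cursor
}

// --- would live in the implementation package ---
final class ArrayCursor implements Cursor {
    private final int[] keys;
    private int pos = 0;

    ArrayCursor(int[] keys) { this.keys = keys; }

    @Override public boolean hasNext() { return pos < keys.length; }
    @Override public int next() { return keys[pos++]; }

    // package-scope extra that is not part of the public interface
    int remaining() { return keys.length - pos; }
}

final class ArrayIndex implements Index {
    private final int[] keys;

    ArrayIndex(int... keys) { this.keys = keys.clone(); }

    // Covariant return: Index declares Cursor, this override narrows it to ArrayCursor.
    @Override public ArrayCursor cursor() { return new ArrayCursor(keys); }

    // Inside the package the concrete type is available without a cast.
    int size() { return cursor().remaining(); }
}

class CovariantDemo {
    public static void main(String[] args) {
        Index index = new ArrayIndex(3, 5, 8);
        Cursor c = index.cursor();            // the interface view
        while (c.hasNext()) {
            System.out.println(c.next());
        }
    }
}
```

Without the narrowed return type, `size()` would need `((ArrayCursor) cursor()).remaining()`.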

I've finished refactoring so the old storage engine code follows these guidelines. I thought I might run into areas that would take a lot of redesign, but it's gone surprisingly smoothly.

Next I have to refactor the new append-only storage engine to implement the same interfaces. Then I should be able to easily switch between the two.
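Once both engines sit behind the same interface, the switch could be as small as a lookup at startup. This is only a sketch of one way to do it: the names (`StorageEngine`, `OldEngine`, `AppendOnlyEngine`, `EngineSelector`) and the "old" / "append-only" option strings are assumptions, and it uses Java 8 method references for brevity.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// --- would live in the interface package ---
interface StorageEngine {
    String description();
}

// --- one class per engine package (stand-ins for the real engines) ---
final class OldEngine implements StorageEngine {
    @Override public String description() { return "old storage engine"; }
}

final class AppendOnlyEngine implements StorageEngine {
    @Override public String description() { return "append-only storage engine"; }
}

// --- startup code: the only place that names both engines ---
final class EngineSelector {
    private static final Map<String, Supplier<StorageEngine>> ENGINES = new LinkedHashMap<>();
    static {
        ENGINES.put("old", OldEngine::new);
        ENGINES.put("append-only", AppendOnlyEngine::new);
    }

    private EngineSelector() { }

    static StorageEngine select(String name) {
        Supplier<StorageEngine> supplier = ENGINES.get(name);
        if (supplier == null) {
            throw new IllegalArgumentException("unknown storage engine: " + name);
        }
        return supplier.get();
    }
}

class SelectDemo {
    public static void main(String[] args) {
        // engine name from the command line, defaulting to the old engine
        String name = args.length > 0 ? args[0] : "old";
        System.out.println(EngineSelector.select(name).description());
    }
}
```

The selector still refers to both engines statically, but it is the only code that does; everything else works through `StorageEngine`.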

Of course, the interface I've extracted from the old storage engine doesn't match the new storage engine, so I'll have to do some more refactoring to end up with a common interface.

As usual, the code is public in Suneido's Mercurial repository on SourceForge.

