The Software Life: 2012

Thursday, December 13, 2012

CommaStringBuilder

I realized I was building a lot of comma separated lists in my Suneido Java code. I had used a mixture of approaches.

If I had an array or iterable, I could use things like Guava's Joiner or Ints.join. But if I needed to process the items, that wouldn't work. (You could use something like Iterables.transform or FluentIterable, but without lambdas, it's a lot of overhead to write a class just for one line of processing.)

Otherwise, I'd use a StringBuilder with a variety of ways to handle getting commas between items but not before the first or after the last.

Sometimes I'd do:

Other times:

And other times:

Note that in the last two you need to check for an empty list or else the deleteCharAt or substring will fail.
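The original snippets for these three variations aren't reproduced here, so the following is only a sketch of what such patterns typically look like. The class and method names are made up for illustration, and trim() stands in for whatever per-item processing is needed.

```java
import java.util.List;

final class CommaPatterns {
    // Pattern 1: a separator variable that is empty for the first item only.
    static String withSeparatorVariable(List<String> items) {
        StringBuilder sb = new StringBuilder();
        String sep = "";
        for (String item : items) {
            sb.append(sep).append(item.trim());
            sep = ",";
        }
        return sb.toString();
    }

    // Pattern 2: append a trailing comma every time, then delete the last one.
    static String withDeleteCharAt(List<String> items) {
        if (items.isEmpty())
            return ""; // deleteCharAt(-1) would throw on an empty builder
        StringBuilder sb = new StringBuilder();
        for (String item : items)
            sb.append(item.trim()).append(',');
        return sb.deleteCharAt(sb.length() - 1).toString();
    }

    // Pattern 3: prepend a comma every time, then drop the first character.
    static String withSubstring(List<String> items) {
        if (items.isEmpty())
            return ""; // substring(1) would throw on an empty builder
        StringBuilder sb = new StringBuilder();
        for (String item : items)
            sb.append(',').append(item.trim());
        return sb.substring(1);
    }
}
```

Only the first pattern handles the empty list without a special case, which is part of why mixing the three gets error prone.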

I decided to write a utility class to make my code simpler and more consistent. Here's an example of its use:

The add methods are for the comma separated items, the append methods are just passed to the internal StringBuilder. The comma separator is hard coded, but obviously it would be easy to allow passing it in. I decided to implement the Appendable interface since it was easy.
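The class itself isn't shown here, so this is a minimal sketch of how a utility along these lines could be written, based only on the description above (add for items, append passed through, Appendable implemented). The field names and the prefix constructor are assumptions, not the actual Suneido code.

```java
final class CommaStringBuilder implements Appendable {
    private final StringBuilder sb = new StringBuilder();
    private boolean first = true; // true until the first item has been added

    CommaStringBuilder() {
    }

    // Hypothetical convenience: start with some leading text
    CommaStringBuilder(String prefix) {
        sb.append(prefix);
    }

    /** Adds one list item, preceded by a comma unless it is the first item. */
    CommaStringBuilder add(Object item) {
        if (first)
            first = false;
        else
            sb.append(',');
        sb.append(item);
        return this;
    }

    // The Appendable methods go straight to the StringBuilder, no commas.
    @Override
    public CommaStringBuilder append(CharSequence csq) {
        sb.append(csq);
        return this;
    }

    @Override
    public CommaStringBuilder append(CharSequence csq, int start, int end) {
        sb.append(csq, start, end);
        return this;
    }

    @Override
    public CommaStringBuilder append(char c) {
        sb.append(c);
        return this;
    }

    @Override
    public String toString() {
        return sb.toString();
    }
}
```

With something like this, building a parenthesized list with per-item processing becomes:

```java
CommaStringBuilder csb = new CommaStringBuilder("(");
for (int n : new int[] { 1, 2, 3 })
    csb.add(n * 10);
csb.append(")");
String result = csb.toString(); // "(10,20,30)"
```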

Wednesday, December 05, 2012

The Joys of High DPI

I recently got a new monitor at work (a Samsung SA850). It's a nice monitor and it matches the resolution of my home iMac (2560 x 1440).

The first hurdle was that I was using the on-board video and it didn't handle that high resolution. I needed a new video card. It still didn't work. Finally figured out it needed a dual-link DVI cable. Thankfully our company hardware guy dealt with all this. Just hearing about it reminded me of how much easier it is to just go buy an iMac. (And this is before all the software fun described below!)

I was hoping the new video card would raise my Windows Experience Index, but strangely, it made it worse! Before my desktop graphics was at 5.5, now it's 4.6. The gaming video index increased, but I don't do any gaming! I guess these cards are optimized for gamers. Anyone have a recommendation for a good video card for programming? (preferably low power)

Once it was working, I found everything a little small, especially the fonts. (My eyes aren't getting any younger unfortunately.) So I played with the Windows 7 display settings. 125% was too big, so I tried a couple of custom sizes and settled on 115%. Afterwards I thought to measure the actual DPI - it was roughly 109 dpi (not exactly a "retina" display - my MacBook Pro is 220 dpi). Windows standard is 96 dpi. 109 / 96 is about 114% so my 115% setting was very close.
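If you don't have a ruler handy, the same numbers can be estimated from the resolution and the diagonal size. This is just a sketch of the arithmetic. The 27 inch diagonal is my assumption for this monitor, and the class and method names are made up.

```java
final class DpiCalc {
    /** Pixels per inch from the pixel dimensions and the diagonal in inches. */
    static double dpi(int widthPx, int heightPx, double diagonalInches) {
        return Math.hypot(widthPx, heightPx) / diagonalInches;
    }

    /** Windows scaling percentage relative to the 96 dpi baseline. */
    static long scalingPercent(double dpi) {
        return Math.round(dpi / 96.0 * 100);
    }
}
```

DpiCalc.dpi(2560, 1440, 27) comes out a little under 109, and DpiCalc.scalingPercent(109) gives 114, which agrees with the measurement above.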

Note: You can adjust the size of Windows icons and their labels separately so don't go by that when choosing the scaling. (Select one, hold down the CTRL key, and then use the mouse scroll wheel)

So far so good, except now Chrome (and other programs) look "fuzzy"! It turns out, this is due to Windows DPI scaling. Taking the bitmap for a small font and just enlarging it by 115% gives crappy results. You can turn off the scaling by setting "Disable display scaling on high DPI settings" in the Compatibility section of a program's Properties. (for 32 bit but not 64 bit programs, and for local but not network drives, sigh) Of course, this may mean it goes back to smaller fonts, but you can choose font sizes within Chrome. To their credit, Firefox and Thunderbird were not fuzzy. It's hard to tell if they are actually scaling since 15% is a fairly subtle difference.

To avoid this issue, programs need to declare themselves DPI aware, either in their manifest or by calling SetProcessDPIAware. (See the MSDN article Writing High-DPI Win32 Applications)

I'm amazed that so many programs don't handle display scaling! It's not like this is something new. Sure, most people don't have high DPI monitors, but of anyone, I would expect programmers to have them, and therefore care about such things. I guess not. Even some Windows dialogs come up fuzzy!

I feel a little guilty about this. For years I've been telling people NOT to set their LCD monitor resolution lower just to get larger fonts. Leave it at its native resolution and use the Windows scaling, I tell them. But now I can see why they do it. Just using the Windows scaling gives even worse results than non-native resolutions!

Even setting scaling is an ugly user experience. (This is Windows 7) You start with Control Panel > Appearance and Personalization > Display. (searching for DPI will get you there). Notice that the title "Display" doesn't even mention anything to do with scaling, even though that's all that's on this screen. To get a custom size, you have to pick "Set custom text size (DPI)". This is the first mention of DPI. Notice that it says "text size" even though the main screen says "size of text and other items". This gives you an image of a ruler. And if you figure it out (I didn't at first) you can hold a physical ruler up to the screen and drag until they match. Unfortunately, the only way to really see what it will look like is to click OK, at which point it makes you log out (shutting down all your programs) and log back in again.

It gets worse. At the bottom of the Custom DPI Setting dialog is a check box labeled "Use Windows XP style DPI scaling". In the words of The Princess Bride, "I do not think it means what you think it means". From what I could find out, this actually means something more like "Use ONLY Windows XP style DPI scaling". Or in other words, disable Windows Vista (and later) DPI virtualization. (more info in High DPI Settings in Windows) If this box is unchecked, then virtualization is actually enabled.

Then I discover that Suneido is fuzzy. Argh! I confirmed that calling SetProcessDPIAware fixed the fuzziness. But I ended up updating the manifest since that seemed like a safer approach. Mostly, Suneido was actually already "DPI aware" since it used GetDeviceCaps(hdc, GDC.LOGPIXELSY) to scale fonts. But there were a few spots that weren't scaling (e.g. tab controls). I also realized that Suneido wasn't using the best fonts. We were using "MS Sans Serif" for the default font and this is a bitmap font which scales poorly. Which is why we overrode it to use Arial for larger headings. The current standard for Windows is Segoe UI, and prior to that, Tahoma. We were using Courier New for mono-spaced code, whereas Consolas is much nicer (IMO).

So I spent a bunch of time cleaning up and modernizing the fonts in Suneido. I think the end result is a lot nicer. However, things have changed, which will mean complaints. (Yes, some of our customers complain about any kind of change, no matter how small, regardless of whether it's an improvement or not. You can't win.)

I'll end with a wish that someday we might have GUI's that are truly scalable. I realize one of the problems is bitmaps, but we could use vector graphics for icons (which is often how they're designed in the first place). Photographs and video aren't the problem - we scale them all the time. Apple is no better in this respect. They might hide the issues better than Windows, but they are still tied to certain DPI and on their mobile devices, specific screen sizes.

Sunday, November 25, 2012

Creatorverse

I continue to be amazed by all the great phone and tablet apps. (Admittedly, amongst a lot of junk.) If only I had time to play with them all!

Linden Lab (makers of Second Life) has released an app called Creatorverse that looks like a drawing app, but is actually much more.

Sunday, November 11, 2012

Nitpicking Google UI

Here's a classic mistake that I wouldn't have expected from Google. Can you spot what I'm talking about in this Google Calendar screenshot?

My complaint is with the Repeat checkboxes. Because the labels and the checkboxes are evenly spaced, it's not clear whether a label goes with the preceding checkbox or the following one. Obviously, you can tell by looking at the beginning or the end, or by assuming the convention that checkboxes usually come before their labels.

If you "Zoom in" to expand the font size it gets worse (at least in Chrome):

At a quick glance, it looks like Thursday is checked, but it's actually Friday. (Admittedly, lots of pages don't handle zooming very well.)

As you can see from the second screenshot, you just need to change the spacing slightly to give a clue to the association.

Another solution would be to put the labels above or below the checkboxes. That would make it clear, but it would be non-standard.

Saturday, October 27, 2012

How quickly we take technology for granted

Amazingly, there was free wifi on the Mexico side of the border crossing at Tecate.

It took Shelley about two minutes to go from "cool!" to "why don't they have a better connection?"

Admittedly, that's because we moved forward in line and lost the connection.

But it reminded me of how quickly our customers go from "wow, what a great new feature" to "this sucks, why doesn't it do xyz as well?"

Thursday, September 13, 2012

Thirty Years

It's been 30 years since Ken and Larry and I started Axon. Back then, I was a 22 year old computer geek. Now, I'm an older (you can do the math) computer geek. I've spent more than half my life at it. Ours was not an overnight success story. More a matter of being too stubborn to give up and go work for someone else.

Some things haven't changed. What I wanted to do back then was write code. What do I like to do now? Write code. The trick was to figure out how to make a living writing the code I wanted to write, not just get a job writing code for someone else. In the end, I've been pretty successful at that. I've ended up writing some code that maybe wouldn't have been my first choice, but that's been the minority. And that's been more than offset by actually having customers and users, and a programming team, not just coding in a vacuum.

It's nice that Axon is now successful. I've been extremely fortunate. Although, to be honest, we spent so many years struggling that I feel vaguely uncomfortable not to be struggling. I see all our staff with their steady pay cheques and all the money flowing in and out of the company, and it seems a little alien. Even my mother never quite grasped our success. Long after it was a silly question, she would continue to ask me "Did you pay yourself this month?", and then slip me a twenty for coffee money. My father never lived to see our success. He would have been amazed.

I can see why entrepreneurs move on to start new companies rather than stay with a successful one. It's different. But I'm not really an entrepreneur, my interest was never the business side. So I guess I'll just keep on coding.

Wednesday, September 12, 2012

Memory Usage of Suneido Versions

I've been assuming that the Java version of Suneido (jSuneido) used a lot more memory than the C++ version (cSuneido). But I think what gave me that idea was that I've been running the 64 bit version of Java. Running jSuneido on the 32 bit version of Java doesn't use a lot more memory than cSuneido.

This was after running our complete application test suite.

Of course, you will still want to use 64 bit Java if you are running a server for a big database and lots of users. In this case you should have lots of memory, so the extra usage is not important.

But for standalone development, it's nice to know you can run with not much more memory than cSuneido.

Tuesday, September 11, 2012

Using MethodHandle's in jSuneido

This is a continuation to my last post, Moving jSuneido to Java 7.

Previously, the implementation of a built-in method looked like:

The new version, using method handles (behind the scenes) looks like:

At first, I wasn't sure how to handle the parameter information since I no longer had a class to put it in. Then I thought of using annotations, which I think works very nicely.

To use an annotation, it was easiest to specify the parameter information as a single string description rather than separate arguments. This means a little more work to parse it and split it up, but it makes the description cleaner. Of course, I could have done this before as well.
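As a rough illustration of the idea (not the actual jSuneido code), here is a sketch of an annotation holding a single string description, plus the parsing to split it up. The annotation name Params, the "name = default" syntax, and the find method are all assumptions for the example.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.LinkedHashMap;
import java.util.Map;

final class ParamsDemo {
    // Hypothetical annotation: the whole parameter list as one string
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface Params {
        String value();
    }

    // Hypothetical built-in with one required and one defaulted parameter
    @Params("s, i = 0")
    static Object find(Object self, Object s, Object i) {
        return ((String) self).indexOf((String) s, (Integer) i);
    }

    /** Splits "a, b = 1" into name -> default text (null means required). */
    static Map<String, String> parse(String spec) {
        Map<String, String> params = new LinkedHashMap<>();
        if (spec.trim().isEmpty())
            return params;
        for (String part : spec.split(",")) {
            String[] nameAndDefault = part.split("=", 2);
            String dflt = nameAndDefault.length == 2 ? nameAndDefault[1].trim() : null;
            params.put(nameAndDefault[0].trim(), dflt);
        }
        return params;
    }

    /** Reads the annotation from a method by reflection and parses it. */
    static Map<String, String> paramsOf(String methodName) {
        for (Method m : ParamsDemo.class.getDeclaredMethods())
            if (m.getName().equals(methodName) && m.isAnnotationPresent(Params.class))
                return parse(m.getAnnotation(Params.class).value());
        throw new IllegalArgumentException("no annotated method " + methodName);
    }
}
```

The single string keeps the declaration next to the method readable, at the cost of the small parser.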

As before, during startup the methods are picked up by reflection. The difference is that now an instance of a generic class is created, containing a MethodHandle pointing to the static method. I still need the methods and instances to handle adjusting call arguments i.e. filling in default argument values, handling named arguments, and collecting or spreading arguments. (Suneido has a more flexible argument / parameter scheme than Java.)
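Here is a minimal sketch of that startup step, assuming a hypothetical Builtin wrapper class and a couple of made-up string methods. It leaves out the default and named argument handling, and only shows the reflection and the MethodHandle.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.HashMap;
import java.util.Map;

final class BuiltinDemo {
    // Hypothetical built-in methods; "self" is the explicit receiver.
    static final class StringMethods {
        static Object size(Object self) {
            return ((String) self).length();
        }

        static Object repeat(Object self, Object n) {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < (Integer) n; ++i)
                sb.append(self);
            return sb.toString();
        }
    }

    /** One generic wrapper for every built-in, instead of a class per method. */
    static final class Builtin {
        final MethodHandle mh;

        Builtin(MethodHandle mh) {
            this.mh = mh;
        }

        Object call(Object... args) throws Throwable {
            // argument adjustment (defaults, named arguments) would go here
            return mh.invokeWithArguments(args);
        }
    }

    /** Picks up the static methods of a class by reflection. */
    static Map<String, Builtin> methods(Class<?> c) throws IllegalAccessException {
        MethodHandles.Lookup lookup = MethodHandles.lookup();
        Map<String, Builtin> result = new HashMap<>();
        for (Method m : c.getDeclaredMethods())
            if (Modifier.isStatic(m.getModifiers()) && !m.isSynthetic())
                result.put(m.getName(), new Builtin(lookup.unreflect(m)));
        return result;
    }
}
```

With this, BuiltinDemo.methods(BuiltinDemo.StringMethods.class).get("repeat").call("ab", 3) returns "ababab".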

Once I start using invokedynamic the argument adjustment will probably be done by composing method handles. I could do some of that now if I wanted, but it fits better with invokedynamic.
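To give an idea of what composing method handles for argument adjustment might look like, here is a small sketch using the standard java.lang.invoke combinators. The substr method and the helper names are hypothetical. This is not how jSuneido does it, just the kind of building blocks that are available.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class ComposeDemo {
    // Hypothetical built-in taking an explicit self plus two arguments
    static Object substr(Object self, Object start, Object length) {
        String s = (String) self;
        int i = (Integer) start;
        return s.substring(i, Math.min(s.length(), i + (Integer) length));
    }

    static MethodHandle target() throws ReflectiveOperationException {
        return MethodHandles.lookup().findStatic(ComposeDemo.class, "substr",
                MethodType.methodType(Object.class, Object.class, Object.class, Object.class));
    }

    /** Binds a default for the trailing length parameter: (self, start) -> Object. */
    static MethodHandle withDefaultLength(MethodHandle mh, Object dflt) {
        return MethodHandles.insertArguments(mh, 2, dflt);
    }

    /** Spreads an Object[] of three values over the parameters: (Object[]) -> Object. */
    static MethodHandle spreading(MethodHandle mh) {
        return mh.asSpreader(Object[].class, 3);
    }
}
```

For example:

```java
MethodHandle mh = ComposeDemo.target();
Object a = ComposeDemo.withDefaultLength(mh, 3).invoke("suneido", 2); // "nei"
Object b = ComposeDemo.spreading(mh).invoke(new Object[] { "suneido", 0, 2 }); // "su"
```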

One advantage of this approach is that the methods can now be static and don't need to take an implicit "this" argument (which I didn't use) in addition to my own explicit "self" argument. Before, they had to be instance methods so I could call them virtually.

Meanwhile, I've been playing with invokedynamic and figuring out how best to use it. I'll post more on this as I progress.

Sunday, September 09, 2012

Moving jSuneido to Java 7

I'm finally getting around to using Java 7 features in jSuneido. I've been hearing about this stuff for so long, but it hasn't seemed practical to use it till lately. At first, Java 7 wasn't available. Then it was available, but there were some issues. Then Eclipse didn't support it. Now that these issues are resolved, and end-of-life for Java 6 has been announced, it seems like I can finally go ahead.

The main thing I've been waiting for is JSR 292 - Supporting Dynamically Typed Languages on the Java Platform.

For some reason I have a hard time wrapping my head around invoke dynamic + bootstrap + method handles. The ideas seem simple enough, but actually figuring out how to use them properly is tough. Part of the problem is that there's not much documentation available. That's not surprising, considering that the audience is primarily language developers and therefore small. There's the JSR292 Cookbook, which helps, but it's just code, with no explanation. There is a corresponding slide deck from last year's JVM Summit which is still mostly code, but at least highlights the critical pieces.

I decided to take the smallest first step I could - just use java.lang.invoke.MethodHandle to avoid making a class for each built-in method. Even that gave me a few problems because I wasn't sure how much of my infrastructure for default and named arguments I could/should replace. (I ended up keeping most of it for now.)

I got that working, but then I could no longer get ProGuard to run. (Note: I use it to produce a final jar with only what I need, I don't obfuscate.) It failed with:

Warning: suneido.language.Builtin$Method: can't find referenced method 'java.lang.Object invoke(java.lang.Object,java.lang.Object[])' in class java.lang.invoke.MethodHandle

I searched on the web, but couldn't find anything helpful.

I tried every combination of ProGuard options without any luck.

I eventually looked at the source for MethodHandle (which I somehow had available in Eclipse). The invoke method is native, so I tried the various options which ProGuard recommends for keeping native methods, again with no success.

I think the problem is that MethodHandle.invoke is "signature polymorphic", meaning it doesn't have a fixed signature. I'm guessing that is confusing ProGuard. (Although the signature is (Object... args) so it should accept anything anyway.) Hopefully they'll fix it at some point when more people try to use it.

In the meantime, I more or less gave up and just added:

-dontwarn java.lang.invoke.MethodHandle

to my ProGuard configuration. That "fixes" the warnings and the resulting jar appears to work fine.

Just using MethodHandle for built-in methods doesn't really accomplish much, but it's a start. The real benefit will come from using invokedynamic, but that has a lot more moving parts to figure out.

Thursday, September 06, 2012

Eclipse + MercurialEclipse Problem

Since I updated to Eclipse Juno 4.2 I've had problems a few times where Eclipse won't start - you just get a pop up saying an error has occurred and to look in the log. Often this seems to follow killing (Force Quit on Mac OS X) Eclipse after it freezes (locks up).

You can see from the log that the problem seems to be related to the MercurialEclipse plugin. I'm not sure if the original freezing is also caused by MercurialEclipse, but I wouldn't be surprised.

Side note: Mac OS X doesn't make it easy to access files starting with a dot, in a directory whose name starts with a dot. You can't really do it with the Finder (AFAIK). You can't even do it with an editor since the standard open file dialog also hides them. I usually just open a terminal window and cat /Users/andrew/workspace/.metadata/.log (On the positive side, you can copy the path from the error pop up and paste it into the terminal window.)

The solution I've come up with is:

go into your Eclipse/plugins directory
rename com.vectrace.MercurialEclipse_<version>.jar to ...jarX (i.e. add X on the end)
start Eclipse (it should work now, but you won't have Mercurial)
quit from Eclipse
rename the file back (remove the X from the end)

For me, this seems to solve the problem. I'm guessing that what is really required is to "reset" MercurialEclipse and I've just found a roundabout way to do that.

Wednesday, September 05, 2012

Kotlin - a JVM language

I've been aware of the Kotlin JVM based language for a while, but hadn't taken a close look at it. I have to admit I thought it was odd for a new language to be coming from an IDE developer (Jetbrains). But who am I to talk about who should or shouldn't develop their own language :-)

I like Scala for its power, but the complexity (especially the type system) scares me. I like Xtend for its simplicity, but it's somewhat limited. Kotlin seems to be aiming for a sweet spot somewhere in between the two - more powerful than Xtend, but not as scary as Scala.

Here are a few of the things I like (in no particular order)

optional semicolons
var and val
modest type inference
extension methods (like D and C# but more powerful)
(I like this better than Scala's implicit conversions)
traits (interfaces with method implementations)
function literals aka lambdas or closures
IDE support from the start
improved handling of null references
no checked exceptions
standalone functions (everything doesn't have to be in a class)
operator overloading
infix function calls
default and named arguments
no "new" keyword
when expressions
static imports
a goal of fast compilation
good Java integration

There are a few more attractive features like inlining and runtime generics that are coming.

Of course, because it runs on the JVM and has good Java integration, I could mix Kotlin with Java in jSuneido. And I'd still have access to libraries like java.util.concurrent and Guava and ASM.

Another intriguing feature is that Kotlin can compile to JavaScript. I'm not sure why they're working on this, since their stated motivation for Kotlin is to have a better alternative to Java to write their IDE and other tools. Regardless, it's a neat idea. I've even had the crazy idea of implementing Suneido in JavaScript, but I'm not sure what the advantage would be, other than running in the browser. And I prefer statically typed languages for systems programming. I guess if I converted jSuneido to Kotlin, then in theory it could run in the browser (except for all the Java libraries I use).

I installed Jetbrains IntelliJ IDEA Community Edition (free) and added the Kotlin plugin. I've never really used IntelliJ but it wasn't hard to get started. Hello world was easy and so was implementing one of the classes I've been using for language evaluation. The Kotlin version of this class was the most clear and concise of the languages I've played with lately. (Without being so concise that it becomes cryptic.) I didn't have any installation problems or weird errors. Although this is beta software, it seems quite good.

I have mixed feelings about Kotlin arising from a commercial company. On the one hand it can mean there are some real resources behind it. And if Jetbrains really does use it for their own applications, then they will have a strong motivation to improve and polish it. (And hopefully less tendency to go down the rabbit hole of complexity.) On the other hand, a company can decide to drop projects for various business reasons. Thankfully, much of Kotlin is open source, so it has a life outside of just Jetbrains.

In a way I'm glad I didn't look into Kotlin when it was first announced (July 2011). Even now, it's still in beta and somewhat in flux. But it seems far enough along to play with and consider using at some point.

Hi, my name is Andrew, I'm a programming language addict ...

Tuesday, September 04, 2012

Second Thoughts on C# and .Net

I got quite optimistic about nSuneido - a .Net implementation of Suneido. The C# language is decent, the Dynamic Language Runtime (DLR) looked promising, and Mono and MonoDevelop were better than I expected. I went quite a bit farther with my experimenting than I had with D.

But ... I started to run into some frustrations. First, the MSDN .Net documentation sucks. That wasn't a big deal when I was just writing simple code. But when I needed to use the framework, it wasn't so good. Granted, I was trying to work with memory mapped files, which is often tricky. And some of the problems may have been due to working in Mono, rather than on Windows. I ended up having to look at the Mono source code to figure out how to get things working.

Looking at the source code made me realize that Mono probably isn't quite as production capable as I would like. Don't get me wrong, they've done a great job, but unfortunately they'll always be a second class citizen playing catch up with limited resources.

I realize there are going to be frustrations with any platform. The grass always looks greener than it actually is. You initially tend to focus on the problems that something will solve, not realizing what you might lose, or the new problems it will introduce.

Finally, I started to wonder if I really wanted to support multiple implementations. I realize that I actually chose quite well when I picked Java. I don't think there's any question that the JVM is the most stable, portable, and performant platform.

So why was I looking for an alternative to Java? There are two main things:

- Suneido's current user interface is Windows API based

- The Java language is ok, but it's ... old. We can do better.

(There are other issues, like the lack of fixnum's and value types, but you can't have everything.)

But it may be possible to solve both of these, and stay with the JVM platform and all its goodness.

- Java Native Access (JNA) can provide access to the Windows API

- Other languages are available on the JVM (more all the time). I could use Xtend, or Scala, or maybe Kotlin. And using another JVM language could be done incrementally.

I'm a strong believer in not starting software projects over from scratch. And I don't believe that's what I've done with jSuneido. Porting or writing an alternative implementation are not the same as starting from scratch. What I've found, especially lately, is that rewriting existing code, even multiple times, is very beneficial. You find bugs, discover better approaches, and gradually work towards a clean design. The jSuneido implementation is much better than cSuneido. And from rewriting code in Xtend, and D, and C# I continue to find better approaches.

I don't want to give up the benefits of rewriting, but I also don't really want to write and maintain multiple implementations. Perhaps sticking to the JVM and experimenting with alternate languages is the best compromise.

Saturday, September 01, 2012

No DRM Computer eBooks Update

You can now get most computer books in DRM-free ebook versions.

O'Reilly
Pragmatic Programmers
Manning
Apress
Informit (Addison-Wesley, Que, Sams)

I've run into a few older books that are available on Kindle from Amazon, but aren't available as ebooks from the above sources. I can see older books maybe not being available digitally, but if there's a Kindle version obviously that's not the case.

They all want you to sign up for their email lists. I prefer to keep my email free of that kind of stuff. Some of them have RSS feeds which I don't mind.

One of the advantages of getting the emails or following the feeds is to get discount offers. Another option for this is to do a web search for something like "oreilly coupons". This will usually turn up valid discount codes. I'm not sure if the companies like that, but they're the ones offering the discounts, so I figure it's ok.

It's too bad that Amazon doesn't sell DRM-free books. It's not ideal to have to go to multiple suppliers, and set up accounts on each, and deal with their differences. Some take PayPal, some don't. O'Reilly automatically sends my books to my Dropbox, the others don't. Maybe someday Amazon will see the light. Until then, they're losing my business.

Several of the sources offer access to "pre-release" versions. At first it seemed attractive to get access sooner, but I ended up preferring to wait to read the final version.

I've been looking forward to Tor (science fiction) going DRM-free, but the date has slipped from "July" to "summer", and now it's September. There's already been pressure by other publishers against Tor authors. [Update: I see on Amazon that at least some Tor books say: "At the publisher's request, this title is being sold without Digital Rights Management software (DRM) applied." I'm not sure why the Tor web-site still says "Coming in Summer 2012", maybe that's just referring to Tor's online store.]

Just to be clear, I prefer to buy my ebooks without DRM because it gives me more flexibility in where and how I can read them. I do not do it because I want to "pirate" them.

Thursday, August 30, 2012

nSuneido DLR Spike

After getting a C# version of the lexer working I decided to take a stab at the parser. Suneido uses a hand-written recursive descent parser so it's pretty simple code to port.

I've been doing some reading on the dynamic language runtime (DLR) so before implementing the complete parser I decided I'd push a spike through to actual code generation.

This turned out to be remarkably easy. Rather than tie the parser directly to the DLR methods, I created a ParseOutput interface. For testing I wrote a simple AstNode class and a corresponding AstOutput. Once the parsing was working I added a DyExprOutput class that creates a DLR Expression tree, which you can then compile via Expression.Lambda (see the tests at the bottom of DyExprOutput).

source code at GitHub (the link is to this specific version, you can navigate to the latest)

This only handles very simple expressions - math on numeric integer literals. (It doesn't handle even as much as the partial parser does.) But to get that far in Java took me way more work.
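The real code is C# and targets the DLR, but the shape of the design can be sketched in Java. The ParseOutput name comes from the description above. The methods on it, and the EvalOutput class that stands in for an output that generates code, are my own assumptions for illustration.

```java
final class SpikeParser {
    /** The parser reports what it recognizes through this interface. */
    interface ParseOutput<T> {
        T number(int value);

        T binary(char op, T left, T right);
    }

    /** Builds a printable tree, handy for testing the parser on its own. */
    static final class AstOutput implements ParseOutput<String> {
        public String number(int value) {
            return Integer.toString(value);
        }

        public String binary(char op, String left, String right) {
            return "(" + op + " " + left + " " + right + ")";
        }
    }

    /** Evaluates directly; a stand-in for an output that generates code. */
    static final class EvalOutput implements ParseOutput<Integer> {
        public Integer number(int value) {
            return value;
        }

        public Integer binary(char op, Integer left, Integer right) {
            switch (op) {
            case '+': return left + right;
            case '-': return left - right;
            case '*': return left * right;
            default: return left / right;
            }
        }
    }

    static <T> T parse(String src, ParseOutput<T> out) {
        Parser<T> p = new Parser<T>(src.replace(" ", ""), out);
        T result = p.expr();
        if (p.pos != p.src.length())
            throw new IllegalArgumentException("unexpected input at " + p.pos);
        return result;
    }

    /** Recursive descent over integer literals and + - * / */
    private static final class Parser<T> {
        private final String src;
        private final ParseOutput<T> out;
        private int pos = 0;

        Parser(String src, ParseOutput<T> out) {
            this.src = src;
            this.out = out;
        }

        // expr ::= term (('+' | '-') term)*
        T expr() {
            T left = term();
            while (peek() == '+' || peek() == '-') {
                char op = src.charAt(pos++);
                left = out.binary(op, left, term());
            }
            return left;
        }

        // term ::= number (('*' | '/') number)*
        T term() {
            T left = number();
            while (peek() == '*' || peek() == '/') {
                char op = src.charAt(pos++);
                left = out.binary(op, left, number());
            }
            return left;
        }

        T number() {
            int start = pos;
            while (Character.isDigit(peek()))
                ++pos;
            if (start == pos)
                throw new IllegalArgumentException("number expected at " + pos);
            return out.number(Integer.parseInt(src.substring(start, pos)));
        }

        private char peek() {
            return pos < src.length() ? src.charAt(pos) : '\0';
        }
    }
}
```

Because the parser only talks to the interface, SpikeParser.parse("1 + 2 * 3", new SpikeParser.AstOutput()) gives "(+ 1 (* 2 3))" for testing, while the same call with EvalOutput gives 7.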

I found one book on this - Pro DLR in .Net 4 - which has helped. There's also quite a lot of material on-line.

Another nice feature is that the DLR is just a library - it doesn't involve any .Net internals. And even better, it's open source on CodePlex. It's troubling that it hasn't been worked on since 2010, but since the "dynamic" keyword in C# uses the DLR, I assume it won't be going away.

So far, I'm still quite positive about C# and the possibility of nSuneido. My primary motivation is that cSuneido is falling behind jSuneido, but I'm really not keen on doing a lot of work on the C++ code. And even if I did, it wouldn't solve the garbage collection issues. And we can't switch to jSuneido for the client because of the current Windows specific GUI.

Sunday, August 26, 2012

nSuneido ?

It's been a long time since I've looked very closely at C# and .Net so when I saw a special on the C# 5.0 Pocket Reference I figured I should catch up. I read it cover to cover, and surprisingly (to me) I was quite impressed. C# has come a long way since version 1. It's certainly moved much faster than Java. In my recent System Programming Languages post I discounted C#, but I'm starting to wonder if that was a mistake.

* "nSuneido" comes from the convention of using 'n' for .Net projects e.g. NUnit, NHibernate, etc.

I also picked up C# in Depth by Jon Skeet (of StackOverflow fame). It was perfect for me because it covered the evolution of the language, not just how to use it.

One concern with C# and .Net is portability. But Mono seems to be in good shape and in active development. So far all my playing with C# has been on the Mac with Mono and MonoDevelop. Of course, Mono lags somewhat behind - it's on .Net 4.0, whereas the latest is .Net 4.5. And its performance isn't quite as good, but it's not bad.

There are a lot of things I like about C# and .Net (in no particular order)

It meets my mandatory entry level requirements of safety and garbage collection
Unlike Java, it has value types (in addition to reference types)
Like Java, .Net has a VM and IL (byte code) that Suneido can target
(whereas with native languages like C++ Suneido has to implement its own VM)
Stable, robust, performant platform
Rich ecosystem of books, tools, libraries, developers, etc.
Easier to find programmers who know C#
Extension methods
Class members and fields default to private
Decimal floating type (one less thing to implement)
Lambdas :-)
Delegates (method handles)
Named and optional arguments
Type inference (var) although only for local variables
Dynamic language runtime (DLR)
Good access to Windows API's (PInvoke)
Annotations
Anonymous types and tuples
Operator overloading
Nullable (a bit like Scala Option)
Unsafe code if needed
Good IDE support
Generics on unboxed primitives (unlike Java)(no need for Trove collections)
Reified generics (unlike Java)
LINQ

There are, of course, some things I don't like. I'd really prefer optional semicolons (like Scala and Go), but I can live with them. It's taking a little getting used to capitalizing public methods and fields even though this is similar to Suneido. I miss Java's static imports - there's no way (that I've found) to have something you can call as simply Foo(). It always has to be Something.Foo()

With Java I could get almost all the tools and libraries I needed open source. With C# it seems a lot more commercial. Visual Studio and popular add-ons like ReSharper and Reflector are all commercial products.

I'm maybe just not up to speed on the C#/.Net ecosystem, but it doesn't seem like there are as many good libraries available. I would miss Guava. I need to learn more about what is and isn't available in the standard .Net libraries.

Once more, I started by implementing the Suneido lexer. This is turning into a coding kata for me, but I still find ways to improve or simplify each time I re-implement it. I took advantage of a number of C# features like extension methods and even lambdas. The code is on GitHub along with the D version. By the time I finished this exercise in D I had lost much of my enthusiasm and was somewhat disillusioned. In contrast, after the C# implementation I was still quite keen. In the long run, that probably doesn't mean much, but it's a good sign. As much as we might like these decisions to be objective it often comes down to subjective factors.

Ideally I would have just a single implementation of Suneido. (There are sometimes advantages to multiple implementations, but it's a lot more work.) Although the Java version is great for larger servers, it's not well suited to the current Windows GUI. And it's not great for smaller servers (e.g. smaller EC2 instances). An intriguing possibility is using Scala on both the JVM and .Net. I really like Scala, and the idea of writing one implementation that could be compiled to run on either the JVM or .Net is very attractive. The .Net version of Scala isn't production ready yet, but it's something to keep an eye on.

LastPass + Safari + Mountain Lion problem solved

Since I upgraded to Mountain Lion, every time I opened Safari I got: "Safari 6.0 has not been tested with the plugin LastPass 1.73.0". Except I wasn't using LastPass in Safari and 1.73 is a really old version. Safari seemed to work fine, but it was annoying, even though I don't use Safari as my primary browser (I mostly use Chrome, and Firefox for LogMeIn). In case anyone else out there has the same problem, I'm posting how I solved it.

I tried to track it down several times with no success. Safari > Preferences > Extensions didn't show anything. I couldn't find anything useful searching on the web. Searching for "lastpass" files didn't uncover anything. I tried deleting extensions.plist and the Safari cache.

Finally I saw a suggestion (for an unrelated problem) to download LastPass and run the uninstaller. The closest to 1.73 I could find was LastPass 1.75 on the LastPass old downloads page. I opened the dmg and ran the uninstaller, and that solved the problem. Now Safari opens with no error messages. I still don't know where the extension was lurking though! (I even checked the Trash after running the uninstaller, but couldn't see anything.)

Wednesday, August 22, 2012

D-etractions

I love the potential of the D language, but I have to admit I'm becoming a little disillusioned with the current reality. This is, of course, extremely subjective. I don't want to get in a fight with D supporters. I agree it's a great language. To me, most of the issues revolve around a lack of maturity and a small user base. If I was just playing, that wouldn't be a big deal. But for an implementation language for Suneido, I'm looking for something rock solid. Java may be old and boring and "weak", but it's solid.

One of the big issues for me is the lack of tools, especially good IDE support including refactoring. You know this is unlikely to be a priority when the main developers are old school text editor users. (In my mind I can hear them saying "we don't need no stinking IDE".) Scala had the same problem until recently, also due to the main developers not using IDEs. But recently they've made IDE support more of a priority. Maybe not so coincidentally, this coincided with the formation of a commercial entity (Typesafe) promoting Scala. I've done my share of old school editor + command line programming, but to me, a large part of the benefit of statically typed languages is that they allow powerful IDE manipulations.

A similar issue is the meager ecosystem e.g. books, libraries, tools, etc. I look at the libraries and tools I use with Java and few, if any, are available for D.

One thing that makes me a little nervous is D's fairly simple (from what I've seen) garbage collector. When I think about the effort that has gone into the JVM garbage collectors, I wonder how D will compare. D's garbage collector is also conservative in at least some situations (I'm not sure of the details). Based on my experience with conservative garbage collection in cSuneido, this increases my nervousness.

D's templates, compile time function evaluation, and mixins also worry me. They are very powerful and very cool, and they might be way better than C++ templates, but they're still complex. Of course, it's the usual story, you judge a new language when you're a newbie, so you don't necessarily have the basis to form a good opinion. But we seldom have the time to become expert before making a decision. I do have a fair amount of experience with C++ templates to judge by. Templates, as in D and C++, are undeniably powerful. Much more so than the generics in Java or C#. But I wonder if they are on the wrong side of the power / complexity balance. And these advanced features appear to be absorbing much of the energy of the development, to the neglect or even detriment of the overall maturity.

I love the potential of explicit "pure" and "immutable". I wish other languages like Java and C# had more of this. But from the meager exposure I've had, the reality is not as nice. I'm sure some of that is simply learning curve. And some of it may be improved by future language improvements. But when you can't put immutable values in a container without wrapping them in a "Rebindable" template, that again makes me nervous.

Of course, you could use a subset of D and try to ignore templates etc. but this is hard psychologically (I got sucked into mixins just to implement lexer tokens!), and also because the libraries make heavy use of templates. Then you'd primarily be consuming templates, not writing them, but you still need some understanding even to use them.

One of my concerns with Suneido implementation is making it easier for other programmers to get involved in the implementation. (i.e. improving the bus factor) In that respect, I think the jSuneido Java code is much better than the cSuneido C++ code (with templates!). And it's probably easier to find Java programmers than C++ programmers, let alone D programmers. (Of course, the challenge of finding good programmers remains.)

I plan to keep an eye on D and hopefully continue to play with it a bit. I wish the project good luck. But for now, I am going to put on hold any ideas of using it to replace cSuneido.

See also: D-tours and System Programming Languages

Saturday, August 18, 2012

Btree Range Estimation

Suneido's database query optimization needs to select which indexes to use. Part of this is estimating what portion of a table is spanned by a given range of index keys.

My assumption in the past was that a crude approximation was sufficient, since usually the best index is obvious. So the estimation would only look at one or two of the top levels of the btree.

But recently, as a result of some performance work on our applications, we've found a number of cases where this isn't true, where you need a more accurate estimate in order to choose the best index. This is usually with large tables, e.g. millions of records. With big tables like this, only looking at the top one or two levels of the tree is not sufficient to differentiate e.g. an index range of a few records versus an index range of thousands of records. In this situation, the "wrong" choice can make quite a difference to performance.

One of the sources of inaccuracy is from estimating the size of a Btree level. So my first idea to make it more accurate was to look at all the nodes on each of the top levels in order to get an exact count. I implemented this, and it was an improvement, but the downside was that it could end up reading a lot of nodes because Btrees have a large fan-out.

So I went back to my old approach of doing searches for the beginning and ending values of the range, and estimated the level sizes and therefore the portion of the tree between them. I made a better estimate of the level size by basing it on the size of the nodes passed through on the search. Since this approach only does two searches down the tree (not reading whole levels) it's reasonable to go further down the tree to get a more accurate estimate. In the end, I found some cases required searching all the way down the tree to the leaves. This is ok, since Btrees are shallow due to their large fanout. (Even going all the way to the leaves is still not 100% accurate because you're still only approximating the level sizes on the way down.)

This was better, but there was still an annoying flaw - because node sizes vary, if a search happened to pass through smaller nodes, it would estimate different level sizes than a search that happened to pass through larger nodes. Different level size estimates would result in poorer estimates. Differences in level size estimates near the top of the tree were especially bad since their effect was multiplied. For small ranges (e.g. "close" values) this wasn't a problem because the two searches would pass through the same nodes at the top levels of the tree and therefore estimate the same level sizes. But larger ranges could be less accurate.

The improvement I came up with was to use the average of the two searches' estimates of level sizes. Not only does this give a better estimate, but it also means that the two searches use consistent level size estimates, which is just as important as the estimates being accurate. It was a little more awkward for coding since the averaging had to be done in the middle of the searches. I originally wrote the code as recursive, but ended up with an iterative approach since it was shorter and simpler. (Not because it might be faster.)
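To make the idea concrete, here is a minimal Java sketch of the two-search estimate with averaged level sizes. It is not the jSuneido rangefrac code: the in-memory Node class, the int keys, and the names RangeEstimate, indexFor and rangeFrac are all assumptions made up for illustration, and it assumes every leaf is at the same depth.

```java
// Hypothetical sketch: estimate what fraction of a B-tree lies in [from, to).
final class RangeEstimate {

    // Simplified in-memory node, for illustration only.
    static final class Node {
        final int[] keys;       // leaf: the stored keys; internal: separator keys
        final Node[] children;  // null for a leaf

        private Node(int[] keys, Node[] children) {
            this.keys = keys;
            this.children = children;
        }

        static Node leaf(int... keys) {
            return new Node(keys, null);
        }

        // children[i] holds the keys below separators[i]; the last child holds the rest.
        static Node internal(int[] separators, Node... children) {
            return new Node(separators, children);
        }

        boolean isLeaf() {
            return children == null;
        }

        // Number of entries in this node: keys for a leaf, children otherwise.
        int size() {
            return isLeaf() ? keys.length : children.length;
        }

        // Leaf: count of keys less than key. Internal: index of the child to descend into.
        int indexFor(int key) {
            int i = 0;
            if (isLeaf())
                while (i < keys.length && keys[i] < key) i++;
            else
                while (i < keys.length && keys[i] <= key) i++;
            return i;
        }
    }

    // Walks both search paths together, one level per iteration.
    static double rangeFrac(Node root, int from, int to) {
        Node a = root, b = root;
        double fromPos = 0, toPos = 0; // positions as fractions of the whole tree
        double width = 1;              // fraction of the tree covered by one entry
        while (true) {
            // Both searches share one level size estimate: the average node size.
            double levelSize = Math.max(1, (a.size() + b.size()) / 2.0);
            width /= levelSize;
            int ia = a.indexFor(from), ib = b.indexFor(to);
            fromPos += ia * width;
            toPos += ib * width;
            if (a.isLeaf())
                break; // assumes uniform depth, so b is a leaf too
            a = a.children[ia];
            b = b.children[ib];
        }
        return Math.max(0, Math.min(1, toPos - fromPos));
    }
}
```

Because both positions are scaled by the same width at every level, a small range whose searches share a path comes out small no matter how large or small the nodes on that path happen to be.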

This may be a known technique, but I haven't encountered it so I thought I'd share it. If anyone is aware of better approaches, let me know.

jSuneido btree source (scroll to the end for the rangefrac methods)

Monday, August 06, 2012

Security

A friend recently had their Gmail account hacked. They had a strong password, but they admitted they had used the same password for other sites. Luckily, they discovered it quite quickly and it doesn't appear that any harm was done.

It's a good idea to use a separate, strong password for your email account because your email account is the "master key" to all your other online accounts. (most password recovery mechanisms use email) I do sometimes reuse the same password, but only for unimportant things like forums or bug trackers which don't have credit card numbers or any other critical information.

This incident gave me the push to get around to setting up 2 factor verification on my Gmail account, something I've been meaning to do for a while.

It wasn't too painful. I installed the Google Authenticator app on my iPhone so I don't have to use SMS. I also set up my LastPass account to use the Google Authenticator as well.

Most people can't even manage to use decent passwords, let alone deal with 2 factor authentication. But if you can put up with the small extra hassle, it's probably worth it.

Also on the security front, I recently decided to install anti-virus software on our Macs. There's still debate on whether this is necessary, but with Apple's growing popularity, they're becoming an increasingly attractive target. I picked the free version of Sophos to start. It's always hard to tell how good anti-virus software really is, but this was easy to install, hasn't had any noticeable effect on performance, and is completely unobtrusive. Of course, it's yet one more piece of software running on your machine, and can't help but slow down startup.

Sunday, August 05, 2012

D-tours

I've been continuing to spend a little time here and there playing with the D language. For a sample project I decided I would try implementing Suneido's lexical scanner. It's pretty simple, just a little string and character manipulation so I thought it would be easy.

I based the D version on the Java version since it's newer and cleaner than the C++ version.

The first thing I ran into was that the lexer returns "tokens", which in the Java code are not just integer enums, but actual enum classes with fields and methods. Java went from no enums in early versions, to quite sophisticated enum capabilities.
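For readers who haven't used them, here is a rough sketch of what a Java enum "with fields and methods" looks like. The constants, the keyword field, and the isKeyword and lookup helpers are invented for this example; it is not the actual jSuneido Token class.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical token enum, for illustration only (not the jSuneido source).
enum Token {
    IF("if"), ELSE("else"), AND("and"), OR("or"),
    IDENTIFIER(null), NUMBER(null);

    // The keyword text, or null for tokens that are not keywords.
    final String keyword;

    // Map of keyword strings to tokens, filled in once when the enum is loaded.
    private static final Map<String, Token> KEYWORDS = new HashMap<>();
    static {
        for (Token t : values())
            if (t.keyword != null)
                KEYWORDS.put(t.keyword, t);
    }

    Token(String keyword) {
        this.keyword = keyword;
    }

    boolean isKeyword() {
        return keyword != null;
    }

    // Returns the keyword token for the text, or IDENTIFIER if it is not a keyword.
    static Token lookup(String text) {
        Token t = KEYWORDS.get(text);
        return t != null ? t : IDENTIFIER;
    }
}
```

Each constant is a unique object, so tokens can be compared with ==, and they still get consecutive ordinal() values automatically.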

In typical new user fashion, I tried to reproduce the Java style token enum in D. (It's hard to avoid "cutting against the grain" when learning a new language.) D does allow you to use structs for enums so it seemed this would work. But if you use structs then you lose the automatic assignment of consecutive integer values, and D structs are value types so they aren't necessarily unique.

Then I went off on a tangent looking into D's "mixin" ability. D has the ability to run D code at compile time, generate a string, and insert that string into your source code. This is known as CTFE (compile time function evaluation). For example, you can write:

mixin(myfn(arguments...));

where myfn returns a string containing source code to be compiled into your code in place of the mixin "call". This is easier than trying to use C++ templates to do compile time calculations, but not as seamless as Lisp macros because you have to work at the level of strings.

An example of the power of this is the Pegged library for D. This allows you to include a grammar specification in your source file, have a parser generated from it, use it to parse DSL code also included in your source file, and then use the resulting parse tree to generate D code - all of this at compile time, without requiring any external tools.

It was fun looking at what you could do with mixin but it was overkill for what I was trying to do. I realized I could use another feature of D to use simple integer enum tokens, but still work with them as if they were objects with properties.

enum Token { IF, ELSE, AND, OR, ...
bool isKeyword(Token token) { ...

D allows you to call functions like isKeyword as if they were methods on their first argument, so you can say token.isKeyword() even though token is an int, not an object.

This also led me to discover D doesn't have a "set" data structure, which surprised me a little. It does have hash maps built-in to the language, which can be used to emulate sets, and it has a red-black-tree implementation which could also be used. But it doesn't have an actual "Set" class. It was easy enough to throw together my own Set so I could make a set of keywords. (Which I didn't end up using because what I really needed was a map of keyword strings to tokens.)

What I initially wrote was:

static auto keywords = new Set!Token(IF, ELSE, ...);

But that won't work because static initializers can only be simple literals. However, you can write:

static Set!Token keywords;
static this() { keywords = new Set!Token(IF, ELSE, ...); }

I'm not sure why the compiler can't do this same translation as simple syntactic sugar. It's extra annoying because you can't use the "auto" type inference.

You could do it with a mixin like:

mixin(stat("keywords", "new Set!Token(...);"));

but that's not exactly pretty.

Another minor annoyance was the lack of an equivalent to Java's static import. If I want to refer to the tokens as just IF or ELSE I could make them "anonymous". But then they don't have a specific type. Or I can give the enum a named type, but then I have to always reference them with Token.IF. In Java I could say import static Token.* and then just use the bare names.

I ended up going back to a mixin approach with each Token a static class. I used a class rather than a struct because I wanted to pass tokens around by reference, not by value. See token.d

I also ended up writing multiple versions of the lexer. As with most rewrites, I found a number of improvements I could make. Most of them aren't specific to D, I could apply them to the other versions as well.

After getting it working, I decided to go back and write it in a functional (rather than object oriented) style. In this version the lexer is a function that takes a string and returns a struct with the token, its string value, and the remainder of the source string. This version should work with Unicode UTF8 strings, not just ASCII, because I only access the source string with front and popFront.
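The shape of that functional lexer can be sketched in Java as a static function that returns an immutable result. This is only an illustration of the idea, not a port of the D code: the FunctionalLexer, Kind and Result names and the tiny token set are assumptions, and unlike the D version it indexes by char (UTF-16 code unit) instead of using front and popFront.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a lexer written as a pure function.
final class FunctionalLexer {
    enum Kind { IDENTIFIER, NUMBER, OTHER, END }

    // Immutable result: the token kind, its text, and the unconsumed source.
    static final class Result {
        final Kind kind;
        final String value;
        final String rest;

        Result(Kind kind, String value, String rest) {
            this.kind = kind;
            this.value = value;
            this.rest = rest;
        }
    }

    // Scans one token from the front of src; no state is kept between calls.
    static Result next(String src) {
        int i = 0;
        while (i < src.length() && Character.isWhitespace(src.charAt(i)))
            i++; // skip leading whitespace
        if (i >= src.length())
            return new Result(Kind.END, "", "");
        int start = i;
        char c = src.charAt(i);
        Kind kind;
        if (Character.isLetter(c)) {
            kind = Kind.IDENTIFIER;
            while (i < src.length() && Character.isLetterOrDigit(src.charAt(i)))
                i++;
        } else if (Character.isDigit(c)) {
            kind = Kind.NUMBER;
            while (i < src.length() && Character.isDigit(src.charAt(i)))
                i++;
        } else {
            kind = Kind.OTHER; // any other single character
            i++;
        }
        return new Result(kind, src.substring(start, i), src.substring(i));
    }

    // Usage: keep feeding the remainder back in until END.
    static List<String> values(String src) {
        List<String> out = new ArrayList<>();
        for (Result r = next(src); r.kind != Kind.END; r = next(r.rest))
            out.add(r.value);
        return out;
    }
}
```

Since next has no hidden state, the caller decides how far to scan, and re-lexing from any saved remainder is trivial.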

Along the way I also started a simple version of Hamcrest style asserts. I almost expected something like this to be available, but I couldn't find anything.

One of the features I was excited about with D was immutability. But when I tried to make Token immutable, I ran into all kinds of problems (as others have also). One of them being that you can't store immutable objects in a map! Hopefully this is something they'll figure out. I eventually gave up on immutable and managed to get const to work, although even that was a struggle.

I set up a dsuneido GitHub repository for my experimenting. I haven't used GitHub before, but I keep hearing how great it is so I thought I better give it a try. I used their Mac GUI app, which made it quite painless. However, I'm not sure it's the smartest idea to be using three different version control systems - Subversion for cSuneido, Mercurial for jSuneido, and Git for dSuneido. I chose Mercurial for jSuneido because it had better Windows support than Git at that time, but now that Eclipse comes with built-in Git support, I wonder if I made the right choice.

I'm not sure where I'm going with this; so far I'm just playing and learning D. I like it, but it certainly has some rough edges.

Saturday, July 28, 2012

Upgrading to Mountain Lion

I couldn't resist upgrading to Mountain Lion right away. It's gone quite smoothly.

I have multiple machines to update, so rather than download 4 GB multiple times, I used the free Lion Disk Maker to make a "disk" that I could use on each machine. WARNING: If you want to make a disk you have to do it after you download from the App Store, but before you install (because the install removes the files). You need an 8 GB device. I didn't have an 8 GB USB thumb drive handy so I used an SD card from my camera. You can do the same process manually (as I did last time with Lion), but the disk maker utility makes it easy.

It took something like an hour to run the update from the SD card on each of my iMac and MacBook Air. I also updated to Xcode 4.4 (free from the App Store) and installed JDK 7u5 (OS X is now one of Oracle's supported platforms for Java). The extra installs were probably the wrong thing to do in terms of isolating the source of problems, but it was nice to get a bunch of updates done at once.

Here are the issues I've run into so far:

The first issue was Gatekeeper stopping me from installing programs that didn't come from the App Store. I went to turn this off but I found that you can control click on programs and override Gatekeeper, so I left it turned on.

TIP: Update all your software to the latest versions before upgrading to Mountain Lion. My iMac was up to date, but my MacBook Air wasn't. After the upgrade Parallels and Dropbox had been moved to an Unsupported Applications folder. I just downloaded and installed the latest versions and I was fine, but it would have been easier to update first.

I had to reinstall Mercurial tools (as I did when I upgraded to Lion).

When I tried to run the Java Preferences (Applications/Utilities) it said I required Java 6 and offered to install. I already had Java 6 but I told it to go ahead. It now shows Java 6 32 bit, Java 6 64 bit, and Java 7. I made Java 7 my default (by dragging it to the top of the list).

Eclipse seems to work fine, but JUnit Max quit working. It seemed to be running the tests, but the success/fail indicator stayed blank. This could have been an OS X issue or it could have been from switching to Java 7. Rather than try to debug it I just switched to Infinitest. It has quite similar functionality and it's open source.

The D compiler couldn't find GCC, which I solved by re-installing the Xcode Command Line Tools (Preferences > Download)

Parallels Coherence mode has some issues with window layering that are annoying but not fatal; hopefully they'll fix it soon.

So far I haven't really noticed much difference running Mountain Lion. Despite the "200+ new features" there's not a lot that's significant to me. That's fine, I still prefer to stay on the latest version, for security improvements if nothing else.

See also: Upgrading to Lion

Thursday, July 26, 2012

More on Parameters

We have a lot of Suneido code where the class constructor simply sets members (i.e. instance variables or fields) from its parameters:

New(foo, bar)
    {
    .foo = foo
    .bar = bar
    }

which is similar to what you'd do in Java:

class C {
    int foo;
    String bar;
    C(int foo, String bar) {
        this.foo = foo;
        this.bar = bar;
    }
}

Scala has a nice shortcut for this:

class C(foo: Int, bar: String) { ... }

For Suneido, I decided to allow:

New(.foo, .bar)

This would be equivalent to the first example. (with foo and bar still available as regular parameters as well)

You can also combine this with implicit dynamic parameters:

New(._foo)

So you'd end up with a foo member in the instance, that could either be passed explicitly, or could come from a dynamic variable set in one of the callers.

The way this is implemented, it will also work on regular methods, not just the constructor. I can't see a lot of use for that other than setters, but it was easier than restricting it.

I implemented this (along with dynamic implicit parameters) first in jSuneido, which was fairly straightforward, and then in cSuneido which was a little trickier but not too bad. (I have to say I prefer working on the Java version of Suneido these days. The code is cleaner, and the tools are better.) The changes are in version control on SourceForge.

Friday, July 13, 2012

Implicit Dynamic Parameters

Don't worry if you don't understand the title - it's a term I just made up. Hopefully by the end of this post you'll know what I mean by it.

They say a big part of creativity is combining existing ideas in new ways. This idea is a combination of ideas from Suneido (via Lisp), Scala, and dependency injection.

Suneido has a little used feature where a variable that starts with an underscore is "dynamic", meaning it is available to functions that it calls, directly or indirectly. Nowadays, most languages are "statically" (i.e. lexically) scoped because that makes the code easier to understand. But Lisp (and probably other languages) had "dynamically" scoped variables where you could access variables from up the call stack.

We've never made much use of dynamic variables in Suneido and I regarded them as deprecated. I didn't even implement them in jSuneido. (Although that was partly because I thought it would be hard.) But lately I've been thinking of some uses for them, so I went back and implemented them in jSuneido. (It didn't turn out to be too hard.)

Meanwhile, Scala has something called "implicit parameters" where a parameter marked implicit can be automatically supplied by a suitable value in the current (lexical) scope. I didn't immediately see the benefits of that feature, but it does let you do some nice things.

I realized I could combine these ideas in Suneido and allow implicit parameters that can be supplied by dynamic variables. (But could still be passed "manually" as well.) A neat idea and I can see some uses, but no big deal.

Then, one of the books I'm reading was talking about dependency injection. Dependency injection relies on having some kind of "context" that does the injection. But where do you get the context from? Making it global is ugly, passing it around is ugly.

I realized that I could use my newly imagined dynamic implicit parameters to do dependency injection. Nice!

You could either use dynamic implicit parameters to inject actual dependencies, or in more complex scenarios, use it to pass some kind of context (or factory, or service locator).

For testing (or just more control) you can still pass the parameters manually.
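Java has no dynamic variables, but the idea can be approximated to show how the pieces fit together. In this sketch the Dynamic and Report classes, the with and get methods, and the _user variable are all hypothetical; it emulates dynamic scope with a per-thread stack of bindings and uses an overload to stand in for the implicit parameter. It says nothing about how Suneido actually implements the feature.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical emulation of dynamically scoped variables.
final class Dynamic {
    // One stack of values per name, per thread; the top is the innermost binding.
    private static final ThreadLocal<Map<String, Deque<Object>>> BINDINGS =
            ThreadLocal.withInitial(HashMap::new);

    // Binds name to value (which must not be null) while body runs, then restores.
    static <T> T with(String name, Object value, Supplier<T> body) {
        Deque<Object> stack =
                BINDINGS.get().computeIfAbsent(name, k -> new ArrayDeque<>());
        stack.push(value);
        try {
            return body.get();
        } finally {
            stack.pop();
        }
    }

    // Looks up the innermost binding set by any caller up the stack.
    static Object get(String name) {
        Deque<Object> stack = BINDINGS.get().get(name);
        if (stack == null || stack.isEmpty())
            throw new IllegalStateException("no dynamic binding for " + name);
        return stack.peek();
    }
}

final class Report {
    // Explicit form: the dependency is passed "manually", e.g. from a test.
    static String header(String title, String user) {
        return title + " (prepared for " + user + ")";
    }

    // Implicit form: the missing argument comes from the dynamic variable _user.
    static String header(String title) {
        return header(title, (String) Dynamic.get("_user"));
    }
}
```

A caller anywhere up the stack can then write Dynamic.with("_user", "alice", () -> Report.header("Sales")) without the intermediate functions having to know about or pass along the user.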

It seems like a pretty slick idea, but of course, that's partly because it is still just an idea. The real test is when the rubber meets the road.

Thursday, July 12, 2012

System Programming Languages

Or, more specifically for me, languages I'd consider using to implement Suneido. Here are some opinionated thoughts on C++, Java, Scala, Xtend, D, and Go.

First, I want garbage collection. Other than for things like constrained devices, as far as I'm concerned garbage collection has won.

For me, that's a major drawback to C++. So I wrote my own garbage collector for C++ which we used for a number of years. Eventually I replaced it with the Boehm garbage collector which has worked quite well, given the limitations of conservative garbage collection.

The last few years I've been using Java which gave me great garbage collection and one of the best virtual machines out there.

The big advantage to Java is the rich ecosystem. You have multiple choices for great IDEs (e.g. Eclipse, NetBeans, IntelliJ) with good refactoring support, lots of tools like JConsole and VisualVM, good libraries, both standard and third party (e.g. Guava and ASM), and tons of books and other sources of information.

On the other hand, the Java language itself is nothing to get excited about. Lately I've been looking at Xtend and Scala. Xtend is fairly modest, it's basically a better Java. See my First Impressions of Xtend

Scala is much more ambitious. It has some great features. The claim is that the language is simpler than Java. But I'm more than a little scared of the type system. Although it's that type system that allows some of the great features. Scala strikes me a bit like C++, you can do some amazing things, but it can also get pretty twisted.

I'd be more tempted by Scala, but as system programming languages, JVM languages all have a major drawback - you can't write efficient low level code. Don't get me wrong, I have no desire to write "unsafe" code. I don't want to go back to my C and C++ days. Java has primitive (non-heap) types, but they don't work with generic code (without boxing onto the heap). I'd like to have "value" types (e.g. a pair of ints) that I could pass and return by value (not on the heap) and embed (not reference) in other data structures. Scala tries to unify primitives with the other types, but ultimately it comes down to the JVM, and there's still a lot of boxing going on.

Also, Java is primarily in the locks and shared state style of concurrency, a style whose drawbacks are well known. Scala is better on this front with its actor system, which admittedly is also possible from Java.

What else is out there? C# is a possibility, but I have to say I'm leery about Microsoft products. And portability means Mono, which is another question mark in my mind.

Recently I've been rereading the D book by Andrei Alexandrescu (also author of Modern C++ Design). You can get a taste of his writing style in The Case for D. In some respects the D language could be called a better C++. But it has gone beyond that and has some very attractive features. It has value types as well as reference types. You can write efficient code like C++, but still make it safe.

D has garbage collection, although I suspect (with no evidence) that it's not as good as the JVM. Of course, part of the reason so much effort has gone into JVM garbage collection is that everything has to go on the heap. That doesn't apply quite so much in D (depending on the style of your code).

One of the features I really like about D is its support for immutability and pure functions. My feeling is that these are really important, especially for concurrent code. You can, of course, write immutable data structures and pure functions in any language, but D actually lets you enforce them. Unlike C++ const or Java final, D's immutable and pure are "deep". Making an array final in Java doesn't stop you from modifying the contents of the array. Making an array immutable in D does.
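The Java half of that comparison is easy to demonstrate. In this small sketch (the class and field names are made up for the example), final only fixes the reference, and the closest standard library substitute, an unmodifiable list, is only checked at run time:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class FinalIsShallow {
    // final prevents reassigning PRIMES, but not changing its elements.
    static final int[] PRIMES = {2, 3, 5, 7};

    // An unmodifiable view rejects writes, but only when the write is attempted.
    static final List<Integer> SAFE_PRIMES =
            Collections.unmodifiableList(Arrays.asList(2, 3, 5, 7));

    public static void main(String[] args) {
        PRIMES[0] = 4; // compiles and runs: the contents are not protected
        // PRIMES = new int[4]; // would not compile: cannot assign a final variable
        System.out.println(PRIMES[0]);
        try {
            SAFE_PRIMES.set(0, 4);
        } catch (UnsupportedOperationException e) {
            System.out.println("write rejected at run time");
        }
    }
}
```

Nothing in the Java type system stops the first write, which is the gap that a "deep" immutable qualifier closes at compile time.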

One minor disappointment is that D still requires semicolons. I think languages like Scala and Go have shown that it's quite feasible to skip the semicolons. But I could live with this.

I'm not against virtual machines like the JVM or CLR. But they have their drawbacks, especially for deployment. Ensuring that customers have the right version of .Net is bad enough and that's a Microsoft product on a Microsoft platform. We can require our customers install Java, but when they don't update it and there's a security hole, guess who they will blame? One of Suneido's main features is that it is self contained and easy to deploy. Requiring another runtime underneath it is not ideal. So D and Go generating native apps is attractive in this respect.

Another advantage to D is that (I think) it would allow accessing the Win32 API so Suneido's current user interface would still work. (albeit still only on Windows.)

Yet another possibility is Google's Go language. Personally I find it a little quirky. It's got some good features, but it's also missing things that I like (e.g. type safe containers). I'm currently reading Programming in Go but I'm finding it somewhat dry. I find it a lot easier to learn about a language from a well written book. I didn't really get interested in D until the book came out. And books got me interested in C and C++ and Scala.

One of the things discouraging me from Scala and D and Go is the lack of IDE support. That's gradually improving, but things like refactoring are still pretty limited. I programmed for years with just an editor, but once you get accustomed to a good IDE with refactoring support, it's hard to go back. To me, one of the big advantages of a statically typed language is that it allows great tooling. Unfortunately, Java seems to have a big lead in this area.

Currently, if I was to consider another language for implementing something like Suneido, I think I'd lean towards D.

Monday, July 09, 2012

Struggling with Eclipse CDT

I just spent a frustrating day trying to get Eclipse CDT set up to work on cSuneido (C++).

I've tried this several times in the past but always had problems and given up.

Every time a new version of Eclipse comes out I read about the improvements in the CDT (C/C++ Development Tooling) project and I feel guilty that I'm not using it and embarrassed that I'm too stupid to set it up.

Surely if I just devoted some time to it I could get it working. Maybe, but a day wasn't enough time, and I'm not sure how many days I can stand banging my head against the wall on it.

I realize that a portable C++ IDE is a tough problem. C++ is a gnarly language, and there are multiple versions of multiple compilers. A good IDE has to actually understand the language, which means it has to deal with all the intricacies of C++. I'm sympathetic.

But when I take a clean install of MinGW GCC (one of the supported compilers), and a clean install of Eclipse Juno CDT, and let it create a sample "hello world" project, I'd expect it to work. Not for me. It took me a couple of hours to get rid of the errors in this 10 line project. And even after all that, I was no further ahead than when I started. I just thrashed around with the myriad settings until some combination worked. No doubt most of my thrashing made things worse and just muddied the waters, but what else can you do?

The problems were mostly related to getting the right include paths. That seems like a straightforward problem, that should have a straightforward solution. But there are about four different places where you can set up include directories, and at least two different "auto discovery" systems that try to do it for you. And it's not enough to give c:/mingw/include, that would be way too simple. Instead you have to give every sub-directory that mingw happens to use, most of which are under the lib directory for some reason.

It was doubly frustrating because the project would "build" and produce a runnable executable, all while showing major errors.

Searching the web wasn't much help. Lots of other people have similar problems, although mostly with older versions of Eclipse and CDT so the settings they talk about no longer even exist.

There is an "indexer" which also gives compile type errors, plus multiple "builders", including "managed" ones, and ones that use your makefiles, and ones that generate makefiles. I couldn't tell whether errors were coming from the indexer, or something calling the actual GCC compiler, or from linking, or who knows where.

Eventually I got hello world to build and not show any errors so I moved on to Suneido, albeit with a certain amount of trepidation. I managed to get the include directories sorted out without too much trouble since I had a vague idea of which areas to semi-randomly play with the settings.

Nevertheless, several times I had to exit completely out of Eclipse because it got into seemingly infinite loops of building the project over and over. I don't know what triggered this or how to avoid it.

By the end of the day I managed to get rid of all the actual errors, although there are still a ton of warnings. Part of the problem is that once you have an error, it may not go away even if you fix the problem. I spent all my time running Clean and Build and re-indexing in attempts to figure out if I had fixed the problem or not.

A bunch of the warnings were from functions that weren't returning values. These weren't bugs, they were calling functions that never returned (e.g. threw exceptions or exited). The functions were marked with __attribute((noreturn)) and this works fine in GCC. But not in CDT. I found an Eclipse bug for this and left a comment, but I hadn't been able to isolate a simple test case so, predictably, the response was less than satisfying.

(Note: The Suneido C++ code builds fine with both GCC and Visual C++ using make)

I really wanted this to work. I am quite happy with Eclipse with my jSuneido Java code. I would really like to have better navigation and refactoring tools for C++. But even if I could get it working without tearing all my hair out, I have zero confidence that it's going to be a stable platform for development. All I can envision are things breaking randomly and unpredictably with cryptic errors.

I don't blame the CDT developers. They've got a tough job and I'm sure it would have taken them minutes to sort out what I flailed on for a day. But in the end, that doesn't help me get my work done.

Maybe I'll look at the latest NetBeans - I haven't tried it lately. Of course, I'm even less familiar with NetBeans than I am with Eclipse, which doesn't bode well!

Monday, July 02, 2012

Moving from Eclipse Indigo to Juno

Eclipse 4.2 Juno was released on June 27. Usually I wait a while to make sure that plugins have been updated to work with the new version, but I thought I'd give it a try and I found that all my plugins still worked.

So far I haven't run into any problems with Juno. I've seen some complaints about the UI changes, but I don't mind it. I got a few warnings from the new null analysis, although none of them turned out to be actual problems.

I did run into an old problem when I updated my C++ (CDT) copy of Eclipse - it crashed on start up. I knew I'd run into this before but I couldn't remember the solution. It turned out to be a bug in some versions of Java. I found I wasn't running the latest and after I updated the problem went away. See Bug 333227.

It is possible to update an install of Eclipse 3.7 to 4.2 in place, but I prefer to do a clean install from a fresh download. It may be overly paranoid but I figure it can't hurt to start fresh. You can move your plugins over from a previous install using File > Import > Install > From Existing Installation. This saves re-installing them one at a time. You can also move your Preferences over by exporting them from your old copy and importing them into the new one. I haven't found any way to move my perspective layouts, but my normal layout is simple to recreate.

From my experience, if you're running Eclipse, I'd say give Juno a try.

Thursday, June 28, 2012

First Impressions of Xtend

I've been interested in Xtend for a while. It's an improvement over Java, without being a big jump like Scala.

I finally got around to playing with it a little. I took one of my Java classes from jSuneido and rewrote it in Xtend. But I got stuck with some weird errors around array access. I did some searching on the web and found that Xtend doesn't support array indexing e.g. array[i]. I was surprised. You can pass arrays and do for loops over them, but not index them.

I imagine part of the reason for this is that square brackets are used for closures, although in different contexts, so it seems like you could still parse it ok.

Then I saw you can write array.get(i) - more verbose, but more consistent with other containers. (Personally, I'd rather be able to use x[i] or x(i) like Scala for all the containers.)

Then I discovered that the way it handles .get(i) is to wrap the array in another class. This is similar to how Scala does implicit conversions to add methods to existing classes. There's nothing wrong with this approach, but it means allocating a new object for every array access. Considering arrays are usually used for low level efficiency, this doesn't seem ideal. Granted, in some cases the JIT compiler may be able to eliminate the allocation, and if this technique gets common, it's more likely it will be a target for optimization.

It seems like a better approach would be to recognize array.get(i) and compile it to array access byte codes, without any wrapping. Maybe that's hard to do in their compiler.

Another alternative is to leave array code in Java. Which is fine, but doesn't leave a good first impression.

One of the justifications given for not supporting array indexing is that you should use the higher level containers instead. I think there are times when the extra efficiency of bare arrays is justified, but for the most part I agree. And, in fact, the class that I converted could just as easily use ArrayList's rather than arrays, with likely not too much performance penalty.

Even after I fixed this I still had some weird errors left. It turned out that Xtend doesn't have increment and decrement operators (++ and --). Fair enough, Scala doesn't have them either. But I was more surprised to find that it didn't handle += either, although it is listed as an operator in the documentation. For a language that claims "Less Noise", it seems odd to have to write "i = i + 1" instead of "i++".

Once I got past these issues, I was pretty happy with the results. Even without using any of Xtend's more powerful features (like closures and extension methods) the code was cleaner and less verbose, although not radically so.

One of Xtend's claims is "top notch" Eclipse IDE support. Unfortunately, that doesn't mean much in the way of refactoring.

I'm still left with doubts about the future of Xtend. It doesn't have the momentum or backing of Scala. With a "smaller" project like this, it could easily stagnate, mutate, or die.

Saturday, June 23, 2012

Immudb in Production

We took the plunge and converted our (Axon's) in-house accounting and CRM system over to the new version of jSuneido with the immudb append-only storage engine. We have about 50 users so it's a decent test, although we're not as heavy users as some of our customers.

So far so good, it's been running for a couple of days with no glitches.

We'll give it a little longer and then look at converting actual customers.

Thursday, June 14, 2012

Optimizing Java Memory Usage

One of the things I've been meaning to work on for a while is the memory usage for temporary indexes in Suneido's database.

I started by measuring the current memory usage. I used two test cases, first a single table with 400,000 records sorting by a non-indexed field, and second the same table but joined with another one. I used Visual VM to watch the heap and force garbage collections to get an idea of memory usage. The first case used about 200mb and the second about 400mb. For the simple case, that's about 500 bytes per record. I was pretty sure I could improve this usage.

My assumption was that a lot of the usage was Java's per-object overhead.

For each record, the temporary index code created a new sort key record, plus an Object[] array that stored references to each of the source data records (more than one in the case of joins). These sort key arrays are then added to a MergeTree.

My first step was to switch from allocating a separate small array for each sort key, to putting them into a big array and referencing them by offset.

I could have used a standard ArrayList, but it works by doubling the size of its internal array each time it needs to grow. At very large sizes, this is not the best strategy. For example, to grow from 100mb to 200mb you need to allocate a new 200mb array, copy the old 100mb array into it, and then the old array has to be garbage collected. On top of this, if the data is long lived, it will probably end up being moved to older generations by the garbage collector, where it won't get garbage collected as easily.

Instead, I wrote an ArraysList which uses a bunch of medium sized arrays rather than one big array. To grow it simply allocates another medium size array, without any copying or garbage collecting. This is not necessarily better in every situation, but for my usage it is.
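The chunked-growth idea can be sketched roughly as follows. This is not the actual jSuneido ArraysList: the chunk size, the method names, and the one-reference-per-slot layout are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch of a growable list of references backed by fixed-size chunks.
 * CHUNK_SIZE and the method names are illustrative assumptions.
 */
class ArraysList {
    private static final int CHUNK_SIZE = 1024; // assumed "medium" size
    private final List<Object[]> chunks = new ArrayList<>();
    private int size = 0;

    /** Appends a value and returns the offset it was stored at. */
    int add(Object value) {
        if (size == chunks.size() * CHUNK_SIZE)
            chunks.add(new Object[CHUNK_SIZE]); // grow without copying existing data
        chunks.get(size / CHUNK_SIZE)[size % CHUNK_SIZE] = value;
        return size++;
    }

    /** Returns the value previously stored at the given offset. */
    Object get(int offset) {
        if (offset < 0 || offset >= size)
            throw new IndexOutOfBoundsException("offset " + offset);
        return chunks.get(offset / CHUNK_SIZE)[offset % CHUNK_SIZE];
    }

    int size() {
        return size;
    }
}
```

The outer list of chunks still grows by copying, but it only holds one reference per chunk, so it stays tiny compared to the data itself.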

The next problem was that now I was storing an int offset (into the ArraysList) instead of an array. My MergeTree is generic, which means to use it for ints they'd have to be boxed into Integers. Then I'd be back to too much per-object overhead. So I had to make a specialized IntMergeTree. It's too bad that Java makes such a big distinction between primitives and references, and that generics only work with references. (Scala tries to reduce the distinction, and it even lets you specialize a generic class for a primitive type, but it still ends up boxing and unboxing most of the time.)

In the common case where you're just sorting a single table, there's no need for the array at all, and instead of storing the ArraysList offset, you can just store the integer "address" of the record.

The next step was to store the sort key record in larger blocks of memory rather than individually, and again reference them by offset rather than a Java reference. For this I could use the MemStorage class I had already written to create in-memory databases for tests. It inherits from the same Storage base class as my MmapFile memory mapped file access. These are based on ByteBuffers. Each sort key record used at least three memory objects (the record, its ByteBuffer, and the ByteBuffer's array) so this was a significant saving. (This doesn't require much serialization because the sort key records are already in a serialized form.)
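As a rough illustration of referencing serialized records by offset instead of by Java reference, here is a minimal sketch. It is not the MemStorage class: BlockStore, the block size, and the 4-byte length prefix are hypothetical choices made only for this example.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch: many small serialized records packed into a few large ByteBuffers
 * and addressed by an int offset. All names and sizes are assumptions.
 */
class BlockStore {
    private static final int BLOCK_SIZE = 64 * 1024; // assumed block size
    private final List<ByteBuffer> blocks = new ArrayList<>();
    private int next = 0; // next free offset across all blocks

    /** Copies a serialized record into a block and returns its offset. */
    int put(byte[] record) {
        int needed = 4 + record.length; // 4-byte length prefix plus data
        if (needed > BLOCK_SIZE)
            throw new IllegalArgumentException("record too large for a block");
        int pos = next % BLOCK_SIZE;
        if (next == blocks.size() * BLOCK_SIZE || pos + needed > BLOCK_SIZE) {
            next = blocks.size() * BLOCK_SIZE; // skip the unused tail of the block
            blocks.add(ByteBuffer.allocate(BLOCK_SIZE));
            pos = 0;
        }
        ByteBuffer block = blocks.get(next / BLOCK_SIZE);
        block.putInt(pos, record.length);
        for (int i = 0; i < record.length; i++)
            block.put(pos + 4 + i, record[i]);
        int offset = next;
        next += needed;
        return offset;
    }

    /** Reads back a copy of the record stored at the given offset. */
    byte[] get(int offset) {
        ByteBuffer block = blocks.get(offset / BLOCK_SIZE);
        int pos = offset % BLOCK_SIZE;
        byte[] record = new byte[block.getInt(pos)];
        for (int i = 0; i < record.length; i++)
            record[i] = block.get(pos + 4 + i);
        return record;
    }
}
```

With this layout the per-record cost is the data plus a 4-byte prefix, rather than several separately allocated objects, each with its own header.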

The end result was that the memory usage was less than 1/4 of what it was before. My first case now took about 40mb instead of 200mb, and my second case took about 90mb instead of 400mb. Not bad for a couple of days' work.

I wouldn't normally recommend this kind of messing around - the overhead is seldom a big issue. But in the case of a server, with large numbers of users competing for memory, I think it was worth it.

Wednesday, June 06, 2012

Guava Overview

A good overview of Google's Guava library for Java.

AnOverviewofGuavaDevoxxFRApril2012.pdf - guava-libraries - DevoxxFR 2012 presentation slides

Wednesday, May 23, 2012

IllegalMonitorStateException and Stack Overflow

Just when I thought immudb was ready to deploy, I got sporadic IllegalMonitorStateException's.

I wasn't sure what that exception meant, but it sounded like concurrency, and that was not good news.

According to the documentation it comes from wait and notify. The catch is that my code doesn't directly use wait and notify. So something that my code uses is in turn using wait and notify.

I tried to catch the exception in the debugger to see where it was coming from, but of course, it never happened when I ran inside the debugger :-(

I started looking around at the docs for concurrency classes that I use and found that ReentrantReadWriteLock WriteLock unlock can throw IllegalMonitorStateException "if the current thread does not hold this lock".

Ouch. My server is NOT thread-per-connection. It uses a thread pool to service requests. So the thread that handles the request that starts a transaction (and acquires the lock) may not be the same thread that handles the request that ends the transaction (and releases the lock).
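The failure is easy to reproduce in isolation. The following is a minimal sketch, not the server code: the class and method names are made up, and the two threads stand in for two pool threads handling the start and end of one transaction.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Sketch: releasing a ReentrantReadWriteLock write lock from the wrong thread. */
class CrossThreadUnlockDemo {
    /** Returns the exception seen by the second thread, or null if none. */
    static Throwable unlockFromOtherThread() {
        final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        final Throwable[] result = new Throwable[1];

        lock.writeLock().lock(); // "start transaction" on this thread

        Thread other = new Thread(new Runnable() {
            @Override
            public void run() {
                try {
                    lock.writeLock().unlock(); // "end transaction" on another thread
                } catch (IllegalMonitorStateException e) {
                    result[0] = e; // this thread does not hold the lock
                }
            }
        });
        other.start();
        try {
            other.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        lock.writeLock().unlock(); // the owning thread can still release it
        return result[0];
    }
}
```

Calling unlockFromOtherThread() returns an IllegalMonitorStateException, because the lock tracks which thread owns it and only lets that thread release it.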

Because I'm not testing with lots of connections, most of the time all the requests will be handled by the same thread and it will work. But once in a while an additional thread would be used AND would happen to be ending a transaction, and then I'd get this error.

And my concurrency test is equivalent to thread-per-connection so it doesn't run into the problem. That's a weakness in my test, and in this respect the bounded executor I was using before is more equivalent to the actual server.

Of course, I can't guarantee this is the only source of the error, but regardless, it was a bug I needed to fix.

I needed a read-write lock that allowed lock and unlock to be in different threads. I didn't need (or want) it to be reentrant (where the thread holding the lock can acquire it multiple times).

I searched on the web but couldn't find anything.

It looks like it could be written using AbstractQueuedSynchronizer but I'm afraid if I write it myself I'll make some subtle concurrency mistake. I could find various examples of using AbstractQueuedSynchronizer but not a ReadWriteLock.

I'm stumped so I decided to post a question on Stack Overflow. (After searching to make sure there wasn't an existing question.)

I've always been a fan of Stack Overflow. I'm not a heavy user, but I've asked and answered a few questions, and if it comes up in web searches I give it preference. But this time I got a little frustrated with the responses. No one wanted to answer the question - they just wanted to tell me what I was doing was wrong. That's a valid response to some questions, and I have to admit I hadn't really explained the context. I thought the question was specific enough to make the context unnecessary.

I kept clarifying the question and trying to convince the responders that I really did need what I was asking for. It was even more frustrating that people were voting for the responses that just told me I was wrong.

Of course, part of the frustration was that I started to doubt my own design decisions. Maybe I should just be using thread-per-connection - it would have avoided the current issue.

But in the end, Stack Overflow came through again - someone posted an answer that was exactly what I needed. Bizarrely, that answer didn't get as many votes, even though it was the "right" answer.

The answer was to use Semaphore. I hadn't even noticed this class because it's in java.util.concurrent, not in java.util.concurrent.locks where I was looking. I guess it's not a "lock" although it can be used as one.

And when I went to look at the source code for Semaphore, I found that it is implemented (at least in OpenJDK) with AbstractQueuedSynchronizer (which is in java.util.concurrent.locks).

It was simple to write my own read-write lock using Semaphore and everything seems to work fine. I wondered about performance but it seems to be roughly the same as before. I ran some tests and I didn't get any IllegalMonitorStateException's, but it was sporadic before, so that doesn't guarantee it's fixed.
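For readers who want the general shape of such a lock, here is a minimal sketch. It is not the jSuneido implementation: the class name, the MAX_READERS bound, and the choice of a fair semaphore are assumptions. A reader takes one permit and a writer takes all of them.

```java
import java.util.concurrent.Semaphore;

/**
 * Sketch of a non-reentrant read-write lock built on Semaphore.
 * Semaphore has no notion of an owning thread, so lock and unlock
 * may happen on different threads.
 */
class SemaphoreReadWriteLock {
    private static final int MAX_READERS = 1 << 16; // assumed upper bound on readers
    // fair = true so a waiting writer is not starved by a stream of new readers
    private final Semaphore permits = new Semaphore(MAX_READERS, true);

    void lockRead() {
        permits.acquireUninterruptibly(); // one permit per reader
    }

    void unlockRead() {
        permits.release();
    }

    void lockWrite() {
        permits.acquireUninterruptibly(MAX_READERS); // all permits: excludes everyone
    }

    void unlockWrite() {
        permits.release(MAX_READERS);
    }

    boolean tryLockRead() {
        return permits.tryAcquire();
    }

    boolean tryLockWrite() {
        return permits.tryAcquire(MAX_READERS);
    }
}
```

The trade-off is that nothing checks the caller. An unlock without a matching lock silently adds permits, and a thread that takes the write lock twice will block forever, so callers have to pair every lock with exactly one unlock themselves.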

AbstractQueuedSynchronizer has both "shared" and "exclusive" features which seem to map well to read-write locking. But Semaphore doesn't use the exclusive feature. It seems like you could write a read-write lock based on AbstractQueuedSynchronizer that would be a little "cleaner" than Semaphore. But for now at least, I'm happier using something like Semaphore that is tried and tested.

Sunday, May 20, 2012

More Immudb Results

The test that I was using to measure concurrency performance was using a bounded executor - code I'd found on the web, back when I knew Java even less than I do now. I decided that it was more complex than it needed to be and I rewrote it just using a number of worker threads. Surprisingly, that seemed to eliminate the drop in performance with more threads.

I also tested on my Windows machine which has a similar CPU but with an SSD (solid state drive) instead of a hard disk. Here are the results:

Now the performance seems to consistently level off as the number of client threads increases. That's less worrying than the performance dropping, but it's still a little puzzling. It still doesn't appear to be due to actual concurrency scaling issues like lock contention. Perhaps it's running into some other limit like storage or memory bandwidth?

Windows + SSD was roughly 2 times as fast as Mac + HD. I suspect that's mostly due to the SSD, although the OS and hardware could also play a part.

immudb also shows more improvement on SSD than the previous version. (blue to red versus orange to green) My guess would be that this is because immudb's large contiguous writes are optimal for SSD.

On both platforms, immudb is about 4 times as fast as the previous version.