Saturday, November 09, 2013

Hi-tech Travelling Blues

[On the train to Montreal.] After getting my fill of reading I pulled out my laptop to do a little programming. I clicked on Eclipse and got a message that I needed Java 6 and it wouldn't start. It helpfully offered to install one but I had no internet so that failed. This was working fine when I used it a few days ago so I assume an OS X update removed it. 

I like automatic updates but only if they don't blatantly break stuff. I've already gone through this reinstall of Java multiple times. It's not a big deal when you're connected. I understand the security issues, but that's pretty much all related to the browser plugin. I'm not sure why it has to keep removing Java entirely rather than just the plugin. 

I hunted around and found I still had several copies of Java 7 but I couldn't figure out how to get Eclipse to use them. I found a spot in the plist where you could specify a particular JVM but that didn't seem to help. I'm not sure why it was insisting on Java 6. 

There's probably a way to get it to work but I didn't have the patience and gave up frustrated. After all, I am on holidays, and this was supposed to be recreational programming!

But five minutes later I remembered that I had a computer within my computer - my Windows VM in Parallels, where I also have Eclipse. I got my laptop back out and fired up Parallels. Sure enough, that copy was out of reach of OS X updates and was still functional. (Windows updates haven't decided to arbitrarily remove Java, yet.) 

I don't normally do my Eclipse development under Windows on my MacBook so this copy of my source code was out of date but I could get the files I needed from OS X. 

So in the end I got operational. During our layover in Toronto I managed to install the missing JVM using Starbucks Wifi so I should be functional on OS X again. And just in case, I brought the Windows copy of my source code up to date. (Although for some reason it failed when I tried to pull the changes on OS X. Argh!)

Now I just have to figure out why Eclipse is using Java 6 instead of 7 ...

Tuesday, October 22, 2013

Updating Source Code to Java 7

I had this vague memory that NetBeans had a way to upgrade source code to Java 7, which seemed like a good thing to do. But when I searched on the web I couldn't find much. I did find stuff about the IDE giving hints and fixes individually in the IDE but nothing about mass changes. (Which is partly what prompted me to write this.) I also looked for a way to do it with Eclipse but didn't find anything.

So I downloaded the latest NetBeans (7.4) and hunted through the menus. I found Refactor > Inspect and Transform which has a Configuration choice for Convert to JDK 7.

TIP: Set your tab and formatting (e.g. switches) preferences before you run Inspect and Transform. It didn't seem to work correctly when I changed the preferences while Inspect and Transform was open.

It found the following applicable changes in my code:

  • Use diamond inference
  • Convert to switch over strings
  • Convert to try-with-resources
  • Replace with multicatch
  • Replace with multicatch catching specific exceptions

The majority were diamond inference. There would have been even more, but I already used Guava's helpers such as Lists.newArrayList which avoid repeating the generic types.

Convert to try-with-resources didn't merge surrounding try-catch's but there were only a few of these so it was easy to fix them manually.
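To illustrate, here's roughly what the converted constructs look like in Java 7 (a sketch of my own, not NetBeans output - the class and method names are made up):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class Java7Conversions {
    static String demo(String kind) {
        List<String> names = new ArrayList<>();   // diamond inference
        names.add(kind);
        switch (kind) {                           // switch over strings
        case "upper":
            return "UPPER";
        case "lower":
            return "lower";
        }
        // try-with-resources closes the reader automatically,
        // and multicatch handles several exception types in one clause
        try (BufferedReader r = new BufferedReader(new StringReader("default"))) {
            return r.readLine();
        } catch (IOException | RuntimeException e) {
            return "error";
        }
    }
}
```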

I wasn't too sure about the last item - replacing catching general exceptions with catching multiple specific exceptions. It seemed like it wouldn't catch everything it did before, so I didn't accept those changes (I unchecked them).

But when I clicked on Do Refactoring I got a little "Refactoring" window that was blank. I assumed it was working and left it for a while. But it never went away, and when I tried to close it I couldn't. So I exited out of NetBeans (with no problems) and tried it again. This time nothing happened (except the Inspect and Transform window closed). I thought maybe it was finished but nothing had changed. I ran it again and got only a few of the issues and Do Refactoring worked (on those few). Next time I ran it, I got the long list again. I finally noticed the error marker in the bottom right. I submitted the error and it appears to be a known bug :-(

I ended up doing one package at a time and that seemed to work fine.

Monday, October 21, 2013

Fixing a Suneido Design Problem

In Suneido object.Delete(key) returns false if the member isn't found, otherwise it returns the object.

Several times we've had bugs resulting from chaining another method call onto the result of Delete, which works as expected - except when the key isn't found and Delete returns false instead of the object.

It's likely the worst of possible designs. It would have been better if it returned nothing, or true/false, or my favorite - always the object.

I've been aware of this problem for quite a while but I was hesitant to change it because I was afraid I'd break existing code. I finally decided to go through our code and see if it would actually break much.

There were about 800 uses of Delete.

By far the majority ignored the return value.

I didn't find a single place where we made use of the false return value.

I did find a number of uses which assumed that it always returned the object - i.e. potential bugs.

Other than being tedious, the worst part was seeing all the ugly code. I wonder if it's possible to write code that doesn't make you cringe when you come back to it later.

I found quite a few places where we were doing multiple deletes so while I was at it, I changed Delete to handle multiple arguments.
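In Java terms, the revised semantics might look something like this sketch (illustrative names, not jSuneido's actual classes):

```java
import java.util.HashMap;

// Sketch: Delete always returns the object (never false), ignores
// missing keys, and accepts multiple keys at once.
class SuContainer {
    private final HashMap<Object, Object> map = new HashMap<>();

    SuContainer put(Object key, Object value) {
        map.put(key, value);
        return this;
    }

    SuContainer delete(Object... keys) {
        for (Object key : keys)
            map.remove(key);  // a missing key is simply ignored
        return this;          // always the container, so chaining is safe
    }

    int size() {
        return map.size();
    }
}
```

With delete always returning the container, chained calls can't blow up on a missing key.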

This is a small change, but I think it's just as important to get the details right as it is to work on the big picture.

Monday, October 14, 2013

More Hamcrest Hassles

The Hamcrest library is very useful. So useful that other useful libraries, like JUnit, include pieces of it. And then JUnit is so useful that Eclipse includes its own copy.

Not surprisingly, having multiple copies, of different versions, of different subsets, with some signed and some not, results in numerous problems. For accounts of my own experiences see: Upgrading to Eclipse Kepler 4.3, Eclipse Hamcrest Runaround, and I Give Up. Searching the web will find lots of other people with similar problems.

Once again, I had thought I had solved this by not using Eclipse's copy of JUnit. Everything has appeared to be working fine for months.

Until I was adding a test today. In good TDD style, I started by adding the test before making the change. It failed, as expected, but not with the error message I expected. I got:

java.lang.NoSuchMethodError: org.hamcrest.Matcher.describeMismatch

I thought maybe it was because I was using org.junit.Assert.assertThat instead of org.hamcrest.MatcherAssert.assertThat - but switching it didn't fix anything.

I also noticed that Eclipse was marking my use of "is" as deprecated. Maybe I needed a different form? Nope, still deprecated. I eventually found a comment on Stack Overflow that there are three overloads of "is" and it's only the class one that is deprecated, not the one I'm using. So I guess I just live with that warning :-(

Another comment on Stack Overflow mentioned that Mockito also included its own version of Hamcrest. Hmmm... I don't think I was aware of that before.

One of the suggested solutions to this problem is to rearrange the order of the jar files on the class path. I had tried that previously without any success. But I was only moving JUnit and Hamcrest, since I wasn't aware that Mockito was also involved.

I find the Eclipse project properties Java Build Path a little confusing. There's a Libraries tab that lists all the jars alphabetically and you can't change the order. And then there's an Order and Export tab where you can change the order. I'm not sure why two tabs are needed. There's also a Referenced Libraries in the Package Explorer that does show the actual order, but doesn't let you change it.

I moved Mockito down in the list so it was below Hamcrest (JUnit was already below) and sure enough, that solved the problem.

Except now I had a different error in another test. That one turned out to be because I was using "is" with a class. Maybe it's deprecated in one of the versions/copies of Hamcrest but removed in the version of Hamcrest I'm explicitly including? Luckily this was simple to fix by just changing it to instanceOf.

It looks like you can download separate jars for Mockito which would let you leave out its copy of Hamcrest. Except that it appears to be using a different version of Hamcrest (1.1) from the one I'm using (1.3). I have no idea if that would cause problems, but since it's currently not broken, I don't think I'll try to fix it!

Probably someone out there will tell me that I should be using Maven to manage dependencies, and maybe I should. But I'm not so sure that would eliminate these problems. I see several comments on the web about the same issues when using Maven.

Sunday, October 13, 2013

jSuneido GUI

What's special about this screenshot of the IDE isn't what's visible.

It's that this is running on jSuneido! (the Java implementation of Suneido).

Up till now jSuneido has only been the server side. The only "UI" it had was a command line REPL.

But in the long run, I'd rather not support two implementations of Suneido. It would be nice if we could just use jSuneido since it's a better implementation.

Suneido's user interface is Win32 based and implemented with the DLL interface. None of the UI is built into the exe, it's all Suneido code in stdlib. (Other than a few support functions.)

Ideally, we'd switch to a portable GUI, but that's a huge job and would likely mean a bunch of changes to our application code.

So we decided to see if we could implement a Windows DLL interface in jSuneido and get Suneido's existing GUI to run on it.

One of Suneido's early programmers, Victor Schappert, returned to us and worked on this project. Thanks Victor!

As you can see, it's far enough along to run most of the IDE, but we still have a few things left to do, like the COM interface and the SuneidoAPP interface to the IE browser component.

As usual, the code is in version control on SourceForge. The JSDI project in Mercurial (Hg) contains a support dll for jSuneido (written in C++) that jSuneido talks to via JNI. A pre-built version of jsdi.dll is included in the jSuneido project.

Monday, August 12, 2013

A Recurring Lack of Assertiveness

We recently ran into a problem with jSuneido - when you loaded a certain customer's dumped database it would load successfully but then fail the consistency checks.

After looking at where and how it was failing (a negative size value), I figured the problem was probably that the database had a single table larger than 2 gb and I was using a 32 bit int to store the size of commits so it was wrapping around. Normally commits wouldn't be anywhere near that large, but when bulk loading each table is written as a single commit.
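The wrap-around itself is easy to demonstrate (a standalone Java sketch, not the actual database code):

```java
public class SizeWrap {
    public static void main(String[] args) {
        long commitSize = 3L * 1024 * 1024 * 1024; // a 3 gb bulk-load commit
        int stored = (int) commitSize;             // a 32 bit size field
        System.out.println(stored);                // prints -1073741824
    }
}
```

Anything past 2 gb doesn't fit in a signed 32 bit int and comes out negative - exactly the negative size value the consistency checks were complaining about.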

I felt pretty good about finding the problem so quickly and easily.

I put asserts into the code to confirm that this was the problem. But they didn't fail. Hmmm... maybe I put them in the wrong place. I added more asserts in other places. They still didn't fail.

So I fell back on the age old debugging method of inserting print's. Of course, it's a big database with a lot of tables so there was lots of output. I skimmed through it and couldn't find a table bigger than 2gb.

So much for finding the problem quickly and easily!

The next day, at home, I continued working on it, partly just trying to remember how the database code works! On the positive side, I updated some comments and diagrams while I was at it.

Eventually, I ended up full circle, finding that there was indeed a table bigger than 2gb and my original guess about the problem was correct! Argh! (I'd missed it when I skimmed through the prints.)

The problem was that I didn't have assert's enabled, which is why the asserts I added didn't fail. I've been burnt by this before - see Don't Forget to enable Java assert and Burnt by Java assert Again. You'd think I would learn.

Part of the problem is the way Eclipse works. You can set options on JRE's, but when you update to a new version of Java, then you have to remember to set the options again (which I had forgotten to do, both at work and at home). It's too bad there isn't a way to set options that are common to all JRE's.

You can also set options in Eclipse launch configurations, but I have a ton of them, and again (AFAIK) there isn't a way to set default options that are common to all launch configurations.

I thought I had good defenses in place for this. I have a test which confirms that assert is enabled. But I'm using Infinitest to run my tests automatically and it must enable asserts itself. So unless I run the test manually, it's useless for confirming that I have asserts enabled.
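For reference, the usual self-check exploits the fact that an assert's side effect only runs when assertions are on (a sketch along the lines of such a test, not necessarily the exact jSuneido code):

```java
public class AssertCheck {
    // returns true only when the JVM was started with assertions enabled (-ea)
    static boolean assertionsEnabled() {
        boolean enabled = false;
        assert enabled = true; // the assignment only executes when asserts are on
        return enabled;
    }

    // call from start-up code to fail fast instead of silently skipping asserts
    static void requireAsserts() {
        if (!assertionsEnabled())
            throw new IllegalStateException("run the JVM with -ea");
    }

    public static void main(String[] args) {
        System.out.println("asserts enabled: " + assertionsEnabled());
    }
}
```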

I also enable asserts programmatically in the start-up code. But while I was testing I was running specific classes and bypassing the start-up code.

I'm not sure what else I can do to defend against this. Any suggestions?

Thursday, August 01, 2013

Upgrading to Eclipse 4.3 Kepler

Another relatively smooth upgrade.

I downloaded the Eclipse IDE for Java Developers (rather than Standard) since I don't do any plugin development.

See Top 10 Eclipse Kepler Features

I did have one problem on Mac OS X - when you try to run Eclipse you get an error that it is "damaged and can't be opened". I'd run into this before. It's a known issue which, for some reason, the Eclipse developers have closed as "RESOLVED NOT_ECLIPSE" i.e. not their problem. However, as one of the commenters points out, other apps don't have this problem, so it's obviously something Eclipse is doing differently, if not technically "wrong". It's relatively easy to work around by doing:

xattr -d

I imported my plug-ins from my previous Eclipse 4.2 Juno and they all came across without any problems. Even Mercurial still seems to be functional. Here's the list of plug-ins I'm currently using:
  • Bytecode Outline
  • EclEmma Java Code Coverage
  • Checkstyle
  • FindBugs
  • Infinitest
  • MercurialEclipse
  • Metrics plugin for Eclipse
The other problem I ran into is also a known issue - conflicts (giving SecurityException) between the hamcrest-core in the Eclipse copy of JUnit, and hamcrest-library that I'm adding for additional matchers. At first I thought it was because the new version of Eclipse came with a newer version of JUnit, so I upgraded my hamcrest-library to match. But that didn't solve the problem.

I knew I'd run into this before so I went to my blog to see how I'd resolved it. I found Eclipse Hamcrest Runaround and I Give Up but nothing about how I eventually got it to work. It's funny how my blogs have become my external memory and I'm unhappy if I forget to record something.

After searching the web and messing around I realized the easiest solution is just to remove the Eclipse JUnit from the build path and have my own junit, hamcrest-core, and hamcrest-library. That seems to solve the problem.

Although I don't feel like I'm usually recording much useful information, these upgrade posts are some of my most frequently visited. That may just be because they come up in searches.

Friday, July 12, 2013

Systems Upgrade

I haven't done anything with my home computer systems for several years and it seemed like time for some upgrades and some preventative maintenance.

My Time Capsule is getting older and I knew sooner or later it would fail. It's nice having a wireless router and network storage all in one unit. On the other hand, if it fails you're in trouble. Being paranoid about backups, I also decided I should have some kind of redundant storage.

So I replaced the Time Capsule with an ASUS RT-AC66U wireless router and a Synology DS413 4 bay NAS server with three 2tb Western Digital Red drives. The default setup for the Synology NAS will handle a single drive failure without losing any data. It combines the capacity of all the drives so I ended up with about 4tb of storage from the three 2tb drives. I can add a fourth drive at any time (of any size) if I need more space. This was a pretty painless upgrade and improved both wireless and storage speed and space.

I've been mildly tempted by a new iMac, but mostly for more memory and an SSD, which I decided I could get without replacing the whole machine, since it's got a decent i7 and is otherwise fine. (USB 3 and Thunderbolt would be nice, but not essential.) One advantage of the older iMac over the newer models is that it has the SD memory card slot on the side whereas the new "skinny" machines have it on the back where it would be a lot more awkward to use.

The trick was that I need more space (mostly for photos) than I can reasonably get with an SSD, which meant an SSD plus a hard drive. But since my model of iMac doesn't have space for a second drive, that meant removing the optical drive and putting the SSD there (using a "data doubler"). I didn't really mind losing the optical drive - I can't remember the last time I used it. A new iMac wouldn't have had one anyway. And I can always get an external one.

I could have kept the existing 2tb hard drive since it wasn't full, but I decided to set up a "Fusion" drive which combines the hard drive and SSD and automatically migrates data to the appropriate drive. This requires wiping out the hard drive and I was nervous about depending on my backups. So I bought a new 3tb Seagate Barracuda drive and kept the old drive as an extra backup.

I also upgraded the memory from 8gb to 16gb.

Having been in the computer business through the whole progression from 5mb hard drives to 500mb, to gigabytes, and now to terabytes, I sometimes have to think twice about the sizes I'm talking about. Is that backup 1000mb or 1000gb? I know everyone's tired of hearing it from us old timers, but it's still mind boggling that the current drives have grown something like a million times bigger over the course of one working career.

If it had been a PC I probably would have done the upgrade myself (although my hardware days are long past), but to get inside an iMac you have to take the glass off the front which seemed a little scary to me so I got our local Apple dealer to do it.

I had made an OS X installer USB thumb drive beforehand and I had no problems booting from this, setting up the Fusion drive, and installing OS X. Then I used the Migration Assistant to restore from my Time Machine backup. This took roughly 4 hours for about a terabyte of data and restored all my files and applications. It was nice not having to re-install applications.

The only thing I missed was my Parallels Windows VM. For some reason this wasn't included in my Time Machine backup. I had previously excluded it, but I was sure I had started including it. I'm not sure what happened.

I put my old hard drive into an external USB 3 / Firewire 800 enclosure and retrieved the VM with no problems.

The iMac definitely seems faster. If I watch the drive activity (using iStat Menus) it appears the Fusion drive is working properly. The only concern I have is that the new hard disk seems to be running quite hot, even when the machine has been "sleeping". The preferences are set to power down the drive, but maybe that isn't working.

All in all it went quite smoothly and hopefully will keep me happy for a few more years.

Tuesday, July 02, 2013

Of Mice and Keyboards

At home, on my iMac, I use the Apple full size wired keyboard and magic (touch) mouse. (In addition to not having to worry about batteries, the wired keyboard also has USB ports at either end which are much more accessible than the back of the computer.) I have a magic trackpad too, but I don't find I use it much.

At work, on my Windows PC, I wanted a similar keyboard. For a while I used the same Apple keyboard, but it wasn't ideal because it's missing Windows specific keys.

So I switched to the Logitech Wireless Solar keyboard, which has a similar look and feel but with a Windows layout. I've been pretty happy with it. The solar has worked great and it's nice not to have to change batteries. We've ended up with quite a few of these around the office.

Unlike many people, I actually liked it when Apple switched the default direction of mouse scrolling. Partly, I guess, because it was similar to iPhone and iPad.

But I had a hard time switching between one direction of scrolling at home, and another at work. I also quite liked the Apple magic touch mouse, so I bought the Logitech t620 Touch Mouse which is quite similar.

At first, I thought it was pretty good. But after a while it started to drive me crazy. It was way too sensitive and I would end up scrolling all over the place unintentionally. I stuck with it, thinking I'd get used to it, but if anything it got worse after a driver update. I occasionally have the same problem with the Apple mouse, but nowhere near as bad.

I finally got fed up and shopped for a new mouse. (Hopefully someone else in the office will have better luck with it.) I wanted another Logitech one so I could share the same dongle. I could have gone back to a traditional mouse wheel, but I decided to try the Logitech t400 Zone Touch Mouse.

But things are never simple with computers - I couldn't get the reverse scrolling to work. Logitech's Set Point software has a check box for this, but it had no effect. I did the usual incantations of uninstall, reinstall, reboot, etc. but no luck.

When I started searching the web, I remembered that originally I had used a registry hack to switch scrolling direction. Even better, someone had supplied a PowerShell command line to do it, rather than manually editing the registry. (It would be nice if the Windows control panel had a way to change this setting.) But it still didn't work! I ended up uninstalling the Logitech Set Point software. The mouse works fine without it, and now the scrolling works the way I want.

So far I've been pretty happy with this compromise. I occasionally find myself trying to scroll with my finger not on the touch part, but other than that it seems fine. It still has the ability to scroll horizontally, which is occasionally useful, but because the touch sensitive area is limited, I don't find I trigger it accidentally. The smallish size, and rubber sides feel quite comfortable.

Hopefully this combo will keep me happy for a while!

Thursday, June 27, 2013

Jot! iPad App

I'm not a very visual person when it comes to software and I don't use a lot of diagrams, but occasionally I find them useful to visualize something complex.

I have a bunch of iPad drawing apps (which is a little odd because I don't draw). The one I like for doing quick diagrams is Jot! Whiteboard by Tabula Rasa. There are both paid and free versions. (I also use Google Docs drawing program for more "formal" diagrams.)

What I like best is that you can move/copy/delete things (or groups of things) after you've drawn them. That's hard or impossible with a lot of "paint" type programs.

You can also easily add text boxes, which is handy for the kind of diagrams I draw.

Here are a couple of examples (the meaning isn't important).

Sunday, May 05, 2013

Optimizing Tr

Suneido has a string.Tr function, similar to the Unix tr command. Recently, I was looking at the stdlib Base64 code. Decode was using string.Tr to strip any newlines. However, there may not be any newlines. Which made me wonder what Tr did in this case - did it still make a copy of the string? Sure enough, it did.

My first reaction was to add a guard:

if s.Has?('\n')
    s = s.Tr('\n')

Then I started to wonder where else we should be doing this. But that was ugly. It made more sense to build it into Tr.

But implementing it as above would mean doing an extra scan of the source string. Since the code was already scanning, it would be more efficient to just defer copying until the scan found somewhere it needed to make a change.

I implemented this in the C++ code. It complicated it a little, but not too bad.

Then I tried to implement the same thing in the Java code. Not being able to "cheat" with macros like in the C++ version meant I had to create an instance to share the variables. This defeated part of the purpose (to avoid allocation) and seemed ugly.

Instead I decided to do an initial scan to find the first character to be changed, and then either return or continue from that point. This still avoided redundant scanning, without complicating the main part of the code.
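A much-simplified sketch of that strategy (ignoring ranges and set expansion; the names are mine):

```java
class Tr {
    // translate chars in `from` to the corresponding char in `to`
    // (or delete them if `to` is shorter)
    static String tr(String s, String from, String to) {
        int i = 0;
        while (i < s.length() && from.indexOf(s.charAt(i)) < 0)
            ++i; // initial scan: find the first character needing a change
        if (i == s.length())
            return s; // nothing to change: return the source, no allocation
        StringBuilder sb = new StringBuilder(s.length());
        sb.append(s, 0, i); // copy the untouched prefix once
        for (; i < s.length(); ++i) {
            int j = from.indexOf(s.charAt(i));
            if (j < 0)
                sb.append(s.charAt(i)); // not in the from set: keep as is
            else if (j < to.length())
                sb.append(to.charAt(j)); // translate
            // else: no corresponding "to" char, so delete
        }
        return sb.toString();
    }
}
```

Note that the common case of "strip newlines from a string with no newlines" now falls straight through the initial scan and returns the original string untouched.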

In the process, I discovered there were a few other cases I could short circuit - if the source string is empty, or if the "from" character set is empty then you can just return the source string unchanged. Also, if there are no ranges in the from set or the to set, then you don't need to expand the sets, avoiding more allocation and copying, albeit only on the sets. I also simplified the code a little. (It was a good thing I had tests since I "simplified" a little too aggressively a few times!) See the code.

I also decided to add a cache for expanding sets with ranges. I'm not sure that's justified, but it's similar to what I do with regular expressions. The regular expression code had been using a home-brew LruCache, but I switched to Google Guava's caching facilities. That also allowed time based expiry so I could make the cache bigger without wasting space if it wasn't needed.
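For comparison, a home-brew LRU cache of the sort being replaced can be as small as a LinkedHashMap subclass (a sketch - Guava's caches add the size limits and time-based expiry mentioned above):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// minimal LRU cache: least-recently-accessed entries are evicted first
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruCache(int capacity) {
        super(16, 0.75f, true); // true = access order, not insertion order
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity; // evict once over capacity
    }
}
```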

Then I went back and revised the C++ tr code using the same approach as the Java code. For some reason I had originally used std::vector for the sets. I switched to making them gcstring's to avoid making a new set if there are no ranges to expand. I also added a cache, using the existing CacheMap that was being used for regular expressions.

Although it sometimes bugs me to have to update two implementations of Suneido, it does have its benefits. It means if I want to use the same approach I can't take too much advantage of specific language features (e.g. macros). This might mean I can't use some fancy feature, but in many cases that's not the wisest anyway. By the time I've written the code in two different languages, I think the end result is usually better than if I'd just written it once.

Again, I'm guilty of optimizing without real proof of whether it's justified. Tr is fairly heavily used in some areas, and the changes will probably speed up those areas. But whether that makes much difference in the bigger picture is questionable.

An amusing historical aside - I originally ported this code from Software Tools by Kernighan and Plauger, which uses Ratfor (rational Fortran). The code's origins are still visible in some of the names and structure. This book was an inspiration in my early programming days. I still have my original copy, although it's falling apart. It's pretty cool that it's still in print 37 years later.

See also previous posts: String Building Internals and An Amusing Bug

Friday, May 03, 2013

An Amusing Bug

I recently realized that the block form of Suneido's string.Replace was a more efficient way to "map" over strings.

As I mentioned in my recent post on String Building Internals, I also discovered cSuneido's string.Replace didn't handle nul's. I fixed this problem and we sent out the new suneido.exe to our beta customers.

And we started to get support calls that PDF attachments weren't being sent properly. Sure enough it was the new version of Base64.Encode that used string.Replace.

But I had tests, and we had also tested manually and it worked fine. We got one of the problem files from a customer and sure enough it failed.

Digging into it, I could see that it was only encoding part of the file. That seemed a bit like the previous nul problem, which led me in the wrong direction for a while.

More testing revealed it was encoding the first 39996 characters of the file. That seemed like an odd number. My first thought was that it was in the general vicinity of SHORT_MAX or 32767. When I first wrote the C++ version of Suneido I was still trying to use short int's when possible. This has led to a number of issues since SHORT_MAX isn't very big in modern terms. But the relevant code wasn't using any short int's.

But I noticed a magic number of 9999 in the code. Base64 encode outputs groups of 4 characters. 4 x 9999 = 39996. Aha!

string.Replace takes an optional argument for how many replacements to do. Usually, you either want 1 or all. When not specified, the count was defaulting to 9999. For "normal" usage, that's plenty. But when using replace to map large strings, it obviously isn't.
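The effect is easy to reproduce with a count-limited replace (my own simplified sketch, not Suneido's implementation):

```java
class Replace {
    // replace up to `count` occurrences of `from` with `to`
    static String replaceN(String s, String from, String to, int count) {
        StringBuilder sb = new StringBuilder();
        int i = 0;
        for (int done = 0; done < count; ++done) {
            int j = s.indexOf(from, i);
            if (j < 0)
                break;
            sb.append(s, i, j).append(to); // copy up to the match, then replace
            i = j + from.length();
        }
        return sb.append(s.substring(i)).toString(); // remainder passes through
    }
}
```

With a default count like 9999, the remainder of a large string passes through untransformed after the 9999th replacement - so only the first matches get mapped. Passing Integer.MAX_VALUE behaves like "replace all".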

I changed it to INT_MAX and that fixed the problem. Out of curiosity I went and checked the jSuneido code. (It did not have the same issue.) Frustratingly, it already used Integer.MAX_VALUE. I don't think I foresaw this issue when I ported the code, it probably just bugged me to have a magic number.

I'm not sure why this strikes me as amusing. It just seems funny that the "bug" was not something obscure, just a badly chosen limit, that no doubt seemed entirely reasonable at the time.

Friday, April 26, 2013

No auto-update for 64 bit Java on Windows

It's hard to believe after all these years, and all the security issues, that there's still no auto-update for 64 bit Java on Windows.

I have known that Java on my Windows machine wasn't updating properly, but I just assumed it was because I had multiple copies and versions etc. But it's a hassle having to remember to manually download and install updates so finally I decided to try to fix it, only to discover there is no fix.

This was entered as a bug in 2006, and it's currently scheduled to be fixed in Java 8 (!)

Sometimes you really have to wonder about how these things get prioritized. Granted, our customers say the same thing about us, but considering what a security issue this is, I would have thought it would get addressed. Back in 2006 I can see thinking "no one" was running 64 bit, but nowadays that's not a good assumption.

Sunday, March 31, 2013

String Building Internals

Recently we ran into a problem where a service using cSuneido was running out of memory. I confidently switched the service to use jSuneido, and got stack overflows. So much for my confidence!

We tracked it down to calling Base64.Encode on big strings (e.g. 1mb). (This came from sending email with large attachments.)

Base64.Encode was building the result string a character at a time by concatenating. In many languages this would be a bad idea for big strings. (e.g. In Java you would use StringBuilder instead.) But Suneido does not require you to switch to a different approach for big strings; the implementation is intended to handle it automatically.

Up till now, Suneido has handled this by deferring concatenation of larger strings. If the length of the result is greater than a threshold (64 bytes in cSuneido, 256 bytes in jSuneido) then the result is a "Concat" that simply points to the two pieces. This defers allocating a large result string and copying into it until some operation requires the value of the result. Basically, Suneido makes a linked list of the pieces.

For example, if you built a 1000 character string by adding one character at a time, Suneido would only allocate and copy the result once, not 1000 times. (It still has to create 1000 Concat instances but these are small.)

In practice, this has worked well for many years.

I thought I had implemented the same string handling in jSuneido, but actually I'd taken a shortcut, and that was what caused the stack overflows.

The obvious way to convert the tree of Concat's to a regular string is a recursive approach, first copying the left side of each concat, and then copying the right side, each of which could be either a simple string or another Concat. The problem is that the tree is not balanced. The most common case is appending on the end, which leads to a tree that looks like:

But with many more levels. If the tree has a million levels, then recursing on each side (in particular the deep left side) will cause a stack overflow. In cSuneido I had carefully iterated on the left side and recursed only on the right (basically doing manual tail-call elimination). But when I ported the code to Java I had unknowingly simplified it to recurse on both sides.
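The cSuneido-style traversal can be sketched in Java like this (my own reconstruction from the description above, not the actual code). Because every node knows its length, the buffer can be filled right to left, recursing only into the shallow right subtrees and looping down the deep left spine:

```java
public class FlattenSketch {
    // A deliberately simple rope node: either a leaf string or a pair.
    static class Node {
        final String leaf;           // non-null for leaves
        final Node left, right;      // non-null for internal nodes
        final int length;
        Node(String s) { leaf = s; left = right = null; length = s.length(); }
        Node(Node l, Node r) { leaf = null; left = l; right = r; length = l.length + r.length; }
    }

    static String flatten(Node n) {
        char[] buf = new char[n.length];
        fill(n, buf, n.length);
        return new String(buf);
    }

    // Fill the buffer right to left: recurse only into the (shallow) right
    // subtrees, and loop down the (deep) left spine instead of recursing -
    // manual tail-call elimination, so a million-level spine is fine.
    static void fill(Node n, char[] buf, int end) {
        while (n.leaf == null) {
            fill(n.right, buf, end);
            end -= n.right.length;
            n = n.left;
        }
        n.leaf.getChars(0, n.length, buf, end - n.length);
    }

    public static void main(String[] args) {
        Node n = new Node("");
        for (int i = 0; i < 1_000_000; i++)
            n = new Node(n, new Node("x"));       // a million-level left spine
        System.out.println(flatten(n).length());  // prints 1000000 - no StackOverflowError
    }
}
```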

Once I knew what the problem was, it was easy enough to fix, although a little trickier due to using StringBuilder.

Thinking about the problem, I realized that in this case there was a better way to write Base64.Encode. I could use the form of string.Replace that takes a function to apply to each match, and simply replace every three characters with the corresponding four characters in Base64. Since string.Replace uses a buffer that doubles in size as required (via StringBuilder in jSuneido), this is quite efficient. (In the process I discovered string.Replace in cSuneido choked on nuls due to remnants of nul-terminated string handling. Easy enough to fix.)
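The reworked encoder maps each three-byte chunk to four output characters, appending to a doubling buffer. A rough Java equivalent of that per-chunk transformation (a sketch of the idea, not the Suneido code):

```java
public class B64Sketch {
    static final String ALPHABET =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    // Encode by transforming each 3-byte chunk into 4 characters, appending
    // to a doubling buffer - the same shape as replacing every match with
    // its expansion via string.Replace, written directly in Java.
    static String encode(byte[] data) {
        StringBuilder sb = new StringBuilder((data.length + 2) / 3 * 4);
        for (int i = 0; i < data.length; i += 3) {
            int n = Math.min(3, data.length - i);
            int chunk = (data[i] & 0xff) << 16
                      | (n > 1 ? (data[i + 1] & 0xff) << 8 : 0)
                      | (n > 2 ? (data[i + 2] & 0xff) : 0);
            sb.append(ALPHABET.charAt(chunk >> 18 & 63));
            sb.append(ALPHABET.charAt(chunk >> 12 & 63));
            sb.append(n > 1 ? ALPHABET.charAt(chunk >> 6 & 63) : '=');
            sb.append(n > 2 ? ALPHABET.charAt(chunk & 63) : '=');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("Man".getBytes()));  // prints TWFu
    }
}
```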

But that left the problem of why cSuneido ran out of memory. I soon realized that the whole Concat approach was hugely wasteful of memory, especially when concatenating very small strings. Each added piece requires one memory block for the string, and another for the Concat. Most heaps have a minimum size for a heap block, e.g. 32 bytes. So in the case of appending a single character, it was using 64 bytes, which means building a 1 mb string a character at a time would use something like 64 mb. Ouch!

In theory, cSuneido could still handle that, but because it uses conservative garbage collection, spurious references into the tree could end up with large amounts of garbage accumulating.

Maybe my whole Concat scheme wasn't ideal. To be fair, it has worked well up till now, and still works well for moderate string sizes. It just wasn't designed for working with strings that are millions of characters long.

I dreamed up various complicated alternative approaches but I wasn't excited about implementing them. It seemed like there had to be a simpler solution.

Finally, I realized that if you only worried about the most common case of appending on the end, you could use a doubling-buffer approach (like StringBuilder). The result of a larger concatenation would be a StrBuf (instead of a Concat). Because strings are immutable, appending a string onto a StrBuf produces a new StrBuf; it adds to the existing buffer (sharing it) if there is room, or else allocates a new buffer (twice as big).
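Here is a sketch of how buffer sharing can coexist with immutable string values (the names and details are my own reconstruction, not the actual StrBuf code): appending extends the shared buffer in place only when this value is the buffer's current tail; otherwise it copies into a buffer twice as big.

```java
// An immutable string value over a shared, growable buffer.
public class StrBufSketch {
    static final class Buffer {
        final char[] chars;
        int used;                      // how much of chars is filled (shared state)
        Buffer(int capacity) { chars = new char[capacity]; }
    }

    static final class StrBuf {
        private final Buffer buffer;
        private final int len;         // this value is buffer.chars[0..len)

        private StrBuf(Buffer buffer, int len) { this.buffer = buffer; this.len = len; }

        static StrBuf of(String s) {
            Buffer b = new Buffer(Math.max(16, 2 * s.length()));
            s.getChars(0, s.length(), b.chars, 0);
            b.used = s.length();
            return new StrBuf(b, s.length());
        }

        StrBuf append(String s) {
            synchronized (buffer) {
                // share the buffer only if this value is its current tail
                if (len == buffer.used && len + s.length() <= buffer.chars.length) {
                    s.getChars(0, s.length(), buffer.chars, len);
                    buffer.used = len + s.length();
                    return new StrBuf(buffer, buffer.used);
                }
            }
            // otherwise copy into a fresh buffer, doubling capacity
            Buffer nb = new Buffer(Math.max(16, 2 * (len + s.length())));
            System.arraycopy(buffer.chars, 0, nb.chars, 0, len);
            s.getChars(0, s.length(), nb.chars, len);
            nb.used = len + s.length();
            return new StrBuf(nb, nb.used);
        }

        @Override
        public String toString() { return new String(buffer.chars, 0, len); }
    }

    public static void main(String[] args) {
        StrBuf a = StrBuf.of("hello");
        StrBuf b = a.append(" world");  // shares a's buffer (there's room)
        System.out.println(a);          // prints hello - earlier values are unaffected
        System.out.println(b);          // prints hello world
    }
}
```

Note how appending to an older value (one that is no longer the buffer's tail) falls through to the copy path, so sharing never breaks immutability.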

Theoretically, this is slightly slower than the Concat approach because it has to copy the right hand string. But in the common case of repeatedly appending small strings, it's not bad.

This approach doesn't appear to be noticeably faster or slower, either at large scale (our entire application test suite) or in micro-benchmarks (creating a large string by repeatedly appending one character). But, as expected, it does appear to use significantly less memory (roughly half as much heap space).

One disadvantage of this approach is that it doesn't help with building strings in reverse order (i.e. by repeatedly adding to the beginning of a string). That's unfortunate, but in practice I don't think it's too common.

A nice side benefit is that this approach is actually simpler to implement than the old approach. There is no tree, no recursion, and no need for manual tail-call elimination.

Unfortunately, in Java (where I've been working on this) you still have to convert the StringBuilder to a string when required, which makes yet another copy. In cSuneido I should be able to simply create a string wrapping the existing buffer.

Thinking about it some more, I wasn't happy with the extra copying - a lot more than the previous approach. I realized I could combine the two approaches - use a doubling array, but storing references to the pieces rather than the actual characters. This eliminates the copying while still reducing memory usage.

Note: the string value (Concats) and the shared array of pieces (Pieces) cannot be combined into a single object because Pieces is shared - appending to a Concats will result in a new Concats that (usually) shares the same Pieces. Concats is effectively immutable; Pieces is mutable and synchronized. (It would be nice if Java let you combine a class and an array to make variable-sized objects, but it doesn't.)

However, in the worst case of building a large string one character at a time, this would still end up requiring a huge array, plus all the one character strings. But given the array of strings, it's relatively easy to merge small strings and "compact" the array, reducing the size of the array and the overhead of small strings.

I didn't want to compact too often as this would add more processing overhead. I realized that the obvious time to compact was when the array was full. If there were contiguous small strings that could be merged, then you could avoid growing the array.
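A compaction pass along these lines might look like this (the 64-byte "small" threshold and the details are my guesses, not necessarily what Suneido does):

```java
public class CompactSketch {
    static final int SMALL = 64;  // pieces below this size are merge candidates

    // Merge runs of adjacent small strings in place. The string value is
    // unchanged, but fewer array slots are used, so the array may not
    // need to grow. Returns the new piece count.
    static int compact(String[] pieces, int n) {
        int out = 0;
        int i = 0;
        while (i < n) {
            if (pieces[i].length() < SMALL && i + 1 < n && pieces[i + 1].length() < SMALL) {
                // merge a contiguous run of small pieces into one string
                StringBuilder sb = new StringBuilder(pieces[i]);
                int j = i + 1;
                while (j < n && pieces[j].length() < SMALL)
                    sb.append(pieces[j++]);
                pieces[out++] = sb.toString();
                i = j;
            } else {
                pieces[out++] = pieces[i++];  // big pieces pass through untouched
            }
        }
        for (int k = out; k < n; k++)
            pieces[k] = null;  // let the old pieces be garbage collected
        return out;
    }

    public static void main(String[] args) {
        String[] p = { "a", "b", "c", "x".repeat(100), "d", "e" };
        int n = compact(p, p.length);
        System.out.println(n);  // prints 3: ["abc", "xxx...", "de"]
    }
}
```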

I was disappointed to discover that this new approach was about 1/2 as fast as the old approach (when building a 1mb string one character at a time). On the positive side, it resulted in a heap about 1/4 the size (~200mb versus ~800mb). It was a classic space/time tradeoff - I could make it faster by doing less merging, but then it used more memory.

Interestingly, the test suite (a more typical workload) ran about 10% faster. I can't explain that, but I'm not complaining!

It still nagged me that the new approach was slower building huge strings. I tried running it longer (more repetitions) to see if garbage collection would have more of an effect, but that didn't make much difference. Nor did building a 100kb string instead of 1mb (and again doing more repetitions). Then I thought that since my new approach was specifically aimed at building huge strings, maybe I should try increasing the size. Sure enough, building a 10mb string was 4 times faster with the new approach. Finally, the results I was hoping for! Of course, in practice, it's rare to build strings that big.

I initially wrote special code for adding to the beginning of a concatenation and for joining two concatenations. But then I added some instrumentation and discovered that (at least in our test suite) these cases are rare enough (~1%) that they're not worth optimizing. So I removed the extra code - simpler is better.

There are a number of "thresholds" in the code. I played around a bit with different settings but performance was not particularly sensitive to their values.

I probably spent more time on this than was really justified. But it was an interesting problem!

As usual, the code is in Mercurial on SourceForge. Or go directly to this version of Concats.

Wednesday, March 27, 2013

Ilford Marketing Abuse

Can you spot the "unsubscribe" link on this email fragment:

It's at the bottom in tiny dark gray text on a black background. Could they have made it any harder to find?

I can just imagine the conversation: "Ok, so we have to have an unsubscribe link. How about if we make it really hard to see?"

The reason they got my email address was that it was required so I could download color profiles for their high end inkjet paper. It seems to me, if you want people to buy your paper, then you want to make it as easy as possible for them to get good results. Not make them jump through hoops and get spammed, just for the "privilege" of buying their products.

They didn't even have a way to opt out of emails when you register, as most registration forms do.

Needless to say, it doesn't make me inclined to buy any more of Ilford's products.

Tuesday, March 12, 2013

Geek Overthink

We rented a car when we were in Virginia and we managed to get a Toyota Prius. The radio was playing, and I hate radio as much as I hate TV! I had my music on my iPhone but I didn't have a car adapter. But hey, the car supports Bluetooth audio. I poked around in the menus but I couldn't figure out how to connect. I pulled out the owner's manual, but there was no mention of audio. It turned out there was a whole separate manual just for the audio system!

It was a little hard to follow the manual because it described multiple versions of the audio system, none of which seemed to match our vehicle. I figured it had to be possible since a bunch of iPhones were listed on the Bluetooth menu. It turned out that might have been part of the problem since the manual said you could link up to five devices, and there were already five. No problem, I'll just delete one. Except the options to delete that were described in the manual didn't appear on our car. I wonder if rental cars are locked down somehow so you can't mess with certain settings?

I gave up at this point. But then I remembered seeing something about USB in the manual, and I did have a USB cable (from the charger). Sure enough there was a place to plug in a USB cable and when I did my iPhone showed up and played perfectly.

The moral of the story is - don't get so wrapped up in the wonderful complexities of one approach that you forget that there might be other (simpler) solutions.

Friday, February 22, 2013

Service is Not a Joke

Some companies seem to have the idea that humour equates to great customer service. (e.g. WestJet and Amtrak)

I disagree.

There's nothing wrong with lightening things up and getting people smiling, but you also need to fix the underlying system. Telling someone a joke while they are on hold for an hour doesn't change the fact that they were on hold for an hour.

And not everyone makes a good comedian. Management can dictate "you shall tell jokes" but that doesn't mean it's always going to be funny. Especially the fifth time you hear it.

Unless you also fix the real issues you're just putting lipstick on a pig.

Of course, that leaves the question - if you can't fix the pig are you better off at least dressing it up?

If this was individuals trying to make the best of a situation then ok. But in most cases it seems to be a top down directive. In other words it's coming from exactly the people that could actually fix the system if they really wanted to.

Thursday, January 10, 2013

Building Suneido with Visual Studio 2012

Each time Microsoft releases a new version of Visual Studio I try building Suneido with it. (Using the free Express version.) Usually it's easy. Occasionally I have to tweak some code because they've tightened up the compiler.

Unfortunately, up till now the results have been slower than the old version (version 7) I've been using for release builds. I assume that's because version 7 was the last version to support single-threaded libraries. But it makes me a little nervous to still be using such an old version. So I was happy to find that Suneido compiled with the latest version 11 seems as fast as version 7. (It's confusing that the version number is different from, but so close to, the year, e.g. VS 2012 = version 11.) If my assumption that the speed difference comes from the multi-threaded libraries is correct, then maybe this means they have reduced the overhead. Or it could be that they've just improved the compiler enough to balance out the overhead of the multi-threaded libraries. (The C++ version of Suneido is not multi-threaded, so it's annoying to pay a penalty for something you don't need. It goes against the C++ philosophy of only paying for the features you use.)

Normally I have just been building using a makefile from the command line. This time I decided to try building with the Visual Studio IDE. Having got accustomed to using an IDE with Java, I find I miss it when I work on the Suneido C++ code. I've been using Eclipse CDT a bit, but I haven't been able to get it to build and it shows all kinds of spurious errors.

You'd think that once your code builds with a certain compiler, that it would be easy to build with the same compiler from an IDE. But that hasn't been my experience. I've tried to build with Eclipse CDT and with Netbeans but I've always given up in frustration. (Even though it's using the same MinGW compiler that I build with from a makefile.)

I did manage to succeed in building Suneido from the VS 2012 IDE but once again it was an extremely frustrating process. A large part of the problem is the huge number of compiler options. I know exactly what I'm specifying for the command line builds, but I don't know what the defaults are for what I'm not specifying. You can see what all the settings are in the IDE, but the defaults are different from the command line compiler. It's helpful that the IDE settings will show you the command line equivalent to your current settings, but I'm not convinced that this is 100% correct. I'm pretty sure that at times I was getting different results in the IDE from what it claimed was the equivalent command line. I could be wrong - there are so many settings that it's easy to get mixed up unless you're incredibly disciplined in your testing.

Even though I had the same warning level and the same settings (AFAIK) I was getting compiler errors in the IDE for code that compiled perfectly from the command line. Eventually I gave up and changed the code so it would compile in the IDE. That feels a bit like giving up, but I'm only willing to spend so long thrashing on this kind of accidental complexity.

One big mistake I made was that I inadvertently did all my tweaking of the settings under the Debug configuration. So when I switched to the Release configuration, I lost all my settings. Argh! I had to redo the settings under All Configurations. The problem is that when you go into the settings it goes to the current configuration. So if you forget to switch to All Configurations you end up changing only one of the configurations. Leading to frustration when you change to the other configuration and things are broken.

Once I got it to compile, it passed my C++ test suite perfectly. But when I ran the Suneido IDE I was missing most of the toolbar buttons. Oops, didn't include the resources. I added the resource file to the project but then I got errors about a duplicate manifest?! After some digging on the web I found this was because Visual Studio automatically creates its own manifest, but my resource file added my own manifest.

One solution was to change my resource file to not include my manifest. But the automatic manifest didn't include one of the things I needed (version 6 of the common controls). There might be a way to do this through the IDE but I didn't find it. Eventually I figured out how to turn off the automatic manifest and just use my own. In my digging on the web I'd come across people saying this was a "bad" idea, but it seems to work fine for me. I looked at the automatic manifest and it didn't seem to have anything special in it.

(The free Express version of Visual Studio doesn't include the resource editor, but that's not a problem because the main suneido.rc resource file is just a text file.)
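For reference, requesting version 6 of the common controls is normally done with a dependency section like the following in the application manifest (this is the standard boilerplate, not necessarily Suneido's exact manifest):

```xml
<dependency>
  <dependentAssembly>
    <assemblyIdentity type="win32"
        name="Microsoft.Windows.Common-Controls" version="6.0.0.0"
        processorArchitecture="*" publicKeyToken="6595b64144ccf1df"
        language="*"/>
  </dependentAssembly>
</dependency>
```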

Another issue I discovered is that programs compiled with VS 2012 won't run on Windows XP (they require Vista or newer). That's not a problem for me, but I'm pretty sure we have a substantial number of customers who still have XP. It sounds like maybe there was some backlash on this issue and Update 1 for VS 2012 now includes support for building programs that will run on XP (or newer). This was fairly easy to turn on in the IDE settings, but I haven't got around to figuring out the equivalent makefile changes.

But even this setting wasn't without its frustrations. After I turned on XP support, I started getting weird errors about multiple definitions of preprocessor symbols in the standard headers. Once more I thrashed around on this for a long time. I found a couple of places on the web talking about the same (or at least similar) problem but they didn't help. Eventually, after a lot of semi-random poking at the settings, the problem went away. I think I had messed something up in my earlier tweaking of the settings, possibly related to turning on or off the "inherit from parent or project defaults" on certain settings.

I know it's fashionable to dislike Java, but I have to say it's a heck of a lot easier compiling a Java program than a large C++ program! Although I'm sure Visual Studio experts will shake their heads at my ignorance. It took me a while to get familiar with Eclipse for Java as well, but that was just learning the IDE, not a zillion obscure compiler options. This isn't really the fault of the C++ language itself, but straightforward builds don't seem to be a common feature of C++ compilers.

Once I got Suneido to build I started poking around to see what the IDE was like. One of the things I looked for is an outline of the current file in the editor, showing the classes and their members. I assumed this must exist and that I just wasn't looking in the right place. But when I started searching on the web, I found that the closest thing it has is a Class View, which shows all the classes in the project - a huge list for a big program. There are some add-ons to do this, but no good free ones. It's not a big deal, but it's such a central feature of Eclipse that I've come to expect it. Even the Suneido IDE has an outline! (I did discover there were pull-down lists at the top of the editor pane that provide similar functionality, but it's not quite the same as an outline view.)

If I just want to build the code, I prefer the command line. But if I want to navigate around in the code and look for things, then the IDE is nice. Refactoring support is another reason to use an IDE, but Visual Studio doesn't have any refactoring for C++ (it does for C#). There are commercial 3rd party add-ons for refactoring like Visual Assist X (which also has an outline view).

I very much doubt if I've managed to get the settings identical between the makefile and the IDE project. (I know for sure I haven't in the case of XP support.) So which one should I build with? I think I can build the IDE project from the command line if I want. That would presumably ensure I was at least getting consistent settings.

I had one mysterious crash when I first started testing this build of Suneido. But since then I have used it quite a lot without any problems. Maybe I changed (fixed) something in between, but I'm not sure about that.

As usual, the code changes and the Visual Studio solution are in version control on SourceForge. If you try building it, let me know how it works.

Friday, January 04, 2013

The Evolution of a Feature

The application software that we sell has gotten bigger over the years. We now have roughly 750,000 lines of (Suneido) code. That's peanuts to someone like Microsoft, but it's a lot for a handful of programmers in a small company.

Mostly I work on the Suneido implementation itself i.e. the C++ and Java code. But recently, for a change, I've been doing some programming in Suneido. (My staff doesn't always appreciate this because I tend to be a little more "aggressive" than they would be, and as a result I've been known to break a few eggs in the process of making my omelettes.) I quite enjoy the chances I get to actually use the language I've designed and implemented. And unlike C++ and Java, if there's something I don't like, I have the option of fixing it!

As our code has grown, navigating through it has become more of an issue. I might vaguely remember the name I'm looking for, but have a hard time remembering whether it's FooBar, Foo_Bar, or Foobar. We have naming conventions, but there's still variation.

A lot of the time you can follow references through the code using "go to definition", but sometimes you need to start from scratch.

We use a "Find" function in the library view, basically a "grep". That works, but it's not ideal. It's too awkward to write regular expressions like "Foo_?Bar" every time.

Several years ago I improved the Find function to search the names by converting both the actual names and the search to a canonical form, e.g. converting to lower case and stripping out underscores. It then "scored" the matches.
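The canonical form might be as simple as this (my sketch of the idea, not the Suneido code):

```java
public class NameCanon {
    // Canonical form for fuzzy name search: lower-case with underscores
    // removed, so FooBar, Foo_Bar, and Foobar all compare equal.
    static String canon(String name) {
        StringBuilder sb = new StringBuilder(name.length());
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            if (c != '_')
                sb.append(Character.toLowerCase(c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(canon("Foo_Bar").equals(canon("FooBar")));  // prints true
    }
}
```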

This was an improvement, but it didn't show you a list of matches, it just let you step through the matches. There were still lots of times when the first match wasn't what you wanted.

About the same time, I added auto-completion in the code editor. So you can start typing a name and it would display a list of matches. At least this way you could see a list and pick the one you wanted.

I noticed programmers would go to the workspace, start typing a name, use auto-completion to get the right one, and then go to the definition. This worked, but it's a little roundabout!

So my next step was to change the name search field in the "Find" dialog to do auto-completion, which avoided the detour through the workspace.

But because of the way Scintilla (the code editor component we use) works, auto-completion can't ignore underscores or do other kinds of matching.

Also, it seemed a little ugly to auto-complete the exact name, and then go searching for it, even though no searching is necessary in this case.

So my most recent improvement is to add a "locate" field (at the top-right of the library view). And instead of using the Scintilla auto-completion, I used a different auto-complete control we had which allows more flexible matching. (In the process finding and fixing a bunch of bugs in this control.)

I also got the idea to add what Eclipse calls "camel case searching", being able to search using just the capitalized letters in a camel case name. e.g. using "oic" to search for OpenImageControl. This is quite handy when names get long.

The simplest way to implement this would be to just perform a linear search of all the names. But we might have roughly 20,000 active names, and doing a complex search linearly is fairly slow. (Although perhaps usable.)

The auto-completion uses a sorted list of names and does a binary search in it. This is fast, partly because Suneido has a built-in lower bound binary search.

But how do you implement something like camel case searching with a binary search?

I realized that I could do it by putting multiple entries in the list. e.g. for OpenImageControl there would be two entries in the list, one for "openimagecontrol" and one for "oic". I also needed the original name so the entries in the list look like: "oic=OpenImageControl".

Suneido also lets you overload the same name in multiple libraries (a much abused feature!). So I added the library to the entries e.g. "oic=OpenImageControl:stdlib".

But when I sorted those entries, the library part sorted alphabetically, which isn't a useful order. I wanted to sort the library part in the same order they were being used. So I changed the entries to use a library index instead of name e.g. "oic=OpenImageControl:01"

The auto-complete does a binary search to find the matches, strips the part up to the equals sign, sorts and uniques the results, then converts the library index back to the library name, giving e.g. "OpenImageControl (stdlib)"
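Putting the pieces together, here's a simplified Java sketch of the locate list and its prefix search (I've left out the library-index suffix for brevity, and the names are my own, not Suneido's):

```java
import java.util.*;

public class LocateSketch {
    // One entry per canonical form and one per camel-case abbreviation,
    // each carrying the original name after '='
    // (e.g. "oic=OpenImageControl" and "openimagecontrol=OpenImageControl").
    static List<String> buildList(List<String> names) {
        List<String> entries = new ArrayList<>();
        for (String name : names) {
            entries.add(name.toLowerCase().replace("_", "") + "=" + name);
            StringBuilder abbrev = new StringBuilder();
            for (char c : name.toCharArray())
                if (Character.isUpperCase(c))
                    abbrev.append(Character.toLowerCase(c));
            if (abbrev.length() > 1)
                entries.add(abbrev + "=" + name);
        }
        Collections.sort(entries);
        return entries;
    }

    // Lower-bound binary search, then scan forward while the prefix matches,
    // collecting the original names (sorted and uniqued by the TreeSet).
    static List<String> match(List<String> entries, String search) {
        String prefix = search.toLowerCase().replace("_", "");
        int lo = 0, hi = entries.size();
        while (lo < hi) {                          // lower-bound binary search
            int mid = (lo + hi) >>> 1;
            if (entries.get(mid).compareTo(prefix) < 0)
                lo = mid + 1;
            else
                hi = mid;
        }
        SortedSet<String> results = new TreeSet<>();
        for (int i = lo; i < entries.size() && entries.get(i).startsWith(prefix); i++)
            results.add(entries.get(i).substring(entries.get(i).indexOf('=') + 1));
        return new ArrayList<>(results);
    }

    public static void main(String[] args) {
        List<String> entries = buildList(List.of("OpenImageControl", "OkCancel", "Foo_Bar"));
        System.out.println(match(entries, "oic"));  // prints [OpenImageControl]
    }
}
```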

The other question is when to build and update the list. It takes a noticeable amount of time to build (about half a second on a fast machine) so I didn't want to do it too often, and yet I wanted the list to be up to date. One option is to update it when the GUI is "idle", but most of the time it wouldn't need updating. Ideally, I wanted changes to invalidate the list, and only update it when it was next required. Unfortunately, there's no easy way to be notified of library changes.

I ended up using two things to determine if the cached list was still valid. First, if the libraries in use changed, then the list had to be updated. Second, if the maximum "num" id field in any of the libraries changed, that meant a record had been added and the list needed to be updated. But checking the num field means database queries, so I throttled it to check at most every 15 seconds. That still leaves a slight lag occasionally when you use the locate feature (if an update is needed), but it's rare enough (and brief enough) not to be annoying.

I only just finished this feature so it's hard to say how it will work out, but so far it seems like a good (or at least better) solution.