Thursday, November 27, 2008

Burnt by ByteBuffer

In my last post about this, I had "solved" my ByteBuffer problem rather crudely, by setting the position back to zero. But I wasn't really happy with that - I figured I should know where the position was getting modified.

Strangely, the bug only showed up when I had debugging code in place. (Which is why the bug didn't show up until I put in debugging code to track down a different problem!) That told me that it was probably the debugging code itself that was changing the ByteBuffer position.

I started putting in asserts to check the position. The result baffled me at first. Here's what I had:
   int pos = buf.position();
   buf.mark();
   ...
   buf.reset();
   assert(pos == buf.position());
Since mark() saves the position and reset() restores the position, I figured this should never fail. But it did! It turns out, what was happening was this:
   method1:
      buf.mark();
      ...
      method2();
      ...
      buf.reset();

   method2:
      buf.mark();
      ...
      buf.reset();
The problem was that the nested mark() in method2 was overwriting the first mark(). So the outer reset() was restoring to the nested position, not to its own mark.
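
Here's a minimal sketch that reproduces the effect. The class and method names (NestedMarkDemo, outer, inner) and the positions are made up for illustration; only the ByteBuffer calls are real.

```java
import java.nio.ByteBuffer;

class NestedMarkDemo {
    // Hypothetical stand-in for a nested (e.g. debugging) method.
    static void inner(ByteBuffer buf) {
        buf.mark();          // replaces the caller's mark with the current position
        buf.get();           // read one byte
        buf.reset();         // back to where inner() started
    }

    // Returns the position the buffer ends up at after the outer reset().
    static int outer(ByteBuffer buf) {
        buf.position(2);
        buf.mark();          // outer mark at 2
        buf.position(5);     // the caller moves on before calling inner()
        inner(buf);          // the single mark is now 5
        buf.position(7);
        buf.reset();         // restores 5, not the 2 the caller expected
        return buf.position();
    }

    public static void main(String[] args) {
        System.out.println(outer(ByteBuffer.allocate(16)));
    }
}
```

A buffer has only one mark, so there is no way for nested code to "stack" marks.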

A classic example of why mutable state can cause problems. So not only does ByteBuffer burden you with this extra baggage, but it's also error-prone baggage. Granted, it's my own fault for using it improperly.

The fix was easy. I quit using mark() and reset() and instead did:
   int pos = buf.position();
   ...
   buf.position(pos);
That solved the problem.
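
A sketch of that pattern, assuming a hypothetical debugging helper called peekSum. The saved position lives in a local variable, so nested calls can't overwrite it the way they can overwrite the buffer's single mark.

```java
import java.nio.ByteBuffer;

class SavedPositionDemo {
    // Hypothetical helper: reads the remaining bytes, then puts the position back.
    static int peekSum(ByteBuffer buf) {
        int pos = buf.position();      // saved in a local, not in the buffer
        try {
            int sum = 0;
            while (buf.hasRemaining())
                sum += buf.get();
            return sum;
        } finally {
            buf.position(pos);         // restore even if something above throws
        }
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.wrap(new byte[] {1, 2, 3, 4});
        buf.position(1);
        System.out.println(peekSum(buf) + " " + buf.position());
    }
}
```

The try/finally is my own addition; it just makes sure the position is restored on every exit path.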

I almost got burnt another way. I had used buf.rewind() to set the position back to zero. When I read the documentation more closely I found out that rewind() also clears the position saved by mark(). So if a nested method had called rewind() that would also have "broken" my use of mark/reset.
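
To see the rewind() behavior for yourself, here's a small sketch (the class and method names are invented). After rewind() the mark is gone, so reset() throws InvalidMarkException.

```java
import java.nio.ByteBuffer;
import java.nio.InvalidMarkException;

class RewindDiscardsMarkDemo {
    // Returns true if reset() fails after a rewind(), i.e. the mark was discarded.
    static boolean resetFailsAfterRewind() {
        ByteBuffer buf = ByteBuffer.allocate(8);
        buf.position(3);
        buf.mark();              // mark at 3
        buf.rewind();            // position = 0 and the mark is discarded
        try {
            buf.reset();
            return false;        // not reached: there is no mark to return to
        } catch (InvalidMarkException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(resetFailsAfterRewind());
    }
}
```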

Oh well, I found the problem and fixed it, and now I know to be more careful. On to the next bug!

PS. It's annoying how Blogger loses the line spacing after any kind of block tag (like pre).

Computers Not for Dummies

As much as we've progressed, computers still aren't always easy enough to use.

A few days ago I borrowed Shelley's Windows laptop to use to connect to my jSuneido server.

Of course, as soon as I fired it up it wanted to download and install updates, which I let it do. I thought I was being nice installing updates for her. But when I was done, the wireless wouldn't connect. It had been working fine up till then (that's how I got the updates). I just wrote it off to the usual unknown glitches and left it.

But the next day, Shelley tried to use the wireless and it still wouldn't connect. Oops, now I'm in trouble. I tried restarting the laptop and restarting the Time Capsule (equivalent to an Airport Extreme base station) but no luck. It was late in the evening so I gave up and left it for later.

Actually, the problem wasn't connecting - it would appear to connect just fine, but then it would "fail to acquire a network address" and disconnect. It would repeat this sequence endlessly.

I tried the usual highly skilled "messing around" that makes us techies appear so smart. I deleted the connection. I randomly changed the connection properties. Nothing worked.

Searching on the internet found some similar problems, but no solutions that worked for me.

One of the things I tried was turning off the security on the Time Capsule. That "solved" the problem - I could connect - but it obviously wasn't a good solution.

While I was connected I checked to see if there were any more Windows updates, figuring it was probably a Windows update that broke it, so maybe another Windows update would fix it. But there were no outstanding "critical" updates. Out of curiosity I checked the optional updates and found an update for the network interface driver. That seemed like it might be related.

Thankfully, that solved the problem. I turned the wireless security back on and I could still connect.

It still seems a little strange. Why did a Windows update require a new network interface driver? And if it did, why not make this a little more apparent? And why could it "connect" but not get an address? If the security was failing, couldn't it say that? And why does the hardware driver stop the security working? Is the security re-implemented in every driver? That doesn't make much sense.

But to get back to my original point, how would the average non-technical person figure out this kind of problem? Would they think to disable security temporarily (or connect with an actual cable) so they could look for optional updates that might help?

Of course, it's not an easy problem. I'd like to blame Microsoft for their troublesome update, but they have an almost infinite problem of trying to work with a huge range of third party hardware and drivers and software. Apple would argue that's one of the benefits of their maintaining control of the hardware, but I've had my share of weird problems on the Mac as well.

Wednesday, November 26, 2008

jSuneido Bugs

After I got to the point where I could start up the IDE on a Windows client from the jSuneido server, I thought it would be a short step to getting the test suite to run (other than the tests that rely on rules and triggers, which aren't implemented yet).

What I should have realized is that running the IDE, while impressive (to me, anyway), doesn't really exercise much of the server. It only involves simple queries, primarily just reading code from the database. Whereas the tests, obviously, exercise more features.

And so I've been plugging away getting the tests to pass, one by one, by fixing bugs in the server. Worthwhile work, just not very exciting.

Most of the bugs from yesterday resulted from me "improving" the code as I ported it from C++ to Java. The problem is, "improving" code that you don't fully understand is a dangerous game. Not surprisingly, I got burnt.

Coincidentally, almost all of yesterday's bugs related to mutable versus immutable data. The C++ code was treating certain data as immutable; it would create a new version rather than change the original. When I ported this code, I thought it would be easier/better to just change the original. The problem was that the data was shared, and changing the original affected all the places it was shared, instead of just the one place where I wanted a different version. Of course, in simple cases (like my tests!) the data wasn't shared and it worked fine.
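
To make the difference concrete, here's a sketch of the "create a new version" style. The Columns class is purely hypothetical (it isn't from the Suneido code); the point is only that with() leaves the shared original alone.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical immutable value: "changing" it produces a new object.
final class Columns {
    private final List<String> names;

    Columns(List<String> names) {
        // defensive copy, wrapped so nobody can modify it afterwards
        this.names = Collections.unmodifiableList(new ArrayList<>(names));
    }

    // Returns a new version with one more name; this object is untouched.
    Columns with(String name) {
        List<String> copy = new ArrayList<>(names);
        copy.add(name);
        return new Columns(copy);
    }

    List<String> names() {
        return names;
    }
}

class ImmutableDemo {
    public static void main(String[] args) {
        Columns shared = new Columns(Arrays.asList("a", "b"));
        Columns mine = shared.with("c");
        System.out.println(shared.names() + " " + mine.names());
    }
}
```

If with() had instead added to the list in place, every holder of shared would have seen the extra name, which is the kind of bug described above.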

Some of the other problems involved ByteBuffer. I'm using ByteBuffer as a "safe pointer" to a chunk of memory (which may be memory mapped to a part of the database file). But ByteBuffer has a bunch of extra baggage, including a current "position", a "mark position", and a "limit". And it has a bunch of extra methods for dealing with these. It wouldn't be so bad if you could ignore these extras. But you can't, because even simple things like comparing buffers only compare based on the current "position". Of course, it's my own fault because obviously somewhere I'm doing something that changes that position. Bringing me back to the mutability issue.
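
A small sketch of the comparison issue (the class and method names are invented). equals() only looks at the bytes between position and limit, so two buffers over identical data compare unequal once one of them has been read from. The second method shows duplicate(), which shares the content but has its own position, mark, and limit; whether that would suit jSuneido is just an assumption on my part.

```java
import java.nio.ByteBuffer;

class BufferCompareDemo {
    // Same underlying bytes, but different positions.
    static boolean equalAfterOneRead() {
        byte[] data = {1, 2, 3, 4};
        ByteBuffer a = ByteBuffer.wrap(data);
        ByteBuffer b = ByteBuffer.wrap(data);
        b.get();                    // b now has 3 bytes remaining, a still has 4
        return a.equals(b);         // compares only the remaining bytes
    }

    // Reading through a duplicate leaves the original's position alone.
    static int originalPositionAfterReadingDuplicate() {
        ByteBuffer original = ByteBuffer.wrap(new byte[] {1, 2, 3, 4});
        ByteBuffer view = original.duplicate();   // shared content, independent position
        view.get();
        view.get();
        return original.position();
    }

    public static void main(String[] args) {
        System.out.println(equalAfterOneRead());
        System.out.println(originalPositionAfterReadingDuplicate());
    }
}
```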

For the most part I think that the Java libraries are reasonably well designed. Not perfect, but I've seen a lot worse. But for my purposes it would be better if there was a lighter weight "base" version of ByteBuffer without all the extras.

I can see someone saying that I'm "misusing" ByteBuffer, that since I'm coming from C/C++ I'm trying to get back my beloved pointers. But I don't think that's the case. The reason for using ByteBuffer this way is that it's the only way to handle memory mapped files.

I guess one option would be to limit the use of ByteBuffer to just the memory mapped file io, and to copy the data (e.g. into byte arrays) to use everywhere else. But having to copy everything kind of defeats the purpose of using memory mapped access. Not to mention it would require major surgery on the code :-(

Monday, November 24, 2008

More on Why jSuneido

A recent post by Charles Nutter (the main guy for JRuby) reiterates the advantages of running a language on top of the Java virtual machine.

Thursday, November 20, 2008

jSuneido Slow on OS X - Fixed

Thankfully, I found the problem. Apart from the time wasted, it's somewhat amusing because I was circling around the problem/solution but not quite hitting it.

My first thought was that it was Parallels but I quickly eliminated that.

My next thought was that it was a network issue but if I just did a sequence of invalid commands it was fast.

I woke up in the middle of the night and thought maybe it's the Nagle/Ack problem, and if so, an invalid command wouldn't trigger it because it does a single write. But when I replaced the database calls with stubs (but still doing similar network IO) it was fast, pointing back to the database code.

Ok, maybe it's the memory mapping. I could see that possibly differing between OS X and Windows. But when I swapped out the memory mapping for an in-memory testing version it was still slow.

This isn't making any sense. I still think it seems like a network thing.

I stub out the database calls and it's fast again, implying it's not the network. But in my stubs I'm returning a fixed size record instead of varying sizes like the database would. I change it to return a random size up to 1000 bytes. It's still fast. For no good reason, I change it to up to 2000 bytes and it's slow!

I seem to recall TCP/IP packet size being around 1400 bytes so that's awfully suspicious.

I insert client.socket().setTcpNoDelay(true) into the network server code I'm using and sure enough that solves the problem. (Actually first time around I set it to false, getting confused by the double negative.)
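
For reference, a minimal sketch of the call. It uses an unconnected SocketChannel so it can run on its own; in the real server the channel would be the one returned from accept(). The NoDelayDemo name is made up.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.SocketChannel;

class NoDelayDemo {
    // Turns Nagle's algorithm off and reports the resulting setting.
    static boolean nagleDisabled() {
        try {
            SocketChannel client = SocketChannel.open();   // unconnected, for illustration only
            try {
                // true = "no delay" = Nagle OFF; false would leave Nagle on
                client.socket().setTcpNoDelay(true);
                return client.socket().getTcpNoDelay();
            } finally {
                client.close();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(nagleDisabled());
    }
}
```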

A better solution might be to use gathering writes, but at this point I don't want to get distracted trying to implement this in someone else's code.
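
Roughly what a gathering write looks like, as a sketch only. It uses a Pipe in place of a socket so it runs standalone (SocketChannel supports the same write(ByteBuffer[]) call), and the "HDR:" / "record" contents are placeholders. I haven't verified that this alone would cure the slowdown.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.GatheringByteChannel;
import java.nio.channels.Pipe;
import java.nio.charset.StandardCharsets;

class GatheringWriteDemo {
    // Hands both buffers to the channel in one call; loops in case a write is partial.
    static void writeAll(GatheringByteChannel out, ByteBuffer header, ByteBuffer body)
            throws IOException {
        ByteBuffer[] parts = { header, body };
        while (header.hasRemaining() || body.hasRemaining())
            out.write(parts);
    }

    // Writes a header and a body through a pipe and returns what arrives.
    static String roundTrip() {
        try {
            Pipe pipe = Pipe.open();
            ByteBuffer header = ByteBuffer.wrap("HDR:".getBytes(StandardCharsets.US_ASCII));
            ByteBuffer body = ByteBuffer.wrap("record".getBytes(StandardCharsets.US_ASCII));
            writeAll(pipe.sink(), header, body);
            pipe.sink().close();                       // signals end of stream to the reader

            ByteBuffer in = ByteBuffer.allocate(32);
            while (pipe.source().read(in) != -1) {
                // keep reading until end of stream
            }
            pipe.source().close();
            in.flip();
            return StandardCharsets.US_ASCII.decode(in).toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip());
    }
}
```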

This doesn't explain why the problem only showed up on OS X and not on Windows. There must be some difference in the TCP/IP implementation.

In any case, I'm happy to have solved the problem. Now I can get back to work after a several day detour.

Tuesday, November 18, 2008

jSuneido Slow on OS X

The slowness of jSuneido isn't because of running through Parallels. I tried running the client on two external Windows machines with the same slow results as with Parallels.

Next, in order to eliminate Eclipse as the problem, I figured out how to package jSuneido into a jar file, which turned out to be a simple matter of using Export from Eclipse.

However, when I tried to run the jar file outside Eclipse under OS X, I got an error:

java.lang.NoClassDefFoundError: java/util/ArrayDeque

At first I assumed this was some kind of classpath issue. But after messing with that for a bit I finally realized that it was because the default Java version on OS X is 1.5 and ArrayDeque was introduced in Java 6 (aka 1.6).

From the web it seems that Java 6 is not well supported on the Mac. Apple has an update to get it, but it still leaves the default as Java 1.5. And the update is only for 64 bit. I didn't come across any good explanation of why Apple is dragging its feet with Java 6.

I actually already had Java 6 installed since that was what I was using in Eclipse. (Which is why it was working there.)

But ... same problem, painfully slow, running the jar file outside Eclipse (but still on OS X).

Running the jar file on Windows under Parallels was fast, so the problem isn't the Mac hardware (not that I thought it would be).

I'd like to try running jSuneido under Java 1.5 to see if that works any better (since it is the OS X default). But in addition to Deque (which I could probably replace fairly easily) I'm also using methods of TreeMap and TreeSet that aren't available.

What's annoying is that I thought I was protected from this because of the compliance settings in Eclipse. Maybe I'm misinterpreting these settings, but I expected it to warn me if my code wasn't compatible with Java 1.5.

So far this doesn't leave me with any good options:

- I can run slowly from Eclipse - yuck

- I can package a jar file and copy it to Windows - a slow edit-build-test cycle - yuck

- I can install Eclipse on Windows under Parallels and do all my work there - defeating the purpose of having a Mac - yuck

The real question is still why jSuneido is so slow running on OS X. I assume it's something specific in my code or there would be a lot more complaints. But what? Memory mapping? NIO? And how do I figure it out? Profile? Maybe there are some Java options that would help?

PS. I should mention that it's a major difference in speed, roughly 20x.

Monday, November 17, 2008

Thank Goodness

I was excited when I got to the point where I could start up a Suneido client from my Java Suneido server.

But ... as I soon realized, it was painfully slow. I wasn't panicking since it's still early stages, but it was nagging me. What if, like many people say, Java is just too slow?

I kept forgetting to try it at work on my Windows PC since at home I'm going through the Parallels virtual machine.

Finally I remembered, and ... big sigh of relief ... it's fast on my Windows PC. I haven't done any benchmarks but starting up the IDE seems roughly the same as with the cSuneido server.

I'm not quite sure why it's so slow with Parallels - that's a bit of a nuisance since I work on this mostly at home on my Mac. Maybe something to do with the networking? But at least I don't have a major speed issue (yet).

I'm also still running jSuneido from within Eclipse. That might make a difference too. One of these days I'll have to figure out how to run it outside the IDE!

Sunday, November 16, 2008

Flailing with Parallels 4

The new version of Parallels is out. I bought it, downloaded it, and installed it. They've changed the virtual machine format so you have to convert them. The slowest part of this process was making a backup (my Windows Vista VM is over 100 gb).

Everything worked fine, and I should have left well enough alone, but during the upgrade process I noticed that my 30 gb virtual disk file was 80 gb. So I thought I'd try the Compressor tool. (I'd never used it before.)

I got this message:

Parallels Compressor is unable to compress the virtual disk files,
because the virtual machine has snapshots, or its disks are either
plain disks or undo disks. If you want to compress the virtual disk
file(s), delete all snapshots using Snapshot Manager and/or disable
undo disks in the Configuration Editor.

So I opened the Snapshot Manager and started deleting. I deleted the most recent one, but when I tried to delete the next one it froze. I waited a while, but nothing seemed to be happening and the first deletion had been quick. I ended up force quitting Parallels, although I hated doing this when my virtual machine was running since that's caused problems in the past.

But when I restarted Parallels it was still "stuck". Most of the menu options were grayed out. When I tried to quit I got:

Cannot close the virtual machine window.
The operation of deleting a virtual machine snapshot is currently in
progress. Wait until it is complete and try again.

I force quit Parallels again. I tried deleting the snapshot files but that didn't help. Force quit again.

I had, thankfully, backed up the upgraded vm before these problems. But it took an hour or more to copy the 100 gb to or from my Time Capsule. (That seems slow for a hardwired network connection, but I guess it is a lot of data.) So first I tried to restore just some of the smaller files, figuring the big disk images were probably ok. This didn't help.

Next I tried deleting the Parallels directory from my Library directory, thinking that might be where it had stored the fact that it was trying to delete a snapshot. This didn't help either.

So I bit the bullet and copied the entire vm from the backup. An hour later I start Parallels again, only to find nothing has changed - same problem. Where the heck is the problem?

The only other thing I can think of is the application itself so I start reinstalling. Part way through I get a message to please quit the Parallels virtual machine. But it's not running??? I look at the running processes and sure enough there's a Parallels process. Argh!

In hindsight, the message that "an operation" was "in progress" should have been enough of a clue. But I just assumed that force quitting the application would kill all of its processes. I'm not sure why it didn't. Maybe Parallels "detached" this process for some reason? I also jumped to the (incorrect) conclusion that there was a "flag" set somewhere that was making it think the operation was in progress.

If this had been on Windows, one of the first things I would have tried is rebooting, which would have fixed this. But I'm not used to having to do that on OS X. I probably didn't need to reboot this time either; killing the leftover process likely would have been sufficient. But just to be safe I did.

Sure enough, that solved the problem, albeit after wasting several hours. Once more, now that everything was functional, I should have left well enough alone, but I can be stubborn and I still had that oversize disk image.

This time I shut down the virtual machine before using the Snapshot Manager and I had no problems deleting the snapshots.

But when I restart the vm and run Compressor, I get exactly the same message. I have no snapshots, and "undo disks" is disabled. I'm not sure what "plain" disks are, but mine are "expandable" (which is actually a misleading name since they have a fixed maximum size) and the help says I should be able to compress expandable drives. I have no idea why it refuses to work.

While I'm in the help I see you can also compress drives with the Image Tools so I try that and finally I have success. My disk image file is now down to 30 gb. I'm not sure it was worth the stress though!

Saturday, November 15, 2008

Library Software Sucks

Every so often I decide I should use the library more, instead of buying quite so many books. Since the books I want are almost always out or at another branch, I don't go to the library in person much. Instead, I use their web interface.

First, I guess I should be grateful / thankful that they have a web interface at all.

But it could be so much better!

I'm going to compare to Amazon, not because Amazon is necessarily perfect, but it's pretty good and most people are familiar with it.

Obviously, I'm talking about the web interface for my local library. There could be better systems out there, but I suspect most of them are just as bad.

Amazon has a Search box on every page. The library forces you to go to a separate search page. Although "Keyword Search" is probably what you want, it's on the right. On the left, where you naturally tend to go first, is "Exact Search". Except it's not exactly "exact", since they carefully put instructions on the screen to drop "the", "a", "an" from the beginning. This kind of thing drives me crazy. Why can't the software do that automatically? (It's like almost every site that takes a credit card number wants you to omit the spaces, even though it could do that trivially for you.) However, they don't mention tips that are just as important or more so, such as having to enter the last name first when you're searching for an author.

Assuming you're paying attention enough to realize you want the keyword search, you now have to read the fine print:
Searching by "Keyword" searches all indexed fields. Use the words and, or, and not to combine words to limit or broaden a search. If you enter more than one word without and, or, or not, then your keywords will be searched as an exact phrase.
Since they don't tell you which fields are indexed, the first sentence is useless. The last sentence is surprising. Why would you make the default searching by "exact phrase"? If you wanted "exact" wouldn't you be using the exact search on the left?

Oops. I guess I was too slow. I'm not sure what session it's talking about since I didn't log in. When I click on "begin a new session" it takes me back to the Search screen, with my search gone, of course.

Let's try "edward abbey" - 7 results including some about him or with prefaces by him.

How about "abbey, edward" - 5 results including one by "Carpenter, Edward" called "Westminster Abby". So much for exact phrase. Maybe the comma?

Try "abbey edward" - same results, so I'm not sure what they mean by "exact phrase".

The search results themselves could be a lot nicer. No cover images. And nothing to make the title stand out. And the titles are all lower case. That may be how librarians treat them, but it's not how anyone else writes titles.

Oops, sat on the search results screen too long. At least this time it didn't talk about my session.

Back on the search results, there's a check box to add to "My Hitlist". When I first saw that I was excited. Then I read that the list disappears when my "session" ends. Since my "session" seems to get abruptly ended fairly regularly, the hitlist doesn't appear too useful.

It would be really nice if you could have persistent "wish lists" like on Amazon.

Once you find a book you can reserve it. That's great and it's the whole reason I'm on here. But it presents you with a bit of a dilemma. If you reserve a bunch of books, they tend to arrive in bunches, and you can't read them all before they're due back. But if you only reserve one or two, then you could be waiting a month or two to get them.

Ideally, I'd like to see something like Rogers Video Direct, where I can add as many movies as I want to my "Zip List" (where do they come up with these names?) and they send me three at a time as they become available. When I return one then they send me another.

Notice the "Log Out" on the top right. This is a strange one since there's no "Log In". It seems to take you to the same screen as when your "session" times out. The only way to log in that I've found is to choose "My Account", which then opens a new window. This window doesn't have a Log Out link; instead it tells you to close the window when you're finished by clicking the "X" in the top right corner. Of course, I'm on a Mac so my "X" is in the top left. But that's not a problem because if you stay in the My Account window too long (a minute or so) it closes itself.

Of course, this assumes you didn't delay too long in clicking on My Account, because then you'll get:
Obviously, the lesson here is that you better not dither. But why? The reason the web scales is that once I've downloaded a page, I can sit and look at it as long as I like, without requiring any additional effort from the server. I'm not sure why this library system is so intent on getting rid of me. If it was storing a bunch of session data for me, I could see that it might just be overaggressive about purging sessions. But I'm just browsing; it should be stateless as far as the server is concerned.

And then there's the quality of the data itself. I'd show you some examples, but:
I've tried reporting errors in the data, such as an author's name duplicated or misspelled, but there doesn't seem to be any process for users to report errors. It'd be nice if users could simply mark possible errors as they were browsing so library staff could clean them up.

I could go on longer - what about new releases? what about suggestions based on my previous selections? what about a history of the books I've borrowed? what about reviews? Amazon has all these. But I'm sure by now you get the point.

I suspect the public web interface is an afterthought that didn't get much attention. And it's a case where the buyers (the libraries/librarians) aren't the end users (of the public web interface anyway). And since these are huge expensive systems there's a large amount of inertia. Even if something better did come along it would be an uphill battle to get libraries to switch.

And it's unlikely any complaints or suggestions from users even get back to the software developers. There are too many layers in between. I've tried to politely suggest the software could be better but all I get is a blank look and an offer to teach me how to use it. After all, I should be grateful, it's not that long ago we had to search a paper card catalog. I'm afraid at this rate it's not likely to improve very quickly.

The copyright at the bottom reads:

Copyright © Sirsi-Dynix. All rights reserved.
Version 2003.1.3 (Build 405.8)

I wonder if that means this version of the software is from 2003. Five years is a long time in the software business. Either the library isn't buying the updates, or Sirsi-Dynix is a tad behind on their development. But hey, their web site says "SirsiDynix is the global leader in strategic technology solutions for libraries". If so, technology solutions for libraries are in a pretty sad state.

Friday, November 14, 2008

Wikipedia on the iPhone and iPod Touch

I recently stumbled across an offline Wikipedia for the iPhone and iPod Touch. I don't have either, but Shelley has an iPod Touch so I installed it there. It wasn't free but it was less than $10. You buy the app from the iTunes store and then when you run it for the first time it downloads the actual Wikipedia. It's 2gb so it takes a while (and uses up space).

I had a copy of Wikipedia on my Palm and since I drifted away from using/carrying my Palm it's one of the few things I miss.

This version doesn't have any images, but otherwise it seems pretty good. The searching isn't the greatest; for example, Elbrus and Mt Elbrus didn't find anything, but Mount Elbrus did.

I'm not sure exactly why I love having an encyclopedia at my fingertips. But there's something about having so much "knowledge" in your pocket. I'm just naturally curious I guess.

Despite trying to cut down on my gadget addiction, this adds another justification for buying an iPhone or iPod Touch. I hate the monthly fees and long term contracts with the iPhone, but it's definitely a more versatile gadget.

Wednesday, November 12, 2008

Lightroom and Lua

I'm a fan of Adobe Lightroom. It has a great user interface - understated elegance and smooth user experience. And it does a great job, it's fast, and it's relatively small.

I knew Lightroom used Lua, for example in plugins. But I was surprised when I came across a presentation that said 63% of Lightroom-specific code is written in Lua (with the remainder in C, Objective-C, and C++).

That's impressive. Many people would assume that a program written in a "scripting" language would be slow and ugly. That might be true of many programs (scripting language or not!) but I think this proves otherwise.

I also find it reassuring because, in a sense, it validates the approach we took in writing our trucking software. We wrote it in Suneido (comparable to a scripting language) for many of the same reasons Adobe wrote Lightroom in Lua.

Of course, the difference is that they chose an existing language, whereas I wrote Suneido. I would not have been as impressed if Adobe had chosen to write their own scripting language. Of course, that raises the question whether we should have used an existing language to write our trucking software. If I was starting today, I might say yes, but given the situation when we started (years ago) I'm not sure I'd do it any differently.

Tuesday, November 11, 2008

Another Milestone

A standard Suneido WorkSpace, right? Yes, but what you can't see is that it's a client running from the jSuneido server (which happens to be running on OS X). Pretty cool (for me, anyway!)

It actually wasn't too hard to get to this point. I had a few annoying problems with byte order since Java ByteBuffers default to big endian, but native x86 order is little endian. In most places cSuneido was using big endian instead of native order but not everywhere, as I found out the hard way! One last gotcha was that ByteBuffer slice doesn't keep the source's byte order - easy enough to handle but somewhat counterintuitive.
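
Here's a short sketch of both byte order gotchas. The class and the sliceKeepingOrder helper are invented names; one way to handle the slice issue is simply to copy the order across, though I'm not claiming that's exactly what jSuneido does.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class ByteOrderDemo {
    // A freshly allocated buffer is big endian regardless of the hardware.
    static ByteOrder defaultOrder() {
        return ByteBuffer.allocate(4).order();
    }

    // slice() does not inherit the source's order; it starts out big endian again.
    static ByteOrder orderOfSlice() {
        ByteBuffer buf = ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN);
        return buf.slice().order();
    }

    // Hypothetical helper: slice and carry the source's byte order over explicitly.
    static ByteBuffer sliceKeepingOrder(ByteBuffer buf) {
        return buf.slice().order(buf.order());
    }

    public static void main(String[] args) {
        ByteBuffer little = ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN);
        System.out.println(defaultOrder());
        System.out.println(orderOfSlice());
        System.out.println(sliceKeepingOrder(little).order());
    }
}
```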

As I keep reminding myself, there's still lots to do, but it's still nice to reach some tangible milestones :-)