The Software Life: March 2007

Friday, March 30, 2007

ETech 2007 Last Day

We started off with a few interesting keynotes. One on Adobe's new Apollo platform - an alternative desktop runtime for web apps (HTML, CSS, JavaScript, Flash). It looks pretty neat, especially the features for running apps when you're offline (not connected to the internet). But is HTML/CSS/JavaScript the best way to write apps? I'm not sure.

Next, Google gave a presentation on their project to add 1.6 MW of solar power at their headquarters. They also talked about other environmentally friendly practices at Google. Again, it seems Google is trying hard to not be evil despite their huge size.

After the break I went to a session by Andy Kessler on how Moore's law will soon be "invading" medicine - leading to better and cheaper health care. He was an entertaining speaker.

James Duncan's session on JavaScript and Zimki was quite interesting. He talked about some features of JavaScript that I wasn't aware of. Zimki is a JavaScript server and web app framework with some novel features. Fotango offers paid hosting for Zimki, but it will also be released open source in the next few months.

At lunch I discovered a new coffee shop near the hotel - Brickyard. Although Starbucks is a good default, I like to find local shops especially if they have better coffee! It didn't hurt that it was another beautiful day and I could sit outside in their courtyard and enjoy the sun.

The sessions were thinning out by the afternoon. I went to one on why you should try to design your web app so it could be run as a text adventure (sounds crazy but actually made some sense). I hadn't recognized the presenter's name, but he turned out to be the guy who had presented a Rails game (Unroll) at OSCon. He's an interesting character, so I was glad I'd gone to this session, although it ended up being quite short.

My last session was by Forrest Higgs on building your own 3D printer. Commercial 3D printers still cost tens of thousands of dollars to buy and require expensive consumables. You can now build your own for a few hundred dollars. Forrest has his own design but he also talked about the RepRap project. The goal is not just to make an open source 3D printer design, but also one that can replicate (most of) itself. (They use microprocessors which obviously can't be manufactured by a home machine yet!) He also talked about the implications of widespread grassroots manufacturing capabilities. Thought provoking.

And that was it for ETech 2007. Although I heard a lot of grumbling that it wasn't as good as previous years, I still think it was worthwhile. Lots of new ideas that will help fuel my brain.

I rounded off the day with supper at The Fish Market. I couldn't be bothered to wait for a table in the restaurant (long lineup) so I grabbed a table in the bar with a great view of the water. There was a limited menu in the bar but the fish and chips was the best I'd had in a long time, the waitress was cute and cheerful, and the sunset was beautiful - what more could you ask for!

Wednesday, March 28, 2007

Amazon S3 with cURL?

As I've talked about in previous posts, I've been searching for a good way to access Amazon S3 from Suneido (to use for backing up our customers' databases).

The SmugMug presentation recommended using cURL. We already use cURL for a variety of tasks (such as FTP) so we'd be happy to use this option. But S3 requires SHA-1 hashes which I didn't think cURL could do. Maybe you can calculate the hashes separately and use cURL just for the transfer. I'll have to look into this.
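To illustrate the "calculate the hashes separately" idea, here is a minimal sketch in Python rather than Suneido. It assumes S3's original REST authentication scheme, where the signature is a base64-encoded HMAC-SHA1 of a "string to sign" built from the request. The access key, secret key, bucket and file names are made up for illustration, and the sketch assumes no x-amz-* headers are sent.

```python
import base64
import hashlib
import hmac

def s3_signature(secret_key, verb, resource, date, content_md5="", content_type=""):
    """Return the base64 HMAC-SHA1 signature for a simple S3 REST request.

    Assumes the request carries no x-amz-* headers; those would have to be
    added to the string to sign as well.
    """
    string_to_sign = "\n".join([verb, content_md5, content_type, date, resource])
    digest = hmac.new(secret_key.encode("utf-8"),
                      string_to_sign.encode("utf-8"),
                      hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")

# Hypothetical credentials and object name, for illustration only.
access_key = "EXAMPLEACCESSKEY"
secret_key = "example-secret-key"
date = "Wed, 28 Mar 2007 19:00:00 GMT"

signature = s3_signature(secret_key, "PUT", "/example-bucket/backup.db", date)
auth_header = "Authorization: AWS %s:%s" % (access_key, signature)
date_header = "Date: " + date
```

The two header strings could then be handed to cURL with its -H option, with -T naming the file to upload, so cURL only does the transfer. The Date (and Content-Type, if any) in the string to sign has to match what is actually sent, otherwise S3 will reject the signature.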

Day 3 at ETech 2007

Another good day at ETech. I've found a nearby Starbucks that is in a lovely courtyard and isn't very busy. This is where I start my day with a Grande Latte and a little peace before the mind-storm.

The keynotes were all good, but I really enjoyed Danah Boyd's presentation. It's hard not to appreciate someone who is so obviously passionate about their work. I also get a kick out of geeks who are brave enough to dress idiosyncratically. Personally, I can't imagine intentionally doing something to make myself stand out more! Except in my work, of course :-)

Lunch was outside in the Seaport Courtyard again. The weather was wonderfully sunny and warm (especially compared to yesterday's horrendous winds). After lunch I sat by the pond in Seaport Village for a quiet coffee before heading back for afternoon sessions. The sun was shining on the water and the ducks were entertaining. Mama duck brought her three ducklings within inches of my feet. I wished I had my camera.

The presentation on SmugMug's use of S3 was good. This year a number of the sessions have been thinly disguised marketing pitches, but this was a good, seemingly honest, first hand experience report.

Next was a combined presentation on Google's MapReduce and Hadoop - an open source implementation of MapReduce that is part of the Apache Lucene search engine project. Although map and reduce are familiar from functional programming, their application to processing large data sets with clusters of computers is pretty neat. In the past, this would have been out of reach of most of us who don't have access to clusters, but now you can "rent" as big a cluster as you want from Amazon EC2 (or you will be able to once EC2 is publicly available).
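For anyone who hasn't seen the pattern, here is a toy, single-machine sketch of the map and reduce steps in Python, using the usual word count example. It is not Hadoop's or Google's API; the function names are my own, and the "shuffle" step stands in for the grouping that a real framework does between the two phases across the cluster.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Group values by key, as the framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine all the values for one key into a single result."""
    return key, sum(values)

def word_count(documents):
    pairs = []
    for doc in documents:  # on a cluster, each document could be mapped on a different machine
        pairs.extend(map_phase(doc))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())

counts = word_count(["the quick brown fox", "the lazy dog", "The end"])
```

The point is that the map calls don't depend on each other, and neither do the reduce calls for different keys, which is what lets the work be spread over many machines.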

The last two sessions of the day that I attended were from Microsoft and IBM on various research projects. One of the Microsoft projects was Boku, a "visual" programming environment for kids. I wonder if some of the ideas could be applied to allow end-users to do more complex "programming" in applications. One of the IBM projects was Koala - basically a macro recorder for the web with a unique slant towards sharing the resulting scripts on a wiki. Unfortunately, neither of these projects is publicly available yet.

I had supper at the Edgewater Grill in Seaport Village. The food and service were nothing great, but I enjoyed watching the sunset over the bay.

After supper I stopped in at the MAKE Fest but they only had a handful of exhibits so I didn't stay long.

Ideas from Jeff Jonas

There were some powerful ideas in Jeff Jonas's talk on analytics that I think are quite reasonable to implement in a simple form. One is the idea of "persistent queries".

Most of Jeff's work has been on identifying people - e.g. terrorists and criminals. For example, you get a tip that a criminal is flying into an airport using a certain name etc. So you query the passenger lists but you don't find anything. You're not sure when he's coming in, so you can keep querying every day or hour, but that's not really practical. Instead, you make the query "persistent" so if new data arrives that matches your query you will be notified.

The naive way to implement this is to simply run the query against incoming data. But that doesn't scale. The more persistent queries you have, the slower it will get to enter new data. You can do it as a batch process e.g. nightly - Jeff calls this trying to "boil the ocean" - but it still doesn't scale well and it also doesn't provide the results in real time.

Instead, you turn the problem around. You store the persistent queries as "data", and you treat the incoming data as "queries". So each incoming record requires one "query" regardless of the number of persistent queries. Very cool.

Obviously, there are some issues here. One is that a persistent query probably involves only a few attributes whereas the incoming data will have many attributes. So you're not doing a normal exact match. And you probably will need ways to expire queries.
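Here is a rough sketch of how the "queries as data" idea might look in a simple form, in Python. This is my own guess at an implementation, not Jeff's. The class and method names are made up, it only handles exact matches on attribute values, and expiry is just a number compared against "now". Each query is indexed under one of its criteria, and each incoming record probes that index once per attribute, so the cost depends on the size of the record rather than on the number of stored queries.

```python
from collections import defaultdict

class PersistentQueries:
    """Stores queries as data; each incoming record is used to look them up."""

    def __init__(self):
        self.queries = {}              # query id -> (criteria dict, expiry time)
        self.index = defaultdict(set)  # (attribute, value) -> ids of queries indexed under it

    def add(self, query_id, criteria, expires_at):
        self.queries[query_id] = (criteria, expires_at)
        # Index the query under just one of its criteria; the rest are checked on lookup.
        attribute = sorted(criteria)[0]
        self.index[(attribute, criteria[attribute])].add(query_id)

    def match(self, record, now):
        """Return the ids of the unexpired queries that this new record satisfies."""
        matches = []
        for item in record.items():  # one index probe per attribute of the record
            for query_id in self.index.get(item, ()):
                criteria, expires_at = self.queries[query_id]
                if now < expires_at and all(record.get(k) == v for k, v in criteria.items()):
                    matches.append(query_id)
        return sorted(matches)

# Hypothetical queries and incoming record.
pq = PersistentQueries()
pq.add("client-42", {"city": "Springfield", "bedrooms": 3}, expires_at=100)
pq.add("client-77", {"city": "Shelbyville"}, expires_at=100)

listing = {"city": "Springfield", "bedrooms": 3, "price": 250000, "garage": True}
matched = pq.match(listing, now=10)
```

The record has more attributes than the query, which is fine, since only the query's own criteria are checked. Range conditions (e.g. price under some limit) would need a different kind of index than this exact-match dictionary.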

You might think that this is cool but you don't need to search for terrorists. I think it can be more broadly applied. For example, let's say you're a real estate agent and someone comes in asking for a certain type of property. You do a search but you don't find anything. So you make the query persistent and a few days later you get notified of a new property that's come available. You call your client and make the sale.

Tuesday, March 27, 2007

ETech 2007

Just finished the second day (of four) of ETech.

I've heard some complaints that it's not as "good" as it used to be. The problem may be that a lot of this stuff isn't "new" anymore. When they're writing about stuff in mainstream magazines then you know it's no longer cutting edge.

I was signed up for the tutorials on Monday, but Kathy Sierra's was canceled (see her blog for more on the craziness behind this). They offered the option of upgrading to the Executive Briefing at no extra charge so I decided to do that since the alternatives to Kathy's tutorial didn't excite me. I was sorry to miss out on the other tutorial by Avi Bryant since DabbleDB seems to be such a great product. On the other hand, the briefing included a lot of speakers and topics that I enjoyed. However, a number of the briefing speakers were also included in the regular program so there was a certain amount of duplication.

Today (Tuesday) was the start of the actual conference. From experience I've learned that the best way to pick sessions is to go by the speaker rather than the topic. That doesn't work when you've never heard of the speaker, but it's a useful heuristic. For example, I'm not all that interested in TPM and DRM but I knew that Cory Doctorow would be interesting and he was. On the other hand, a session on Haml that looked really good (but I'd never heard of the speaker) was pretty mediocre. I also enjoyed Jeff Jonas in the briefing and his keynote so I made a point of attending his session, although again there was a certain amount of redundancy. And it was good to hear Jeff Hawkins, since I'm a long time fan of Palm and Handspring, and recently read his On Intelligence book. Unfortunately, I heard criticisms of his sessions from people who don't seem to see the same importance in his current work.

At lunch (great food, by the way) I sat with some guys from Sun. When I said I had a small company that developed software for the trucking industry one of them said they were surprised that someone like that would be at ETech. At first that seemed to make sense - I've never run into our competition (or other similar vertical software companies) at ETech. But then I started to question it. Isn't it just as important for small companies to be aware of emerging technology? I guess if you're really small you can't afford to go to conferences like ETech, but I don't think that's what they meant. And doesn't a lot of the emerging technology come from small companies? Maybe it's because they see ETech as an opportunity for big companies to come and see what the startups are coming up with. In any case, I think I can learn more, and more importantly, have a better chance of applying it, in my small company than they have in their big company.

So far, so good. I'll post more when I get a chance. I refuse to be one of the many people who spend the sessions with their heads down typing on their laptops. At times, the clicking of so many keyboards gets to be cumulatively loud enough to be annoying. Who are they chatting with? About what? Of course, this is from probably the only person there who didn't have a cell phone (if not two or three) and who asks the same thing about everyone on their cell phones - who are they talking to all the time? About what? I realize I'm not the most social person, but nor, supposedly, are many of these geeks. (Of course, this conference isn't all geeks.)

The trend towards Apple Mac laptops continues to grow. I would guess Apple has 60 to 70 percent of this particular market.

For pictures see Flickr/etech07

Monday, March 12, 2007

Air Canada Web Problem

Here's what I got when I entered my booking number and name into Air Canada's web site to check my booking:

java.lang.IllegalArgumentException: Empty country component in 'value' attribute in 
message:Empty country component in 'value' attribute in 

stacktrace:
...

A good reminder to make sure you catch errors and give users a more friendly message. Most people probably don't want to see the stack trace :-)
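As a minimal sketch of what I mean, in Python rather than Java: catch the failure at the top of the request, log the details for the developers, and show the user something they can act on. The function names and the lookup itself are hypothetical, just enough to reproduce the kind of failure above.

```python
import logging

logger = logging.getLogger("bookings")

def lookup_booking(booking_number, country):
    """Hypothetical lookup that fails the way the page above did."""
    if not country:
        raise ValueError("Empty country component in 'value' attribute")
    return {"booking": booking_number, "country": country}

def handle_request(booking_number, country):
    """Return text for the user; the details of any failure go to the log instead."""
    try:
        booking = lookup_booking(booking_number, country)
    except Exception:
        # logger.exception records the message plus the full stack trace.
        logger.exception("booking lookup failed for %r", booking_number)
        return "Sorry, we couldn't retrieve your booking. Please try again from our home page."
    return "Booking %s found." % booking["booking"]
```

The stack trace still exists for whoever has to fix the bug. It just ends up in the log instead of in front of the customer.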

PS. I managed to get it to work. The problem seems to be that I clicked on a link from an Air Canada email that took me straight to the bookings page, bypassing the page where you pick your language. Another good lesson - don't assume people will always enter your site via the "front" page. These days, a large percentage of people enter via searches that point to within the site. (In this case it was their own link so you'd think they'd handle it!)

Friday, March 09, 2007

Holiday Inn Web Annoyance

I've complained about this kind of thing before, but I continue to be surprised that someone doesn't catch (and fix) these kinds of annoyances.

I was making a booking on Holiday Inn's web site and had to enter my billing address. When I submitted it I got an error saying "spaces are not allowed in zip/postal codes". I shook my head and removed the space. I got another error saying the format was invalid. I read the fine print and it said you have to enter postal codes with a hyphen. Huh? Since when do Canadian postal codes have a hyphen in them?

The "funny" part is, I bet they added that explanation because people kept getting it "wrong". I wonder if it ever occurred to them to simply accept a variety of formats? e.g. postal codes with space, hyphen, or no separator.
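Accepting a variety of formats takes very little code. A minimal sketch in Python, with a made-up function name: it only checks the letter-digit-letter digit-letter-digit pattern of Canadian postal codes (it doesn't know which letters Canada Post actually uses) and allows a space, a hyphen, or nothing in the middle.

```python
import re

# Canadian postal codes alternate letter, digit, letter, then digit, letter, digit.
_POSTAL_CODE = re.compile(r"^([A-Za-z][0-9][A-Za-z])[\s-]?([0-9][A-Za-z][0-9])$")

def normalize_postal_code(text):
    """Return the code as 'A1A 1A1', or None if it isn't a plausible postal code."""
    match = _POSTAL_CODE.match(text.strip())
    if match is None:
        return None
    return ("%s %s" % match.groups()).upper()
```

With this, "K1A 0B1", "k1a-0b1" and "K1A0B1" all come out as "K1A 0B1", and the site can store whichever single format it prefers without making the user guess it.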

Even better, they could use a little JavaScript to validate fields "on the fly" so it would be marked as invalid as soon as you left the field.

Thursday, March 08, 2007

Those that can, do...

As the saying goes, "Those that can, do, those that can't, teach."

I just listened to a podcast by Jon Udell with Marty Collins, the senior marketing manager with the solution architecture group. Also see Jon's blog post about it.

As I understand it, the goal of her group is to "evangelize architecture". I don't know Marty's background but she calls herself a marketing person, and she talks like one e.g. "how do I push our content". Her group consists of 18 ex-architects.

First, I should admit I have a bias against anyone who calls themselves an "architect". In the software world, if a project "fails", the "architect" rarely gets blamed. Even "real" building architects have been known to design buildings that look great, but the roof leaks. Of course, they would similarly have you blame the builders, not the architect.

Don't get me wrong, I still want good architecture, but my feeling is that good architecture comes less often from self-proclaimed "architects", and more often from good programmers. How many "architects" did Unix or Apache or Linux or TeX have?

Marty tells us that one of the reasons for their group is that architects tend to be possessive and secretive about their work. According to her, even the architects working internally within Microsoft won't (or aren't allowed to) share because of the fear of "losing competitive advantage". Or is it that their work doesn't stand up well to public scrutiny?

So the architecture evangelists aren't working architects. Hmmm... And they are led by a marketing person. Ouch. If the working architects won't share, where are they getting the material they produce? Are they just sitting around dreaming up architectures? That doesn't sound very useful to me.

One of the ideas discussed is that companies should watch for people blogging in their area and jump in and contribute comments. Great idea. Jon asked Marty to let him know when they start doing this. But apparently, first they have to get some fancy new tool that is still in beta to let them monitor and comment on blogs. To me this is very much the wrong approach. It's a perfect example of something that you can try out right away and find out if it works or not. If it works, and once you have done it for a while, then you can think about tools. Why can't each member of her group sit down, find a blog, and add a comment? Not next month, not after getting more "tools", but right now, today! I'm not sure if this is a psychological block, or a bureaucratic one.

I've always liked Jon Udell but I am still wondering about his move to go to work for Microsoft. One of his goals is to reach a broader audience and he feels that Microsoft can help with this, but I'm not so sure.

Saturday, March 03, 2007

Slow Code

One of the problems we run into with our code is that it ends up too slow.

Before I continue I want to assure you that I'm not suggesting "premature optimization". I fully agree that optimizing before it's necessary is the wrong thing to do.

What I am suggesting is that "too slow" is a bug, and like other bugs it should be avoided if possible, or at least caught as soon as possible.

One of the causes of slowness is code that is O(N²) (or worse!). Nested loops are a common cause. Programmers either don't recognize the nested loops (hidden in separate functions or in recursion), or else they don't realize the dangers of them. For example, if you have 10 items and you iterate over them twice that's 20 loops. If you have one loop inside another that's 100 loops. 5 times as many loops but not a big issue. But what if you have 1000 items? Iterating twice is 2000 loops, but nested it's 1,000,000 loops or 500 times as many. If each loop takes .001 seconds then 2000 loops is 2 seconds, but 1,000,000 loops is 1000 seconds or roughly 17 minutes. In an interactive application that's a big problem.

I think one of the reasons for this problem is that programmers unconsciously relate speed to number of lines of code. So:

for each item
  for each item
    do stuff

can look shorter than:

for each item
  do stuff
for each item
  do stuff

Higher level languages, libraries, toolkits, frameworks etc. can make this even worse. Now one line of code can do a huge amount of work, but to the programmer it's just one little line of code. How bad can it be?
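Here is a small Python illustration of a nested loop hiding inside one innocent looking line. The function names are mine, and the second version assumes the items can be put in a set (i.e. they are hashable).

```python
def common_items_slow(a, b):
    # Looks like a single loop, but "in" on a list scans the list,
    # so this does up to len(a) * len(b) comparisons.
    return [x for x in a if x in b]

def common_items_fast(a, b):
    # Building the set is one pass over b; each membership test is then
    # roughly constant time, so the total work is about len(a) + len(b).
    b_set = set(b)
    return [x for x in a if x in b_set]

# Small sample data; with a few items both versions feel instant.
evens = list(range(0, 40, 2))
threes = list(range(0, 40, 3))
slow_result = common_items_slow(evens, threes)
fast_result = common_items_fast(evens, threes)
```

Both versions are about the same number of lines and give the same answer. You only notice the difference when the lists get long.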

Programmers normally test with small amounts of data. When you're only dealing with a few items, both 2N and N² are fast. You don't get the feedback about the slowness until much later, when the user has larger amounts of data. And programmers are inclined to ignore whining from users anyway!

Another related issue is the difference between dealing with things in memory and dealing with things in the database. The difference is huge, but again, in testing it may be barely noticeable. And, again, the actual number of lines of code may be comparable. (This can be a problem with tests. For one test it doesn't matter. For a whole test suite it can mean the difference between a suite that runs in a minute and one where you have to run it overnight.)

Unnecessarily reading "everything" from the database into memory is another common mistake. If you actually need all the data in memory at once, fine, but if you just need to find a particular record or set of records then it's an inefficient way to do it. In our Ruby on Rails project I regularly come across code that does Model.find(:all). At first glance it might seem harmless. But what happens when that table contains 100,000 records? (Sadly, part of the reason for this is that Rails doesn't seem to provide any way to iterate through a query, so the alternative to find(:all) is to do your own SQL - an ugly choice.)
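To show the difference outside of Rails, here is a sketch in Python using the standard sqlite3 module and a made-up customers table. The first function is the "read everything" approach. The second lets the database do the filtering and iterates over the cursor, so only the matching rows are ever brought into memory.

```python
import sqlite3

# Hypothetical table with 10,000 rows, of which 100 are in the city we want.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers (name, city) VALUES (?, ?)",
    [("customer %d" % i, "Springfield" if i % 100 == 0 else "Elsewhere")
     for i in range(10000)])

def find_by_city_in_memory(conn, city):
    # Reads every row into memory, then throws most of them away.
    rows = conn.execute("SELECT id, name, city FROM customers ORDER BY id").fetchall()
    return [row for row in rows if row[2] == city]

def find_by_city_in_database(conn, city):
    # The WHERE clause does the filtering; iterating the cursor fetches
    # rows as they are needed instead of all at once.
    cursor = conn.execute(
        "SELECT id, name, city FROM customers WHERE city = ? ORDER BY id", (city,))
    return [row for row in cursor]

in_memory = find_by_city_in_memory(conn, "Springfield")
in_database = find_by_city_in_database(conn, "Springfield")
```

The results are identical, and with a small test table so is the apparent speed, which is exactly why this kind of thing slips through.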

I don't have a solution to this problem, but I think we need to become more aware of it. I think it's one of the reasons why our software gets slower at the same time as our computers get faster!

The Trouble With Programming

An interesting article/interview with Bjarne Stroustrup:

http://www.research.att.com/~bs/MIT-TR-original.pdf