Saturday, May 15, 2021

Blogger Issues

Yesterday afternoon I wrote a blog post about my gSuneido progress. In the evening I got an email saying "Your post has been deleted" because "Your content has violated our Malware and Viruses policy."

The post was just some text and a couple of screenshots. It's hard to see how it could contain malware or viruses. Of course, it was gone, so I couldn't prove that. And of course, there was no human to contact.

It was funny because it actually made me a bit upset. I think that was partly from the feeling of helplessness against the faceless Google behemoth. A bit like dealing with the government. And it's free, so what can you say?

This morning I got another email saying "We have re-evaluated the post. Upon review, the post has been reinstated." Who exactly is "we"? Somehow I doubt that was a human. Now our software gets to use the royal "we"? I suspect it would have been more honest to say "sorry, we screwed up".

It was still not showing up, but then I found they had put it back as a draft and I had to publish it again.

A quick search found someone else reporting a similar issue and Google responding with "we're aware of the problem".

It was a good reminder to back up my content. Not that there's anything too important, but it's of nostalgic interest to me, if nothing else. (You can download from the blog Settings.)

Friday, May 14, 2021

Another gSuneido Milestone

This screenshot probably doesn't look too significant - just the Suneido IDE. The only noticeable difference is down in the bottom left corner. Normally it would show something like:

Instead it shows:

That means gSuneido is running "standalone", i.e. using its own database instead of connecting as a client to a jSuneido database. While the surface difference is tiny, internally this is a huge jump.

I've been working away on the gSuneido database over the last year at the same time that we've been debugging and then rolling out the gSuneido client.

If I had just ported the jSuneido database implementation it would have been much easier, but what would be the fun in that? I kept the query implementation but redesigned the storage engine and transaction handling. I'd call it second system effect, but it's more like third system since I also redesigned this for jSuneido.

I still have lots to do. Although the IDE starts up, it's quite shaky and easily crashed. Many of the tests fail. But even to get to this point a huge number of pieces have to work correctly together. It's a bit like building a mechanical clock and reaching the point where it first starts to tick.

Sunday, March 14, 2021

Twenty Years of cSuneido

Last week was a milestone. We (my company) finished converting all our customers from cSuneido (the original C++ implementation) to gSuneido (the most recent implementation, in Go).

That means I no longer have to maintain cSuneido and I no longer have to deal with C++. (sigh of relief)

cSuneido has had a good run. We deployed it to our first customer in 2000, so it's been in continuous production use for over 20 years. It's served us well.

When I first started developing Suneido, in the late 1990's, C++ was relatively new on PC's. I started with Walter Bright's Zortech C++ which became Symantec C++ (and later Digital Mars C++). Later I moved to Microsoft C++ and MinGW.

Suneido, like most dynamic languages, is garbage collected. But C++ is not. I implemented a series of my own increasingly sophisticated conservative garbage collectors. But eventually I admitted my time was better spent on other things and I switched to using the Boehm-Demers-Weiser conservative garbage collector. Under normal use conservative garbage collection works well. But there are cases where memory is not recycled and eventually you run out. That was somewhat tolerable on the client side, but it wasn't so good on the server side. (That was one of the factors that prompted the development of jSuneido, the Java version that we use on the server side. Another factor was the lack of concurrency support in C++ at that time.) It seemed for a while that the pendulum was swinging towards garbage collection. But Rust has given new life to manual memory management.

Honestly, I won't be sorry to leave C++ behind. It has grown to be extremely complex, and while you can avoid much of that complexity, it's hard not to be affected by it. I've also had my fill of unsafe languages. Even after 20 years of fixing bugs, there are very likely still things like potential buffer overflows in cSuneido. (Ironically, one of the things that added a lot of complexity to C++ was template generics. Meanwhile I am anxiously waiting for the upcoming addition of generics in Go. However, Go's generics will be much simpler than C++'s Turing complete template programming.)

While it might seem crazy to re-implement the same program (Suneido) three times, it's been an interesting exercise. I've learned things each time, and made improvements each time. It's been extra work to maintain multiple versions, but it's also caught bugs that would have been missed if I'd only had a single implementation. Doing it in three quite different languages - C++, Java, and Go - has also been enlightening. And having the hard constraint of needing to flawlessly run a large existing code base (about a million lines of Suneido code) means I've avoided most of the dangers of "second system" effects.

So far I've only implemented the client side of gSuneido. We are still using jSuneido (the Java version) for the server side. I'm currently working on implementing the database/server for gSuneido (in Go). Once that's finished I intend to retire jSuneido as well and be back to a single implementation to maintain, like the good old days :-) And given where I'm at in my career gSuneido will almost certainly be the last implementation I do. I wonder if it will last as long as cSuneido did?

Wednesday, February 10, 2021

Tools: Joplin Notes App

I recently (Nov.) started using Joplin, "An open source note taking and to-do application with synchronization capabilities" and I'm quite happy with it.

I've been a long-time Evernote user (over 10 years). Although it was a bit rough at first (see A Plea to Evernote), it has worked well for me. Like Joplin, it meets my requirement for something that runs on Windows, Mac, and phone/tablet, and works off-line. (Joplin also runs on Linux.)

I'm pretty sure Evernote is an example of Conway's Law at work. Their versions on different platforms have the same overall features, but there are enough small differences to be quite annoying when you're switching back and forth. You'd think someone at a high level would push for consistency. It's stupid stuff like one version putting a button on the right and another on the left. Then they came out with a new Mac version that was missing a bunch of features, and introduced yet another set of differences.

I've looked for alternatives in the past, but haven't found anything that matched my needs. I can't remember where I came across Joplin. I hadn't found it when I looked in the past. I wasn't specifically looking for an open source solution, but it's nice that Joplin is open source. It seems to have an active and growing community and user base. 

One of the things I like about Joplin is that it's primarily Markdown. Most of my notes are plain text (see Taking Notes) but it's nice to have a little formatting at times. There is a new WYSIWYG editor, but previously it was the standard split screen edit & preview, or toggle back and forth. I mostly stay in the raw markdown mode.

One of the things I feared about switching was having to leave all my old notes behind in another program. But Joplin can import Evernote's export format. I haven't moved everything yet (there's a lot) but I transferred my Suneido notebook which is roughly 2000 notes. It took a little while to sync/upload and then sync/download on other devices but it worked well. There are a few formatting glitches, but that isn't surprising.

Interestingly, Joplin doesn't (yet) run its own sync servers. Instead you can use Nextcloud, Dropbox, OneDrive, WebDAV, or the file system. I already use Dropbox so that was the easiest for me. They are working on their own sync server software.

Since I got Joplin I have hardly touched Evernote. I use Joplin every day to keep notes on my work. If you're looking for a notes app it's worth checking out.

Monday, January 04, 2021

Unix: A History and a Memoir

I just finished reading Unix: A History and a Memoir by Brian Kernighan. I'm not much of a history buff, but I enjoyed it, mostly because it brought back memories.

By the time I was in high school I was obsessed with computers. My father somehow arranged for me to get a Unix account in the university computer science department. I'm not sure of the details - my father worked on campus, but wasn't part of the university, he was an entomologist (studied insects) with Canada Agriculture.

My father also arranged permission for me to sit in on some computer science classes. But my high school principal refused to let me take a few hours a week for it. I can't recall why, something about disrupting my school work. Which is totally ridiculous given that I was at the top of my classes. You wonder what goes through the minds of petty bureaucrats.

I remember being quite lost at first. I was entirely self taught, which left plenty of gaps in my knowledge. I was intimidated by the university and too shy to ask anyone for help. But I muddled along, teaching myself Unix and C. This would have been in 1977 or 1978, so the early days of Unix. Of course, it was in the universities first.

I recall being baffled that C had no way to input numbers. For some reason I either didn't discover scanf or didn't realize it was what I was looking for. It wasn't really a problem, I just figured out how to write my own string to integer conversions. When the C Programming Language book (by Kernighan and Ritchie) came out in 1978, that helped a lot.
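
The basic trick, sketched here in Go rather than the C I was writing back then, is just digit-by-digit accumulation:

```go
// A minimal sketch of a hand-rolled string-to-integer conversion:
// accumulate digits left to right, multiplying by 10 at each step.
// (No sign or error handling, just the core idea.)
func atoi(s string) int {
    n := 0
    for _, c := range s {
        if c < '0' || c > '9' {
            break
        }
        n = n*10 + int(c-'0')
    }
    return n
}
```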

Software Tools (by Kernighan and Plauger) was probably the book I studied the most. I implemented a kind of Ratfor on top of TRS-80 Basic, and implemented most of the tools multiple times over the years. I still have my original copy of the book - dog eared, coffee stained, and falling apart. A few years later, I reverse engineered and implemented my own Lex and then Yacc. Yacc was a challenge because I didn't know the compiler-compiler theory it was based on. Nowadays there are open source versions of all this stuff, but not back then.

I read many more Kernighan books over the years, The Elements of Programming Style (with Plauger), The Unix Programming Environment (with Pike), The Practice of Programming (with Pike), and more recently, The Go Programming Language (with Donovan).

Nowadays, with the internet and Stack Overflow, it's hard to remember how much harder it was to access information in those days (especially if you didn't talk to other people). I had one meeting with someone from the computer science department.  (Again, arranged by my father, probably because I was so full of questions.) Looking back it was likely a grad student (he was young but had an office). I questioned him about binding time. He didn't know what I was talking about. I don't remember why I was fixated on that question. I must have seen some mention of it in a book. Me and unsolved questions/problems, like a dog and a bone.

Presumably the Unix man pages were my main resource. But if you didn't know where to start it was tough. I remember someone gave me a 20 or 30 page photocopied introduction to Unix and C. That was my main resource when I started out. Nowadays, I'd be hard pressed to make it through a day of programming without the internet.

Monday, December 14, 2020

Coverage for Suneido

Since the early days of Suneido I've thought that it would be good to have some kind of coverage tool. But for some reason, I never got around to implementing it.

Code coverage is usually associated with testing, as in "test coverage". Lately I've been seeing it in connection with coverage-based fuzz testing.

But coverage can also be a useful debugging or code comprehension tool. When you're trying to figure out how some code works, it's often helpful to see which parts are executed for given inputs. You can also determine that by stepping through the code in a debugger, but if the code is large or complex, that can be tedious, and doesn't leave a record that you can study.

For some reason I started thinking about it again and wondering how hard it would be to implement in gSuneido.

One of my concerns was performance. If coverage is too slow, it won't be used. And obviously, I didn't want to slow down normal, non-coverage execution.

While simple coverage is good, I find statement execution counts more useful. Statement counts verge on profiling, although profiling is generally more concerned with measuring time (or memory).

That got me wondering about approaches to counters. One interesting technique I came across is Morris approximate counters. That would allow an effectively unlimited count in a single byte. But I decided that the nearest power of two is a little too crude. Although it's usually not critical that large counts are exact, it's often helpful to see that some code is being executed N times or N+1 times relative to other code or to inputs.
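
Just to illustrate the technique (this is the general idea of Morris counters, not anything from gSuneido), a sketch in Go might look like this:

```go
package main

import (
    "fmt"
    "math"
    "math/rand"
)

// morrisInc increments a Morris approximate counter: the stored byte c is
// roughly the log2 of the count, so it is only bumped with probability 1/2^c.
func morrisInc(c uint8) uint8 {
    if c < math.MaxUint8 && rand.Float64() < 1/math.Pow(2, float64(c)) {
        return c + 1
    }
    return c
}

// morrisEstimate recovers the approximate count from the stored byte.
func morrisEstimate(c uint8) float64 {
    return math.Pow(2, float64(c)) - 1
}

func main() {
    var c uint8
    for i := 0; i < 1_000_000; i++ {
        c = morrisInc(c)
    }
    fmt.Printf("stored byte: %d, estimated count: %.0f\n", c, morrisEstimate(c))
}
```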

16-bit counts (up to 65,535) are probably sufficient, but I didn't want wrap-around overflow. I knew arithmetic that doesn't overflow was a standard thing but I couldn't remember the name. Eventually I found it's called saturation arithmetic. Sometimes people talk about "clamping" values to maximums or minimums (a term from electronics).
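
In Go, a saturating increment for a 16-bit counter is just a bounds check, something like:

```go
// A minimal sketch of a saturating 16-bit increment: the count climbs to
// 65535 and then stays there ("clamps") instead of wrapping back to zero.
func saturatingInc(n uint16) uint16 {
    if n < 65535 { // math.MaxUint16
        return n + 1
    }
    return n
}
```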

Often, to minimize coverage work, tracking is done on basic blocks. Normally that's part of the compiler, but I didn't really want to mess with the compiler. It's complex already and I didn't want to obscure its primary function. I realized that instead, I could get equivalent functionality based on the branches in the byte code. Obviously if you branch, that's the end of a basic block. And where you branch to is the start of a basic block. So I only need to instrument the branch instructions in the byte code interpreter. Basically, the interpreter marks the start of executed blocks (branch destinations) and the disassembler identifies the end of blocks (branch origins).
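
In case a sketch helps picture it, the idea was roughly this (the names are made up for illustration, not the actual gSuneido code):

```go
// Hypothetical sketch: every branch destination starts a basic block, so the
// interpreter's branch instructions can record block execution by bumping a
// saturating counter keyed on the destination offset.
type blockCounts map[int]uint16 // byte code offset of block start -> hit count

func (bc blockCounts) branchTaken(dest int) {
    if n := bc[dest]; n < 65535 {
        bc[dest] = n + 1
    }
}
```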

I decided this would be a fun weekend project and a break from working on the database code. That didn't work out so well. At first I made rapid progress and I had something working quite quickly. Then things went downhill.

If I was tracking at the byte code level, I needed to connect that back to the source level. I had a disassembler that could output mixed byte code and source code, so that seemed like the obvious place to do it. Then I found that the disassembler didn't actually work that well. I'd only ever used it as a tool to debug the compiler. I spent a bunch of time trying to make the disassembler cover all the cases.

Meanwhile, I was starting to get depressed that it was turning into such a mess. The more I worked on it, the uglier it got. I don't usually mind when I take a wrong turn or decide to throw out code. That's just the way it goes sometimes. But when I can't find a good solution, and things keep getting worse, then I'm not a happy camper.

In the end, I had something that mostly worked. I checked it into version control so I'd have a record of it, and then I deleted it. My idea of using branches to identify basic blocks was valid, but the actual implementation (that I came up with) was far from elegant.

I maybe should have given up at this point. The weekend was over, and I had more important things to work on. But I still thought it would be a worthwhile feature. And if I came back to it later I'd just have to figure it out all over again.

Once I abandoned the ugly code I felt much better. I decided to take a simpler approach. I'd add an option to the code generation to insert "cover" instructions (instrumentation) at the beginning of each statement. That was just a few lines of code. Then I just needed to implement that instruction in the byte code interpreter - a handful more lines of code. The overhead was relatively small, somewhere in the neighborhood of 5 to 10 percent.
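
Roughly speaking (with invented names, not the actual gSuneido opcodes or structures), the instrumentation amounts to something like this:

```go
// Hypothetical sketch of statement-level coverage instrumentation:
// the code generator emits a Cover op at the start of each statement, and
// the interpreter bumps a saturating per-statement counter when it executes
// that op. The opcode value and the statement-index operand are assumptions
// for illustration only.
const opCover = 200 // made-up opcode value

type coverInfo struct {
    counts []uint16 // one counter per statement, indexed by the op's operand
}

// execCover is what the interpreter would do when it hits a Cover op.
func (ci *coverInfo) execCover(stmt int) {
    if ci.counts[stmt] < 65535 {
        ci.counts[stmt]++
    }
}
```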

And that was the core of it. A bit more code to turn it on and off, and get the results. Way easier, simpler, and cleaner than my first "clever" approach. Hopefully it will prove to be a useful feature.

Tuesday, December 01, 2020

Leaving HEY

TL;DR - I used HEY for five months, gave it a fair trial I think, had no huge complaints, but I've gone back to Gmail. For the details, read on.

I was excited when Basecamp (formerly 37signals) announced their HEY email. I asked for and eventually received an invitation. I’ve been a long-time Gmail user, from back when you needed an invitation. Back in the days when Google’s motto was “don’t be evil”. They’ve since replaced that motto with “get all the money”. Which doesn’t make me comfortable giving them all my email.

Basecamp has a good record of supporting old products. (I still hold a grudge over Google Reader.) And they are more privacy minded than Google. I didn't have a problem paying for the service. Paying customers are often treated better than the users of "free" services.

I like a lot of the HEY features - screening emails, spy blocking, reply later, set aside, paper trail, feed, etc.

One thing that bothered me was that it was a closed walled garden. Traditionally, email has been built on standards (like SMTP, POP, and IMAP). You can use Thunderbird to access your Gmail, use Apple Mail to access your Hotmail, etc. HEY lets you forward mail in or out, but that's as open as it gets. You can't access your HEY email from another client, and you can't use the HEY client to access other email accounts. Their explanation for this is that their special features aren't interoperable. I'm not sure I totally buy that. It seems like a more believable reason is that it simplifies what is undoubtedly a large challenge. And of course, the HEY software itself is not open source. I prefer open solutions, but I use other walled gardens, like the Apple ecosystem.

It was easy to redirect my Gmail and start using HEY. It was easy to learn and use. I was quite happy with it, and I must admit, a bit excited to be in on the beginning of something. Like when I first started using Gmail. But it wasn't long before I started running into annoyances.

On the positive side, it was nice being able to report an issue and actually get a response from what appeared to be a human. (Hard to tell with some of the bots these days.) Good luck with trying to report a problem to Gmail.

One of my big frustrations was not being able to view attachments. You could argue that's not really the responsibility of an email client. But I was accustomed to being able to view pdf's (e.g. resumes on job applications) with a single click. That single click in HEY just took me to a file save dialog. So I could download it (cluttering up my download folder and taking up storage), then find the downloaded file, and then open it in a separate application. No more quick glance at the contents of an attachment. That was using the standalone Mac app. If I accessed HEY from my browser it was a little better (if I could convince Firefox I didn't want to download it). The funny part was that HEY displays a thumbnail, and on iOS you can zoom in and read it. So obviously they were already interpreting the content; they weren't just treating attachments as opaque blobs. I told myself this was a minor issue but it continued to bug me.

There were quite a lot of bugs at first. In some ways that's not surprising for a new ambitious project. But I have to admit I was initially a little disappointed. I guess I've drunk a little too much of the Basecamp/David/Jason kool-aid and had high expectations. I told myself they would fix them, give them time. And some did get fixed. But others didn't. For example, the touted Feed uses infinite scrolling, except when it needs to load more content there's a noticeable pause and afterwards the view is messed up. You lose your scroll position and all the items are doubled. Not exactly a great experience. I can imagine most of the testing happened without enough data to hit that. They even mentioned it in a job posting as the kind of thing you might work on.

Then I listened to a podcast with David where he talked about how hard they'd worked to fix bugs after the release. But that they'd had to put that aside to work on their promised HEY for Work. Great, too busy adding features to fix bugs. Heard that story before. Then he went on to talk about how bugs are overrated, they're not really a big deal. You shouldn't feel bad about your bugs, they're "normal". They should have been playing "Don't worry, be happy". I'm exaggerating, I understand where he's coming from. And I agree there's a big difference between cosmetic bugs and functional bugs. And bugs that affect a minority of users versus ones that affect the majority. But it's a slippery slope. Where does that Feed issue fit? Is that a "cosmetic" issue to be shelved? To me it was a major problem, but I realize that's a judgement call.

To me, telling programmers not to worry about bugs is just asking for a bug filled product. And once you have a lot of bugs, it starts to feel pointless to fix them. Personally, I'm ok with feeling a little bad about my bugs. Not to the point of flagellation, but enough to make me e.g. write more tests next time.

I also found I'd (not surprisingly) grown accustomed to, if not dependent on, a whole assortment of Gmail features. I screwed up and sent an email to the wrong recipient, which I would have caught with Gmail's undo feature. I was used to typing a few characters of a contact and having Gmail suggest the right person, whereas HEY constantly suggested contacts I never used. The Feed is a nice idea, but it's (currently) a pretty minimal feed reader. It doesn't keep track of what you've read, and if you get interrupted, there's no way to pick up where you left off. You have to scroll down and try to remember what you've read. I've switched to using Gmail filters to forward feed-type material to my Feedly. Filters are another feature missing (or omitted) from HEY.

I'm not writing HEY off. I have my account and I don't mind having paid for it. I think they're trying to do something worthwhile and I don't mind supporting that. I'll keep an eye on it for potential future use.

I'm not completely happy going back to Gmail. I don't have anything particular to hide, but I'm not a fan of surveillance capitalism - of companies like Google making their profits from selling my private information, or the ugly things done with that information by the companies that buy it.