The Software Life: d

Showing posts with label d. Show all posts

Wednesday, August 22, 2012

D-etractions

I love the potential of the D language, but I have to admit I'm becoming a little disillusioned with the current reality. This is of course, extremely subjective. I don't want to get in a fight with D supporters. I agree it's a great language. To me, most of the issues revolve around a lack of maturity and a small user base. If I was just playing, that wouldn't be a big deal. But for an implementation language for Suneido, I'm looking for something rock solid. Java may be old and boring and "weak", but it's solid.

One of the big issues for me is the lack of tools, especially good IDE support including refactoring. You know this is unlikely to be a priority when the main developers are old school text editor users. (In my mind I can hear them saying "we don't need no stinking IDE") Scala had the same problem until recently, also due to the main developers not using IDE's. But recently they've made IDE support more of a priority. Maybe not so coincidentally, this coincided with the formation of a commercial entity (TypeSafe) promoting Scala. I've done my share of old school editor + command line programming, but to me, a large part of the benefit of statically typed languages is that they allow powerful IDE manipulations.

A similar issue is the meager ecosystem e.g. books, libraries, tools, etc. I look at the libraries and tools I use with Java and few, if any, are available for D.

One thing that makes me a little nervous is D's fairly simple (from what I've seen) garbage collector. When I think about the effort that has gone into the JVM garbage collectors, I wonder how D will compare. D's garbage collector is also conservative in at least some situations (I'm not sure of the details). Based on my experience with conservative garbage collection in cSuneido, this increases my nervousnous.

D's templates and compile time function evaluation, and mixin's also worry me. They are very powerful and very cool, and they might be way better than C++ templates, but they're still complex. Of course, it's the usual story, you judge a new language when you're a newbie, so you don't necessarily have the basis to form a good opinion. But we seldom have the time to become expert before making a decision. I do have a fair amount of experience with C++ templates to judge by. Templates, as in D and C++, are undeniably powerful. Much more so than the generics in Java or C#. But I wonder if they are on the wrong side of the power / complexity balance. And these advanced features appear to be absorbing much of the energy of the development, to the neglect or even detriment of the overall maturity.

I love the potential of explicit "pure" and "immutable". I wish other languages like Java and C# had more of this. But from the meager exposure I've had, the reality is not as nice. I'm sure some of that is simply learning curve. And some of it may be improved by future language improvements. But when you can't put immutable values in a container without wrapping them in a "Rebindable" template, that again makes me nervous.

Of course, you could use a subset of D and try to ignore templates etc. but this is hard psychologically (I got sucked into mixins just to implement lexer tokens!), and also because the libraries make heavy use of templates. Then you'd primarily be consuming templates, not writing them, but you still need some understanding even to use them.

One of my concerns with Suneido implementation is making it easier for other programmers to get involved in the implementation. (i.e. improving the bus factor) In that respect, I think the jSuneido Java code is much better than the cSuneido C++ code (with templates!). And it's probably easier to find Java programmers than C++ programmers, let alone D programmers. (Of course, the challenge of finding good programmers remains.)

I plan to keep an eye on D and hopefully continue to play with it a bit. I wish the project good luck. But for now, I am going to put on hold any ideas of using it to replace cSuneido.

See also: D-tours and System Programming Languages

Sunday, August 05, 2012

D-tours

I've been continuing to spend a little time here and there playing with the D language. For a sample project I decided I would try implementing Suneido's lexical scanner. It's pretty simple, just a little string and character manipulation so I thought it would be easy.

I based the D version on the Java version since it's newer and cleaner than the C++ version.

The first thing I ran into was that the lexer returns "tokens", which in the Java code are not just integer enums, but actual enum classes with fields and methods. Java went from no enums in early versions, to quite sophisticated enum capabilities.

In typical new user fashion, I tried to reproduce the Java style token enum in D. (It's hard not to avoid "cutting against the grain" when learning a new language.) D does allow you to use structs for enums so it seemed this would work. But if you use structs then you lose the automatic assignment of consecutive integer values, and D struct's are value types so they aren't necessarily unique.

Then I went off an a tangent looking into D's "mixin" ability. D has the ability to run D code at compile time, generate a string, and insert that string into your source code. This is known as CTFE (compile time function evaluation). For example, you can write:

mixin(myfn(arguments...);

where myfn returns a string containing source code to be compiled into your code in place of the mixin "call". This is easier than trying to use C++ templates to do compile time calculations, but not as seamless as Lisp macros because you have to work at the level of strings.

An example of the power of this is the Pegged library for D. This allows you to include a grammar specification in your source file, have a parser generated from it, use it to parse DSL code also included in your source file, and then use the resulting parse tree to generate D code - all of this at compile time, without requiring any external tools.

It was fun looking at what you could do with mixin but it was overkill for what I was trying to do. I realized I could use another feature of D to use simple integer enum tokens, but still work with them as if they were objects with properties.

enum Token { IF, ELSE, AND, OR, ...
bool isKeyword(T token) { ...

D allows you to call functions like isKeyword as if they were methods on their first argument, so you can say token.isKeyword() even though token is an int, not an object.

This also led me to discover D doesn't have a "set" data structure, which surprised me a little. It does have hash maps built-in to the language, which can be used to emulate sets, and it has a red-black-tree implementation which could also be used. But it doesn't have an actual "Set" class. It was easy enough to throw together my own Set so I could make a set of keywords. (Which I didn't end up using because what I really needed was a map of keyword strings to tokens.)

What I initially wrote was:

static auto keywords = new Set!Token(IF, ELSE, ...);

But that won't work because static initializers can only be simple literals. However, you can write:

static Set!Token keywords;
static this() { keywords = new Set!Token(IF, ELSE, ...); }

I'm not sure why the compiler can't do this same translation as simple syntactic sugar. It's extra annoying because you can't use the "auto" type inference.

You could do it with a mixin like:

mixin(stat("keywords", "new Set!Token(...);");

but that's not exactly pretty.

Another minor annoyance was the lack of an equivalent to Java's static import. If I want to refer to the tokens as just IF or ELSE I could make them "anonymous". But then they don't have a specific type. Or I can give the enum a named type, but then I have to always reference them with Token.IF . In Java I could say import static Token.* and then just use the bare names.

I ended up going back to a mixin approach with each Token a static class. I used a class rather than a struct because I wanted to pass tokens around by reference, not by value. See token.d

I also ended up writing multiple versions of the lexer. As with most rewrites, I found a number of improvements I could make. Most of them aren't specific to D, I could apply them to the other versions as well.

After getting it working, I decided to go back and write it in a functional (rather than object oriented style). In this version the lexer is a function that takes a string and returns a struct with the token, its string value, and the remainder of the source string. This version should work with Unicode UTF8 strings, not just ASCII, because I only access the source string with front and popFront.

Along the way I also started a simple version of Hamcrest style asserts. I almost expected something like this to be available, but I couldn't find anything.

One of the features I was excited about with D was immutability. But when I tried to make Token immutable, I ran into all kinds of problems (as others have also). One of them being that you can't store immutable objects in a map! Hopefully this is something they'll figure out. I eventually gave up on immutable and managed to get const to work, although even that was a struggle.

I set up a dsuneido GitHub repository for my experimenting. I haven't used GitHub before, but I keep hearing how great it is so I thought I better give it a try. I used their Mac GUI app, which made it quite painless. However, I'm not sure it's the smartest idea to be using three different version control systems - Subversion for cSuneido, Mercurial for jSuneido, and Git for dSuneido. I chose Mercurial for jSuneido because they had better Windows support than Git at that time, but now that Eclipse comes with built-in Git support, I wonder if I made the right choice.

I'm not sure where I'm going with this, so far I'm just playing and learning D. So far, I like it, but it certainly has some rough edges.

Thursday, July 12, 2012

System Programming Languages

Or, more specifically for me, languages I'd consider using to implement Suneido. Here are some opinionated thoughts on C++, Java, Scala, Xtend, D, and Go.

First, I want garbage collection. Other than for things like constrained devices, as far as I'm concerned garbage collection has won.

For me, that's a major drawback to C++. So I wrote my own garbage collector for C++ which we used for a number of years. Eventually I replaced it with the Boehm garbage collector which has worked quite well, given the limitations of conservative garbage collection.

The last few years I've been using Java which gave me great garbage collection and one of the best virtual machines out there.

The big advantage to Java is the rich ecosystem. You have multiple choices for great IDE's (e.g. Eclipse, Netbeans, IntelliJ) with good refactoring support, lots of tools like JConsole and Visual VM, good libraries, both standard and third party (e.g. Guava and Asm), and tons of books and other sources of information.

On the other hand, the Java language itself is nothing to get excited about. Lately I've been looking at Xtend and Scala. Xtend is fairly modest, it's basically a better Java. See my First Impressions of Xtend

Scala is much more ambitious. It has some great features. The claim is that the language is simpler than Java. But I'm more than a little scared of the type system. Although it's that type system that allows some of the great features. Scala strikes me a bit like C++, you can do some amazing things, but it can also get pretty twisted.

I'd be more tempted by Scala, but as system programming languages, JVM language all have a major drawback - you can't write efficient low level code. Don't get me wrong, I have no desire to write "unsafe" code. I don't want to go back to my C and C++ days. Java has primitive (non-heap) types, but they don't work with generic code (without boxing onto the heap). I'd like to have "value" types (e.g. a pair of ints) that I could pass and return by value (not on the heap) and embed (not reference) in other data structures. Scala tries to unify primitives with the other types, but ultimately it comes down to the JVM, and there's still a lot of boxing going on.

Also, Java is primarily in the locks and shared state style of concurrency, a style whose drawbacks are well known. Scala is better on this front with its actor system, which admittedly is also possible from Java.

What else is out there? C# is a possibility, but I have to say I'm leary about Microsoft products. And portability means Mono, which is another question mark in my mind.

Recently I've been rereading the D book by Andrei Alexandrescu (also author of Modern C++). You can get a taste of his writing style in The Case for D. In some respects the D language could be called a better C++. But it has gone beyond that and has some very attractive features. It has value types as well as reference types. You can write efficient code like C++, but still make it safe.

D has garbage collection, although I suspect (with no evidence) that it's not as good as the JVM. Of course, part of the reason so much effort has gone into JVM garbage collection is that everything has to go on the heap. That doesn't apply quite so much in D (depending on the style of your code).

One of the features I really like about D is its support for immutability and pure functions. My feeling is that these are really important, especially for concurrent code. You can, of course, write immutable data structures and pure functions in any language, but D actually lets you enforce them. Unlike C++ const or Java final, D's immutable and pure are "deep". Making an array final in Java doesn't stop you from modifying the contents of the array. Making an array immutable in D does.

One minor disappointment is that D still requires semicolons. I think languages like Scala and Go have shown that it's quite feasible to skip the semicolons. But I could live with this.

I'm not against virtual machines like the JVM or CLR. But they have their drawbacks, especially for deployment. Ensuring that customers have the right version of .Net is bad enough and that's a Microsoft product on a Microsoft platform. We can require our customers install Java, but when they don't update it and there's a security hole, guess who they will blame? One of Suneido's main features is that it is self contained and easy to deploy. Requiring another runtime underneath it is not ideal. So D and Go generating native apps is attractive in this respect.

Another advantage to D is that (I think) it would allow accessing the Win32 API so Suneido's current user interface would still work. (albeit still only on Windows.)

Yet another possibility is Google's Go language. Personally I find it a little quirky. It's got some good features, but it's also missing things that I like (e.g. type safe containers). I'm currently reading Programming in Go but I'm finding it somewhat dry. I find it a lot easier to learn about a language from a well written book. I didn't really get interested in D until the book came out. And books got me interested in C and C++ and Scala.

One of the things discouraging me from Scala and D and Go is the lack of IDE support. That's gradually improving, but things like refactoring are still pretty limited. I programmed for years with just an editor, but once you get accustomed to a good IDE with refactoring support, it's hard to go back. To me, one of the big advantages of a statically typed language is that it allows great tooling. Unfortunately, Java seems to have a big lead in this area.

Currently, if I was to consider another language for implementing something like Suneido, I think I'd lean towards D.