Wednesday, April 09, 2014

Lexing in Go

One of my standard exercises when I'm looking at a new language is to implement the lexical scanner for Suneido. I've done this in quite a variety of languages - C++, Java, C#, D, and now Go. The scanner is simple, and generally the implementations are similar.

The Go code is longer (in lines) than most of the other implementations for a couple of reasons. One is that Go doesn't have the ?: conditional operator, so you have to use if-else. Another is that gofmt puts the enum-style constants one per line.
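
A trivial illustration of the first point (not from the scanner itself): what would be a one-line conditional expression in C or Java becomes an if-else in Go.

package main

import "fmt"

// A stand-in for the kind of two-way choice the scanner makes over and over.
func tokenName(isKeyword bool) string {
	// C or Java could write: return isKeyword ? "KEYWORD" : "IDENTIFIER";
	if isKeyword {
		return "KEYWORD"
	} else {
		return "IDENTIFIER"
	}
}

func main() {
	fmt.Println(tokenName(false)) // IDENTIFIER
}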

Go doesn't have enums as such; you implement them with simple numeric constants. That works OK, but it's awkward for debugging, because if you print one you just get a number.
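
A minimal sketch of what that means, borrowing a few of the token names from the code below: an iota constant block gives each name a number, and printing one shows only that number unless you also write a String() method by hand.

package main

import "fmt"

type Token int

const (
	NIL Token = iota // 0
	EOF              // 1
	IDENTIFIER       // 2
)

func main() {
	t := IDENTIFIER
	fmt.Println(t) // prints "2", not "IDENTIFIER"
}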

One interesting thing about the Go implementation is that it handles Unicode without really having to worry about it much.
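
That mostly falls out of the language: strings are UTF-8, and the unicode and unicode/utf8 packages decode and classify runes, which is all the scanner's read() method needs. A small standalone illustration (not part of the scanner):

package main

import (
	"fmt"
	"unicode"
	"unicode/utf8"
)

func main() {
	src := "héllo"
	for i := 0; i < len(src); {
		c, w := utf8.DecodeRuneInString(src[i:]) // decode one rune and get its byte width
		fmt.Printf("%c is a letter: %v\n", c, unicode.IsLetter(c))
		i += w
	}
}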

I debated whether to return the results as a struct or as multiple return values. But that really depends on which is easier for the calling code.
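
For comparison, the two styles look roughly like this; the names here are illustrative, not the lex package's actual API.

package main

import "fmt"

type result struct {
	token   int
	keyword int
	value   string
}

// Style 1: return a struct, as the code below does with Item.
func nextStruct() result {
	return result{token: 1, keyword: 0, value: "foo"}
}

// Style 2: return multiple values; callers can ignore what they don't need.
func nextValues() (token, keyword int, value string) {
	return 1, 0, "foo"
}

func main() {
	r := nextStruct()
	fmt.Println(r.token, r.value)

	tok, _, val := nextValues()
	fmt.Println(tok, val)
}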

There is a good talk by Rob Pike about Lexical Scanning in Go. If you're not interested enough to watch the video, you can skim the slides. I didn't need to use his fancier concurrent state-machine design, but it's an interesting example of using Go. You can see a full implementation in the Go template package.
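
The heart of that design is a state function type: each state does some scanning, emits items on a channel, and returns the next state. Here's a toy, self-contained sketch of the pattern (my own illustration of the idea from the talk, not code from this scanner or from the template package):

package main

import "fmt"

type item struct{ val string }

type lexer struct {
	input string
	pos   int
	items chan item
}

// Each state is a function that does some work and returns the next state.
type stateFn func(*lexer) stateFn

// lexWords is a toy state: it emits space-separated words as items.
func lexWords(l *lexer) stateFn {
	for l.pos < len(l.input) {
		if l.input[l.pos] == ' ' {
			l.items <- item{l.input[:l.pos]}
			l.input = l.input[l.pos+1:]
			l.pos = 0
			return lexWords // re-enter the same state for the next word
		}
		l.pos++
	}
	if l.pos > 0 {
		l.items <- item{l.input}
	}
	return nil // nil means: no more states, stop
}

// run drives the state machine until a state returns nil.
func (l *lexer) run() {
	for state := stateFn(lexWords); state != nil; {
		state = state(l)
	}
	close(l.items)
}

func main() {
	l := &lexer{input: "lexing in go", items: make(chan item)}
	go l.run() // the lexer runs concurrently, feeding items to the channel
	for it := range l.items {
		fmt.Println(it.val)
	}
}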

Here's the code for the scanner, lex.go (or view it on GitHub):

// Package lex implements the lexical scanner for Suneido
package lex

import (
	"bytes"
	"strings"
	"unicode"
	"unicode/utf8"
)

type lexer struct {
	src   string
	si    int
	start int
	width int
	value string
}

type Item struct {
	token   Token
	keyword Token
	value   string
}

func Lexer(src string) *lexer {
	return &lexer{src: src}
}

func (lxr *lexer) Next() Item {
	lxr.value = "" // clear any value left over from a previous string token
	token := lxr.next()
	var value string
	if lxr.value != "" {
		value = lxr.value
	} else {
		value = lxr.src[lxr.start:lxr.si]
	}
	keyword := NIL
	if token == IDENTIFIER && lxr.peek() != ':' {
		keyword = Keyword(value)
	}
	return Item{token, keyword, value}
}

func (lxr *lexer) next() Token {
	lxr.start = lxr.si
	c := lxr.read()
	switch c {
	case eof:
		return EOF
	case '#':
		return HASH
	case '(':
		return L_PAREN
	case ')':
		return R_PAREN
	case ',':
		return COMMA
	case ';':
		return SEMICOLON
	case '?':
		return Q_MARK
	case '@':
		return AT
	case '[':
		return L_BRACKET
	case ']':
		return R_BRACKET
	case '{':
		return L_CURLY
	case '}':
		return R_CURLY
	case '~':
		return BITNOT
	case ':':
		if lxr.match(':') {
			return RANGELEN
		} else {
			return COLON
		}
	case '=':
		if lxr.match('=') {
			return IS
		} else if lxr.match('~') {
			return MATCH
		} else {
			return EQ
		}
	case '!':
		if lxr.match('=') {
			return ISNT
		} else if lxr.match('~') {
			return MATCHNOT
		} else {
			return NOT
		}
	case '<':
		if lxr.match('<') {
			if lxr.match('=') {
				return LSHIFTEQ
			} else {
				return LSHIFT
			}
		} else if lxr.match('>') {
			return ISNT
		} else if lxr.match('=') {
			return LTE
		} else {
			return LT
		}
	case '>':
		if lxr.match('>') {
			if lxr.match('=') {
				return RSHIFTEQ
			} else {
				return RSHIFT
			}
		} else if lxr.match('=') {
			return GTE
		} else {
			return GT
		}
	case '|':
		if lxr.match('|') {
			return OR
		} else if lxr.match('=') {
			return BITOREQ
		} else {
			return BITOR
		}
	case '&':
		if lxr.match('&') {
			return AND
		} else if lxr.match('=') {
			return BITANDEQ
		} else {
			return BITAND
		}
	case '^':
		if lxr.match('=') {
			return BITXOREQ
		} else {
			return BITXOR
		}
	case '-':
		if lxr.match('-') {
			return DEC
		} else if lxr.match('=') {
			return SUBEQ
		} else {
			return SUB
		}
	case '+':
		if lxr.match('+') {
			return INC
		} else if lxr.match('=') {
			return ADDEQ
		} else {
			return ADD
		}
	case '/':
		if lxr.match('/') {
			return lxr.lineComment()
		} else if lxr.match('*') {
			return lxr.spanComment()
		} else if lxr.match('=') {
			return DIVEQ
		} else {
			return DIV
		}
	case '*':
		if lxr.match('=') {
			return MULEQ
		} else {
			return MUL
		}
	case '%':
		if lxr.match('=') {
			return MODEQ
		} else {
			return MOD
		}
	case '$':
		if lxr.match('=') {
			return CATEQ
		} else {
			return CAT
		}
	case '`':
		return lxr.rawString()
	case '"', '\'': // empty cases don't fall through in Go, so both quotes share one case
		return lxr.quotedString(c)
	case '.':
		if lxr.match('.') {
			return RANGETO
		} else if unicode.IsDigit(lxr.peek()) {
			return lxr.number()
		} else {
			return DOT
		}
	case '0', '1', '2', '3', '4', '5', '6', '7', '8', '9':
		lxr.backup() // let number() see the whole literal, including a 0x prefix
		return lxr.number()
	default:
		if isSpace(c) {
			return lxr.whitespace(c)
		} else if unicode.IsLetter(c) || c == '_' {
			return lxr.identifier()
		}
	}
	return ERROR
}

func (lxr *lexer) whitespace(c rune) Token {
	result := WHITESPACE
	for ; isSpace(c); c = lxr.read() {
		if c == '\n' || c == '\r' {
			result = NEWLINE
		}
	}
	lxr.backup()
	return result
}

func (lxr *lexer) lineComment() Token {
	for c := lxr.read(); c != eof && c != '\n'; c = lxr.read() {
	}
	return COMMENT
}

func (lxr *lexer) spanComment() Token {
	lxr.matchUntil(func() bool { return strings.HasSuffix(lxr.src[:lxr.si], "*/") })
	return COMMENT
}

func (lxr *lexer) rawString() Token {
	for c := lxr.read(); c != eof && c != '`'; c = lxr.read() {
	}
	lxr.value = lxr.src[lxr.start+1 : lxr.si-1] // strip the back quotes
	return STRING
}

func (lxr *lexer) quotedString(quote rune) Token {
	// the opening quote was already consumed by next()
	var buf bytes.Buffer
	for c := lxr.read(); c != eof && c != quote; c = lxr.read() {
		buf.WriteRune(lxr.doesc(c))
	}
	lxr.value = buf.String()
	return STRING
}

func (lxr *lexer) doesc(c rune) rune {
	if c != '\\' {
		return c
	}
	save := lxr.si
	c = lxr.read()
	switch c {
	case 'n':
		return '\n'
	case 't':
		return '\t'
	case 'r':
		return '\r'
	case 'x':
		dig1 := digit(lxr.read(), 16)
		dig2 := digit(lxr.read(), 16)
		if dig1 != -1 && dig2 != -1 {
			return rune(16*dig1 + dig2)
		}
	case '\\', '"', '\'':
		return c
	default: // \nnn octal escape; c is the first digit
		dig1 := digit(c, 8)
		dig2 := digit(lxr.read(), 8)
		dig3 := digit(lxr.read(), 8)
		if dig1 != -1 && dig2 != -1 && dig3 != -1 {
			return rune(64*dig1 + 8*dig2 + dig3)
		}
	}
	// not a valid escape sequence, treat the backslash literally
	lxr.si = save
	return '\\'
}

func digit(c rune, radix int) int {
	n := 99
	if isDigit(c) {
		n = int(c - '0')
	} else if isHexDigit(c) {
		n = int(10 + unicode.ToLower(c) - 'a')
	}
	if n < radix {
		return n
	} else {
		return -1
	}
}

func isDigit(r rune) bool {
	return '0' <= r && r <= '9'
}

func isHexDigit(r rune) bool {
	return strings.ContainsRune(hexDigits, r)
}

func (lxr *lexer) number() Token {
	lxr.matchOneOf("+-")
	// is it hex?
	digits := "0123456789"
	if lxr.match('0') && lxr.matchOneOf("xX") {
		digits = hexDigits
	}
	lxr.matchRunOf(digits)
	if lxr.match('.') {
		lxr.matchRunOf(digits)
	}
	if lxr.matchOneOf("eE") {
		lxr.matchOneOf("+-")
		lxr.matchRunOf("0123456789")
	}
	return NUMBER
}

func (lxr *lexer) identifier() Token {
	lxr.matchWhile(isIdentChar)
	if !lxr.match('?') {
		lxr.match('!')
	}
	return IDENTIFIER
}

const eof = -1

func (lxr *lexer) read() rune {
	if lxr.si >= len(lxr.src) {
		lxr.width = 0
		return eof
	}
	c, w := utf8.DecodeRuneInString(lxr.src[lxr.si:])
	lxr.si += w
	lxr.width = w
	return c
}

func (lxr *lexer) backup() {
	lxr.si -= lxr.width
}

func (lxr *lexer) peek() rune {
	c := lxr.read()
	lxr.backup()
	return c
}

func (lxr *lexer) match(c rune) bool {
	if c == lxr.read() {
		return true
	}
	lxr.backup()
	return false
}

func (lxr *lexer) matchOneOf(valid string) bool {
	if strings.ContainsRune(valid, lxr.read()) {
		return true
	}
	lxr.backup()
	return false
}

func (lxr *lexer) matchRunOf(valid string) {
	for strings.ContainsRune(valid, lxr.read()) {
	}
	lxr.backup()
}

func (lxr *lexer) matchWhile(f func(c rune) bool) {
	for c := lxr.read(); f(c); c = lxr.read() {
	}
	lxr.backup()
}

func (lxr *lexer) matchUntil(f func() bool) {
	for c := lxr.read(); c != eof && !f(); c = lxr.read() {
	}
}

func isIdentChar(r rune) bool {
	return r == '_' || unicode.IsLetter(r) || unicode.IsDigit(r)
}

const hexDigits = "0123456789abcdefABCDEF"

func isSpace(c rune) bool {
	return c == ' ' || c == '\t' || c == '\r' || c == '\n'
}

And here are the token definitions (tokens.go):

package lex

func Keyword(s string) Token {
	return keywords[s]
}

func (t Token) IsInfix() bool {
	return infix[t]
}

func (t Token) String() string {
	return tostring[t]
}

var tostring = map[Token]string{
	NIL:        "NIL",
	EOF:        "EOF",
	WHITESPACE: "WHITESPACE",
	COMMENT:    "COMMENT",
	NEWLINE:    "NEWLINE",
}

type Token int
const (
NIL Token = iota
EOF
ERROR
IDENTIFIER
NUMBER
STRING
WHITESPACE
COMMENT
NEWLINE
// operators and punctuation
HASH
COMMA
COLON
SEMICOLON
Q_MARK
AT
DOT
L_PAREN
R_PAREN
L_BRACKET
R_BRACKET
L_CURLY
R_CURLY
IS
ISNT
MATCH
MATCHNOT
LT
LTE
GT
GTE
NOT
INC
DEC
BITNOT
ADD
SUB
CAT
MUL
DIV
MOD
LSHIFT
RSHIFT
BITOR
BITAND
BITXOR
EQ
ADDEQ
SUBEQ
CATEQ
MULEQ
DIVEQ
MODEQ
LSHIFTEQ
RSHIFTEQ
BITOREQ
BITANDEQ
BITXOREQ
RANGETO
RANGELEN
// language keywords
AND
BOOL
BREAK
BUFFER
CALLBACK
CASE
CATCH
CHAR
CLASS
CONTINUE
CREATE
DEFAULT
DLL
DO
DOUBLE
ELSE
FALSE
FLOAT
FOR
FOREVER
FUNCTION
GDIOBJ
HANDLE
IF
IN
INT64
LONG
NEW
OR
RESOURCE
RETURN
SHORT
STRUCT
SWITCH
SUPER
THIS
THROW
TRUE
TRY
VOID
WHILE
// query keywords
ALTER
AVERAGE
CASCADE
COUNT
DELETE
DROP
ENSURE
EXTEND
HISTORY
INDEX
INSERT
INTERSECT
INTO
JOIN
KEY
LEFTJOIN
LIST
MAX
MIN
MINUS
PROJECT
REMOVE
RENAME
REVERSE
SET
SORT
SUMMARIZE
SVIEW
TIMES
TO
TOTAL
UNION
UNIQUE
UPDATE
UPDATES
VIEW
WHERE
// for AST
ARG
ASSIGNOP
BINARYOP
BLOCK
CALL
DATE
FOR_IN
MEMBER
METHOD
OBJECT
POSTINCDEC
PREINCDEC
RECORD
RVALUE
SELFREF
SUBSCRIPT
SYMBOL
)
var keywords = map[string]Token{
"and": AND,
"bool": BOOL,
"break": BREAK,
"buffer": BUFFER,
"callback": CALLBACK,
"case": CASE,
"catch": CATCH,
"char": CHAR,
"class": CLASS,
"continue": CONTINUE,
"default": DEFAULT,
"dll": DLL,
"do": DO,
"double": DOUBLE,
"else": ELSE,
"false": FALSE,
"float": FLOAT,
"for": FOR,
"forever": FOREVER,
"function": FUNCTION,
"gdiobj": GDIOBJ,
"handle": HANDLE,
"if": IF,
"in": IN,
"int64": INT64,
"is": IS,
"isnt": ISNT,
"long": LONG,
"new": NEW,
"not": NOT,
"or": OR,
"resource": RESOURCE,
"return": RETURN,
"short": SHORT,
"string": STRING,
"struct": STRUCT,
"super": SUPER,
"switch": SWITCH,
"this": THIS,
"throw": THROW,
"true": TRUE,
"try": TRY,
"void": VOID,
"while": WHILE,
"xor": ISNT,
}
var infix = map[Token]bool{
AND: true,
OR: true,
Q_MARK: true,
MATCH: true,
MATCHNOT: true,
ADD: true,
SUB: true,
CAT: true,
MUL: true,
DIV: true,
MOD: true,
LSHIFT: true,
RSHIFT: true,
BITOR: true,
BITAND: true,
BITXOR: true,
}
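
To show how the scanner is driven, here is the kind of loop a caller inside the package (or a test) might write; it's illustrative only, not part of the original code.

package lex

import "fmt"

// Illustrative only: scan a snippet and print each token until EOF,
// skipping whitespace. Token prints as a number here because only a few
// tokens have entries in the tostring map.
func demo() {
	lxr := Lexer("size = count + 1")
	for {
		item := lxr.Next()
		if item.token == EOF {
			break
		}
		if item.token == WHITESPACE || item.token == NEWLINE {
			continue
		}
		fmt.Printf("%d %q\n", item.token, item.value)
	}
}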

3 comments:

Raff said...

The statement "Go only supports simple numeric constants to implement enums. That works ok, but it's awkward for debugging because if you print them, you just get a number." is not completely true.

If you have a "typed" constant you can add a String() method. For example you should be able to do:

func (t Token) String() string {
	switch t {
	case VOID: return "void"
	case SOMETHING: return "something"
	...
	}
}

Not the best, since it requires extra work, but it's possible.

Andrew McKinlay said...

Yeah, you'll see in the code I do that, although only for some of the values. But it's verbose and a lot of duplication. (!DRY) If you add a token you have to remember to add it to the String method as well. It's not a huge problem, just a nuisance. I'm not sure how you'd fix it without adding undesired complexity to Go.

notzippy said...

Something new with Go is go generate, which may be helpful:
http://blog.golang.org/generate
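
go generate together with the stringer tool (golang.org/x/tools/cmd/stringer) would automate exactly the String() method discussed above. A sketch, assuming the usual stringer setup rather than anything in the code here: the directive sits above the existing Token declaration in tokens.go.

//go:generate stringer -type=Token
type Token int

Running go generate then writes token_string.go with a String() method covering every Token constant, so new tokens can't fall out of sync with their names.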