The Software Life: Lexer in TypeScript

One of the standard things I do when I'm investigating a new language is to implement Suneido's lexical scanner. Not because this is necessarily the best test of a language. It's more that I'm usually evaluating the language in terms of how it would work to implement Suneido. I realize that's a very biased view :-)

I wasn't planning to do this with TypeScript / JavaScript because suneido.js isn't intended to be an "implementation" of Suneido. It's a transpiler, running on jSuneido and using the jSuneido lexer and parser, that translates Suneido code to JavaScript. The only manually written TypeScript / JavaScript code is the runtime support library. (And that's partly a bootstrapping process. Once the transpiler is functional, much of the support library could be written in Suneido code.)

But as I implement the runtime support library, obviously I need tests. And I'm getting tired of writing the same tests for each version of Suneido. (Or worse, different tests for each version.) That's why I started developing "portable tests" some time ago. So I decided I should implement the portable test framework in TypeScript. At which point I remembered that this required a Suneido lexical scanner.

So I decided to write yet another version. It took about a day, which is about what I expected, and some of that time was still spent learning TypeScript and JavaScript. I based it mostly on the C# version. The Go version is more recent, but Go differs more from TypeScript than C# does.

Overall, I was pretty happy with how it went. I find the repetition of "this." a little ugly. In Suneido you can abbreviate this.foo as just .foo, which is a lot more compact and less "noise" in the code, while still making it clear it's an instance variable.

I find JavaScript's equality testing somewhat baroque. The recommendation is to use === most of the time, but that won't always work for strings, because a primitive string and a String object (created with new String(...)) are never === to each other. And then there are the quirks of -0 versus +0, and NaN.
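
A short sketch of those quirks, using only standard JavaScript semantics. It assumes an ES2015 or later lib (for Object.is), and the variable names are purely illustrative.

```typescript
// A string primitive and a String wrapper object holding the same characters.
const primitive: string = "abc";
const boxed: String = new String("abc");

// === compares a primitive with an object without coercion, so this is false.
const strictEqual: boolean = primitive === boxed;
// == converts the object to its primitive value first, so this is true.
const looseEqual: boolean = primitive == boxed;
// Unwrapping with valueOf() makes === behave as expected again.
const unwrappedEqual: boolean = primitive === boxed.valueOf();

// Signed zeros: === treats them as equal, Object.is tells them apart.
const negZero: number = -0;
const posZero: number = 0;
const zerosStrict: boolean = negZero === posZero; // true
const zerosObjectIs: boolean = Object.is(negZero, posZero); // false

// NaN is not === to itself, but Object.is considers it equal to itself.
const notANumber: number = NaN;
const nanStrict: boolean = notANumber === notANumber; // false
const nanObjectIs: boolean = Object.is(notANumber, notANumber); // true

console.log({ strictEqual, looseEqual, unwrappedEqual, zerosStrict, zerosObjectIs, nanStrict, nanObjectIs });
```

In practice === is fine for strings as long as nothing creates String wrapper objects, and Object.is is the tool for the cases where -0 or NaN matter.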

ES6 arrow functions and the functional methods like map are nice. And spread / rest (...) and for-of are good. I only specified types on function parameters and returns, but this worked well and gave me very little grief.
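
To make that concrete, here is a minimal sketch of a lexer written in that style. It is not the Suneido lexer: the token kinds (Kind), the Token interface, the Lexer class and the texts helper are hypothetical names for illustration, and it only handles identifiers, integers, double-quoted strings and single-character punctuation. It assumes an ES2015 or later target so that generators, spread and for-of work on the lexer's iterator. As described above, types are only given on function parameters and return values.

```typescript
// Hypothetical token kinds for this sketch; not Suneido's real token set.
enum Kind {
  Identifier,
  NumberLit,
  StringLit,
  Punct,
}

interface Token {
  kind: Kind;
  text: string;
}

const isDigit = (c: string): boolean => c >= "0" && c <= "9";
const isIdentStart = (c: string): boolean => /[A-Za-z_]/.test(c);
const isIdentPart = (c: string): boolean => /[A-Za-z0-9_]/.test(c);

class Lexer {
  private pos = 0;

  constructor(private readonly src: string) {}

  // Returns the next token, or undefined at the end of the input.
  next(): Token | undefined {
    // Skip whitespace between tokens.
    while (this.pos < this.src.length && /\s/.test(this.src[this.pos])) this.pos++;
    if (this.pos >= this.src.length) return undefined;

    const start = this.pos;
    const c = this.src[this.pos++];
    let kind: Kind;
    if (isIdentStart(c)) {
      while (this.pos < this.src.length && isIdentPart(this.src[this.pos])) this.pos++;
      kind = Kind.Identifier;
    } else if (isDigit(c)) {
      while (this.pos < this.src.length && isDigit(this.src[this.pos])) this.pos++;
      kind = Kind.NumberLit;
    } else if (c === '"') {
      // In this sketch an unterminated string simply runs to the end of input.
      while (this.pos < this.src.length && this.src[this.pos] !== '"') this.pos++;
      if (this.pos < this.src.length) this.pos++; // consume the closing quote
      kind = Kind.StringLit;
    } else {
      kind = Kind.Punct;
    }
    return { kind, text: this.src.slice(start, this.pos) };
  }

  // Makes the lexer usable with for-of and spread.
  *[Symbol.iterator](): Generator<Token> {
    for (let tok = this.next(); tok !== undefined; tok = this.next()) yield tok;
  }
}

// Spread collects every token, then map extracts just the source text.
const texts = (src: string): string[] => [...new Lexer(src)].map(t => t.text);

for (const tok of new Lexer('x = 42 + "hi"')) {
  console.log(Kind[tok.kind], tok.text);
}
```

Making the lexer iterable through Symbol.iterator is one design choice among several; it lets callers use spread and for-of directly instead of looping on next() themselves. The repeated "this." is also easy to see here.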

I was pleasantly surprised to find that TypeScript numeric enums automatically create a reverse mapping from number back to name. The gotcha is that if you make them const you don't get this, because const enum members are inlined. Using a simple numeric enum doesn't let me attach extra information for parsing, but I'm not planning to write a parser, so that's not a problem. The other issue is debugging, where all you see is a number.
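
A minimal sketch of the reverse mapping and the const enum gotcha. The enums Tok and CTok, their members, and the tokenName helper are hypothetical, not the token set from tokens.ts.

```typescript
// A regular (non-const) numeric enum; member names are illustrative only.
enum Tok {
  Identifier, // 0
  NumberLit, // 1
  StringLit, // 2
  Eof, // 3
}

// Forward mapping: name -> number.
const numeric: number = Tok.NumberLit; // 1

// Reverse mapping generated by the compiler: number -> name.
const memberName: string = Tok[Tok.NumberLit]; // "NumberLit"

// Useful when debugging, where a token otherwise shows up only as a number.
const tokenName = (t: Tok): string => Tok[t];

// A const enum is inlined at each use site, so no reverse mapping exists:
// writing CTok[1] here would be a compile error.
const enum CTok {
  Identifier,
  NumberLit,
}
const inlined: number = CTok.NumberLit; // compiled to the literal 1

console.log(numeric, memberName, tokenName(Tok.Eof), inlined);
```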

Removing const from my token enum exposed another issue. I was relying on Atom's auto-compile but it only compiles the file you are editing, not the other files that may depend on it. So I went back to running tsc -w in a separate terminal window. (Open question - does anyone know why tsc lists some of my files as relative paths and some as absolute? It's consistent about which files, except that gradually more are switching to absolute. It's not causing any problems, I just can't figure out why.)

Although it can mean shorter code, I'm not a fan of "truthy" and "falsey" values. They were a source of problems 30 years ago in C. This time I got caught by a function that returned a number or undefined. I expected undefined to be "falsey", but I forgot that zero is also "falsey". With Suneido I took a stricter approach: things like if and while only accept true or false, and throw an exception for any other type of value.
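
A minimal reconstruction of that kind of bug. The helpers firstDigit, hasDigitBuggy, hasDigit and strictCond are hypothetical, not code from the lexer, and strictCond only imitates the Suneido rule described above; it is not how Suneido implements it.

```typescript
// Hypothetical helper: index of the first digit in s, or undefined if there is none.
function firstDigit(s: string): number | undefined {
  for (let i = 0; i < s.length; i++) {
    const c = s[i];
    if (c >= "0" && c <= "9") return i;
  }
  return undefined;
}

// Buggy: a digit at index 0 is treated like "not found", because 0 is falsy.
function hasDigitBuggy(s: string): boolean {
  const i = firstDigit(s);
  if (i) return true;
  return false;
}

// Fixed: compare against undefined explicitly instead of relying on truthiness.
function hasDigit(s: string): boolean {
  return firstDigit(s) !== undefined;
}

// Imitation of the stricter rule: a condition must be exactly true or false.
function strictCond(v: unknown): boolean {
  if (typeof v !== "boolean") throw new Error("condition must be true or false");
  return v;
}

console.log(hasDigitBuggy("7up"), hasDigit("7up")); // false true
```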

Now that I have a lexical scanner I can implement the portable test framework, which also shouldn't take me too long.

Versions, in order of age (newest first). cSuneido and jSuneido are production code and more full-featured than the other, more experimental versions.

TypeScript tokens.ts, lexer.ts, lexer_test.ts
Go tokens.go, lexer.go, lexer_test.go
C# Tokens.cs, Lexer.cs (tests at the end)
D token.d, lexer.d (tests at the end)
Java (jSuneido) Token.java, Lexer.java, LexerTest.java
C++ (cSuneido) scanner.h, scanner.cpp (tests at the end)

Thursday, July 14, 2016