Tuesday, October 22, 2019

gSuneido Roller Coaster

One of the questions with the Go version of Suneido (gSuneido) was the Windows UI interface. In the long run I'm hoping to switch to a browser based front end, but that's a long term project and I would like to replace cSuneido sooner than that. One option is to implement cSuneido's general purpose DLL interface. But this involved some ugly low level code (even some assembler) that I didn't really want to port. Another option was to use Go's syscall facilities to write the specific Win32 functions that we needed. I did a quick check on how many functions we needed to open the IDE. It was about 100. I knew the final number would be higher, but it seemed reasonable to hand write that many. One advantage of doing this is that it moved the "unsafe" code from Suneido to Go, which seemed safer.
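
For illustration, here's roughly what one hand-written binding looks like using Go's syscall facilities (a sketch, not the actual gSuneido code; MessageBoxW is just a convenient example):

package main

import (
	"syscall"
	"unsafe"
)

var (
	user32         = syscall.NewLazyDLL("user32.dll")
	procMessageBox = user32.NewProc("MessageBoxW")
)

// messageBox wraps the Win32 MessageBoxW call, converting Go strings to
// UTF-16 pointers and passing all the arguments as uintptr as syscall requires.
func messageBox(hwnd uintptr, text, caption string, flags uint32) int32 {
	t, _ := syscall.UTF16PtrFromString(text)
	c, _ := syscall.UTF16PtrFromString(caption)
	ret, _, _ := procMessageBox.Call(hwnd,
		uintptr(unsafe.Pointer(t)),
		uintptr(unsafe.Pointer(c)),
		uintptr(flags))
	return int32(ret)
}

func main() {
	messageBox(0, "hello from Go", "demo", 0)
}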

Of course, it took much longer than I expected to write (and debug) the interface functions. There were about 300 in the end. (It's not just the functions, it's converting to and from all the structs involved as well.) I definitely had a few doubts whether this was the right approach. In retrospect, I'm pretty sure I could have ported the general purpose interface in less time. It was also a bit depressing knowing that if/when we got the browser interface finished, this code would all be thrown out. If I were to do it again, I might write a tool that would take the cSuneido definitions and generate Go language definitions.

But finally I got near to complete and I could actually run the Suneido GUI. That felt pretty good.

Except there were intermittent crashes. I didn't worry about it too much at first. This kind of low level unsafe code can easily crash if you have bugs. But even after I cleaned up the majority of the bugs, it was still crashing. It worked most of the time, so I didn't think it was blatant errors in the code. I spent days trying to track it down. Of course, like many elusive bugs, it almost never crashed in the debugger.

My first thought was something related to garbage collection. But disabling garbage collection didn't help.

Another possibility was stack movement. Because Go supports high numbers of goroutines, they start out with relatively small stacks, which grow as necessary. This is a tricky process because you have local variables on the stack, and pointers between them, all of which have to be adjusted when the stack is moved. I couldn't figure out how to debug this. There was no way to turn it off, and no way to trace when it happened. I tried to detect it by storing the address of a stack value in an integer where it wouldn't get adjusted. But this didn't seem to work and I never did figure out why. (Part of the problem is that unlike C(++), in Go you don't control whether things are on the stack or the heap. That's all handled automatically by the compiler. It's quite possible that my value was ending up on the heap and therefore useless for detecting stack movement.)
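
For what it's worth, the probe I tried looked roughly like this (a sketch; as noted above, if the local ends up on the heap it tells you nothing):

package main

import "unsafe"

//go:noinline
func grow(n int) int {
	// deep recursion with big frames to force the stack to grow (and move)
	var pad [256]byte
	if n == 0 {
		return int(pad[0])
	}
	return grow(n-1) + int(pad[0])
}

func main() {
	var local int
	// record the address as a plain integer, which the runtime will not adjust
	before := uintptr(unsafe.Pointer(&local))
	grow(1000)
	after := uintptr(unsafe.Pointer(&local))
	if before != after {
		println("stack moved")
	} else {
		println("no movement detected (or local was not on the stack)")
	}
}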

Needless to say, this was another low point. For starters, all that work for nothing - it was useless if it crashed randomly. And secondly, what now? If I couldn't reliably interface with the Windows GUI from Go, then the only other option I could see was to write the GUI interface in a separate C(++) program and use some kind of interprocess communication. Presumably that would work, but it would be ugly.

It might seem odd that something so objective and abstract would involve so much emotion, like elation and depression. But software development (at least so far) is a human activity, and humans are nothing if not subjective and emotional.

One thing that puzzled me was that I knew there were several other Go projects that interfaced with the Windows GUI. (Like https://github.com/lxn/win and walk) Why didn't they have this problem? I started searching and soon found this issue:

don't use Go heap for MSG
It's not entirely clear to me what's going on here, and I'm only
half-certain that this fixes the issue, due to it being somewhat hard to
reproduce ... While this might point to deeper Go runtime issues, we
just work around this here by allocating the variable using GlobalAlloc.
At the very least, I haven't been able to reproduce the bug after
applying this patch.

Following the links and looking at the related issues it sounded very much like what I was seeing. I applied their fix/workaround and sure enough the crashes went away, accompanied by feelings of relief and joy.
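
The workaround boils down to keeping the MSG struct off the Go stack and heap entirely. A hedged sketch of the pattern (not lxn's actual code):

package main

import (
	"syscall"
	"unsafe"
)

// MSG mirrors the Win32 MSG struct (64-bit layout).
type MSG struct {
	Hwnd    uintptr
	Message uint32
	WParam  uintptr
	LParam  uintptr
	Time    uint32
	Pt      struct{ X, Y int32 }
}

var (
	kernel32         = syscall.NewLazyDLL("kernel32.dll")
	user32           = syscall.NewLazyDLL("user32.dll")
	globalAlloc      = kernel32.NewProc("GlobalAlloc")
	getMessage       = user32.NewProc("GetMessageW")
	translateMessage = user32.NewProc("TranslateMessage")
	dispatchMessage  = user32.NewProc("DispatchMessageW")
)

const gmemFixed = 0x0000

func messageLoop() {
	// allocate the MSG with GlobalAlloc so the Go runtime never sees or moves it
	mem, _, _ := globalAlloc.Call(gmemFixed, unsafe.Sizeof(MSG{}))
	msg := (*MSG)(unsafe.Pointer(mem)) // memory outside the Go heap
	for {
		ret, _, _ := getMessage.Call(uintptr(unsafe.Pointer(msg)), 0, 0, 0)
		if int32(ret) <= 0 {
			return
		}
		translateMessage.Call(uintptr(unsafe.Pointer(msg)))
		dispatchMessage.Call(uintptr(unsafe.Pointer(msg)))
	}
}

func main() {
	messageLoop()
}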

Part of the problem/complexity is that the way the Windows GUI works, there are nested callbacks and DLL/syscalls. i.e. Windows calls application code which calls Windows code which calls application code, and so on. On top of that, Go syscalls and callbacks do a complicated dance to switch between Go stacks and syscall stacks, and attempt to protect non-Go code from garbage collection and stack movement. It's not at all surprising that there would be ugly corner cases.

Well, my happiness lasted all of a day before I had another crash. Hmmm... was that the old problem still there, or something else? I crossed my fingers that it was something unrelated. There's a fine line between wishful thinking and having a positive attitude. Sadly, the crashes kept happening and it became obvious the problem wasn't solved.

But the lxn/walk issue had seemed so similar. And the fix had certainly reduced the number of crashes. If the issue was passing pointers to stack values, then I was doing this in a lot more places than just the message loop MSG. Whereas the lxn win/walk code always took the pointers as arguments rather than declaring them locally. (Of course, I didn't figure this out quite as step by step as it sounds. There was a lot of frowning and frustration first.)

I tried changing a few of the common crash sites to avoid anything Go might put on the stack. That reduced crashes even further. Of course, once the crashes get rare, it becomes very difficult to know whether you've fixed anything or not. It always seems to work at first. Just when you're ready to declare success, then it crashes again.

It seemed like I was on the right track, so I took the plunge and changed all the functions to allocate temporary values on my own "heap/stack". This was a bit of a gamble since it was a lot of changes (several days work) with no guarantee it would help.
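
The "heap/stack" itself can be quite simple. Roughly the idea (illustrative names, not the actual gSuneido code): grab a block of memory from Windows, outside Go's heap and stacks, and hand out syscall temporaries with stack discipline:

package heapstack

import (
	"syscall"
	"unsafe"
)

var globalAlloc = syscall.NewLazyDLL("kernel32.dll").NewProc("GlobalAlloc")

const size = 64 * 1024 // plenty for the temporaries of any one call

var (
	base uintptr // memory from GlobalAlloc, invisible to the Go runtime
	used uintptr
)

func init() {
	base, _, _ = globalAlloc.Call(0 /* GMEM_FIXED */, size)
}

// Alloc returns n bytes of scratch space for a syscall's temporary values.
func Alloc(n uintptr) unsafe.Pointer {
	n = (n + 7) &^ 7 // keep 8 byte alignment
	p := base + used
	used += n
	return unsafe.Pointer(p)
}

// Mark and Free give stack discipline: free everything allocated since the mark.
func Mark() uintptr {
	return used
}

func Free(mark uintptr) {
	used = mark
}

Each wrapper takes a Mark, allocates whatever structs and buffers it needs to pass to Windows, and Frees back to the mark before returning, so Windows never sees a pointer into a Go stack or the Go heap.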

But the gamble seemed to pay off. No more crashes! I was a happy camper.

I soon had more or less the whole IDE working. There were numerous little glitches but they were mostly easy to fix. Until one glitch that was more problematic. I happened to try dragging some text within an editor window, and it didn't work. At first I thought the drop was failing since the text didn't appear at the destination. But I soon realized it was getting inserted at the very beginning or end of the text, instead of where I was trying to drop it. I couldn't remember how drag and drop worked so I had to do a little digging. At this point I was still assuming it was just some minor error in one of the functions. But it wasn't anything obvious. I narrowed it down to a couple of notification callbacks (EN_CHANGE or SCEN_MODIFIED). If those callbacks did anything significant then the drop would go to the wrong place. It didn't seem to matter what they did. I never did narrow it down to whether it was call stack or heap usage.

The problem was, this was a bit of a dead end. Inserting seemingly harmless, irrelevant code in between Windows and Scintilla (the editor control) caused drag and drop to fail. I looked at the Scintilla code but it didn't seem to be doing anything unusual or suspicious.

I spent another couple of days on this issue until I admitted defeat. Again, there seemed to be something going wrong in the depths of Go's complex syscall / concurrent garbage collection / stack movement. In theory I should submit a Go bug report. But, understandably, they'd want a small example that consistently reproduced the problem. And I don't have anything close to that. I have a large system that crashes rarely.

Another low point. Now what? Such a small symptom, but there's no such thing as minor corruption; it's a bit like being a little bit pregnant.

Back to my previous idea of a separate GUI process written in C. Probably the fastest interprocess communication would be shared memory. But you'd still need some kind of synchronization. It still seemed ugly.

Did I really need a separate process? I want some isolation, but maybe I can do that within a single process. Go has a facility for incorporating C code in your Go program (Cgo). What if the C GUI code ran in a separate OS thread, but still within the same process? That would separate the Windows GUI code from all the Go issues of concurrent garbage collection and stack movement. But you wouldn't have the overhead and complexity of a separate process and interprocess communications. And you could, of course, share memory. (carefully!)

But you still need synchronization. When the Go side made a call to the C side, it would need to wait for the result. Similarly when the C side made a callback to the Go side, it would need to wait for the result. One option was to use a Windows synchronization facility. That would mean a syscall on the Go side. The other option was to use a Go synchronization facility, which would mean a callback from the C side. I decided it made more sense to use Go synchronization because I suspected that would play better with the Go scheduler and concurrency.

One way to handle the synchronization was to use Go channels. If it was pure Go code I might have gone that way. But with C involved it looked like sync.Cond would be simpler. When I searched for how to use sync.Cond, what I found was an issue where someone suggested the documentation needed to be improved. I wouldn't have thought that would be at all contentious. But it seems some of the Go team don't like sync.Cond and don't think people should use it, and therefore don't want the documentation improved. Personally, I question the logic of that. It seems like poor documentation just increases the chances of people misusing it. I appreciate the Go team and I'm grateful for all they've done, but sometimes there seems to be a bit of a "we know better" attitude. (Of course, most of the time, I have no doubt they do indeed know better.) Another example is the refusal to supply a goroutine ID or any kind of goroutine local storage, again because "it might be abused". And yet internally, they use goroutine IDs. I'm just an ignorant bystander, but it seems like a bit of a double standard to me.

Despite the poor documentation (and with the help of Stack Overflow) I figured out how to use sync.Cond. Whether I used it appropriately is a question someone else can debate. No doubt some people would say I should be using channels instead.
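
For what it's worth, the pattern I ended up with is roughly this (a sketch with illustrative names): one side waits on the Cond until the other side has filled in the result and signalled.

package main

import "sync"

// handshake is a one-slot mailbox: wait blocks until deliver supplies a result.
type handshake struct {
	mu     sync.Mutex
	cond   *sync.Cond
	ready  bool
	result uintptr
}

func newHandshake() *handshake {
	h := &handshake{}
	h.cond = sync.NewCond(&h.mu)
	return h
}

func (h *handshake) wait() uintptr {
	h.mu.Lock()
	for !h.ready { // always loop; Wait can wake spuriously
		h.cond.Wait()
	}
	h.ready = false
	r := h.result
	h.mu.Unlock()
	return r
}

func (h *handshake) deliver(r uintptr) {
	h.mu.Lock()
	h.result = r
	h.ready = true
	h.cond.Signal()
	h.mu.Unlock()
}

func main() {
	h := newHandshake()
	go h.deliver(42) // in gSuneido the other side of this is across the C boundary
	println(h.wait())
}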

Using Cgo turned out to be fairly straightforward. It requires GCC so I installed Mingw64. I decided to stick to C rather than use C++. For the small amount of code I had to write, it seemed simpler and therefore safer. It took a few tries to get the synchronization working properly. The code wasn't complex, but as with all concurrency, the devil is in the details.

Thankfully, it didn't require a lot of changes to my existing Go code. I basically just swapped out the syscall and callback facilities.

Of course, it was another gamble to implement a whole approach on the basis of not much more than a gut feeling it should work. And gut feelings are notoriously unreliable in software development. But it only took a day or two to implement, and I was back to running the Suneido IDE. And the moment of truth - would drag and drop work now? It did! By this point my reaction was more sigh of relief than elation.

Are there other bugs lurking? Almost certainly. But I'm cautiously optimistic that I've solved the crashing and corruption. The only thing talking to Windows is now a single C thread (with a regular dedicated fixed stack and no garbage collection or even heap allocation) which is a very tried and true setup. You can't get much more basic than that. The only interaction is C calling a Go function to signal and wait. No stack or heap pointers are shared between the two sides. Go's concurrent garbage collection and stack movement can work totally independently of the C thread. And as a bonus, it's more efficient to run the message loop in C with no Go syscall overhead.

I was all ready to post this, but I was a little nervous because I really hadn't tested much. I didn't want to declare victory and then find another problem. Of course, there were always going to be "other problems".

So I did some more testing and it seemed a little sluggish. At first I didn't pay much attention - there's so much background stuff going on in our computers these days that can affect speed. But it didn't go away. I did a quick benchmark of a simple Windows call and got 70 us per call. That seemed high but I wasn't sure. I checked cSuneido and it was 2 us. Ouch. Maybe it was just Go syscall overhead? (i.e. can I blame someone else?) No, directly from Go it was 0.3 us. My guess is that there is some kind of bad interaction with sync.Cond, but I don't really know.

Another downer, just when I thought I'd won. C'est la vie. The obvious alternative (that I had considered earlier) was to use Windows synchronization. Luckily Windows condition variables and critical sections were similar enough to Go sync.Cond that it didn't take long to switch.

I didn't actually hold my breath when I ran the benchmark, but I mentally had my fingers crossed. And it worked - now it was 1 us per call, better than cSuneido. And the IDE no longer felt sluggish.

I've been running this version for a couple of days, and it seems that this chapter of the saga is over, so I will post this and move on to other bugs.

Wednesday, October 09, 2019

Thread-Safe Tree Comparison

When I implemented jSuneido (Java) I ignored a lot of potential concurrency issues. (That's not as bad as it sounds - there is almost no concurrency in our application code.) But Go (gSuneido) is a lot stricter. I was surprised to get fatal errors for potential data races even without the race detector.

So I was more or less forced to handle some of these issues. As usual, concurrency is trickier than you think, even if you think it's tricky.

One issue I ran into was comparing containers. The obvious approach is to compare recursively (which is what cSuneido and jSuneido did). My first attempt at making it thread-safe was to lock each container as you recursed into it. But then I realized that this would acquire multiple locks, which has the potential to deadlock. (I found sasha-s/go-deadlock useful.) The normal solution is to make sure you lock/unlock in a consistent order. But I did not have control over the order of recursion.

I searched the web for algorithms but I didn't find much. (Which was part of what prompted me to write this up.)

One option is to just use thread-safe low level operations e.g. to get a child. But locking at a low level means a lot of lock overhead. (Uncontended locks are fast these days, but they can still have performance implications.)

The only other decent solution I could come up with was to lock the object, copy the list of children, and then unlock it.

But I didn't want to add a bunch of heap allocation to copy the children. So I switched to an explicit stack and iteration instead of recursion. This may still allocate when it grows the stack, but after that it can reuse the space (which should be in cache). And it's much less than if you allocated a list for each child.

I implemented "equals" first. When I went to implement "compare" (less or greater than) it was a little trickier because I needed lexicographic order, whereas my stack was LIFO. That turned out to be relatively easy to handle by pushing the children onto the stack in reverse order.

(This code is further complicated because you have to detect recursive data. So you have to track the comparisons in progress.)
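
A rough sketch of the "equals" side of this, with illustrative types (the reverse-order push for "compare" and the recursion-detection bookkeeping are left out):

package main

import "sync"

type Value interface{}

type Container struct {
	mu       sync.Mutex
	children []Value
}

// snapshot copies the child list under the lock, so no lock is held while
// the children themselves are examined (avoiding nested locks and deadlock).
func (c *Container) snapshot() []Value {
	c.mu.Lock()
	defer c.mu.Unlock()
	return append([]Value(nil), c.children...)
}

type pair struct{ x, y Value }

// deepEqual compares iteratively with an explicit stack instead of recursion.
func deepEqual(x, y Value) bool {
	stack := []pair{{x, y}}
	for len(stack) > 0 {
		p := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		cx, xok := p.x.(*Container)
		cy, yok := p.y.(*Container)
		switch {
		case xok != yok:
			return false
		case !xok: // both are leaves (assumed comparable, e.g. numbers or strings)
			if p.x != p.y {
				return false
			}
		default: // both are containers: snapshot and push the child pairs
			xs, ys := cx.snapshot(), cy.snapshot()
			if len(xs) != len(ys) {
				return false
			}
			for i := range xs {
				stack = append(stack, pair{xs[i], ys[i]})
			}
		}
	}
	return true
}

func main() {
	a := &Container{children: []Value{1, "x", &Container{children: []Value{2}}}}
	b := &Container{children: []Value{1, "x", &Container{children: []Value{2}}}}
	println(deepEqual(a, b)) // true
}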

This approach is thread "safe" in terms of not crashing or getting random data. It's not necessarily "correct". But if you are making changes to containers at the same time as comparing them, I'm not sure what "correct" means. If you had immutable persistent data structures then you could grab a "snapshot" of the two containers. But even then, how do you get the two snapshots "at the same time"? Yet another lock?

Here's a link to the main part of the code as of this blog post. If you actually wanted to use this code it would probably be a good idea to look at the latest version in case I fix any bugs or come up with a better approach.

Tuesday, September 24, 2019

gSuneido Milestone


It's nowhere near finished but as of today gSuneido actually runs the Suneido WorkSpace.

There's a big pyramid of code (and hours) under this simple window. And it's a pyramid where one wrong brick at the bottom makes the whole thing come crashing down.

When you build bottom up, you spend a long time (years!) painstakingly laying foundations with not much to show for it.

Not surprisingly, it feels good to see some real results.

Thursday, August 08, 2019

gSuneido Progress

gSuneido (the Go language implementation of Suneido) finally passes all the standard library (stdlib) tests. I'm sure there are still bugs to find and issues to deal with, but it's a significant milestone.

Even better, it runs the tests faster than cSuneido (the original C++ implementation). jSuneido is slightly faster once it warms up, but that's not surprising since it compiles to Java byte code which is then JIT compiled by the JVM, whereas gSuneido always interprets byte code.

Unfortunately, this milestone isn't very useful. gSuneido can't access Windows APIs yet so it can't run the front end user interface. And I haven't implemented the database, so it can't be used as the back end server either. In the long run, we're working on a browser based user interface but that's a ways off so I'll probably work on a Windows interface next because that will let us replace the aging cSuneido for the front end client. Also, I intend to overhaul the database rather than just port it, so that'll be a bigger job.

I'm still happy with the Go language. Occasionally I miss generics, but most of the time it's not a big deal. It's definitely my favorite language these days. When I started in 2014, Go was not as popular or as polished as it is now. For example, the garbage collector was fairly primitive and not really ready for what I needed. Luckily I picked a good horse in the language race and Go is certainly production ready now.

The first commit to the GitHub repo was in April of 2014. I've made 603 commits since then, most of them in the last year or so. It's currently about 22,000 lines of code. In comparison, cSuneido is 45,000 and jSuneido is 58,000. (via cloc) In modern terms those are tiny amounts of code. But it's plenty for me to develop and maintain.


Thursday, May 02, 2019

Merge Tree versus Block List

Looking for something else in the code, I happened to notice that temporary indexes in jSuneido were using a merge tree. Was that really the best data structure to use? Presumably it was better than what I used prior to that, but it didn't seem optimal.

A merge tree (at least my flavor) uses levels that double in size. Some levels may be empty. Each level is kept independently sorted. Items are inserted at the root and when the first levels fill up, they are merged into the next bigger level.
[diagram: Merge Tree]
Merge trees have a lot of good features:
  • Fast (amortized) insertion
  • Cache oblivious (i.e. don't need to be tuned for specific hardware)
  • Cache friendly - mostly sequential access
  • Optimal memory space (e.g. as opposed to array size doubling) 
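
A rough sketch of the insertion scheme (illustrative, not the actual jSuneido/gSuneido code): level i is either empty or a sorted run of 2^i items, and inserting works like binary addition, carrying merged runs upward:

package main

import "fmt"

// MergeTree: levels[i] is nil or a sorted slice of length 1<<i.
type MergeTree struct {
	levels [][]int
}

// merge combines two sorted slices into one sorted slice.
func merge(a, b []int) []int {
	out := make([]int, 0, len(a)+len(b))
	i, j := 0, 0
	for i < len(a) && j < len(b) {
		if a[i] <= b[j] {
			out = append(out, a[i])
			i++
		} else {
			out = append(out, b[j])
			j++
		}
	}
	out = append(out, a[i:]...)
	return append(out, b[j:]...)
}

// Insert carries a run of items up the levels until it finds an empty one.
func (t *MergeTree) Insert(x int) {
	cur := []int{x}
	for i := 0; ; i++ {
		if i == len(t.levels) {
			t.levels = append(t.levels, nil)
		}
		if t.levels[i] == nil {
			t.levels[i] = cur
			return
		}
		cur = merge(t.levels[i], cur)
		t.levels[i] = nil
	}
}

func main() {
	var t MergeTree
	for _, x := range []int{5, 3, 9, 1, 7} {
		t.Insert(x)
	}
	fmt.Println(t.levels) // [[7] [] [1 3 5 9]] - each non-empty level is sorted
}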

But they keep the data ordered all the time. Which is necessary if you are interleaving reads and writes, but for my usage, I insert all the data before reading any of it. And in order to keep the data sorted, they do quite a bit of extra work. (e.g. a million items = 20 levels, which means on average each item is copied/merged about 19 times) And merge trees are write optimized; they're not as good at lookups or iteration.

Maybe a simple list with an explicit sort step would be better in this situation? But the usual array doubling list does a lot of allocation and copying and wastes 25% of the space on average. And allocating large contiguous chunks of memory can be hard on memory management. It seemed like a better alternative would be a list of fixed size blocks. (In the diagram the blocks are only two elements, in practice 4096 elements per block was a good size.) Iteration is simple, and lookups can do a simple binary search.

[diagram: Block List (after sort)]
This has most of the features of a merge tree - fast insertion, cache oblivious/friendly, and low space overhead. In addition, it does less copying, never allocates huge blocks, and has faster iteration and lookups. And to top it off, the code is simpler :-)

As usual, I debated whether to go off on this tangent. But I was in between tasks, and I figured it wouldn't take long to implement enough to benchmark and decide whether to proceed.

To sort my block list I decided to use a regular sort on each block, and then merge the blocks. By using a series of two way merges and recycling blocks I could limit the extra space needed for merging to two blocks.
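
A rough sketch of the structure and the per-block sort, in Go for consistency with the rest of these posts (the actual change was in jSuneido/Java, and the pairwise merge of the sorted blocks is left out here):

package main

import "sort"

const blockSize = 4096 // items per block; this turned out to be a good size

// BlockList appends into fixed-size blocks, then sorts once at the end.
type BlockList struct {
	blocks [][]int
}

func (b *BlockList) Add(x int) {
	n := len(b.blocks)
	if n == 0 || len(b.blocks[n-1]) == blockSize {
		b.blocks = append(b.blocks, make([]int, 0, blockSize))
		n++
	}
	b.blocks[n-1] = append(b.blocks[n-1], x)
}

// Sort sorts each block individually; the blocks are then merged pairwise
// (recycling blocks to limit the extra space) to produce one sorted sequence.
func (b *BlockList) Sort() {
	for _, blk := range b.blocks {
		sort.Ints(blk)
	}
}

func main() {
	var b BlockList
	for _, x := range []int{5, 3, 9, 1} {
		b.Add(x)
	}
	b.Sort()
}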

Although the percentage space overhead is high for small lists, it's small in absolute terms. And for small lists we have a single block and a simple sort with no merging required (and therefore no extra temporary space for merging).

I implemented this in jSuneido since that's what we're using in production. But Java doesn't have an integer array sort with a custom comparator. (There are ways to do it, but not efficiently.) I ended up borrowing the sort from fastutil (which appears to have similar origins to the Java sort). I've been mostly programming in Go lately so it was a bit of a shift to be back in Java.

My initial intuition proved correct and overall it was about twice as fast. Of course, this is just one small part of a large system so I'm not sure how much speedup we'll see in actual use. Still, I figure it was probably worth the day and a half it took me. 

Tuesday, April 16, 2019

Regular Expression Implementation

This detour started with an out-of-memory error, which I tracked down to a runaway regular expression, which was a result of old C++ code expecting a nul terminated string, even though it was taking a pointer and a length.

It was easy enough to fix, but it got me looking at my regular expression implementation. Suneido has its own regular expression implementation both for historical reasons, and for portability. When I wrote jSuneido I tried to use Java regular expressions, but it proved too hard to make it compatible with cSuneido. And by now we have a lot of application code that depends on the particular flavor of regular expressions that I originally implemented.

Although it's a fairly clean object-oriented design, there was one part where it was switching on types rather than calling a virtual method. That was because it needed to update some state. No big deal, I’d just put the state on an object and pass it around.

I decided to work on this on the Go version, and Go made it very easy to write a quick benchmark to make sure my changes didn’t slow it down too much.

What I found was that it was already quite slow, and even slower with the changes. Suneido applications don’t tend to use a lot of regular expressions, so the speed wasn’t a huge issue, but it bothered me nonetheless.

The implementation compiles regular expressions to an intermediate form which is then interpreted. The speed of an interpreter depends primarily on how "large" the instructions are. In this case the instructions are very small, e.g. matching a single character. And my fancy object-oriented design added even more overhead in terms of indirection and virtual (interface in Go) calls.

What if I took a different approach and used the classic big-switch-in-a-loop style of interpreter with all the code inlined? 
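
Something like this toy sketch (not Suneido's actual instruction set): the compiled pattern is a flat byte program, and the whole match loop is one function with a switch.

package main

const (
	opChar  = iota // match the literal byte that follows
	opAny          // match any single byte
	opMatch        // success
)

// match reports whether prog matches s starting at position si,
// with the dispatch loop and all the handling inlined in one function.
func match(prog []byte, s string, si int) bool {
	pi := 0
	for {
		switch prog[pi] {
		case opChar:
			if si >= len(s) || s[si] != prog[pi+1] {
				return false
			}
			si++
			pi += 2
		case opAny:
			if si >= len(s) {
				return false
			}
			si++
			pi++
		case opMatch:
			return true
		}
	}
}

func main() {
	prog := []byte{opChar, 'a', opAny, opChar, 'c', opMatch} // like "a.c"
	println(match(prog, "abc", 0)) // true
	println(match(prog, "abd", 0)) // false
}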

I made a quick prototype and benchmarked it as 300 times faster. I was skeptical of that result, but it seemed worth investigating. As the code grew from a trivial prototype, the results came down to earth, ending up about 3 or 4 times faster. That was more believable, and still a good improvement. I suspect my first prototype was a little too simple and the compiler optimization eliminated most of it. That’s a common problem with micro-benchmarks.

Without all the object-oriented wrapping overhead, the inline interpreter wasn’t even that big, around a hundred lines of code.

Of course, while I was in there I couldn’t resist a few other optimizations. Suneido’s regular expression implementation uses fairly naive backtracking, which has some ugly worst cases. It’s a bit ad hoc, but I added specific handling for some of those bad cases. That’s where I saw up to 1000 times speed improvement, although only for those specific cases.

So I was happy with the speed improvement from this rewrite, but was the code harder to follow? I’m not sure. The interpreter is more monolithic, but it’s still reasonably small. And a whole bunch of interconnected small objects is not always easy to follow either. For me, it was a reasonable trade off.

Code is on GitHub, as usual.

Wednesday, March 06, 2019

Go small structs and interfaces

The normal rule of thumb is that if you have small structs, you should pass them by value, rather than by pointer.

So in gSuneido (the Go implementation of Suneido I'm working on) that's what I'm doing.

But I was thinking about the way interfaces work in Go. An interface can only hold a pointer. So if you do:

var i interface{} = x

and x is not a pointer, then x will be copied to the heap so that i can point to it.

Some simple cases avoid the allocation. For example, true, false, and empty string ("") can be assigned to an interface without causing allocation. Presumably because the runtime has a shared value it can point to.

Suneido is dynamically typed, so values are stored and passed around as Value interfaces. (The equivalent of an abstract base class in C++ or Java.)

Given this, does it still make sense to "pass" small structs by value? Or is that just going to cause heap allocation?

I made a quick benchmark.
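
The benchmark itself isn't shown here, but it was along these lines (a reconstruction, assuming a 16-byte struct to match the numbers below; run with go test -bench . -benchmem):

package small_test

import "testing"

type small struct{ a, b int64 } // 16 bytes

var sink interface{}

func BenchmarkSmallByValue(b *testing.B) {
	s := small{1, 2}
	for i := 0; i < b.N; i++ {
		sink = s // converting the value to an interface copies it to the heap
	}
}

func BenchmarkSmallByPointer(b *testing.B) {
	s := &small{1, 2}
	for i := 0; i < b.N; i++ {
		sink = s // the interface just holds the existing pointer
	}
}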


And the results were:
BenchmarkSmallByValue     20.7 ns/op    16 B/op    1 allocs/op
BenchmarkSmallByPointer    2.09 ns/op    0 B/op    0 allocs/op

As expected, when we pass by value, we get a heap allocation every time, making it about 10 times slower. That's a bit misleading because the benchmark doesn't do any real work. e.g. if it was doing 500ns of work, then the comparison would be 520ns versus 502ns which is much less significant.

However, the allocation is more of a concern. Heap allocation has bigger, more widespread effects like garbage collection and cache pressure and memory bandwidth. These aren't always evident from a benchmark.

With these kinds of micro-benchmarks I'm always afraid the compiler has optimized away key parts of the code. But I used Compiler Explorer to look at the code generated, and it all seems to be there.

Somewhat surprisingly, I got the same results for a simple string rather than a struct - it's faster to pass a pointer to a string. Coming from C++ or Java, you think of a string as already being a pointer. But in Go it's more like a slice, i.e. it's actually a small struct containing a pointer and a length. Because it's a small struct, you normally pass it by value. But when you use interfaces, that means heap allocation of the string header struct (and the content bytes may also end up on the heap if they weren't already).

Unless I'm missing something, it seems like I should change gSuneido to use pointers for all values, even strings.

Thursday, February 21, 2019

Partial Incremental Encoding

I have been thinking about how to improve Suneido's Btree implementation.

Btrees were initially designed for disk storage. But Suneido memory maps the indexes and generally Suneido database servers have enough memory that the indexes are resident. Would it make sense to use a different Btree design for in memory?

One possibility is to not store the keys in the Btree, since there's no extra disk access to access the data if it's in memory. That's more like how you'd design a normal in-memory data structure. And it makes the Btree nodes much smaller.

But memory is not flat these days; we have multiple levels of caches. And in relative terms, cache misses are almost as bad as disk accesses. So what we really want is to minimize cache misses.

So we have a design continuum. No keys makes the index small but will mean a lot of indirection and therefore cache misses. Entire keys means no indirection, but indexes are big.

My first thought was to compromise and store a small fixed portion of each key in the index. This keeps the index small, makes nodes fixed size, and reduces indirection.

key      stored
bill     bi
billy    bi
erika    er
erin     er
erma     er

But ordered keys usually have common prefixes, so storing the first few bytes does not discriminate very well.

To handle that we can use prefix compression (aka incremental encoding or front compression)

key      entry
bill     0, bill
billy    4, y
erika    0, erika
erin     3, n
erma     2, ma

Prefix compression means we have to linear scan each node (i.e. you can't do a binary search). But Btree nodes are reasonably small and linear is actually the optimal access pattern for memory caches.

We can combine these techniques by only including the minimum information to differentiate each key from its predecessor.

key      entry
bill     0, b
billy    4, y
erika    0, e
erin     3, n
erma     2, m

Interestingly, this looks a lot like a radix tree, branching on a single character.

This nicely handles both dense (e.g. sequential values) and sparse (widely separated) distributions of keys. However, it does have a worst case - when we go from a sparse key (small common prefix) to a dense key (large common prefix) (e.g. bill to billy or erika to erin) we need more information than we have in the index. One option is indirection to look at the actual data. That's ok if it's rare, but not so good if it's common.

If we give up the fixed length idea (which is not as important for linear scan anyway) then we can store the extra information necessary for this case (the missing prefix information)

key      entry
bill     0, b
billy    4, illy
erika    0, e
erin     3, rin
erma     2, m

With this, only one indirection is required at the end of each search to compare the key to the actual data.

In my test data I averaged less than 1.1 bytes per entry. (Although obviously it can be more, as in this example.)

To search, we iterate sequentially, maintaining the known prefix after each entry, and comparing that to what you're looking for. Because we only have partial information, we may search one entry too far. e.g. if we're looking for "earl" we should stop at "erika", but since all we have for "erika" is "e", we'll look one past.

key      entry      known
bill     0, b       b
billy    4, illy    billy
erika    0, e       e
erin     3, rin     erin
erma     2, m       erm
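
A rough sketch of that search over one node (illustrative types; the real entries are packed much more compactly):

package main

type entry struct {
	prefixLen int    // bytes this key shares with its predecessor
	bytes     string // the stored bytes, starting where the predecessor's known prefix ends
}

// search returns the index of the first entry whose reconstructed known
// prefix is >= target. Because the information is partial it may stop one
// entry late, so the caller still compares against the actual record data.
func search(entries []entry, target string) int {
	known := ""
	for i, e := range entries {
		n := e.prefixLen
		if n > len(known) {
			n = len(known) // we only know part of the shared prefix
		}
		known = known[:n] + e.bytes
		if known >= target {
			return i
		}
	}
	return len(entries)
}

func main() {
	node := []entry{{0, "b"}, {4, "illy"}, {0, "e"}, {3, "rin"}, {2, "m"}}
	println(search(node, "earl")) // 3: one entry past "erika", whose known prefix is just "e"
}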

Inserting and deleting affect the encoding of the following values so you need to re-encode entries until you're back in sync. (e.g. hit a 0 entry) This requires looking at the actual data, but it's only for updates, and typically not that many entries.

I've prototyped the design and it seems to perform reasonably but I haven't implemented enough to actually benchmark it against Suneido's current implementation (which stores entire uncompressed keys).

Part of what triggered this was "Design continuums and the path toward self-designing key-value stores that know and learn".

Wednesday, January 30, 2019

Scala Native & More

I recently discovered Scala Native, a version of Scala that compiles to a native executable rather than running on the JVM like the original Scala. (via Modern Systems Programming with Scala Native)

Scala is an attractive language. Scala is to Java as C++ is to C. Meaning it's much more complex. (Scala promotes its simple syntax, but that doesn't mean it's a simple language.)

In some ways I like Kotlin better than Scala. It seems to give a lot of the advantages of Scala, without Scala's complexity. And there are non-JVM versions of Kotlin. But the problem is that Kotlin relies on the Java libraries, which is a portability problem for the non-JVM versions. Whereas, one of the strengths of Scala is its libraries (e.g. collections).

Libraries bring up an interesting issue. To write good libraries seems to require more features and complexity than application code. A lot of the complexity of C++ is to enable writing good libraries, and I suspect the same is true of Scala. The other alternative is the Go approach. Go has a built in "map" collection. You can't write map in the Go language. If you want a slightly different style of map, what you can write in Go is nothing like the built in one. This is at the heart of the Go generics struggle. On one hand, you want to be able to write your own map in Go. On the other hand, you want to keep Go's simplicity and not end up with another C++.

Another factor is that people seem to like complexity, regardless of whether it's counter-productive, myself included, sadly. Languages like C++ are fascinating. There's always something more you can learn. Whereas simpler languages like Go are vaguely disappointing in that respect. I love how the C++ folks gleefully pile complexity on top of glittering complexity. But when it comes time to actually get some work done ... then maybe you want something simpler. (Especially when the wondrous pyramid is built on unsafe ground.)

Complexity also tends to affect build times. C++ and Scala are both known for slow compiles, whereas Go is extremely fast.

Unfortunately, from what I can see of Scala Native it's not very mature. It's currently using a conservative Boehm garbage collector. That's what I use in cSuneido and from experience I know the weaknesses of conservative garbage collection. It's definitely not as strong as the JVM garbage collector. But they are working on a new garbage collector. Go (and Mono) also started out weak in this area but improved greatly. I can't tell whether Scala Native has the same libraries (e.g. collections) as regular Scala. The libraries are (mostly?) written in Scala so I would hope they would be portable.

Scala Native also appears to be weak in the concurrency area, which seems like a fatal flaw in today's multi-core world.

One feature mentioned on the web site is that Scala Native provides access to malloc and realloc. While this might appear attractive to C programmers, it seems a little odd in a garbage collected language. It also allows you to explicitly allocate data on the stack (like alloca). I'd much rather have escape analysis (like Go) that does that automatically. (I did see a mention that they are working on escape analysis.)

I'm not sure if it's indicative of the project as a whole, but the PragProg book seems focused on writing unsafe C code in Scala syntax. Personally, I have no desire to go back to the days of buffer overflows and mysterious crashes from dangling pointers and memory access violations. It might give higher performance, but I suspect you'd get similar performance from writing it in safe Go.

It's an interesting project that I plan to keep an eye on, but I don't think it's ready for my use yet.

Thursday, January 24, 2019

Dependencies

I've always been reluctant to take on third party dependencies in my projects. I do it, but not lightly, and preferably not often. I could justifiably be criticized for regularly reinventing the wheel. On the other hand, I've seen and experienced things like DLL version hell on Windows.

The web arena goes to the other extreme of embracing dependencies. You seemingly can't write a web page without multiple frameworks and libraries du jour.  I'm always horrified when I install a Node package and it installs hundreds of dependencies. It seems like asking for problems, although somehow it seems to work almost all of the time.

Dependencies also tend to add to bloat. Code intended for reuse has to satisfy many people's needs. Which means it has a lot of features that you don't need. Compilers and other tools can often strip out some of the unused code. But features that are interwoven into the code (e.g. extra fields in a data structure) are not easily stripped out. Often, the 80-20 rule applies and you can get what you need with a small fraction of the code of a general purpose library.

Complexity is the other side of bloat. Do you really understand that third party library? What are its performance characteristics? How does it handle errors? What algorithms is it using? What are the weaknesses? The documentation may cover some of that, but it's more likely to just tell you how to use it. None of that may be relevant at first, but eventually it probably will.

Longevity is another big factor. A dependency that isn't maintained over the long term sooner or later becomes a liability. Yet major long term projects don't seem to be worried about taking on dependencies that are maintained by one person in their spare time.

Errors are another issue. I can fix errors in my own code. Errors in third party code are not so simple. You can submit a bug report and wait, hoping that it might get fixed eventually. Or you can try to figure out the code, and if you can comprehend it, do your own patch or fork. But if it's a big complex code base, that's usually not feasible.

You can't escape dependencies - operating systems, languages, databases, etc. are all dependencies. But you can think about the costs and risks as well as the benefits.

Here's a good discussion of dependencies from Russ Cox, one of the core Go language team. (The post is not Go specific.) Fairly long, but worth reading.

https://research.swtch.com/deps