Tuesday, April 08, 2008

Too many heap sections

Recently, a couple of our clients with big databases (for us that's over 4 GB) have been unable to repair their databases after crashing. (Why they crashed in the first place is another question - there may be hardware problems.) The repair aborts with "Too many heap sections". Thankfully, the software does automatic backups twice a day, so the customers didn't lose much data. But a repair is still preferable, as it generally only loses incomplete transactions that were in progress at the time of the crash.

This error comes from the Boehm memory manager/garbage collector we use. The repair process does keep a lot of information in memory, and the amount of information is proportional to the size of the database. The question was how to fix it. Did I need to rewrite the repair process to keep less information in memory, maybe use a temporary file? (Although that would make it slower.) Or was there something that could be adjusted in the Boehm code? There's also a newer version of the Boehm code - we're on 6.5 and the latest is 7.0.

I searched on the web but didn't find any useful information about this error. Most mentions of it were pretty old. (I did find a reference to the Boehm code with the Mac OS X code - I wonder what part of OS X uses it?)

I searched the Boehm code for the error message and found it in several places (not very DRY). The error is caused when MAX_HEAP_SECTS is exceeded. I searched for where that is defined and the value seemed to depend on whether SMALL_CONFIG or LARGE_CONFIG (or neither) was defined. I wasn't specifically defining either, presumably leading to a medium setting.
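To make that concrete, here is a rough sketch of the kind of compile-time selection involved. Only the macro names SMALL_CONFIG and LARGE_CONFIG and the error message come from the Boehm code; the `sketch_` names and the numeric limits are placeholders I made up for illustration, and the real collector aborts rather than returning an error code.

```c
#include <assert.h>
#include <stdio.h>

/* Placeholder limits for illustration only. The real values are in the
   Boehm collector's private headers and vary between versions. */
#if defined(LARGE_CONFIG)
#  define SKETCH_MAX_HEAP_SECTS 4096
#elif defined(SMALL_CONFIG)
#  define SKETCH_MAX_HEAP_SECTS 64
#else
#  define SKETCH_MAX_HEAP_SECTS 512   /* the "medium" default */
#endif

/* Number of heap sections recorded so far (hypothetical counter). */
static int sketch_n_heap_sects = 0;

/* Record one more heap section.
   Returns 0 on success, or -1 if the fixed-size table is already full. */
int sketch_add_heap_section(void) {
    assert(sketch_n_heap_sects >= 0);
    if (sketch_n_heap_sects >= SKETCH_MAX_HEAP_SECTS) {
        fprintf(stderr, "Too many heap sections\n");
        return -1;
    }
    sketch_n_heap_sects++;
    return 0;
}
```

The point is that the limit is fixed at compile time, so a process that keeps growing its heap eventually runs out of table slots no matter how much memory is free.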

I figured it was worth a try re-compiling with LARGE_CONFIG. I modified the makefile and re-built. Then I read a magazine while I tried running the repair process. (It takes a while to process a 4 gb database, even with a fast computer with lots of memory.) It reminded me of the "old" days when I'd have time to catch up on my reading while I compiled what today would be regarded as tiny C programs.

Damn, it still crashed with the same error. Oh well, should have known it wouldn't be that easy.

I went back to remove the setting from the makefile and glanced at the file name. Hey! I was editing the makefile for the MinGW version, but I'd been testing the VC7 version. Doh!

Try again, this time building the right version and testing the version I built.
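In hindsight, one way to catch this kind of mix-up would be to have the build report which configuration it was compiled with. A minimal sketch, assuming the file is compiled with the same flags as the collector; `sketch_gc_config_name` is a hypothetical helper, not part of the Boehm code.

```c
/* Report which configuration macro was defined when this file was compiled.
   This only reflects the collector's build if both share the same flags. */
const char *sketch_gc_config_name(void) {
#if defined(LARGE_CONFIG)
    return "LARGE_CONFIG";
#elif defined(SMALL_CONFIG)
    return "SMALL_CONFIG";
#else
    return "default";
#endif
}
```

Printing this at the start of the repair would have shown straight away that the binary under test had been built without LARGE_CONFIG.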

Eureka! It completed successfully.

Now we'll just have to test this version enough to be relatively sure that the change doesn't have any unwanted side effects. Maybe it will be that easy after all!

Sidenote: Considering the number of people and projects using the Boehm code (e.g. Mono), it seems odd that there isn't more documentation. I can understand Boehm not writing it; I'd rather he spent his time on the code. But you'd think someone along the way would have written some. Maybe no one else understands it well enough. Despite having written my own, I know I don't - I just treat it as a black box.

3 comments:

johno said...

Would you mind sharing the how-to of recompiling GC to support > 4GB memory?

andrew said...

All I did was compile with LARGE_CONFIG defined.

I am building a Win32 executable so it can only use 2gb of memory.

The 4gb reference was to the database which is in a file.

I would assume to access more than 4gb of memory it would have to be a 64 bit program.

johno said...

Well, I have 64bit program. Problem is that mono crashes after allocating 4gigs of memory, leaving the rest of 12 gigs without a touch ;(