I've written/ported about 4,300 lines of Java code so far - roughly 100 lines per day since I started in mid-May. That's an average, of course - some days none, some days a lot more. (And some days it's probably been negative, when I threw out code!)
There's probably nothing wrong with 100 lines of code per day. If that was new code it'd be pretty good, but with porting I'd expect higher productivity. Sometimes the conversion from C++ to Java is very straightforward and quick, other times I have to do a fair bit of redesign.
Assuming the Java code will end up half the size of the C++ code, or about 30,000 lines, at the current rate I have another approximately 260 days to go, or about 8 months. (Although I still have no idea if "half the size" will be accurate. Some things are smaller or available pre-built in Java, but others, like the btree code, are pretty much the same size.)
It's a good example of how hard estimating is psychologically. Even though these numbers are quite clear, I still think "that can't be right". It seems like I'm making good progress. Yeah, there's lots left to do, but 8 months seems like such a long time. Then again, it's already been a month and a half and it feels like I just started yesterday.
This is when I start to rationalize/justify the lower (quicker) estimate that I want to hear. After all, at the start I was learning a new language and new IDE. And I had a lot of questions to figure out in terms of how I was going to handle the conversion. Once I get up to speed it should go faster, right? Well, maybe, but then again, things can slow down too, when you hit a snag or when you've got a million finishing touches to do.
Part of the problem is that it's much easier to maintain motivation and drive for a short period. Once a project extends as long as a year it's a lot tougher to keep up the momentum, and a lot easier to get distracted (like with replication!). However, the nagging issues with the current Suneido and the business drive towards larger clients will continue to provide motivation!
Saturday, June 28, 2008
Friday, June 27, 2008
Lightweight Database Replication for Suneido
Issues
Occasionally we get runaway memory usage in Suneido. This only seems to happen with lots of users, and seems to be related to running large reports. I'm assuming it's due to the inherent problems with conservative garbage collection. Unfortunately, the fix is to restart the Suneido server program, but this is ugly, especially with a lot of active users.
Suneido is single-threaded so we don't take advantage of multiple cores.
Suneido does not have an ODBC interface. One of the reasons for this is that ODBC is quite closely tied to SQL and Suneido isn't SQL. A common reason for wanting this is to use third party report generation or data analysis tools. Or sometimes to interface with other systems.
We generally set up our clients to do on-line backups twice a day. Although this does not stop them from working, it does tend to slow down the system, especially with bigger databases. But it would be nice to do backups more often, since, in the event of a hardware failure, no one wants to lose even a few hours work. (And if the hard drive fails totally, they usually have to fall back on the nightly backup we do to Amazon S3.)
It would be nice in some circumstances to run Suneido on Amazon EC2. But EC2 doesn't have persistent storage (although they're working on it). You can load your database from S3 when you start the server and save it back to S3 when you stop the server. But if you crash, you lose it.
Replication
At RailsConf several of the sessions talked about using MySQL master-slave replication, for example to run on EC2. This got me thinking about replication for Suneido. Replication would, at least partially, address all the issues above.
- it would maintain an up to date backup with (hopefully) minimal impact on performance
- we could replicate to an SQL database, allowing ODBC access etc.
- we could run reports off the replica/slave, making use of multi-core, and reducing the memory issue because you could more easily restart the slave
- we could run on EC2 with the replica/slave to protect the data in case of crashes
Initially, I thought we could get some of these benefits just running a secondary server using the backups we already make twice a day. A lot of big reports don't need up to the minute data. But the problem would be determining when to use the live database and when to use the slightly stale copy. You could let the user pick but that would be ugly. And even they might not know that someone had just updated old data.
You could actually do replication at the application level by creating triggers for every database table and sending the changes to the slave. But it would be error prone and hard to handle issues like transaction rollbacks. It would also be running in the main thread so it could have a performance impact.
My next thought was to take advantage of the way Suneido's database is implemented - when a transaction is committed successfully a "commit" record is added to the end of the database file. So you can simply monitor the data added to the end of the file and watch for new commits.
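Sketching the idea in Java (even though the current Suneido is C++), the replication side could just poll the end of the database file for newly appended data and hand it off to be scanned for commits. The file name, chunk size, and "sender" interface here are all made up for illustration:

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

// Minimal sketch of tailing the database file for newly appended data.
// The commit-record parsing is left to the sender - the layout here is
// not Suneido's actual on-disk format.
public class CommitTailer {
    private final FileChannel channel;
    private long position; // how far we've replicated so far

    public CommitTailer(String filename, long startPosition) throws IOException {
        this.channel = new RandomAccessFile(filename, "r").getChannel();
        this.position = startPosition;
    }

    /** Read anything appended since the last call and pass it to the sender. */
    public void poll(ReplicationSender sender) throws IOException {
        long size = channel.size();
        while (position < size) {
            int n = (int) Math.min(64 * 1024, size - position);
            ByteBuffer buf = ByteBuffer.allocate(n);
            channel.read(buf, position);
            buf.flip();
            sender.send(buf); // sender scans for commit records and forwards them
            position += n;
        }
    }

    public interface ReplicationSender {
        void send(ByteBuffer newData);
    }
}
```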
One concern, that I haven't completely addressed, is whether different "views" of the file will be "coherent". MSDN is a little unclear on this. MapViewOfFile says:
"Multiple views of a file ... are coherent ... if the file views are derived from the same file mapping object."
But CreateFileMapping says:
"file views derived from any file mapping object that is backed by the same file are coherent"
If I had a single file mapping object it might not be too hard to share it with the replication task. But because I map large files in chunks, I have a whole bunch of file mapping objects, which makes it harder to share them.
I thought I might be able to use named file mapping objects to make sharing easier, but I couldn't get it to work. I'm not sure what I was doing wrong; I didn't sink a lot of time into it.
Initially I thought I'd just create a separate thread within the server process to send the updates to the replica. Simpler than running a separate process and perhaps more likely to be coherent. As long as the replication thread did not access any of the server's variables, then there shouldn't be any concurrency issues. I went a little ways down this road until I realized that my memory management was single threaded. I could try to avoid any allocation in the replication thread but it's too hard to know what might allocate, making this approach error-prone and brittle.
So my current thinking is to make the replication a separate process. For simplicity, I will probably just make this another "mode" of running the same suneido.exe (rather than creating a separate executable). Suneido already works this way for standalone, client, or server - they're just different modes of running the same exe. (The exe is only just over 1 MB, so it's not like having multiple modes is bloating it!)
Of course, I still have to handle the coherency issue.
What about the slave/replica? I should be able to run a standard Suneido server. The replication process can connect to it as a normal client (over TCP/IP sockets as usual). The basic functioning shouldn't require any changes. I may want to add a slave mode to tweak it to give the replication connection priority and to not allow updates from other connections.
The other piece of the puzzle is allowing a Suneido client to access multiple databases. Currently a client can only be connected to a single server. This will require some changes but they should be fairly minimal.
Replicating to an SQL database will be a little more work and it's less of a priority. (Until we get a client that demands it!)
I may be overly optimistic, but I think I can set this up with minimal effort. We'll see!
Thursday, June 26, 2008
Eclipse Ganymede
The latest version of Eclipse - Ganymede (3.4) is out. I couldn't figure out how to update/upgrade from within my existing installation. Maybe that's not possible between versions. So I just downloaded and installed the new version. The only problem with that was that I had to reinstall my plugins. But I could export/import my list of update sites which made it pretty easy.
Ganymede is a coordinated release of 23 projects representing over 18 million lines of code. I'm not sure I can really wrap my head around that. It makes my porting of a measly 60,000 lines of code sound like a drop in the bucket.
Check out the release notes - some cool new stuff. It's nice to see they've improved the plugin/update installation process.
I've been using TextMate to view the Suneido C++ code as I work on the Java version. It has a lot of devoted fans and some great features, but I've been having a hard time getting used to it. For example, the Find doesn't have a "whole words" option. I guess I could do it with regular expressions, but that seems awkward. It seems like every editor has that. And the normal Find also doesn't default to the selected text. Just little things, but they slow me down.
I've been getting more comfortable with Eclipse and getting spoilt by its features. I saw there was a new release of CDT (C & C++ Development Tools) with the new Eclipse so I thought I'd try it out. I still had to struggle to import the source into an Eclipse project. There is help for this but, as is so often the case, it isn't very helpful. I ended up importing from Subversion, which is probably good since I'll be able to check in any changes I make as I'm browsing the source. I didn't bother trying to set up a build environment since it won't build on the Mac anyway. But at least I can use the same tools to access both the Java and the C++ code.
jSuneido - JavaDoc
I've posted the JavaDoc for jSuneido so far:
http://www.suneido.com/jSuneido/doc/
I haven't spent a lot of time on JavaDoc comments, but there's already more than I had for the C++ version.
Reminder - you can browse the source at:
http://suneido.svn.sourceforge.net/viewvc/suneido/jsuneido/
Or check it out using Subversion from:
http://suneido.svn.sourceforge.net/svnroot/suneido
Friday, June 20, 2008
jSuneido - server (yet again)
Ok, so the plan is to use Java Channels, Selector, and Executor to build a Reactor style server. How hard can it be :-)
The problem is multi-threading concurrency. Why isn't that surprising?
It turns out that using Selector with multiple threads requires some tricky locking. At first, looking at Architecture of a Highly Scalable NIO-Based Server, it didn't seem too bad.
But then, I found How to Build a Scalable Multiplexed Server With NIO Mark II (large pdf) and it started to seem harder. I downloaded his source code and at least it was small enough to be comprehensible. Full frameworks tend to be too big to easily grasp.
At first I was just going to use his code as an example. But after a half hour of copy and paste I came to my senses. Although I was simplifying it a little, I was basically ending up duplicating his code. Not a good plan. Especially since his code is still "preliminary" and could change.
I threw out what I'd done and dropped his code into my project. And so far it's working. We'll see.
Wednesday, June 18, 2008
Antlr and Eclipse
I just finished reading The Definitive Antlr Reference published by Pragmatic Programmers. Antlr is a parser (& lexer) generator. It's written in Java but it supports other target languages like C++, Ruby, and Python. It looks like a pretty cool system.
I have an interest in this area that goes a long way back. In the early 80's I reverse engineered Yacc to create my own look-alike. (Since there was nothing available on MS-DOS.) I didn't have the source code so it was an interesting exercise to feed different inputs to a "black box" and see what the outputs were. I only had a crude understanding of how Yacc worked but I was successful enough that I was able to use my program to generate parsers for several applications. (Later, I switched to the version of Yacc that came with the MKS Toolkit.)
Although I'd seen some mentions of Antlr, I probably wouldn't have got very interested if the book hadn't come out. Although there's information on the web, I still find a good book the best way to learn something new. And a book adds a kind of "legitimacy" to things.
When I wrote Suneido I chose to hand write a recursive descent parser. Coincidentally, unlike Yacc (and the similar GNU Bison), Antlr generates ... recursive descent parsers.
I wondered if I could use Antlr to replace Suneido's hand written parser. I installed the AntlrWorks gui tool and quite quickly managed to create a grammar for Suneido's database language.
Next, I wanted to try to incorporate it into my Eclipse project. I found a plugin and installed it and added my grammar to the project ... and got a bunch of weird errors. Argh!
After a bunch of messing around I realized that the Eclipse plugin I was using was for the old version of Antlr. The next problem was how to uninstall the plugin. This isn't as obvious as you'd think. It's under Help > Software Updates > Manage Configuration. That gives you options for updating, disabling, and showing the properties, but not uninstall. After some further poking around I discovered Uninstall is on the right click context menu on the tree. I'm not sure why it's not listed under Available Tasks. Maybe because it doesn't work very well. I chose to Uninstall the Antlr plugin. But I couldn't get rid of the grammar errors. And when I looked, there were still Antlr related files hanging around. I cleaned up manually as best I could.
I found a new plugin and tried to install it. It had dependencies on another package - the Eclipse Dynamic Language Toolkit. I couldn't find an update site for the specific version it needed so I had to install manually. Then there was a dependency on the Graphical Editing Framework (GEF) so I had to install that. Then I could finally install the new Antlr plugin. And finally get my grammar to compile.
Even better, if I want this on my other computers, I have to go through the same process to install all the dependencies. Yuck.
All in all, it was a pretty frustrating process and it soured me a bit on Antlr, although it's really nothing to do with Antlr itself. I can blame some of it on Eclipse, but it's more just a result of a rich ecosystem of third party plugins. I guess you have to expect some problems.
The real test would be to implement the main Suneido language grammar in Antlr. I think this could be a little tricky because Suneido has optional semicolons, which make some newlines significant (but the rest of the time they're ignored). It was tricky enough kludging this in the hand written parser. The Antlr book does give an example of how to parse Python-like languages where indenting whitespace is significant, although that's not quite the same problem. There's also a simplified Ruby grammar on the Antlr website that might help. (Since Ruby has similar optional semicolons.)
I'm still not convinced that using Antlr is the way to go. Since it generates recursive descent parsers, it's much easier to convert the grammar from the existing parser (than it would be with Yacc or Bison). But it also wouldn't be hard to port the existing parser directly to Java. An explicit grammar is certainly easier to modify, but the language grammar doesn't change a lot.
Tuesday, June 17, 2008
jSuneido - Server (again)
First I thought the socket server portion of Suneido would be easy in Java. Then I realized it wasn't going to be as easy as I thought. Now, after reading Java Concurrency in Practice (recommended), I'm back to thinking it'll be "easy" (relatively).
I won't need to use Grizzly or MINA or any other library/framework. The key is the Executor system which was introduced in Java 5. I can use the Executors.newCachedThreadPool to process requests.
This will result in (pooled) thread-per-request which is what I want. So if there are 50 users connected, but only 4 requests active at a given time, there will only be 4 worker threads. Actually, it's a bit more complicated because inactive threads are kept around for a while (cached) in case they're needed.
CachedThreadPool uses an unbounded number of threads. This isn't so good for uncontrolled situations because it can lead to problems like denial of service attacks. But for the situations where Suneido applications are used - on local area networks - it should be a reasonable choice.
It doesn't use the Leader/Followers pattern but it does use a SynchronousQueue to pass requests from the "dispatcher" to the worker threads. This passes the requests "directly" from one thread to another, avoiding the overhead of adding and removing requests to a normal queue.
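Here's a toy sketch (not actual jSuneido code) just to show the thread reuse behaviour of newCachedThreadPool - the "requests" are simulated with print statements:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Toy demo of Executors.newCachedThreadPool: tasks stand in for requests.
// Idle worker threads are reused (and eventually discarded), so the number
// of live threads tracks the number of concurrently active requests, not
// the number of connected users.
public class CachedPoolDemo {
    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newCachedThreadPool();
        for (int i = 0; i < 8; ++i) {
            final int request = i;
            pool.execute(new Runnable() {
                public void run() {
                    System.out.println("request " + request
                            + " handled by " + Thread.currentThread().getName());
                }
            });
            Thread.sleep(50); // spaced-out requests get handled by reused threads
        }
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```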
I find this flip-flopping between optimism and pessimism quite common, both on small scales and on larger scales. Learning often fits into one of two categories - learning how to do things, or learning that things are more complex than you thought.
Friday, June 13, 2008
jSuneido - concurrency strategies
I've been thinking about concurrency as I make progress on porting the btree indexing code (inserting and iterating are working).
The simplest, obvious way to handle concurrency would be to lock entire indexes. You could argue that there are lots of tables and each table has multiple indexes, so usage should be spread out enough to minimize contention. Unfortunately, the 80-20 rule (or worse) applies and activity tends to focus on particular tables and indexes.
Since there is a lot more "read" activity than "write" activity, an obvious improvement would be to use "many readers / single writer" locking.
There are strategies for locking individual index nodes in btrees, but you have to be very careful on the order of locking or you end up with deadlocks.
Most btree updates only update a single node - it would be nice if those cases just locked that one node. This would definitely reduce contention. But some updates require changes to more of the tree, and then it makes more sense to lock the entire index. I think I can reconcile these by having single node updaters first acquire a read lock on the entire index and then write lock the single node. The read lock will prevent anything else from acquiring a write lock on the entire index.
A complication is that you don't know if it will be a single node update until you get there. But if you discover that it will require larger scale changes, then you can upgrade the whole index read lock to a write lock.
This approach should be deadlock free since you're always locking in a fixed order - index as a whole, and then a single node.
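A rough sketch of that two-level scheme, using the JDK's ReentrantReadWriteLock. The Node class is a stand-in, and the sketch assumes there's a shared per-node object to lock - which, as the next paragraph explains, isn't actually true yet:

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Sketch of the two-level locking scheme: shared lock on the whole index
// for single-node updates, exclusive lock on the whole index for
// restructuring. The Btree/Node types are stand-ins, not jSuneido classes.
public class BtreeLocking {
    private final ReentrantReadWriteLock indexLock = new ReentrantReadWriteLock();

    /** Update that touches a single node. */
    void singleNodeUpdate(Node node) {
        indexLock.readLock().lock();     // shared lock on the index as a whole
        try {
            synchronized (node) {        // exclusive lock on just this node
                // ... modify the node ...
            }
        } finally {
            indexLock.readLock().unlock();
        }
    }

    /** Update that has to restructure the tree (splits, etc.). */
    void wholeTreeUpdate() {
        // Note: ReentrantReadWriteLock can't upgrade read -> write, so an
        // updater that discovers it needs the whole tree would have to
        // release its read lock and retry under the write lock.
        indexLock.writeLock().lock();
        try {
            // ... restructure the tree ...
        } finally {
            indexLock.writeLock().unlock();
        }
    }

    static class Node { /* stand-in for a btree node */ }
}
```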
There is, of course, another problem - there are no persistent in-memory objects for index nodes. They are mapped in and out of memory as needed. So it doesn't do any good to lock your node object because it's just a temporary object and other threads will have their own. (Locking the entire index is ok because there is a persistent shared object for the index as a whole.)
I could cache the node objects (with weak references so they could still be garbage collected) but then the cache would need synchronization.
Or I could use OS file locking but I've had bad experiences with that in the past.
Another option would be to associate a lock object with each memory mapped "section" and lock at that level. But chunks are 4 MB, so this would be pretty coarse grained locking. (Index nodes are only 4 KB, so you could potentially be locking a thousand of them.)
I'm busy reading concurrency books, maybe one of them will give me a better idea. To start I can always just use many-readers / single writer locking on the index as a whole.
Tuesday, June 10, 2008
jSuneido - Bye Bye Generics
Sometimes constraints are a good thing. They force you to look for alternatives. To think harder.
I started out by doing a fairly simple translation of C++ templates to Java generics in the Suneido btree code.
I immediately ran into problems because Java generics are very different from C++ templates.
I then proceeded to thrash and flail, trying one approach, abandoning it, throwing out the code, starting over with another approach. (This doesn't lead to much progress in terms of lines of code!)
The C++ btree design was fairly complicated. There were four Slot classes and four corresponding Slots classes. The Btree template was instantiated with four different combinations of these, for various purposes. Within Btree the slots were wrapped by LeafNode and TreeNode classes.
As I worked on it, I realized that three of the Slots classes were virtually identical. And on top of that, they were virtually identical to the Record code.
The fourth Slots/Slot classes were used in a single place in a specialized way that is much more suited to a hash table than a btree. At the time, I was probably so wrapped up in btrees that I didn't even think twice about it.
So in Java I've ended up with a non-generic Btree class, a single Slots class (that mostly delegates to the record class), and a single Slot class. Three classes instead of 10 or so in the C++ code. I have lost a little bit of type safety, but these classes are used in a pretty restricted way so this shouldn't be a problem. And I'll have tests anyway :-)
To be fair, some of the C++ use of templates was motivated by performance. Using templates avoids the overhead of virtual method calls. Since I never tried it without the templates it's hard to say whether that was justified.
Working with Java is forcing me to change my mindset a little. Coming from a time when memory, cpu, and disk resources were severely limited, and working on systems like Suneido where performance was important, I tend to always have "efficiency" in mind as I design.
So I'd try to avoid allocation, for example by putting objects on the stack instead of the heap. But you can't do that in Java.
I also worked hard in Suneido to make the mapping from database to memory as direct as possible. By carefully designing classes I could store them directly on disk, and with memory mapping access them directly without any reads/writes or translation. Again, this is a lot tougher in Java with no pointers. I can do a certain amount with memory mapped ByteBuffers, but it's definitely not as direct. On the positive side, the Java code is ending up a lot simpler than the C++ code. Java just doesn't provide the same opportunities for "clever" tricks.
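A minimal sketch of the ByteBuffer style of access - the file name and record layout here are invented for illustration, not Suneido's actual format:

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileChannel.MapMode;

// Minimal sketch of memory-mapped access via ByteBuffer. The record
// layout (an int size at the start of the chunk) is made up.
public class MappedAccess {
    public static void main(String[] args) throws IOException {
        RandomAccessFile file = new RandomAccessFile("suneido.db", "r");
        FileChannel channel = file.getChannel();
        long chunkSize = 4 * 1024 * 1024; // map in 4 MB chunks, like cSuneido
        MappedByteBuffer chunk = channel.map(MapMode.READ_ONLY, 0,
                Math.min(chunkSize, channel.size()));
        // In C++ you can cast the mapped pointer to a struct; here every
        // field has to be read explicitly, at an explicit offset.
        int recordSize = chunk.getInt(0);
        System.out.println("first record is " + recordSize + " bytes");
        file.close();
    }
}
```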
It'll be interesting to see what effect this has on the resulting performance. You might think that since hardware has improved so much it doesn't matter. But the two versions are going to be compared side by side, and if the Java version is slower, it'll be a hard sell. Although, if I can get the multi-threading working reasonably well, that will give the Java version a big advantage, even if it ends up slightly slower on a single core.
I think it's also going to be beneficial to be forced to not only read the C++ code, but to understand it. I've already discovered several sections of the code that turned out not to be used at all. (Nothing major, but still "junk".) And although I'm not "finished", it's looking pretty clear that revisiting the btree code is going to lead to simplifying it a lot.
Monday, June 09, 2008
jSuneido - Server
I may have been a little premature when I said "writing a multi-threaded socket server seems a lot simpler in Java than in C++".
The examples I looked at that were so simple were for thread-per-connection. This is a reasonable model for low volume or for short connections (i.e. one request per connection).
But when you have persistent, intermittently active connections as Suneido does, then thread-per-connection doesn't scale very well. There are several problems - one is that each thread consumes resources so if you have a lot of inactive threads waiting for intermittent requests you tie up unnecessary resources. Another problem is that a thread that's been inactive will no longer be in the cpu caches and may even have been swapped out to disk, especially when resources are tight. This makes it slow to respond and can lead to thrashing.
One of the alternatives is the Reactor pattern. I was planning to use the ACE reactor implementation for the C++ version. The reactor model uses a smaller pool of threads that carry out requests, not tied to specific connections.
I could write my own reactor in Java, but it's tricky. The basic idea is simple, but getting the multi-threading right and getting good performance aren't so simple.
I may use Grizzly - a Java implementation of the reactor pattern, but it doesn't appear to support the Leader/Follower pattern.
The "obvious" approach to implementing a reactor is to have one "dispatcher" thread that listens for requests and a pool of worker threads to handle them. But this "forces" at least one context switch per request, not so good for "short" requests. With the Leader/Follower pattern there is no dispatcher - threads take turns accepting a request and then handling it. This avoids the dispatching context switch. If the operating system supports it, all the non-busy threads can simultaneously "wait" on all the connections and the OS will give an incoming request to one of them. If the OS doesn't support this, then one of the non-busy threads is chosen as the "Leader" and accepts the next request, and then another thread is promoted to "Leader".
Leader/Follower may not be that important for Suneido since most requests are large enough that the context switch overhead is probably insignificant.
There is also the Apache MINA network application framework. It seems a little "heavier" than what I need.
Sunday, June 08, 2008
jSuneido - C++ Templates versus Java Generics
I'm currently working on porting Suneido's btree code from C++ to Java.
It's challenging because the C++ version makes heavy use of templates (plus a few other pointer type tricks).
If I look at the C++ code one way it seems really elegant, but if I look at it another way it seems really twisted.
The problem is that Java generics are very different from C++ templates. I know that people complain about Java generics, but just using the standard generic collection classes it seems fine. It's only when you try to write a more complex generic class that you start to see the reason for the complaints.
Java generics work by "type erasure". In C++ terms this would be like compiling all your templates down to a single void* version. This had the big advantage that it was backward compatible and didn't require any VM changes. (It also results in smaller compiled code.) The disadvantage is that you "lose" all the type information at run-time.
For example, a generic class can't do new T() because T has been "erased" (to Object).
The way around it is to create new objects via an instance. This instance can be the class itself - a Class&lt;T&gt; - in which case you use newInstance from the reflection API. Or it can be a "factory" object. Unfortunately, this instance has to be passed in (since the whole problem is that generic classes can't instantiate their parameter types).
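Something like this minimal sketch - the Container and Node classes are just illustrations, not jSuneido code:

```java
// Sketch of the two workarounds: passing in a Class<T> and using
// reflection, or passing in an explicit factory.
class Container<T> {
    private final Class<T> clazz;

    Container(Class<T> clazz) {
        this.clazz = clazz;
    }

    T create() throws InstantiationException, IllegalAccessException {
        // can't say "new T()" - T is erased - but the Class object survives
        return clazz.newInstance();
    }
}

interface Factory<T> {
    T make();
}

class Container2<T> {
    private final Factory<T> factory;

    Container2(Factory<T> factory) {
        this.factory = factory;
    }

    T create() {
        return factory.make();  // no reflection needed
    }
}

class Node {
    // usage: new Container<Node>(Node.class).create()
    //    or: new Container2<Node>(new Factory<Node>() {
    //            public Node make() { return new Node(); }
    //        }).create()
}
```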
This will all be old hat to experienced Java programmers. And it wouldn't be a big issue for me, except that I'm trying to port C++ templates that have no problem instantiating parameter types ...
Java Books
Here are the Java books I've bought in an attempt to go from 0 to 100 in a matter of weeks. Some of them are fairly specific to my needs, like the virtual machine internals.
The Definitive ANTLR Reference: Building Domain-Specific Languages (Pragmatic Programmers)
by Terence Parr
Effective Java (2nd Edition) (The Java Series)
by Joshua Bloch
This is helpful for learning the "right" way to do things in Java.
Java(TM) Platform Performance: Strategies and Tactics (The Java Series)
by Steve Wilson
(old, but I picked it up used for $4 from Powells in Portland)
Core Java(TM), Volume I--Fundamentals (8th Edition) (Addison-Wesley Object Technolo)
by Cay S. Horstmann
TCP/IP Sockets in Java: Practical Guide for Programmers (The Practical Guides)
by Kenneth L. Calvert
Annoyingly, I find there is a newer edition.
Saturday, June 07, 2008
Java Version for jSuneido on Eclipse on Mac
The last time I opened up my jSuneido project in Eclipse on my Mac at home I had an error. But I hadn't had an error on Windows at work.
It turned out I was using a method (Arrays.copyOfRange) that was introduced in 1.6
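For reference, this is the kind of 1.6-only call that triggers the error (the data here is arbitrary):

```java
import java.util.Arrays;

// Arrays.copyOfRange was added in Java 1.6, so this compiles and runs
// against a 1.6 JRE but fails against 1.5.
public class CopyOfRangeCheck {
    public static void main(String[] args) {
        byte[] data = { 1, 2, 3, 4, 5 };
        byte[] middle = Arrays.copyOfRange(data, 1, 4); // {2, 3, 4}
        System.out.println(Arrays.toString(middle));
    }
}
```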
The default for the Mac still appears to be Java 1.5
To "fix" this go into Eclipse > Preferences > Java > Installed JREs and pick 1.6
PS. Don't forget to enable asserts in the newly chosen JRE (like I did!)
Friday, June 06, 2008
Fear
Fear is often a good thing. It makes us think twice. Ideally it can reduce mistakes.
But handled the wrong way, fear can backfire, it can be self-fulfilling.
For example, let's say you're afraid of upgrading your software. That's understandable - software upgrades can be painful.
But how you handle the fear is important. It's positive if it encourages you to be cautious, to test the upgrade first, to upgrade in a way you can rollback, to upgrade at a slow time when you'll have time to recover from any problems, to wait for version 2.1 instead of 2.0
Instead, you might let your fear stop you from upgrading. In the short term this saves you work and pain. But in the long term, you're going to have a problem. The longer you go without upgrading, the scarier it gets. Before you know it you're three versions behind and your version is no longer supported. Now you are forced to upgrade, and now you're pretty much guaranteed to have problems.
Or let's say you're afraid of your server crashing because you're not sure you know enough to re-install and re-configure it. Yikes! That's cause for fear. But the answer is not to cross your fingers. The answer is to solve the problem - get a spare machine, learn how to re-install and re-configure. Do a test run - pretend your server has crashed and go about re-building it.
Face your fears, use them, don't avoid them (or ignore them).
(Note: I'm not saying you should rush to do upgrades - that's just an example.)
Monday, June 02, 2008
RailsConf - Day 4
I started off the day with "The Worst Rails Code You've Ever Seen". Some of this was pretty basic mistakes. But not having written any Rails code myself, I couldn't even see what was wrong with some of the examples, let alone how to fix them. I'm sure it was obvious to most people.
The next session was supposed to be about Scaling Rails from the Inside Out by Engine Yard, but instead he talked about a new system for distributed application "cloud" computing. Interestingly, the system was based on XMPP (the Jabber instant messaging protocol) and much of it is written in Erlang.
Scaling was a common theme in the conference. Next I went to a talk by RightScale about using Amazon EC2. Pretty neat stuff. He talked about one case where a company got suddenly popular on Facebook and scaled from 80 servers to 3500 servers in a matter of days (although not without problems along the way!)
My last session was an open Q&A with four of the JRuby developers. Quite interesting, although not as much internals as I'd like. But then again, most people don't care so much about that end of things.
The conference wrapped up with a Q&A session with four of the Rails core developers.
Overall, a good conference, even though I'm not a Rails developer!
Sunday, June 01, 2008
RailsConf - Day 3
The initial keynote was by Jeremy Kemper on Rails 2.1 - some interesting new stuff.
Next I went to a session on Git. Git is a new version control system. In this crowd it seems to be rapidly replacing Subversion. It seems like not that long ago that SourceForge got Subversion; I wonder how long it'll be before they support Git? (However, it is possible to use Git together with Subversion.) It was a high speed overview of Git, but it did cover a little about how it works, which I always like to know. I've been thinking about improving Suneido's version control system. Something like Git might solve a lot of the problems we have.
Then a session on Rails deployment with jRuby. jRuby on Rails lets you take advantage of the multi-threading in the JVM so you can run multiple Rails instances within one VM (rather than the conventional multiple processes.) There's a new tool called Warbler that makes it really easy to do the deployment on jRuby.
And if you want to make deployment really easy you can use Heroku - a Rails "hosting" solution that uses Amazon EC2 and S3 to scale up and down as your traffic requires. Very cool. We use S3 and have been pretty happy with it. Currently we're running our own Rails server for eTrux, but it's a hassle and it would be nice to have it hosted.
Then another session on jRuby by Ola Bini - more general but still interesting
My final session was a comparison of Ruby testing frameworks, primarily Test/Unit, Rspec, and Shoulda. I thought I might pick up some new ideas but they're all pretty similar, nothing too exciting.
Kent Beck gave the keynote at the end of the day. He talked about three "big ideas" he's been involved in - patterns, test driven development, and XP/Agile. I've heard him speak before and he's worth listening to.
When I first started going to conferences a few years ago I felt under-privileged because I didn't have a Mac laptop like the cool people. Now it's not just the cool people - virtually everyone has a Mac laptop - even me! But I still feel under-privileged because now everyone has MacBook Pros and I only have a regular MacBook. One guy who sat down next to me even commented "oh, you have one of those old Macs". (Actually, they're not that "old" - they're still being sold, but I got the point.) There weren't many of the new MacBook Air laptops - they're probably a little underpowered for this kind of programmer crowd.
I'm enjoying this conference more than I expected. The sessions have been varied enough that I can find stuff that I'm interested in. It's a pretty young group - both the presenters and the attendees. And there seems like a lot of energy and enthusiasm.
One thing that has struck me is just how rich and varied the software ecosystem is becoming. So many products - several in every niche, it seems. Maybe that's partly a result of how much lower the barrier to entry is. One or two people can produce a lot these days. And with the internet, they can spread the word about what they've done. It makes it a little confusing, but the end result is that there is an amazing amount of amazing stuff out there.