After some digging, I found the speed issue on Windows.
It turned out not to be Parallels - it was just as slow on my Windows machine at work.
It turned out not to be anti-virus either - it was just as slow with anti-virus turned off.
I thought it might be Nagle problems again, although last time it was the opposite problem - slow on OS X and fast on Windows. But nope, not that either.
I narrowed it down to sending the response. Debugging showed it was fast receiving the request and executing it.
I found I had two methods for sending responses. One called selector.wakeup after changing the interestOps (like you're supposed to, so that a selector blocked in select notices the change) and the other didn't. Of course, the incorrect method was the one actually being used. I fixed it and got rid of the other one. Problem solved.
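For anyone who hasn't run into this, here is a minimal sketch of the pattern. It is not the actual jSuneido code: the class name WakeupSketch, the requestWrite helper, and the use of a Pipe in place of a real client socket are all assumptions made for illustration. The point it shows is that a change to interestOps made from another thread is only seen by the next selection operation, so without wakeup the selector thread can sit in select until something else (or a timeout) wakes it.

```java
import java.io.IOException;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

final class WakeupSketch {
    // Hypothetical helper: called from a worker thread once a response is ready.
    static void requestWrite(SelectionKey key) {
        key.interestOps(key.interestOps() | SelectionKey.OP_WRITE);
        // Without this call, a select() already in progress keeps waiting
        // with the old interest set.
        key.selector().wakeup();
    }

    // Returns how long (in ms) the selector thread waited before seeing the sink as writable.
    static long demo() throws IOException, InterruptedException {
        Pipe pipe = Pipe.open(); // stand-in for a client connection
        try (Selector selector = Selector.open();
             Pipe.SinkChannel sink = pipe.sink();
             Pipe.SourceChannel source = pipe.source()) {
            sink.configureBlocking(false);
            SelectionKey key = sink.register(selector, 0); // no interest yet

            Thread worker = new Thread(() -> {
                try {
                    Thread.sleep(200); // pretend to execute the request
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                requestWrite(key);
            });

            long start = System.nanoTime();
            worker.start();
            int ready = 0;
            // The first select() returns 0 because of wakeup();
            // the next one picks up the new interest set.
            for (int i = 0; i < 3 && ready == 0; i++) {
                ready = selector.select(5000);
            }
            worker.join();
            if (ready == 0) {
                throw new IllegalStateException("sink never became writable");
            }
            return (System.nanoTime() - start) / 1_000_000;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("writable after about " + demo() + " ms");
    }
}
```

With the wakeup call in place the wait should be close to the 200 ms the worker sleeps. If you comment out the wakeup, I would expect the wait to stretch towards the 5 second select timeout, which is the kind of delay I was seeing, although the exact behavior may vary by platform and JDK.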
Now the speed is reasonable: 75 seconds to run the test suite with the cSuneido server, 100 seconds with jSuneido. I'm pretty happy with that considering the cSuneido version is C++ code that has been tweaked and optimized for years, whereas jSuneido is my first Java program and has barely been debugged, let alone optimized.
I think jSuneido can probably be faster, but even if it can't, its ability to scale to multiple threads/cores should allow it to outperform cSuneido when there are multiple users. I say "should" because I've barely started testing this. One very preliminary test took about 40 seconds with one client and 50 seconds with two simultaneous clients (on two cores), whereas cSuneido would be pretty much linear, i.e. 80 seconds with two clients. Of course, it would be nice to have a quad-core machine to test with, but I haven't quite convinced myself that's excuse enough to buy a new Core i7 iMac :-)