Thankfully, I found the problem. Apart from the time wasted, it's somewhat amusing because I was circling around the problem/solution but not quite hitting it.
My first thought was that it was Parallels but I quickly eliminated that.
My next thought was that it was a network issue, but when I sent a sequence of invalid commands it was fast.
I woke up in the middle of the night and thought maybe it was the Nagle/delayed-ACK problem. If so, an invalid command wouldn't trigger it, because the error response is a single write. But when I replaced the database calls with stubs (still doing similar network IO) it was fast, pointing back to the database code.
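The classic bad interaction is a write-write-read pattern: the first small write goes out immediately, but Nagle holds the second until the first is ACKed, and the peer's delayed-ACK timer can make that take tens to hundreds of milliseconds. A rough sketch of the pattern (the names here are hypothetical, not the actual server code):

```java
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;

public class WriteWriteDemo {
    // Hypothetical sketch: a response sent as two small writes.
    // With Nagle enabled, the kernel sends the first small packet
    // immediately but holds the second until the first is ACKed.
    // A client using delayed ACKs may sit on that ACK, stalling
    // every response by a fixed delay.
    static void sendResponse(OutputStream out, byte[] header, byte[] body)
            throws Exception {
        out.write(header); // small packet, goes out right away
        out.write(body);   // may sit in the send buffer awaiting the ACK
        out.flush();
    }

    public static void main(String[] args) throws Exception {
        try (ServerSocket server = new ServerSocket(0); // ephemeral port
             Socket client = new Socket("127.0.0.1", server.getLocalPort());
             Socket accepted = server.accept()) {
            sendResponse(accepted.getOutputStream(),
                         "HDR".getBytes(), "BODY".getBytes());
            byte[] buf = new byte[7];
            int n = 0;
            while (n < buf.length)
                n += client.getInputStream().read(buf, n, buf.length - n);
            System.out.println(new String(buf)); // prints "HDRBODY"
        }
    }
}
```

Over loopback the data still arrives correctly either way; the cost is purely latency, which is why everything worked but was slow.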
Ok, maybe it's the memory mapping. I could see that possibly differing between OS X and Windows. But when I swapped out the memory mapping for an in-memory testing version it was still slow.
This isn't making any sense. I still think it seems like a network thing.
I stub out the database calls and it's fast again, implying it's not the network. But in my stubs I'm returning a fixed size record instead of varying sizes like the database would. I change it to return a random size up to 1000 bytes. It's still fast. For no good reason, I change it to up to 2000 bytes and it's slow!
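A rough reconstruction of that stub experiment (the names and the 1-byte minimum are mine, not the original code):

```java
import java.util.Random;

public class StubRecords {
    // Hypothetical reconstruction of the stub: instead of calling the
    // real database, return a dummy record of random size up to maxSize.
    static byte[] stubRecord(Random rnd, int maxSize) {
        return new byte[1 + rnd.nextInt(maxSize)];
    }
    // maxSize = 1000: every response fits in one TCP segment -> fast.
    // maxSize = 2000: some responses span two segments; the full-size
    // first segment goes out immediately, but Nagle holds the smaller
    // remainder until the first segment is ACKed -> slow.
}
```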
I seem to recall TCP/IP packet size being around 1400 bytes (the typical Ethernet MSS is about 1460, i.e. a 1500-byte MTU minus headers), so that's awfully suspicious.
I insert client.socket().setTcpNoDelay(true) into the network server code I'm using and sure enough that solves the problem. (Actually, the first time around I set it to false, getting confused by the double negative.)
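For context, a minimal sketch of where the option goes; the accept loop here is hypothetical, since in my case the socket belonged to someone else's server framework:

```java
import java.net.ServerSocket;
import java.net.Socket;

public class NoDelayServer {
    // Hypothetical sketch: disable Nagle's algorithm on each accepted
    // connection. Note the double negative: setTcpNoDelay(true) turns
    // Nagle OFF, so small writes are sent immediately.
    static Socket acceptNoDelay(ServerSocket server) throws Exception {
        Socket client = server.accept();
        client.setTcpNoDelay(true);
        return client;
    }

    public static void main(String[] args) throws Exception {
        try (ServerSocket server = new ServerSocket(0); // ephemeral port
             Socket peer = new Socket("127.0.0.1", server.getLocalPort());
             Socket accepted = acceptNoDelay(server)) {
            System.out.println(accepted.getTcpNoDelay()); // prints "true"
        }
    }
}
```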
A better solution might be to use gathering writes, but at this point I don't want to get distracted trying to implement this in someone else's code.
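For reference, a gathering write hands all the pieces to the kernel in a single call, so header and body can leave in one segment even with Nagle on. A minimal NIO sketch, with a Pipe standing in for the socket:

```java
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;

public class GatherDemo {
    public static void main(String[] args) throws Exception {
        Pipe pipe = Pipe.open();
        ByteBuffer header = ByteBuffer.wrap("HDR".getBytes());
        ByteBuffer body = ByteBuffer.wrap("BODY".getBytes());
        // One call delivers both pieces; on a SocketChannel the same
        // GatheringByteChannel.write(ByteBuffer[]) avoids the
        // write-write pattern that interacts badly with Nagle.
        pipe.sink().write(new ByteBuffer[] { header, body });
        ByteBuffer in = ByteBuffer.allocate(16);
        while (in.position() < 7)
            pipe.source().read(in);
        in.flip();
        byte[] got = new byte[in.remaining()];
        in.get(got);
        System.out.println(new String(got)); // prints "HDRBODY"
    }
}
```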
This doesn't explain why the problem only showed up on OS X and not on Windows. There must be some difference in the TCP/IP implementations, presumably in how the two stacks time their delayed ACKs.
In any case, I'm happy to have solved the problem. Now I can get back to work after a several-day detour.