Efficient Bitpacking in Lemma

Today I’ve been working on improving the voxel save format in Lemma.

For those who don’t know, Lemma is a first-person parkour game set in a voxel universe (no, not like Minecraft). Voxels are saved to map files, and savegames are actually just the map in its current state, saved to a map file. So it’s pretty clear why we’d want to optimize our save format.

Map files are in an XML format: simply a list of Entities, their Components, and their Properties. Voxels are entities whose voxel data is serialized as a base64-encoded string. Since the format is text-based, it’d be a good candidate for gzipping, which we’ve been planning on implementing whenever we get around to it™. However, since the voxel data is really just a giant array of thousands of integers encoded as text, gzip alone isn’t a perfect fit.

I have experience writing multiplayer netcode from a few projects; the largest, and the one I actually cared about, was a game called Minor Destruction (“MD”). Put simply, MD was a 2D block-based game that focused on moddability and PvP class-based action. To meet its latency demands, optimization was crucial, so I spent a lot of time optimizing the netcode.

For Lemma, I took the same approach—every single bit is precious and must be conserved. The old format for storing voxels was approximately:

  • int VoxelCount
  • Array of block data:
    1. int X
    2. int Y
    3. int Z
    4. int Width
    5. int Height
    6. int Depth
    7. int ID (type of the block)
    8. 4 UV ints for 6 surfaces (24 ints total)
  • Array of block connection data:
    1. int Voxel1
    2. int Voxel2

In total, that’s 31 integers per Block (a Block being the largest possible box that can be built out of neighboring voxels of the same type). On top of that there’s a crapload of block connection integers, but their count isn’t easily estimated in this post.
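For reference, here’s a rough C-style sketch of that per-block layout (field and type names are mine, not Lemma’s actual ones):

    #include <stdint.h>

    /* Rough sketch of the old per-block layout described above.
       Names are illustrative, not Lemma's actual types. */
    typedef struct {
        int32_t x, y, z;              /* position                      (3 ints)  */
        int32_t width, height, depth; /* extents                       (3 ints)  */
        int32_t id;                   /* block type                    (1 int)   */
        int32_t uv[6][4];             /* 4 UV ints for each of 6 faces (24 ints) */
    } OldBlock;                       /* 31 ints = 124 bytes per block */

    typedef struct {
        int32_t voxel1, voxel2;       /* indices of two connected blocks */
    } OldConnection;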

This meant that our largest map currently in the game weighed in at roughly 8 megabytes, including the rest of the XML. While that isn’t a terrible figure, saves happen often, so we want to shrink it as much as humanly possible.

My first step was to combine Width, Height, Depth and ID into one integer. To do this, I pack all four values into a single 32-bit integer, giving 8 bits to each (which assumes none of them ever exceeds 255). This is rather simple.
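A minimal sketch of that packing, in C (assuming each value fits in 8 bits):

    #include <stdint.h>

    /* Pack Width, Height, Depth and ID into one 32-bit integer, 8 bits each.
       Assumes every value fits in 0..255. */
    uint32_t pack_block_info(uint32_t width, uint32_t height,
                             uint32_t depth, uint32_t id)
    {
        return  (width  & 0xFF)
             | ((height & 0xFF) << 8)
             | ((depth  & 0xFF) << 16)
             | ((id     & 0xFF) << 24);
    }

    /* And the reverse when loading a map. */
    void unpack_block_info(uint32_t packed, uint32_t *width, uint32_t *height,
                           uint32_t *depth, uint32_t *id)
    {
        *width  =  packed        & 0xFF;
        *height = (packed >> 8)  & 0xFF;
        *depth  = (packed >> 16) & 0xFF;
        *id     = (packed >> 24) & 0xFF;
    }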

The next optimization is to pack the UV ints together as tightly as possible. Each UV int was estimated to fit in 10 bits of magnitude plus 1 bit to denote sign, 11 bits in total. While I can’t guarantee those ranges, I did some testing and no map had any values that exceeded them. Originally, my plan was to pack the UVs two per integer, using 22 of each integer’s 32 bits. However, that left an ugly 10 bits per integer that were unused and taking up space.
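One simple way to squeeze a signed value into those 11 bits is a sign-magnitude encoding along these lines (a sketch only; any 11-bit scheme would work, and this isn’t necessarily the exact encoding Lemma uses):

    #include <stdint.h>

    /* Fit a signed UV coordinate into 11 bits: bit 10 holds the sign,
       bits 0-9 hold the magnitude. Assumes |value| <= 1023. */
    uint32_t encode_uv11(int32_t value)
    {
        uint32_t magnitude = (uint32_t)(value < 0 ? -value : value) & 0x3FF;
        uint32_t sign      = value < 0 ? 1u : 0u;
        return (sign << 10) | magnitude;
    }

    int32_t decode_uv11(uint32_t bits)
    {
        int32_t magnitude = (int32_t)(bits & 0x3FF);
        return (bits & (1u << 10)) ? -magnitude : magnitude;
    }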

So I wrote a BitPacker. Essentially, all it does is take an array of integers you want to pack, along with the number of bits to keep from each one, and return an array of integers packed with those bits back to back. At most you will only ever waste 31 bits, in the case where the last integer in the returned array has only 1 bit in use.
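A minimal C sketch of the same idea (the real BitPacker lives in Lemma’s codebase; names and signatures here are mine):

    #include <stdint.h>
    #include <stdlib.h>

    /* Pack `count` values, keeping the low `bits_per_value` bits of each, into
       as few 32-bit integers as possible. Returns a malloc'd array and writes
       its length to *out_count. Assumes 1 <= bits_per_value <= 31. */
    uint32_t *bit_pack(const uint32_t *values, size_t count,
                       unsigned bits_per_value, size_t *out_count)
    {
        size_t total_bits = count * bits_per_value;
        size_t packed_len = (total_bits + 31) / 32;      /* round up */
        uint32_t *packed  = calloc(packed_len, sizeof(uint32_t));

        size_t bit_pos = 0;
        for (size_t i = 0; i < count; i++) {
            uint32_t value  = values[i] & ((1u << bits_per_value) - 1u);
            size_t   word   = bit_pos / 32;
            unsigned offset = bit_pos % 32;

            packed[word] |= value << offset;
            if (offset + bits_per_value > 32)            /* straddles two words */
                packed[word + 1] |= value >> (32 - offset);

            bit_pos += bits_per_value;
        }

        *out_count = packed_len;
        return packed;
    }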

This optimization means that instead of 24 integers for the surface UVs, each block now needs 24 × 11 = 264 bits, which is 8.25 (really, 9) integers. Progress!
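Using the bit_pack sketch above, the UV packing looks roughly like this:

    /* 6 faces x 4 UV ints = 24 values at 11 bits each -> 264 bits -> 9 words. */
    uint32_t uvs[24] = {0};   /* filled with encode_uv11(...) values in practice */
    size_t packed_len;
    uint32_t *packed_uvs = bit_pack(uvs, 24, 11, &packed_len);
    /* packed_len == 9; the ideal 264 / 32 = 8.25 rounds up to 9. */
    free(packed_uvs);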

Finally, the data storing connections between blocks needed the same treatment. I simply bitpacked those integers using 16 bits plus 1 sign bit (17 bits each), and it automagically worked.
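With the same sketch, that’s just another call at a different bit width (the values below are made up):

    /* Two connections = four voxel indices (Voxel1, Voxel2 pairs), already
       sign-encoded: 16 bits of magnitude + 1 sign bit = 17 bits per value. */
    uint32_t connections[4] = { 12, 7, 12, 45 };
    size_t packed_len;
    uint32_t *packed = bit_pack(connections, 4, 17, &packed_len);
    /* 4 * 17 = 68 bits -> 3 words instead of 4. */
    free(packed);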

In total, this has more than halved our final map file size. Each block now takes 13 integers to store, as opposed to 31. Our biggest map went from 8.3 megabytes to 3.8 megabytes, a bit better than half, and other maps see much larger reductions.

Think of the savings!