Thursday, 12 November 2015

MIDI magic - reading midi files via a byte array

It's been a while since we did any MIDI firmware.
MIDI is one of those things that we didn't really care much for a few years back - old 80s synths and analogue oscillators to make a few bleepy bloopy farty noises didn't really hold much interest. But after discovering that you can send MIDI messages over serial/uart at a specific baud rate (31250bps) we actually had a bit of fun creating MIDI controllers - not so much to create the actual noise (that's what the synth part is for) but to create unusual interfaces to trigger the synth sounds.

To date, we've only really played with "live" MIDI messages - parsing incoming data as it arrives or creating "outgoing" MIDI messages to play sounds on a synth. With our light-up guitar neck, we'd like to be able to display notes from a MIDI sequence, not just in real time, but also to be able to scroll backwards and forwards along the "timeline".

This is to allow the player to find where to put their fingers, then wind the song forwards to find where the next note is, then back again, to work out the best way of fretting the notes. Whichever way we look at it, we're going to have to go beyond just "live" MIDI messaging and get our hands dirty with the old .mid file system!

MIDI files use a pretty basic protocol. We're parsing them by loading the file as a single byte array, then walking through it.

The first four bytes in a MIDI file should be the MIDI file header: 77, 84, 104, 100 (the ASCII codes for the characters MThd).

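With MIDI, getting data from one device to another as quickly as possible was always key, so there's very little overhead in the data. After each "block marker" comes a "length of data" value. The MThd and MTrk chunk markers are followed by a fixed four-byte length, but the lengths inside a track (and the delta times between events) are stored as variable-length values. How these work is key to how to read MIDI files: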

For any variable length data, it's packed as a seven-bit value. The leading bit (MSB) is either set to indicate that there's another byte to come, or clear, if that's the end of the variable length data. So if you had a "chunk" of data with a length of 100 bytes, the variable data would be a single byte with the (decimal) value 100. Easy huh?

But if the length value was greater than 127, it would be transmitted as two (or more) bytes. For example, to store the value 180 as variable length data, it would be stored as 129, 52 (180 is 1 x 128 + 52, and the first byte has 128 added to it as the "more to come" flag). Let's look at how that works:
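To see where those two bytes come from, here's a minimal sketch of the reverse operation: packing a number into variable-length bytes. The function name encodeVariableLength is our own invention for illustration and isn't part of the parser.

```php
<?php
// Hypothetical helper: pack a non-negative integer into MIDI
// variable-length bytes (seven data bits per byte).
function encodeVariableLength($value){
      // the last byte carries the low seven bits, with the MSB clear
      $bytes = array($value & 127);
      $value >>= 7;
      // every earlier byte carries seven more bits, with the MSB set
      while($value > 0){
            array_unshift($bytes, ($value & 127) | 128);
            $value >>= 7;
      }
      return $bytes;
}

echo implode(", ", encodeVariableLength(100)) . "\n";   // 100
echo implode(", ", encodeVariableLength(180)) . "\n";   // 129, 52
```

Anything up to 127 fits in one byte; 180 needs a second byte because its eighth bit doesn't fit into seven.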

Our value is split over two bytes. The first byte is sent with the leading bit (MSB) set, to indicate that the multi-part value is not yet complete. Just the same as when we receive two-byte values into word or integer data types, we're going to have to do a bit of bit-shifting to create the actual final value. When we receive the next byte, we bit-shift the current value, then OR the following byte value (minus the leading bit) into the running total. If the next byte has its MSB set, we continue doing this until we receive a byte with the leading/MSB bit cleared (indicating the end of a multi-part value).

Unlike when we receive a 16-bit value as two 8-bit bytes, we're not simply bit-shifting eight places; because our values are only seven bits long, we have to remember to bit shift only seven places to the left. It's also important to remember to mask out the leading/MSB bit (as this is a flag or marker, and does not make up part of the value we're trying to read!)

It took us quite a while to understand this concept, and we struggled for a while, reading things like strings and copyright notices, and other variable-length values. But once we got over this little stumbling block, the rest of the parsing routine was very simple.

For no real reason, we built our midi parser using PHP.
The file is loaded and the data pushed into a byte array:

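if(isset($_GET['uid'])){ $uid=trim($_GET['uid']);}

// $path is assumed to already hold the location of the .mid file
$handle = fopen($path, "rb");
$fsize = filesize($path);
$midi_string = fread($handle, $fsize);
fclose($handle);
// note: unpack() returns an array indexed from 1, not 0
$midi = unpack("C*",$midi_string);
$midi_file_length = count($midi);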
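
With the bytes in an array, the header check described earlier might look something like the sketch below. Note that unpack() numbers its results from 1 rather than 0. The function name hasMidiHeader is ours, made up for this example, and the four sample bytes are built inline rather than read from a real file.

```php
<?php
// Hypothetical helper: true if the byte array starts with "MThd".
// $midi is expected in the 1-indexed form that unpack("C*") returns.
function hasMidiHeader($midi){
      $expected = array(77, 84, 104, 100);   // M, T, h, d
      foreach($expected as $i => $byte){
            if(!isset($midi[$i + 1]) || $midi[$i + 1] !== $byte){
                  return false;
            }
      }
      return true;
}

$sample = unpack("C*", "MThd");
echo hasMidiHeader($sample) ? "looks like a MIDI file" : "not a MIDI file";
```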

We're using the variable $midi to store the byte array and the variable $p to indicate where in the MIDI file we're reading data from. Here's a generic function which reads a multi-byte value, and updates the "pointer" position once the "length value" has been read out of the byte array.

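function getVariableLengthValue(){
      global $midi, $p;
      $pk=$midi[$p] & 127;
      // while the MSB is set, another byte follows
      while($midi[$p] & 128){
            $p++;
            // shift seven places, then OR in the next seven bits
            $pk=($pk << 7) | ($midi[$p] & 127);
      }
      $p++;   // move the pointer past the value we've just read
      return $pk;
}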

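For anyone who wants to try the idea in isolation, here's a sketch of the same kind of routine that takes the byte array and the pointer as parameters, instead of relying on $midi and $p being visible inside the function. The name readVariableLength and the sample bytes are ours, for illustration only.

```php
<?php
// Hypothetical stand-alone version: $p is passed by reference so the
// caller's pointer ends up just past the value that was read.
function readVariableLength($midi, &$p){
      $value = 0;
      do {
            $byte = $midi[$p];
            $p++;
            // shift the running total seven places, add the next seven bits
            $value = ($value << 7) | ($byte & 127);
      } while($byte & 128);   // MSB set means another byte follows
      return $value;
}

// 129, 52 is the value 180; the 100 after it is a separate one-byte value
$bytes = unpack("C*", pack("C*", 129, 52, 100));
$p = 1;   // unpack() arrays start at 1
$first = readVariableLength($bytes, $p);
$second = readVariableLength($bytes, $p);
echo $first . " " . $second . " " . $p;   // 180 100 4
```

Passing the pointer by reference means two calls in a row pick up consecutive values without any extra bookkeeping.
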
Using these functions, and the MIDI messaging tables from here to read back the right number of bytes in each "chunk", we managed to get a quick-and-dirty MIDI parser up and running quite quickly.

Now we've successfully parsed our midi file, we can shove the note data into a couple of arrays and create an interface to convert the note on/note off messages into LED on/LED off signals for our light-up guitar neck.
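
As a rough idea of what that conversion might involve, the sketch below turns a list of note events into a list of which notes are currently lit. The event layout (status byte, note number, velocity) follows the standard MIDI channel messages, where 0x9n is note on and 0x8n is note off, and a note on with a velocity of zero counts as a note off. The function name and the array structure are assumptions of ours, not the finished interface.

```php
<?php
// Hypothetical sketch: apply note on/off events to an LED state table.
// Each event is array(status, note, velocity).
function applyNoteEvents($events){
      $leds = array();   // note number => true while the LED is lit
      foreach($events as $event){
            list($status, $note, $velocity) = $event;
            $type = $status & 0xF0;   // top four bits are the message type
            if($type == 0x90 && $velocity > 0){
                  $leds[$note] = true;          // note on: LED on
            } elseif($type == 0x80 || $type == 0x90){
                  unset($leds[$note]);          // note off: LED off
            }
      }
      return array_keys($leds);
}

$lit = applyNoteEvents(array(
      array(0x90, 64, 100),   // note 64 on
      array(0x90, 67, 100),   // note 67 on
      array(0x80, 64, 0),     // note 64 off
));
echo implode(",", $lit);   // 67
```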