Saturday, 10 September 2016

OSC UDP TCP/IP and choosing the right technology

Back in the real world, we occasionally have to venture out from behind the nerd bench and making stuff for fun and enter into the commercial arena - it's what some of us have to do in order to pay the rent. It's also what some of us do for the sake of "staying relevant", keeping up with different technologies, and applying it in a real-world application.

The wealth of different technologies for controlling hardware has exploded in recent years. Maybe it's the IoT (internet of things) "branding" or maybe it's platforms like Arduino and Raspberry Pi that have led to a renewed interest in making things talk to each other - whatever it is, there's never been a better time for making hardware and integrating it with cool technology like smartphones, tablets, handsets and PC computers.

But with all this choice comes responsibility.
A responsibility to choose the most appropriate technology for the job in hand. Sometimes "most appropriate" simply means "one I know how to use". And for many things, that's fine. But something that has become apparent, as the IoT revolution has exploded, is that more and more people are selling their services as industrial/commercial "specialists" - and many haven't the first idea what they are doing!

Over the years we've built layers of abstraction into technology. A bit like cars. They're simpler to use and easier to work on (since everything comes in a single, encapsulated module that can just be swapped out). But with this ease of use comes a cost - when things go wrong, it's not so easy to fix! With cars you take it to the garage; a part may no longer be "fixable" as it might have been 30 years or more ago, so it's just thrown out and a replacement part fitted. That is the cost of simplification.
With technology the same thing seems to be happening. More and more, projects are built from encapsulated modules of code (or libraries if you like) which make everything nice and simple. Until it goes wrong. But, unlike a poorly performing car, it's not always so easy to throw out the "broken" module and replace it with a new one - sometimes it's not immediately obvious which module is broken. Sometimes there may not be a replacement library/module available. What then?

But worse than this, there seems to be a fundamental mis-understanding of which technology to use and when. At best it's a lack of awareness. A gap in knowledge or understanding of how things work "under the hood" meaning that the best-fitting solution is either not-known or overlooked. At worst, it's a lack of what we might call "giving a shit".

An example springs to mind - it's difficult to be too specific, since this is a real-life, commercial ongoing project. But the outline is something like this:

We've been asked to build some hardware that a user interacts with. Our hardware connects to a central controller which, when certain combinations of events occur, sends a message to a video player to play a specific sequence. During this time our hardware should remain silent. When the video has finished playing, our controller receives a message from the video player to say it's ready to accept incoming messages again.

That's the jist of things. There's a bit more to it, but those are the basics. We've been asked to send our messages to the video player using the OSC protocol. On the face of things, it seems ok. But when we look a little further into things, questions start to raise about whether or not this would be the most appropriate technology for the job....

Now first up, let's just say that TouchOSC is a great product.

It works across multiple platforms, iOS/Android etc. The interfaces look great and it has an inbuilt editor that lets you put sliders, dials, buttons and the like together really easily. It's dead easy to use and the end result looks pretty good (a bit same-y; once you've seen the TouchOSC app all interfaces tend to look the same, but it's very good for what it does).

But it our case, is it really the most suitable solution?
Firstly - and this is our biggest bugbear - TouchOSC uses UDP to send messages. It's a fire-and-forget protocol. You press a button, a message gets fired, and you just hope that it gets there.

UDP has it's uses. It's relatively fast (compared with TCP/IP). It's great for realtime gaming. And OSC has proven very popular amongst musicians and even lighting engineers. There's an old joke that nerds often tell to explain UDP:

"I'd tell you a joke about UDP. but I'm not sure if you'd get it".

Understand the joke and you pretty much understand UDP.
For some applications, UDP is a great fit. For games, for example. If you're continually updating a number of other players across a network of your character's location, UDP is perfect. It's fast, lightweight and does the job. It doesn't just address one target - it can be broadcast across an entire network easily. TCP/IP would be a poor substitute, with it's latency and error-checking and resend-on-fail, only being delivered to one recipient at a time, all adding to the time it takes to update some game-world co-ordinates. Repeat that for multiple players sharing a game, and you've got a pretty slow, unresponsive game. Compared to TCP/IP then UDP is fast.

For realtime audio, UDP is also pretty good - for the same reasons.
When you're whirling an onscreen rotary encoder, and the app is blarting out a stream of values as the encoder position changes, you want to be able to send a high volume of messages quickly. You want the realtime feedback (the change in volume) to be almost instantaneous, not laggy.

For UDP's strengths as a high-volume, high-speed transport layer, it also has one major weakness: you never know if the message was received. It's a bit like shouting into the darkness. There's no specific end-point, you just put a message onto a port number and anyone who is listening gets the message.

In contrast, sending data via TCP/IP is a bit more akin to using Royal Mail's parcel tracking service. It's sent to a specific address. It's slower. When the parcel arrives, confirmation is requested and sent back to the sender to acknowledge everything arrived as it should. Sending data via TCP/IP has an "overhead" but at least you know your data has reached its destination.

So why is using OSC and UDP such a pain for our particular application?

It's important to understand that we're not saying UDP sucks and TCP/IP is great. Just that there are better reasons for choosing one over another to match the requirements of the project.

For realtime gaming, UDP is ideal. If a packet of data doesn't arrive at the destination (if you should into the darkness but your voice is drowned out by a passing drunk singing "Danny Boy" walking past your door at 2am in the morning) it doesn't really matter. Because any second now, there'll be another packet of data coming along. And another. And at the receiving end, the missed data doesn't really cause a problem. If, for example, you're updating an on-screen avatar, whose location is being updated many many times a second, the odd dropped set of co-ordinates is hardly noticeable. The on-screen character might jump four pixels instead of two, or it might be so far in the distance as the jump in game-world position is unnoticeable.

Similarly if you're using an OSC controller over UDP to, say, make some lights go up and down, the odd missed packet of data doesn't really matter. If you turn a virtual rotary encoder and the lights don't immediately respond, because you're looking at them for feedback, you know to turn the wheel a little bit more - a whole heap of data gets blasted towards the lighting controller and it takes just one packet of data to update the light.

For high volume, high speed data, UDP is very useful.

However, for two-way communication, with long delays between messages, it's not quite so robust. In our situation, we're having to use a fire-and-forget "shouting into the darkness" communication method when what we really need to know is that our messages have been received (and, similarly, it's really important that we don't miss any messages coming back).

For this particular project, TCP/IP would be a much better communications protocol. We're sending low volume messages. Latency isn't a problem - if the response time between a user interacting with our hardware and the video starting to play was as much as a few hundred milliseconds, the end result would be no different!

But not being able to guarantee that messages are received could cause all kinds of headaches. Here's how the mode of operation should go.

  • User interacts with hardware
  • Message sent to video player
  • Hardware stops responding to user while video plays
  • Video player sends message after video ends
  • Hardware responds to user input again
  • Repeat ad infinitum

 It only takes one dropped message to cause the whole system to appear to have stopped working. If we reach a trigger point and our device firmware says "tell the video to play and stop responding to user input until we get a message back" what happens if our message to the video player fails to arrive?

Our hardware is now in a "suspended state" and no return message is ever going to be received to turn it back on (since the video player hasn't been told to play a video and thus send an "end of video" message back.)

Or maybe the video player does get the "play video" message. Everything is fine. We stop responding to user input, while the video is playing, as required. But what if the "end of video" message coming back is lost? Our hardware never wakes up again!

In either scenario, we eventually end up with hardware that appears to have stopped working. And all for the sake of choosing one communications protocol over another. When queried, we were told that "both systems have worked well for us and our OSC libraries use UDP so that's what we're using".

Of course, it would be possible to implement a feedback loop, from devices to video player and back again - send a message and if no response is received within a certain timeframe, resend it. But then acknowledgements coming back from the video player are broadcast across the entire network, to every connected device. So how do we know which acknowledgement is for which device? By implementing some kind of address system? So... pretty much recreating the TCP/IP protocol. But by shouting. And resending a lot of data. Suddenly our fast, zippy UDP transport layer is bogged down with noise and multiple packets of data......

This isn't an anti UDP rant or an anti-OCS moan, far from it.
But it's just to highlight - because there seem to be an awful lot of "computer people" out there unaware of what's going on under the hood - that sometimes pre-built libraries and handy, encapsulated modules of code, are not always the "best" fit.

Sometimes you actually need to understand what is going on. Especially if you're working in a commercial environment and billing a client for your time. Simply falling back on someone else's code and assuming everything is going to be alright because it worked for someone else in the past isn't good enough. Because they may have used it under a completely different set of circumstances, to achieve an entirely different result.

Please people. from one bunch of nerds to another, be mindful of what you're doing, why you're doing it, and choose the most appropriate technology - not just the quickest/cheapest/easiest to prototype with. That way we can all build an internet-of-things to be proud of, not just a buggers-muddle of poorly-designed devices all fighting for our bandwidth!