Following on from our recent Twitter app, we were asked whether it could be updated to track an object or face around the screen. We're still not entirely sure there's a practical use for face tracking with a tweet (although it was much easier to implement than simple object tracking), but perhaps some kind of app with speech bubbles might be on the cards?
The OpenCV asset for Unity is pretty spendy, but it offers a lot of potential for making cool interactive apps - not least AR (augmented reality). As we were looking for a "quick win", we took the hit in the pocket and bought it, to see what it could do. The AR stuff would just have to wait - for now we wanted to track a canvas object against a moving shape in a webcam feed.
As it happens, the OpenCV asset comes with a load of examples, demonstrating what you can use it for. We didn't manage to get a shape-tracking example working (the ultimate aim of this particular project) but we did get face tracking working quite quickly.
The only thing we had trouble with was translating the co-ordinates from the OpenCV routines into co-ordinates for our Unity Canvas objects. Our OpenCV object works using a "top-left" co-ordinate system, where y increases as you move down the image. But if we put our text box in the top left in Unity, the y co-ordinate system retains the real-world orientation, where an increase in Y causes the text box to go "up", not down.
We can use positive numbers for the y-axis to position our text box, by setting the position of the box to "bottom-left".
But this means that all our y-values are relative to the bottom-left corner, not the top-left.
So we either need to use a fixed-height canvas, or know the canvas height so we can calculate the relative y-position for our text box. In the end we went for "go from the centre" and changed our OpenCV results to give the position of the centre of the found object (a face) relative to the middle of the image, instead of the top-left corner.
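The arithmetic behind "go from the centre" is small enough to sketch. This is not the Unity script itself, just a rough Python illustration; the function names and the 640x480 image size are made up for the example, and it assumes the detector hands back a rectangle as x, y, width and height measured from the top-left of the image.

```python
def rect_centre(x, y, w, h):
    """Centre of a detected rectangle, still in top-left image co-ordinates."""
    return (x + w / 2.0, y + h / 2.0)


def top_left_to_centre(px, py, image_width, image_height):
    """Convert a point measured from the top-left corner (y grows downwards)
    into a point measured from the middle of the image (y grows upwards)."""
    cx = px - image_width / 2.0
    cy = image_height / 2.0 - py  # flip the y-axis so "up" is positive
    return (cx, cy)


# Hypothetical face rectangle in a 640x480 webcam frame
face_x, face_y = rect_centre(100, 50, 200, 200)
offset = top_left_to_centre(face_x, face_y, 640, 480)
print(offset)
```

Because the result is an offset from the middle, it can be applied to a centre-anchored text box without knowing where the top or bottom edge of the canvas is; if the canvas is a different size from the webcam image, the offset would still need scaling by the ratio of the two.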
The code was surprisingly simple to hack into the existing facial recognition examples.
Here's the result:
You can just about see the red dot drawn over the centre point of the detected face in the webcam stream. The text "tweens" to this point every 0.5 seconds, hence there's sometimes a bit of lag. The latency can be removed by making the text jump straight to the new location, but this can make the text appear a bit "jumpy" if the centre point moves by a pixel or two, even when the target is perfectly still; tweening gave us the best compromise between delay and smoothness.
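For anyone wondering what the tween amounts to, here's a minimal sketch in Python, assuming a plain linear tween (ours may well use a different easing curve, and in Unity a tweening library would normally do this for you). The function name and the sample points are purely illustrative.

```python
def tween_position(start, target, elapsed, duration=0.5):
    """Linearly interpolate from start to target over `duration` seconds.

    `start` and `target` are (x, y) tuples; `elapsed` is the time in seconds
    since the tween began. Values outside 0..duration are clamped.
    """
    t = max(0.0, min(1.0, elapsed / duration))
    x = start[0] + (target[0] - start[0]) * t
    y = start[1] + (target[1] - start[1]) * t
    return (x, y)


# Half-way through a 0.5 second tween the text is half-way to the face centre
print(tween_position((0.0, 0.0), (10.0, 20.0), 0.25))
```

A one- or two-pixel wobble in the target only shifts the end point of the next tween slightly, which is why the text looks steadier than it does when it jumps straight to each new position.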