All posts filed under “Computer Vision”

Inside Project Natal

Project Natal
(image Popular Science)

Previously on Pixelsumo I posted a closer look at Project Natal, covering its hardware status & origins. As that post suggests, the key to its technical success is bringing complex algorithms that estimate body joint positions to a mass consumer level. Getting this right in every home & lighting condition is no easy task.

The processing of the depth image was going to be on a chip in the camera, but it’s now been reported that the processing will be done in software, to keep the costs of the hardware down.

Recently, many more behind-the-scenes videos and press reports have revealed insights into how the body processing works.

Project Natal
This video shows some of the body estimation processing in the making.

Originally it was thought that Natal might use time of flight (like the 3DV ZCam) to measure the time it takes for infrared light to bounce from objects.

“Frances MacDougall, chief technology officer at GestureTek, said his company was also working on Project Natal. Asked why there were so many vendors on Natal, he said that Microsoft will be using a low-cost 3-D camera from PrimeSense. But it purchased 3DV because it had a strong patent portfolio. And GestureTek itself is providing a software layer that helps interpret the data coming in from the 3-D camera and makes it useful for the game machine”. (source Venture Beat)

Instead of time of flight, the PrimeSense camera projects a pattern (like a barcode) of infrared light, with the sensor reading back this pattern and computing the depth map (which they are calling Light Coding).

This video (linked above) shows how impressive the tracking is, accurately calculating body part positions from a side angle, even when joints are occluded from view.

Project Natal
Popular Science has a great article about how Microsoft are training the brain behind the pose estimation & how the algorithms work. I really hope to get my hands on this one day…

“Step 2: Then the brain guesses which parts of your body are which. It does this based on all of its experience with body poses—the experience described above. Depending on how similar your pose is to things it’s seen before, Natal can be more or less confident of its guesses. In the color-coded person above [bottom center], the darkness, lightness, and size of different squares represent how certain Natal is that it knows what body-part that area belongs to. (For example, the three large red squares indicate that it’s highly probable that those parts are “left shoulder,” “left elbow” and “left knee”; as the pixels become smaller and muddier in color, such as the grayish pixels around the hands, that’s an indication that Natal is hedging its bets and isn’t very sure of its identity.)”

Hand from Above

Hand from Above

I have just posted some documentation of my latest work, Hand from Above.

Hand From Above encourages us to question our normal routine when we often find ourselves rushing from one destination to another. Inspired by Land of the Giants and Goliath, we are reminded of mythical stories by mischievously unleashing a giant hand from the BBC Big Screen. Passers-by will be playfully transformed. What if humans weren’t on top of the food chain?

Unsuspecting pedestrians will be tickled, stretched, flicked or removed entirely in real-time by a giant deity.

Hand from Above is a joint co-commission between FACT: Foundation for Art & Creative Technology and Liverpool City Council for BBC Big Screen Liverpool and the Live Sites Network. It premiered during the inaugural Abandon Normal Devices Festival.

Written using openFrameworks & OpenCV.

Watch video

A closer look at Project Natal

I pretty much always use cameras and computer vision in my work, so I am always keeping a close eye on game technology developments in this area. Once these devices come out, we are then able to hack and use them for our own artworks.

I wanted to take a closer look at Natal (Sony’s motion controller and Ubisoft’s ‘Your Body’ to come another time), in regards to its origins, what’s real, what is marketing and what’s exciting.

In case someone from Microsoft is reading this post: I am not being negative. I am super impressed by the software behind it and the accuracy of the tracking algorithms, and I am looking forward to getting my hands on it as a consumer & hopefully a developer.

If you haven’t seen Project Natal, read here and go watch this video
project natal
First of all, not everything shown in the video is real. There are a few instances where the tracking wouldn’t work (hands over the skateboard, dad getting in the way of his daughter etc). The clue is the text shown clearly at the beginning: “Product vision: actual features and functionality may vary”.

This video shows the live demo on stage at E3…
project natal
Very good estimation of body points, and it works very fast too. Microsoft Research, you have some very clever people.

This is a concept of what the hardware *could* look like come release…
project natal
(it was white before E3)

But right now it doesn’t (at least publicly). The camera has an RGB sensor just like your webcam, but also depth perception, in the form of an infrared sensor. Infrared light is projected out from the hardware; it hits the player and the room and is recorded by an infrared imaging sensor. This is called time of flight. The software gets a grayscale depth image, with the brightest pixels being closest to the camera, which is then analysed by the tracking software.

I was wondering: where are the infrared emitters on that hardware image above? The green Xbox-coloured circle on the left?

This is a closer look at the live demo. It looks like there are probably three devices: an RGB camera, an infrared emitter and an infrared sensor…
project natal

Engadget said
“The first thing to note is that Microsoft is very protective of the actual technology right now, so they weren’t letting us film or photograph any of the box itself, though what they had was an extremely rough version of what the device will look like (not at all like the press shot above). It consisted of a small, black box aimed out into the room — about the size of a Roku Player — with sensors along the front. It almost looked a bit like a mid-size (between pico and full size) projector.”

That means that the Milo interaction demo shown here from Lionhead is pre-recorded, naughty naughty :) (p.s. funny parody video)
project natal
Of course it’s real though; see the hands-on write-ups from Eurogamer, Kotaku and Wired.

“The Milo demo was partially being manipulated by a developer who was sitting nearby, and I couldn’t tell if he was merely calibrating the game or how much he was pulling its strings.” – kotaku

These videos show the camera hardware blurred out even :)

OK, so the hardware isn’t finished, fair enough. The magic really is in the software. How well will the tracking work under different lighting conditions in my house?

It’s all about getting a good clean depth image, unlike normal webcam games that are based solely on movement detection or background subtraction (a nightmare). So how big will the infrared emitter need to be?

This is the zcam by 3DV Systems…. (pic shows depth image and hardware)
3dv zcam
Watch the video to see how it works.

Microsoft bought 3DV Systems. After seeing the demos, I instantly thought they were going to use this camera and their own SDK; however, Eurogamer reports:

“Aaron Greenberg was even more direct. Asked whether Natal was derived from 3DV technology, he told Eurogamer: ‘No, we built this in house.’”

“Kim admitted ‘it’s a combination of partners and our own software’, and some have theorised that acquisitions like 3DV’s were designed to insure the company against similar patents. ‘You have to be very aware,’ Kim said. ‘We want to ensure that we have great intellectual property protection. You have to have a strong legal approach, and this is not easy stuff. It has to be all buttoned up, legally. We have had a very concerted focus on this.’ ”

An interview on Venture Beat…

“VB: Some people are very curious about the patents in the gesture control technology. Is there freedom to innovate here, or do you have to be very aware of the patents out there?

SK: You have to be very aware. We want to ensure that we have great intellectual property protection. You have to have a strong legal approach, and this is not easy stuff. It has to be all buttoned up, legally. We have had a very concerted focus on this.”

Eurogamer reports today… “All of the key image-processing is done by Natal’s in-built silicon, leaving the Xbox 360 free to power the game itself”.

There are rumours that Microsoft are working with & licensing patents from PrimeSense, another Israeli company like 3DV making such technology. This image shows their hardware, which the Natal model looks very similar to…
prime sense

Funnily enough, both Joystiq and N4G reported in 2006 that PrimeSense might have been working with Sony on EyeToy for the PS3 for 3D depth sensing :)

I don’t know if that is true or not, but there is a very detailed video on YouTube of the history of EyeToy, with tech demos shown to university students, along with demos Sony were working on using the 3DV ZCam, shown below (one hour into the video) with full skeletal estimation and ball-batting games for PS2.

It’s a competitive market out there.

Er, anyway. This video from Eurogamer came out today, a hands-on with Natal. No camera, but notice the large light source in the corner (*probably* a normal light with a filter to only let through the infrared spectrum, maybe to help ensure good coverage for the press).
project natal

I feel the hardware or who got it to release first isn’t as important as the software tracking. This is fairly interesting…

“Tsunoda made the point that Natal will continue to work even if someone walks in front of a player because it knows how the human body works. So, if a player had his or her arms blocked, but Natal’s cameras could still see part of their arm, it can fill in the rest based on algorithms that tell it how that arm should look.”

This splat video is great. At 3:05 it shows how the depth image threshold adjusts to a second person entering the space. See at 3:31 how he adjusts his mic and the avatar does the same :)
project natal

For me, Johnny Chung Lee has hit the nail on the head here:

“The human tracking algorithms that the teams have developed are well ahead of the state of the art in computer vision in this domain. The sophistication and performance of the algorithms rival or exceed anything that I’ve seen in academic research, never mind a consumer product.” (source)

It’s exciting times ahead for consumer-level computer vision; I can’t wait.

p.s. a small disclaimer: everything posted above is publicly available via Google, and is just speculation.

Audience (updated)


(Work in progress image)

“We are in the finishing stages of building Audience, a new installation that will premiere on September 12th, 2008 at Wayne McGregor’s Deloitte Ignite Festival at the Royal Opera House in London. As a complement to the installation, McGregor has choreographed a short piece. For the development of the interactive aspects of Audience rAndom International are working with designer Chris O’Shea.

The Deloitte Ignite Festival will feature new works by Blast Theory, Julian Opie, Scanner and others. It is free but visitors will need to register for tickets (click buy). The festival runs from Friday, September 12th 10AM through to Sunday, September 14th, 5PM. We are looking forward to seeing you there”.

Some documentation now online –


July Digest

I am finding less time to blog these days, and my list of things to blog keeps getting bigger and bigger. From now on I will do a Pixelsumo digest at the end of each month, containing projects that didn’t make it into full posts in time.

So to start off, here are projects I wish I had written about recently…

Image Fulgurator
The Image Fulgurator by Julius von Bismarck is a device for physically manipulating photographs. It intervenes when a photo is being taken, without the photographer being able to detect anything. The manipulation is only visible on the photo afterwards.

related project : the sms guerrilla projector

Playing the Building
Playing the Building is an installation by David Byrne, on show in New York until 24th August; go and see it. (thanks Andy)

Primal Source
Following on from Evoke, Usman Haque has created Primal Source for the Glow festival in California. “making use of a large-scale outdoor waterscreen projection system, Primal Source will appear like a mirage, glowing with colours and ebullient patterns generated by the competing or collaborative voices, music and screams of people nearby”. Coverage and video on Notcot

Golan Levin has created Double-Taker (Snout)… “orients itself towards passers-by, tracking their bodies and suggesting an intelligent awareness of their activities. The goal of this kinetic system is to perform convincing “double-takes” at its visitors, in which the sculpture appears to be continually surprised by the presence of its own viewers — communicating, without words, that there is something uniquely surprising about each of us.”

Jason Bruges Studio
Jason Bruges Studio have created Applause, an array of computer controlled flags at the Goodwood Festival of Speed for Veuve Clicquot. The flags rotate to point at passing race cars or to watch polo matches. Concept render videos shown on Dezeen, documentation of project on main JBS site.

Karsten Schmidt (aka PostSpectacular) has been super busy with his Processing based works. These include generative book covers for Faber print on demand, a cover for Print magazine using genetic processes & 3d model printing, as well as a fiducial generator for use with Reactivision tracking software.

Snog is a new frozen yogurt shop in London. The branding concept & design by ico design, architecture and lighting design (including a led video display ceiling with moving clouds) by Cinimod Studio.

Theodore Watson, Emily Gobeille and Meredith Dittmar have a new show at the Riviera gallery in New York until 10th August. Well worth a visit.

Golan Levin at Bitforms


If you can get there, you really should go to the first solo show of Golan Levin in New York, at the Bitforms gallery. Runs 30th Nov 2007 – 12 Jan 2008. Artist talk 12 Jan 2008.

Golan will be showing many new works, some of them for the first time in public.

Opto-isolator (shown above) asks “What if artworks could know how we were looking at them? And, given this knowledge, how might they respond to us?” The sculpture presents a solitary mechatronic blinking eye, at human scale, which responds to the gaze of visitors with a variety of psychosocial eye-contact behaviors that are at once familiar and unnerving. Among other forms of feedback, Opto-isolator looks its viewer directly in the eye; intently studies its viewer’s face; looks away coyly if it is stared at for too long; and blinks precisely one second after its viewer blinks. See behind the scenes.

Another new work is Eyecode (following the blinking theme), as well as Ghost Pole Propagator, Interstitial Fragment Processor and Refacer (a new Tmema project with Zach Lieberman).

I am very jealous of anyone who can make it. If you do, post your comments about the show.




[update - video now online]

One of three digital commissions for Picture House at Belsay Hall, this piece was curated by Juha Huuskonen (Pixelache) for Dott07. Read about the rest of Picture House here.

Responding to the old Estate Office space within the 19th century hall, United Visual Artists created Hereafter, a history mirror. It uses a hidden high-speed video camera, a gigantic terabyte hard drive, and a large flat video display built into a frame to create the illusion of a traditional mirror.

The high-speed camera enables playback in super slow motion. Through a combination of recording and playback, you are able to see yourself in slow motion and also in semi-blurred realtime. It’s a really great effect. You can interact with the history of yourself, creating little animations between the two of you, for example.

It will randomly play back slow motion footage taken over the period of the exhibition, spanning a number of months. This creates an unusual ghostly form, making you look behind to see if someone is really stood behind you. It also makes you aware that anything you do will be left for future visitors, so I started doing things that might get a response from people later on. Along with showing visitors from the past, UVA also included objects from the room’s former use, such as a desk and chair, a clock, some flowers on the floor and, every so often, a chicken. This really gives a sense of looking into the past, and you can begin to interact with objects that don’t even exist.

The super slow motion was beautiful: seeing changes to muscles in the face, or cloth slowly falling back into place. I spent an hour with the piece, trying out various dances and movements, interacting with people from the past and leaving future messages. It’s really hard to explain, so I will update the post when video documentation appears.

View my photos.
Video from UVA coming soon.

If you are interested in the use of slow motion, look at the work of Bill Viola. I am a big fan of his work, in particular ‘Five Angels for the Millennium’.

As always of course, there are also slow motion clips on Youtube of things exploding (unrelated to the above).

Big Shadow

Big Shadow

Blue Dragon is a role-playing game for Xbox 360, from legendary Final Fantasy creator Hironobu Sakaguchi. In this game, your character’s shadow grows into a blue dragon during turn-based fight sequences.

Big Shadow was an interactive wall, OOH (out of home) advertising, that promoted the core principle of the game by engaging people with their shadows. Located in the centre of Shibuya, Tokyo, it projected participants’ shadows 40 metres up on to a building. By waving your hands in the air, it would randomly stretch your shadow upwards and turn it into a dragon, then play a predefined animation. The dragon could also appear as a minotaur or phoenix. Other (unrelated) animations included being squashed by a large foot, or water being poured onto participants from a cup.

Watch the video

As well as live real shadows, virtual participants could log on to the site and take it in turns to augment a virtual shadow into the real space. A live video feed showed the street projection. A player would control a fake person’s shadow, making it dance and turn into the creatures, which would be added to the street projection.


Other shadow related posts:
Shadow Monsters, Shadow Story, Takashi’s Seasons.

Big Shadow

openFrameworks at Ars Electronica


Most of my time at Ars Electronica was spent in the Electrolobby, taking part in the openFrameworks workshop, run by Zach Lieberman & Henrik Wrangel.

“OpenFrameWorks, is a new open source, cross platform, c++ library, which was designed by Zachary Lieberman (US) to make programming in c++ for students accessible and easy. In it, the developers wrap several different libraries like opengl for graphics, quicktime for movie playing and capturing, and free type for font rendering, into a convenient package in order to create a simple, intuitive framework for creating projects using c++. It is designed to work in freely available compilers, and will run under any of the current operating systems”.

I am new to C++, but have used many computer vision applications in the past, from Director Xtras to Eyesweb, Jitter etc. I was keen to understand the depths of computer vision, exactly how these applications analysed video, and to create my own code for better control.

I had only attended the workshop for two days, yet openFrameworks was quite easy to use once you get started and get your head around the C++ syntax & structures. The image above shows the debug mode in the program I created. My aim was to study motion and work out the direction of movement.

The lower left square shows the live video, and to the right of this a reference frame for background subtraction. I then created a difference image (what has changed between the background and the live video), keeping only movement above a threshold. The top left image is a motion history, fading out over time. By analysing this data, it is possible to study every pixel (and those around it) and work out the direction of movement using a gradient. The top right is a vector field, showing the direction and magnitude of movement at each pixel, although my volunteer is standing quite still here. From this I made a simple demo of falling snow particles that got displaced by the vector data.

Overall openFrameworks was easy to use, and even in its early state, I can see the potential for it to become what Processing is to Java: an easier entry point for those wishing to learn C++. No website for OFW just yet, but I will update this post when there is one.

OpenFrameWorks Demo


More from Ars Electronica
Electrolobby workshop photos.