I pretty much always use cameras and computer vision in my work, so I keep a close eye on game technology developments in this area. Once these devices come out, we can hack them and use them in our own artworks.
I wanted to take a closer look at Natal (Sony's controller and Ubisoft's ‘Your Body’ coming another time), with regard to its origins: what's real, what's marketing and what's exciting.
In case someone from Microsoft is reading this post: I am not being negative. I am super impressed by the software behind it and the accuracy of the tracking algorithms, and I am looking forward to getting my hands on it as a consumer and, hopefully, a developer.
If you haven’t seen Project Natal, read here and go watch this video…
First of all, not everything shown in the video is real. There are a few instances where the tracking wouldn't work (hands over the skateboard, dad getting in the way of his daughter, etc.). The clue is the text shown clearly at the beginning: “Product vision: actual features and functionality may vary”.
This video shows the live demo on stage at E3…
Very good estimation of body points, and it works very fast too. Microsoft Research, you have some very clever people.
This is a concept of what the hardware *could* look like come release…
(it was white before E3)
But right now it doesn't (at least not publicly). The camera has an RGB sensor just like your webcam, but it also perceives depth using infrared. Infrared light is projected out from the hardware, hits the player and the room, and is recorded by an infrared imaging sensor; this is called time-of-flight sensing. The software gets a grayscale depth image, with the brightest pixels being closest to the camera, which is then analyzed by the tracking software.
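To make that concrete, here is a minimal sketch (my own, not Microsoft's pipeline) of pulling a player silhouette out of that kind of grayscale depth image with OpenCV; the filename and threshold value are purely illustrative.

```python
import cv2
import numpy as np

# Hypothetical 8-bit depth frame: brighter pixels = closer to the camera.
depth = cv2.imread("depth_frame.png", cv2.IMREAD_GRAYSCALE)

# Keep only pixels closer than some cutoff (value picked by eye, not from any real device).
NEAR_CUTOFF = 120
player_mask = (depth > NEAR_CUTOFF).astype(np.uint8) * 255

# Clean up speckle noise and grab the largest blob as "the player".
player_mask = cv2.morphologyEx(player_mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))
contours, _ = cv2.findContours(player_mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
if contours:
    player = max(contours, key=cv2.contourArea)
    x, y, w, h = cv2.boundingRect(player)
    print("player bounding box:", x, y, w, h)
```

On a plain webcam that trick is impossible, because pixel brightness tells you nothing about distance.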
I was wondering: where are the infrared emitters on that hardware image above? The green Xbox-coloured circle on the left?
This is a closer look at the live demo. It looks like probably three devices: an RGB camera, an infrared emitter and an infrared sensor…
“The first thing to note is that Microsoft is very protective of the actual technology right now, so they weren’t letting us film or photograph any of the box itself, though what they had was an extremely rough version of what the device will look like (not at all like the press shot above). It consisted of a small, black box aimed out into the room — about the size of a Roku Player — with sensors along the front. It almost looked a bit like a mid-size (between pico and full size) projector.”
That means the Milo interaction demo shown here from Lionhead is pre-recorded, naughty naughty :) (p.s. funny parody video)
Of course it's real though; there are hands-on write-ups from Eurogamer, Kotaku and Wired.
“The Milo demo was partially being manipulated by a developer who was sitting nearby, and I couldn’t tell if he was merely calibrating the game or how much he was pulling its strings.” – kotaku
These videos even show the camera hardware blurred out :)
OK, so the hardware isn't finished, fair enough. The magic really is in the software. How well will the tracking work under different lighting conditions in my house?
It's all about getting a good, clean depth image, unlike normal webcam games that rely solely on movement detection or background subtraction (a nightmare — see the sketch below). So how big will the infrared emitter need to be?
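For contrast, this is roughly what the “nightmare” webcam route looks like: a hedged sketch of plain background subtraction with OpenCV, which happily mistakes a lighting change, a shadow or a nudged chair for a player.

```python
import cv2

cap = cv2.VideoCapture(0)                           # any ordinary webcam
subtractor = cv2.createBackgroundSubtractorMOG2()   # learns the background over time

while True:
    ok, frame = cap.read()
    if not ok:
        break
    # Foreground mask: anything that differs from the learned background.
    # Change the room lighting and suddenly the whole frame is "foreground".
    fg_mask = subtractor.apply(frame)
    cv2.imshow("foreground", fg_mask)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```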
This is the ZCam by 3DV Systems… (pic shows depth image and hardware)
Watch the video to see how it works.
Microsoft bought 3DV Systems. After seeing the demos, I instantly thought they were going to use this camera and their own SDK; however, Eurogamer reports:
“Aaron Greenberg was even more direct. Asked whether Natal was derived from 3DV technology, he told Eurogamer: ‘No, we built this in house.’”
“Kim admitted ‘it’s a combination of partners and our own software’, and some have theorised that acquisitions like 3DV’s were designed to insure the company against similar patents. ‘You have to be very aware,’ Kim said. ‘We want to ensure that we have great intellectual property protection. You have to have a strong legal approach, and this is not easy stuff. It has to be all buttoned up, legally. We have had a very concerted focus on this.’ ”
An interview on Venture Beat…
“VB: Some people are very curious about the patents in the gesture control technology. Is there freedom to innovate here, or do you have to be very aware of the patents out there?
SK: You have to be very aware. We want to ensure that we have great intellectual property protection. You have to have a strong legal approach, and this is not easy stuff. It has to be all buttoned up, legally. We have had a very concerted focus on this.”
Eurogamer reports today… “All of the key image-processing is done by Natal’s in-built silicon, leaving the Xbox 360 free to power the game itself”.
There are rumours that Microsoft are working with and licensing patents from Prime Sense, another Israeli company, like 3DV, making this kind of technology. This image shows their hardware, which the Natal model looks very similar to…
Funnily enough, both Joystiq and N4G reported in 2006 that Prime Sense might have been working with Sony on EyeToy 3D depth sensing for the PS3 :)
I don't know if that is true or not, but there is a very detailed video on YouTube covering the history of EyeToy, with tech demos shown to university students, along with demos Sony were working on using the 3DV ZCam, shown below (one hour into the video), with full skeletal estimation and ball-batting games for the PS2.
It's a competitive market out there.
Er, anyway. This video from Eurogamer came out today, a hands-on with Natal. No camera, but notice the large light source in the corner (*probably* a normal light with a filter to only let through the infrared spectrum, maybe to help ensure good coverage for the press).
I feel the hardware, or who got it to release first, isn't as important as the tracking software. This is fairly interesting…
“Tsunoda made the point that Natal will continue to work even if someone walks in front of a player because it knows how the human body works. So, if a player had his or her arms blocked, but Natal’s cameras could still see part of their arm, it can fill in the rest based on algorithms that tell it how that arm should look.”
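To illustrate the idea (and only the idea — this is not Natal's actual algorithm), here is a toy 2D sketch that guesses an occluded elbow from the visible shoulder and hand, using nothing more than assumed bone lengths:

```python
import numpy as np

def estimate_elbow(shoulder, hand, upper_arm=0.30, forearm=0.28, bend_sign=1.0):
    """Toy 2D example: guess a hidden elbow from visible shoulder and hand,
    assuming known bone lengths (metres). An illustration of 'filling in'
    an occluded joint from a body model, not Natal's real algorithm."""
    shoulder, hand = np.asarray(shoulder, float), np.asarray(hand, float)
    d = np.linalg.norm(hand - shoulder)
    d = min(d, upper_arm + forearm - 1e-6)       # clamp: the arm can't over-stretch
    # Law of cosines: distance from the shoulder to the elbow's projection on the shoulder->hand line.
    a = (upper_arm**2 - forearm**2 + d**2) / (2 * d)
    h = np.sqrt(max(upper_arm**2 - a**2, 0.0))   # elbow's offset from that line
    direction = (hand - shoulder) / d
    perpendicular = np.array([-direction[1], direction[0]])
    return shoulder + a * direction + bend_sign * h * perpendicular

print(estimate_elbow([0.0, 0.0], [0.45, 0.1]))
```

The real system obviously works in 3D with a full skeleton and learned priors, but the principle is the same: if you know how a body is put together, a partially visible limb is enough to reconstruct the rest.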
This splat video is great. At 3:05 it shows how the depth image threshold adjusts to a second person entering the space. At 3:31 he adjusts his mic and the avatar does the same :)
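Purely as a guess at how that adaptive threshold might behave, here is a rough sketch; the function name, the 8-bit brighter-is-closer convention and the hard-coded values are all mine, not anything confirmed about Natal.

```python
import cv2
import numpy as np

def far_threshold_for_players(depth, expected_players=2, min_area=5000, step=10):
    """Sweep the far threshold backwards until every expected person-sized blob
    is inside the range, so a second player standing further away still counts.
    Assumes an 8-bit depth image where brighter = closer (illustrative only)."""
    far = 200                                    # start close to the camera
    while far > 0:
        mask = (depth > far).astype(np.uint8)    # everything nearer than 'far'
        n, labels, stats, _ = cv2.connectedComponentsWithStats(mask)
        big_blobs = int(np.sum(stats[1:, cv2.CC_STAT_AREA] > min_area))
        if big_blobs >= expected_players:        # all players inside the range
            return far
        far -= step                              # push the far plane back and retry
    return 0

depth = cv2.imread("depth_frame.png", cv2.IMREAD_GRAYSCALE)
print("far threshold:", far_threshold_for_players(depth))
```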
For me, Johnny Chung Lee has hit the nail on the head here…
“The human tracking algorithms that the teams have developed are well ahead of the state of the art in computer vision in this domain. The sophistication and performance of the algorithms rival or exceed anything that I’ve seen in academic research, never mind a consumer product.” (source)
It's exciting times ahead for consumer-level computer vision; I can't wait.
P.S. A small disclaimer: everything posted above is publicly available via Google, and is just speculation.