Studio 3

Pointclouds with Kinect

1. Download SimpleOpenNI:

https://github.com/totovr/SimpleOpenNI

2. Save it in the libraries folder inside your Processing sketchbook folder

3. Import it inside Processing

Downloading the Kinect driver on Windows
The openkinectforprocessing library does not work here.
Use SimpleOpenNI instead.

Code for Kinect camera:



The first thing we're going to do is plug in the Kinects and try out the Processing code to get and display images from the Kinect.

In Processing, first you need to import the library by typing:

import SimpleOpenNI.*;

Next we'll need to declare some variables. The main one is a SimpleOpenNI object, which is how the code talks to the camera:

SimpleOpenNI  kinect;

Now it's time to jump into the Processing setup function to initialize everything. The Kinect camera resolution is 640x480, so we're going to tell the function to display at that size. Then we're going to tell it we've got a Kinect, and to stop if it can't find the camera. Then we'll enable the depth (IR) camera and the RGB webcam inside the Kinect, and tell Processing to mirror the image it shows us, because it's incredibly freaky to look at webcam output that doesn't behave like a mirror. That's too much machine POV for comfort.

void setup()
{
  size(640, 480);
  kinect = new SimpleOpenNI(this);
  if (kinect.isInit() == false)
  {
    println("Can't initialize SimpleOpenNI. Please check the camera.");
    exit();
    return;
  }

  kinect.setMirror(true);
  kinect.enableDepth();
  kinect.enableRGB();
}

So far, so good. Now we need a draw function. So let's start by looking at the RGB image from the webcam.

void draw()
{
  kinect.update();
  background(200, 0, 0);

  image(kinect.rgbImage(), 0, 0);
}

It's nice to be able to take screenshots of whatever you've got, so let's add a function to make an output image every time you click your mouse on the image:

void mousePressed() {
  save("kinectImage.png");
}

If you want to find where it's putting that image, go to the Sketch menu in Processing, and click on Open Sketch Folder. Note that the filename is going to be the same every time, so if you click your mouse twice, the second image will overwrite the first. So if you've captured something so awesome you can't live without it, move it somewhere else before you click again.
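If you'd rather not worry about overwriting, one small variation (just a sketch, using Processing's built-in frameCount variable and nf() padding function) is to number each screenshot:

void mousePressed() {
  // frameCount is Processing's running frame counter, so every click gets a unique filename
  save("kinectImage-" + nf(frameCount, 6) + ".png");
}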

So let's run that. You should be looking at the webcam image from the Kinect. Great, but not so exciting, right? Because that's just a webcam, and the exciting bit of the Kinect is the depth sensor. So let's look at the IR camera. Remove this line:

  image(kinect.rgbImage(), 0, 0);

And replace it with this line:

  image(kinect.depthImage(), 0, 0);

Then run that again. There's your depth camera output. It's kind of abstract, so let's take a look at what's really going on. The Kinect's IR projector sends out a pattern of infrared light, and the IR sensor picks up the reflections: basically, little points of reflected light. So let's try a different view of that depth data. Remove this line:

  image(kinect.depthImage(), 0, 0);

We'll build its replacement in a moment, but first let's make our view bigger. Up top, with your import statements, add this:

import processing.opengl.*;

Next, go up to the setup where you set your size and remove that line. Replace it with this:

  size(640,480,OPENGL);

You'll notice an extra argument there - the "opengl" tells the image window we want it to use OpenGL to draw with.

Next, set your background to black (0, 0, 0) instead of red, and below the background add this code to position the 3D view and set the drawing color:

translate(width/2, height/2, -250);
rotateX(radians(180));
translate(0,0,500);
stroke(255);

Okay, so now for some fun programming stuff. We want to see the actual little points that the IR camera sees. That means we're not viewing images anymore, but thousands of random points, so we need to create a data structure that'll hold them. Jump back down to your draw() function, below the call to kinect.update() and the background call, and let's get those points.

We're going to do it by creating an array, which is basically an indexed list, where each element can be retrieved by calling its index number in the list. (Remember that programmers start counting at 0, so if you want the first item in an array, it's #0, not #1. And if you want the last item in a ten-item array, it's #9, not #10.)
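If arrays are new to you, here's a tiny standalone illustration of that indexing - the numbers are made up, it's just to show the idea:

int[] sizes = {10, 20, 30, 40};    // an array holding four items
println(sizes.length);             // prints 4
println(sizes[0]);                 // first item: prints 10
println(sizes[sizes.length - 1]);  // last item (index 3): prints 40

And here's the array we actually want - every depth point the Kinect can see: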

PVector[] depthPoints = kinect.depthMapRealWorld();

PVector[] means you're creating an array of PVector objects - the square brackets are what tells you that's an array. And instead of depthImage(), you're asking the Kinect for depthMapRealWorld(), which gives you the actual 3D points it sees. That's cool, but how is your image window going to display that? PVectors aren't images, so we've got to tell it how to draw them. Under that line, add this:

for (int i = 0; i < depthPoints.length; i++){
   PVector currentPoint = depthPoints[i];
   point(currentPoint.x, currentPoint.y, currentPoint.z);
}

Here's where it matters that programmers start counting at 0, and that the index of the last item of an array is one less than the number of items in that array. What you're doing with that "for" loop is taking your array of points and saying, "Start at index zero and keep going as long as the index is less than the number of items in the array, and run these two lines of code on every item at the indices in between". So basically, for every item in the array, this code says "Call that the current point, grab its XYZ coordinates, and draw a point there".

So run that, and see what you get. It's going to run slowly, and it's kind of a small image, so we're going to improve that in the next steps. Let's clean it up a little. First, in your setup, let's go for a bigger image:

  size(1024,768,OPENGL);

Next, we're going to make the drawing a little less intensive, so the frame rate can pick up a bit. Under the stroke call, add this:

 int skip = 5;

Then, in your "for" loop, change "i++" to "i+=skip". Instead of adding 1 to i so it does every point, it's going to add 5, and do only every 5th point. Now run it again. And remember, if it's great art, you can click your mouse on your image window and get an output image.
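For reference, after that change the loop should read:

for (int i = 0; i < depthPoints.length; i += skip) {
   PVector currentPoint = depthPoints[i];
   point(currentPoint.x, currentPoint.y, currentPoint.z);
}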

So now we're going to put the two cameras' images together. This is a slightly tricky bit - if you inspect your Kinect, it's basically two cameras sitting side by side, so their images are from different points of view. They're going to be off by a couple of inches and a little bit of skew. You can see that if we add in the RGB image. First, under the kinect.update() call, let's grab the RGB image like this:

PImage rgbImage = kinect.rgbImage();

Next, we're going to use the RGB data to change the color of the points being drawn from the real-world IR points. In that for loop you wrote to draw the points, after you set currentPoint to the current point in the array, and before you tell the image window to draw it, let's change the color:

   stroke(rgbImage.pixels[i]);
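Once that line is in, the body of the loop should look something like this (still skipping points for now):

for (int i = 0; i < depthPoints.length; i += skip) {
   PVector currentPoint = depthPoints[i];
   stroke(rgbImage.pixels[i]);   // color this point with the matching RGB camera pixel
   point(currentPoint.x, currentPoint.y, currentPoint.z);
}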

You'll also want to change the for loop's "i+=skip" back to "i++" so you get a nice, dense image. Then run that. You'll see that the colors don't match up too well to the shapes that the depth camera is defining, because of the camera offset. That's no good. Fortunately, Kinects have a function that tries to mathematically compensate for the different POVs of the two cameras to give you images that can be overlaid on each other. This is called registering the two images. (If you've ever done color block printing in the art studio, when you line up two woodblocks to print colors, you're registering the image.) So let's tell the Kinect we want it to register the images. Head back up to your setup function. After you enable the depth and RGB images, turn on the registration like this:

kinect.alternativeViewPointDepthToImage();

It'd be nice if the Processing wrapper called it something obviously to do with registration, rather than something vague like that, but it's sort of along the lines of shifting the POV of the depth results to that of the RGB image, so that name almost makes sense.

Okay, run that again. A lot better, right? Still noisy, but the edges line up pretty well.

So let's look at some 3D space. For now, let's change "i++" in the for loop back to "i+=skip", just so we can speed things up a little. If you run the code now, you'll see a much less dense image that's more obviously composed of points.

The first thing we're going to do is add some zooming. We'll need a variable to hold the zoom value, so up at the top under your import statements, let's add one:

float zoom = 1;

Then let's add some keystroke controls to zoom in and out. These can be any two keys, but let's go with the up and down arrows for in and out. At the bottom of your code, under the mousePressed() function, we're going to add a new function:

void keyPressed(){
 if(keyCode == 38){
        zoom = zoom + 0.01;
 }


 if(keyCode == 40){
  zoom = zoom - 0.01;
 }
}

Great. So now we need to tell the drawing function that it needs to change the zoom when it draws. Underneath your calls to translate in the draw() function, we're going to tell it to scale:

 translate(0,0,zoom*-1000);
 scale(zoom);

Now when you run your code, you can use the up and down arrow to zoom in and out on the image.
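If the zooming behaves strangely, check the ordering - at this point the top of your draw() function, before the loop that draws the points, should read roughly like this:

void draw()
{
  kinect.update();
  PImage rgbImage = kinect.rgbImage();
  background(0, 0, 0);

  translate(width/2, height/2, -250);
  rotateX(radians(180));
  translate(0, 0, 500);
  translate(0, 0, zoom*-1000);
  scale(zoom);
  stroke(255);
  int skip = 5;
  // ...the depthPoints loop goes here...
}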

That's great, but most cameras have a zoom. What we're interested in, when we add depth to an RGB image, is viewing in 3D. So we're going to add a way to move this image in 3D space, using the mouse as the POV controller. We'll start with the Y-axis. In your draw() function, under the block that calls translate/rotateX/translate functions, add this:

 rotateY(radians(map(mouseX, 0, width, -180, 180)));

The map() function is doing some fancy math that takes your mouse position on the X-axis of the image window and converts it to a circular measure (radians) of how much to rotate the image around that Y axis. Go ahead and run that, moving the mouse to the left and right. You can see how the points the Kinect is generating exist in 3D space, and also the blind spots created by objects blocking other objects.
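If map() is new to you, here's a tiny standalone illustration with made-up numbers - it just rescales a value from one range to another:

// map(value, inputLow, inputHigh, outputLow, outputHigh)
float a = map(0,   0, 640, -180, 180);   // left edge of a 640-wide window  -> -180
float b = map(320, 0, 640, -180, 180);   // center                          ->    0
float c = map(640, 0, 640, -180, 180);   // right edge                      ->  180

radians() then converts those degree values into the radian measure that rotateY() expects.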

So that's cool, but let's add the rotation around the X axis just for kicks. Replace this line:

 rotateX(radians(180));

With this line:

 rotateX(radians(map(mouseY, 0, height, -180, 180)));

Now run that. You can spin your point cloud on the X and Y axes and zoom. If you get lost, remember that the Y axis returns to the normal view when the mouse is in the middle of the image, and the X axis returns to normal at the top and bottom of the image, so parking the mouse halfway across the top of the image gets you back to the normal view. We could add spinning on the Z axis, but that just rotates your image in a circle rather than showing off cool 3D things, so let's move on to making cool things.

Let's get some video out of this. Processing creates videos in a way that's very similar to the Python code: it creates lots of individual frames, and then you use an included Moviemaker tool to turn them into a video file.

First, we'll add a variable to hold whether recording is on or not. We want to start with recording off, so the variable is going to be initialized as false. Go up to the top of your code, under where you declare your zoom float, and let's add a boolean:

boolean record = false;

Then we want to add a way to stop or start recording. We're going to do it with the 'r' key, so we need to add a new key-press handler. Go down to your keyPressed() function, and we're going to add this:

 if(key == 'r'){
  record = !record;
 }

Now that we have a way of telling the code whether we're recording or not, we need to tell the draw() function to record a frame of video when it draws. At the very end of the draw() function, after the for loop that draws your points, add this:

if (record) {
  saveFrame("frames/frames####.png");
}

This has to be at the end, because saveFrame() saves whatever has been drawn up to that point; anything you draw after it won't make it into the file. The #### in the filename gets replaced with the current frame number, padded to four digits, so you'll get files like frames0001.png, frames0002.png, and so on.

To view your frames, go to the Sketch menu and click on Open Sketch Folder. You'll see a folder that has your frames in it. Go to the Tools menu and select Moviemaker. It's going to ask you to browse to the folder with your frames in it; the easiest way to get the path is to look at the top of the file manager for the nested directories above the "frames" directory. When you've selected "frames" in Moviemaker, go ahead and tell it to make a movie for you.

You're going to find that Moviemaker has one big drawback: it doesn't let you control the fps of the video you're creating, and webcams - and the Kinect is basically a fancy webcam - don't record at the standard video framerates. So let's use another encoding tool that gives you more control. You may remember that back in the Python OpenCV exercises, there was a final encoding step that ran in the terminal window after the frames were generated. You can use the same thing here. The program is called ffmpeg, and we can use much the same arguments that the Python code used.

First, open up a terminal window. Next, take note of your frames directory - you'll need that here, too. This is the command to run ffmpeg - you may have seen it if you looked at the Python code. Replace the bracketed, capitalized parts with your own values; the lowercase parts are hints about what the filenames should look like.

ffmpeg -r [FPS] -i [FRAMES DIRECTORY]/frames%04d.png -vcodec png [OUTPUT FILENAME].mov

The %04d matches the four-digit numbering that saveFrame's #### produces. If your frame numbers don't start at 0 or 1 (remember, saveFrame numbers files by the sketch's overall frame count), add -start_number followed by your first frame's number right before the -i.

So what's a good FPS for these? Well, it's really hard to know exactly; when we're starting and stopping the recording by hand, we don't have a great way to automatically get how many seconds long the record time is - or at least, I haven't written one. You could probably write a few lines of code to count and tell you. My suggestions are these: in the Kinect specs I've seen, 15fps is what they say each camera gives, since they're sharing a single data stream at 30fps. However, when I was encoding Kinect videos by hand a while back, I was getting about 7fps in practice. I'd try the encoding at 7fps and at 15fps, and see how close one of those gets you to natural-looking motion.
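If you do want the sketch to measure this for you, here's one rough way to do it - a sketch of my own, not part of the exercise, using Processing's millis() timer. It counts the frames saved while recording is on and prints the effective frame rate when you press 'r' to stop:

// add these next to your record boolean
int savedFrames = 0;   // how many frames we've written this recording
int recordStart = 0;   // millis() reading when recording began

// in draw(), replace the saveFrame block with:
if (record) {
  saveFrame("frames/frames####.png");
  savedFrames++;
}

// in keyPressed(), replace the 'r' handler with:
if (key == 'r') {
  record = !record;
  if (record) {
    savedFrames = 0;
    recordStart = millis();
  }
  else {
    float seconds = (millis() - recordStart) / 1000.0;
    println(savedFrames / seconds + " frames per second");
  }
}

Whatever number that prints is the FPS to hand to ffmpeg.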

Now that you've got a few tools under your belt, you can experiment and create a small video in 3D space.

Bonus Section

If you raced through that and you'd like to try something else, there's a feature of the Kinect that's good for thinking about animation and motion capture. This demonstrates the basics of using the Kinect to a) separate out the background of an image and remove it, and b) identify a human shape and track it. In more complicated examples, the Kinect can estimate some rough skeleton positions that could hypothetically be used to capture motion to animate a 3D character. Give this code a try - based on the previous exercises, you should find it pretty readable.

import processing.opengl.*;
import SimpleOpenNI.*;

SimpleOpenNI kinect;
int[] userMap;

void setup() {
  size(640, 480);
  kinect = new SimpleOpenNI(this);
  if (!kinect.isInit()) {
    println("Kinect not connected.");
    exit();
  }
  else {
    kinect.setMirror(true);
    kinect.enableDepth();
    kinect.enableRGB();
    kinect.enableUser();                        // turn on user (person) detection
    kinect.alternativeViewPointDepthToImage();  // register the depth data to the RGB image
  }
}

void draw() {
  kinect.update();
  background(0, 0, 150);
  PImage rgbImage = kinect.rgbImage();
  image(rgbImage, 0, 0);

  // build a new image that silhouettes any detected people
  PImage users = new PImage(640, 480, RGB);
  users.loadPixels();
  userMap = kinect.userMap();  // one entry per pixel: 0 = background, anything else = a user ID
  for (int i = 0; i < userMap.length; i++) {
    if (userMap[i] != 0) {
      //users.pixels[i] = rgbImage.pixels[i];
      users.pixels[i] = color(50, 50, 255);
    }
    else {
      users.pixels[i] = color(255, 255, 255);
    }
  }
  users.updatePixels();
  image(users, 0, 0);
}
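If you want a taste of the skeleton tracking mentioned above, here's a rough sketch of how SimpleOpenNI exposes joint positions. Treat it as a starting point rather than drop-in code - the callback signatures in particular have changed between SimpleOpenNI versions:

// rough sketch: draw a dot on each tracked user's head
// assumes the setup() above, with kinect.enableUser() already called
void drawHeads() {
  int[] userList = kinect.getUsers();
  for (int i = 0; i < userList.length; i++) {
    int userId = userList[i];
    if (kinect.isTrackingSkeleton(userId)) {
      PVector head = new PVector();
      kinect.getJointPositionSkeleton(userId, SimpleOpenNI.SKEL_HEAD, head);
      PVector projected = new PVector();
      kinect.convertRealWorldToProjective(head, projected);   // 3D point -> screen coordinates
      fill(255, 0, 0);
      ellipse(projected.x, projected.y, 30, 30);
    }
  }
}

// SimpleOpenNI calls this when it detects a new person in the frame
void onNewUser(SimpleOpenNI curContext, int userId) {
  curContext.startTrackingSkeleton(userId);
}

You'd call drawHeads() at the end of draw(), after the image(users, 0, 0) line.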

If you'd like to go deeper into this at home, talk to the TA - it's not too hard to set up, but there are a couple of little things to watch out for that we can help you through. (Windows users, I've found that SimpleOpenNI stopped working well on Windows in the last release, so it may take a little extra work getting it to work for you, but let us know if you're interested - we can figure out something.)