[Summaries Week 5]: Kinect Fusion; Going Out

KinectFusion: Real-time 3D reconstruction and interaction using a moving depth camera

In their paper KinectFusion, Izadi et al. describe their implementation of a system that uses a standard Kinect camera to generate a real-time, interactive 3D reconstruction of the live camera feed, one that remains robust to user intervention and interaction. They also explain their extensions to the system that make it usable for real-time object segmentation and for appropriating arbitrary surfaces for touch input.

This novel work draws on a number of areas and has a number of different applications, prominent among them being low cost scanning, geometry-aware augmented reality and effortless object segmentation.

The primary enabler of their work is a full-frame (dense) Iterative Closest Point (ICP) algorithm for determining the global camera pose at every frame. By leveraging large-scale GPU parallelism, the authors achieve interactive frame rates. Their algorithm first obtains a set of vertices and normals from the depth map captured by the Kinect, operating in parallel on each depth-map pixel. ICP is then used to find corresponding points between the (appropriately transformed) previous-frame vertices and the current-frame vertices, and these correspondences are used to compute the new global camera pose. The authors also describe their volumetric representation of the data captured so far, which composes voxels using a Signed Distance Function (SDF), and a final rendering step performed by ray-casting into this volume.
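The volumetric fusion step can be sketched roughly as follows. This is a minimal NumPy illustration of a projective truncated-SDF update with a weighted running average, not the authors' GPU implementation; the function name, truncation distance, and weighting scheme are assumptions for illustration.

```python
import numpy as np

def tsdf_update(tsdf, weights, voxel_centers, depth, K, pose, trunc=0.05):
    """Fuse one depth frame into a signed-distance volume.

    tsdf, weights: (N,) per-voxel distance values and fusion weights
    voxel_centers: (N, 3) world-space voxel centers
    depth: (H, W) depth map in meters; K: 3x3 intrinsics; pose: 4x4 camera-to-world.
    """
    # Transform voxel centers into the camera frame.
    world_to_cam = np.linalg.inv(pose)
    pts = voxel_centers @ world_to_cam[:3, :3].T + world_to_cam[:3, 3]
    z = pts[:, 2]
    # Project each voxel center into the depth image.
    u = np.round(K[0, 0] * pts[:, 0] / z + K[0, 2]).astype(int)
    v = np.round(K[1, 1] * pts[:, 1] / z + K[1, 2]).astype(int)
    H, W = depth.shape
    valid = (z > 0) & (u >= 0) & (u < W) & (v >= 0) & (v < H)
    d = np.where(valid, depth[v.clip(0, H - 1), u.clip(0, W - 1)], 0.0)
    valid &= d > 0
    # Truncated signed distance, normalized to [-1, 1]: positive in front
    # of the observed surface, negative behind it.
    sdf = np.clip(d - z, -trunc, trunc) / trunc
    valid &= sdf > -1.0  # ignore voxels far behind the surface
    # Weighted running average over frames, as in volumetric fusion schemes.
    w_new = weights + valid
    tsdf = np.where(valid, (tsdf * weights + sdf) / np.maximum(w_new, 1), tsdf)
    return tsdf, w_new
```

Running this once per depth frame, with each voxel handled independently, is what makes the approach amenable to the GPU parallelism the authors rely on.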

The authors mention a couple of novel interactions enabled by the design of their rendering system. They describe how a simple modification to the rendering pipeline allows synthetic geometry to be overlaid on the surfaces reconstructed from the video feed. This also allows one to simulate particle physics, with synthetic particles capable of interacting with static particles corresponding to real-world objects.

The authors also describe a couple of applications made possible by the use of dense ICP, including allowing the user to interact with the scene without breaking tracking, and performing object segmentation using ICP outlier data. In concluding, the authors describe their initial forays into reconstructing moving surfaces by performing a second dense ICP pass specifically on foreground points.
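The outlier-based segmentation idea can be illustrated with a toy sketch: pixels whose projective correspondences fail the ICP compatibility checks (distance and normal agreement) are flagged as candidate foreground. The function name and threshold values below are assumptions, not the paper's parameters.

```python
import numpy as np

def icp_outlier_mask(prev_verts, prev_norms, curr_verts, curr_norms,
                     dist_thresh=0.01, dot_thresh=0.9):
    """Flag pixels whose projective ICP correspondence fails as 'moving'.

    All inputs are (H, W, 3) vertex/normal maps; an inlier must be close
    in both position and orientation, so everything else becomes a
    candidate foreground (e.g. a user-moved object) pixel.
    """
    dist = np.linalg.norm(curr_verts - prev_verts, axis=-1)
    dot = np.sum(prev_norms * curr_norms, axis=-1)
    return (dist > dist_thresh) | (dot < dot_thresh)
```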


Going Out: Robust Model-based Tracking for Outdoor Augmented Reality

In their paper Going Out, Reitmayr and Drummond describe their hybrid tracking system, which combines inertial sensor data with data returned by a textured-3D-model-based vision tracking system. A significant portion of the authors’ work deals with mitigating problems encountered by tracking systems in an outdoor setting.

After discussing relevant prior work, the authors describe their system in detail. Using a prior estimate of the camera’s pose, they project an existing model of the scene being tracked and, at each frame, determine the camera motion that best aligns the model’s projection with edges in the video frame. An appearance-based edge search, rather than a simple edge detector, is used to match the projected model edges to the video frame. The camera motion thus determined yields the new camera pose. The authors also model the errors in this pose estimate.
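The core of the edge search is a 1D scan along each projected edge's normal for the best image-edge response. The sketch below is a simplified stand-in for the paper's appearance-based search (which compares against expected appearance, not just gradient magnitude); the sampling scheme and function name are illustrative assumptions.

```python
import numpy as np

def search_along_normal(image, point, normal, half_len=5):
    """Sample intensities along an edge normal and return the sub-pixel
    offset of the strongest intensity step.

    point: (x, y) location of a projected model-edge sample
    normal: (nx, ny) unit normal of the projected edge
    """
    offsets = np.arange(-half_len, half_len + 1)
    xs = np.clip(np.round(point[0] + offsets * normal[0]).astype(int),
                 0, image.shape[1] - 1)
    ys = np.clip(np.round(point[1] + offsets * normal[1]).astype(int),
                 0, image.shape[0] - 1)
    samples = image[ys, xs].astype(float)
    # Strongest intensity step between adjacent samples marks the edge.
    grad = np.abs(np.diff(samples))
    i = int(np.argmax(grad))
    return offsets[i] + 0.5  # midpoint of the step, along the normal
```

The per-sample offsets found this way are what feed the least-squares estimate of the aligning camera motion.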

The authors describe how their system makes use of (readily available) textured 3D models, as opposed to detailed edge models, thereby avoiding problems associated with the dense clustering of line features at a distance. This also spares them from implementing complex occlusion and hidden-line removal algorithms, since these are handled by the graphics pipeline itself.

The authors then describe how they fuse inertial measurements into their vision-based tracking system. They obtain measurements of rotational velocity (from gyroscopes), acceleration, and magnetic field, and fuse this sensor data with the camera pose measurement obtained earlier using a Kalman filter.
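The predict/correct structure of such a filter can be shown with a deliberately minimal one-dimensional example: the gyro rate drives the prediction and the vision-based pose drives the correction. This is a toy stand-in for the paper's full filter; the class name and all noise values are made up for illustration.

```python
class Kalman1D:
    """Minimal 1D Kalman filter fusing a rate sensor with an absolute
    measurement (e.g. gyro angular rate + vision-based orientation)."""

    def __init__(self, x0=0.0, p0=1.0, q=0.01, r=0.1):
        self.x, self.p = x0, p0   # state estimate and its variance
        self.q, self.r = q, r     # process / measurement noise

    def predict(self, gyro_rate, dt):
        self.x += gyro_rate * dt  # integrate angular velocity
        self.p += self.q * dt     # uncertainty grows between measurements

    def correct(self, vision_angle):
        k = self.p / (self.p + self.r)        # Kalman gain
        self.x += k * (vision_angle - self.x)  # blend toward the measurement
        self.p *= (1.0 - k)                    # uncertainty shrinks
        return self.x
```

High-rate inertial predictions keep the estimate responsive between the slower vision corrections, which is the main benefit of the fusion the authors describe.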

The authors also describe how their system keeps a set of temporally well-spaced frames in a frame store, allowing the vision-based tracker to recover from transient occlusions. If a tracking failure is detected, the system allows its current state to be “reset” to that of a matching frame from recent history in the frame store.
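A toy version of such a frame store might look like the following. The spacing policy and the matching criterion (plain intensity difference on thumbnails) are illustrative assumptions; the paper's actual recovery mechanism matches against stored frames with known poses in a more principled way.

```python
import numpy as np

class FrameStore:
    """Keep temporally spaced keyframes with their poses; on tracking
    failure, relocalize by finding the most similar stored frame."""

    def __init__(self, min_gap=1.0):
        self.frames = []        # list of (timestamp, thumbnail, pose)
        self.min_gap = min_gap  # minimum seconds between stored frames

    def maybe_store(self, t, thumb, pose):
        # Only store a frame if enough time has passed since the last one,
        # so the store spans recent history rather than one instant.
        if not self.frames or t - self.frames[-1][0] >= self.min_gap:
            self.frames.append((t, thumb.copy(), pose))

    def relocalize(self, thumb):
        # Return the pose of the stored frame most similar to the current view.
        scores = [np.abs(thumb - f[1]).mean() for f in self.frames]
        return self.frames[int(np.argmin(scores))][2]
```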

As a demonstration of their work, the authors describe their implementation of an OpenGL-based game that has the player partaking in amorous activity mimicking the tradition of Fensterln. Analyzing their results, the authors comment on their system’s accuracy and the gain in stability obtained from the backing frame store. They also point out that the system is capable of real-time frame rates of 15–17 fps.


PrimeSense sensor at CES 2013

Here’s the link to the page with the video (also embedded below) for the PrimeSense sensor I mentioned in class. I personally think the “story” (if you can call it that) in the video is pretty pathetic, but watch it and consider each use of the tech. Some are good, some bad. What I like are the ones that let you do things at a distance where no other interaction would be easy (e.g., interacting with the vacuum robot), or when your hands are otherwise occupied or messy/sterile (e.g., in the kitchen, the doctor). I personally despise the opening “use gesture to control a presentation” (if he cared about emphasizing the importance of that pie chart, he would have scripted it, not trusted his success to a could-go-badly live “interactive demo”). And some are silly: would she really shop in front of her date in a public place, putting her personal info (even minimal) up for all to see? Please consider all the proposed uses seriously when you watch.