Summaries for week 5

KinectFusion: Real-time 3D Reconstruction and Interaction Using a Moving Depth Camera

Depth cameras have existed for quite a long time, but the Kinect has made them accessible to everyone. The Kinect works with depth maps, but these measurements are noisy and not perfectly accurate. Depth maps can be converted into a mesh representation, yet they are captured from a single viewpoint, which does not give an accurate representation of reality. The aim is to automatically generate a 3D reconstruction of whatever the Kinect camera is filming.

KinectFusion has several advantages over previous tools. First, camera tracking and 3D reconstruction both run at interactive, real-time rates. Furthermore, no feature detection (on RGB images, for instance) is required. The system also reconstructs surfaces, as opposed to point-based representations. Finally, KinectFusion is mobile and works on larger areas than other tools.

The main principle of KinectFusion is to observe the scene from several viewpoints, either by moving the camera or by moving the scene. An object can also be separated from the rest of the scene: the user simply picks it up and moves it, so no extra interface is required. As for rapid movements that occlude the view, such as the user's hands, they are ignored and do not affect the reconstruction.

The GPU implementation uses two core algorithms, and the main system pipeline relies on four techniques. First, depth map conversion computes 3D points and normals from the raw depth measurements. Second, the camera is tracked in its 6 degrees of freedom so that the current frame aligns with the previous one. Third, volumetric integration fuses the surfaces into a single volume, and finally raycasting renders the reconstruction for the user and makes it possible to simulate real-world physics. All these techniques are executed in parallel and coded in CUDA.
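
To make the first stage more concrete, here is a minimal sketch (my own illustration, not the paper's CUDA code) of back-projecting a depth map into 3D points and estimating normals with NumPy; the image size and the pinhole intrinsics (fx, fy, cx, cy) are made-up values, assumed only for illustration.

    import numpy as np

    def depth_to_vertex_map(depth, fx, fy, cx, cy):
        # Back-project each depth pixel (metres) into a 3D point using a pinhole model.
        h, w = depth.shape
        u, v = np.meshgrid(np.arange(w), np.arange(h))
        x = (u - cx) * depth / fx
        y = (v - cy) * depth / fy
        return np.dstack((x, y, depth))                      # (h, w, 3) vertex map

    def vertex_to_normal_map(vertices):
        # Estimate per-pixel normals from cross products of neighbouring vertices.
        dx = np.diff(vertices, axis=1)[:-1, :, :]            # horizontal neighbours
        dy = np.diff(vertices, axis=0)[:, :-1, :]            # vertical neighbours
        n = np.cross(dx, dy)
        return n / (np.linalg.norm(n, axis=2, keepdims=True) + 1e-8)

    # Hypothetical example: a flat 480x640 depth map at 1 m with assumed intrinsics.
    depth = np.full((480, 640), 1.0, dtype=np.float32)
    vertices = depth_to_vertex_map(depth, fx=525.0, fy=525.0, cx=319.5, cy=239.5)
    normals = vertex_to_normal_map(vertices)                 # all close to (0, 0, 1)

In the actual system this per-pixel work, like the other stages, runs in parallel on the GPU.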

Thus, KinectFusion achieves three main goals. To begin, using only a camera and graphics hardware, the system performs 3D tracking, reconstruction, segmentation, rendering and interaction, all in real time. Object scanning also works efficiently, and both AR and physics-based interaction are supported. Last but not least, the system enables multi-touch input from the user while still reconstructing the scene. Future work will involve scaling the system further, using less memory and providing better interactions.

Now that modeling a static scene in 3D is almost perfectly accurate, would it be possible to have a similar system that models a human being in real time?


Going out: Robust Model-based Tracking for Outdoor Augmented Reality

This paper targets real-time AR overlays on handheld devices. Traditionally, GPS is used for position, while magnetic compasses and inertial sensors are used for orientation. However, this is not accurate enough in town, for example because of stray magnetic fields. This paper therefore describes an accurate tracking system for urban environments.

Previous related work involved, in addition to the traditional solution, a laser gyroscope (the TOWNWEAR system) and vision-based localization. Vision-based localization relies either on the whole scene, on edges, or on points. Here, the chosen solution is an edge-based system combined with sensors for inertial and magnetic field measurements.

The tracking framework is made of three parts. First, edge tracking uses a 3D model (built with CAD tools) and its projection to decide whether the current view matches the model. Second, edges are detected in a rendering of the textured 3D model and projected back to compute their coordinates. Third, the sensors return gyroscope measurements as well as 3D acceleration and magnetic field vectors. These measurements are combined with the computer vision so that fast movements do not disturb the system. However, poor visibility caused by unknown (possibly moving) objects leads to inaccurate results. An additional recovery mechanism based on pose estimation has been developed to improve performance and keep the results accurate in these cases.
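
To illustrate how a sensor prior and a projected model can work together, here is a minimal sketch (my own illustration, not the paper's implementation): a gyroscope reading propagates the previous orientation to give the edge search an initial pose, and sample points of the facade model are projected into the image with that pose. The intrinsics, gyro rate and model coordinates are all hypothetical.

    import numpy as np

    def predict_rotation_from_gyro(R_prev, omega, dt):
        # Propagate the previous orientation with an angular rate (rad/s) over dt,
        # giving the vision stage an initial guess that tolerates fast motion.
        angle = np.linalg.norm(omega) * dt
        if angle < 1e-9:
            return R_prev
        kx, ky, kz = omega / np.linalg.norm(omega)
        K = np.array([[0, -kz, ky], [kz, 0, -kx], [-ky, kx, 0]])
        dR = np.eye(3) + np.sin(angle) * K + (1 - np.cos(angle)) * (K @ K)  # Rodrigues' formula
        return R_prev @ dR

    def project_points(points_world, R, t, intrinsics):
        # Project 3D model points into the image with pose (R, t) and a pinhole camera.
        cam = (R @ points_world.T).T + t            # world -> camera coordinates
        uvw = (intrinsics @ cam.T).T                # camera -> homogeneous pixels
        return uvw[:, :2] / uvw[:, 2:3]

    # Hypothetical facade corners (metres), camera intrinsics and gyro reading.
    facade = np.array([[0.0, 0.0, 10.0], [2.0, 0.0, 10.0], [2.0, 3.0, 10.0]])
    K_cam = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
    R_pred = predict_rotation_from_gyro(np.eye(3), omega=np.array([0.0, 0.1, 0.0]), dt=1 / 25)
    print(project_points(facade, R_pred, t=np.zeros(3), intrinsics=K_cam))

In the paper, the projected model edges are then matched against edges detected in the camera image to refine the pose; the sketch only covers the prediction and projection steps.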

To demonstrate the system, a location-based game has been developed whose goal is to deliver a love message. To achieve it, the player has to locate a window and then take a ladder to reach it. The hardware used is a tablet PC, a USB camera and an inertial sensor. The location first has to be initialized by the user, but the orientation does not, since the sensors give a nearly absolute measurement. Tests have been conducted at two locations: the city of Cambridge and St Catherine's College. The 3D model of the first site has been made with Photobuild, while a commercial software package has been used for the college. Both models consist of building facades and are not detailed. The accuracy is quite good, although buildings appear closer than they are, and when the user gets too close the flat model is no longer good enough. Regarding robustness, moving vehicles do not disturb the game, even when they occlude the whole picture. Finally, the frame rate lies between 15 and 17 frames per second and never drops below 13 fps; the tracking alone runs at up to 25 Hz, and the overall system at up to 17 Hz.

To conclude, this solution gives far better results than previous systems. Other possible applications are HMDs and optical see-through displays, but the latter would require calibration. Future work will involve better initialization, to reduce the number of candidate poses and thus speed up the search and keep performance high in town, as well as integration into a larger system with a control centre to improve efficiency.

Personally, I think an important improvement would be to use more detailed 3D models, so that the tracking stays accurate regardless of the distance.
