RealPerspective: Head Tracking with Intel RealSense Technology

By Promotion Published Date
30 - May - 2016
| Last Updated
10 - Jun - 2016
RealPerspective: Head Tracking with Intel RealSense Technology


RealPerspective utilizes Intel® RealSense™ technology to create a unique experience. This code sample utilizes head tracking to perform a monoscopic technique for better 3D fidelity.

Using a system equipped with an Intel® RealSense® camera, the user can move their head around and have the game’s perspective correctly computed. The effect can be best described as looking into a window of another world. Traditionally this has been done with a RGB camera or IR trackers [3], but with the Intel RealSense camera’s depth information, the developer is provided accurate face tracking without any additional hardware on the user.

The sample accomplishes the effect by implementing an off-axis perspective projection described by Kooima [1]. The inputs are the face’s spatial X, Y position and the face’s average depth.

Build and Deploy

The Intel® RealSense™ SDK and the Intel RealSense Depth Camera Manager are required for development.
For deploying the project to end users, the matching SDK Runtime Redistributable must be installed.
To download the SDK, SDK Runtime, and Depth Camera Manager, go to:


For the Unity project, to ensure compatibility with the SDK installed on the system, please replacelibpxcclr.unity.dll and libpxccpp2c.dll in Libraries\x64 and Libraries\x86 of the project with the DLLs in bin\x64 and bin\x86 of the Intel RealSense SDK respectively.


Initialize Intel RealSense Camera

During start up, the Sense Manager initializes and configures the face module for face detection (bounding rectangle and depth). Once completed the Sense Manager pipeline is ready for data.

Process Input

The process input function returns a Vector3 normalized to 0 to 1 containing the face’s 3D spatial position from the Intel RealSense camera. If the Intel RealSense camera is not available, mouse coordinates are used.

The face’s x, y, z location comes from the Intel RealSense SDK’s face module. The face’s XY planar position comes from the center of the face’s bounding rectangle detection in pixel units. The face’s z comes from the face’s average depth in millimeters. The function is set to be non-blocking, so if data is not available the Update function is not delayed. Otherwise, the previous perspective and view matrices are unchanged.

Calculate Off-Axis Parameters

pa, pb, and pc are points that define the screen extents and determine the screen size, aspect ratio, position, and orientation in space. These screen extents are scaled based on the screen and come from the application’s window. Finally n and f determine the near and far planes, and in Unity, the values come from the Camera class.

For example, if the room is 16 by 9 units with an aspect ratio of 16:9, then pa, pb, pc can be set so the room covers the screen. Distance from pa to pb will be width of the room, 16 units, and the distance from pato pc is the height of the room, 9 units. For additional examples, see Kooima [1].

Off-Axis Parameters

Calculate Off-Axis Matrices

The goal of this function is to return the off-axis matrix. The projection matrix is essentially based on the OpenGL* standard glFrustum. The final step is aligning the eye with the XY plane and translating to the origin. This is similar to what the camera or view matrix does for the graphics pipeline.

Projection matrix

First, the orthonormal basis vectors ( vr, vu, vn) are computed based on the screen extents. The orthonormal basis vectors will later help project the screen space onto the near plane and create the matrix to align the tracker space with the XY plane.

Off-Axis Matrices

Next, screen extents vectors, va, vb, and vc, are created from the screen plane.

Screen extents vectors

Next, frustum extents, l, r, b, and t, are computed from the screen extents by projecting the basis vectors onto the screen extent vectors to get their location on the plane then scaled back by distance from the screen plane to the near plane. This is done because the frustum extents define the frustum created on the near plane.

frustum created on the near plane

Finally, once the frustum extents are computed, the values are plugged into the glFrustum function to produce the perspective projection matrix. The field of view can be computed from frustum extents [2].

Projection plane orientation

The foreshortening effect of the perspective projection works only when the view position is at the origin. Thus the first step is to align the screen with the XY plane. The matrix M is constructed to get the Cartesian coordinate system to screen-local coordinates by its basis vectors ( vr, vu, and vn). However, the screen space is what needs to be aligned with the XY plane, thus the transpose of matrix M is taken.

View point offset

Similarly, the tracker eye position, pe, must be translated to the frustum origin. This is done with a translation matrix T.


The computed matrices are fed back into Unity’s Camera data structure.


The test system was a GIGABYTE Technology BRIX* Pro with an Intel® Core™ i7-4770R processor (65W TDP).

In general, the performance overhead on average is very low. The entire Update() function completes in less than 1 ms. About 0.50 ms for consecutive detected frames with face and 0.20 ms for frames with no detected faces. New data is available about every 33 ms.

Use Case and Future work

The technique discussed in the sample can be used seamlessly in games when RealSense hardware is available on an Intel® processor-based system. The provided auxiliary input system adds an extra level of detail that improves the game’s immersion and 3D fidelity.

A few possible use cases are RTS (real-time strategy), MOBA (multiplayer online battle arena), and tabletop games, which let the user see the action as if they are playing a game of chess. In simulation and sandbox games the user sees the action and can get the perfect view on his or her virtual minions and lean in to see what they’re up to.

The technique is not limited to retrofitting current and previous games or even gaming use. For gaming, new uses can include dodging, lean-in techniques, and full screen HUDs movement (e.g., crisis helmet HUD). Non-gaming use can be used in digital displays such as picture frames or multiple monitor support. This technique can also be considered on a spectrum of virtual reality without using a bulky and expensive head-mounted display

For more such intel resources and tools from Intel on Game, please visit the Intel® Game Developer Zone