Stereoscopic multi-perspective capture and volumetric scene reconstruction


Research project, interactive visualisation, 2009

This project combines multi-perspective stereoscopic video capture with volumetric scene reconstruction to demonstrate a novel method of documenting dance. Deconstructing and then remodelling the dancer's body in motion fragments the body into discrete volumes that are visualised within a computer graphics application. The fidelity and high level of detail of the video imagery is augmented and completed by the 3D voxel representation. This makes it possible to bypass the point-of-view restriction of traditional video/film recording: space and linear time become variable properties, and multi-dimensional visualisation becomes reality. The process is used to create an abstract representation and depiction of the dance performance in the form of a real-time 3D interactive installation and a filmic work.

Based on Double District by Saburo Teshigawara with Volker Kuchelmeister, 2008
Performer Saburo Teshigawara and Rihoko Sato
Co-produced by: Karas Tokyo, Epidemic (Paris, Berlin), Le Volcan Scène nationale, Le Havre, UNSW iCinema Centre, Sydney, and kindly supported by Museum Victoria.


Video: Composite Voxel Visualisation with live action video (monoscopic version)


Video: Volumetric model visualisation


Video: Universal Playback application


Model of the scene with the six camera views and the voxel representation in the centre


Double District
The basis for this work is a set of stereo/3D video recordings I captured during a week-long studio session with Saburo Teshigawara in 2008 for the video installation Double District (Fig. 1, 2). The six-channel stereo video dance installation is configured in ReActor, a hexagonal projection environment that offers the audience a mobile and versatile platform for sophisticated artistic and cultural manifestation, and a physically immersive three-dimensional space of representation that constitutes an augmentation and amalgamation of real and virtual realities. It consists of six back-projection, passive stereo (linear polarization) screens. The audience can choose to move freely around the hexagon to view individual screens, or step back and watch up to three screens simultaneously. Each screen displays the same scene from the dance performance, specifically choreographed and recorded for this installation, in time synchronicity but from a different perspective, analogous to the architecture of the space within which it is projected.


Double District in ReActor. As a model (l) and at the eArts Festival Shanghai, October 2008 (r)

Multi-perspective Capture
The modality in which the dance performance was captured mirrors the physical configuration of the ReActor environment: six evenly distributed stereo camera pairs encircle a stage. This configuration allows the observer to view the scene from multiple points of view; it constitutes multi-perspective capture.


Model of recording set-up (l) and in the studio (r)

Precise positioning and orientation of the camera heads is essential to recreate a believable illusion of the physical space on screen. To strengthen the imitation of real-world perception on screen, the focal lengths of the camera lenses were chosen to reflect the natural field of view of the human eye.
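
As a back-of-the-envelope illustration of that choice, sensor width, focal length and horizontal field of view are linked by f = w / (2 tan(FOV/2)). A minimal Python sketch, assuming a hypothetical sensor width rather than the cameras actually used:

```python
import math

def focal_length_for_fov(sensor_width_mm: float, hfov_deg: float) -> float:
    """Focal length (mm) that yields a desired horizontal field of view:
    f = w / (2 * tan(FOV / 2))."""
    return sensor_width_mm / (2.0 * math.tan(math.radians(hfov_deg) / 2.0))

# Hypothetical numbers: a 1/3-inch sensor (~4.8 mm wide) and a ~60 degree
# horizontal field of view, roughly matching natural human vision.
print(f"{focal_length_for_fov(4.8, 60.0):.2f} mm")  # ~4.16 mm
```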


Multi-perspective scene

Stereographic imaging
The properties of a stereo image capture system are critical for the overall quality, depth perception and sense of reality a viewer perceives. The relationships between inter-ocular distance, near and far plane, the range of subject movement, focal length and the position of the zero-parallax plane all had to be defined. These parameters were generated in a theoretical mathematical model first, and its values were confirmed in an experimental set-up. The subjective qualities of the experimental results led to minor adjustments of some of the parameters.
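
The following Python sketch shows the kind of relationship such a model captures, using the standard parallel-rig (shifted-sensor) formula d = f * b * (1/z - 1/Zc) for on-sensor parallax; all numbers are hypothetical stand-ins, not the production values:

```python
def sensor_parallax_mm(interaxial_mm: float, focal_mm: float,
                       z_mm: float, zero_parallax_mm: float) -> float:
    """On-sensor parallax of a point at depth z for a parallel stereo rig
    with shifted sensors: d = f * b * (1/z - 1/Zc). A point at the
    zero-parallax distance Zc lands exactly on the screen plane."""
    return focal_mm * interaxial_mm * (1.0 / z_mm - 1.0 / zero_parallax_mm)

# Hypothetical stand-ins: 65 mm interaxial, 12 mm lens, zero parallax at
# 4 m, subject movement between 2 m and 8 m.
for z_mm in (2000.0, 4000.0, 8000.0):
    d = sensor_parallax_mm(65.0, 12.0, z_mm, 4000.0)
    print(f"depth {z_mm / 1000:.0f} m -> parallax {d:+.3f} mm")
```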


Stereographic video stills in anaglyphic format. The original format is separate images for the left and right eye

Multi-perspective vs. Universal, Voxelization
This proposed method takes the concept of multi-perspective capture one step further. It uses real-time 3D computer graphics to transform the multi-perspective recording into a universal one: the performance can be observed from any point of view, not only from the positions of the cameras encircling the scene, and the number of cameras no longer dictates the number of possible viewpoints. This is facilitated through volumetric geometry reconstruction of the dance performance, a process named voxelization.


A frame of the video in comparison with the same frame and similar perspective for the voxel representation

By geometrically calibrating the intrinsic and extrinsic parameters of the twelve cameras and employing computer vision and image processing algorithms, the parallel, synchronized video streams of the scene are used to synthesize a voxel (volumetric pixel) stream.
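
The actual reconstruction pipeline was developed by Anuraag Sridhar (see Acknowledgements); the following is only a generic visual-hull sketch in Python/NumPy of the underlying idea: a voxel is kept if its projection lands on the foreground silhouette in enough calibrated views. Function names and the min_views threshold are illustrative assumptions:

```python
import numpy as np

def carve_voxels(centers, cameras, silhouettes, min_views=6):
    """Visual-hull carving: keep voxel centres whose projection falls on
    the foreground silhouette in at least `min_views` calibrated cameras.

    centers:     (N, 3) voxel centres in world coordinates
    cameras:     list of 3x4 projection matrices (intrinsics @ extrinsics)
    silhouettes: list of boolean HxW foreground masks, one per camera
    """
    pts = np.hstack([centers, np.ones((len(centers), 1))])  # homogeneous
    votes = np.zeros(len(centers), dtype=int)
    for P, mask in zip(cameras, silhouettes):
        proj = pts @ P.T                      # project into the image plane
        z = proj[:, 2]
        with np.errstate(divide="ignore", invalid="ignore"):
            u = (proj[:, 0] / z).astype(int)
            v = (proj[:, 1] / z).astype(int)
        h, w = mask.shape
        ok = (z > 0) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
        hit = np.zeros(len(centers), dtype=bool)
        hit[ok] = mask[v[ok], u[ok]]
        votes += hit
    return centers[votes >= min_views]
```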


Close-up of a voxel model representing a dancer's torso, head and arms

Voxels are points in 3D space with a volume attached to them. A sufficiently large number of voxels (>5000) defines the geometry accurately enough for elements in the scene to be recognized and visualised. In this work, the scene was synthesized at a voxel resolution of ~1.5 cm, with a cube of this size as the smallest unit. By averaging the color values of the calibrated video stream pixels, an RGB color value could be extracted for every voxel.
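
A minimal sketch of that quantisation and colour-averaging step, assuming reconstructed 3D point samples with per-sample colours are already available; the 1.5 cm cell size is the one quoted above:

```python
import numpy as np

VOXEL_SIZE = 0.015  # ~1.5 cm cube edge, the smallest unit in this scene

def quantise(points, colors):
    """Snap 3D point samples (N, 3) to the voxel grid and average the
    colour samples (N, 3) falling into each cell: one RGB per voxel."""
    idx = np.floor(points / VOXEL_SIZE).astype(np.int64)
    cells, inverse = np.unique(idx, axis=0, return_inverse=True)
    inverse = inverse.ravel()
    rgb = np.zeros((len(cells), 3))
    np.add.at(rgb, inverse, colors)           # accumulate per-cell colour
    counts = np.bincount(inverse, minlength=len(cells)).astype(float)
    rgb /= counts[:, None]                    # mean colour per voxel
    centres = (cells + 0.5) * VOXEL_SIZE      # cube centres in metres
    return centres, rgb
```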


Diagram of the voxel density across time and scene. The y-axis represents the voxel count (x1000) and x-axis the frames in the video

The number of voxels, or their density in the voxel space, varies over time and with the complexity of the scene. A solo performance uses far fewer voxels than, for instance, a duet (Fig. 9).
The original studio recordings were not lit to optimize voxel reconstruction, but for artistic and cinematographic reasons alone. The lighting and the less than ideal positioning of the cameras result in a relatively low voxel count in some scenes, causing a degradation in reproduction quality. For instance, a leg may not be visible in voxel space because it was not lit adequately. A selection process was necessary to pick scenes from the performance with a high enough voxel count (>5000 on average). In Figure 9, only scenes 1 and 3 (solos) and scene 4 (a duet) were kept as the basis for the prototype application.
Even then, there are still moments in the performance where the voxel model deteriorates, but this has only limited relevance: the video and its parallel voxel stream refresh at 30 frames per second, and human perception is capable of reconstructing incomplete geometry in motion and making sense of the scene.
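
The selection itself amounts to a simple threshold on the mean per-frame voxel count. A sketch with made-up counts (the real ones are plotted in the density diagram, Fig. 9):

```python
def select_scenes(scene_frames, min_avg=5000):
    """Keep the scenes whose mean per-frame voxel count clears the
    threshold; scene_frames maps a scene label to per-frame counts."""
    return [name for name, counts in scene_frames.items()
            if sum(counts) / len(counts) >= min_avg]

# Illustrative numbers only, not the measured counts:
scenes = {"scene 1 (solo)": [6200, 7100, 6800],
          "scene 2 (solo)": [3100, 2800, 3300],
          "scene 3 (solo)": [5900, 6400, 6100],
          "scene 4 (duet)": [9800, 10400, 9900]}
print(select_scenes(scenes))  # scenes 1, 3 and 4 survive
```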

Ultimately a performance should be captured again, with a similar set-up for the video cameras but with multiple additional infrared cameras distributed around the stage and pointing down from the ceiling. Together with infrared lighting, these cameras would produce a voxel representation far more accurate, in terms of resolution and volume, than the video cameras alone. The two parallel lighting modes (artistic theatre lights and infrared) would not interfere with each other due to the different wavelengths of the light.

Application
An application capable of displaying multiple channels of video (the six multi-perspective video streams) and, simultaneously, the 3D voxel representation was developed as a prototype in Quartz Composer (a node-based visual programming language, part of the Xcode development environment in Mac OS X, based on Quartz and OpenGL).


Model of the scene with the six camera views and the voxel representation in the center

It allows navigation in the 3D scene of video and voxel model, keeps the video and voxel streams in synchronicity, and provides time control functions (play, pause, previous/next frame). It snaps the virtual, user-controlled camera into place when it gets close to the position of a real video camera, so that the perspectives of the video image and the voxel model are identical and a seamless fade can be performed (a sketch of this snapping follows below).
A list of parameters can be set at runtime: frames per second, point of view, field of view, lighting of the scene and a range of other variables manipulating the aesthetics of the scene and the voxel render style (Fig. 12). The prototype does all of this in real time at a good frame rate on a MacBook Pro, at a video resolution of 1024x768 pixels.
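
The snapping amounts to locking the user-controlled camera onto the nearest real camera position once it is close enough. A Python sketch of the idea, with a hypothetical snap radius (the prototype itself is a Quartz Composer patch, not Python):

```python
import math

SNAP_RADIUS = 0.5  # hypothetical snap distance in scene units

def maybe_snap(cam_pos, real_cam_positions):
    """Lock the virtual camera onto the nearest of the six real camera
    positions when within SNAP_RADIUS, so that voxel view and video share
    an identical perspective and can be cross-faded seamlessly."""
    nearest = min(real_cam_positions, key=lambda c: math.dist(cam_pos, c))
    return nearest if math.dist(cam_pos, nearest) < SNAP_RADIUS else cam_pos
```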


Different voxel render styles.

To take advantage of the full video resolution (1400x1050) and to present the stereoscopic video and voxel model in stereo/3D (through a passive stereo two-projector set-up with polarized filters, glasses and a silver screen), it will be necessary to upgrade to a high-performance computer with a high-end graphics board, and perhaps a different software development environment will have to be utilized.

Potential Installation and Interaction Modalities
The installation consists of a single stereo-3D projection screen, a console with a user interface, and two to four sculptures of voxel models made with a rapid prototyping 3D printer (~20 cm high).
The projection shows one of the six camera views full screen in stereo/3D. The three-and-a-half-minute video segment runs in a loop. With the interface, a visitor can change their perspective on the dance scene and is presented with either the real video recording or the voxel representation. The transition between the two modes is seamless due to the identical positioning of the virtual and real cameras, time synchronicity and equivalent stereo perception parameters.


Installation model with passive stereo screen, user interface and voxel sculptures on shelves

To reduce the time a visitor needs to understand the interaction modalities, and to lessen their cognitive load, the visitor is given only limited freedom to interact with the 3D video geometry of the scene. A simple rotary controller with push-button functionality (Griffin Powermate) allows the user to rotate the scene through 360 degrees; while rotating, the scene snaps into place at the positions of the real cameras, as sketched below. Pressing the button translates the gaze to a bird's-eye view.
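
With six cameras at 60-degree intervals, this can be expressed as rounding the knob angle to the nearest multiple of 60 degrees whenever it falls within a small window; a hypothetical sketch:

```python
def snapped_angle(raw_deg: float, snap_window: float = 5.0) -> float:
    """Map the rotary controller's angle to the scene rotation, snapping
    to the six real camera positions (every 60 degrees) when the knob is
    within `snap_window` degrees of one of them."""
    raw_deg %= 360.0
    nearest = round(raw_deg / 60.0) * 60.0 % 360.0
    delta = min(abs(raw_deg - nearest), 360.0 - abs(raw_deg - nearest))
    return nearest if delta <= snap_window else raw_deg
```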

Conclusion
Multi-perspective recording in combination with voxelization offers a universal view of a scene. A viewer is not limited to one point of view or moment in time, but can explore and analyze the scene freely, without space or time restrictions. The event is captured in four dimensions (x, y, depth and time) through the stereoscopic video recording, and in post-processing three additional dimensions are added (x, y, z of the voxel space).
The fidelity and high level of detail of the video imagery is augmented and completed by the voxel representation. The two have different qualities, and these are clearly perceived by a viewer, but the fact that the scene is in motion and everything runs in time and space synchronicity helps to bridge the gap in visual depiction.
The proposed method constitutes a novel way of recording and documenting motion. It enables detailed analysis after the event has happened. Potential areas of application are in the performing arts, professional sport and the movie FX industry. The method has the potential to evolve quickly with technological advances: cameras with higher resolution and depth sensors, better computer vision algorithms and faster processors will eventually be able to create a 3D model with enough detail that video imagery is no longer needed, but for the moment this method delivers a universal view today.


Exhibitions:

SEAM Agency and Interaction, Somatic Embodiment, Agency & Mediation in Digital Mediated Environments (Critical Path, University of Western Sydney). Drill Hall, Rushcutter Bay, Sydney. Oct 2010.


Acknowledgements

Double District, 2008
Direction, choreography, lighting design and costumes: Saburo Teshigawara.
Developed with: Volker Kuchelmeister
Performed by: Saburo Teshigawara and Rihoko Sato
Production manager, technical director, stereoscopic cinematography, video and audio post-production: Volker Kuchelmeister (iCinema)
Lighting design: Paul Nichola, Lighting technician: Rob Kelly (NIDA), Production assistant: Sue Midgely (iCinema)

Producer: Richard Castelli (Epidemic)
Co-produced by: Karas, Tokyo, Epidemic (Paris, Berlin), Le Volcan Scène nationale, Le Havre, iCinema Centre, University of New South Wales (UNSW), Sydney, and kindly supported by Museum Victoria.
Voxel reconstruction: Anuraag Sridhar, UNSW School of Computer Science and Engineering
ReActor Hexagonal Stereoscopic Projection Environment
Conceived by Jeffrey Shaw and Sarah Kenderdine
