multi-perspective capture and volumetric scene reconstruction
Research project, interactive visualisation,
This project combines multi-perspective
stereoscopic video capture with volumetric scene reconstruction to demonstrate
a novel method of documenting dance. The process of deconstruction and subsequent
remodelling of the dancer's body in motion results in a fragmentation of the
body into discrete volumes that are visualised within a computer graphics application.
The fidelity and high level of detail in the video imagery are augmented and
completed with the 3D voxel representation. This makes it possible to bypass
the point-of-view restriction of traditional video/film recording: space and
linear time become variable properties, and multi-dimensional visualisation
becomes reality. The process is used to create an abstract representation
of the dance performance in the form of a real-time 3D interactive installation
and a filmic work.
Based on Double District by Saburo Teshigawara with Volker Kuchelmeister, 2008
Performer Saburo Teshigawara and Rihoko Sato
Co-produced by: Karas Tokyo, Epidemic (Paris, Berlin), Le Volcan Scène nationale,
Le Havre, UNSW iCinema Centre, Sydney, and kindly supported by Museum Victoria.
Video: Composite voxel visualisation with live action video (monoscopic)
Video: Volumetric model visualisation
Video: Universal playback application
The basis for this work is a set of stereo/3D video recordings I captured during a
week-long studio session with Saburo Teshigawara in 2008 for the video installation
Double District (Fig. 1, 2). The six-channel stereo video dance installation
is configured in the ReActor environment, a hexagonal projection environment
offering the audience a mobile and versatile platform for sophisticated artistic
and cultural manifestation and a physically immersive three-dimensional space
of representation that constitutes an augmentation and amalgamation of real
and virtual realities. It consists of six back-projection, passive stereo
(linear polarization) screens. The audience can move freely around
the hexagon to view individual screens or step back and watch up to three
screens simultaneously. Each screen displays the same scene from the dance
performance, specifically choreographed and recorded for this installation,
synchronised in time but from a different perspective, analogous to the
architecture of the space within which it is projected.
Double District in ReActor. As a model (l) and at the eArts Festival Shanghai,
October 2008 (r)
The way in which the dance performance was captured mirrors the physical
configuration of the ReActor environment. Six evenly distributed stereo camera
pairs encircle a stage. This configuration allows the observer to view the
scene from multiple points of view; it constitutes multi-perspective capture.
Model of recording set-up (l) and in the studio (r)
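The ring layout described above can be sketched in a few lines. This is only an illustration of six evenly spaced, inward-facing camera positions; the ring radius and camera height are made-up values, not measurements from the studio session.

```python
import math

def camera_ring(n_cameras=6, radius_m=4.0, height_m=1.6):
    """Place n stereo camera heads evenly on a circle, each facing the stage centre.

    radius_m and height_m are hypothetical values chosen for illustration.
    """
    cameras = []
    for i in range(n_cameras):
        azimuth = 2.0 * math.pi * i / n_cameras
        position = (radius_m * math.cos(azimuth),
                    radius_m * math.sin(azimuth),
                    height_m)
        # Yaw that points the camera back towards the origin (stage centre).
        yaw_deg = (math.degrees(azimuth) + 180.0) % 360.0
        cameras.append({"index": i, "position": position, "yaw_deg": yaw_deg})
    return cameras

rig = camera_ring()
for cam in rig:
    print(cam["index"], [round(c, 2) for c in cam["position"]], round(cam["yaw_deg"], 1))
```

With six heads the angular spacing is 60 degrees, which matches the hexagonal screen arrangement of the playback environment.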
Precise positioning and orientation of the camera heads is essential to recreate
a believable illusion of the physical space on screen. To strengthen the imitation
of real-world perception on screen, the focal lengths of the camera lenses were
chosen to reflect the natural field of view of the human eye.
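The link between focal length and field of view follows the standard pinhole relation fov = 2 * atan(w / 2f). The sketch below shows that relation and its inverse; the sensor width and target angle are hypothetical numbers, since the text does not state the actual lens or sensor values.

```python
import math

def horizontal_fov_deg(focal_length_mm, sensor_width_mm):
    """Horizontal field of view of an ideal pinhole camera."""
    return math.degrees(2.0 * math.atan(sensor_width_mm / (2.0 * focal_length_mm)))

def focal_length_for_fov(fov_deg, sensor_width_mm):
    """Focal length needed to reach a target horizontal field of view."""
    return sensor_width_mm / (2.0 * math.tan(math.radians(fov_deg) / 2.0))

# Hypothetical example: an 8.8 mm wide sensor and a 45 degree target angle.
focal = focal_length_for_fov(45.0, 8.8)
print(round(focal, 2), "mm gives", round(horizontal_fov_deg(focal, 8.8), 1), "degrees")
```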
The properties of a stereo image capture system are critical for the overall
quality, depth perception and the sense of reality a viewer perceives. The
relationships between inter-ocular distance, near and far plane, the range
of subject movement, focal length and the position of the zero parallax plane
all had to be defined. These parameters were first generated in a theoretical
mathematical model and their values then confirmed in an experimental set-up.
The subjective qualities of the experimental results led to a minor adjustment of some of
Stereographic video stills in anaglyphic format. The original format is separate
images for the left and right eye
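How the stereo parameters named above interact can be illustrated with the common parallel-camera approximation, in which the image disparity of a point at depth Z is f * b * (1/Z0 - 1/Z) for camera separation b, focal length f and zero parallax distance Z0. This is a textbook relation, not the mathematical model used for the recordings, and all numbers below are assumptions.

```python
def sensor_disparity_mm(baseline_mm, focal_mm, zero_parallax_mm, depth_mm):
    """Left/right image disparity on the sensor for a point at depth_mm.

    Zero at the zero parallax plane, positive behind it, negative in front.
    """
    return focal_mm * baseline_mm * (1.0 / zero_parallax_mm - 1.0 / depth_mm)

def screen_parallax_mm(baseline_mm, focal_mm, zero_parallax_mm, depth_mm,
                       sensor_width_mm, screen_width_mm):
    """Scale the sensor disparity up to the projection screen."""
    magnification = screen_width_mm / sensor_width_mm
    return magnification * sensor_disparity_mm(
        baseline_mm, focal_mm, zero_parallax_mm, depth_mm)

# Hypothetical rig: 65 mm separation, 10 mm lens, zero parallax at 4 m,
# 8 mm wide sensor, 2 m wide screen; subject moving between 2 m and 8 m.
for depth in (2000.0, 4000.0, 8000.0):
    p = screen_parallax_mm(65.0, 10.0, 4000.0, depth, 8.0, 2000.0)
    print(f"depth {depth / 1000:.0f} m -> screen parallax {p:+.1f} mm")
```

Evaluating the near and far limits of the subject's movement in this way shows whether the resulting parallax range stays comfortable for a given screen size.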
Multi-perspective vs. Universal, Voxelization
The proposed method takes the concept of multi-perspective capture one step
further. It uses real-time 3D computer graphics to transform the multi-perspective
recording into a universal one. The performance can be observed from any
point of view, not only from the positions of the cameras encircling the scene,
so the number of possible viewpoints is no longer tied to the number of cameras.
This is facilitated through volumetric geometry reconstruction of the dance
performance, a process named voxelization.
A frame of the video in comparison with the same frame and similar perspective
for the voxel representation
After geometric calibration of the twelve cameras' intrinsic and extrinsic
parameters, computer vision and image processing algorithms synthesize a voxel
(volumetric pixel) stream from the parallel, synchronized video streams of
the scene.
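The text does not name the reconstruction algorithm. One common way to turn calibrated multi-camera footage into voxels is shape-from-silhouette (visual hull carving): a candidate voxel is kept only if it projects inside the performer's silhouette in every view. The sketch below shows that idea under strong simplifications; the three orthographic toy views and the square silhouette are hypothetical stand-ins for calibrated camera projections and segmented video frames.

```python
from itertools import product

def carve_visual_hull(voxel_centres, views):
    """Keep a voxel only if every view sees its projection inside the silhouette."""
    kept = []
    for centre in voxel_centres:
        if all(view["in_silhouette"](view["project"](centre)) for view in views):
            kept.append(centre)
    return kept

def square_silhouette(pixel, half_size=1):
    """Toy silhouette: a centred square in image coordinates."""
    u, v = pixel
    return abs(u) <= half_size and abs(v) <= half_size

# Hypothetical views: orthographic projections along the x, y and z axes.
views = [
    {"project": lambda p: (p[1], p[2]), "in_silhouette": square_silhouette},
    {"project": lambda p: (p[0], p[2]), "in_silhouette": square_silhouette},
    {"project": lambda p: (p[0], p[1]), "in_silhouette": square_silhouette},
]

grid = list(product(range(-2, 3), repeat=3))  # 5 x 5 x 5 candidate voxels
hull = carve_visual_hull(grid, views)
print(len(grid), "candidates ->", len(hull), "voxels kept")
```

In a real pipeline, `project` would apply each camera's calibrated intrinsic and extrinsic parameters, and `in_silhouette` would look up a foreground mask extracted from the corresponding video frame.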
Close-up of a voxel model representing a dancer's torso, head and arms
Voxels are points in 3D space with a volume attached to them. A sufficiently
large number of voxels (>5000) defines the geometry accurately enough for
elements in the scene to be recognized and visualised. In this work, the scene
was synthesized with a voxel resolution of ~1.5 cm, with a cube of this size
as the smallest unit. RGB color values were extracted for every voxel by
averaging the color values of the corresponding calibrated video stream pixels.
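The colour averaging step can be sketched as follows: project the voxel centre into each view, sample the pixel it lands on, and average the samples. This is a simplified illustration that ignores occlusion (a view blocked by another body part still contributes); the `project` callables and the tiny images are hypothetical.

```python
def voxel_colour(centre, views):
    """Average the RGB pixels that a voxel centre projects to across all views.

    Returns None if the voxel falls outside every image.
    """
    samples = []
    for view in views:
        pixel = view["project"](centre)  # (col, row) or None if not visible
        if pixel is None:
            continue
        col, row = pixel
        image = view["image"]            # list of rows of (r, g, b) tuples
        if 0 <= row < len(image) and 0 <= col < len(image[0]):
            samples.append(image[row][col])
    if not samples:
        return None
    n = len(samples)
    return tuple(sum(s[channel] for s in samples) / n for channel in range(3))

# Hypothetical 1 x 1 pixel images from two views of the same voxel.
views = [
    {"project": lambda p: (0, 0), "image": [[(255, 0, 0)]]},
    {"project": lambda p: (0, 0), "image": [[(0, 0, 255)]]},
]
colour = voxel_colour((0.0, 0.0, 0.0), views)
print(colour)
```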
Diagram of the voxel density across time and scene. The y-axis represents the
voxel count (x1000) and the x-axis the frames in the video
The number of voxels, or their density in the voxel space, varies over time
and with the complexity of the scene. A solo performance uses far fewer
voxels than, for instance, a duet (Fig. 9).
The original studio recordings were not lit to optimize voxel reconstruction,
but for artistic and cinematographic reasons alone. The lighting and the less
than ideal positioning of the cameras result in a relatively low voxel count in
some scenes, degrading reproduction quality. For instance, a leg is not visible
in voxel space because it was not lit adequately.
A selection process was necessary to pick scenes from the performance with a
high enough voxel count (>5000 on average). In Figure 9, only scenes 1, 3 (solos)
and 4 (a duet) were kept as a basis for the prototype application.
Even then, there are still moments in the performance where the voxel model
deteriorates, but this has only limited relevance: the video and its parallel
voxel stream refresh at 30 frames per second, and human perception is capable
of reconstructing incomplete geometry in motion and making sense of the scene.
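The scene selection can be expressed as a simple filter over per-frame voxel counts, using the 5000-voxel average mentioned above as the threshold. The frame ranges and counts below are invented for illustration and are not the data shown in Figure 9.

```python
def select_scenes(voxel_counts, scene_ranges, min_average=5000):
    """Return the scenes whose mean per-frame voxel count exceeds min_average.

    voxel_counts: one voxel count per video frame.
    scene_ranges: scene name -> (first_frame, end_frame), end exclusive.
    """
    selected = []
    for name, (start, end) in scene_ranges.items():
        counts = voxel_counts[start:end]
        if counts and sum(counts) / len(counts) > min_average:
            selected.append(name)
    return selected

# Invented example data: three short scenes.
counts = [6000, 6200, 5900, 2000, 2500, 1800, 7000, 7400]
ranges = {"scene 1": (0, 3), "scene 2": (3, 6), "scene 3": (6, 8)}
chosen = select_scenes(counts, ranges)
print(chosen)
```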
Ultimately, a performance should be captured again with a similar set-up for
the video cameras, but with multiple additional infrared cameras distributed
around the stage and pointing down from the ceiling. These cameras, together
with infrared lighting, would produce a much more accurate voxel representation,
in terms of resolution and volume, than the video cameras alone. The two
parallel lighting modes (artistic with theatre lights, and infrared) would not
interfere with each other because they use different wavelengths of light.
An application capable of displaying multiple channels of video (the six
multi-perspective video streams) and, simultaneously, the 3D voxel representation
was developed as a prototype in Quartz Composer (a node-based visual programming
language, part of the Xcode development environment in Mac OS X, based on Quartz and OpenGL).
Model of the scene with the six camera views and the voxel representation in
It allows navigation in the 3D scene of video and voxel model, keeps the video
and the voxel stream synchronized, and provides time control functions (play,
pause, previous/next frame). It snaps the user-controlled virtual camera into
place when it gets close to the position of a real video camera, so that the
perspective of the video image and the voxel model is identical and a seamless
fade can be performed.
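The snapping behaviour can be sketched as a function of the virtual camera's azimuth around the stage. This is not the Quartz Composer patch itself, only a minimal illustration in Python; it assumes the six real cameras sit at 60 degree intervals starting at 0 degrees, and the 5 degree snapping tolerance is a made-up value.

```python
def snap_azimuth(azimuth_deg, n_cameras=6, tolerance_deg=5.0):
    """Snap a virtual camera azimuth to the nearest real camera if close enough.

    Returns (azimuth, camera_index); camera_index is None when no snap occurs.
    """
    step = 360.0 / n_cameras
    nearest_index = round(azimuth_deg / step) % n_cameras
    nearest = nearest_index * step
    # Signed shortest angular distance, in the range [-180, 180).
    diff = (azimuth_deg - nearest + 180.0) % 360.0 - 180.0
    if abs(diff) <= tolerance_deg:
        return nearest, nearest_index
    return azimuth_deg % 360.0, None

for angle in (58.0, 357.0, 30.0):
    print(angle, "->", snap_azimuth(angle))
```

When the function reports a camera index, the application can cross-fade between the voxel rendering and that camera's video stream, since both then share the same perspective.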
A list of parameters can be set at runtime: frames per second, point of
view, field of view, lighting of the scene and a range of other variables manipulating
the aesthetics of the scene and the voxel render style (Fig. 12). The prototype
does all of this in real time on a MacBook Pro with a good frame rate,
at a video resolution of 1024x768 pixels.
Different voxel render styles.
To take advantage of the full video resolution (1400x1050) and to present
the stereoscopic video and voxel model in stereo/3D (through a passive
stereo two-projector set-up with polarizing filters, glasses and a silver screen),
it will be necessary to upgrade to a high-performance computer with a high-end
graphics board, and perhaps to use a different software development
environment.
Potential Installation and Interaction Modalities
The installation consists of a single stereo-3D projection screen, a console
with user interface and two to four sculptures of voxel models made with
a rapid prototyping 3D printer (~20 cm high).
The projection shows one of the six camera views full-screen in stereo/3D.
The three-and-a-half-minute video segment runs in a loop. With the
interface, a visitor can change their perspective on the dance scene and
is presented with either the real video recording or the voxel representation.
The transition between the two modes is seamless due to identical positioning
of the virtual and real camera, time synchronicity and equivalent stereo perception.
Installation model with passive stereo screen, user interface and voxel sculptures
To reduce both the time a visitor needs to understand the interaction modalities
and the cognitive load, the visitor has only limited freedom to interact with the 3D
video geometry of the scene. A simple rotary controller with push-button functionality
(Griffin Powermate) allows the user to rotate the scene through 360 degrees; while
doing so, the scene snaps into place at the position of a real camera. Pressing the
push button moves the view to a bird’s-eye view.
Multi-perspective recording in combination with voxelization offers a universal
view of a scene. A viewer is not limited to one point of view or moment in
time, but can explore and analyze the scene freely, without space or
time restrictions. The event is captured in four dimensions (x, y, depth and
time) through the stereoscopic video recording, and in post-processing three
additional dimensions are added (x, y, z of the voxel space).
The fidelity and high level of detail in the video imagery are augmented and
completed with the voxel representation. The two have different qualities, and
these are clearly perceived by a viewer, but the fact that the scene is in
motion and everything runs synchronized in time and space helps to bridge
the gap in visual depiction.
The proposed method constitutes a novel way of recording and documenting motion.
It enables detailed analysis after the event has happened. Potential areas of
application are the performing arts, professional sport and the movie FX
industry. This method has the potential to evolve quickly with technological
advances. Cameras with higher resolution and depth sensors, better computer
vision algorithms and faster processors will eventually be able to create a
3D model with enough detail that video imagery is no longer needed, but this
method delivers a universal view today.
SEAM Agency and Interaction, Somatic Embodiment, Agency & Mediation
in Digital Mediated Environments (Critical Path, University of Western Sydney).
Drill Hall, Rushcutter Bay, Sydney. Oct 2010.
Double District, 2008
Direction, choreography, lighting design and costumes: Saburo Teshigawara.
Developed with: Volker Kuchelmeister
Performed by: Saburo Teshigawara and Rihoko Sato
Production manager, technical director, stereoscopic cinematography, video
and audio post-production: Volker Kuchelmeister (iCinema)
Lighting design: Paul Nichola, Lighting technician: Rob Kelly (NIDA), Production
assistant: Sue Midgely (iCinema)
Producer: Richard Castelli (Epidemic)
Co-produced by: Karas, Tokyo, Epidemic (Paris, Berlin), Le Volcan Scène nationale,
Le Havre, UNSW University of New South Wales iCinema Centre, Sydney, and kindly
supported by Museum Victoria.
Voxel reconstruction: Anuraag Sridhar, UNSW School of Computer Science and
ReActor Hexagonal Stereoscopic Projection Environment
Conceived by Jeffrey Shaw and Sarah Kenderdine