Based on Double District by Saburo Teshigawara with Volker Kuchelmeister, 2008
This project combines multi-perspective stereoscopic video capture with volumetric scene reconstruction to demonstrate a novel method of documenting dance. The process of deconstructing and subsequently remodelling the dancer’s body in motion fragments the body into discrete volumes that are visualised within a computer graphics application. The fidelity and high level of detail of the video imagery are augmented and completed by the 3D voxel representation. This makes it possible to bypass the point-of-view restriction of traditional video/film recording: space and linear time become variable properties, and multi-dimensional visualisation becomes reality. This process is used to create an abstract representation and depiction of the dance performance in the form of a real-time 3D interactive installation and a filmic work.
The proposed method takes the concept of multi-perspective capture one step further. It uses real-time 3D computer graphics to transform the multi-perspective recording into a universal one. The performance can be observed from any point of view, not only from the positions of the cameras encircling the scene; the number of cameras does not limit the number of possible viewpoints. This is facilitated through volumetric geometry reconstruction of the dance performance, a process named voxelization.
By geometrically calibrating the intrinsic and extrinsic parameters of the twelve cameras and employing computer vision and image-processing algorithms, the parallel, synchronized video streams of the scene are used to synthesize a voxel (volumetric pixel) stream.
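The reconstruction step can be sketched as silhouette-based voxel carving: each candidate voxel is projected into every calibrated camera and kept only if it lands on the foreground in all views. This is a minimal illustrative sketch, not the project's actual pipeline; the projection-matrix camera model and the function names are assumptions.

```python
import numpy as np

def carve_voxels(centers, cameras, silhouettes):
    """Keep only voxels whose projection lands on the foreground
    silhouette in every calibrated camera (visual-hull carving).
    centers: (N, 3) voxel centres; cameras: list of 3x4 projection
    matrices; silhouettes: list of boolean foreground masks."""
    keep = np.ones(len(centers), dtype=bool)
    homog = np.hstack([centers, np.ones((len(centers), 1))])
    for P, sil in zip(cameras, silhouettes):
        uvw = homog @ P.T                          # project into image plane
        uv = (uvw[:, :2] / uvw[:, 2:3]).astype(int)
        H, W = sil.shape
        inside = (uv[:, 0] >= 0) & (uv[:, 0] < W) & \
                 (uv[:, 1] >= 0) & (uv[:, 1] < H)
        fg = np.zeros(len(centers), dtype=bool)
        fg[inside] = sil[uv[inside, 1], uv[inside, 0]]
        keep &= fg                                 # must be foreground in every view
    return centers[keep]
```

With twelve views, the intersection of the silhouette cones approximates the dancer's volume; more cameras tighten the hull.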
Voxels are points in 3D space with a volume attached to them. A sufficiently large number of voxels (>5000) defines the geometry accurately enough to recognize elements in the scene and allows for visualisation. In this work, the scene was synthesized with a voxel resolution of ~1.5 cm, represented by a cube of this size as the smallest unit. By averaging color values of the calibrated video stream pixels, an RGB color value could be extracted for every voxel.
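The per-voxel colouring step amounts to averaging the pixel under the voxel's projection across the cameras that see it. A minimal sketch, assuming projections have already been computed per camera (the function name and the `None`-for-occluded convention are illustrative):

```python
import numpy as np

def voxel_color(projections, frames):
    """Average the RGB pixel under a voxel's projection over all
    cameras that see it. projections: one (u, v) pixel coordinate
    per camera, or None where the voxel is occluded / out of view."""
    samples = []
    for uv, frame in zip(projections, frames):
        if uv is None:                      # skip views that miss the voxel
            continue
        u, v = uv
        samples.append(frame[v, u].astype(float))
    return np.mean(samples, axis=0)         # mean RGB over visible views
```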
The number of voxels, or their density in the voxel space, varies over time and with the complexity of the scene. A solo performance uses far fewer voxels than, for instance, a duet (Fig. 9).
The original studio recordings were not lit to optimize voxel reconstruction, but for artistic and cinematographic reasons alone. The lighting and the less-than-ideal positioning of the cameras result in a relatively low voxel count in some scenes, causing a degradation in reproduction quality: for instance, a leg may not be visible in voxel space because it was not lit adequately. A selection process was necessary to pick scenes from the performance with a high enough voxel count (>5000 on average). In Figure 9, only scenes 1 and 3 (solos) and scene 4 (a duet) were kept as the basis for the prototype application.
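The selection step described above reduces to filtering scenes by their mean per-frame voxel count against the quality threshold. A trivial sketch (the data layout is an assumption):

```python
def select_scenes(scene_voxel_counts, threshold=5000):
    """Keep scenes whose mean per-frame voxel count clears the
    quality threshold (5000, per the selection step above).
    scene_voxel_counts: {scene_name: [per-frame voxel counts]}."""
    return [name for name, counts in scene_voxel_counts.items()
            if sum(counts) / len(counts) >= threshold]
```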
Even then, there are still moments in the performance where the voxel model deteriorates, but this has only limited relevance: the video and its parallel voxel stream refresh at 30 frames per second, and human perception is capable of reconstructing incomplete geometry in motion and making sense of the scene.
Ultimately, a performance should be captured again with a similar set-up for the video cameras, but with multiple additional infrared cameras distributed around the stage and pointing down from the ceiling. Together with infrared lighting, these cameras would produce a voxel representation far more accurate, in terms of resolution and volume, than the video cameras alone. The two parallel lighting modes (artistic theatre lights and infrared) would not interfere with each other due to the different wavelengths of the light.
An application capable of displaying multiple channels of video (the six multi-perspective video streams) and, simultaneously, the 3D voxel representation was prototyped in Quartz Composer (a node-based visual programming language, part of the Xcode development environment in Mac OS X, based on Quartz and OpenGL).
It allows navigation in the 3D scene of video and voxel model, keeps the video and voxel streams synchronized, and provides time-control functions (play, pause, previous/next frame). It snaps the user-controlled virtual camera into place when it gets close to the position of a real video camera, so that the perspectives of the video image and the voxel model are identical and a seamless fade can be performed.
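The snapping behaviour can be sketched as a nearest-neighbour test against the real camera positions; when the virtual camera comes within some radius of a real one, it locks to that position so the rendered perspective matches the video exactly. The radius and function names below are illustrative assumptions, not values from the prototype:

```python
import math

def snap_camera(virtual_pos, real_cameras, radius=0.25):
    """Snap the user-controlled virtual camera onto the nearest real
    camera position when within `radius` (units and radius are
    illustrative), so video and voxel perspectives align for a fade."""
    best, best_d = None, radius
    for cam in real_cameras:
        d = math.dist(virtual_pos, cam)     # Euclidean distance (Python 3.8+)
        if d < best_d:
            best, best_d = cam, d
    return best if best is not None else virtual_pos
```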
A list of parameters can be set at runtime: frames per second, point of view, field of view, lighting of the scene, and a range of other variables manipulating the aesthetics of the scene and the voxel render style (Fig. 12). The prototype does all of this in real time on a MacBook Pro at a good frame rate; the video resolution is 1024×768 pixels.