Short teaser of the "Who Cares" music video production in Braunschweig, Germany.

And now for the detailed description.

The storyboard showed the intended graffiti motif, the foreground performance and the camera movement.

At the beginning of pre-production, the storyboard was the first visual sketch and the foundation of what lay ahead. The basic idea was to combine the foreground performance of Chris and Buddy with a graffiti time lapse in the background. The camera movement was also roughly planned at this stage.

Set of "Who Cares", 3D blueprint (left) and actual set (right)

The set and camera rig were jointly built in an old assembly hall ("Montagehalle"). Eleven Canon XH A1 HDV camcorders were tightly attached to the wooden rig. Additional DSLR cameras and Microsoft Kinects were placed on the rig to capture supplementary footage.

Sample of a (short) shot. Each shot was captured with eleven camcorders.

Several layers of green paint turned the set into a green-screen studio. Eleven scenes of Buddy and Chris performing to "Who Cares" were shot in a single evening. With eleven camcorders recording each scene, this resulted in a total of 121 captures. In post-production, Adobe After Effects and the Keylight plug-in were used for chroma keying. After the last shot was taken, the set was painted dark grey and left to dry overnight.
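Keylight itself does far more (edge refinement, despill controls, garbage mattes), but the core idea of a screen key can be sketched in a few lines. Purely as an illustration, here is a minimal green-screen key with OpenCV and NumPy; the gain value and file names are made up:

```python
import cv2
import numpy as np

def green_screen_key(frame_bgr, gain=4.0):
    """Toy screen key: alpha is driven by how much the green
    channel exceeds the other two (the basic idea behind screen
    keyers such as Keylight, which is far more sophisticated)."""
    b, g, r = cv2.split(frame_bgr.astype(np.float32) / 255.0)
    screen = g - np.maximum(r, b)                  # > 0 on the green screen
    alpha = 1.0 - np.clip(gain * screen, 0.0, 1.0)
    g = np.minimum(g, np.maximum(r, b))            # crude spill suppression
    return cv2.merge([b, g, r]), alpha

frame = cv2.imread("performance_cam03_f0120.png")  # hypothetical file name
fg, alpha = green_screen_key(frame)
cv2.imwrite("matte.png", (alpha * 255).astype(np.uint8))
```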

Several days of graffiti painting

Over the course of four days, Chris mesmerized us with his stunning graffiti skills. Several motifs were sprayed on top of each other. The cameras were set to timelapse mode and saved only one frame every 10 seconds to disk. While some pieces were painted freestyle, others had to be sprayed very carefully: they were meant to reveal their actual content only when viewed from a specific viewpoint.
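The camcorders handled the timelapse in-camera; just to illustrate the capture scheme, a software version of such a loop could look like this (capture device and file names are placeholders):

```python
import time
import cv2

cap = cv2.VideoCapture(0)        # placeholder for one rig camera
INTERVAL = 10.0                  # seconds between saved frames
next_save = time.monotonic()
frame_idx = 0

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    if time.monotonic() >= next_save:
        cv2.imwrite(f"lapse_{frame_idx:06d}.png", frame)
        frame_idx += 1
        next_save += INTERVAL    # keep one frame every 10 seconds
cap.release()
```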

Video preview of the camera movement BEFORE the Virtual Video Camera comes into action

To give a rough preview of the camera movement, an initial composite was created. It showed the timing of foreground and background, using the footage from the camera that came closest to the desired virtual camera movement.

Basic idea of Virtual Video Camera. From a set of cameras (white), in-between views (red) in both time and space are rendered.

The Virtual Video Camera renders images of a virtual camera (red) in between the original images (white). The virtual camera can be moved anywhere within the blue area.

Now it was time for the Virtual Video Camera to turn the imagery of eleven static cameras into a video that appears to have been recorded by a single moving camera. We used state-of-the-art multi-view interpolation techniques to render visually plausible in-between images. One obvious benefit is that we can align the foreground and background captures, i.e., we can render the same camera movement for two completely different timelines.
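In spirit, and heavily simplified to just two source images, the interpolation works like the sketch below: both neighboring images are warped toward the virtual viewpoint along dense correspondence fields and then cross-dissolved. The actual system blends more than two views across space and time; the function and parameter names here are our own.

```python
import cv2
import numpy as np

def partial_warp(img, flow, t):
    """Backward-warp: each output pixel samples img at a position
    displaced by the fraction t of the correspondence field.
    (A shortcut; careful forward warping avoids some artifacts.)"""
    h, w = img.shape[:2]
    gx, gy = np.meshgrid(np.arange(w, dtype=np.float32),
                         np.arange(h, dtype=np.float32))
    return cv2.remap(img, gx + t * flow[..., 0], gy + t * flow[..., 1],
                     cv2.INTER_LINEAR)

def interpolate_view(img_a, img_b, flow_ab, flow_ba, t):
    """In-between image at t in [0, 1] between two real cameras:
    warp both neighbors toward the virtual view, then blend."""
    wa = partial_warp(img_a, flow_ab, t)         # A pulled toward B
    wb = partial_warp(img_b, flow_ba, 1.0 - t)   # B pulled toward A
    return ((1.0 - t) * wa.astype(np.float32) +
            t * wb.astype(np.float32)).astype(np.uint8)
```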

Creating virtual camera paths with our spacetime editor.

Using both the storyboard and the rough composite as guidelines, we created the actual camera trajectory in our interactive spacetime editor. Every rendered image is a warped combination of the originally captured images. To compute these warps, dense pixel correspondences are needed. For a first preview, we use Flowlib to create an initial estimate on the fly.
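Flowlib computes dense optical flow on the GPU. As a stand-in to show what such a correspondence field looks like, OpenCV's Farnebäck flow produces the same kind of data (file names hypothetical):

```python
import cv2

img_a = cv2.imread("cam05_f0001.png", cv2.IMREAD_GRAYSCALE)  # hypothetical
img_b = cv2.imread("cam06_f0001.png", cv2.IMREAD_GRAYSCALE)

# Dense correspondence field: flow[y, x] = (dx, dy) takes the pixel
# at (x, y) in img_a to its corresponding position in img_b.
flow = cv2.calcOpticalFlowFarneback(
    img_a, img_b, None,
    pyr_scale=0.5, levels=5, winsize=21,
    iterations=3, poly_n=7, poly_sigma=1.5, flags=0)
```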

After computing background correspondences, the camera trajectory (right) for the graffiti timelapse can be rendered (left).

To compute the more accurate pixel correspondences needed for our rendering, we constructed a geometric model of the background. We estimated the position, orientation and other properties of the cameras with Bundler, imported this data into Blender and reconstructed the background geometry. Knowing the geometry and the camera properties, we could compute the position of every pixel in 3D space, and from that derive both the depth of each pixel and its position in the other images.
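In pinhole-camera terms, this per-pixel computation boils down to unprojecting and reprojecting, sketched below with NumPy. We assume the common world-to-camera convention x_cam = R·X + t; Bundler's own convention differs in sign details (its cameras look down the negative z axis), so the matrices need adjusting in practice.

```python
import numpy as np

def unproject(u, v, depth, K, R, t):
    """Lift pixel (u, v) with known depth to a 3D world point,
    assuming the pinhole convention x_cam = R @ X_world + t."""
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])
    x_cam = ray * depth                  # point in camera coordinates
    return R.T @ (x_cam - t)             # back to world coordinates

def project(X_world, K, R, t):
    """Project a world point into another camera; returns the
    pixel position and the point's depth in that camera."""
    x_cam = R @ X_world + t
    uvw = K @ x_cam
    return uvw[:2] / uvw[2], x_cam[2]

# Where does the pixel (u, v) of camera i land in camera j, and how
# deep is it there? (K_*, R_*, t_* come from the Bundler/Blender
# reconstruction; the variable names are ours.)
# uv_j, depth_j = project(unproject(u, v, d, K_i, R_i, t_i), K_j, R_j, t_j)
```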

Correction of faulty correspondences with our interactive tool.

For the correspondences between the foreground frames, we used our in-house software. Faulty correspondences were corrected with our interactive correspondence editor, Flowlab. Finally, both the background and foreground scenes were rendered and composited.

Depth-Image Based Rendering: A 2D image and a depth map are turned into a left and right eye view.

After the 2D version was finalized, we decided to also convert the material into stereoscopic 3D. We used Depth-Image-Based Rendering (DIBR) to create a left and a right eye view from the original footage and depth images.
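The core of DIBR is to displace each pixel horizontally by a disparity proportional to its inverse depth, in opposite directions for the two eyes. A minimal sketch follows; the disparity range is a made-up parameter, and a real pipeline would fill the resulting disocclusion holes:

```python
import numpy as np

def dibr_stereo(image, depth, max_disp=12.0):
    """Toy DIBR: near pixels (small depth) get large disparity.
    Forward-splats pixels into left/right views; disocclusion
    holes are left black and would need inpainting."""
    h, w = depth.shape
    inv = 1.0 / np.maximum(depth, 1e-6)
    disp = max_disp * (inv - inv.min()) / (np.ptp(inv) + 1e-9)
    left = np.zeros_like(image)
    right = np.zeros_like(image)
    xs = np.arange(w)
    for y in range(h):
        xl = np.clip((xs + disp[y]).astype(int), 0, w - 1)
        xr = np.clip((xs - disp[y]).astype(int), 0, w - 1)
        left[y, xl] = image[y]
        right[y, xr] = image[y]
    return left, right
```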

For more information on our Virtual Video Camera technique, visit our project website and read the detailed technical papers, check out our YouTube channel, or sign up to study computer science at TU Braunschweig.

For the original "Who Cares?" track and countless more dubstep tunes, visit the Symbiz Sound homepage.