Multiperspective Imaging
Steven M. Seitz and Jiwon Kim, University of Washington
Projects in VR, IEEE Computer Graphics and Applications, November/December 2003

Our eyes have evolved with perspective optics. Because of this, perspective images seem somewhat natural to our eyes; they're well tailored for human vision. In a perspective image, the objects close to us appear large and in detail, yet we enjoy sweeping wide-range views of distant scenery. Cameras have also evolved with perspective optics. It's natural for the optics of cameras to mimic the human eye—after all, a camera's primary function is to produce images that humans can interpret and enjoy.

However, perspective vision has some unfortunate shortcomings. In particular, our eyes have a limited field of view, and we can only see the world in front of us. Ideally, we could see in all directions at once. Additionally, we can only see one side of an object at a time—for example, the front or the back. But suppose you could see all sides at the same time?

In the last several years, some researchers (including ourselves) have investigated techniques that capture multiple perspectives in a single image—a problem known as multiperspective imaging. Multiperspective images are useful for several reasons. The ability to capture a panoramic field of view, or both the front and back of an object, leads to richer and more complete visualizations. At the same time, these images are well suited for processing in computer vision problems such as stereo reconstruction and motion analysis. This article presents an overview of our work in this area and our view of multiperspective imaging in general. References to additional research are available at http://grail.cs.washington.edu/projects/stereo/cga.htm and elsewhere.1

Beginnings
Multiperspective imaging has a long and interesting background. Indeed, before the Italian Renaissance, virtually all paintings were multiperspective. Purposefully bending the laws of perspective is a common theme in modern art as well, for instance in the work of Picasso and Cezanne. A particularly striking example is M.C. Escher's Print Gallery (see http://escherdroste.math.leidenuniv.nl/).

Outside of art, multiperspective projections are common in cartography and in aerial and satellite-sensing applications. You can find a fascinating range of multiperspective optics in biological systems; perhaps the best-known example is the common house fly's compound eye. Studying these biological systems has inspired man-made devices, including a cosmic ray detector known as "The Fly's Eye" (see http://www.cosmic-ray.org/reading/flyseye.html).

Plenoptic function
An image captures light emanating from a scene in certain directions—that is, along a distribution of light rays. We can characterize an image by which distribution of light rays it captures. In particular, a perspective image captures only the light in the scene that hits the focal point, as Figure 1 shows.

[Figure 1. Set of rays corresponding to (a) a perspective image and (b) a multiperspective image.]

Other light ray distributions give rise to multiperspective images. Generally, we can define an image to be any 2D distribution of rays in space. A 5D function known as the plenoptic function p(x, y, z, θ, φ) describes the set of all light rays. This function specifies each ray's origin (x, y, z) and direction (θ, φ).2 The light along each ray is further described by the wavelength λ and the time t at which the light is sensed. The plenoptic function provides a mathematical framework for categorizing different varieties of images. In particular, we can represent any image as a 2D subset, or slice, of the plenoptic function.
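To make the slicing idea concrete, here is a minimal Python sketch of how both a perspective image and a multiperspective image arise as 2D slices of the same 5D function. The function plenoptic below is a toy stand-in for a real scene's radiance, and all names and parameter values here are our own illustration, not code from the article.

```python
import numpy as np

def plenoptic(x, y, z, theta, phi):
    """Stand-in for the 5D plenoptic function p(x, y, z, theta, phi).

    Returns the light intensity along the ray with origin (x, y, z)
    and direction (theta, phi). A real implementation would ray-trace
    a scene or look rays up in captured data; this toy version just
    returns a smooth pattern so the example runs.
    """
    return 0.5 + 0.5 * np.sin(3.0 * theta + x) * np.cos(2.0 * phi + z)

thetas = np.linspace(-0.5, 0.5, 320)  # horizontal view angles (radians)
phis = np.linspace(-0.4, 0.4, 240)    # vertical view angles (radians)

# Perspective image: fix the ray origin (the focal point) and vary only
# the two direction parameters -- the 2D slice p(x0, y0, z0, theta, phi).
x0, y0, z0 = 0.0, 1.5, 0.0
perspective = np.array([[plenoptic(x0, y0, z0, th, ph)
                         for th in thetas] for ph in phis])

# Multiperspective image: fix the horizontal view angle and instead let
# the ray origin slide along a straight path -- a different 2D slice,
# p(x, y0, z0, theta0, phi), that mixes many viewpoints into one image.
xs = np.linspace(0.0, 10.0, 320)      # camera positions along the path
theta0 = 0.0
multiperspective = np.array([[plenoptic(x, y0, z0, theta0, ph)
                              for x in xs] for ph in phis])

print(perspective.shape, multiperspective.shape)  # both ordinary 2D images
```

The second slice is, in effect, the pushbroom construction discussed in the next section: each column of the result comes from a different camera position.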
Path images
How do you actually produce multiperspective images? Unlike perspective images captured with conventional cameras, producing multiperspective images requires specialized optical devices, arrays of conventional cameras, or moving cameras in special ways.

The easiest way to capture multiperspective images is to move a regular video camera along a path and assemble the resulting image sequence into an x-y-t block of pixel data. The resulting pixel data is known as the spatiotemporal volume or, simply, the video cube. Once assembled, you can slice the video cube to produce different types of multiperspective images, as Figure 2 shows. We call these slices path images.

[Figure 2. (a) A camera moves along a path and captures light rays. (b) Stacking the images one on top of another yields (c) an x-y-t video cube. Each slice of the video cube produces a path image and represents a subset of the captured light rays.]

As a concrete example, Figure 3 shows a video cube created by pointing a camcorder out a car window and driving slowly down a residential street. The cube's left face is the last image of the input sequence, an x-y slice with a constant value of t.

[Figure 3. Video cube captured by driving a car down a residential street with a camera pointed out the window.]

The video cube's top face is an x-t slice, corresponding to y = 1. This image contains the first row of each input image, stacked one on top of the next; in the computer vision literature, it's generally referred to as an epipolar plane image (EPI). Each scene point traces out a linear path in the EPI. Furthermore, the line's slope is proportional to scene depth, a useful property for image analysis.

Notice the cube's front face, a y-t slice containing the last column of each input image. This image—known as a pushbroom image—provides a panoramic view of the street. Although it looks similar to a perspective image, each column of a pushbroom image is acquired from a different point along the camera's trajectory. It therefore depicts a continuum of camera viewpoints. We can create a pushbroom image from any image column—that is, from any y-t slice. We can achieve an interesting effect by viewing all the y-t slice images as a movie sequence, in order of increasing x: the street scene appears to rotate in place from left to right. To see this movie, visit http://grail.cs.washington.edu/projects/stereo/cga.htm. Pushbroom images yield superior visualizations of streets, landscapes, and other long linear scenes.

We can produce different types of video cubes and multiperspective images by moving a camera on a curved path instead of a line. For example, consider moving a camera in a circle around an object of interest, with the camera facing in toward the center of the circle. If the image sequence is assembled into a video cube, y-t slices capture an inward-facing panorama of the object or scene within the circle. Archeologists sometimes use these images (known as cyclographs) to create unwrapped views of ancient pottery. Traditional cyclographs are produced by photographing a rotating object through a narrow slit placed in front of a length of moving film—a technique that dates back to the late 19th century. We can simulate the same effect with a regular video camera, as we show in the "Multiperspective stereo" section.
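The three axis-aligned slices are easy to express once the frames are stacked into an array. The following numpy sketch is a minimal illustration; the synthetic cube and the particular slice indices (the first row for the EPI, the last column for the pushbroom) are our own choices, not code from the article.

```python
import numpy as np

# A grayscale video with T frames of size H x W. We fill the cube with
# random data here so the example is self-contained and runnable; in
# practice you'd stack decoded video frames along the t axis.
T, H, W = 100, 240, 320
cube = np.random.rand(T, H, W)  # axes: (t, y, x)

# x-y slice: a constant-t cut is just one ordinary perspective frame.
frame = cube[-1, :, :]        # the last input image, shape (H, W)

# x-t slice: fixing y stacks one row from every frame -- an epipolar
# plane image (EPI), in which scene points trace lines whose slope
# reflects their depth.
epi = cube[:, 0, :]           # first image row over time, shape (T, W)

# y-t slice: fixing x stacks one column from every frame -- a pushbroom
# image, whose every column comes from a different camera position.
pushbroom = cube[:, :, -1].T  # last image column over time, shape (H, T)

print(frame.shape, epi.shape, pushbroom.shape)
```

Playing the slices cube[:, :, x].T as a movie for increasing x gives the rotating-street effect described above.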
Multiperspective stereo
Two perspective images with the right characteristics can be viewed stereoscopically: our brain fuses the two images to produce a sensation of depth. Interestingly, the same is true for certain types of multiperspective images. For example, any two pushbroom images created from different y-t slices of the same video cube may be fused stereoscopically. Figure 4a shows a stereo pushbroom image created in this manner from a longer version of the sequence shown in Figure 3. It's displayed as an anaglyph, viewable using red-blue glasses.

[Figure 4. (a) Stereo pushbroom of a residential street. Stereo cyclographs of (b) a human head and (c) a toy horse. All are 3D viewable with red-blue glasses.]

Stereo images may also be created by moving a camera on a circle instead of a line. If the camera faces outward, the resulting images are often referred to as stereo panoramas; if it faces inward, the results are stereo cyclographs. Figure 4b shows a stereo cyclograph anaglyph created by moving a camera on a rotary arm around a person's head. Figure 4c shows a stereo cyclograph of a toy horse, generated by rotating the horse on a turntable in front of a stationary video camera. Note that the head and horse stereo cyclographs let you see both the front and back of the subject in the same image. We can usually generate a stereo pair by moving a camera along any conic path—for example, a line, circle, ellipse, hyperbola, or parabola. For more information on multiperspective stereo images and how to create them, see our related article.1

Beyond their use for 3D visualization, stereo images also enable 3D measurement and reconstruction using computer vision algorithms. Traditional stereo matching algorithms operate on perspective images, but we can easily adapt and apply the same techniques to multiperspective stereo pairs. Figure 5 shows a texture-mapped mesh model reconstructed from the horse stereo pair in Figure 4. Observe how the front, back, and both sides of the horse are reconstructed from a single stereo pair—a capability not possible with perspective images. The top-down view (see Figure 5c) is hollow, since the top of the horse wasn't visible.

[Figure 5. Renderings of a 3D model reconstructed from the horse cyclograph stereo pair in Figure 4: (a) front, (b) back, and (c) top-down view.]
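To show how a stereo pushbroom pair might be composited for red-blue glasses, here is a small numpy sketch. The column offset, the choice of which slice feeds which eye, and the simple red/cyan channel assignment are all our own assumptions; the article doesn't specify its compositing details.

```python
import numpy as np

def pushbroom(cube, x):
    """Extract the y-t pushbroom slice at image column x.

    cube has axes (t, y, x); the returned image has axes (y, t).
    """
    return cube[:, :, x].T

def anaglyph(left, right):
    """Combine two grayscale views into a red-blue anaglyph.

    The left view drives the red channel and the right view the green
    and blue channels, so red-blue glasses deliver one view per eye.
    Pixel values are assumed to lie in [0, 1].
    """
    rgb = np.stack([left, right, right], axis=-1)
    return (255.0 * np.clip(rgb, 0.0, 1.0)).astype(np.uint8)

# Synthetic stand-in for a video cube captured along a straight path.
T, H, W = 400, 240, 320
cube = np.random.rand(T, H, W)  # axes: (t, y, x)

# Two y-t slices a fixed number of columns apart act as the two eyes;
# the offset plays the role of the stereo baseline.
offset = 40
stereo = anaglyph(pushbroom(cube, W // 2 - offset),
                  pushbroom(cube, W // 2 + offset))
print(stereo.shape)  # (H, T, 3): a viewable anaglyph image
```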
Looking ahead and all around
So far, we've only considered axis-aligned planar slices of the video cube—that is, x-y, y-t, or x-t slices. To see the effects of other planar slices, we recommend downloading the video cube application (available at http://research.microsoft.com/downloads/VideoCube/VideoCube.asp). The application lets you view any video as an x-y-t cube and slice it interactively.

Nonplanar slices enable other visualization types. We've developed an interactive tool that lets users specify any vertical video cube slice (that is, any slice composed of columns from the input images) and display the result as a multiperspective image. Users specify slices through two mechanisms. The first option is to draw a curve in the x-t plane, specifying what the slice looks like from the top down. The second option is to click on regions from a set of input images that should be included in the panorama. The tool interpolates these samples via an optimization procedure to produce a smooth slice through the video cube. Figure 6a shows an image of a supermarket (created using this tool) in which the contents of three aisles are visible at once. We captured the input sequence by mounting the camera on a shopping cart and rolling it in a straight line in front of the aisles.

[Figure 6. (a) Multiperspective view showing three aisles of a supermarket at once. (b) Strip image of a train, horizontally compressed. (c) Expanded view of four train cars.]

We can also apply these techniques to moving scenes. For example, Figure 6b shows a pushbroom-like image of a moving train, captured from a stationary video camera. David Dewey created this image by taking a narrow vertical strip from the center of each image and compositing them. Because the scene background doesn't move, it's repeated in each strip and gives rise to the texture pattern seen in the background of Figure 6c.

There's still much room for improvement and growth in the area of multiperspective imaging. Although multiperspective images are in some ways better suited than perspective images for stereo processing, problems still exist. For example, the best way to efficiently capture multiperspective images, whether with specially designed sensors or with arrays of cameras, is still under debate and remains an important and active topic of research. The images shown in this article are only examples and aren't representative of the full range of image varieties. Researchers are still investigating the range of images we can create as well as identifying their practical uses. We believe that multiperspective images will have promising applications to a wide range of computer vision and visualization problems.

Acknowledgments
Kiera Henning and David Salesin helped us develop the tool used to create the supermarket image shown in Figure 6a. We thank David Dewey for providing the train images shown in Figures 6b and 6c.

References
1. S.M. Seitz and J. Kim, "The Space of All Stereo Images," Int'l J. Computer Vision, vol. 48, no. 1, 2002, pp. 21-38.
2. E.H. Adelson and J.R. Bergen, "The Plenoptic Function and the Elements of Early Vision," Computational Models of Visual Processing, M. Landy and J.A. Movshon, eds., MIT Press, 1991.

Readers may contact Steven M. Seitz and Jiwon Kim at {seitz, jwkim}@cs.washington.edu.

Readers may contact the department editors by email at rosenblu@ait.nrl.navy.mil or michael_macedonia@stricom.army.mil.