Whether you’re on desktop or mobile, if you’re on Facebook and see one of these new 360-degree videos published by media outlets, you can tap, click, or drag the video to change your viewing angle and see the action happening all around you.
But you may still be wondering: just how does this new 360-degree video work? Facebook answered that question with a post on the company’s Code website for software developers. In it, Facebook’s Evgeny Kuzyakov and David Pio explain how they went about creating the new video format for Facebook, the challenges they ran into, and how they overcame them.
We were excited to roll out 360 video for News Feed recently. With 360 video, you can choose which angle you want to view the video from. It’s like turning your head to look around a room — you control the perspective.
Under the hood: Building 360 degree video.
It’s an immersive addition to the ways that people can share and connect on Facebook, but it was a launch with an abundance of technical challenges to overcome. From mammoth incoming raw file sizes to warped, stitched-together imagery, 360 video is a generous playground for an engineer looking to solve a broad set of challenges. Here’s how we approached building this experience.
The problem with equirectangular layouts
The first thing we wanted to tackle was a drawback of the traditional layout used for 360 videos, called an equirectangular layout. The problem is that this layout can contain redundant information at either end. Think of it in terms of a map of the globe. Antarctica is really a circular landmass, not a drawn-out linear one. How you display the map affects how much extra Antarctica there is in the image.
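To put a rough number on that redundancy, the sketch below compares the share of image rows a band of an equirectangular frame occupies with the share of the sphere it actually covers. The function name and the row-fraction convention are assumptions made for this illustration, not something taken from Facebook's post.

```python
import math

def sphere_area_fraction(row_start: float, row_end: float) -> float:
    """Share of the sphere's surface covered by a horizontal band of an
    equirectangular image. Rows are given as fractions of the image height,
    0.0 at the top (north pole) and 1.0 at the bottom (south pole)."""
    lat_top = math.pi / 2 - row_start * math.pi
    lat_bottom = math.pi / 2 - row_end * math.pi
    # Area of a latitude band on a unit sphere is 2*pi*(sin(top) - sin(bottom));
    # dividing by the full area 4*pi leaves the fraction below.
    return (math.sin(lat_top) - math.sin(lat_bottom)) / 2

top_band = sphere_area_fraction(0.0, 0.25)      # top 25 percent of the rows
middle_band = sphere_area_fraction(0.25, 0.75)  # middle 50 percent of the rows

print(f"top 25% of rows covers {top_band:.1%} of the sphere")
print(f"middle 50% of rows covers {middle_band:.1%} of the sphere")
```

Under these assumptions the top quarter of the image holds 25 percent of the pixels but describes only about 15 percent of the sphere, which is the "extra Antarctica" effect in numeric form.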
We found our solution to this problem by remapping equirectangular projection layouts to cube maps.
Cube map solution
A cube map projection represents the scene as the six faces of a cube. Cube maps have been used in computer graphics for a long time, mostly to create skyboxes (six-sided cubes that are drawn behind all graphics) and reflections. There are a few benefits of using cube maps for videos:
- Cube maps don’t have geometry distortion within the faces. Each face looks exactly as if you were looking at the scene head-on through a perspective camera, so straight lines in the scene stay straight within a face. This is important because video codecs model motion as straight-line vectors, which is why cube map faces encode better than equirectangular frames, where motion follows curved paths.
- Cube maps’ pixels are well-distributed — each face is equally important. There are no poles as in equirectangular projection, which contains redundant information.
- Cube maps are easier to project. Each face is mapped only on the corresponding face of the cube.
In order to implement our transformation from equirectangular layout to cube map, we created a custom video filter that uses multiple-point projection on every equirectangular pixel to calculate the appropriate value for every cube map pixel. That’s a lot of calculations. As an example, a 10-minute 360 video has 300 billion pixels that have to be mapped and stretched all over the cube.
We do this by transforming the top 25 percent of the equirectangular video to one cube face and the bottom 25 percent to another cube face. The middle 50 percent is converted to four cube faces. Later, we realign cube faces and put them in two rows (but the actual order is not important). The cube map output contains the same information as the equirectangular input, but it contains 25 percent fewer pixels per frame, an efficiency that matters when you’re working at Facebook’s scale.
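The 25 percent figure can be checked with simple arithmetic. The sketch below assumes a 2:1 equirectangular frame and square cube faces whose edge is one quarter of the frame width, so that the four side faces keep the horizontal resolution of the middle band. Those dimensions and the function names are assumptions for illustration; the post does not state the exact face size used.

```python
def equirect_pixels(width: int) -> int:
    """Pixels in a 2:1 equirectangular frame of the given width."""
    if width % 4 != 0:
        raise ValueError("width must be divisible by 4 for this sketch")
    return width * (width // 2)

def cubemap_pixels(width: int) -> int:
    """Pixels in six square faces whose edge is width / 4 (assumed layout)."""
    if width % 4 != 0:
        raise ValueError("width must be divisible by 4 for this sketch")
    face_edge = width // 4
    return 6 * face_edge * face_edge

width = 4096  # example width, chosen only to make the numbers concrete
before = equirect_pixels(width)
after = cubemap_pixels(width)
print(f"equirectangular: {before} pixels per frame")
print(f"cube map:        {after} pixels per frame")
print(f"saving:          {1 - after / before:.0%}")
```

With these assumptions the ratio is 6/16 of the squared width against 8/16, which is three quarters regardless of the width chosen.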
Simply put, to convert videos, for each frame, we:
- Put a sphere inside a cube.
- Wrap an equirectangular image of this frame on the sphere.
- Expand this sphere in every direction until it fills in the cube.
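One way to picture these steps in code is to run them backwards: for each pixel on a cube face, find the direction it points in from the center of the cube, then look up where that direction lands in the equirectangular frame. The sketch below is a minimal version of that idea. It takes a single nearest-neighbor sample per output pixel rather than the multiple-point projection the post describes, and the face names, axis orientation, and function names are all assumptions made for illustration.

```python
import math

# Assumed convention: +y is up, +z is the center of the equirectangular frame.
# Each entry maps face coordinates (a, b) in [-1, 1] to a 3D direction.
FACES = {
    "front":  lambda a, b: (a, b, 1.0),
    "back":   lambda a, b: (-a, b, -1.0),
    "right":  lambda a, b: (1.0, b, -a),
    "left":   lambda a, b: (-1.0, b, a),
    "top":    lambda a, b: (a, 1.0, -b),
    "bottom": lambda a, b: (a, -1.0, b),
}

def cube_to_equirect(face: str, a: float, b: float) -> tuple[float, float]:
    """Map a point on a cube face to normalized equirectangular (u, v) in [0, 1]."""
    x, y, z = FACES[face](a, b)
    lon = math.atan2(x, z)                 # longitude, -pi..pi
    lat = math.atan2(y, math.hypot(x, z))  # latitude, -pi/2..pi/2
    return lon / (2 * math.pi) + 0.5, 0.5 - lat / math.pi

def render_face(equirect: list[list], face: str, size: int) -> list[list]:
    """Build one size x size cube face by sampling an equirectangular image
    (a list of rows) with nearest-neighbor lookup."""
    height, width = len(equirect), len(equirect[0])
    out = []
    for row in range(size):
        b = 1.0 - 2.0 * (row + 0.5) / size      # +1 at the top edge of the face
        line = []
        for col in range(size):
            a = 2.0 * (col + 0.5) / size - 1.0  # -1 at the left edge of the face
            u, v = cube_to_equirect(face, a, b)
            px = min(int(u * width), width - 1)
            py = min(int(v * height), height - 1)
            line.append(equirect[py][px])
        out.append(line)
    return out
```

A production filter would blend several source samples per output pixel to avoid aliasing; this sketch only shows the geometry of the lookup.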
Dealing with bit rate, file size, and encoding
Let’s take a moment to think about how big these files can be. To create a 360 video, either you use a special set of cameras to record all 360 degrees of a scene simultaneously or you have to stitch together angles from, say, four GoPros on a stick. Incoming 360 video files are 4K and higher, at bit rates that can be over 50 Mb per second — that’s 22 GB per hour of footage. And 3D 360 Stereo videos are twice that — 44 GB for an hour of footage. We tried to do a few things when working with file size: We wanted to decrease the bit rate and save storage, but we wanted to do it quickly so people wouldn’t have to wait for the video, and we didn’t want to compromise the video quality or resolution.
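The storage figures follow directly from the bit rate. The short calculation below assumes decimal units (1 GB = 1,000 MB) and a constant bit rate for the whole hour; the function name is made up for this example.

```python
def gigabytes_per_hour(megabits_per_second: float) -> float:
    """Convert a constant bit rate in Mb/s to storage in GB per hour."""
    megabits = megabits_per_second * 3600  # seconds in an hour
    megabytes = megabits / 8               # 8 bits per byte
    return megabytes / 1000                # decimal gigabytes

mono = gigabytes_per_hour(50)
stereo = 2 * mono  # stereo 360 carries two views, so roughly double
print(f"360 video at 50 Mb/s: {mono} GB per hour")
print(f"stereo 360 video:     {stereo} GB per hour")
```

This gives 22.5 GB and 45 GB per hour, which matches the rounded figures of 22 GB and 44 GB quoted above.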
Part of this work was accomplished in the custom video filter we applied; we also attacked it during the encoding process. Videos are sent to us in every format imaginable. In order to process large 360 videos in a reasonable length of time, we use distributed encoding to split the encoding process across many machines. This is conceptually simple but difficult in practice: chopping a video up, processing it on multiple machines, and stitching it back together without any glitches or loss of audiovisual synchronization is tricky.
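The overall shape of a split, process, and reassemble pipeline can be sketched in a few lines. The example below is only an illustration of that shape: it uses threads on one machine in place of a fleet of servers, the `encode_chunk` function is a placeholder that does no real encoding, and every name in it is hypothetical rather than part of Facebook's system.

```python
from concurrent.futures import ThreadPoolExecutor

def plan_chunks(total_frames: int, chunk_frames: int) -> list[tuple[int, int]]:
    """Split a frame range into consecutive (start, end) chunks, end exclusive."""
    return [
        (start, min(start + chunk_frames, total_frames))
        for start in range(0, total_frames, chunk_frames)
    ]

def encode_chunk(chunk: tuple[int, int]) -> list[str]:
    """Placeholder for real work: returns one label per frame in the chunk."""
    start, end = chunk
    return [f"encoded-frame-{i}" for i in range(start, end)]

def encode_distributed(total_frames: int, chunk_frames: int, workers: int = 4) -> list[str]:
    """Process chunks in parallel, then stitch the results back in order."""
    chunks = plan_chunks(total_frames, chunk_frames)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in input order, even if chunks finish out of order.
        parts = list(pool.map(encode_chunk, chunks))
    return [frame for part in parts for frame in part]

print(plan_chunks(10, 4))
print(len(encode_distributed(10, 4)))
```

The hard parts mentioned above sit outside this sketch: a real encoder typically has to cut on boundaries where each chunk can be decoded independently, and the audio track has to stay aligned with the video across every seam.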
We use a dedicated tier of machines for this work and are lucky to be able to leverage Facebook’s extensible infrastructure to distribute the workload. We created a custom video filter to process the 360 videos because a typical video-processing job is very different from this workload. These processing jobs have a long duration and are CPU and memory intensive, but they aren’t very I/O intensive, so we optimized our task-loading and hardware for this work.