ISO/IEC MPEG and ITU-T VCEG have recently jointly issued a new multiview video compression standard, called 3D-HEVC, which reaches unprecedented compression performance for linear, dense camera arrangements. For instance, 80 full-HD views for so-called Super-MultiView (SMV) autostereoscopic 3D displays can be transmitted by 3D-HEVC at 15 to 60 Mbps, comparable to the bandwidth requirements of 4K/8K video. Novel SMV displays capable of showing a couple of hundred full-HD views, already prototyped in R&D labs, would benefit from an additional two-fold compression gain. Transmitting depth maps along with coded video in a single 3D-HEVC stream and synthesizing additional output views using Depth Image Based Rendering (DIBR) techniques opens opportunities for omitting some camera views for higher compression. However, severe quality-versus-bitrate penalties have been observed in applications where the multiview content is captured by an arc camera arrangement surrounding the scene, e.g. at sports events. Moreover, there is currently no out-of-the-box technology that can provide high-quality virtual views synthesized from relatively sparse, arbitrarily arranged cameras for Free Navigation (FN), e.g. recreating the Matrix bullet-time effect with only a dozen cameras. The MPEG standardization committee has therefore issued a Call for Evidence in June 2015 [N15348] for improved compression technologies to support near-future SMV and FN applications.

OBJECTIVE: The main objective is to improve view prediction/synthesis for better SMV compression performance when omitting/decimating some of the input views during transmission, as well as to support FN functionalities in non-linear, sparse camera arrangements. Visually pleasing DIBR view synthesis therefore requires multi-camera depth estimation and inpainting approaches that are currently not supported in the MPEG reference software, which has historically been confined mainly to stereoscopic scene analysis/prediction/synthesis methods.

METHOD: Multi-camera plane sweeping, epipolar plane image, and inpainting techniques that coherently integrate all available camera information into a single data representation drastically improve the visual coherence between successive virtual views. Moreover, Human Visual System (HVS) masking effects in spatio-temporally adjacent views provide a high degree of forgiveness when decimating the multi-camera input information, similar to how, in the pioneering era of television, low-bandwidth chrominance data was inserted into the established luminance spectrum of black-and-white TV.

RESULTS: While omitting some input views in the transmission chain and resynthesizing them at the decoder incurs a large objective PSNR penalty (5 to 10 dB), only a limited subjective MOS impact has been observed with improved, non-linear multi-camera processing tools (color calibration, depth estimation and view synthesis), proper view decimation, and Group of Views (GoV) data interleaving, cf. the graph in the attachment.

NOVELTY: Continued work on [Jorissen2015] and [Dricot2015], with the aforementioned tools integrated deep into the 3D-HEVC coding chain, provides substantial visual quality gains. New subjective quality metrics that account for stereoscopic and angular-velocity viewpoint transitions, as opposed to the fixed viewpoint of traditional TV, capture additional HVS masking effects, resulting in higher MOS scores. Further validation on Holografika SMV displays with a more extensive set of dozens of video sequences is being pursued.
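As an illustration of the plane-sweeping principle named in METHOD, the following minimal sketch estimates a depth map by testing a set of fronto-parallel depth planes and picking, per pixel, the plane with the best photo-consistency across all cameras. It assumes calibrated pinhole cameras (intrinsics K and poses (R, t) mapping reference-frame points into each source camera); all function names and parameters are illustrative assumptions, not part of the MPEG reference software or the authors' toolchain.

```python
# Minimal plane-sweep depth estimation sketch (illustrative only).
import numpy as np


def homography_for_depth(K_ref, K_src, R, t, depth):
    """Homography mapping reference-view pixels to a source view, induced
    by the fronto-parallel plane z = depth in the reference frame."""
    n = np.array([[0.0, 0.0, 1.0]])  # plane normal in the reference frame
    return K_src @ (R + (t.reshape(3, 1) @ n) / depth) @ np.linalg.inv(K_ref)


def warp_to_ref(img, H, shape):
    """Inverse-warp a source image onto the reference grid (nearest neighbour);
    pixels that fall outside the source image become NaN."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    pix = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])
    p = H @ pix                                   # project ref pixels into source
    p = np.round(p[:2] / p[2]).astype(int)
    valid = (0 <= p[0]) & (p[0] < img.shape[1]) & (0 <= p[1]) & (p[1] < img.shape[0])
    out = np.full(h * w, np.nan)
    out[valid] = img[p[1, valid], p[0, valid]]
    return out.reshape(h, w)


def plane_sweep(ref_img, src_imgs, K_ref, cams, depths):
    """Per pixel, keep the depth plane with the lowest photo-consistency cost
    (variance across the reference and all warped source views)."""
    best_cost = np.full(ref_img.shape, np.inf)
    depth_map = np.zeros(ref_img.shape)
    for d in depths:
        stack = [ref_img]
        for img, (K_src, R, t) in zip(src_imgs, cams):
            H = homography_for_depth(K_ref, K_src, R, t, d)
            stack.append(warp_to_ref(img, H, ref_img.shape))
        cost = np.nanvar(np.stack(stack), axis=0)  # ignores out-of-view samples
        better = cost < best_cost
        best_cost[better] = cost[better]
        depth_map[better] = d
    return depth_map
```

Because the cost is computed jointly over all warped views rather than per stereo pair, every camera contributes to a single depth hypothesis per pixel, which is the kind of coherent multi-camera data integration the abstract refers to.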
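Similarly, a minimal DIBR forward-warping sketch, under the same calibrated pinhole-camera assumptions, shows how a virtual view can be synthesized from one decoded view plus its depth map; the NaN holes it leaves at disocclusions are what the inpainting stage mentioned above would fill. Again, all names are hypothetical, not the actual implementation.

```python
# Minimal DIBR forward-warping sketch (illustrative only).
import numpy as np


def dibr_forward_warp(ref_img, depth_map, K_ref, K_virt, R, t):
    """Back-project each reference pixel to 3D with its depth, then project
    into the virtual camera; a simple z-buffer resolves collisions and NaN
    marks disocclusion holes left for inpainting."""
    h, w = ref_img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    pix = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])
    rays = np.linalg.inv(K_ref) @ pix             # back-project to unit-depth rays
    X = rays * depth_map.ravel()                  # 3D points in the reference frame
    p = K_virt @ (R @ X + t.reshape(3, 1))        # project into the virtual view
    z = p[2]
    valid = z > 1e-6                              # keep points in front of the camera
    u = np.zeros_like(z, dtype=int)
    v = np.zeros_like(z, dtype=int)
    u[valid] = np.round(p[0, valid] / z[valid]).astype(int)
    v[valid] = np.round(p[1, valid] / z[valid]).astype(int)
    ok = valid & (0 <= u) & (u < w) & (0 <= v) & (v < h)
    virt = np.full((h, w), np.nan)                # NaN = disocclusion hole
    zbuf = np.full((h, w), np.inf)
    for i in np.flatnonzero(ok):                  # per-pixel z-test
        if z[i] < zbuf[v[i], u[i]]:
            zbuf[v[i], u[i]] = z[i]
            virt[v[i], u[i]] = ref_img.flat[i]
    return virt
```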
© 2016, Society for Imaging Science and Technology (IS&T).