The Bjøntegaard Bible: Why Your Way of Comparing Video Codecs May Be Wrong.

  • Additional Information
    • Source:
      Publisher: Institute of Electrical and Electronics Engineers
      Country of Publication: United States
      NLM ID: 9886191
      Publication Model: Print-Electronic
      Cited Medium: Internet
      ISSN: 1941-0042 (Electronic)
      Linking ISSN: 10577149
      NLM ISO Abbreviation: IEEE Trans Image Process
      Subsets: PubMed not MEDLINE; MEDLINE
    • Publication Information:
      Original Publication: New York, NY : Institute of Electrical and Electronics Engineers, 1992-
    • Abstract:
      In this paper, we provide an in-depth assessment of the Bjøntegaard Delta. We construct a large data set of video compression performance comparisons using a diverse set of metrics, including PSNR, VMAF, bitrate, and processing energies. These metrics are evaluated for visual data types such as classic perspective video, 360° video, point clouds, and screen content. As compression technologies, we consider multiple hybrid video codecs as well as state-of-the-art neural-network-based compression methods. Using additional supporting points in between the standard points defined by parameters such as the quantization parameter, we assess the interpolation error of the Bjøntegaard-Delta (BD) calculus and its impact on the final BD value. From the analysis, we find that the BD calculus is most accurate in the standard application of rate-distortion comparisons, with mean errors below 0.5 percentage points. For other applications and special cases, e.g., VMAF quality, energy considerations, or inter-codec comparisons, the errors are higher (up to 5 percentage points) but can be halved by using a higher number of supporting points. Finally, we give recommendations on how to use the BD calculus such that the validity of the resulting BD values is maximized. The main recommendations are as follows: First, relative curve differences should be plotted and analyzed. Second, the logarithmic domain should be used for saturating metrics such as SSIM and VMAF. Third, BD values below a certain threshold indicated by the subset error should not be used to derive recommendations. Fourth, using two supporting points is sufficient to obtain rough performance estimates.
    • Publication Date:
      Date Created: 20240117
      Latest Revision: 20240126
    • Publication Date:
      20240126
    • Accession Number:
      10.1109/TIP.2023.3346695
    • Accession Number:
      38231816
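
The abstract refers to the BD calculus without spelling it out. As a rough illustration only, the following Python sketch computes a BD-rate value in the classic way: it fits the logarithm of the bitrate as a cubic polynomial of quality for each codec, integrates both fits over the overlapping quality range, and converts the mean difference into a percentage. The function name bd_rate, its parameters, and the sample numbers are hypothetical, and the cubic polynomial fit is an assumption; the record above does not state which interpolation the paper uses or recommends.

```python
import numpy as np


def bd_rate(rate_anchor, quality_anchor, rate_test, quality_test):
    """Mean bitrate difference (in percent) of the test codec relative to
    the anchor at equal quality. Negative values mean bitrate savings.

    Assumption: classic cubic polynomial fit of log10(rate) over quality,
    which needs at least four supporting points per curve.
    """
    log_ra = np.log10(np.asarray(rate_anchor, dtype=float))
    log_rt = np.log10(np.asarray(rate_test, dtype=float))
    qa = np.asarray(quality_anchor, dtype=float)
    qt = np.asarray(quality_test, dtype=float)

    # Fit log-rate as a cubic polynomial of the quality metric.
    poly_a = np.polyfit(qa, log_ra, 3)
    poly_t = np.polyfit(qt, log_rt, 3)

    # Integrate only over the quality interval covered by both curves.
    q_lo = max(qa.min(), qt.min())
    q_hi = min(qa.max(), qt.max())
    if q_hi <= q_lo:
        raise ValueError("the two curves do not overlap in quality")

    int_a = np.polyint(poly_a)
    int_t = np.polyint(poly_t)
    mean_a = (np.polyval(int_a, q_hi) - np.polyval(int_a, q_lo)) / (q_hi - q_lo)
    mean_t = (np.polyval(int_t, q_hi) - np.polyval(int_t, q_lo)) / (q_hi - q_lo)

    # Convert the mean log10-rate difference back to a relative change.
    return (10.0 ** (mean_t - mean_a) - 1.0) * 100.0


if __name__ == "__main__":
    # Hypothetical rate points (kbit/s) and PSNR values (dB), four per codec.
    psnr = [32.0, 35.0, 38.0, 41.0]
    anchor_rate = [1000.0, 2000.0, 4000.0, 8000.0]
    test_rate = [0.9 * r for r in anchor_rate]  # 10 % fewer bits at equal PSNR
    print(bd_rate(anchor_rate, psnr, test_rate, psnr))  # expected to be close to -10
```

With only four supporting points the cubic passes through every point exactly, so any behavior of the true curve between the points is invisible to the result; this is the kind of interpolation error the abstract describes assessing with additional supporting points. For saturating metrics such as SSIM or VMAF, the abstract's second recommendation would correspond to transforming the quality values to a logarithmic domain before calling such a function, a step this sketch leaves to the caller.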