YouTube Audio Quality: An In-Depth Analysis
This blog post was automatically generated (and translated). It is based on the following original, which I selected for publication on this blog:
YouTube audio quality.
YouTube is a vast platform for audio distribution, which raises the question of how good its audio actually is in technical terms. A discussion comparing the 'opus' audio codec with the 'aac/mp4' alternative prompted an investigation into the audio quality YouTube videos deliver.
Methodology
The most direct way to assess audio quality is to compare the uploaded source audio with what YouTube delivers. The Ralph Vaughan-Williams Society (RVWSoc) has made source files available alongside its YouTube uploads, which allows a direct comparison of both audio quality and codec choices.
The analysis looked only at the audio, starting with the range of audio sample rate (ASR) options YouTube offers. Some streams use an ASR as low as 22.05kHz, which limits the frequency range to roughly 10kHz and is a concern in itself. For a reliable test, the study concentrated on the higher, conventional rates of 48kHz and 44.1kHz and avoided versions that had undergone sample rate conversion.
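The ~10kHz figure follows from the Nyquist limit: audio sampled at a rate fs cannot carry frequencies at or above fs/2, and practical anti-alias filters need a transition band somewhat below that. A quick sketch of the arithmetic for the rates mentioned here:

```python
# Nyquist limit: sampled audio can only represent frequencies below fs / 2.
SAMPLE_RATES_HZ = (22_050, 44_100, 48_000)


def nyquist_hz(sample_rate_hz: float) -> float:
    """Return the theoretical upper frequency limit for a given sample rate."""
    return sample_rate_hz / 2.0


for fs in SAMPLE_RATES_HZ:
    print(f"{fs / 1000:.2f} kHz ASR -> bandwidth below {nyquist_hz(fs) / 1000:.3f} kHz")
```

So a 22.05kHz stream tops out near 11kHz in theory, and around 10kHz once filtering is allowed for, whereas 44.1kHz and 48kHz leave room for the full audible range.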
Analysis of 'Pan’s Anniversary'
An excerpt from the CD “Pan’s Anniversary,” supplied as an aac(LC) file at 194 kb/s with a 48kHz sample rate, was compared with YouTube's versions. YouTube offered both aac(LC) and opus encodings; YT-251, the opus version, was transcoded from the aac(LC) input while keeping the 48kHz sample rate.
To measure how closely YT-251 matches the source, the two were time-aligned and then subtracted sample by sample. The residual 'error' was typically only about 20dB below the level of the music, which by traditional standards would count as a very poor signal/noise ratio.
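The align-and-subtract step can be sketched in Python with NumPy. This is not the original author's tool: it assumes both files have already been decoded to mono float arrays at the same sample rate, finds the offset by FFT cross-correlation, and (an added assumption) removes any overall gain difference with a least-squares scale factor before subtracting. The function names are hypothetical.

```python
import numpy as np


def find_lag(source: np.ndarray, output: np.ndarray) -> int:
    """Return the delay (in samples) of `output` relative to `source`.

    Uses FFT-based cross-correlation and assumes the output is not
    polarity-inverted, so the largest positive peak marks the alignment.
    """
    n = len(source) + len(output) - 1
    nfft = 1 << (n - 1).bit_length()  # next power of two >= n, avoids wrap-around
    spec = np.fft.rfft(output, nfft) * np.conj(np.fft.rfft(source, nfft))
    circ = np.fft.irfft(spec, nfft)
    # Unwrap the circular result into lags -(len(source)-1) .. +(len(output)-1)
    corr = np.concatenate((circ[nfft - (len(source) - 1):], circ[:len(output)]))
    lags = np.arange(-(len(source) - 1), len(output))
    return int(lags[np.argmax(corr)])


def residual_level_db(source, output):
    """Align, gain-match and subtract; return (lag, residual power re source in dB)."""
    source = np.asarray(source, dtype=float)
    output = np.asarray(output, dtype=float)
    lag = find_lag(source, output)
    if lag >= 0:
        src, out = source, output[lag:]
    else:
        src, out = source[-lag:], output
    m = min(len(src), len(out))
    src, out = src[:m], out[:m]
    # Assumption: a least-squares gain removes any simple level difference
    gain = np.dot(src, out) / np.dot(out, out)
    residual = src - gain * out
    return lag, 10.0 * np.log10(np.mean(residual ** 2) / np.mean(src ** 2))


# Synthetic demo: the "output" is the source delayed by 37 samples plus noise 20 dB down
rng = np.random.default_rng(0)
demo_src = rng.standard_normal(5000)
demo_out = np.concatenate((np.zeros(37), demo_src))[:5000] + 0.1 * rng.standard_normal(5000)
print(residual_level_db(demo_src, demo_out))
```

On real recordings the decoded files may also differ slightly in length or carry codec start-up padding; trimming to the common overlap, as above, handles the simple cases.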
A Crest Factor analysis (the difference between peak and rms levels, in dB) showed that the distribution of crest factor values in the YT-251 output differs from that of the source file. Since a simple change in signal level scales peak and rms alike and so leaves the crest factor unchanged, this indicates that YouTube's processing does more than alter the level.
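Crest factor is easy to compute per block of samples, and comparing histograms of the per-block values for source and output gives the kind of distribution comparison described. A minimal NumPy sketch; the 0.1 s block length (4800 samples at 48kHz) and 0.5 dB histogram bins are assumptions, not the original analysis settings, and the function names are hypothetical.

```python
import numpy as np


def crest_factors_db(x, block_size):
    """Crest factor of each complete block: peak level minus rms level, in dB."""
    x = np.asarray(x, dtype=float)
    n_blocks = len(x) // block_size
    blocks = x[: n_blocks * block_size].reshape(n_blocks, block_size)
    peak = np.max(np.abs(blocks), axis=1)
    rms = np.sqrt(np.mean(blocks ** 2, axis=1))
    keep = rms > 0  # digital silence has no defined crest factor
    return 20.0 * np.log10(peak[keep] / rms[keep])


def crest_factor_histogram(x, block_size, bin_width_db=0.5, max_db=30.0):
    """Distribution of per-block crest factors, as counts per dB bin."""
    edges = np.arange(0.0, max_db + bin_width_db, bin_width_db)
    counts, _ = np.histogram(crest_factors_db(x, block_size), bins=edges)
    return edges, counts


# Sanity check: a pure sine has peak/rms = sqrt(2), i.e. about 3.01 dB in every block
fs = 48_000
t = np.arange(fs) / fs
sine = np.sin(2 * np.pi * 1000 * t)
print(crest_factors_db(sine, 4800))
```

To compare files, run `crest_factor_histogram` on the source and on the aligned output and plot or difference the two `counts` arrays.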
Spectral analysis showed that YT-251 essentially removes the content above 20kHz and substitutes a high-frequency 'noise floor'. The original detail above this frequency is lost and replaced by generated noise, likely intended to suppress quantization distortion.
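A time-averaged spectrum of this kind can be estimated with Welch's method: split the signal into overlapping windowed segments and average their power spectra. The NumPy sketch below does that and also reports the fraction of power above a chosen frequency, a crude single-number check for content above 20kHz. The segment length (8192 samples), Hann window and 50% overlap are assumptions, and the function names are hypothetical.

```python
import numpy as np


def averaged_spectrum(x, fs, nfft=8192):
    """Welch-style power spectrum: Hann-windowed segments, 50% overlap, averaged."""
    x = np.asarray(x, dtype=float)
    if len(x) < nfft:
        raise ValueError("signal is shorter than one analysis segment")
    window = np.hanning(nfft)
    hop = nfft // 2
    segments = [x[s:s + nfft] * window for s in range(0, len(x) - nfft + 1, hop)]
    power = np.mean(np.abs(np.fft.rfft(segments, axis=1)) ** 2, axis=0)
    freqs = np.fft.rfftfreq(nfft, d=1.0 / fs)
    return freqs, power


def band_level_db(freqs, power, f_lo, f_hi=np.inf):
    """Power in the band [f_lo, f_hi) as a fraction of total power, in dB."""
    band = (freqs >= f_lo) & (freqs < f_hi)
    return 10.0 * np.log10(np.sum(power[band]) / np.sum(power))
```

With hypothetical decoded arrays `source` and `yt251` at 48kHz, `band_level_db(*averaged_spectrum(source, 48000), 20000)` and the same call on `yt251` summarize the energy above 20kHz. Plotting `10 * np.log10(power)` for both on the same axes shows the shape of the change, which a single band figure cannot capture, since a generated noise floor and genuine content can carry similar power.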
Analysis of 'Vaughan-Williams on Brass'
A source file containing aac audio at 44.1kHz was chosen to match the sample rate of YT-140, YouTube's aac version, so that no resampling effects would be involved. After sample-aligning the YT-140 output to the source, their time-averaged spectra were compared. The error level was around -30 to -35dB, better than in the YT-251 example. However, the output's high-frequency range was cut off just below 16kHz, removing content that the 44.1kHz aac input does contain above that frequency.
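Given time-averaged power spectra of the source, the output and the aligned residual on a common frequency grid (from any Welch-style estimate), both the error level and the cut-off point reduce to per-frequency ratios. A minimal sketch; the 20 dB drop used to define the cut-off is an arbitrary assumption and the function names are hypothetical.

```python
import numpy as np


def relative_db(p_num, p_den, floor=1e-30):
    """Per-frequency level of one power spectrum relative to another, in dB.

    `floor` avoids division by zero in empty bins.
    """
    p_num = np.asarray(p_num, dtype=float)
    p_den = np.asarray(p_den, dtype=float)
    return 10.0 * np.log10((p_num + floor) / (p_den + floor))


def estimate_cutoff_hz(freqs, p_src, p_out, drop_db=20.0):
    """Highest frequency at which the output is within `drop_db` of the source.

    Only meaningful where the source itself has content: bins where both
    spectra are essentially empty give a ratio near 0 dB and count as passing.
    """
    ratio = relative_db(p_out, p_src)
    passing = np.nonzero(ratio > -drop_db)[0]
    return float(np.asarray(freqs)[passing[-1]]) if passing.size else 0.0


# Usage, with hypothetical spectra p_source, p_output, p_residual on grid freqs:
#   error_curve_db = relative_db(p_residual, p_source)  # error level per frequency
#   cutoff = estimate_cutoff_hz(freqs, p_source, p_output)
```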
BBC iPlayer Comparison
As a benchmark, the analysis was extended to BBC iPlayer, which experimented with streaming BBC Radio 3 as flac at a 48k ASR alongside its standard 320 kb/s aac iPlayer output. For a “Record Review” program, the error residual was lower relative to the source and output levels than in the YouTube examples, particularly above a few kHz. The output also extended to higher frequencies, and the error level stayed low over a wider frequency range.
Conclusions and Further Questions
The YT-140 example may be technically more accurate than YT-251, but confirming this needs more examples. The different high-frequency cut-offs applied also suggest that comparisons in which the sample rate is allowed to change might give different results. Overall, the results raise the question of whether YouTube's audio quality is limited by its choice of bitrates and codecs, an area that merits further examination.
Is the convenience of accessing audio on YouTube overshadowing the potential for higher fidelity? Which compromises are acceptable in the balance between accessibility and audio quality?