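I'm trying to cut speech segments out of an audio file and join them together. The source audio is a 16 kHz, 16-bit, mono PCM WAV file.
A slice made with -ss -i input.wav -t output.wav has a different duration from one made with -ss -i input.wav -t -acodec copy output.wav.
To be precise:
- -ss 20.125 -i input.wav -t 10.125 output-reencode.wav produces a slice with a duration of 10 s 125 ms
- -ss 20.125 -i input.wav -t 10.125 -acodec copy output-copy.wav produces a slice with a duration of 10 s 176 ms
Why is there a 51 ms difference?
Is it safe to say that, since PCM involves no compression, the re-encoded version will not be slower?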
C:\Users\someuser>ffmpeg.exe -ss 20.125 -i "C:\input.wav" -t 10.125 "C:\output-reencode.wav"
ffmpeg version 7.0-essentials_build-www.gyan.dev Copyright (c) 2000-2024 the FFmpeg developers
built with gcc 13.2.0 (Rev5, Built by MSYS2 project)
configuration: --enable-gpl --enable-version3 --enable-static --disable-w32threads --disable-autodetect --enable-fontconfig --enable-iconv --enable-gnutls --enable-libxml2 --enable-gmp --enable-bzlib --enable-lzma --enable-zlib --enable-libsrt --enable-libssh --enable-libzmq --enable-avisynth --enable-sdl2 --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxvid --enable-libaom --enable-libopenjpeg --enable-libvpx --enable-mediafoundation --enable-libass --enable-libfreetype --enable-libfribidi --enable-libharfbuzz --enable-libvidstab --enable-libvmaf --enable-libzimg --enable-amf --enable-cuda-llvm --enable-cuvid --enable-dxva2 --enable-d3d11va --enable-d3d12va --enable-ffnvcodec --enable-libvpl --enable-nvdec --enable-nvenc --enable-vaapi --enable-libgme --enable-libopenmpt --enable-libopencore-amrwb --enable-libmp3lame --enable-libtheora --enable-libvo-amrwbenc --enable-libgsm --enable-libopencore-amrnb --enable-libopus --enable-libspeex --enable-libvorbis --enable-librubberband
libavutil 59. 8.100 / 59. 8.100
libavcodec 61. 3.100 / 61. 3.100
libavformat 61. 1.100 / 61. 1.100
libavdevice 61. 1.100 / 61. 1.100
libavfilter 10. 1.100 / 10. 1.100
libswscale 8. 1.100 / 8. 1.100
libswresample 5. 1.100 / 5. 1.100
libpostproc 58. 1.100 / 58. 1.100
[aist#0:0/pcm_s16le @ 000001923d036b40] Guessed Channel Layout: mono
Input #0, wav, from 'C:\input.wav':
Metadata:
encoder : Lavf61.1.100
timecode : 00:00:00:00
Duration: 00:27:33.20, bitrate: 256 kb/s
Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 16000 Hz, mono, s16, 256 kb/s
Stream mapping:
Stream #0:0 -> #0:0 (pcm_s16le (native) -> pcm_s16le (native))
Output #0, wav, to 'C:\output-reencode.wav':
Metadata:
ISMP : 00:00:00:00
ISFT : Lavf61.1.100
Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 16000 Hz, mono, s16, 256 kb/s
Metadata:
encoder : Lavc61.3.100 pcm_s16le
[out#0/wav @ 000001923d029e80] video:0KiB audio:316KiB subtitle:0KiB other streams:0KiB global headers:0KiB muxing overhead: 0.030247%
size= 317KiB time=00:00:10.12 bitrate= 256.1kbits/s speed=1.25e+03x
C:\Users\someuser>ffmpeg.exe -ss 20.125 -i "C:\input.wav" -acodec copy -t 10.125 "C:\output-copy.wav"
[aist#0:0/pcm_s16le @ 000001ad5f9c9b80] Guessed Channel Layout: mono
Input #0, wav, from 'C:\input.wav':
Metadata:
encoder : Lavf61.1.100
timecode : 00:00:00:00
Duration: 00:27:33.20, bitrate: 256 kb/s
Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 16000 Hz, mono, s16, 256 kb/s
Stream mapping:
Stream #0:0 -> #0:0 (copy)
Output #0, wav, to 'C:\output-copy.wav':
Metadata:
ISMP : 00:00:00:00
ISFT : Lavf61.1.100
Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 16000 Hz, mono, s16, 256 kb/s
Press [q] to stop, [?] for help
[out#0/wav @ 000001ad5f9c9d40] video:0KiB audio:318KiB subtitle:0KiB other streams:0KiB global headers:0KiB muxing overhead: 0.030095%
size= 318KiB time=00:00:10.24 bitrate= 254.5kbits/s speed=2.76e+03x
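PCM audio streams are typically processed in frames of 1024 samples each.
When re-encoding with a duration specifier, ffmpeg truncates the last frame as needed to match the requested duration as closely as possible. When stream copying, it only handles whole frames.
Your audio stream is 16000 Hz, so one frame lasts 1024/16000 = 64 ms, and 10.125 s spans 158.203125 frames. When stream copying, the 159th frame is written at full size (1024 samples), so the output duration is 159 * 64 ms = 10176 ms. That is 51 ms longer than the 10125 ms you asked for.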
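The arithmetic above can be reproduced with a short Python sketch. The function name `cut_durations_ms` is made up for illustration, and the 1024-sample packet size is the value assumed in this answer, not something read from the file.

```python
import math


def cut_durations_ms(requested_s, sample_rate, packet_samples=1024):
    """Return (re-encoded, stream-copied) durations in milliseconds.

    packet_samples=1024 is the packet size assumed in the answer,
    not a value read from the input file.
    """
    requested_samples = round(requested_s * sample_rate)
    # Re-encoding can trim the final packet to the exact sample count.
    reencode_ms = requested_samples * 1000 / sample_rate
    # Stream copy keeps whole packets, so round up to the next packet boundary.
    packets = math.ceil(requested_samples / packet_samples)
    copy_ms = packets * packet_samples * 1000 / sample_rate
    return reencode_ms, copy_ms


reencode_ms, copy_ms = cut_durations_ms(10.125, 16000)
print(reencode_ms, copy_ms, copy_ms - reencode_ms)  # 10125.0 10176.0 51.0
```

With the values from the question this gives 10125 ms for the re-encoded slice and 10176 ms for the stream-copied one, which matches the 51 ms gap that was observed.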
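When you stream-copy audio, ffmpeg can only cut on keyframes. Since the keyframes do not fall exactly on the timestamps where you start and stop, the cut is not precise. For PCM audio every packet is a keyframe, so in practice the cut lands on packet boundaries.
When you re-encode the stream, ffmpeg does not depend on the existing keyframes, because the encoder creates new ones.
You should be able to view the keyframe timestamps of a media file with something like the following (this example reads the first video stream of an MP4 file):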
ffprobe -loglevel error -skip_frame nokey -select_streams v:0 -show_entries frame=pkt_pts_time -of csv=print_section=0 input.mp4
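For the WAV slices in the question, the exact length can also be checked from the sample count in the file header, independently of ffmpeg. The following is a minimal sketch using Python's standard `wave` module. The helper names `wav_duration_ms` and `make_silent_wav` are invented for this example, and the in-memory silent files only stand in for output-reencode.wav and output-copy.wav.

```python
import io
import wave


def wav_duration_ms(source):
    """Duration in ms of a PCM WAV file (path or file object), from its sample count."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() * 1000 / wav.getframerate()


def make_silent_wav(n_samples, sample_rate=16000):
    """Build an in-memory 16-bit mono WAV of silence (stand-in for a real slice)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * n_samples)
    buf.seek(0)
    return buf


# 162000 samples = exact 10.125 s; 162816 samples = 159 packets of 1024 samples.
print(wav_duration_ms(make_silent_wav(162000)))  # 10125.0
print(wav_duration_ms(make_silent_wav(162816)))  # 10176.0
```

To check real files, pass a path instead, for example `wav_duration_ms("output-copy.wav")`.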