Thursday, March 31, 2022

FFmpeg - Concatenating Videos with Blend Transition


 

I received an anonymous comment on a past post, FFMpeg Blending, specifically asking how two video files could be concatenated with a blend transition.

Note: anonymous commenter/reader, if you find yourself returning and find this post useful, please drop a quick comment so I know the effort in addressing your question wasn't lost. Thanks!

In past posts I feel we've covered most of the necessary filtering examples as well as concatenation of videos; now it's a not-so-simple matter of putting them all together.  Let's start with the constraints/complications that can bite you.


Constraints/Complications

Input Files

If your input files come from a uniform source (e.g. your phone, video camera, or snippets from a larger video file) you may not have to worry about these factors.  For this example, however, we will grab a couple of videos from different YouTube sources, so we will have to address some of these issues.

In general, your input files need to be similarly formatted: similar frame rates, video resolutions and audio tracks.  I've wasted more hours than I care to mention by forgetting to take care at this step, so let that be a word of caution: take a few minutes to check this beforehand and avoid making the mistakes I've made.  Best case, if your formats don't match you get slapped in the face with an error from FFmpeg; worst case, it succeeds in delivering a video that looks like garbage, leaving you with an Easter egg hunt to find the issue.  When in doubt, enforce some standard format prior to doing anything more sophisticated.
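
If you just want a quick side-by-side sanity check, something along these lines works; I'm using placeholder filenames here, and your ffprobe version may format the output slightly differently:

$ ffprobe -v error -select_streams v:0 -show_entries stream=width,height,r_frame_rate -of csv=p=0 input.mp4
$ ffprobe -v error -select_streams a:0 -show_entries stream=codec_name,sample_rate,channels -of csv=p=0 input.mp4

Run that against each source file and compare; if the resolutions, frame rates or audio parameters differ, deal with it before moving on.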

Let's start with video resolution; your input files (for concatenation and blends) need to be identically sized, not kinda similarly sized, but identically sized.  Source files with different aspect ratios are particularly problematic because resizing them may not give you the required dimensions, leaving you to crop or pad accordingly.
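
If you'd rather fill the frame than letterbox, a crop-to-fill variant along these lines is one option (we'll use the scale+pad approach later in this post; input.mp4/output.mp4 are placeholders):

$ ffmpeg -i input.mp4 -vf "scale=1280:720:force_original_aspect_ratio=increase,crop=1280:720" -c:a copy output.mp4

The scale step sizes the video so it at least covers 1280x720, and the crop step trims the overhang from the center.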

A consistent frame rate (FPS) is necessary for time-based filters like blending, so make sure the frame rate is identical for both input videos; otherwise you'll encounter unexpected, glitchy-looking results.
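
If you need to normalize the frame rate on its own, a minimal sketch looks like this (placeholder filenames; later in this post we force 30 fps with the -r 30/1 output option during scaling):

$ ffmpeg -i input.mp4 -vf "fps=30" -c:a copy input-30fps.mp4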

Consistent audio track existence; just shy of a bazillion times I've attempted to concatenate two videos only to find my audio tracks absent or stalling.  For example, applying an image snippet before/after a video results in stalled or absent audio until I relearn that I need to give the image clip an empty audio track; otherwise concatenation doesn't know how to properly handle joining one video with audio and another without.  All your source media for concatenation should have an audio track, or none of them should.  Mixing and matching will only give you headaches in the end.
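
When one clip is missing audio entirely, the usual fix is to mux in a silent track so every input looks the same to the concatenation step.  A hedged sketch, assuming stereo 44.1 kHz audio and placeholder filenames:

$ ffmpeg -i video-only.mp4 -f lavfi -i anullsrc=channel_layout=stereo:sample_rate=44100 -c:v copy -c:a aac -shortest video-with-silence.mp4

The anullsrc source generates silence, -c:v copy leaves the video untouched, and -shortest stops the silent track at the end of the video.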


Example

Let's move on to our example.

Let's snag two videos from YouTube

$ youtube-dl -f mp4 -o yt-stingray.mp4 https://www.youtube.com/watch?v=aXTk9VPZ4Gg


$ youtube-dl -f mp4 -o yt-hardcastle01.mp4 https://www.youtube.com/watch?v=_oHpWw7L3d4

Examining these files, you'll find they aren't uniform in resolution or frame rate.
$ ffprobe -i yt-stingray.mp4 
ffprobe version 4.2.2 Copyright (c) 2007-2019 the FFmpeg developers
  built with gcc 5.4.0 (Ubuntu 5.4.0-6ubuntu1~16.04.12) 20160609
  configuration: --enable-libx264 --enable-nonfree --enable-gpl --enable-libfreetype --enable-libmp3lame --enable-libzmq
  libavutil      56. 31.100 / 56. 31.100
  libavcodec     58. 54.100 / 58. 54.100
  libavformat    58. 29.100 / 58. 29.100
  libavdevice    58.  8.100 / 58.  8.100
  libavfilter     7. 57.100 /  7. 57.100
  libswscale      5.  5.100 /  5.  5.100
  libswresample   3.  5.100 /  3.  5.100
  libpostproc    55.  5.100 / 55.  5.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'yt-stingray.mp4':
  Metadata:
    major_brand     : mp42
    minor_version   : 0
    compatible_brands: isommp42
    creation_time   : 2019-05-22T15:56:25.000000Z
  Duration: 00:47:09.44, start: 0.000000, bitrate: 397 kb/s
    Stream #0:0(und): Video: h264 (Constrained Baseline) (avc1 / 0x31637661), yuv420p(tv, smpte170m), 480x360 [SAR 1:1 DAR 4:3], 299 kb/s, 23.98 fps, 23.98 tbr, 24k tbn, 47.95 tbc (default)
    Metadata:
      creation_time   : 2019-05-22T15:56:25.000000Z
      handler_name    : ISO Media file produced by Google Inc. Created on: 05/22/2019.
    Stream #0:1(eng): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 95 kb/s (default)
    Metadata:
      creation_time   : 2019-05-22T15:56:25.000000Z
      handler_name    : ISO Media file produced by Google Inc. Created on: 05/22/2019.

$ ffprobe -i yt-hardcastle01.mp4 
ffprobe version 4.2.2 Copyright (c) 2007-2019 the FFmpeg developers
  built with gcc 5.4.0 (Ubuntu 5.4.0-6ubuntu1~16.04.12) 20160609
  configuration: --enable-libx264 --enable-nonfree --enable-gpl --enable-libfreetype --enable-libmp3lame --enable-libzmq
  libavutil      56. 31.100 / 56. 31.100
  libavcodec     58. 54.100 / 58. 54.100
  libavformat    58. 29.100 / 58. 29.100
  libavdevice    58.  8.100 / 58.  8.100
  libavfilter     7. 57.100 /  7. 57.100
  libswscale      5.  5.100 /  5.  5.100
  libswresample   3.  5.100 /  3.  5.100
  libpostproc    55.  5.100 / 55.  5.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'yt-hardcastle01.mp4':
  Metadata:
    major_brand     : mp42
    minor_version   : 0
    compatible_brands: isommp42
    encoder         : Google
  Duration: 00:01:48.62, start: 0.000000, bitrate: 1362 kb/s
    Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p(tv, bt709), 1280x720 [SAR 1:1 DAR 16:9], 1232 kb/s, 29.97 fps, 29.97 tbr, 30k tbn, 59.94 tbc (default)
    Metadata:
      handler_name    : ISO Media file produced by Google Inc.
    Stream #0:1(und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 127 kb/s (default)
    Metadata:
      handler_name    : ISO Media file produced by Google Inc.

Concatenation 

Even with differing video resolutions and frame rates, you can find reasonable success simply concatenating the video files.  FFmpeg will adopt the video resolution and frame rate of the 1st input file and apply it throughout the output file, so swapping the order of the input files changes the resolution and frame rate of the output file.
$ cat videoXX.mp4.txt
file 'yt-stingray.mp4'
file 'yt-hardcastle01.mp4'

$ ffmpeg -y -f concat -i videoXX.mp4.txt videoXX.mp4

This concatenation order results in 480x360 23.98 fps, while reversing the file order results in 1280x720 29.97 fps; often leaving you scratching your head.

Sometimes this will give you the result you're looking for; other times it may not.  I find it's in my best interest to do the scaling prior to concatenation, cropping and/or padding as I prefer.  Additionally, becoming overly reliant on FFmpeg 'doing it for you' keeps you from understanding what's going on, which will bite you in the butt when you try something more complicated (e.g. blending).


Blending

The blending effect we will be utilizing is a pixel-wise operation: one by one, each output frame is generated by 'blending' the corresponding pixels from each input into the pixel in the destination frame.  If the two input videos aren't identically sized the blending falls apart because it lacks corresponding pixels, and FFmpeg errors out with something resembling this;

[Parsed_amerge_0 @ 0x2e8b1c0] No channel layout for input 1
[Parsed_amerge_0 @ 0x2e8b1c0] Input channel layouts overlap: output layout will be determined by the number of distinct input channels
[Parsed_blend_2 @ 0x2dc7c80] First input link top parameters (size 480x360) do not match the corresponding second input link bottom parameters (size 1280x720)
[Parsed_blend_2 @ 0x2dc7c80] Failed to configure output pad on Parsed_blend_2
Error reinitializing filters!
Failed to inject frame into filter network: Invalid argument
Error while processing the decoded data for stream #1:0
Conversion failed!

This is FFmpeg's way of saying that your input videos aren't uniformly sized.

Scaling input files with a variety of aspect ratios is always challenging; I find that scaling to a specific video size with optional padding works well for me.

Let's scale our input videos to 1280x720, applying padding if necessary, and force a uniform 30/1 frame rate;

$ ffmpeg -i yt-stingray.mp4 -vf "scale=-1:720,pad=1280:ih:(ow-iw)/2" -r 30/1 -strict -2 yt-stingray-scaled.mp4
$ ffmpeg -i yt-hardcastle01.mp4 -vf "scale=-1:720,pad=1280:ih:(ow-iw)/2" -r 30/1 -strict -2 yt-hardcastle01-scaled.mp4

We are left with uniform 1280x720, 30 fps videos that can be blended.

Before we demonstrate the blending filter, note that simple blending 'generally' begins being applied right away; not quite what we want, since we want to play the first video until near completion and then blend into the next video.  While you absolutely can do it in a single command, it would require adjusting the 1st video's presentation time stamps (PTS) to start immediately, adjusting the 2nd video's PTS to nearly the duration of video1 (play this, then this), and then applying the blend at the end of video1 (for the curious, a video-only sketch of that single-command approach appears after we define our clips below).  Personally, that's a recipe for a migraine, so an alternative is to break the two videos into segments and stitch them back together; that's the tactic we'll use in our example.

Since these video source files are so large, let's shrink them down a bit for our example.  We'll grab 10-second clips to work with, as they'll process quicker and be a bit easier to understand.

$ ffmpeg -i yt-stingray-scaled.mp4 -ss 4 -t 10 -strict -2 clip01-scaled.mp4
$ ffmpeg -i yt-hardcastle01-scaled.mp4 -ss 4 -t 10 -strict -2 clip02-scaled.mp4

So, we are looking for an end effect of playing the first 8 seconds of clip01-scaled.mp4, then applying a 2-second blending transition, then continuing on to play clip02-scaled.mp4.  That's our bogey.
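
For the curious, the single-command approach mentioned above would look roughly like this; it's a hedged, video-only sketch that drops the audio handling for brevity, and it's not the route we'll take:

$ ffmpeg -i clip01-scaled.mp4 -i clip02-scaled.mp4 -filter_complex "[0:v]split[c1a][c1b];[1:v]split[c2a][c2b];[c1a]trim=0:8,setpts=PTS-STARTPTS[intro];[c1b]trim=8:10,setpts=PTS-STARTPTS[blenda];[c2a]trim=0:2,setpts=PTS-STARTPTS[blendb];[c2b]trim=2,setpts=PTS-STARTPTS[outro];[blenda][blendb]blend=all_expr='A*(1-(T/2))+B*(T/2)'[xfade];[intro][xfade][outro]concat=n=3:v=1:a=0[v]" -map "[v]" -an -strict -2 single-command.mp4

Keeping all those trims and labels straight in one graph is exactly the migraine mentioned above, so, segments it is.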

We can achieve this by splitting the videos into 4 segments;
  • segment1.mp4 : first 8 seconds of clip01-scaled.mp4
  • segment2.mp4 : last 2 seconds of clip01-scaled.mp4
  • segment3.mp4 : first 2 seconds of clip02-scaled.mp4
  • segment4.mp4 : last 8 seconds of clip02-scaled.mp4
With these 4 segments, we will apply the blending transition to segment2 & segment3, then restitch the blended clip between segment1 and segment4.  That's the plan.  Let's walk thru that.

$ ffmpeg -i clip01-scaled.mp4 -t 8.032000 -target ntsc-dvd -strict -2 segment1.mp4
$ ffmpeg -i clip01-scaled.mp4 -ss 8.032000 -target ntsc-dvd -strict -2 segment2.mp4
$ ffmpeg -i clip02-scaled.mp4 -ss 0 -t 2 -target ntsc-dvd -strict -2 segment3.mp4
$ ffmpeg -i clip02-scaled.mp4 -ss 2 -target ntsc-dvd -strict -2 segment4.mp4

Let's blend segment2 and segment3.  At the beginning of the output clip we want a blend factor of 100% for the video1 frame and 0% for the video2 frame; by the end of the clip that reverses to 0% video1, 100% video2.  A time-based linear progression over the duration of the clip.

$ ffmpeg -i segment2.mp4 -i segment3.mp4 -filter_complex "[0:v]setpts=PTS-STARTPTS[v0],[1:v]setpts=PTS-STARTPTS[v1],[v0][v1]blend=all_expr='A*(1-(T/2))+B*(T/2)'" -filter_complex "amerge=inputs=2" -ac 2 -shortest -target ntsc-dvd segment2a.mp4

It's possible that the input files have a non-zero PTS start time, which is later used (as the time variable T), so we're forcing both video streams' PTS values to start at zero.  That way, the timestamps from both videos are uniform.  The blend filter computes A*factor1 + B*factor2, with each factor in the range [0,1].  The amerge filter similarly merges the two audio tracks together.

With a clip duration of 2 seconds, T/2 is a linear progression from 0 to 1, working well for our 2nd video's blending factor.  Since we want a decreasing blending factor for the 1st video, (1-(T/2)) works.  Pay specific attention to the duration component (e.g. 2); it needs to align with the length of the clips, otherwise the blending factors will exceed [0,1] and give you an effect that can only be compared to dropping acid.  The '-shortest' option may, or may not, be necessary; your mileage may vary.  I've added it because without it I sometimes observe a slight delay in the re-assembled video.
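
If you want to verify the start-PTS and duration assumptions before blending, a quick ffprobe against each segment prints both (exact output formatting may vary by ffprobe version);

$ ffprobe -v error -select_streams v:0 -show_entries stream=start_time,duration -of csv=p=0 segment2.mp4
$ ffprobe -v error -select_streams v:0 -show_entries stream=start_time,duration -of csv=p=0 segment3.mp4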

Reassembling Video Clips

Let's reassemble our intro, outro and blended clips into our final video.

$ cat video.mp4.txt 
file 'segment1.mp4'
file 'segment2a.mp4'
file 'segment4.mp4'

$ ffmpeg -y -f concat -i video.mp4.txt video.mp4

We're left with our final video.




The full Makefile, for those interested in re-creating this a little less manually; it may also make it easier to apply to your own source files.  Notice that ffprobe commands are used to automate the extraction of video durations, and TransitionDuration=2 specifies a 2-second blending transition between the two videos; the duration-extraction pipeline is shown standalone after the Makefile.

$ cat Makefile 
TransitionDuration=2
all: video.mp4

segment1.mp4 : clip01-scaled.mp4
	${SH} ffmpeg -i $< -t $(shell echo $(shell ffprobe -loglevel quiet -show_format $< -show_entries format=duration | grep duration | cut -f 2 -d '=') -${TransitionDuration} | bc) -target ntsc-dvd -strict -2 $@

segment2.mp4 : clip01-scaled.mp4
	${SH} ffmpeg -i $< -ss $(shell echo $(shell ffprobe -loglevel quiet -show_format $< -show_entries format=duration | grep duration | cut -f 2 -d '=') -${TransitionDuration} | bc) -target ntsc-dvd -strict -2 $@

segment3.mp4 : clip02-scaled.mp4
	${SH} ffmpeg -i $< -ss 0 -t ${TransitionDuration} -target ntsc-dvd -strict -2 $@

segment4.mp4 : clip02-scaled.mp4
	${SH} ffmpeg -i $< -ss ${TransitionDuration} -target ntsc-dvd -strict -2 $@

segment2a.mp4 : segment2.mp4 segment3.mp4
	${SH} ffmpeg -i $(shell echo $^ | cut -f 1 -d ' ') -i $(shell echo $^ | cut -f 2 -d ' ') -filter_complex "[0:v]setpts=PTS-STARTPTS[v0],[1:v]setpts=PTS-STARTPTS[v1],[v0][v1]blend=all_expr='A*(1-(T/${TransitionDuration}))+B*(T/${TransitionDuration})'" -filter_complex "amerge=inputs=2" -ac 2 -shortest -target ntsc-dvd $@

video.mp4: segment1.mp4 segment2a.mp4 segment4.mp4
	${SH} rm $@.txt | true
	${SH} for f in $^; do echo "file '$$f'" >> $@.txt; done
	${SH} ffmpeg -y -f concat -i $@.txt $@
	${SH} mplayer $@

clip01.mp4: yt-stingray.mp4
	${SH} ffmpeg -i $< -ss 4 -t 10 -strict -2 $@

clip02.mp4: yt-hardcastle01.mp4
	${SH} ffmpeg -i $< -ss 9 -t 10 -strict -2 $@

yt-stingray.mp4:
	${SH} youtube-dl -f mp4 -o $@ https://www.youtube.com/watch?v=aXTk9VPZ4Gg

yt-hardcastle01.mp4:
	${SH} youtube-dl -f mp4 -o $@ https://www.youtube.com/watch?v=_oHpWw7L3d4

%-scaled.mp4: %.mp4
	${SH} ffmpeg -i $< -vf "scale=-1:720,pad=1280:ih:(ow-iw)/2" -r 30/1 -strict -2 $@

clean:
	${RM} *.mp4
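
For reference, the duration-extraction pipeline buried in the segment1/segment2 rules can be run standalone; it pulls the container duration out of ffprobe's format section, and subtracting the transition duration gives the split point (with these clips that works out to roughly the 8.032000 used earlier):

$ ffprobe -loglevel quiet -show_format clip01-scaled.mp4 -show_entries format=duration | grep duration | cut -f 2 -d '='
$ echo 10.032000 - 2 | bc

Substitute whatever duration the first command reports into the second.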


Cheers.



