Tuesday, March 24, 2026

Extending Video Introduction

Sometimes it's useful to slow down, or extend, the introduction of a video.  For example, suppose you're utilizing a slow video transition (like a fade-out/in effect) but don't want to miss the beginning frames of the second video.  By extracting the first frame of the video and elongating it to X seconds, we can preserve the transition effect without losing video content.


Let's take a peek at how to accomplish this:


$ cat -n Makefile
     1    all: postVideo.mp4
     2   
     3    video.mp4: BigBuckBunny.mp4
     4        ${SH} ffmpeg -i $< -codec copy -strict -2 -t 10 $@
     5   
     6    image.jpg: video.mp4
     7        ${SH} ffmpeg -i $< -vf "select=eq(n\,0)" -q:v 3 $@
     8        ${SH} display $@
     9   
    10    preVid.mp4: image.jpg
    11        ${SH} ffmpeg -loop 1 -i $< -f lavfi -i aevalsrc=0 -t 3 $@
    12   
    13    postVideo.mp4: preVid.mp4 video.mp4
    14    #    ${SH} ffmpeg -i preVid.mp4 -i video.mp4 -filter_complex "[0:v] [0:a] [1:v] [1:a] concat=n=2:v=1:a=1 [vv] [aa]" -map "[vv]" -map "[aa]" $@
    15        ${SH} ffmpeg -i preVid.mp4 -i video.mp4 -filter_complex "[0:v] [0:a] [1:v] [1:a] concat=n=2:v=1:a=1 [vo] [ao]" -map "[vo]" -map "[ao]" $@
    16   
    17    clean:
    18        ${RM} *.jpg
    19        ${SH} find . -name "*.mp4" -not -name "BigBuckBunny.mp4" -delete

The input video (video.mp4) is generated by grabbing the first 10 seconds of BigBuckBunny.mp4 (ref lines 3-4).

Then, the first frame of the input video (video.mp4) is extracted and saved as image.jpg (ref lines 6-8).

Then, a new introduction clip is generated by converting the image into a video clip of X seconds (ref lines 10-11).  Note, a null audio track is created to preserve the audio in the outgoing video.

Lastly, a new video is created by concatenating the intro clip with the original clip; the result is a video with an extended first frame.
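For anyone scripting this outside of Make, the same three steps can be sketched as plain command builders in Python.  This is a hedged sketch, not the post's actual tooling; the file names mirror the Makefile targets, and ffmpeg must be installed before run_cmds would actually do anything:

```python
# Sketch of the extract / loop / concat pipeline as Python command builders.
# File names (video.mp4, image.jpg, preVid.mp4) mirror the Makefile targets.
import subprocess

def build_cmds(src="video.mp4", img="image.jpg",
               pre="preVid.mp4", out="postVideo.mp4", secs=3):
    """Return the three ffmpeg invocations as argument lists."""
    # 1) grab the first frame of the source video
    extract = ["ffmpeg", "-i", src, "-vf", r"select=eq(n\,0)", "-q:v", "3", img]
    # 2) loop the frame into a `secs`-second clip with a null audio track
    loop = ["ffmpeg", "-loop", "1", "-i", img, "-f", "lavfi",
            "-i", "aevalsrc=0", "-t", str(secs), pre]
    # 3) concatenate the intro clip and the original clip
    concat = ["ffmpeg", "-i", pre, "-i", src, "-filter_complex",
              "[0:v] [0:a] [1:v] [1:a] concat=n=2:v=1:a=1 [vo] [ao]",
              "-map", "[vo]", "-map", "[ao]", out]
    return [extract, loop, concat]

def run_cmds(cmds):
    """Execute each ffmpeg command in order, failing fast on errors."""
    for cmd in cmds:
        subprocess.run(cmd, check=True)
```

Separating the command construction from the execution also makes the pipeline easy to dry-run or unit test without ffmpeg present.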

One other trick worth mentioning with this makefile: I frequently want a clean target to clobber any video files created along the way.  However, you don't want to delete the original source file; this can be accomplished by utilizing a find+not condition (ref line 19).  This proves useful for many projects.

And, just like that, you've got a video with an elongated intro.

Tuesday, March 17, 2026

Youtube-Dl Broke/Fixing

'Some folks' use youtube-dl on a regular basis to pull raw video and create new content.  Early in 2023 it appeared to stop working.


$ youtube-dl https://www.youtube.com/watch?v=8QWjCzULyNA
[youtube] 8QWjCzULyNA: Downloading webpage
ERROR: Unable to extract uploader id; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.


Forums seemed to imply the issue was known and a fix was in play, so I put my feet up and figured I'd check back in a few weeks.  A month or two went by, and reinstalling/retrying didn't seem to resolve it.  So I revisited the fix posts, found they referenced a different package/utility, and tried that out.


$ sudo pip3 uninstall youtube-dl

$ sudo pip3 install yt-dlp


$ yt-dlp https://www.youtube.com/watch?v=8QWjCzULyNA

Hope this helps someone else.

Cheers

Thursday, February 1, 2024

Yolo AutoCropping Presentation Videos


 

As camera resolutions continue to improve, capturing the full scene of a classroom, lecture, or presentation hall and autonomously focusing attention on the presenter becomes more practical.  Generally, a camera operator pans and zooms in on the presenter as they make their way around the stage, drawing the audience's attention to the intended target.  Professionally filmed videos draw the audience's attention to the speaker, and their production quality contributes to a more informative presentation.

Wide-angle, static camera positions are an alternative for capturing presentations but generally fail to draw the audience's attention to the speaker.  With robust object detection, locating the presenter can be automated, and auto-cropping around the presenter can offer a budget-friendly alternative to more professional video production facilities.


YOLO (You Only Look Once) takes a different approach from classic computer vision by utilizing a classifier as a detector.  Authored by Joseph Redmon at the University of Washington, YOLO sub-samples an image into regions, assumes each region has an object, executes a classifier on each region, then merges the classifier results into a final list of objects.

 

Below is a proof-of-concept utilizing YOLO in an auto-cropping manner.  The wide-angle source video is used as input; object detection is focused on the front of the room, detects the presenter, and the frame is auto-cropped around the presenter.  Once the presenter location is available, we use a variety of means to 'pan the camera': the first snaps to the presenter location, the second smooths the camera motion by incorporating a 2-dimensional shaper, and the third uses the shaper but only moves the camera when the presenter nears the edges of the current crop window.

Each mechanism is a rough implementation, focused on rapid proof-of-concept rather than optimal results, but you get the idea.  
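The POC source isn't included here, but the three panning strategies can be sketched in miniature.  Below is a 1-D Python toy (the function names and the alpha/dead-zone parameters are mine, not from the project) showing snap, shaper-smoothed, and edge-triggered motion:

```python
# Toy sketch of the three 'camera pan' strategies, operating on a
# single presenter x-coordinate for simplicity.

def snap(crop_x, target_x):
    """Strategy 1: jump the crop window straight to the detection."""
    return target_x

def smooth(crop_x, target_x, alpha=0.1):
    """Strategy 2: a simple one-pole shaper; smaller alpha = gentler motion."""
    return crop_x + alpha * (target_x - crop_x)

def deadband(crop_x, target_x, half_width=100, alpha=0.1):
    """Strategy 3: hold still until the presenter nears the crop edge,
    then ease toward them with the same shaper."""
    if abs(target_x - crop_x) < half_width:
        return crop_x
    return smooth(crop_x, target_x, alpha)
```

The real implementation applies the same idea in two dimensions, with the detection coming from YOLO per frame, but the trade-off is identical: snapping tracks perfectly and looks jittery, while the shaper and dead zone trade tracking lag for watchable camera motion.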


The source video was found here: Minnebar7

 

 



Tuesday, January 30, 2024

Published My First Python Package


 

 

I started dabbling with Python back in 2012'ish, using it pretty regularly over the years but generally keeping my projects close to home.  Recently, I dipped my toe into publishing a Python package, out to the known universe.

Back in the late 90's, the Precambrian Digital Age, I took a couple courses that continue to pique my interest time and time again.  Parallel processing was primarily constrained to supercomputers like the Cray-1 housed in a nearby lab on campus, on full display behind a full-glass wall: a workhorse which eagerly awaited computationally intensive parallelized programs.

Its vocabulary was a customized version of Fortran; the Computer Science department rarely used it, while aerospace and atmospheric sciences used the system most heavily.

The second course, Distributed Operating Systems, taken a bit later, seemed to pair well with this budding interest in high performance computing.  Beowulf clusters, commodity-grade networked computers running Linux, could be created from RadioShack-provided equipment fueled by inspiration.  Cloud computing, virtual machines, and even network-intensive applications hadn't breached the digital horizon, but small networked lab clusters provided inspiration that one day multitudes of computing assets would join hands, forming the highly networked, parallel, distributed systems that are considered common today.

While robust and reliable distributed systems are highly sought after, engineering them is plagued with challenges.  A failed request could be due to loss of the sent message, loss of the response, the destination service abruptly terminating, relocation of the service, an over-tasked memory/CPU that slows the response... or any number of other factors.  Python and ZeroMQ pair well to allow the creation of a distributed system framework, which inspired my budding project.

dividere UG

The public project repository is located at:

https://github.com/lipeltgm/dividere


This is my first cut at publishing a Python package; I tried to apply good design, test, and documentation principles along the way.  One particular challenge I encountered is that the package dependencies require a version of Protobuf that isn't currently available via 'normal channels'.  I'm hoping that complication will self-correct in time, when compliant versions become the default.

Until then, it will likely require manual installation of protobuf v3.19 (or later) before installing via pip3 from PyPI:

https://pypi.org/project/dividere/ 

$ pip3 install dividere


With the foundation in place, I intend to extend the framework to support more reliable messaging, database components, and robust failover detection and recovery.

More to come in the future, fingers-crossed.


Monday, January 29, 2024

Embarking on Authoring a Computer Science Book

 


Like many folks, I've occasionally been drawn to 'write a book', despite no real need for it.  Mid 2020, I had a couple aspiring computer science majors contact me via Reddit for advice, mostly involving 'what is CS' and 'how do I get into CS', but one exchange left me puzzled.  A young man from Ireland, just leaving high school, was accepted into university but later rejected as a result of Covid-related reductions in campus activity.  He asked what he could do to get a head-start in self-study, or to prepare for the industry in the event he never got in.  My suggestion was simple: "go to the university bookstore, find the CS textbooks, buy them and begin self-study".  I heavily encouraged going to university, and stressed that I doubt I'd be a professional in the industry had I not done so, but as a plan-B, aligning your self-study with the university curriculum would be better than ad-hoc YouTube, influencer offerings, or code camps.

To my surprise, I quickly found I'm out of touch with how universities teach these days.  Many no longer have physical textbooks, or virtual ones for that matter; instead they teach via interactive websites with automated grading.  Being restricted to university students, those closed the avenue to my suggestion.  So, over a few days, I got a bug to write a book: a collection of my CS university teachings in the manner I wish it had been presented to me.  Worst case, after spending some time on it, I'd know how lofty an effort it is.

On and off, I return to this passion project, unsure of its practicality, but it helps rekindle my love for this career.

 

Attached is a snippet of my work in progress, not even titled yet:

WIP


I'd welcome any feedback on the chapter as well as experiences from anyone who has authored and published a book.

Friday, January 26, 2024

C++ Database ORM Project

 


 

Systems that utilize a database benefit from an automated means of translating database CRUD operations to the application language.

Products like 'ObjectStore' aim to provide mechanisms to exchange data to/from applications to the database in a seamless fashion.

This can be done by providing the means of converting C++ objects into SQL query/insert/update commands and converting query responses back into C++ objects.  

My SQL-fu being significantly rusty, I spent a bit of time attempting to create an ORM/MySQL project to get a better understanding of how such a product could be created.

If we take the tack of most 'language-independent' products, we can start with a language-independent intermediary language, one that allows us to define the types of objects we wish to store in/retrieve from the database.

A db-object file (e.g. MyDb.odb) can specify a database object as follows:

dbclass MyRecord002
  float val01 as key;
end;

This odb file can then be pre-processed, creating language-dependent library components (MyDb.h, MyDb.cpp) which can then be used by applications directly.  Updates to the library component can automatically be applied in the database.  The linked association between the C++ object and the database is enforced by constructors and access methods.
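The generated C++ isn't shown here, but the core idea of the pre-processor, turning a record description into SQL, can be sketched in a few lines (shown in Python for brevity; the TYPE_MAP and function names are illustrative, not part of the project):

```python
# Illustrative sketch of the odb pre-processor's core mapping:
# a record description becomes CREATE TABLE / INSERT SQL text.
# The MyRecord002 example mirrors the odb snippet above.
TYPE_MAP = {"int": "INT", "float": "FLOAT", "long": "BIGINT", "text": "TEXT"}

def create_table_sql(name, fields):
    """fields: list of (fieldName, odbType, isKey) tuples."""
    cols = []
    for fname, ftype, is_key in fields:
        col = f"{fname} {TYPE_MAP[ftype]}"
        if is_key:
            col += " PRIMARY KEY"
        cols.append(col)
    return f"CREATE TABLE {name} ({', '.join(cols)});"

def insert_sql(name, values):
    """Build an INSERT statement from an ordered value list."""
    vals = ", ".join(repr(v) for v in values)
    return f"INSERT INTO {name} VALUES ({vals});"
```

The actual project does the same translation at pre-process time, emitting C++ that carries these statements behind constructors and accessors.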

It's a bit of a proof-of-concept at this time; it works for a handful of data types {int, float, long, text, char(X)}, with date/time soon to be added.  Currently it's constrained to 'flat' datatypes, but *fingers-crossed* it will work for nested datatypes in the future.

https://github.com/fsk-software/pub/tree/master/DbObjOverlay

I recently came across a Wt::Dbo reference as well; haven't had a chance to take a look yet: https://www.webtoolkit.eu/wt/doc/tutorial/dbo.html


Monday, April 11, 2022

Extracting Video Information w/FFProbe



It’s an interesting thing about tools: when they deliver what you need from them you’re often uninterested in ’how the sausage is made’, but digging into the details often reinforces your understanding in the end. Kinda like eating your broccoli, it’s oftentimes good for you, and likely you’ll be better off having done it.

There are just shy of a bazillion things you can learn about FFmpeg and the video/audio domain; we’re going to spend just a little bit of time trying to understand some of the details readily available to us, and hopefully understand the tooling and domain a little bit more than when we started.

Saddle up, grab a beer and ’read on’ fellow digital cowboy. FFmpeg typically comes paired with a useful utility called ffprobe, a media prober, which we’ll use to examine media files and pull out interesting nuggets of information.

FFprobe, like FFmpeg, is pretty verbose when run, writing a ton of debug information to stderr. This proves useful when it’s needed, but burdensome when not. For our uses we will quiet the utilities down by specifying -loglevel quiet.

Let’s start by examining our media file’s container.
$ ffprobe -loglevel quiet -show_format BigBuckBunny.mp4
[FORMAT]
filename=BigBuckBunny.mp4
nb_streams=2
nb_programs=0
format_name=matroska,webm
format_long_name=Matroska / WebM
start_time=-0.007000
duration=596.501000
size=107903686
bit_rate=1447155
probe_score=100
TAG:COMPATIBLE_BRANDS=iso6avc1mp41
TAG:MAJOR_BRAND=dash
TAG:MINOR_VERSION=0
TAG:ENCODER=Lavf56.40.101
[/FORMAT]

As you’re likely aware, a media container is simply a file that contains the video(s), audio(s), and subtitle(s). Media-wide properties, like file size, tags, length... are often available, as well as user-defined tags (like GPS, date,...). By default, -show_format will show all properties of the media container. Sometimes, you may wish to limit the fields to ones you’re particularly interested in, like duration and size. You’ll notice the user tags are displayed despite not being specified; I’ve not found a way to suppress them directly, so they can simply be ignored.

$ ffprobe -loglevel quiet -show_format BigBuckBunny.mp4 -show_entries format=duration,size
[FORMAT]
duration=596.501000
size=107903686
TAG:COMPATIBLE_BRANDS=iso6avc1mp41
TAG:MAJOR_BRAND=dash
TAG:MINOR_VERSION=0
TAG:ENCODER=Lavf56.40.101
[/FORMAT]

Cool, but not particularly interesting, and really nothing a file manager couldn’t show you with a simple right-click.
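That flat key=value layout is trivial to parse programmatically, though. A minimal Python sketch (assuming the [FORMAT] block text shown above as input):

```python
# Parse ffprobe's flat [FORMAT] output into a dict.
def parse_format_block(text):
    info = {}
    for line in text.splitlines():
        line = line.strip()
        # skip the [FORMAT]/[/FORMAT] markers and any non key=value lines
        if line.startswith("[") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        info[key] = value
    return info
```

For anything beyond a quick script, ffprobe can also emit JSON directly via -print_format json, which sidesteps hand parsing altogether.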

Let’s dig a bit deeper by examining the video/audio frames. By specifying -show_frames we can extract debug information for each frame in the media file. Let’s peek at the first few dozen lines.

$ ffprobe -loglevel quiet -show_frames BigBuckBunny.mp4
[FRAME]
media_type=video
stream_index=0
key_frame=1
pkt_pts=0
pkt_pts_time=0.000000
pkt_dts=0
pkt_dts_time=0.000000
best_effort_timestamp=0
best_effort_timestamp_time=0.000000
pkt_duration=41
pkt_duration_time=0.041000
pkt_pos=1111
pkt_size=208
width=1280
height=720
pix_fmt=yuv420p
sample_aspect_ratio=1:1
pict_type=I
coded_picture_number=0
display_picture_number=0
interlaced_frame=0
top_field_first=0
repeat_pict=0
color_range=unknown
color_space=unknown
color_primaries=unknown
color_transfer=unknown
chroma_location=left
[/FRAME]
[FRAME]
media_type=audio
stream_index=1
key_frame=1
pkt_pts=0
pkt_pts_time=0.000000
pkt_dts=0
pkt_dts_time=0.000000
best_effort_timestamp=-7
best_effort_timestamp_time=-0.007000
pkt_duration=13
pkt_duration_time=0.013000
pkt_pos=1368
pkt_size=3
sample_fmt=fltp
nb_samples=648
channels=2
channel_layout=stereo
[/FRAME]

Notice that this snippet contains two frames, one video and one audio, each with a set of media-specific fields. Collectively, we’re left with the following collection of fields: best_effort_timestamp, best_effort_timestamp_time, channel_layout, channels, chroma_location, coded_picture_number, color_primaries, color_range, color_space, color_transfer, display_picture_number, height, interlaced_frame, key_frame, media_type, nb_samples, pict_type, pix_fmt, pkt_dts, pkt_dts_time, pkt_duration, pkt_duration_time, pkt_pos, pkt_pts, pkt_pts_time, pkt_size, repeat_pict, sample_aspect_ratio, sample_fmt, stream_index, top_field_first, width.

A diligent and motivated reader could spend time investigating each field, but I’m more of a pass/fail kinda guy, so we’ll limit our interest to a few relevant fields and briefly discuss the relevance of others.
Each frame specifies a media type, audio or video. Let’s focus on video frames for now; we can select only video streams to simplify our review.

$ ffprobe -loglevel quiet -select_streams V -show_frames BigBuckBunny.mp4
[FRAME]
media_type=video
stream_index=0
key_frame=1
pkt_pts=0
pkt_pts_time=0.000000
pkt_dts=0
pkt_dts_time=0.000000
best_effort_timestamp=0
best_effort_timestamp_time=0.000000
pkt_duration=41
pkt_duration_time=0.041000
pkt_pos=1111
pkt_size=208
width=1280
height=720
pix_fmt=yuv420p
sample_aspect_ratio=1:1
pict_type=I
coded_picture_number=0
display_picture_number=0
interlaced_frame=0
top_field_first=0
repeat_pict=0
color_range=unknown
color_space=unknown
color_primaries=unknown
color_transfer=unknown
chroma_location=left
[/FRAME]
[FRAME]
media_type=video
stream_index=0
key_frame=0
pkt_pts=42
pkt_pts_time=0.042000
pkt_dts=42
pkt_dts_time=0.042000
best_effort_timestamp=42
best_effort_timestamp_time=0.042000
pkt_duration=41
pkt_duration_time=0.041000
pkt_pos=1325
pkt_size=37
width=1280
height=720
pix_fmt=yuv420p
sample_aspect_ratio=1:1
pict_type=P

Packet Fields  You’ll notice there are a number of packet-wise fields (8 specifically). Remember, even though we are inspecting a file, many video/audio protocols support streaming, and these fields are more relevant for such purposes. Despite being named with a packet prefix/suffix, they are often relevant for files as well, so don’t simply disregard them.

PictureType Field  The pict_type field can be particularly interesting for those interested in video compression. Video picture types are often referred to as I-frames, P-frames, or B-frames, each having to do with video compression. I-frames are modestly compressed and are considered self-contained, not requiring other frames to decode. P-frames and B-frames (bi-directional), however, employ a higher level of compression by capitalizing on similarity with the previous or following frames. For example, rather than compress an entire video frame, if two video frames are similar, differing slightly in specific regions, we can focus our compression/storage on the regions of change and greatly increase our compression as a result. That’s precisely the relevance of P-frames and B-frames. P-frames use data from the previous frame, storing/compressing what’s different rather than the entire frame. B-frames extend on this by utilizing both the previous frame and the following frame. Pretty neat, huh?

Timestamp Fields  Two particularly interesting fields are the decoding time stamp (DTS) and the presentation time stamp (PTS). These timestamps are particularly interesting when you wish to modify the playback speed of a video. The presentation time stamp (PTS) indicates at what time the frame should be ’presented’, or displayed. At 3 minutes, 30.01 seconds into the movie, what frame(s) should pop up for the viewer? Adjusting the PTS of a file can therefore shift, speed up/down, or simply alter when the frame is presented. Halving the PTS will speed up a video, doubling the PTS will slow it down. Relatively simple.

The decoding time stamp (DTS) however is often identical (or similar) to the PTS, but not necessarily. Why would we possibly need yet another timestamp? It all comes back to compression. Let’s say a sequence of video frames comes in the form of I-frames, P-frames and B-frames: I P B B... The I-frame is self-contained; the following P-frame (which is dependent on the previous frame) can rely on the previous frame being decompressed beforehand (because the previous frame’s PTS < the current frame’s PTS), but B-frames throw a wrench into the mix. B-frames are reliant on the previous and the next frame, so both those frames must be decompressed before the B-frame can be decompressed. As a general rule, PTS and DTS time stamps tend to only differ when a stream has B-frames in it. The first 30 frames, roughly the first second of our video, are a series of I, P, B frames.
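A toy model helps show why the two orders diverge. The Python sketch below (my own simplification; real decoders track reference frames explicitly) reorders display-ordered frames so each B-frame's forward reference is decoded first:

```python
# Toy model: frames are (pict_type, pts) pairs in display (PTS) order.
# B-frames reference the *next* non-B frame, so that frame must decode first;
# the resulting order is the DTS order.
def decode_order(frames):
    out, pending_b = [], []
    for name, pts in frames:
        if name == "B":
            pending_b.append((name, pts))   # hold until the reference arrives
        else:
            out.append((name, pts))         # decode the reference frame...
            out.extend(pending_b)           # ...then the B-frames that need it
            pending_b = []
    return out + pending_b
```

So a display sequence I B B P decodes as I P B B, and that reshuffling is precisely the gap you see between DTS and PTS.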

$ ffprobe -loglevel quiet -select_streams V -show_frames -show_entries frame=pict_type BigBuckBunny.mp4
[ FRAME ]
pict_type = I
[/ FRAME ]
[ FRAME ]
pict_type = P
[/ FRAME ]
[ FRAME ]
pict_type = P
[/ FRAME ]
[ FRAME ]
pict_type = P
[/ FRAME ]
[ FRAME ]
pict_type = P
[/ FRAME ]
[ FRAME ]
pict_type = B
[/ FRAME ]
[ FRAME ]
pict_type = B
[/ FRAME ]
[ FRAME ]
pict_type = P
[/ FRAME ]
[ FRAME ]
pict_type = B
[/ FRAME ]
[ FRAME ]
pict_type = P
[/ FRAME ]

Lovely, right?
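With output that regular, tallying the frame mix is only a few lines away. A small Python sketch (assuming the pict_type output format shown above):

```python
# Count I/P/B frames from ffprobe's -show_entries frame=pict_type output.
from collections import Counter

def count_pict_types(ffprobe_text):
    counts = Counter()
    for line in ffprobe_text.splitlines():
        line = line.strip().replace(" ", "")  # tolerate 'pict_type = I' spacing
        if line.startswith("pict_type="):
            counts[line.split("=", 1)[1]] += 1
    return counts
```

Run over a whole file, the I/P/B ratio gives a quick feel for how aggressively a stream leans on inter-frame compression.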

Another pair of timestamps, a bit less relevant, are the ’best effort’ timestamps (best_effort_timestamp, best_effort_timestamp_time). These tend to only be relevant for streams that only specify a DTS timestamp (e.g. no PTS), literally attempting to provide a guess for a PTS-like value (enforcing a monotonically increasing timestamp) derived from available timestamp values.

Video Size Fields  So riddle me this: why does each video stream have width/height fields? Wouldn’t that be better suited in the container? One uniformly sized video per file, right? Nope. A container often contains a number of audio tracks (alternative languages, director commentary,...), and similarly a number of subtitles for a variety of languages. While not overly common, a container can provide multiple video streams as well: alternative angles, 360-degree video, picture-in-picture... Each video stream therefore requires independent sizing fields to display properly.

So, that's it, that's all I got for now.  I feel like I better understand some of the fields, specifically DTS/PTS, and have a clearer understanding of the various compression frame types.  Hope it was equally useful to you.

Cheers.