Thursday, October 27, 2016

Slowing Program Execution -- Slow Your Jets

Earlier this week I was assigned debugging a time-critical bug on a financial system.  Reproducing the issue with sufficient consistency is a problem, the suspicion is that slowing the execution of a processing thread would allow reproducing the issue with a high degree of certainty.  The recommendation was to recompile the system with an artificially introduced delay in the processing thread.

As an alternative of creating a one-off build and deploying on a highly used system I thought it would be worthwhile to see if GDB would serve the purpose instead.  The idea would be to attach to the process remotely, introduce a breakpoint within the processing thread which when reached would 'sleep' sufficiently to slow the processing thread sufficiently.

Let's start with a very simple program.

$ cat -n main.cpp
     1 #include <stdio.h>
     2 #include <sys/time.h>
     3 #include <unistd.h>
     6 double currentTime()
     7 {
     8  struct timeval start;
     9  gettimeofday(&start, NULL);
    10  const double mtime=start.tv_sec + (double)start.tv_usec/1000000.0;
    11  return mtime;
    12 }
    14 int main(int argc, char* argv[])
    15 {
    16  printf("%s:%d) main process initializing\n",__FILE__,__LINE__);
    17  while(1)
    18  {
    19    printf("(%s:%d) %lf\n",__FILE__,__LINE__,currentTime());
    20    ::usleep(500000);
    21  }
    22  printf("%s:%d) main process terminating\n",__FILE__,__LINE__);
    23 }

If we assume the while-loop represents the processing thread we wish to slow, setting a breakpoint at line 20 will start us off.  Then, we need to register a sequence of commands with hitting the breakpoint, specifically sleeping for X seconds, then continuing on with the program.  The following GDB command file does just that:

$ cat -n gdb.cmd
     1 b main.cpp:20
     2 commands
     3 silent
     4 !sleep 3.0
     5 cont
     6 end
     7 r

Line 1 adds the breakpoint, followed by lines 2-6 consist of the commands run when tripping the breakpoint.  Line 3, 'silent' disables the usual message about stopping at the breakpoint, if you elect to not silence the message you'll likely need to set pagination off otherwise the debugger will pause once the terminal is filled with debugging content and prompts you to continue.  Line 4 executes a shell command, sleeping 3 seconds, followed by line 5 which continues the program.  Line 6 ends the command sequence.
The last line, line 7, will begin running the program after registering the breakpoint.  As an alternative, you could issue the 'r' command interactively.

Compile the program as you would normally with debugging symbols.

$ gcc -g main.cpp -o main

Run via debugger, and associated GDB script by issuing:

$ gdb -x ./gdb.cmd ./main
GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.04) 7.11.1
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
Find the GDB manual and other documentation resources online at:
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./main...done.
Breakpoint 1 at 0x4006de: file main.cpp, line 20.
main.cpp:16) main process initializing
(main.cpp:19) 1476848880.178461
(main.cpp:19) 1476848883.681235
(main.cpp:19) 1476848887.184019
(main.cpp:19) 1476848890.686872
(main.cpp:19) 1476848894.189695
(main.cpp:19) 1476848897.692520
(main.cpp:19) 1476848901.195763
(main.cpp:19) 1476848904.699169
(main.cpp:19) 1476848908.201601
(main.cpp:19) 1476848911.704499
(main.cpp:19) 1476848915.207383
(main.cpp:19) 1476848918.710270
(main.cpp:19) 1476848921.326889
(main.cpp:19) 1476848921.826974
(main.cpp:19) 1476848922.327060
(main.cpp:19) 1476848922.828102
(main.cpp:19) 1476848923.329105
(main.cpp:19) 1476848923.829189
(main.cpp:19) 1476848924.329266
Program received signal SIGINT, Interrupt.
(gdb) q
Quit anyway? (y or n) y

Notice by the time stamps that the loop is running at ~3 secs, at least until hitting Control-C when resumes normal execution without the debugger breakpoint.

Seems like a viable alternative.  I've intentionally overlooked the complexity of attaching to a remote process but the proof-of-concept seems viable.


Sunday, October 16, 2016

Generating a Video Collage -- Get Your Rock On!

I began with an overall objective of being able to:
1) get a list of the top 100 songs for a given year from the Billboard Charts
2) perform a search for each song on YouTube using the title and artist
3) download the song
4) overlay the title/artist on the video
5) finally, create a video collage consisting of a 30 sec video clip for each song

Let's walk through the steps.  Let's focus on the year 1994 to pay tribute to the high school graduation year of my good friend Marshall, likely the only reader of this blog.  That shows true dedication 'Old Fashioned', poor judgement in good use of your time, but true dedication.

Step 1 -- Get Top 100 Song List

I first took the path of parsing the HTML and extracting the music list into a series of strings but I later abandoned that effort by dusting off an old tool that I last used back in college, say 1996-1998'ish.  Lynx is a text-based web browser with a sexy little secret, the ability to dump to a text file, making parsing the contents significantly easier.

$ lynx --dump -nonumbers  > list.txt
$ cat list.txt
     1   #alternate Edit this page Wikipedia (en) copyright
     3 Billboard Year-End Hot 100 singles of 1994
     5   From Wikipedia, the free encyclopedia
     6   Jump to: navigation, search
     8   This is a list of Billboard magazine's Top Hot 100 songs of 1994.^[1]
     9   № Title Artist(s)
    10   1 "The Sign" Ace of Base
    11   2 "I Swear" All-4-One
    12   3 "I'll Make Love to You" Boyz II Men
    13   4 "The Power of Love" CĂ©line Dion
    14   5 "Hero" Mariah Carey
    15   6 "Stay (I Missed You)" Lisa Loeb and Nine Stories
    16   7 "Breathe Again" Toni Braxton
    17   8 "All for Love" Bryan Adams, Rod Stewart and Sting
    18   9 "All That She Wants" Ace of Base
    19   10 "Don't Turn Around" Ace of Base
    20   11 "Bump n' Grind" R. Kelly
    21   12 "Again" Janet Jackson

This gives us something we can easily work with.  If we grab the title list using awk:
$ awk '/Title/,/ 100 /' ./list.txt 
   № Title Artist(s)
   1 "The Sign" Ace of Base
   2 "I Swear" All-4-One
   3 "I'll Make Love to You" Boyz II Men
   97 "What Is Love" Haddaway
   98 "And Our Feelings" Babyface
   99 "Bop Gun (One Nation)" Ice Cube featuring George Clinton
   100 "I Wanna Be Down" Brandy

For each of these titles, snag the artist and title and prepare to issue a respective Youtube-dl command.

Step 2-3 -- Search and Download Source Videos

Once we have a title and artist we should have sufficient info to perform a search, by issuing commands resembling the following:

 $ youtube-dl --verbose "gvsearch1:the sign ace if base" -f 'bestvideo[ext=mp4]+bestaudio[ext=m4a]/best[ext=mp4]/best' --output Yt-001

This command will perform a Youtube search and prioritize videos of MP4 format.  The output specifies a prefix, there is not guarantee as to the format.  While there are pages of documentation on Youtube-dl we won't get into detail here, know however you can tailor your search to limit format, quality, resolution. frame-rate,....  Given we aren't constraining such video qualities we'll enforce some constraints post-processing the videos after we've downloaded them.  We can ensure a H264 format within an MP4 container by examining what was downloaded and reencode it if necessary;

    if [ ! -e $prefixName.mp4 ]; then
      ffmpeg -y -i $prefixName* -strict -2 -f h264 $prefixName.mp4

This is a good stage to generate a PNG image to overlay that consists of the List #, Artist and Title:
$ convert -background white -fill black -font FreeSerif-Italic -pointsize 46 label:"1 - Ace of Base - The Sign" 001.png

Step 4 -- Overlay the Title/Artist on the Video

Ensuring the videos share consist resolution and frame-rate will make our lives much easier.  For instance, the the font size of overlay PNG is aimed at 720p videos.  When we later concatenate the videos into a single video you'll find the input videos need to share a consistent frame-rate or the audio falls out of sync and video can stall out.  Let's transcode the Youtube video, normalize it to 720p, 30fps and let's seek in 60 seconds and grab a 30 sec clip.
$ ffmpeg -y -i Yt-001.mp4 -vf "[in] scale=hd720 [top]; movie=001.png[bottom]; [top][bottom] overlay=0:H-56 [out]" -strict -2 -ss 60 -t 15 -r 30 clip-001.mp4

For the observant reader, you may notice the overlay position of (H-56), this places the PNG image at x,y location 0,720-56.  For purposes of understanding, I provided the height dimension of the PNG image which places the overlay where we want it.

Step 5 -- Concatenate the Videos

The last step was the one that gave me the most trouble and made me miss my self-enforced blog deadline last week.  Concatenating the videos occasionally resulted in stalled video or Kung Fu Theatre style audio out-of-sync issues.  Despite many a Google search, I finally found through extensive debugging that the cause of the problem was supplying input videos that didn't have a consistent frame-rate so take care to enforce that before attempting this step like we did in the previous step.

Concating the videos is pretty straightforward: 1) provide a list of video files in a text file and 2) issue a concat filter via FFMpeg.

$ cat files.txt 
file 'clip-001.mp4'
file 'clip-002.mp4'
file 'clip-003.mp4'
file 'clip-004.mp4'

The text file (e.g. files.txt) needs to be located in the same working directory where you issue the FFMpeg command.  

  $ ffmpeg -y -f concat -i files.txt -strict -2 -r 30 video.mp4

A crudely authored script is available that automates this process.  Feel free to take a peek;

The output; the top 100 songs off the Billboard Chart for 1994:

Rock On 'Old Fashioned'!

Wednesday, October 12, 2016

Interactively Waiting On A Command -- Beer:30 Waits for No Command

As my work-day on Thursday was approaching the end, I was facing a predicament.  I had kicked off a long-running command that was uncompressing some 5000+ files earlier in the day, not knowing how long it would take.  It appeared that it would complete in the next couple hours and while I wanted to kick off some tests afterwards, I really didn't want to stick around until it completed.

Often, when this happens I would guess-timate the remaining time and issue a 'sleep' followed by the subsequent command.  Trouble is, what if I over-guessed?  Wasting valuable execution time.  Worst, what if I under-guessed?  A kicked off subsequent command could likely fail and the overnight opportunity would be wasted.  Had I anticipated this situation, I'd simply have created a script with each step and the timing would work itself out.  Unfortunately, I hadn't done so and stopping and restarting the command sequence seemed undesirable.

The 'kill' command seems to be the answer.

The steps include:
1) issuing the 'ps -aef' command to identify the process id of the command you wish to wait for
2) write an interactive script, or script file if you wish, that loops waiting for that process to complete
3) run the subsequent commands

Start by running some long command;

$ ./longCommand

Open an independent terminal and identify the process id of the 'long command';

$ ps -aef | grep longCommand
user  8186  7903  0 21:34 pts/2    00:00:00 /bin/bash ./longCommand
user  8207  8195  0 21:35 pts/17   00:00:00 grep --color=auto longCommand

Then, wait for this process to complete;

$ while kill -0 8186; do sleep 5; done; echo "done"

Followed by running whatever subsequent command(s) you wish.

The following video demonstrates this for your amusement.

Hope this helps.

Monday, October 3, 2016

Unbuffered StdOut

Occasionally I find long running and output intensive tasks often are hindered by stdout buffering.  The most apparent evidence of this is lags in the output followed by bursts of output, regardless of whether its run interactively or redirected output to a file.

The root cause is a default 4K buffer which can have performance advantages, but a hindrance if you're anxious to see the output.  This often is a point of aggravation when running lengthy tests while redirecting stdout to a file to allow:
1) actively monitoring the current state of tests and
2) providing a 'report' (often lengthy) that can be reviewed after the fact.

The following video shows an example of this buffering.  A simple python script, iterating 1024 times with a 0.05 sec delay between iterations, writing a simple message per iteration.  The desired affect is to observe each loop iteration in the output file (monitoring via 'tail -f').  You'll see however the stdout buffering gets in the way.

./run > /tmp/run.log

The stdout buffering can be disabled the 4K buffering via pre-pending the stdbuf command to the same command; 'stdbuf -oL' enables line buffering instead.

stdbuf -oL ./run > /tmp/run.log

Hope you find this useful.  Cheers.