Monday, December 31, 2018

Ffmpeg and Make -- A Match Made In Heaven

Although 'make' is most often used to compile code, its use far exceeds that.  'Make's greatest strength is its dependency engine: by specifying a set of rules, 'make' executes the rules in whatever order satisfies the dependencies.  This is precisely why 'make' is a good match for ffmpeg, as the remainder of this post will hopefully demonstrate.

Let's start with a simple rule; we need an input video file which can be satisfied by our first make rule:



$ cat Makefile 

input.mp4 :
 ${SH} youtube-dl https://www.youtube.com/watch?v=5xUFQKxdlxE -o $@

By issuing 'make', it will attempt to satisfy the 'input.mp4' target by downloading the specified file from YouTube.


$ make
youtube-dl https://www.youtube.com/watch?v=5xUFQKxdlxE -o input.mp4
[youtube] 5xUFQKxdlxE: Downloading webpage
[youtube] 5xUFQKxdlxE: Downloading video info webpage
[youtube] 5xUFQKxdlxE: Extracting video information
WARNING: unable to extract uploader nickname
[youtube] 5xUFQKxdlxE: Downloading js player vflqFr_Sb
[download] Destination: input.f133.mp4
[download] 100% of 1.13MiB in 00:03
[download] Destination: input.mp4.f140
[download] 100% of 610.47KiB in 00:00
[ffmpeg] Merging formats into "input.mp4"
Deleting original file input.f133.mp4 (pass -k to keep)
Deleting original file input.mp4.f140 (pass -k to keep)

The format of each rule has three primary parts: a target, a prerequisite, and a command.  In our rule, the target is 'input.mp4', there is no prerequisite, and the command is the YouTube download command.  Repeated execution of make has no effect; since the file already exists, there is no need to re-execute the rule command.
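Make decides whether to run a rule by comparing file timestamps: if the target exists and is newer than its prerequisites, the rule is skipped.  A tiny sketch (using a hypothetical throwaway directory) demonstrates the behavior:

```shell
# Demonstrate make skipping an up-to-date target (throwaway demo dir).
mkdir -p /tmp/make-demo && cd /tmp/make-demo
printf 'hello.txt:\n\techo hi > $@\n' > Makefile
make            # first run: executes the rule, creating hello.txt
make            # second run: "make: 'hello.txt' is up to date."
```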

While simple, this rule doesn't really demonstrate the value of 'make', mostly because of the rule's lack of a prerequisite.  Let's look at another:



$ cat Makefile

input.mp4 :
 ${SH} youtube-dl https://www.youtube.com/watch?v=5xUFQKxdlxE -o $@

1x1.mp4: input.mp4
 ${SH} ffmpeg -i $< -vf scale=640:480 -acodec copy $@

Note the 2nd rule has a prerequisite.  A simple way to read the 2nd rule is: "When I need the '1x1.mp4' file, I first need the 'input.mp4' file, and once I have it, use it in the rule command."  Suppose neither the 'input.mp4' nor the '1x1.mp4' file exists and you issue 'make 1x1.mp4':

  • Make determines that in order to create '1x1.mp4' it needs 'input.mp4' (its prerequisite)
  • Make then finds the 'input.mp4' rule and executes its command, downloading the file from YouTube and generating the target file (i.e. 'input.mp4')
  • Now that the prerequisite is resolved, make returns to the '1x1.mp4' rule and executes its command, scaling the video file and generating the target
This chaining of dependencies, when done correctly, can execute a complex series of commands to satisfy the final target.  Better yet, with make's intrinsic parallelism it can do so quicker than a sequential script.  That, my friend, is b-e-a-utiful.
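To see why the chaining works, here's a toy sketch (plain Python, not make itself) of the recursive resolution make performs: satisfy every prerequisite first, then "run" the target's own command.

```python
# Toy model of make's dependency resolution: prerequisites are
# satisfied depth-first before the target's own command "runs".
def build(target, rules, done):
    for prereq in rules.get(target, []):
        build(prereq, rules, done)
    if target not in done:
        done.append(target)   # stands in for executing the rule command

rules = {
    "output.mp4": ["clip01.mp4"],
    "clip01.mp4": ["1x1.mp4"],
    "1x1.mp4":    ["input.mp4"],
}
order = []
build("output.mp4", rules, order)
print(order)  # ['input.mp4', '1x1.mp4', 'clip01.mp4', 'output.mp4']
```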

Let's look at a complete makefile;


$ cat Makefile 
all: output.mp4

input.mp4 :
 ${SH} youtube-dl https://www.youtube.com/watch?v=5xUFQKxdlxE -o $@

1x1.mp4: input.mp4
 ${SH} ffmpeg -i $< -vf scale=640:480 -acodec copy $@

2x2.mp4: input.mp4
 ${SH} ffmpeg -i $< -i $< -i $< -i $< \
 -filter_complex " \
 nullsrc=size=640x480 [base]; \
 [0:v] setpts=PTS-STARTPTS, scale=320x240 [upperleft]; \
 [1:v] setpts=PTS-STARTPTS, scale=320x240 [upperright]; \
 [2:v] setpts=PTS-STARTPTS, scale=320x240 [lowerleft]; \
 [3:v] setpts=PTS-STARTPTS, scale=320x240 [lowerright]; \
 [base][upperleft] overlay=shortest=1 [tmp1]; \
 [tmp1][upperright] overlay=shortest=1:x=320 [tmp2]; \
 [tmp2][lowerleft] overlay=shortest=1:y=240 [tmp3]; \
 [tmp3][lowerright] overlay=shortest=1:x=320:y=240 \
 " -c:v libx264 -acodec copy $@

4x4.mp4: 2x2.mp4
 ${SH} ffmpeg -i $< -i $< -i $< -i $< \
 -filter_complex " \
 nullsrc=size=640x480 [base]; \
 [0:v] setpts=PTS-STARTPTS, scale=320x240 [upperleft]; \
 [1:v] setpts=PTS-STARTPTS, scale=320x240 [upperright]; \
 [2:v] setpts=PTS-STARTPTS, scale=320x240 [lowerleft]; \
 [3:v] setpts=PTS-STARTPTS, scale=320x240 [lowerright]; \
 [base][upperleft] overlay=shortest=1 [tmp1]; \
 [tmp1][upperright] overlay=shortest=1:x=320 [tmp2]; \
 [tmp2][lowerleft] overlay=shortest=1:y=240 [tmp3]; \
 [tmp3][lowerright] overlay=shortest=1:x=320:y=240 \
 " -c:v libx264 -acodec copy $@

8x8.mp4: 4x4.mp4
 ${SH} ffmpeg -i $< -i $< -i $< -i $< \
 -filter_complex " \
 nullsrc=size=640x480 [base]; \
 [0:v] setpts=PTS-STARTPTS, scale=320x240 [upperleft]; \
 [1:v] setpts=PTS-STARTPTS, scale=320x240 [upperright]; \
 [2:v] setpts=PTS-STARTPTS, scale=320x240 [lowerleft]; \
 [3:v] setpts=PTS-STARTPTS, scale=320x240 [lowerright]; \
 [base][upperleft] overlay=shortest=1 [tmp1]; \
 [tmp1][upperright] overlay=shortest=1:x=320 [tmp2]; \
 [tmp2][lowerleft] overlay=shortest=1:y=240 [tmp3]; \
 [tmp3][lowerright] overlay=shortest=1:x=320:y=240 \
 " -c:v libx264 -acodec copy $@

16x16.mp4: 8x8.mp4
 ${SH} ffmpeg -i $< -i $< -i $< -i $< \
 -filter_complex " \
 nullsrc=size=640x480 [base]; \
 [0:v] setpts=PTS-STARTPTS, scale=320x240 [upperleft]; \
 [1:v] setpts=PTS-STARTPTS, scale=320x240 [upperright]; \
 [2:v] setpts=PTS-STARTPTS, scale=320x240 [lowerleft]; \
 [3:v] setpts=PTS-STARTPTS, scale=320x240 [lowerright]; \
 [base][upperleft] overlay=shortest=1 [tmp1]; \
 [tmp1][upperright] overlay=shortest=1:x=320 [tmp2]; \
 [tmp2][lowerleft] overlay=shortest=1:y=240 [tmp3]; \
 [tmp3][lowerright] overlay=shortest=1:x=320:y=240 \
 " -c:v libx264 -acodec copy $@

clip01.mp4: 1x1.mp4
 ${SH} ffmpeg -i $< -ss 0 -t 5 -acodec copy $@

clip02.mp4: 2x2.mp4
 ${SH} ffmpeg -i $< -ss 5 -t 5 -acodec copy $@

clip03.mp4: 4x4.mp4
 ${SH} ffmpeg -i $< -ss 10 -t 5 -acodec copy $@

clip04.mp4: 8x8.mp4
 ${SH} ffmpeg -i $< -ss 15 -t 5 -acodec copy $@

clip05.mp4: 16x16.mp4
 ${SH} ffmpeg -i $< -ss 20 -t 5 -acodec copy $@

clip06.mp4: 4x4.mp4
 ${SH} ffmpeg -i $< -ss 25 -acodec copy $@

output.mp4: clip01.mp4 clip02.mp4 clip03.mp4 clip04.mp4 clip05.mp4 clip06.mp4
 ${RM} ./files.txt
 ${SH} for f in `echo $^`; do echo "file '$$f'" >> ./files.txt; done
 ${SH} ffmpeg -y -f concat -i ./files.txt -c copy $@
 ${RM} ./files.txt

clean:
 ${RM} *.mp4

The final target, 'output.mp4', is the prerequisite of the first make rule (i.e. 'all').  Make attempts to execute that rule, finds a series of prerequisites (e.g. clip01.mp4...) and attempts to satisfy each of them, each of which has prerequisites of its own.  Following the chain of dependencies you'll come to the YouTube download rule, which has no prerequisites.  Make then executes that rule, generating the input file, and works its way back up the dependency chain 'til it is capable of generating the final target.

The final video will start as a 1x1 frame, grow to a 2x2 mosaic, proceed to a 4x4 mosaic.....all the way to a 16x16 mosaic.


I've now used make and ffmpeg in a number of video projects, and the more I use them, the more I love using them together.  The dependency engine prevents unnecessarily re-issuing ffmpeg commands when the target file already exists, and it easily allows incrementally building the series of files necessary for creating the final video.

Happy Encoding!

Sunday, December 23, 2018

PyPlot - Best Line Fit


It's a pretty common need: a scatter plot of data points with a generated 'best fit line'.



#!/usr/bin/python
import random
import matplotlib.pyplot as plt
import numpy as np

def run():
  L = []
  for x in range(0, 500):
    y = random.randint(0, 200) + x*3
    L.append((x, y))

  out = [(float(x), float(y)) for x, y in L]
  plt.scatter([x for x, y in out], [y for x, y in out])
  plt.xlabel('X')
  plt.ylabel('Y')
  plt.title('My Title')

  plt.show()

#---main---
run()

The above Python snippet generates a scatterplot of data points around the line y=3x, with a random dY applied.  Take a peek at the scatterplot, and it becomes clear that it follows a linear progression.


Generating a best-fit line is done by:
1) splitting the (x,y) tuples into a list of X values and a list of Y values.
2) plotting a best-fit line using the list of X and list of Y values.

This is done with the following plot command, plotting a red ('r') line with a line width ('lw') of 5:

  plt.plot(np.unique(Lx), np.poly1d(np.polyfit(Lx, Ly, 1))(np.unique(Lx)), lw=5, color='r');
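As a quick sanity check of what np.polyfit(Lx, Ly, 1) returns (the fitted coefficients, highest degree first), consider points lying exactly on y = 2x + 1:

```python
import numpy as np

# Degree-1 fit of points on y = 2x + 1: polyfit returns [slope, intercept].
Lx = [0.0, 1.0, 2.0, 3.0]
Ly = [1.0, 3.0, 5.0, 7.0]
slope, intercept = np.polyfit(Lx, Ly, 1)
print(round(slope, 6), round(intercept, 6))  # 2.0 1.0
```

np.poly1d then wraps those coefficients in a callable polynomial, which is what gets evaluated over np.unique(Lx) in the plot command above.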

The final script looks like this;


#!/usr/bin/python
import random
import matplotlib.pyplot as plt
import numpy as np

def run():
  L = []
  for x in range(0, 500):
    y = random.randint(0, 200) + x*3
    L.append((x, y))

  out = [(float(x), float(y)) for x, y in L]
  plt.scatter([x for x, y in out], [y for x, y in out])
  plt.xlabel('X')
  plt.ylabel('Y')
  plt.title('My Title')

  Lx = [x[0] for x in L]
  Ly = [x[1] for x in L]
  plt.plot(np.unique(Lx), np.poly1d(np.polyfit(Lx, Ly, 1))(np.unique(Lx)), lw=5, color='r')

  plt.show()

#---main---
run()





Cheers.

Monday, December 17, 2018

Bash - Extracting Metrics from Unstructured Logs


Seems the most common thing I need to do as of late is extracting metrics from semi-structured debug logs.  Depending on the complexity, the tool from the toolbox is either a quick bash script or, if a bigger hammer is needed, Python.  This post will focus on quick bash scripts/commands.

'Structured' files can have any number of meanings, but for now let's define 'structured' to mean a predictable number of substrings separated by a unique delimiter.

For example, consider the following file snippet:

Fri Nov 30 23:06:01 CST 2018;some user log message;10072
Fri Nov 30 23:06:01 CST 2018;some user log message;1908
Fri Nov 30 23:06:01 CST 2018;some user log message;26583
Fri Nov 30 23:06:01 CST 2018;some user log message;22197
Fri Nov 30 23:06:01 CST 2018;some user log message;14374
Fri Nov 30 23:06:01 CST 2018;some user log message;1545
Fri Nov 30 23:06:01 CST 2018;some user log message;31080
Fri Nov 30 23:06:01 CST 2018;some user log message;18157
Fri Nov 30 23:06:01 CST 2018;some user log message;1606
Fri Nov 30 23:06:01 CST 2018;some user log message;19883

If our objective is to extract the last element from each line (i.e. the numeric value), we can consider the file format simply structured, as it has a fixed number of elements with a unique delimiter (i.e. ';').  Extracting the 3rd element separated by the ';' delimiter can be done simply by:

$ cat file.txt | cut -f 3 -d ';'

10072
1908
26583
22197
14374
1545
31080
18157
1606
19883

But what if the number of fields varies rather than staying fixed?

Fri Nov 30 23:08:23 CST 2018;some user log message;something else;18689
Fri Nov 30 23:08:23 CST 2018;some user log message;31685
Fri Nov 30 23:08:23 CST 2018;some user log message;something else;27534
Fri Nov 30 23:08:23 CST 2018;some user log message;17393
Fri Nov 30 23:08:23 CST 2018;some user log message;something else;14007
Fri Nov 30 23:08:23 CST 2018;some user log message;13763
Fri Nov 30 23:08:23 CST 2018;some user log message;something else;11165
Fri Nov 30 23:08:23 CST 2018;some user log message;28675
Fri Nov 30 23:08:23 CST 2018;some user log message;something else;28553
Fri Nov 30 23:08:23 CST 2018;some user log message;6573

From the left, the number of elements (separated by the delimiter) varies from line to line, so extracting the 3rd element as done previously won't work.  However, from the right the field position is fixed....so if we can extract the right-most field we've got exactly what we need.

Surprisingly, the 'rev' command makes this an easy lift.  The 'rev' command takes a string and simply reverses it character-by-character.


$ echo "Easy Peezy" | rev

yzeeP ysaE



Reversing a reversed string results in the original string.  Obvious, for sure, but how that helps us is elusively simple: cut the right-most field;

$ cat file.txt | rev | cut -f 1 -d ';' | rev
18689
31685
27534
17393
14007
13763
11165
28675
28553
6573
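For what it's worth, awk can do the same in one step: $NF refers to the last field regardless of how many fields a line has (this should work with any POSIX awk).

```shell
# Same result as rev|cut|rev: $NF is always the right-most field.
awk -F';' '{print $NF}' file.txt
```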


Cool, so we've got tricks for extracting specific fields when lines are structured from the left or the right.  But what if we've got a more complicated file with even less structure?  The 'top' utility presents a pretty good example of unstructured log contents.

$ top -b -n 10 -d 1 > /tmp/top.out

Say we want to extract the idle metric; it's present on only some of the lines, and those lines are considerably unstructured, from the left as well as the right.

The 'grep' utility helps out considerably here, returning only the substrings matching a specified regular expression.


$ cat /tmp/top.out | grep -oh "[0-9]*\.[0-9] id,"
93.6 id,
79.7 id,
97.5 id,
96.8 id,
97.8 id,
97.5 id,
97.8 id,
77.2 id,
98.0 id,
97.3 id,


Pair it with an appropriate 'cut' command and we're gold.

$ cat /tmp/top.out | grep -oh "[0-9]*\.[0-9] id," | cut -f 1 -d ' '
93.6
79.7
97.5
96.8
97.8
97.5
97.8
77.2
98.0
97.3


Again, cool, but what if we need to perform some simple statistics, like calculating the average?  Pairing with 'awk' will crush this issue.



$ cat /tmp/top.out | grep -oh "[0-9]*\.[0-9] id," | cut -f 1 -d ' ' | awk '{SUM+=$1;} END {print SUM/NR}'
93.32

How about the median?

$ cat /tmp/top.out | grep -oh "[0-9]*\.[0-9] id," | cut -f 1 -d ' ' | sort -n | awk '{count[NR]=$1}END{print count[NR/2]}'
97.3


Python, Smython.....bash prevails.

Sunday, December 9, 2018

Detecting Correlation Relationships with Python


Have you ever visualized a data set and thought you saw a relationship between two data streams?  When there seems to be a connection between one data stream and another, this is often referred to as a correlation.  Detecting a correlation between two data streams is surprisingly easy with the numpy library.  This post will share what I've recently learned on the matter.

Let's create a list of objects with 5 numeric attributes.  We'll assign the first 4 attributes random values; the 5th attribute will, 94% of the time, be identical to the 4th attribute.  Because the 4th and 5th attributes' values are correlated, we expect to calculate a strong correlation coefficient between them and a much lower coefficient (likely near zero) between the unrelated attributes.  It's worth noting that you likely need a significantly sized data set to get a noteworthy coefficient; smaller data sets likely won't be sufficient.

Let's create a list of 10,000 elements and calculate the correlation coefficients between each pair of attributes.  We expect the coefficient between the 4th and 5th attributes to be near 0.94, as that is the correlation we are forcing.
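Before the full script, a minimal numpy.corrcoef sanity check: a perfectly linear relationship yields a coefficient of 1.0.

```python
import numpy

# corrcoef returns a 2x2 matrix; [0][1] is the coefficient between a and b.
a = [1, 2, 3, 4, 5]
b = [2, 4, 6, 8, 10]   # b = 2*a: perfectly correlated
coeff = numpy.corrcoef(a, b)[0][1]
print(coeff)  # ~1.0
```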



$ cat demoCorrelation 
#!/usr/bin/python
import numpy;
import uuid;

def run():
  L=[];
  DiceCoeff=0.94;
  for i in range(0,10000):
    e=dict();
    e['id']=str(uuid.uuid4());
    for k in ['c1','c2','c3','c4']:
      e[k]=numpy.random.random_integers(0,5,size=1)[0]
    #--add a bit of randomness to the correlation
    roll = numpy.random.random_integers(0,100,size=1)[0];
    if roll<=DiceCoeff*100:
      e['e1']=e['c4'];
    else:
      e['e1']=numpy.random.random_integers(0,5,size=1)[0];
    L.append(e);

  MinCorrCoeff=0.75;
 
  print "DiceCoeff: %f"%(DiceCoeff);
  print "MinCorrCoeff: %f"%(MinCorrCoeff);
  for k1 in sorted(L[0].keys()):
    L1=[e[k1] for e in L];
    for k2 in sorted(L[0].keys()):
      L2=[e[k2] for e in L];
      try:
        coeff=numpy.corrcoef(L1,L2)[0][1];
        if abs(coeff)>MinCorrCoeff and k1!=k2:
          print "%s/%s : %f"%(k1,k2,coeff);
      except Exception as e:
        pass;
 

#--main--
run()

When we run this beast, you'll see that the forced correlation is detected.


$ ./demoCorrelation 
DiceCoeff: 0.940000
MinCorrCoeff: 0.750000
c4/e1 : 0.942737
e1/c4 : 0.942737


Correlation doesn't determine cause/effect, so we expect the bi-directional pairing.

Pretty cool, huh. Use this power for good dear reader.

Monday, December 3, 2018

Installing JeroMQ for Android Project


I've found that ZeroMQ is an amazing library for authoring distributed systems.  Cross-compiling ZeroMQ for Android devices, while feasible, is highly discouraged.  The general guidance is to instead use JeroMQ (a pure-Java implementation of ZeroMQ).  This post will outline how to build it and install it into a simple Android project.

Let's start by snagging the latest release version:



$ wget https://github.com/zeromq/jeromq/archive/v0.4.3.tar.gz


Uncompress the package:


$ tar -zxvf v0.4.3.tar.gz


Build the library:


$ cd jeromq-0.4.3/

$ mvn package
...
[INFO] Building jar: /var/tmp/jeromq-0.4.3/target/jeromq-0.4.3-javadoc.jar
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 03:31 min
[INFO] Finished at: 2018-02-22T19:39:18-06:00
[INFO] Final Memory: 37M/444M
[INFO] ------------------------------------------------------------------------
lipeltgm@kaylee:/var/tmp/jeromq-0.4.3$



Locate the generated Java libraries:


lipeltgm@kaylee:/var/tmp/jeromq-0.4.3$ find . -name "*.jar"
./target/jeromq-0.4.3.jar
./target/jeromq-0.4.3-sources.jar
./target/jeromq-0.4.3-javadoc.jar
lipeltgm@kaylee:/var/tmp/jeromq-0.4.3$ 

Copy the proper library to your Android project:


$ cp ./target/jeromq-0.4.3.jar ~/AndroidStudioProjects/App01/app/lib
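(If your project uses Gradle dependency management instead, JeroMQ is also published to Maven Central, so copying the jar by hand can be skipped; the coordinates below assume the same 0.4.3 release built above.)

```gradle
dependencies {
    implementation 'org.zeromq:jeromq:0.4.3'
}
```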


Test Usage of Library in IDE


package com.fsk.app01;

import android.support.v7.app.AppCompatActivity;
import android.os.Bundle;

import org.zeromq.ZMQ;
import org.zeromq.ZMQ.Socket;
import org.zeromq.ZMQ.Context;

public class MainActivity extends AppCompatActivity {
  private ZMQ.Context context_ = null;
  private ZMQ.Socket socket_ = null;

  @Override
  protected void onCreate(Bundle savedInstanceState) {
    super.onCreate(savedInstanceState);
    setContentView(R.layout.activity_main);

    context_ = ZMQ.context(1);
    socket_ = context_.socket(ZMQ.REQ);
  }
}



And now, my friend, you have the basis for doing some really cool stuff with ZeroMQ in your Android application.

Cheers.

Friday, September 14, 2018

Building FFMpeg - Ubuntu


One continuing frustration I struggle with is that command line arguments are often version-specific, at least often enough to cause problems.  For that reason, we'll build FFMpeg from source to make sure the examples work as expected.

The following is the process for configuring/building FFMpeg from scratch on Ubuntu 17.04.  The process should be similar for newer versions as well.

Download

$ wget https://ffmpeg.org/releases/ffmpeg-2.7.2.tar.bz2

Build

$ tar -jxvf ffmpeg-2.7.2.tar.bz2
$ cd ffmpeg-2.7.2
$ sudo apt-get install libx264-dev
$ sudo apt-get install libsdl1.2-dev
$ sudo apt-get install yasm
$ sudo apt-get install libfreetype6-dev
$ sudo apt-get install libmp3lame-dev
$ sudo apt-get install libzmq-dev
$ ./configure --enable-libx264 --enable-nonfree --enable-gpl --enable-libfreetype --enable-libmp3lame --enable-libzmq
$ make
$ make alltools

Install

$ sudo make install

Let's copy a useful utility to the same directory location as ffmpeg:

$ which ffmpeg
$ sudo cp ./tools/graph2dot /usr/local/bin/

Verify

For future reference, you can query the build parameters used to configure FFMpeg by requesting the version info:


$ ffmpeg -version