Tuesday, June 30, 2020

Visualization of Sorting Algorithms

Some time back a trending video made its way into my life, it provided a graphical representation of common sorting algorithms.

This simple video demonstrated the value of visualizing complex algorithms, hopefully something that is incorporated into CS coursework.  The value of divide-n-conquer techniques can be more easily understood with a side-by-side comparison.

We can generate an equivalent video by monitoring the list of numbers, generate a graph at each step and render the sequence of graphs into a video.

In order to generate each graph, we could modify the sorting algorithm to create and store a graph image.  This approach however would require modification to the source code.  An alternative, use a debugger to monitor the list of numbers and print out the list of numbers whenever it is modified.  Then, post-run, we could use the debugger output to create individual graphs and stitch them together.

So, our approach will take the form:
1) create C++ sorting algorithm
2) run w/debugger and monitor the state of the input array
3) parse the debugger log, creating a graphical representation of the input array at each step
4) stitch each frame into a video

1 - Sorting Algorithm
Let's create a simple C++ example with a variety of sorting algorithms.  We'll create a randomized list of integers and prepare to send it through a sorting routine.
$ cat -n main.cpp 
     1 #include <stdio.h>
     2 #include <stdlib.h>
     3 #include <vector>
     4
     5 #ifndef N
     6 #define N 100
     7 #endif
     8
     9 void swap(int& a, int& b)
    10 {
    11   const int tmp=a;
    12   a=b;
    13   b=tmp;
    14 }
    15
    16 void display(int list[], int size)
    17 {
    18   for(int i=0; i<size; i++)
    19   {
    20     printf("%d\n",list[i]);
    21   }
    22 }
    23
    24 //============Selection Sort============
    25 void selectionSort(int list[], int size)
    26 {
    27   for(int i=0; i<size-1; i++)
    28   {
    29     for(int j=i+1; j<size; j++)
    30     {
    31       if (list[j] < list[i])
    32       {
    33         swap(list[j],list[i]);
    34       }
    35     }
    36   }
    37 }
    38 //======================================
    39
    40 //==============Quick Sort==============
    41 int partition(int *a,int start,int end)
    42 {
    43   int pivot=a[end];
    44   //P-index indicates the pivot value index
    45
    46   int P_index=start;
    47   int i,t; //t is temporary variable
    48
    49   //Here we will check if array value is
    50   //less than pivot
    51   //then we will place it at left side
    52   //by swapping
    53
    54   for(i=start;i<end;i++)
    55   {
    56       if(a[i]<=pivot)
    57       {
    58           t=a[i];
    59           a[i]=a[P_index];
    60           a[P_index]=t;
    61           P_index++;
    62       }
    63    }
    64    //Now exchanging value of
    65    //pivot and P-index
    66     t=a[end];
    67     a[end]=a[P_index];
    68     a[P_index]=t;
    69
    70    //at last returning the pivot value index
    71    return P_index;
    72 }
    73
    74 void Quicksort(int *a,int start,int end)
    75 {
    76   if(start<end)
    77   {
    78     int P_index=partition(a,start,end);
    79     Quicksort(a,start,P_index-1);
    80     Quicksort(a,P_index+1,end);
    81   }
    82 }
    83
    84 void quickSort(int list[], int size)
    85 {
    86   Quicksort(list,0,size);
    87 }
    88
    89 void insertionSort(int list[], int size) {
    90
    91   int valueToInsert;
    92   int holePosition;
    93   int i;
    94  
    95   // loop through all numbers 
    96   for(i = 1; i < size; i++) { 
    97
    98      // select a value to be inserted. 
    99      valueToInsert = list[i];
   100
   101      // select the hole position where number is to be inserted 
   102      holePosition = i;
   103
   104      // check if previous no. is larger than value to be inserted 
   105      while (holePosition > 0 && list[holePosition-1] > valueToInsert) {
   106         list[holePosition] = list[holePosition-1];
   107         holePosition--;
   108         printf(" item moved : %d\n" , list[holePosition]);
   109      }
   110
   111      if(holePosition != i) {
   112         printf(" item inserted : %d, at position : %d\n" , valueToInsert,holePosition);
   113         // insert the number at current hole
   114         list[holePosition] = valueToInsert;
   115      }
   116
   117      printf("Iteration %d#:",i);
   118   } 
   119 }
   120
   121 void sort(int list[], int size)
   122 {
   123 //selectionSort(list,size);
   124 //quickSort(list,size);
   125   insertionSort(list,size);
   126 }
   127
   128 int main(int argc, char* argv[])
   129 {
   130   printf("(%s:%d) main process initializing\n",__FILE__,__LINE__);
   131   std::vector<int> L;
   132
   133   for(int i=0; i<N; ++i)
   134   {
   135     L.push_back(rand()%100);
   136   }
   137   sort(L.data(), L.size());
   138  
   139   display(L.data(), L.size());
   140
   141   printf("(%s:%d) main process terminating\n",__FILE__,__LINE__);
   142 }
 
On line 5-8, we declare N which represents the size of the array size, then on lines 131-136 we allocate a list of integers and populate them with randomized numbers.  Line 137 we provide the list as an argument to the sort() method.  

2 - Run w/Debugger
Macros are evil so let me justify the use of N in the C++ source file.  Typically, such a macro would be replaced with a static constant allowing more type-safe implementation.  In this example though the size of the array needs to be known by the GDB commands as well.  The makefile assigns the value of N, passing it in to the compiler as a pre-compiler directive and creates the gdb command file which also needs to know the value of N.
$ cat -n Makefile 
     1 CC=g++
     2 MAIN=main
     3 N=100
     4
     5 all: video.mp4
     6
     7 ${MAIN}: ${MAIN}.cpp
     8 ${CC} -D N=${N} -g $< -o $@
     9
    10 run: ${MAIN}
    11 ${SH} ./${MAIN}
    12
    13 plot: gdb.log
    14 ${SH} ./plotter $<
    15
    16 gdb.cmd:
    17 ${SH} echo "set pagination off" > $@
    18 ${SH} echo "break sort" >> $@
    19 ${SH} echo "commands" >> $@
    20 ${SH} echo "  watch *(int(*)[${N}])(list)" >> $@
    21 ${SH} echo "  commands" >> $@
    22 ${SH} echo "    cont" >> $@
    23 ${SH} echo "  end" >> $@
    24 ${SH} echo "  cont" >> $@
    25 ${SH} echo "end" >> $@
    26 ${SH} echo "run" >> $@
    27 ${SH} echo "quit" >> $@
    28
    29 gdb.log: gdb.cmd ${MAIN}
    30 ${SH} gdb -x gdb.cmd ${MAIN} > $@
    31
    32 video.mp4: plot
    33 ${SH} ffmpeg -i frame%05d.png $@
    34
    35 clean:
    36 ${RM} ${MAIN} *.o gdb.log gdb.cmd
    37 ${RM} *.png *.mp4

Notice on line 3 we define our array size as 100.  On line 8 we pass in the value when compiling the C++ source.  On line 20 we create a gdb command file that we will use to run with the debugger.  I've recently found it useful for makefiles to create such input files as part of the build process rather than author manually and preserve in version control.  The gdb.cmd file is generated as part of the build process and in this case allows the makefile to preserve a consistent value for N in the gdb command file and the C++ source file.

Notice on line 29, we define a  gdb.log target, which runs the executable with the necessary gdb command script and generates a gdb log file.  Line 20 is the key to the gdb script, it attaches a watchpoint to array, when the array changes it spits out the old/new value of the array.  

3 - Parse the Debugger Log
The gdb log file takes the form of:
$ cat -n gdb.log | head -50
     1 GNU gdb (Ubuntu 7.7.1-0ubuntu5~14.04.3) 7.7.1
     2 Copyright (C) 2014 Free Software Foundation, Inc.
     3 License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
     4 This is free software: you are free to change and redistribute it.
     5 There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
     6 and "show warranty" for details.
     7 This GDB was configured as "x86_64-linux-gnu".
     8 Type "show configuration" for configuration details.
     9 For bug reporting instructions, please see:
    10 <http://www.gnu.org/software/gdb/bugs/>.
    11 Find the GDB manual and other documentation resources online at:
    12 <http://www.gnu.org/software/gdb/documentation/>.
    13 For help, type "help".
    14 Type "apropos word" to search for commands related to "word"...
    15 Reading symbols from main...done.
    16 Breakpoint 1 at 0x400dcc: file main.cpp, line 125.
    17
    18 Breakpoint 1, sort (list=0x604270, size=100) at main.cpp:125
    19 125   insertionSort(list,size);
    20 Watchpoint 2: *(int(*)[100])(list)
    21 Watchpoint 2: *(int(*)[100])(list)
    22
    23 Old value = {83, 86, 77, 15, 93, 35, 86, 92, 49, 21, 62, 27, 90, 59, 63, 26, 40, 26, 72, 36, 11, 68, 67, 29, 82, 30, 62, 23, 67, 35, 29, 2, 22, 58, 69, 67, 93, 56, 11, 42, 29, 73, 21, 19, 84, 37, 98, 24, 15, 70, 13, 26, 91, 80, 56, 73, 62, 70, 96, 81, 5, 25, 84, 27, 36, 5, 46, 29, 13, 57, 24, 95, 82, 45, 14, 67, 34, 64, 43, 50, 87, 8, 76, 78, 88, 84, 3, 51, 54, 99, 32, 60, 76, 68, 39, 12, 26, 86, 94, 39}
    24 New value = {83, 86, 86, 15, 93, 35, 86, 92, 49, 21, 62, 27, 90, 59, 63, 26, 40, 26, 72, 36, 11, 68, 67, 29, 82, 30, 62, 23, 67, 35, 29, 2, 22, 58, 69, 67, 93, 56, 11, 42, 29, 73, 21, 19, 84, 37, 98, 24, 15, 70, 13, 26, 91, 80, 56, 73, 62, 70, 96, 81, 5, 25, 84, 27, 36, 5, 46, 29, 13, 57, 24, 95, 82, 45, 14, 67, 34, 64, 43, 50, 87, 8, 76, 78, 88, 84, 3, 51, 54, 99, 32, 60, 76, 68, 39, 12, 26, 86, 94, 39}

When the list array is modified the watchpoint breakpoint is tripped which results in lines 23-24, outputting the old/new values.  

4 - Stitch Frames into Video
Last bit, we need a simple Python script that is capable of parsing the gdb log file and generate a image representative of the state of the array.
$ cat -n plotter 
     1 #!/usr/bin/python
     2 import sys;
     3 import re;
     4 import matplotlib.pyplot as plt;
     5
     6 def plot(fileName):
     7   print "plotting %s"%(fileName);
     8   with open(fileName,'r') as fp:
     9     C=fp.read();
    10
    11   i=0;
    12   for line in C.split('\n'):
    13     m=re.match('New value = {(.*)}.*',line);
    14     if m:
    15       L=[int(el) for el in m.group(1).split(',')];
    16       print L;
    17       plt.ylim(0,100);
    18       plt.bar(range(len(L)),L,label='');
    19       plt.savefig('frame%05d.png'%(i));
    20       plt.close();
    21       i+=1;
    22   pass;
    23
    24 #---main---
    25 plot(sys.argv[1]);

This script will parse the 'New value =' lines and generate a bar graph that represents the order of the array.  For example, an initial value of a 10 element array could look like this;
Generating a series of images, during each step of the sorting routine and stitching them together into a video using FFmpeg.

Final results, i give you;

Selection Sort

Insertion Sort

Quick Sort



No comments:

Post a Comment