Tuesday, July 28, 2020

Limiting Network Bandwidth for Testing



Occasionally it's necessary to determine the effects of reduced network bandwidth. Authoring software that works for a networked system backed by Gigabit Ethernet as well as DSL requires testing on both kinds of systems.  The alternative is to test on the most capable networking system and then test lower-bandwidth systems on the same network by artificially limiting the bandwidth.

Consulting the all-knowing Google turned up a couple of general options: trickle and wondershaper.  While there are other utilities, I focused mainly on these two.

Trickle is a 'lightweight user bandwidth shaper'.  Its particular value is that it can limit processes independently and it doesn't require superuser privileges.


$ sudo apt-get install trickle


You can limit uploads and downloads to 100 KB/s for a single command like so:

$ trickle -u 100 -d 100 wget http://download.blender.org/peach/bigbuckbunny_movies/big_buck_bunny_1080p_stereo.ogg


Independently, you could specify different limits for another command.  Trickle is limited to TCP sockets, so if you need to limit bandwidth for UDP sockets you'll need to look for an alternative.  System-wide bandwidth limiting (again, TCP only) is supposedly supported via the trickled daemon utility, but after numerous hours I gave up on getting it to work.  Regardless, I abandoned its use not only because I couldn't get the daemon working, but also because of its TCP-only limitation and because shaping a command requires prepending 'trickle' to the command line.  That disqualifies applying bandwidth constraints to network services like sshd without modifying each service that starts the process.

Seeking a system-wide bandwidth traffic shaper brought me back to Wondershaper once again.  In my first experience with wondershaper I found it didn't always work.  This round I tested limits of 1-10 Mbps and quickly found that it limited the bandwidth as expected until reaching 10 Mbps, where the realized bandwidth jumped to 40 Mbps.  A head-scratcher: working fine for bandwidths under 10 Mbps, broken at >= 10 Mbps.  Consulting Google again brought me to a fix proposed by 'buzzy': https://github.com/magnific0/wondershaper/issues/2.  Modifying the wondershaper script as 'buzzy' indicated resolved the issue, and I could consistently specify bandwidths of 1-40 Mbps and observe the corresponding realized bandwidths.


$ cat go
#!/bin/bash

for i in `seq 1 50`; do
  echo "$i Mbps"
  K=`expr $i \* 1024`
  sudo wondershaper wlan0 $K $K
  iperf -t 10 -c 192.168.1.132 2> /dev/null | grep "Mbits/sec"
  sleep 2
done
sudo wondershaper wlan0 clear



$ ./go 
1 Mbps
[  3]  0.0-11.2 sec  1.38 MBytes  1.03 Mbits/sec
2 Mbps
[  3]  0.0-10.7 sec  2.62 MBytes  2.06 Mbits/sec
3 Mbps
[  3]  0.0-10.4 sec  3.75 MBytes  3.02 Mbits/sec
4 Mbps
[  3]  0.0-10.4 sec  5.00 MBytes  4.04 Mbits/sec
5 Mbps
[  3]  0.0-10.5 sec  5.12 MBytes  4.11 Mbits/sec
6 Mbps
[  3]  0.0-10.3 sec  7.38 MBytes  6.00 Mbits/sec
7 Mbps
[  3]  0.0-10.3 sec  8.50 MBytes  6.95 Mbits/sec
8 Mbps
[  3]  0.0-10.3 sec  9.75 MBytes  7.97 Mbits/sec
9 Mbps
[  3]  0.0-10.2 sec  10.6 MBytes  8.76 Mbits/sec
10 Mbps
[  3]  0.0-10.1 sec  12.0 MBytes  9.94 Mbits/sec
11 Mbps
[  3]  0.0-10.1 sec  13.2 MBytes  11.0 Mbits/sec
12 Mbps
[  3]  0.0-10.2 sec  14.5 MBytes  12.0 Mbits/sec
13 Mbps
[  3]  0.0-10.2 sec  15.5 MBytes  12.8 Mbits/sec
14 Mbps
[  3]  0.0-10.3 sec  16.1 MBytes  13.2 Mbits/sec
15 Mbps
[  3]  0.0-10.1 sec  16.4 MBytes  13.6 Mbits/sec
16 Mbps
[  3]  0.0-10.1 sec  19.0 MBytes  15.8 Mbits/sec
17 Mbps
[  3]  0.0-10.1 sec  19.8 MBytes  16.4 Mbits/sec
18 Mbps
[  3]  0.0-10.1 sec  21.1 MBytes  17.6 Mbits/sec
19 Mbps
[  3]  0.0-10.1 sec  22.1 MBytes  18.3 Mbits/sec
20 Mbps
[  3]  0.0-10.1 sec  23.1 MBytes  19.3 Mbits/sec
...


While the use of Wondershaper requires superuser privileges, it limits the system's network bandwidth in its entirety. Enjoy.

Monday, July 20, 2020

Performance Appraisals -- 'Compared to Who?'

Photo Credit: https://unsplash.com/@sernarial

Engineers often love to measure; we measure system performance, we measure clock drift, we measure all sorts of fascinating things, we even occasionally measure ourselves.  Even so, show me an engineer who didn't encounter anxiety when contributing to their first performance appraisal and I'll show you someone who's pulling your chain.

I recently stumbled upon a Reddit post from a junior'ish software engineer asking for advice on his recent performance appraisal, and in my opinion he was getting a good deal of bad advice.  Many of the contributors seemed jaded, perhaps bitter, offering tales of the dire consequences of bad performance reviews or suggesting that performance reviews were completely useless.

I remember being tasked with my first performance review; an overwhelming anxiety filled me for weeks.  The feeling resembled being lined up in a middle-school gymnasium by height: 'will I be too short', 'stand up straight', 'should I stand on my tiptoes'?  The original poster (OP) was bombarded with conflicting advice: "pay no attention to them, they are meaningless", "rank yourself exceptional or you'll never get a promotion", "rank yourself honestly, it's just a conversation starter".

Personally, I think performance appraisals are useful, and I have found they help me progress personally as well as professionally.  That said, I've been fortunate to work with companies, managers, and leadership that used them in line with how I felt they'd be used.  The anxiety around, and conflicting advice given to, this poor junior engineer are probably due to the advice givers getting burned in their own experiences.  As valuable as performance reviews can be, they tend to have some fundamental blind spots which could be addressed in training and/or simply communication to the teams.  Twenty-*mumble* years in the profession, formal training by a variety of employers, and these questions tend never to be addressed, frankly because they can be hard questions that typically extend beyond the expertise and/or reach of human resources (who tend to facilitate and provide the training).

What is Average?

Nearly every performance appraisal form applies some kind of numeric scale: 1-needs improvement, 5-exceptional, or something of that nature.  Training will tend to say "rate yourself" without going into much detail on the scale, instead focusing on the skill definitions.  Most engineers come readily armed with a flurry of mathematical prowess, are well-versed in the normal distribution bell curve, and are likely self-aware enough to easily compare themselves to the average, if they know what 'average' is.  In the absence of a clear definition, we tend to define our own criteria.  The result: wild deviations in personal ratings.  It's not uncommon to observe a seasoned engineer proficient in X rating themselves average while an engineer less proficient rates themselves exceptional, and you're left with the compelling question "what gives?".  Lack of guidance from the organization, manager, and leadership contributes to such a quandary.

As an organization, you could/should provide guidance by defining the sample set, for example:

Average in relation to the team:

You're surrounded by your team, work with them every day, and likely know how you stack up against each of them on a set of skills.  Perhaps you stand head and shoulders above your team in terms of communications, or perhaps you are new to a technology and below average in knowledge on the subject.  The value of this guidance is that it is easily relatable, and a bell-curve distribution would be expected from the team collective: "I'm better at some things than most of the team, short in others".  A junior or new team member would likely expect to be below average for a period until they gain the same degree of proficiency as the rest of the team.  A below-average rating wouldn't be treated as a red flag or a brand; instead it would be used to align opportunities to improve the skill (if needed/desired) or to better align assignments with team members.

Average in relation to your colleagues:

Some companies take exceptional pride in "we only hire exceptional people", and such phrases tend to add confusion.  "If I'm working here, and the company is comprised of only exceptional folks, then I guess I should rate myself exceptional."  Without clear guidance, one could easily arrive at this on their own.  Say you worked on tech XXX for the past several years, became proficient to an expert level relative to the general population, then hired into a company that specializes in XXX.  Are you now average, or exceptional?

Average in relation to your experience:

Consider the talent pipeline: junior engineers coming onto the team, transitioning into senior contributors over the course of their careers.  Should they rate themselves low until experience makes them as proficient as their seniors, or should they rate themselves with respect to their years of experience?  Or should they rate themselves solely against peers sharing the same title/experience?

How Will These Be Used?


The organization and leadership primarily control how performance appraisals are applied within the organization, but rarely formally communicate it.  A great deal of the anxiety and fear around performance appraisals is a consequence of this.  A fearful employee may suspect that the organization will only promote those with exceptional ratings, use the ratings as a litmus test when downsizing, or use the ratings for pay increases.  When you fear your ratings will be used against you, you're more likely to elevate them artificially.  It's been my observation that peer reviews are often orchestrated and authored with a specific purpose, but later find their way into broader use.  Additionally, in an industry of lean development philosophies, no one wants to dedicate significant time to something that could be streamlined or eliminated; time is too precious to waste.

Are They Scored Consistently?


As a manager, I strived to be consistent in rating my team.  Humans are easily influenced beings, and it is important to self-manage external factors.  "Whelp, a miserable Monday, I have a ripping headache.....let's get back to John's performance appraisal" is a recipe for disaster for you, John, and the team.  It unfortunately doesn't end there: suppose the H/W manager and S/W manager used different definitions of 'average', one rating their team higher than the other.  Performance appraisals tend not to be contained to departments, so how will the VP of Engineering interpret this inconsistency?  Will they think one department is stronger than another?  Will HR?  The CEO?





These are but a few of the considerations.  A lack of guidance leaves a team, or individuals, simply 'filling in the gaps', which can be beneficial but can also add complications and confusion down the line.

Performance appraisals can be extraordinarily useful, to an individual, to a team, and to a company.  They can facilitate a self-directed review of who you are, what you're good at, and who you want to become.  They set a stage for an honest dialog between individual contributors and leadership, how they see each other and how they affect one another, and they can provide a popcorn trail tracking the progression of a fresh employee into an organizational giant.  Their sheer existence enforces a 'measure what you care about' philosophy, and an organization committed to the professional growth of its team really demands their use.  That said, with a little clarification of definitions they could cause far less anxiety, confusion, and inconsistency.


Monday, July 13, 2020

Software System Forensics


While a good deal of my career has revolved around green-field development projects, recently I've been more heavily involved in existing systems.  Along the way I've acquired a new set of skills for learning how a system is operating.  As a recent contractor, I've frequently been given assignments of the form "we need to understand this component of the system....go research that".  More often than not, the subject-matter experts are long gone, so you're primarily on your own.  Luckily, there is a series of *nix tools that can assist in this discovery process.  Let's hit on a few.

While source code is the lifeblood of a software system, outside the developer community most team members are more familiar with system components and likely couldn't point you to the source code.  It could be a shell script, Perl, Python, Java.....and while many managers/team leads likely couldn't aim you toward the source tree of any given system component, they likely can refer you to a process name (or a snippet of one), and that alone can get you rolling on your investigation.

Let's say your big-bossman points you to a data processing feed called spooler1 that is currently running on the system:

Tasks:   1 total,   1 running,   0 sleeping,   0 stopped,   0 zombie
%Cpu(s):  5.7 us,  2.0 sy,  0.5 ni, 91.4 id,  0.4 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem :  8167120 total,   781680 free,  1680440 used,  5705000 buff/cache
KiB Swap:  8385532 total,  8371072 free,    14460 used.  5928716 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                   
29905 user  20   0   14008   4400   2708 R  20.0  0.1   0:00.22 spooler1

Not a lot to go on, but you may be surprised what you can glean from a running process.

lsof -- list open files

Let's start our investigational journey with the lsof utility.  It does precisely what the name says: it lists all of the files the process has open.

$ lsof -p 29905
lsof: WARNING: can't stat() tracefs file system /sys/kernel/debug/tracing
      Output information may be incomplete.
COMMAND    PID     USER   FD   TYPE DEVICE SIZE/OFF     NODE NAME
logMaker 12600 user  cwd    DIR   8,33     4096 70975491 /home/user/blog/SystemForensics
logMaker 12600 user  rtd    DIR    8,1     4096        2 /
logMaker 12600 user  txt    REG    8,1  1037528 41418771 /bin/bash
logMaker 12600 user  mem    REG    8,1  2981280 20186616 /usr/lib/locale/locale-archive
logMaker 12600 user  mem    REG    8,1  1868984 45355547 /lib/x86_64-linux-gnu/libc-2.23.so
logMaker 12600 user  mem    REG    8,1    14608 45355429 /lib/x86_64-linux-gnu/libdl-2.23.so
logMaker 12600 user  mem    REG    8,1   167240 45355621 /lib/x86_64-linux-gnu/libtinfo.so.5.9
logMaker 12600 user  mem    REG    8,1   162632 45355450 /lib/x86_64-linux-gnu/ld-2.23.so
logMaker 12600 user  mem    REG    8,1    26258 20449837 /usr/lib/x86_64-linux-gnu/gconv/gconv-modules.cache
logMaker 12600 user    0u   CHR 136,20      0t0       23 /dev/pts/20
logMaker 12600 user    1w   REG   8,33    46450 53624834 /home/user/blog/SystemForensics/spooler.log
logMaker 12600 user    2u   CHR 136,20      0t0       23 /dev/pts/20
logMaker 12600 user    3r  FIFO   0,12      0t0 76487063 pipe
logMaker 12600 user  255r   REG   8,33      857 53624833 /home/user/blog/SystemForensics/spooler1

There is a host of information available from this command, but for now let's focus on two key things: the current working directory (the cwd entry) and the log file (spooler.log).  While not relevant to our example, lsof will also provide evidence of relevant libraries and/or network ports.  For example, if you observe open network ports and the process linking in Corba libraries, you get a gauge of what you're in for.  A good starting point when trying to understand a system element is understanding the 'goes-intas and the goes-outtas', so library dependencies and input/output files can set the stage for the investigation tasks.

In our example, we now know where this utility is run from and that it generates a log file.
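As an aside, on Linux the same cwd and open-file information lsof reports can be read directly from /proc.  A minimal sketch, purely illustrative (lsof remains the more complete tool):

```python
import os

def proc_snapshot(pid):
    """Return (cwd, {fd: target}) for a process, read from /proc (Linux only)."""
    base = f"/proc/{pid}"
    cwd = os.readlink(f"{base}/cwd")          # current working directory
    fds = {}
    for fd in os.listdir(f"{base}/fd"):       # one symlink per open descriptor
        try:
            fds[int(fd)] = os.readlink(f"{base}/fd/{fd}")
        except OSError:
            pass                              # descriptor closed while we were listing
    return cwd, fds
```

Pointing it at our own process (proc_snapshot(os.getpid())) shows the same kind of cwd and descriptor listing that lsof produced above.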

While not every process generates a log file, if one exists you've likely stumbled upon a plethora of information.  No amount of source code browsing will ever compare to time spent with a quality log file for learning what's going on.  Likely you're looking into a system component because it isn't operating as expected; hopefully the log file will provide evidence of that and give you a hint at where to focus your next steps.

Simple Log File Analysis
If you're fortunate enough to have access to sophisticated log file analysis tools, by all means....use them.  Many of us are stuck with the general system utilities, and in many cases you can go far with them.  I've applied these techniques to some pretty sophisticated log files on existing systems, but many companies would consider those logs proprietary, so we'll use a sample log format for our purposes.

If we peek at the first few lines of our log file we can get a feel for what we can do with it:
$ head spooler.log 
2020-06-10 15:04:02.85|processing Table-E record
2020-06-10 15:04:04.45|processing Table-C record
2020-06-10 15:04:05.15|processing Table-D record
2020-06-10 15:04:05.75|processing Table-C record
2020-06-10 15:04:06.96|processing Table-D record
2020-06-10 15:04:07.66|processing Table-E record
2020-06-10 15:04:08.96|processing Table-D record
2020-06-10 15:04:09.86|Spooler::init() Connection established with peer pub.server.org port 9000
2020-06-10 15:04:11.06|processing Table-B record
2020-06-10 15:04:12.46|processing Table-E record

A well-planned log file will have a well-defined format, hopefully a timestamp followed by an event.  In our example there are six categories of events (connection events plus record-processing events for five tables), but only a fraction of them are evident from the head of the log file.  Let's see how we can get a comprehensive list of event types.

Since the timestamp is ever-changing, the first step is to ignore it: extract the rest of each line and see what we have.

$ cat spooler.log | cut -f 2- -d '|' | head -10
processing Table-E record
processing Table-C record
processing Table-D record
processing Table-C record
processing Table-D record
processing Table-E record
processing Table-D record
Spooler::init() Connection established with peer pub.server.org port 9000
processing Table-B record
processing Table-E record

Whelp, that shows promise.  Let's sort the result uniquely and see if that gives us what we want:

$ cat spooler.log | cut -f 2- -d '|' | sort -u
processing Table-A record
processing Table-B record
processing Table-C record
processing Table-D record
processing Table-E record
Spooler::init() Connection established with peer pub.server.org port 9000

BAM!  Looks good.  Now what?  The frequency of an event can provide you with some insight into how the system is behaving.  For example, repeated 'Connection established...' instances may hint at an unreliable network connection or an unstable companion process.  Let's look at how many instances there are of each type of event, by sorting the events and then counting the instances of each type.  The sort and uniq utilities will do the trick:
$ cat spooler.log | cut -f 2- -d '|' | sort | uniq -c
   1422 processing Table-A record
   3593 processing Table-B record
   1427 processing Table-C record
   3572 processing Table-D record
   3625 processing Table-E record
    761 Spooler::init() Connection established with peer pub.server.org port 9000

From this we can see that the B, D, and E records provide the highest volume of entries, the A and C records a smaller number, and there are ~760 instances of connection events.
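If you'd rather script this tally, the cut | sort | uniq -c pipeline can be sketched in a few lines of Python with collections.Counter, assuming the same 'timestamp|event' format:

```python
from collections import Counter

def event_counts(lines):
    """Tally log lines by the text after the '|' delimiter."""
    return Counter(line.rstrip("\n").split("|", 1)[1]
                   for line in lines if "|" in line)
```

Calling event_counts(open("spooler.log")) yields the same per-event totals as the pipeline above, as a dictionary you can process further.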

While this provides a nice summary of the types of events coming in as well as the quantities, often time-based summaries are essential.  Perhaps we're expecting burst-transfers in the off-hours and a smaller volume during the day.  We can bring the timestamps back into the equation for a look.

Data analysis is a discovery process, you often find a need to repeatedly refine your search criteria from your starting point.  
Let's break down a log entry so we can see how to perform some time-based categorization;
2020-06-10 19:04:23.10|processing Table-B record

date  -- chars 1..10
time  -- chars 12..22
event -- chars 24..

The cut utility allows us to extract specific byte ranges from each log line.  Say we're interested in the number of events organized by hour; we'd want to preserve the hour plus the event type like this:
$ cat spooler.log | cut -b 12-14,24-  
15:processing Table-E record
15:processing Table-C record
...

A bit ugly, but by extracting the hour + ':' + event type we've got the beginnings of what we want.  I purposely left the HH:MM colon delimiter in place to separate the hour from the event type:
$ cat spooler.log | cut -b 12-14,24-  | sort | uniq -c
    326 15:processing Table-A record
    847 15:processing Table-B record
    340 15:processing Table-C record
    831 15:processing Table-D record
    829 15:processing Table-E record
    180 15:Spooler::init() Connection established with peer pub.server.org port 9000
    347 16:processing Table-A record
    914 16:processing Table-B record
    332 16:processing Table-C record
    896 16:processing Table-D record
    932 16:processing Table-E record
    174 16:Spooler::init() Connection established with peer pub.server.org port 9000
    377 17:processing Table-A record
    895 17:processing Table-B record
    350 17:processing Table-C record
    876 17:processing Table-D record
    893 17:processing Table-E record
    202 17:Spooler::init() Connection established with peer pub.server.org port 9000
    348 18:processing Table-A record
    868 18:processing Table-B record
    373 18:processing Table-C record
    904 18:processing Table-D record
    912 18:processing Table-E record
    190 18:Spooler::init() Connection established with peer pub.server.org port 9000
     24 19:processing Table-A record
     69 19:processing Table-B record
     32 19:processing Table-C record
     65 19:processing Table-D record
     59 19:processing Table-E record
     15 19:Spooler::init() Connection established with peer pub.server.org port 9000

We can now see how the event counts change over the course of the hours.  Bursty processing loads would be evident from some time spent with this raw data; better yet, plotting quantities over time can provide you with real insight into your system.  A high volume of atypical events can also give you a clue to problems; for example, a high volume of retries/restarts in the wee hours of the morning may imply controlled restarts or potential problems.
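As a sketch of that plotting idea, the hourly grouping above can be reproduced and charted with Python's collections.Counter and matplotlib; the file names here are illustrative, and the log format is the 'YYYY-MM-DD HH:MM:SS.ss|event' shown earlier:

```python
from collections import Counter

def hourly_counts(lines):
    """Count events per hour from 'YYYY-MM-DD HH:MM:SS.ss|event' lines."""
    counts = Counter()
    for line in lines:
        stamp, sep, _event = line.rstrip("\n").partition("|")
        if sep:
            counts[stamp[11:13]] += 1   # the 'HH' field of the timestamp
    return counts

def plot_hourly(log_path, png_path):
    """Render the hourly totals as a bar chart, e.g. plot_hourly('spooler.log', 'hourly.png')."""
    import matplotlib
    matplotlib.use("Agg")               # render off-screen, to a file
    import matplotlib.pyplot as plt
    with open(log_path) as f:
        counts = hourly_counts(f)
    hours = sorted(counts)
    plt.bar(hours, [counts[h] for h in hours])
    plt.xlabel("hour of day")
    plt.ylabel("events")
    plt.savefig(png_path)
```

Grouping by hour bucket rather than full timestamp is what makes the bursty/quiet pattern pop out of the chart.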

Let's prepare to power down on this post.  With a handful of lesser-known utilities we can start with a process id, locate an available log file, and perform some high-level metrics on it in relatively short order.  I've found these utilities particularly useful these past few months and have acquired a newfound proficiency with them out of pure necessity.  Hopefully they're useful to you fellow readers as well.

Cheers.
Tuesday, July 7, 2020

Mandelbrot Set with Python



Data can be beautiful.  Visualizing data is a worthwhile skill to acquire and it's relatively simple with Python.  Let's explore how to do some data visualization as an exercise.

The Mandelbrot set is often regarded as an example of art meeting science.  It's generated by evaluating the behavior of complex numbers under iteration and graphing the results.  The end effect is an infinite and beautiful depiction of pure mathematics, an acid-trip of color and structure as you continuously zoom into the graph.

One of the best descriptions of the Mandelbrot set can be found here; I recommend you spend a few minutes appreciating the concept before we begin graphing it with a simple Python snippet.

In about 30 lines of code we can create our own colorized visualization of the Mandelbrot set.

Let's look at the source, then step into some of the details:
$ cat -n mandelbrot 
     1 #!/usr/bin/python
     2 import matplotlib.pyplot as plt;
     3 import sys;
     4
     5 def colorize(n):
     6   h="#%06x"%(int(n*2**23));
     7   return h;
     8
     9 xRange=[-2,1];
    10 yRange=[-1.5,1.5];
    11 incr=0.005;
    12
    13 x=xRange[0];
    14 while(x < xRange[1]):
    15   y=yRange[0];
    16   while(y < yRange[1]):
    17     c=x+y*1j;
    18     z=0;
    19     try:
    20       for k in range(50):
    21         z=z**2+c;
    22       if(abs(z) < 2):
    23         rgb=colorize(abs(z));
    24         plt.plot(x,y,'.',color=rgb);
    25     except:
    26       pass;
    27     y += incr;
    28   x += incr;
    29
    30 plt.xlim(xRange[0],xRange[1]);
    31 plt.ylim(yRange[0],yRange[1]);
    32 plt.savefig(sys.argv[1]);


Let's look over the non-Mandelbrot stuff first.  We're using the matplotlib library for our simple plotting example; in lines 9-10 we define the plot's x and y ranges and we enforce them in lines 30-31.  Finally, in line 32 we save the plot to a file rather than display it on screen.  The figure filename is passed as a command line argument, so ./mandelbrot foo.png generates a foo.png file.  Lines 23-24 calculate a color for the pixel and plot it at (x,y).  Without going into details, the nested loop (lines 13-28) steps through the floating point 2D range defined by the x and y ranges, stepping by 0.005 (defined in line 11).  On each iteration we selectively plot, or don't plot, a colorized pixel; we plot a pixel if its position is part of the Mandelbrot set.

The rest of the details are specific to calculating the Mandelbrot set.  In particular, lines 17,20-26.  The referenced video explores how and why this is done, but let's revisit some of the particulars.

Line 17 assigns the complex number for each (x,y) position in the range.  Note that 1j is the Python representation of the complex number i.  This assignment is covered in the referenced video, but is easily overlooked.

Inclusion in the Mandelbrot set for a position is characterized by how its complex number behaves under iteration of the function starting at 0.  That is, z starts at 0 and we repeatedly apply f(z) = z^2 + c in our loop; the end result will either blow up, implying the position is not in the Mandelbrot set, or remain bounded (e.g. |z| <= 2), which means it is part of the set.  Lastly, we use the magnitude of the result to determine the color of the pixel.  This is optional, but it adds a level of beauty.
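The membership test can be sketched as a standalone function.  A minimal sketch: the 50-iteration cap matches the script above, while the early bailout at |z| > 2 replaces the script's try/except overflow handling (once |z| exceeds 2 the iteration is guaranteed to diverge, so we can stop immediately):

```python
def in_mandelbrot(c, iterations=50):
    """Return True if complex point c appears to be in the Mandelbrot set."""
    z = 0
    for _ in range(iterations):
        z = z * z + c
        if abs(z) > 2:      # escaped: definitely not in the set
            return False
    return True             # stayed bounded for every iteration
```

For example, in_mandelbrot(0 + 0j) and in_mandelbrot(-1 + 0j) hold (both orbits stay bounded), while in_mandelbrot(1 + 0j) does not (0, 1, 2, 5, ... blows up).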

The end result is our visualization.

Now go do something cool!