Tuesday, July 28, 2020

Limiting Network Bandwidth for Testing



Occasionally it's necessary to determine the effects of reduced network bandwidth. Authoring software that works for a networked system backed by Gigabit Ethernet as well as DSL requires testing on both kinds of systems.  The alternative is to test on the most capable networking system and then test lower-bandwidth systems on the same network by artificially limiting the bandwidth.

Consulting the all-knowing Google turned up a couple of general options: trickle and wondershaper.  While there are other utilities, I focused mainly on these two.

Trickle is a 'lightweight user bandwidth shaper'.  Its particular value is that it can limit processes independently and it doesn't require superuser privileges.


$ sudo apt-get install trickle


You can limit uploads and downloads to 100 KB/s for a single command like so:

$ trickle -u 100 -d 100 wget http://download.blender.org/peach/bigbuckbunny_movies/big_buck_bunny_1080p_stereo.ogg


Independently, you could specify different limits for another command.  Trickle is limited to TCP sockets, so if you need to limit bandwidth for UDP sockets you'll need to look for an alternative.  System-wide bandwidth limiting (again, TCP only) is supposedly supported via the trickled daemon utility, but after numerous hours I gave up on getting it to work.  Regardless, I abandoned its use not only because I couldn't get the daemon working, but also because of its TCP-only limitation and because shaping a command requires prepending 'trickle' to the command line.  That disqualifies applying bandwidth constraints to network services like sshd without modifying each service that starts the process.

Seeking a system-wide bandwidth traffic shaper brought me back to Wondershaper once again.  In my first experience with wondershaper I found it didn't always work.  This round I tested limits of 1-10 Mbps and quickly found that it limited the bandwidth as expected until reaching 10 Mbps, where the realized bandwidth jumped to 40 Mbps.  A head-scratcher: working fine for bandwidths under 10 Mbps, broken at >= 10 Mbps.  Consulting Google again brought me to a fix proposed by 'buzzy': https://github.com/magnific0/wondershaper/issues/2.  Modifying the wondershaper script as 'buzzy' indicated resolved the issue, and I could consistently specify bandwidths of 1-40 Mbps and observe the corresponding realized bandwidths.


$ cat go
#!/bin/bash

for i in `seq 1 50`; do
  echo "$i Mbps"
  K=`expr $i \* 1024`
  sudo wondershaper wlan0 $K $K
  iperf -t 10 -c 192.168.1.132 2> /dev/null | grep "Mbits/sec"
  sleep 2
done
sudo wondershaper wlan0 clear



$ ./go 
1 Mbps
[  3]  0.0-11.2 sec  1.38 MBytes  1.03 Mbits/sec
2 Mbps
[  3]  0.0-10.7 sec  2.62 MBytes  2.06 Mbits/sec
3 Mbps
[  3]  0.0-10.4 sec  3.75 MBytes  3.02 Mbits/sec
4 Mbps
[  3]  0.0-10.4 sec  5.00 MBytes  4.04 Mbits/sec
5 Mbps
[  3]  0.0-10.5 sec  5.12 MBytes  4.11 Mbits/sec
6 Mbps
[  3]  0.0-10.3 sec  7.38 MBytes  6.00 Mbits/sec
7 Mbps
[  3]  0.0-10.3 sec  8.50 MBytes  6.95 Mbits/sec
8 Mbps
[  3]  0.0-10.3 sec  9.75 MBytes  7.97 Mbits/sec
9 Mbps
[  3]  0.0-10.2 sec  10.6 MBytes  8.76 Mbits/sec
10 Mbps
[  3]  0.0-10.1 sec  12.0 MBytes  9.94 Mbits/sec
11 Mbps
[  3]  0.0-10.1 sec  13.2 MBytes  11.0 Mbits/sec
12 Mbps
[  3]  0.0-10.2 sec  14.5 MBytes  12.0 Mbits/sec
13 Mbps
[  3]  0.0-10.2 sec  15.5 MBytes  12.8 Mbits/sec
14 Mbps
[  3]  0.0-10.3 sec  16.1 MBytes  13.2 Mbits/sec
15 Mbps
[  3]  0.0-10.1 sec  16.4 MBytes  13.6 Mbits/sec
16 Mbps
[  3]  0.0-10.1 sec  19.0 MBytes  15.8 Mbits/sec
17 Mbps
[  3]  0.0-10.1 sec  19.8 MBytes  16.4 Mbits/sec
18 Mbps
[  3]  0.0-10.1 sec  21.1 MBytes  17.6 Mbits/sec
19 Mbps
[  3]  0.0-10.1 sec  22.1 MBytes  18.3 Mbits/sec
20 Mbps
[  3]  0.0-10.1 sec  23.1 MBytes  19.3 Mbits/sec
...


While the use of Wondershaper requires superuser privileges, it limits the system's network bandwidth in its entirety. Enjoy.

Monday, July 20, 2020

Performance Appraisals -- 'Compared to Who?'

Photo Credit: https://unsplash.com/@sernarial

Engineers often love to measure; we measure system performance, we measure clock drift, we measure all sorts of fascinating things, we even occasionally measure ourselves.  Even so, show me an engineer who didn't encounter anxiety when contributing to their first performance appraisal and I'll show you someone who's pulling your chain.

I recently stumbled upon a Reddit post from a junior'ish software engineer asking for advice on his recent performance appraisal, and in my opinion he was getting a good deal of bad advice.  Many of the contributors seemed jaded, perhaps bitter, offering tales of the dire consequences of bad performance reviews or suggesting that performance reviews were completely useless.

I remember being tasked with my first performance review; an overwhelming anxiety filled me for weeks.  The feeling resembled being lined up in a middle-school gymnasium by height: 'will I be too short', 'stand up straight', 'should I stand on my tiptoes'?  The original poster (OP) was bombarded with conflicting advice: "pay no attention to them, they are meaningless", "rank yourself exceptional or you'll never get a promotion", "rank yourself honestly, it's just a conversation starter".

Personally, I think performance appraisals are useful, and I have found they help me progress personally as well as professionally.  That said, I've been fortunate to work with companies, managers, and leadership that used them in line with how I felt they'd be used.  The anxiety around, and conflicting advice given to, this poor junior engineer are probably due to the advice givers getting burned in their own experiences.  As valuable as performance reviews can be, they tend to have some fundamental blind spots which could be addressed in training and/or simply communication to the teams.  Twenty-*mumble* years in the profession, formal training by a variety of employers, and these questions tend never to be addressed, frankly because they can be hard questions that typically extend beyond the expertise and/or reach of human resources (who tend to facilitate and provide the training).

What is Average?

Nearly every performance appraisal form applies some kind of numeric scale: 1-needs improvement, 5-exceptional, or something of that nature.  Training will tend to say "rate yourself" without going into much detail on the scale, instead focusing on the skill definitions.  Most engineers come readily armed with a flurry of mathematical prowess, are well-versed in the normal distribution bell curve, and are likely self-aware enough to easily compare themselves to the average, if they know what 'average' is.  In the absence of a clear definition, we tend to define our own criteria.  The result: wild deviations in personal ratings.  It's not uncommon to observe a seasoned engineer proficient in X rating themselves average while an engineer less proficient rates themselves exceptional, and you're left with the compelling question "what gives?".  Lack of guidance from the organization, manager, and leadership contributes to such a quandary.

As an organization, you could/should provide guidance by defining the sample set, for example:

Average in relation to the team:

You're surrounded by your team, work with them every day, and likely know how you stack up against each of them on a set of skills.  Perhaps you stand head and shoulders above your team in terms of communications, or perhaps you are new to a technology and below average in knowledge on the subject.  The value of this guidance is that it is easily relatable, and a bell-curve distribution would be expected from the team collective: "I'm better at some things than most of the team, short in others".  A junior or new team member would likely expect to be below average for a period until they gain the same degree of proficiency as the rest of the team.  A below-average rating wouldn't be treated as a red flag or a brand; instead it would be used to align opportunities to improve the skill (if needed/desired) or to better align assignments with team members.

Average in relation to your colleagues:

Some companies take exceptional pride in "we only hire exceptional people", and such phrases tend to add confusion.  "If I'm working here, and the company is comprised of only exceptional folks, then I guess I should rate myself exceptional."  Without clear guidance, one could easily arrive at this on their own.  Say you worked on tech XXX for the past several years, became proficient to an expert level relative to the general population, then hired into a company that specializes in XXX.  Are you now average, or exceptional?

Average in relation to your experience:

Consider the talent pipeline: junior engineers coming onto the team, transitioning into senior contributors over the course of their careers.  Should they rate themselves low until experience makes them as proficient as their seniors, or should they rate themselves with respect to their years of experience?  Or should they rate themselves solely against peers sharing the same title/experience?

How Will These Be Used?


The organization and leadership primarily control how performance appraisals are applied within the organization, but rarely formally communicate it.  A great deal of the anxiety and fear around performance appraisals is a consequence of this.  A fearful employee may suspect that the organization will only promote those with exceptional ratings, use the ratings as a litmus test when downsizing, or use the ratings for pay increases.  When you fear your ratings will be used against you, you're more likely to elevate them artificially.  It's been my observation that peer reviews are often orchestrated and authored with a specific purpose, but later find their way into broader use.  Additionally, in an industry of lean development philosophies, no one wants to dedicate significant time to something that could be streamlined or eliminated; time is too precious to waste.

Are They Scored Consistently?


As a manager, I strived to be consistent in rating my team.  Humans are easily influenced beings, and it is important to self-manage external factors.  "Whelp, a miserable Monday, I have a ripping headache.....let's get back to John's performance appraisal" is a recipe for disaster for you, John, and the team.  It unfortunately doesn't end there: suppose the H/W manager and S/W manager used different definitions of 'average', one rating their team higher than the other.  Performance appraisals tend not to be contained to departments, so how will the VP of Engineering interpret this inconsistency?  Will they think one department is stronger than another?  Will HR?  The CEO?





These are but a few of the considerations.  A lack of guidance leaves a team, or individuals, simply 'filling in the gaps', which can be beneficial but can also add complications and confusion down the line.

Performance appraisals can be extraordinarily useful, to an individual, to a team, and to a company.  They can facilitate a self-directed review of who you are, what you're good at, and who you want to become.  They set a stage for an honest dialog between individual contributors and leadership, how they see each other and how they affect one another, and they can provide a popcorn trail tracking the progression of a fresh employee into an organizational giant.  Their sheer existence enforces a 'measure what you care about' philosophy, and an organization committed to the professional growth of its team really demands their use.  That said, with a little clarification of definitions they could cause far less anxiety, confusion, and inconsistency.


Monday, July 13, 2020

Software System Forensics


While a good deal of my career has revolved around green-field development projects, recently I've been more heavily involved in existing systems.  Along the way I've acquired a new set of skills for learning how a system is operating.  As a recent contractor, I've frequently been given assignments of the form "we need to understand this component of the system....go research that".  More often than not, the subject-matter experts are long gone, so you're primarily on your own.  Luckily, there is a series of *nix tools that can assist in this discovery process.  Let's hit on a few.

While source code is the lifeblood of a software system, outside the developer community most team members are more familiar with system components and likely couldn't point you to the source code.  It could be a shell script, Perl, Python, Java.....and while many managers/team leads likely couldn't aim you toward the source tree of any given system component, they likely can refer you to a process name (or a snippet of one), and that alone can get you rolling on your investigation.

Let's say your big-bossman points you to a data processing feed called spooler1 that is currently running on the system:

Tasks:   1 total,   1 running,   0 sleeping,   0 stopped,   0 zombie
%Cpu(s):  5.7 us,  2.0 sy,  0.5 ni, 91.4 id,  0.4 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem :  8167120 total,   781680 free,  1680440 used,  5705000 buff/cache
KiB Swap:  8385532 total,  8371072 free,    14460 used.  5928716 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                   
29905 user  20   0   14008   4400   2708 R  20.0  0.1   0:00.22 spooler1

Not a lot to go on, but you may be surprised what you can glean from a running process.

lsof -- list open files

Let's start our investigational journey with the lsof utility.  It does precisely what the name says: it lists all of the files the process has open.

$ lsof -p 29905
lsof: WARNING: can't stat() tracefs file system /sys/kernel/debug/tracing
      Output information may be incomplete.
COMMAND    PID     USER   FD   TYPE DEVICE SIZE/OFF     NODE NAME
logMaker 12600 user  cwd    DIR   8,33     4096 70975491 /home/user/blog/SystemForensics
logMaker 12600 user  rtd    DIR    8,1     4096        2 /
logMaker 12600 user  txt    REG    8,1  1037528 41418771 /bin/bash
logMaker 12600 user  mem    REG    8,1  2981280 20186616 /usr/lib/locale/locale-archive
logMaker 12600 user  mem    REG    8,1  1868984 45355547 /lib/x86_64-linux-gnu/libc-2.23.so
logMaker 12600 user  mem    REG    8,1    14608 45355429 /lib/x86_64-linux-gnu/libdl-2.23.so
logMaker 12600 user  mem    REG    8,1   167240 45355621 /lib/x86_64-linux-gnu/libtinfo.so.5.9
logMaker 12600 user  mem    REG    8,1   162632 45355450 /lib/x86_64-linux-gnu/ld-2.23.so
logMaker 12600 user  mem    REG    8,1    26258 20449837 /usr/lib/x86_64-linux-gnu/gconv/gconv-modules.cache
logMaker 12600 user    0u   CHR 136,20      0t0       23 /dev/pts/20
logMaker 12600 user    1w   REG   8,33    46450 53624834 /home/user/blog/SystemForensics/spooler.log
logMaker 12600 user    2u   CHR 136,20      0t0       23 /dev/pts/20
logMaker 12600 user    3r  FIFO   0,12      0t0 76487063 pipe
logMaker 12600 user  255r   REG   8,33      857 53624833 /home/user/blog/SystemForensics/spooler1

There is a host of information available from this command, but for now let's focus on two key things: the current working directory (the cwd entry) and the log file (spooler.log).  While not relevant to our example, lsof will also provide evidence of relevant libraries and/or network ports.  For example, if you observe open network ports and the process linking in Corba libraries, you get a gauge of what you're in for.  A good starting point when trying to understand a system element is understanding the 'goes-intas and the goes-outtas', so library dependencies and input/output files can set the stage for the investigation tasks.

In our example, we now know where this utility is run from and that it generates a log file.
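As an aside, on Linux the same cwd and open-file information lsof reports can be read directly from /proc.  A minimal sketch, purely illustrative (lsof remains the more complete tool):

```python
import os

def proc_snapshot(pid):
    """Return (cwd, {fd: target}) for a process, read from /proc (Linux only)."""
    base = f"/proc/{pid}"
    cwd = os.readlink(f"{base}/cwd")          # current working directory
    fds = {}
    for fd in os.listdir(f"{base}/fd"):       # one symlink per open descriptor
        try:
            fds[int(fd)] = os.readlink(f"{base}/fd/{fd}")
        except OSError:
            pass                              # descriptor closed while we were listing
    return cwd, fds
```

Pointing it at our own process (proc_snapshot(os.getpid())) shows the same kind of cwd and descriptor listing that lsof produced above.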

While not every process generates a log file, if one exists you've likely stumbled upon a plethora of information.  No amount of source code browsing will ever compare to time spent with a quality log file for learning what's going on.  Likely you're looking into a system component because it isn't operating as expected; hopefully the log file will provide evidence of that and give you a hint at where to focus your next steps.

Simple Log File Analysis
If you're fortunate enough to have access to sophisticated log file analysis tools, by all means....use them.  Many of us are stuck with the general system utilities, and in many cases you can go far with them.  I've applied these techniques to some pretty sophisticated log files on existing systems, but many companies would consider those logs proprietary, so we'll use a sample log format for our purposes.

If we peek at the first few lines of our log file we can get a feel for what we can do with it:
$ head spooler.log 
2020-06-10 15:04:02.85|processing Table-E record
2020-06-10 15:04:04.45|processing Table-C record
2020-06-10 15:04:05.15|processing Table-D record
2020-06-10 15:04:05.75|processing Table-C record
2020-06-10 15:04:06.96|processing Table-D record
2020-06-10 15:04:07.66|processing Table-E record
2020-06-10 15:04:08.96|processing Table-D record
2020-06-10 15:04:09.86|Spooler::init() Connection established with peer pub.server.org port 9000
2020-06-10 15:04:11.06|processing Table-B record
2020-06-10 15:04:12.46|processing Table-E record

A well-planned log file will have a well-defined format, hopefully a timestamp followed by an event.  In our example there are six categories of events (connection events plus record-processing events for five tables), but only a fraction of them are evident from the head of the log file.  Let's see how we can get a comprehensive list of event types.

Since the timestamp is ever-changing, the first step is to ignore it: extract the rest of each line and see what we have.

$ cat spooler.log | cut -f 2- -d '|' | head -10
processing Table-E record
processing Table-C record
processing Table-D record
processing Table-C record
processing Table-D record
processing Table-E record
processing Table-D record
Spooler::init() Connection established with peer pub.server.org port 9000
processing Table-B record
processing Table-E record

Whelp, that shows promise.  Let's sort the result uniquely and see if that gives us what we want:

$ cat spooler.log | cut -f 2- -d '|' | sort -u
processing Table-A record
processing Table-B record
processing Table-C record
processing Table-D record
processing Table-E record
Spooler::init() Connection established with peer pub.server.org port 9000

BAM!  Looks good.  Now what?  The frequency of an event can provide you with some insight into how the system is behaving.  For example, repeated 'Connection established...' instances may hint at an unreliable network connection or an unstable companion process.  Let's look at how many instances there are of each type of event, by sorting the events and then counting the instances of each type.  The sort and uniq utilities will do the trick:
$ cat spooler.log | cut -f 2- -d '|' | sort | uniq -c
   1422 processing Table-A record
   3593 processing Table-B record
   1427 processing Table-C record
   3572 processing Table-D record
   3625 processing Table-E record
    761 Spooler::init() Connection established with peer pub.server.org port 9000

From this we can see that the B, D, and E records provide the highest volume of entries, the A and C records a smaller number, and there are ~760 instances of connection events.
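If you'd rather script this tally, the cut | sort | uniq -c pipeline can be sketched in a few lines of Python with collections.Counter, assuming the same 'timestamp|event' format:

```python
from collections import Counter

def event_counts(lines):
    """Tally log lines by the text after the '|' delimiter."""
    return Counter(line.rstrip("\n").split("|", 1)[1]
                   for line in lines if "|" in line)
```

Calling event_counts(open("spooler.log")) yields the same per-event totals as the pipeline above, as a dictionary you can process further.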

While this provides a nice summary of the types of events coming in as well as the quantities, often time-based summaries are essential.  Perhaps we're expecting burst-transfers in the off-hours and a smaller volume during the day.  We can bring the timestamps back into the equation for a look.

Data analysis is a discovery process, you often find a need to repeatedly refine your search criteria from your starting point.  
Let's break down a log entry so we can see how to perform some time-based categorization;
2020-06-10 19:04:23.10|processing Table-B record

date  -- chars 1..10
time  -- chars 12..22
event -- chars 24..

The cut utility allows us to extract specific byte ranges from each log line.  Say we're interested in the number of events organized by hour; we'd want to preserve the hour plus the event type like this:
$ cat spooler.log | cut -b 12-14,24-  
15:processing Table-E record
15:processing Table-C record
...

A bit ugly, but by extracting the hour + ':' + event type we've got the beginnings of what we want.  I purposely left the HH:MM colon delimiter in place to separate the hour from the event type:
$ cat spooler.log | cut -b 12-14,24-  | sort | uniq -c
    326 15:processing Table-A record
    847 15:processing Table-B record
    340 15:processing Table-C record
    831 15:processing Table-D record
    829 15:processing Table-E record
    180 15:Spooler::init() Connection established with peer pub.server.org port 9000
    347 16:processing Table-A record
    914 16:processing Table-B record
    332 16:processing Table-C record
    896 16:processing Table-D record
    932 16:processing Table-E record
    174 16:Spooler::init() Connection established with peer pub.server.org port 9000
    377 17:processing Table-A record
    895 17:processing Table-B record
    350 17:processing Table-C record
    876 17:processing Table-D record
    893 17:processing Table-E record
    202 17:Spooler::init() Connection established with peer pub.server.org port 9000
    348 18:processing Table-A record
    868 18:processing Table-B record
    373 18:processing Table-C record
    904 18:processing Table-D record
    912 18:processing Table-E record
    190 18:Spooler::init() Connection established with peer pub.server.org port 9000
     24 19:processing Table-A record
     69 19:processing Table-B record
     32 19:processing Table-C record
     65 19:processing Table-D record
     59 19:processing Table-E record
     15 19:Spooler::init() Connection established with peer pub.server.org port 9000

We can now see how the event counts change over the course of the hours.  Bursty processing loads would be evident from some time spent with this raw data; better yet, plotting quantities over time can provide you with real insight into your system.  A high volume of atypical events can also give you a clue to problems; for example, a high volume of retries/restarts in the wee hours of the morning may imply controlled restarts or potential problems.
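As a sketch of that plotting idea, the hourly grouping above can be reproduced and charted with Python's collections.Counter and matplotlib; the file names here are illustrative, and the log format is the 'YYYY-MM-DD HH:MM:SS.ss|event' shown earlier:

```python
from collections import Counter

def hourly_counts(lines):
    """Count events per hour from 'YYYY-MM-DD HH:MM:SS.ss|event' lines."""
    counts = Counter()
    for line in lines:
        stamp, sep, _event = line.rstrip("\n").partition("|")
        if sep:
            counts[stamp[11:13]] += 1   # the 'HH' field of the timestamp
    return counts

def plot_hourly(log_path, png_path):
    """Render the hourly totals as a bar chart, e.g. plot_hourly('spooler.log', 'hourly.png')."""
    import matplotlib
    matplotlib.use("Agg")               # render off-screen, to a file
    import matplotlib.pyplot as plt
    with open(log_path) as f:
        counts = hourly_counts(f)
    hours = sorted(counts)
    plt.bar(hours, [counts[h] for h in hours])
    plt.xlabel("hour of day")
    plt.ylabel("events")
    plt.savefig(png_path)
```

Grouping by hour bucket rather than full timestamp is what makes the bursty/quiet pattern pop out of the chart.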

Let's prepare to power down on this post.  With a handful of lesser-known utilities we can start with a process id, locate an available log file, and perform some high-level metrics on it in relatively short order.  I've found these utilities particularly useful these past few months and have acquired a newfound proficiency with them out of pure necessity.  Hopefully they're useful to you fellow readers as well.

Cheers.
Tuesday, July 7, 2020

Mandelbrot Set with Python



Data can be beautiful.  Visualizing data is a worthwhile skill to acquire and it's relatively simple with Python.  Let's explore how to do some data visualization as an exercise.

The Mandelbrot set is often regarded as an example of art meeting science.  It's generated by evaluating the behavior of complex numbers under iteration and graphing the results.  The end effect is an infinite and beautiful depiction of pure mathematics, an acid-trip of color and structure as you continuously zoom into the graph.

One of the best descriptions of the Mandelbrot set can be found here; I recommend you spend a few minutes appreciating the concept before we begin graphing it with a simple Python snippet.

In about 30 lines of code we can create our own colorized visualization of the Mandelbrot set.

Let's look at the source, then step into some of the details:
$ cat -n mandelbrot 
     1 #!/usr/bin/python
     2 import matplotlib.pyplot as plt;
     3 import sys;
     4
     5 def colorize(n):
     6   h="#%06x"%(int(n*2**23));
     7   return h;
     8
     9 xRange=[-2,1];
    10 yRange=[-1.5,1.5];
    11 incr=0.005;
    12
    13 x=xRange[0];
    14 while(x < xRange[1]):
    15   y=yRange[0];
    16   while(y < yRange[1]):
    17     c=x+y*1j;
    18     z=0;
    19     try:
    20       for k in range(50):
    21         z=z**2+c;
    22       if(abs(z) < 2):
    23         rgb=colorize(abs(z));
    24         plt.plot(x,y,'.',color=rgb);
    25     except:
    26       pass;
    27     y += incr;
    28   x += incr;
    29
    30 plt.xlim(xRange[0],xRange[1]);
    31 plt.ylim(yRange[0],yRange[1]);
    32 plt.savefig(sys.argv[1]);


Let's look over the non-Mandelbrot stuff first.  We're using the matplotlib library for our simple plotting example; in lines 9-10 we define the plot's x and y ranges and we enforce them in lines 30-31.  Finally, in line 32 we save the plot to a file rather than display it on screen.  The figure filename is passed as a command line argument, so ./mandelbrot foo.png generates a foo.png file.  Lines 23-24 calculate a color for the pixel and plot it at (x,y).  Without going into details, the nested loop (lines 13-28) steps through the floating point 2D range defined by the x and y ranges, stepping by 0.005 (defined in line 11).  On each iteration we selectively plot, or don't plot, a colorized pixel; we plot a pixel if its position is part of the Mandelbrot set.

The rest of the details are specific to calculating the Mandelbrot set.  In particular, lines 17,20-26.  The referenced video explores how and why this is done, but let's revisit some of the particulars.

Line 17 assigns the complex number for each (x,y) position in the range.  Note that 1j is the Python representation of the complex number i.  This assignment is covered in the referenced video, but is easily overlooked.

Inclusion in the Mandelbrot set for a position is characterized by how its complex number behaves under iteration of the function starting at 0.  That is, z starts at 0 and we repeatedly apply f(z) = z^2 + c in our loop; the end result will either blow up, implying the position is not in the Mandelbrot set, or remain bounded (e.g. |z| <= 2), which means it is part of the set.  Lastly, we use the magnitude of the result to determine the color of the pixel.  This is optional, but it adds a level of beauty.
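The membership test can be sketched as a standalone function.  A minimal sketch: the 50-iteration cap matches the script above, while the early bailout at |z| > 2 replaces the script's try/except overflow handling (once |z| exceeds 2 the iteration is guaranteed to diverge, so we can stop immediately):

```python
def in_mandelbrot(c, iterations=50):
    """Return True if complex point c appears to be in the Mandelbrot set."""
    z = 0
    for _ in range(iterations):
        z = z * z + c
        if abs(z) > 2:      # escaped: definitely not in the set
            return False
    return True             # stayed bounded for every iteration
```

For example, in_mandelbrot(0 + 0j) and in_mandelbrot(-1 + 0j) hold (both orbits stay bounded), while in_mandelbrot(1 + 0j) does not (0, 1, 2, 5, ... blows up).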

The end result is our visualization.

Now go do something cool!