Understanding an existing software system can be a daunting task. Diving head-first into a source code repository to build that understanding is particularly challenging: you often end up crawling down a variety of rabbit holes that may or may not be relevant. It's not uncommon for software to have edge cases and 'dead code' that, while compiled into the release, are rarely (if ever) executed due to run-time constraints. But, really, what are the alternatives?
Welp friends, what if you could execute a software system, gather its method calls along with their callers and callees, and create a visual representation of the process flow? That's the topic of this blog post.
Our power forward; the hustle with the muscle, the beta with aaaalllllll the data.....GDB.
At point guard; the mate that will translate, the teammate that will update....your buddy and mine...Python.
That's our roster: GDB to collect caller/callee information, Python to convert the GDB output into something that can be used to generate a visual diagram, and WebSequenceDiagrams to create the diagram. This team has proven quite beneficial whenever I've been tossed into the deep end of the pool without my water-wings. Let's work through a simple example:
$ cat -n main.cpp
     1  #include <stdio.h>
     2
     3  class C
     4  {
     5  public:
     6      C() { }
     7      void beak();
     8      void flap();
     9      void shake();
    10      void clap();
    11  };
    12
    13  void C::beak() {}
    14  void C::flap() {}
    15  void C::shake() {}
    16  void C::clap() {}
    17
    18  class B
    19  {
    20  public:
    21      B():c_() { }
    22      void stepOnce();
    23  private:
    24      C c_;
    25  };
    26
    27  void B::stepOnce() { c_.beak(); c_.flap(); c_.shake(); c_.clap(); }
    28
    29  class A
    30  {
    31  private:
    32      B b_;
    33  public:
    34      A():b_() { }
    35      void run();
    36  };
    37  void A::run() { for(int i=0; i<10; ++i) b_.stepOnce(); }
    38
    39  int main()
    40  {
    41      printf("(%s:%d) main process initializing\n",__FILE__,__LINE__);
    42      A obj;
    43      obj.run();
    44      printf("(%s:%d) main process terminating\n",__FILE__,__LINE__);
    45  }
Even the most modest of software engineers can peek at this code and understand it without any advanced tools, but the same process of capturing debug info and transforming it into a sequence diagram works for far more complicated systems. Frankly, it's saved me hours and hours of tracing through source code. Fred R. Barnard may not have been a software engineer, but he might as well have been when he coined the phrase "a picture is worth a thousand words".
So, that's our system; let's turn our attention to GDB. We'll author a GDB command script to do all the heavy lifting: enable logging, write the gdb output to a gdb.log file, and set breakpoints in the methods we're particularly interested in (here, everything in classes A, B and C). Each breakpoint prints a two-frame backtrace and releases the process to continue. The backtraces saved in the gdb log file will be used to extract the caller/callee methods for our diagram.
$ cat gdb.cmd
set pagination off
set logging file ./gdb.log
set logging overwrite on
set logging on

define MyTrace
  bt 2
  cont
end

break main
commands
  rbreak ^A::
  commands
    MyTrace
  end

  rbreak ^B::
  commands
    MyTrace
  end

  rbreak ^C::
  commands
    MyTrace
  end

  cont
end

run
quit
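On a bigger system the list of classes grows quickly, and hand-editing one rbreak stanza per class gets old. Here's a minimal sketch, not part of the original toolchain, that generates the same command script from a list of class-name prefixes. The function name make_gdb_script and its parameters are made up for illustration; the gdb commands it emits are exactly the ones used in the script above.

```python
def make_gdb_script(prefixes, log_file="./gdb.log", depth=2):
    """Return gdb command-script text that traces every method of the given classes.

    prefixes -- class names; each becomes an 'rbreak ^Name::' stanza
    log_file -- where gdb logging is written
    depth    -- number of backtrace frames printed at each breakpoint
    """
    lines = [
        "set pagination off",
        f"set logging file {log_file}",
        "set logging overwrite on",
        "set logging on",
        "",
        "define MyTrace",
        f"  bt {depth}",
        "  cont",
        "end",
        "",
        "break main",
        "commands",
    ]
    # One regex breakpoint stanza per class, installed once main is reached.
    for prefix in prefixes:
        lines += [f"  rbreak ^{prefix}::", "  commands", "    MyTrace", "  end", ""]
    lines += ["  cont", "end", "", "run", "quit"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print a script equivalent to the hand-written one for classes A, B and C.
    print(make_gdb_script(["A", "B", "C"]), end="")
```

Redirect the output into gdb.cmd and use it exactly like the hand-written version.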
Armed with the gdb command script, we simply run our main process (built with debug symbols, e.g. g++ -g) under gdb as follows:
$ gdb --batch -x ./gdb.cmd ./main 2> /dev/null
When the process terminates, we have a gdb.log file that takes the form:
$ more gdb.log
Breakpoint 1 at 0x40063c: file main.cpp, line 40.
Breakpoint 1, main () at main.cpp:40
40 {
Breakpoint 2 at 0x4006e4: file main.cpp, line 34.
void A::A();
...
Breakpoint 2, A::A (this=0x7fffffffdc77) at main.cpp:34
34 A():b_() { }
#0 A::A (this=0x7fffffffdc77) at main.cpp:34
#1 0x0000000000400670 in main () at main.cpp:42
Breakpoint 4, B::B (this=0x7fffffffdc77) at main.cpp:21
21 B():c_() { }
#0 B::B (this=0x7fffffffdc77) at main.cpp:21
#1 0x00000000004006f0 in A::A (this=0x7fffffffdc77) at main.cpp:34
Breakpoint 6, C::C (this=0x7fffffffdc77) at main.cpp:6
6 C() { }
#0 C::C (this=0x7fffffffdc77) at main.cpp:6
#1 0x00000000004006d4 in B::B (this=0x7fffffffdc77) at main.cpp:21
Breakpoint 3, A::run (this=0x7fffffffdc77) at main.cpp:37
37 void A::run() { for(int i=0; i<10; ++i) b_.stepOnce(); }
#0 A::run (this=0x7fffffffdc77) at main.cpp:37
#1 0x000000000040067c in main () at main.cpp:43
Since we created breakpoints for all of our class A, B and C methods, hitting one produces a backtrace of depth 2: the callee (#0) and its caller (#1). Since each frame includes the class name and method, we have sufficient info to create a sequence diagram; we just have to parse the gdb log file and extract it.
Python is an amazing tool for file processing/parsing, and it's the one we'll be using. We'll use some regex magic and string commands to transform the raw gdb output into a text file similar to this:
$ cat mtd.txt
main -> A:A()
A -> B:B()
B -> C:C()
main -> A:run()
A -> B:stepOnce()
B -> C:beak()
B -> C:flap()
B -> C:shake()
B -> C:clap()
This string format, <object> -> <class>:<method>(), is compliant with WebSequenceDiagrams; simply copy-n-pasting the contents into the web-app will produce magic. More on that later; let's turn our heads toward the necessary Python script.
$ cat mkMtd
#!/usr/bin/env python3
import re
import sys

# https://www.websequencediagrams.com/

def methodName(S):
    """Turn one backtrace frame line into 'Class:method()' or 'file:function'."""
    retVal = ""
    # Class method, e.g. "#0 A::run (this=0x...) at main.cpp:37"
    m1 = re.search(r".+ (.+)::(.+)\((.+)\)", S)
    if m1:
        retVal = "%s:%s()" % (m1.group(1).strip(), m1.group(2).strip())
    else:
        # Free function in a caller frame, e.g. "#1 0x... in main () at main.cpp:42"
        m2 = re.search(r".+ in (.+)\(.*\) (.+)", S)
        if m2:
            cName = ' '.join(m2.group(2).split(' ')[1:]).split('.')[0]
            retVal = "%s:%s" % (cName, m2.group(1).strip())
        else:
            # Free function in the innermost frame, e.g. "#0 main () at main.cpp:40"
            m2 = re.search(r".+ (.+)\(.*\) at (.+)", S)
            if m2:
                cName = m2.group(2).split(".")[0]
                retVal = "%s:%s" % (cName, m2.group(1).strip())
    return retVal

def parseDebugOutput(fileName):
    with open(fileName, 'r') as fp:
        C = fp.read()
    callee = None
    for line in C.split('\n'):
        if re.match(r"#0 ", line):
            # Frame #0 is the method that hit the breakpoint: the callee.
            callee = methodName(line)
        elif re.match(r"#1 ", line) and callee:
            # Frame #1 is its caller; only the class (or file) name is kept.
            caller = methodName(line)
            print("%s -> %s" % (caller.split(':')[0], callee))

if __name__ == "__main__":
    parseDebugOutput(sys.argv[1])
You run this delicious little bastard as follows:
$ ./mkMtd ./gdb.log
And it spits out WebSequenceDiagrams-compliant input commands.
Paste those into the web-app, export the result as a PNG, and you can include it in your design documentation.
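One wrinkle: as listed, mkMtd prints a line for every breakpoint hit, so the ten trips through the loop in A::run() repeat the stepOnce/beak/flap/shake/clap block ten times over. If you only want each distinct call once, in first-seen order, a small filter does the trick. This is a rough sketch under that assumption; unique_calls is a made-up helper name, not part of the original script.

```python
def unique_calls(lines):
    """Yield each non-empty call line once, preserving first-seen order."""
    seen = set()
    for line in lines:
        line = line.strip()
        if line and line not in seen:
            seen.add(line)
            yield line


if __name__ == "__main__":
    # Two loop iterations' worth of mkMtd-style output (sample data).
    sample = [
        "main -> A:run()",
        "A -> B:stepOnce()",
        "B -> C:beak()",
        "A -> B:stepOnce()",
        "B -> C:beak()",
    ]
    for call in unique_calls(sample):
        print(call)
```

The trade-off is that the diagram no longer shows how many times a call happened, only that it happened. To use it on real output, feed it the lines printed by mkMtd (for example by iterating over sys.stdin) instead of the sample list.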
It's worth noting that while this method has time and time again proven useful to me, it presents a specific challenge.
You're likely to use it on a sophisticated system, one with dozens of classes and hundreds of methods, and while setting a breakpoint in each of them is technically possible, your diagram will quickly become an eyesore. The challenge is carving the uninteresting methods out of the breakpoints or the gdb log file, and that process can be time-consuming. I'd argue it's not as time-consuming as spending dozens of hours browsing source code, but it will take an investment of trial-n-error. So, be prepared to spend some time on that.
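To take some of the guesswork out of that trial-n-error, it can help to count how often each call shows up and then drop the noisy ones by pattern. The sketch below is one possible approach, not something from the original toolchain; call_counts and drop_noise are hypothetical helper names, and the exclusion patterns are whatever regexes you decide are uninteresting.

```python
import re
from collections import Counter


def call_counts(lines):
    """Count how many times each non-empty call line appears."""
    return Counter(line.strip() for line in lines if line.strip())


def drop_noise(lines, patterns):
    """Return the lines that match none of the given regex patterns."""
    compiled = [re.compile(p) for p in patterns]
    return [ln for ln in lines if not any(rx.search(ln) for rx in compiled)]


if __name__ == "__main__":
    # mkMtd-style output (sample data).
    sample = [
        "main -> A:run()",
        "A -> B:stepOnce()",
        "B -> C:beak()",
        "B -> C:flap()",
        "A -> B:stepOnce()",
        "B -> C:beak()",
        "B -> C:flap()",
    ]
    # The most frequent calls are the first candidates for carving out.
    for call, count in call_counts(sample).most_common():
        print(count, call)
    # Drop everything that lands in class C, keeping the A/B interactions.
    for call in drop_noise(sample, [r"-> C:"]):
        print(call)
```

Filtering the text after the fact is cheap to iterate on; once you know which methods are noise, you can tighten the rbreak patterns so gdb never stops in them in the first place.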
I've used this technique on multi-process systems (capturing and displaying message entry/exit points), to investigate in-memory DB accesses (during system initialization), and to capture and analyze specific user scenarios. It's an incredibly useful technique that produces valuable information, but it takes some fine-tuning to find the right balance in breakpoint/method captures.
Cheers.