Lab: Debugging with gdb

Introduction

Learning Objective: This lab will teach you to use gdb, the GNU Debugger, to debug your robot code. gdb is the standard debugging tool for Linux.

The simplest way to debug robot behaviors is to insert print statements to display the values of variables and monitor progress through the code. But sometimes this isn't enough. For example, if your program is crashing inside of some Tekkotsu or C++ library function due to an illegal memory access (in Linux this is called a segmentation violation signal, or SIGSEGV), you might not know where to put the print statement or what you should be printing to find the error. In this case, what you need to do is examine the execution stack to see what was going on at the time of the crash. The debugger allows you to do this.

A Test Program

The GdbDemo behavior shown below will be used to demonstrate how one can generate a runtime error and diagnose it using gdb.

#include "Behaviors/StateMachine.h"
#include <sstream>

$nodeclass GdbDemo : StateNode {

  $nodeclass PrintPrompt : StateNode : doStart {
    cout << "\n\nWelcome to the Tekkotsu diner." << endl;
    cout << "Please use the msg command to enter a number from 1-3." << endl;
  }

  $nodeclass ProcessInput : StateNode : doStart {
    string const &userinput = ((const TextMsgEvent*)event)->getText();
    istringstream in(userinput);
    int response = -1;
    in >> response;

    vector<string> menu(4);
    menu[1] = "appetizer";
    menu[2] = "entree";
    menu[3] = "dessert";

    cout << "\n\nNow serving the " << menu[response] << "." << endl;
  }

  $setupmachine{
    startnode: PrintPrompt =TM=> ProcessInput =N=> startnode
  }

}

REGISTER_BEHAVIOR(GdbDemo);

This behavior consists of two nodes. PrintPrompt prints a message asking the user to input a number between 1 and 3. The user does this by typing a message on the Tekkotsu console (in the terminal window): "msg", a space, and the number, followed by the Enter key. This generates a text message event, which triggers the =TM=> transition to the next state node, ProcessInput. That node extracts the string from the text message event, reads an integer from the string, and uses that integer to index into the vector called menu. The state machine loops back using a =N=> transition so you can keep entering messages.

Notice that the input is never validated. The user can type anything at all, and the program will use whatever value results to index into the menu vector. This is asking for trouble.
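As a minimal sketch of the missing check, the standalone C++ below range-tests the index before using it. The names makeMenu and menuItemOrFallback are hypothetical helpers introduced only for illustration; they are not part of Tekkotsu or of the GdbDemo behavior, and the lab deliberately leaves the bug in place so there is something to debug.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Builds the same four-slot menu as GdbDemo; slot 0 is left empty.
std::vector<std::string> makeMenu() {
  std::vector<std::string> menu(4);
  menu[1] = "appetizer";
  menu[2] = "entree";
  menu[3] = "dessert";
  return menu;
}

// Hypothetical helper, not part of Tekkotsu: returns the menu entry for
// `response`, or `fallback` when `response` is not a valid choice.
std::string menuItemOrFallback(const std::vector<std::string>& menu,
                               int response,
                               const std::string& fallback) {
  assert(!menu.empty());  // sanity check: the demo menu always has slot 0
  // Reject values below 1 first so the cast below can never wrap around.
  if (response < 1 || static_cast<std::size_t>(response) >= menu.size()) {
    return fallback;
  }
  return menu[static_cast<std::size_t>(response)];
}
```

With a helper like this, ProcessInput could print menuItemOrFallback(menu, response, "nothing") instead of menu[response], so an invalid entry would produce a harmless message rather than an out-of-bounds access.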

Reading a Stack Trace

Try compiling and running the GdbDemo behavior shown above. On the Tekkotsu console, you'll see a prompt like "HAL:Create>". Type "msg 1" and hit the Enter key. Then try "msg 2". Notice that the program works correctly when given valid inputs.

Now type "msg barf" and hit the Enter key. The program crashes due to an illegal memory access (SIGSEGV), and prints a stack trace like this:

HAL:Create> Now serving the *** ERROR Main: Received SIGSEGV
  0  sim::handle_signal(int) +0x2d5 (local/tekkotsu/sim.cc:586)
  1  ?? (libc.so.6)
  2  std::basic_ostream<char, std::char_traits<char> >& std::operator<< <char, std::char_traits<char>,
       std::allocator<char> >(std::basic_ostream<char, std::char_traits<char> >&, std::basic_string<char,
       std::char_traits<char>, std::allocator<char> > const&) +0x3 (libstdc++.so.6)
  3  GdbDemo::ProcessInput::doStart() +0x1a6 (cs/usr/dst/Tekkotsu/project/GdbDemo.cc.fsm:22)
  4  BehaviorBase::start() +0x7b (libtekkotsu.so) (Behaviors/BehaviorBase.cc:55)
  5  StateNode::start() +0x2c2 (libtekkotsu.so) (Behaviors/StateNode.cc:92)
  6  Transition::doFire() +0x774 (libtekkotsu.so) (Behaviors/Transition.cc:104)
  7  Transition::fireDownCallStack() +0x32 (libtekkotsu.so) (Behaviors/Transition.cc:56)
  8  Transition::fire(EventBase const&) +0x418 (libtekkotsu.so) (Behaviors/Transition.cc:53)
  9  TextMsgTrans::doEvent() +0x1c5 (cs/usr/dst/Tekkotsu/Behaviors/Transitions/TextMsgTrans.h:65)
 10  BehaviorBase::processEvent(EventBase const&) +0x150 (libtekkotsu.so) (Behaviors/BehaviorBase.cc:87)
 11  EventRouter::processTimers() +0x38d (libtekkotsu.so) (Events/EventRouter.cc:96)
 12  TimerExecThread::poll() +0x3b (local/tekkotsu/TimerExecThread.cc:41)
 13  PollThread::run() +0xaab (libtekkotsu.so) (IPC/PollThread.cc:115)
 14  Thread::launch(void*) +0x2e1 (libtekkotsu.so) (IPC/Thread.cc:344)
 15  start_thread (libpthread.so.0)
 16  clone +0x6d (libc.so.6)

The topmost stack frame is frame #0, which is part of Tekkotsu's error handling machinery. Stack frames #1 and #2 belong to the system libraries: #1 is in the C library (libc) and #2 is in the C++ library (libstdc++). Proceeding further down the stack, we see that frame #3 is user code: it's GdbDemo::ProcessInput::doStart(), which is code we wrote. The end of that line indicates that we were at line 22 of GdbDemo.cc.fsm when the error occurred.

Knowing how to read a stack trace can save you a lot of time. You just start at the top of the stack and work your way down until you find a stack frame referencing user code. This is the point in your program where the error occurred. By examining the source code at that line, you might be able to figure out the problem. If not, then you will want to use the debugger, gdb, to probe further.

Running Tekkotsu Under Gdb

Running Tekkotsu under gdb is straightforward:

> cd ~/project
> gdb ./tekkotsu-CREATE
(gdb) run

There is only one trick you need to know: any arguments that you were passing to Tekkotsu on the command line must instead be passed using gdb's "run" command. For example, suppose you were running Tekkotsu with Mirage like this:

> ./tekkotsu-CREATE -c mirage.plist

To run Tekkotsu with Mirage under gdb you would instead type:

> gdb ./tekkotsu-CREATE
(gdb) run -c mirage.plist

You will see a lot of messages about individual threads starting up; you can ignore these. Once Tekkotsu has started you will no longer be talking to gdb; you will see the usual Tekkotsu command prompt (e.g., "HAL:Create>") and you will be talking to the Tekkotsu console. If you don't see the prompt, hit the Enter key. You can use the ControllerGUI as you normally would.

Try running the GdbDemo behavior and giving it an invalid input. The result will look something like this:

HAL:Create> msg barf

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffe294b700 (LWP 7289)]
0x00007ffff5395b23 in std::basic_ostream<char, std::char_traits<char> >& std::operator<< <char,
std::char_traits<char>, std::allocator<char> >(std::basic_ostream<char, std::char_traits<char> >&,
std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from
/usr/lib/libstdc++.so.6
(gdb) 

The segmentation fault interrupted the program's execution. Now you are back talking to gdb, as the "(gdb)" prompt indicates, and you can examine the stack, display variables, and do lots of other things as described below.

Examining the Stack in Gdb

The bt (backtrace) command in gdb displays a stack trace. Continuing our example from the previous section:

(gdb) bt
#0  0x00007ffff5395b23 in std::basic_ostream<char, std::char_traits<char> >& std::operator<< <char,
    std::char_traits<char>, std::allocator<char> >(std::basic_ostream<char, std::char_traits<char> >&,
    std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from
    /usr/lib/libstdc++.so.6
#1  0x00000000006283a6 in GdbDemo::ProcessInput::doStart (this=0xb2eaf0) at GdbDemo.cc.fsm:22
#2  0x00007ffff7561399 in BehaviorBase::start (this=0xb2eaf0) at Behaviors/BehaviorBase.cc:54
#3  0x00007ffff74b3a8c in StateNode::start (this=0xb2eaf0) at Behaviors/StateNode.cc:91
#4  0x00007ffff76ac9d4 in Transition::doFire (this=0xb2ec10) at Behaviors/Transition.cc:104
#5  0x00007ffff76ac248 in Transition::fireDownCallStack (this=0xb2ec10) at Behaviors/Transition.cc:57
#6  0x00007ffff76ac206 in Transition::fire (this=0xb2ec10, ev=...) at Behaviors/Transition.cc:52
#7  0x0000000000627f21 in TextMsgTrans::doEvent (this=0xb2ec10)
    at /afs/cs/usr/dst/Tekkotsu/Behaviors/Transitions/TextMsgTrans.h:65
#8  0x00007ffff756160e in BehaviorBase::processEvent (this=0xb2ec10, curEvent=...)
    at Behaviors/BehaviorBase.cc:86
#9  0x00007ffff7608cff in EventRouter::processTimers (this=0xa6b940) at Events/EventRouter.cc:96
#10 0x00000000005ca5f1 in TimerExecThread::poll (this=0x7fffec032800) at
    local/tekkotsu/TimerExecThread.cc:41
#11 0x00007ffff773ca55 in PollThread::run (this=0x7fffec032800) at IPC/PollThread.cc:115
#12 0x00007ffff74a379f in Thread::launch (msg=0x7fffec032800) at IPC/Thread.cc:344
#13 0x00007ffff5afd9ca in start_thread () from /lib/libpthread.so.0
#14 0x00007ffff4bc370d in clone () from /lib/libc.so.6
#15 0x0000000000000000 in ?? ()
(gdb) 

Notice that gdb's stack trace isn't identical to the stack trace that gets displayed when the program crashes outside of gdb, but it's very similar. In this example, the user code in GdbDemo::ProcessInput::doStart is stack frame #1.

In order to examine the program state, we need to move to stack frame #1. gdb always starts out at the top of the stack, but we can use the "frame" command to move to any frame we like:

(gdb) frame 1
#1  0x00000000006283a6 in GdbDemo::ProcessInput::doStart (this=0xb2eaf0) at GdbDemo.cc.fsm:22
22	    cout << "\n\nNow serving the " << menu[response] << "." << endl;

Gdb shows us stack frame #1 and the corresponding line of the source file. Now we can examine local variables of the doStart method. One way to do this is with the "print" command:

(gdb) print response
$1 = -1

This shows us that the value of response is -1. The "$1" notation at the beginning is gdb's way of numbering its answers; these numbers can be referred to in subsequent commands, although we won't bother with that here.

Line 22 of our program contains the expression menu[response]. A subscript of -1 is out of bounds for the vector, so evaluating it reads memory that does not belong to menu. But why does response have this value? Recall that it was initialized to -1 a few lines above. Then the code reads an integer from the string userinput into response. This operation will fail when the string does not contain a valid integer. So what's in userinput right now?

(gdb) print userinput
$2 = (const std::string &) @0x7fffe294a410: {static npos = 18446744073709551615,
  _M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>},
  <No data fields>}, _M_p = 0xb0ccf8 "barf"}}

The above mess is the awkward way that gdb displays C++ string objects. You can ignore the stuff in the middle and just focus on the beginning and end of the expression. The beginning tells us that userinput is of type const std::string&, and the end tells us that the value of the string is "barf". This is not an integer, so now we see why we failed to read a new integer into response, leaving the original value of -1 in place.
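The failed read can also be detected in the program itself by checking the stream state after the extraction. The sketch below is a standalone illustration; parseInt is a hypothetical helper, not something the GdbDemo behavior or Tekkotsu provides. One caveat worth knowing: the session above shows response keeping its initial -1, but compilers that follow C++11 or later write 0 to the target when an integer extraction fails, so testing the stream is more dependable than relying on a sentinel value.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: tries to read an int from the start of `text`.
// Returns true and stores the value in *out on success; returns false
// and leaves *out untouched when no integer could be extracted.
bool parseInt(const std::string& text, int* out) {
  assert(out != nullptr);  // the caller must supply somewhere to store the result
  std::istringstream in(text);
  int value = 0;
  in >> value;
  if (in.fail()) {
    return false;  // e.g. "barf": the extraction set failbit
  }
  *out = value;
  return true;
}
```

A caller would test the return value before touching the menu, for example by printing the prompt again when parseInt(userinput, &response) returns false.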

The print command can do more than display a variable; it can display the value of an expression. Some examples:

(gdb) print menu.size()
$3 = 4

(gdb) print menu[1]
$4 = (const std::basic_string<char, std::char_traits<char>, std::allocator<char> > &) @0xb0ccb8: {
 static npos = 18446744073709551615, _M_dataplus = {<std::allocator<char>> = 
 {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>}, 
 _M_p = 0xb14258 "appetizer"}}

(gdb) print event
$5 = (const class EventBase *) 0xb2ef50

(gdb) print event->getGeneratorID()
$6 = EventBase::textmsgEGID

The "info" command has lots of options. The two we will focus on here are "info args" and "info locals", both of which refer to the current stack frame. "info args" prints the arguments to the current function or method. Since doStart() doesn't take any explicit arguments, the only one shown is the implicit this pointer that every non-static method receives:

(gdb) info args
this = 0xb2eaf0

The "info locals" command prints the local variables in existence at the current stack frame:

(gdb) info locals
userinput = @0x7fffe294a410
in = <incomplete type>
response = -1
menu = {<std::_Vector_base<std::basic_string<char, std::char_traits<char>, std::allocator<char> >,
  std::allocator<std::basic_string<char, std::char_traits<char>, std::allocator<char> > > >> = {
   _M_impl = {<std::allocator<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >> =
   {<__gnu_cxx::new_allocator<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >> =
   {<No data fields>}, <No data fields>}, _M_start = 0xb0ccb0, _M_finish = 0xb0ccd0,
        _M_end_of_storage = 0xb0ccd0}}, <No data fields>}
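The raw pointers in this output are enough to work out what menu[-1] refers to. The sketch below redoes the address arithmetic in standalone C++. It assumes an element size of 8 bytes, which is inferred from the earlier "print menu[1]" output (menu[1] sits at 0xb0ccb8, 8 bytes above _M_start) and is not a documented property of the library; elementCount and elementAddress are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Number of elements between two addresses, given the element size in bytes.
std::uint64_t elementCount(std::uint64_t start, std::uint64_t finish,
                           std::uint64_t stride) {
  assert(stride > 0 && finish >= start);
  return (finish - start) / stride;
}

// Address that element `index` would occupy. A negative index converts to a
// huge unsigned value, and the unsigned sum wraps around to an address
// below `start`.
std::uint64_t elementAddress(std::uint64_t start, int index,
                             std::uint64_t stride) {
  return start + static_cast<std::uint64_t>(index) * stride;
}
```

Under that assumption, elementCount(0xb0ccb0, 0xb0ccd0, 8) gives 4, matching menu.size(), and elementAddress(0xb0ccb0, -1, 8) gives 0xb0cca8, which lies 8 bytes below the vector's storage. Whatever happens to be stored there is not one of the menu's strings, which is a plausible reason for the crash inside operator<<.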

The "list" command can be used to examine the source code. Typing "list" displays several lines surrounding the line that corresponds to the current stack frame:

(gdb) list
17	    vector<string> menu(4);
18	    menu[1] = "appetizer";
19	    menu[2] = "entree";
20	    menu[3] = "dessert";
21	
22	    cout << "\n\nNow serving the " << menu[response] << "." << endl;
23	  }
24	
25	  $setupmachine{
26	    startnode: PrintPrompt =TM=> ProcessInput =N=> startnode

If you keep typing "list", the listing will advance a few more lines through the file. To move backward you can type "list -".

To exit gdb, type "quit". Note that if your program is running and you type control-C, rather than quitting Tekkotsu you will end up back in gdb. Use the "quit" command to exit.

Because the above commands are used so frequently, they all have one-letter abbreviations.

Exercises:

  1. The "help" command displays information about gdb commands. How do you list the source code for a specific function, such as doStart, by name? Try "help list" to find out.
  2. For each letter, write down the gdb command that letter stands for: r, f, p, i, l, h, q.

Setting A Breakpoint

A breakpoint is a place in your code where execution should pause with control returned to gdb, allowing you to examine the stack and print variable values. You can then resume execution if you like, or step through the program line by line.

The "break" command is used to set a breakpoint. The easiest way to set a breakpoint is to give the source file and line number where the breakpoint should occur. Here is an example of setting a breakpoint before starting Tekkotsu:

> gdb tekkotsu-CREATE
(gdb) break GdbDemo.cc.fsm:12
Breakpoint 1 at 0x62836d: file GdbDemo.cc.fsm, line 12.
(gdb) run

Now we can use the ControllerGUI to start the GdbDemo behavior, and the "msg" command to send it an input. When we reach line 12 of the file, we hit the breakpoint and gdb takes over:

HAL:Create> msg barf

Breakpoint 1, GdbDemo::ProcessInput::doStart (this=0xb241d0) at GdbDemo.cc.fsm:12
12	    string const &userinput = ((const TextMsgEvent*)event)->getText();

(gdb) list
7	    cout << "\n\nWelcome to the Tekkotsu diner." << endl;
8	    cout << "Please use the msg command to enter a number from 1-3." << endl;
9	  }
10	
11	  $nodeclass ProcessInput : StateNode : doStart {
12	    string const &userinput = ((const TextMsgEvent*)event)->getText();
13	    istringstream in(userinput);
14	    int response = -1;
15	    in >> response;
16	

Execution is paused just before line 12, so the code on that line has not yet taken effect. Thus, the variable userinput does not yet have a valid value:

(gdb) p userinput
$1 = (const std::string &) @0xffffffff: <error reading variable>

We can use the "next" command to execute the current line and move to the next line of the program:

(gdb) next
13	    istringstream in(userinput);

Now that line 12 has executed, userinput has a valid value:

(gdb) p userinput
$2 = (const std::string &) @0x7fffe087b410: {static npos = 18446744073709551615, 
  _M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>},
   <No data fields>}, _M_p = 0xb21678 "barf"}}

Giving the "next" command a numeric argument N makes it execute the next N lines:

(gdb) next 3
17	    vector<string> menu(4);

(gdb) p response
$4 = -1

We can use the "set" command to change the value of a variable:

(gdb) set response=2

Then we can use the "continue" command to continue from the breakpoint:

(gdb) continue
Continuing.

Now serving the entree.

Welcome to the Tekkotsu diner.
Please use the msg command to enter a number from 1-3.

The "info breakpoints" command displays the current breakpoints, and the "clear" command removes them. Since the program is running, we first need to interrupt it by typing control-C to return to gdb:

HAL:Create> ^C
Program received signal SIGINT, Interrupt.
[Switching to Thread 0x7ffff7fd4720 (LWP 7775)]
0x00007ffff5aff03d in pthread_join () from /lib/libpthread.so.0
(gdb) info breakpoints
Num     Type           Disp Enb Address            What
1       breakpoint     keep y   0x0000000000628215 in GdbDemo::ProcessInput::doStart() at GdbDemo.cc.fsm:12
	breakpoint already hit 1 time

(gdb) clear 12
Deleted breakpoint 1 

(gdb) continue

To Learn More

The "help" command will display a list of help topics you can explore. Try the following:

help stack
help running

There are many gdb tutorials available on the web. A Google search for "gdb tutorial" will find them.