Valgrind on ARM
If you have a problem with memory bloat or leaks, the first-choice tool is Valgrind. Briefly, it wraps the standard allocation calls (malloc, new, etc.) with its own implementations and tracks every memory allocation in your code. By default, Valgrind checks whether all memory allocated by the program has been freed by the time it exits. A more sophisticated tool is Massif: it records all allocations to profile memory consumption over time. After the program has run, you can inspect the results with ms_print or the massif-visualizer GUI.
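For reference, a typical session looks roughly like this (./my_app is just a placeholder for your binary):

valgrind --leak-check=full ./my_app   # default Memcheck tool: reports unfreed memory at exit
valgrind --tool=massif ./my_app       # Massif: records heap allocations over time
ms_print massif.out.<pid>             # render the collected snapshots as a text report

Massif writes its snapshots to a massif.out.<pid> file in the current directory; both ms_print and massif-visualizer read that file.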
At my last job I noticed that my application’s memory usage was constantly growing, so I ran Valgrind’s Massif to find out why. Unfortunately, I couldn’t find the reason, because all the stacks were empty: all I could see was the amount of memory consumed, with no stack traces at all. The listing below shows the ms_print output.
[ms_print graph: heap usage (in KB) climbs steadily to a peak of 556.3 KB over about 1.908 Gi of executed instructions]
Number of snapshots: 88
Detailed snapshots: [4, 5, 6, 15, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 40, 43, 45, 46, 47, 48, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77, 79, 81, 83, 85, 87 (peak)]
--------------------------------------------------------------------------------
n time(i) total(B) useful-heap(B) extra-heap(B) stacks(B)
--------------------------------------------------------------------------------
0 0 0 0 0 0
1 46,537,185 69,888 60,082 9,806 0
2 80,589,950 98,024 85,567 12,457 0
3 122,171,391 89,984 74,107 15,877 0
4 167,353,809 119,272 99,911 19,361 0
83.77% (99,911B) (heap allocation functions) malloc/new/new[], --alloc-fns, etc.
--------------------------------------------------------------------------------
n time(i) total(B) useful-heap(B) extra-heap(B) stacks(B)
--------------------------------------------------------------------------------
5 199,890,853 127,048 105,155 21,893 0
82.77% (105,155B) (heap allocation functions) malloc/new/new[], --alloc-fns, etc.
--------------------------------------------------------------------------------
n time(i) total(B) useful-heap(B) extra-heap(B) stacks(B)
--------------------------------------------------------------------------------
6 221,582,101 132,288 108,651 23,637 0
82.13% (108,651B) (heap allocation functions) malloc/new/new[], --alloc-fns, etc.
As you can see, no stacks were collected. I spent two days investigating what was going on. I won’t describe the whole process here, but I eliminated the most obvious suspects, such as missing debug symbols and compiler options. The last resort was reading the docs, where I found these options:
--unw-stack-scan-thresh=<number> [default: 0],
--unw-stack-scan-frames=<number> [default: 5]
Stack-scanning support is available only on ARM targets.
These flags enable and control stack unwinding by stack
scanning. When the normal stack unwinding mechanisms -- usage
of Dwarf CFI records, and frame-pointer following -- fail,
stack scanning may be able to recover a stack trace.
Note that stack scanning is an imprecise, heuristic mechanism
that may give very misleading results, or none at all. It
should be used only in emergencies, when normal unwinding
fails, and it is important to nevertheless have stack traces.
Stack scanning is a simple technique: the unwinder reads
words from the stack, and tries to guess which of them might
be return addresses, by checking to see if they point just
after ARM or Thumb call instructions. If so, the word is
added to the backtrace.
The main danger occurs when a function call returns, leaving
its return address exposed, and a new function is called, but
the new function does not overwrite the old address. The
result of this is that the backtrace may contain entries for
functions which have already returned, and so be very
confusing.
A second limitation of this implementation is that it will
scan only the page (4KB, normally) containing the starting
stack pointer. If the stack frames are large, this may result
in only a few (or not even any) being present in the trace.
Also, if you are unlucky and have an initial stack pointer
near the end of its containing page, the scan may miss all
interesting frames.
By default stack scanning is disabled. The normal use case is
to ask for it when a stack trace would otherwise be very
short. So, to enable it, use --unw-stack-scan-thresh=number.
This requests Valgrind to try using stack scanning to
"extend" stack traces which contain fewer than number frames.
If stack scanning does take place, it will only generate at
most the number of frames specified by
--unw-stack-scan-frames. Typically, stack scanning generates
so many garbage entries that this value is set to a low value
(5) by default. In no case will a stack trace larger than the
value specified by --num-callers be created.
I set --unw-stack-scan-thresh to 5, because all my stacks had 0 or 1 entries, and --unw-stack-scan-frames to 30 – I don’t use recursion, so that is well above what I need. After setting these options, the stacks appeared in the ms_print output. Apparently the normal stack unwinding failed – but why? Maybe it’s because of the old compiler, GCC 4.9.2. Another possible reason is that the standard stack unwinder works properly only on the x86 architecture.
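For reference, the full invocation with stack scanning enabled looked roughly like this (./my_app again stands in for the real binary, with the threshold and frame-count values mentioned above):

valgrind --tool=massif --unw-stack-scan-thresh=5 --unw-stack-scan-frames=30 ./my_app
ms_print massif.out.<pid>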
I didn’t spend much time finding out why these options were needed. If you know why, please share your thoughts in a comment.