Valgrind on ARM

Oct 26, 2021 C, C++, Linux

If you have a problem with memory bloat or leaks, the tool of first choice is Valgrind. Briefly, it replaces the standard allocation functions (malloc, new, etc.) with its own implementations and tracks every allocation made by your code. By default, Valgrind runs Memcheck, which, among other things, reports memory that was allocated but never freed by the time the program exits. For profiling there is Massif, a heap profiler that records snapshots of all allocations to show memory consumption over time. After the program run, you can inspect the result with ms_print or the massif-visualizer GUI.
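A typical Massif session can be sketched as the script below. The binary name `./my_app` and the output file name are placeholders, not taken from my setup. `--massif-out-file` is used only to make the output file name predictable; without it, Massif writes to `massif.out.<pid>`.

```shell
#!/bin/sh
# Minimal sketch of a Massif run.
# APP and OUT are placeholders (assumptions); override them from the
# environment to match your own project.
APP="${APP:-./my_app}"
OUT="${OUT:-massif.out.my_app}"

# Run only when both valgrind and the binary are actually present.
if command -v valgrind >/dev/null 2>&1 && [ -x "$APP" ]; then
    # Profile heap usage and write the snapshots to $OUT.
    valgrind --tool=massif --massif-out-file="$OUT" "$APP"
    # Render the snapshots as a text graph plus per-snapshot tables.
    ms_print "$OUT"
else
    echo "valgrind or $APP not available; nothing was run" >&2
fi
```

The same output file can also be opened in massif-visualizer instead of being passed to ms_print.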

At my last job, I noticed that my application's memory usage was constantly increasing. I ran Valgrind's Massif to find out why. Unfortunately, I couldn't find the cause, because all the stack traces were empty: all I could see was the amount of memory consumed. The listing below shows the ms_print output:

    KB
556.3^                                                                       #
     |                                                                    @@@#
     |                                                                @@@@@@@#
     |                                                           @@@@@@@@@@@@#
     |                                                       @@@@@@@@@@@@@@@@#
     |                                                    @@@@@@@@@@@@@@@@@@@#
     |                                                 @@:@@@@@@@@@@@@@@@@@@@#
     |                                             :@::@ :@@@@@@@@@@@@@@@@@@@#
     |                                       @@:::::@: @ :@@@@@@@@@@@@@@@@@@@#
     |                                   @@@@@@: :::@: @ :@@@@@@@@@@@@@@@@@@@#
     |                              @@@@@@@@@@@: :::@: @ :@@@@@@@@@@@@@@@@@@@#
     |                           @@@@@@ @@@@@@@: :::@: @ :@@@@@@@@@@@@@@@@@@@#
     |                         ::@@@@@@ @@@@@@@: :::@: @ :@@@@@@@@@@@@@@@@@@@#
     |                  @  ::::::@@@@@@ @@@@@@@: :::@: @ :@@@@@@@@@@@@@@@@@@@#
     |                ::@::::: ::@@@@@@ @@@@@@@: :::@: @ :@@@@@@@@@@@@@@@@@@@#
     |             :::::@: ::: ::@@@@@@ @@@@@@@: :::@: @ :@@@@@@@@@@@@@@@@@@@#
     |     @@@@:::::::::@: ::: ::@@@@@@ @@@@@@@: :::@: @ :@@@@@@@@@@@@@@@@@@@#
     |  :::@ @ ::: :::::@: ::: ::@@@@@@ @@@@@@@: :::@: @ :@@@@@@@@@@@@@@@@@@@#
     | :: :@ @ ::: :::::@: ::: ::@@@@@@ @@@@@@@: :::@: @ :@@@@@@@@@@@@@@@@@@@#
     | :: :@ @ ::: :::::@: ::: ::@@@@@@ @@@@@@@: :::@: @ :@@@@@@@@@@@@@@@@@@@#
   0 +----------------------------------------------------------------------->Gi
     0                                                                   1.908

Number of snapshots: 88
 Detailed snapshots: [4, 5, 6, 15, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 40, 43, 45, 46, 47, 48, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77, 79, 81, 83, 85, 87 (peak)]

--------------------------------------------------------------------------------
  n        time(i)         total(B)   useful-heap(B) extra-heap(B)    stacks(B)
--------------------------------------------------------------------------------
  0              0                0                0             0            0
  1     46,537,185           69,888           60,082         9,806            0
  2     80,589,950           98,024           85,567        12,457            0
  3    122,171,391           89,984           74,107        15,877            0
  4    167,353,809          119,272           99,911        19,361            0
83.77% (99,911B) (heap allocation functions) malloc/new/new[], --alloc-fns, etc.

--------------------------------------------------------------------------------
  n        time(i)         total(B)   useful-heap(B) extra-heap(B)    stacks(B)
--------------------------------------------------------------------------------
  5    199,890,853          127,048          105,155        21,893            0
82.77% (105,155B) (heap allocation functions) malloc/new/new[], --alloc-fns, etc.

--------------------------------------------------------------------------------
  n        time(i)         total(B)   useful-heap(B) extra-heap(B)    stacks(B)
--------------------------------------------------------------------------------
  6    221,582,101          132,288          108,651        23,637            0
82.13% (108,651B) (heap allocation functions) malloc/new/new[], --alloc-fns, etc.

As you can see, no stack traces were collected: each detailed snapshot contains only the top-level "(heap allocation functions)" line, with no call stacks beneath it. I spent two days investigating what was going on. I won't describe the whole process, but I ruled out the most obvious suspects, such as debug symbols and compiler options. The last resort was reading the docs, where I found these options:


       --unw-stack-scan-thresh=<number> [default: 0] ,
       --unw-stack-scan-frames=<number> [default: 5]
           Stack-scanning support is available only on ARM targets.

           These flags enable and control stack unwinding by stack
           scanning. When the normal stack unwinding mechanisms -- usage
           of Dwarf CFI records, and frame-pointer following -- fail,
           stack scanning may be able to recover a stack trace.

           Note that stack scanning is an imprecise, heuristic mechanism
           that may give very misleading results, or none at all. It
           should be used only in emergencies, when normal unwinding
           fails, and it is important to nevertheless have stack traces.

           Stack scanning is a simple technique: the unwinder reads
           words from the stack, and tries to guess which of them might
           be return addresses, by checking to see if they point just
           after ARM or Thumb call instructions. If so, the word is
           added to the backtrace.

           The main danger occurs when a function call returns, leaving
           its return address exposed, and a new function is called, but
           the new function does not overwrite the old address. The
           result of this is that the backtrace may contain entries for
           functions which have already returned, and so be very
           confusing.

           A second limitation of this implementation is that it will
           scan only the page (4KB, normally) containing the starting
           stack pointer. If the stack frames are large, this may result
           in only a few (or not even any) being present in the trace.
           Also, if you are unlucky and have an initial stack pointer
           near the end of its containing page, the scan may miss all
           interesting frames.

           By default stack scanning is disabled. The normal use case is
           to ask for it when a stack trace would otherwise be very
           short. So, to enable it, use --unw-stack-scan-thresh=number.
           This requests Valgrind to try using stack scanning to
           "extend" stack traces which contain fewer than number frames.

           If stack scanning does take place, it will only generate at
           most the number of frames specified by
           --unw-stack-scan-frames. Typically, stack scanning generates
           so many garbage entries that this value is set to a low value
           (5) by default. In no case will a stack trace larger than the

I set --unw-stack-scan-thresh to 5, because all the stack traces had 0 or 1 frames, and --unw-stack-scan-frames to 30. I don't use recursion, so that value is far more than I need. With these options set, the stack traces appeared in the ms_print output. Apparently, normal stack unwinding had failed. Why? Maybe because of the old gcc (version 4.9.2). Another possible reason is that the standard unwinder works only on the x86 architecture.
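Put together, the invocation looks roughly like the sketch below. The two stack-scanning values are the ones mentioned above. The binary name `./my_app` and the output file name are placeholders. The stack-scanning flags are only available on ARM targets, so this is meant to be run on the ARM device itself.

```shell
#!/bin/sh
# Sketch of a Massif run with ARM stack scanning enabled.
# APP and OUT are placeholders (assumptions); adjust them to your project.
APP="${APP:-./my_app}"
OUT="${OUT:-massif.out.my_app}"

# Try stack scanning for traces with fewer than THRESH frames and let
# it contribute at most FRAMES frames per trace.
THRESH="${THRESH:-5}"
FRAMES="${FRAMES:-30}"

# Run only when both valgrind and the binary are actually present.
if command -v valgrind >/dev/null 2>&1 && [ -x "$APP" ]; then
    valgrind --tool=massif \
        --massif-out-file="$OUT" \
        --unw-stack-scan-thresh="$THRESH" \
        --unw-stack-scan-frames="$FRAMES" \
        "$APP"
    # The detailed snapshots should now include call stacks.
    ms_print "$OUT"
else
    echo "valgrind or $APP not available; nothing was run" >&2
fi
```

Keep in mind the warning from the documentation: stack scanning is a heuristic, so the resulting traces may contain functions that have already returned.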

I didn’t spend much time finding out why this option was needed. If you know why, please share your thoughts in a comment.

Rafał
