Formatting hexdump

Dec 17, 2020 #ascii, #binary, #hex, #hexdump, #linux, #shell by Rafał

Today I found out how useful the hexdump utility is when used with custom formatting. It took me some time to figure out how it works: most tutorials, and the man page itself, are obvious only once you already know how to do it :). I will try to describe it in a really clear way.

Single format

Custom formatting is activated with the -e (or --format) option, followed by the so-called format string. Its pattern looks like this:

-e 'N/M "XXXXXXX"'

N – iteration count: how many times in a row the format (XXXXXXX) is applied. It defaults to 1. Its purpose becomes clearer in more complicated patterns.
M – byte count: how many bytes the format consumes in a single iteration. The default depends on the conversion used in the format, for example 4 bytes for %d, %u and %x, or 1 byte for %c.
XXXXXXX – format: the string describing how those M bytes are displayed in a single iteration. The manual page describes it as fprintf-style formatting, with some exceptions.
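To see the defaults in action, here is a small sketch. It fakes an input file with printf octal escapes (the bytes are arbitrary test data, not taken from the article) and assumes a little-endian machine, such as x86 or most ARM systems.

```shell
# Eight input bytes: the 32-bit little-endian integers 1 and 2.
data='\001\000\000\000\002\000\000\000'

# No N/M given: N defaults to 1, and M defaults to 4 bytes for %d.
printf "$data" | hexdump -e '"%d\n"'
# prints 1 and 2, one per line

# Same bytes with M forced to 2: four signed 16-bit values.
printf "$data" | hexdump -e '1/2 "%d\n"'
# prints 1, 0, 2 and 0, one per line
```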

In other words, the format XXXXXXX is applied N times, each time consuming M bytes. Take a look at this simple example:

$ hexdump -e '1/4 " %10d\n"' file.bin
         1
         0
        93
   3279800
    131072
  65536000
   6094848

'1/4 " %10d\n"' means: take 4 bytes once, print a space, then format the bytes as a signed decimal integer right-aligned in a field 10 characters wide, and put a newline after each iteration. The pattern is repeated until the end of the file. During your tests, you may see a * in the output. It means that one or more subsequent input blocks are identical to the one just printed, so hexdump "squeezed" them into a single line. Like in this example:

$ hexdump -e '1/4 "%10d\n"' /dev/zero
         0
*

Note that /dev/zero never ends, so this command keeps running until you stop it (for example with Ctrl+C). To avoid the squeezing, turn it off with the -v (or --no-squeezing) option. This example probably doesn't make the meaning of the iteration count clear yet; the next one will.
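Sixteen zero bytes taken from /dev/zero with head -c show the same effect on a finite input: the four identical 4-byte blocks are squeezed into one line followed by a *, unless -v is given.

```shell
# Four identical 4-byte blocks (all zeros).
head -c 16 /dev/zero | hexdump -e '1/4 "%10d\n"'
# output: one line with 0, then a single "*"

# Same input with squeezing disabled.
head -c 16 /dev/zero | hexdump -v -e '1/4 "%10d\n"'
# output: four lines with 0
```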

Multiple formats

More advanced formatting lets you put several patterns in one format string, which is handy when you work with custom binary file formats. Just put the patterns one after another, like in this example:

$ hexdump -e '2/4 " %10u" 2/2 " %04X" 1/0 "\n"' 20201217_095026.bin.dat
         1         0 8000 0000
         2         0 8000 0002
         3         0 8000 0002
         4         0 8000 0002
         5         0 0039 0002
         6     25000 0039 0003
         7    282000 0039 0001
         8    320000 0039 0000
         9    377000 003A 0000
        10   4619000 003A 0004
        11  10463000 003B 0004
        12  12380000 003B 0000
        13  20092000 003B 0004
        14  20503000 003C 0004

This example is more complicated, so let me first tell you about the file format. It is called COMTRADE and is used in the power industry. Each row of such a binary file contains:

| 32-bit counter | 32-bit relative timestamp in ns | repeated 16-bit values |

The number of values per row is stored in a separate configuration file. In this example, there are two 16-bit values in each row.

To format this in a nice, readable way, we tell hexdump:

2/4 " %10u" – take 4 bytes and format them with the %10u directive (as in the previous example, but as an unsigned decimal). Do it twice, so a single pattern covers both the counter and the relative timestamp.
2/2 " %04X" – then take 2 bytes and format them as an uppercase hexadecimal number, 4 digits wide, with leading zeros (%04X). As before, do it twice, because each row has two 16-bit values.
1/0 "\n" – this one is new. You may have noticed that the previous patterns contain no newline character: putting one there would break our row-oriented layout. This pattern also makes the difference between the iteration count and the byte count clear. It means: after the two patterns above, print one newline character, once, consuming 0 bytes from the input.
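If you don't have a COMTRADE file at hand, you can try the same patterns on two synthetic rows built with printf. The values below are made up for illustration, and a little-endian machine is assumed. The newline is written as a bare "\n" here, which, like 1/0 "\n", consumes no input.

```shell
# Two 12-byte rows: counter, timestamp, two 16-bit values (little-endian).
# Row 1: counter=1, timestamp=0,    values 0x8000 0x0000
row1='\001\000\000\000\000\000\000\000\000\200\000\000'
# Row 2: counter=2, timestamp=1000, values 0x0039 0x0002
row2='\002\000\000\000\350\003\000\000\071\000\002\000'

printf "$row1$row2" | hexdump -e '2/4 " %10u" 2/2 " %04X" "\n"'
# Expected output:
#           1          0 8000 0000
#           2       1000 0039 0002
```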

Special conversion strings

Last but not least, hexdump can show some extra information, for example the current file offset. Let's see how to do it:

$ hexdump -e '1/0 "[%04_ax]" 2/4 " %10u" 2/2 " %04X" 1/0 "\n"' 20201217_095026.bin.dat
[0000]          1          0 8000 0000
[000c]          2          0 8000 0002
[0018]          3          0 8000 0002
[0024]          4          0 8000 0002
[0030]          5          0 0039 0002
[003c]          6      25000 0039 0003
[0048]          7     282000 0039 0001
[0054]          8     320000 0039 0000
[0060]          9     377000 003A 0000
[006c]         10    4619000 003A 0004
[0078]         11   10463000 003B 0004
[0084]         12   12380000 003B 0000
[0090]         13   20092000 003B 0004
[009c]         14   20503000 003C 0004

The only difference from the previous example is the 1/0 "[%04_ax]" pattern. It contains the _a conversion string, which prints the input offset. The trailing x selects hexadecimal with lowercase letters; replace it with d or o for decimal or octal. Like the newline character, it is printed once and consumes no bytes (0).
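Here is the offset in decimal form (%04_ad instead of %04_ax), again on two made-up 12-byte rows and assuming a little-endian machine. Since each row is 12 bytes long, the second offset is 12.

```shell
# Same layout as the article's rows: counter, timestamp, two 16-bit values.
row1='\001\000\000\000\000\000\000\000\000\200\000\000'
row2='\002\000\000\000\350\003\000\000\071\000\002\000'

# %04_ad: input offset, decimal, zero-padded to 4 digits.
printf "$row1$row2" | hexdump -e '"[%04_ad]" 2/4 " %10u" 2/2 " %04X" "\n"'
# Expected output:
# [0000]          1          0 8000 0000
# [0012]          2       1000 0039 0002
```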

There are a few more conversion strings, for example for displaying bytes as readable characters. Let's say we want to display the 16-bit values as pairs of printable characters. It doesn't really make sense here, but let's pretend it does :). To do this, we use the _p conversion string, which prints a character from the default character set, with non-printing characters displayed as a dot:

$ hexdump -e '1/0 "[%04_ax]" 2/4 " %10u" 4/1 " %_p" 1/0 "\n"' 20201217_095026.bin.dat
[0000]          1          0 . . . .
[000c]          2          0 . . . .
[0018]          3          0 . . . .
[0024]          4          0 . . . .
[0030]          5          0 9 . . .
[003c]          6      25000 9 . . .
[0048]          7     282000 9 . . .
[0054]          8     320000 9 . . .
[0060]          9     377000 : . . .
[006c]         10    4619000 : . . .
[0078]         11   10463000 ; . . .
[0084]         12   12380000 ; . . .
[0090]         13   20092000 ; . . .
[009c]         14   20503000 < . . .

This example differs from the previous one in one thing: 2/2 " %04X" changed to 4/1 " %_p". The new pattern says: take one byte, four times, and show each byte as a character from the default set, with non-printable characters displayed as dots. The output is based on the same file, so you can compare how the numbers are transformed into ASCII characters. Note the byte order: on a little-endian machine the low byte of each 16-bit value comes first, so 0039 becomes "9 ." rather than ". 9".
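A short string with known bytes makes the %_p rule easy to check. The input here is arbitrary test data: two printable letters, a control byte (0x01) and a newline.

```shell
# 'O' and 'K' are printable; \001 and \n are not, so they become dots.
printf 'OK\001\n' | hexdump -e '4/1 " %_p" "\n"'
# Expected output: " O K . ."
```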

Hexdump can also color its output. I don't think it's really useful, so I won't cover it here; to use it, append the _L suffix with color specifiers in square brackets.

Summary

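I felt disappointed at some points while working on this post. I didn't find a way to parse the same bytes twice within one format string, like the canonical form does (hex output followed by the same bytes as ASCII on each line). There is a way around it, though: hexdump accepts several -e options, and each format string is applied to the same input block. hexdump builds its own canonical (-C) output this way, from several format strings.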
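A minimal sketch of the multiple -e approach, on an arbitrary 8-byte string: the first format string prints the offset and the bytes in hex, and the second prints the same bytes again with %_p.

```shell
# Both -e format strings consume the same 8-byte block.
printf 'hexdump!' | hexdump \
    -e '"%04_ax " 8/1 " %02x"' \
    -e '"  |" 8/1 "%_p" "|\n"'
# Expected output:
# 0000  68 65 78 64 75 6d 70 21  |hexdump!|
```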

Another issue is that hexdump has no option to change the endianness of the input data. 2-, 4- and 8-byte values are always interpreted in the platform's native byte order.
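For hexadecimal output only, there is a partial workaround: consume the bytes one at a time. Single bytes are printed in file order, which is the big-endian reading of the value. This doesn't help with %d or %u. The two input bytes below are arbitrary test data, and the first command assumes a little-endian machine.

```shell
# Native 16-bit read: on a little-endian CPU the bytes 01 02 become 0x0201.
printf '\001\002' | hexdump -e '1/2 "%04X\n"'

# Byte-by-byte read: printed in file order, i.e. the big-endian value 0x0102.
printf '\001\002' | hexdump -e '2/1 "%02X" "\n"'
```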

The next missing feature is support for dynamic file formats, for example parsing M bytes with M taken from the file content itself. I know it's the 21st century and we have Python for such things. But hexdump is available on practically every Linux distribution, including embedded systems, so it would be a nice way to avoid installing a Python runtime.
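One possible workaround, sketched here under assumptions rather than as a complete solution, is to let the shell build the format string. The comtrade_format helper below is hypothetical: it takes the number of 16-bit values per row as an argument. In a real setup that number would come from the COMTRADE configuration file, which this sketch does not parse.

```shell
# Hypothetical helper: print a hexdump format string for rows with
# two 32-bit fields followed by "$1" 16-bit values.
comtrade_format() {
    # %% gives a literal %, and \\n gives the two characters \n,
    # which hexdump itself turns into a newline.
    printf '2/4 " %%10u" %d/2 " %%04X" "\\n"' "$1"
}

comtrade_format 2; echo
# 2/4 " %10u" 2/2 " %04X" "\n"

# One made-up row with three 16-bit values (little-endian machine assumed).
printf '\001\000\000\000\000\000\000\000\001\000\002\000\003\000' |
    hexdump -e "$(comtrade_format 3)"
# prints counter 1, timestamp 0 and the values 0001 0002 0003
```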

Nevertheless, hexdump is a nice tool for working with static binary formats. I hope this article helped you understand how to use it with custom formatting.

olejniczak.engineer
