stdatomic.h under the hood #1

My favorite C11 feature is the support for atomic operations. It is useful especially on embedded systems, where using mutexes is very often overkill. In many circumstances we can put away platform-dependent, error-prone, and inefficient synchronization APIs and simply share variables between different contexts. Of course, not every operation is possible. But if something is not possible, it is probably a sign of poor software design. Let’s see how it works on typical architectures.

I will run two threads using the native pthread library and show how the assembly is built to make the presented operations atomic. The example is a classic, found in every book about multithreaded programming.

#include <stdatomic.h>
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

int32_t x = 0;

void * Thread(void * arg);

int main(void)
{
	pthread_t t;
	pthread_create(&t, NULL, &Thread, (void*)1);
	Thread((void*)0);
	pthread_join(t, NULL);
	printf("%d\n", x);
	return 0;
}

void * Thread(void * arg)
{
	for (int i = 0; i < 1000000; ++i)
	{
		if (arg)
			x++;
		else
			x--;
	}

	return NULL;
}

I think the code is self-explanatory. The only dirty part is passing a boolean value through the pointer argument. I know it’s not elegant, but it takes less coding. The output expected by the synchronization-ignorant is 0. But of course, it is not:

$ gcc non-synchronized.c -lpthread -o non-synchronized
$ ./non-synchronized 
193678
$ ./non-synchronized 
-578251
$ ./non-synchronized 
-509428
$ ./non-synchronized 
-571838
$ ./non-synchronized 
163562

Let’s see how stdatomic handles this situation with my native compiler. I’ve made similar code, the only difference being the use of the stdatomic library and a type it supports:

$ diff non-synchronized.c synchronized.c 
3c3
< #include <stdint.h>
---
> #include <stdatomic.h>
6c6
< int32_t x = 0;
---
> atomic_int_least32_t x = 0;
$ gcc synchronized.c -lpthread -o synchronized
$ ./synchronized 
0
$ ./synchronized 
0
$ ./synchronized 
0
$ ./synchronized 
0
$ ./synchronized 
0

Now our results are surprisingly correct – with no locks, mutexes, etc. Let’s see what kind of magic happens behind the scenes.

x86-64

The test in this part of the article series was performed on the native architecture of my laptop (gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0). It produces the following assembly output for our program.

$ gcc synchronized.c -lpthread -S -o synchronized.s
$ cat synchronized.s
	.file	"synchronized.c"
	.text
	.globl	x
	.bss
	.align 4
	.type	x, @object
	.size	x, 4
x:
	.zero	4
	.section	.rodata
...
Thread:
.LFB6:
	.cfi_startproc
	pushq	%rbp
	.cfi_def_cfa_offset 16
	.cfi_offset 6, -16
	movq	%rsp, %rbp
	.cfi_def_cfa_register 6
	subq	$48, %rsp
	movq	%rdi, -40(%rbp)
	movq	%fs:40, %rax
	movq	%rax, -8(%rbp)
	xorl	%eax, %eax
	movl	$0, -12(%rbp)
	jmp	.L5
.L8:
	cmpq	$0, -40(%rbp)
	je	.L6
	movl	$1, -28(%rbp)
	movl	-28(%rbp), %eax
	lock xaddl	%eax, x(%rip)
	movl	%eax, -24(%rbp)
	jmp	.L7
.L6:
	movl	$1, -20(%rbp)
	movl	-20(%rbp), %eax
	negl	%eax
	lock xaddl	%eax, x(%rip)
	movl	%eax, -16(%rbp)
.L7:
	addl	$1, -12(%rbp)
.L5:
	cmpl	$999999, -12(%rbp)
	jle	.L8
	movl	$0, %eax
	movq	-8(%rbp), %rdx
	xorq	%fs:40, %rdx
	je	.L10
	call	__stack_chk_fail@PLT
.L10:
	leave
	.cfi_def_cfa 7, 8
	ret
	.cfi_endproc
...

The most interesting part is the implementation of the Thread function, so I pasted only that one. It has a few parts:

  • .LFB6: – variable initialization, not interesting
  • .L5 and .L7 – this is our for loop; .L7 is the i variable incrementation.
  • .L8 and .L6 – the if statement and its contents. .L8, after the je instruction, implements the if (arg) branch; .L6 implements the else branch.

The most crucial are these two assembly snippets:

	movl	$1, -28(%rbp)         # move 32-bit value "1" to memory[%rbp-28]
	movl	-28(%rbp), %eax       # load this memory to %eax register
	lock xaddl	%eax, x(%rip) # this instruction does the job (fetch-and-add with lock prefix)
	movl	$1, -20(%rbp)         # as above, but memory[%rbp-20]
	movl	-20(%rbp), %eax       # as above
	negl	%eax                  # negate %eax so the fetch-and-add decrements x
	lock xaddl	%eax, x(%rip) # as above

So the use of the stdatomic library forced the compiler to emit the lock xaddl instruction, which performs an atomic fetch-and-add. lock is an x86 prefix that locks the cache line for the duration of the instruction. Keep in mind that this, like any kind of synchronization, has an impact on performance. Let’s see how the code is built without the stdatomic.h library. I will paste only the essential part:

.L8:
	cmpq	$0, -24(%rbp)
	je	.L6
	movl	x(%rip), %eax      # load
	addl	$1, %eax           # add
	movl	%eax, x(%rip)      # store
	jmp	.L7
.L6:
	movl	x(%rip), %eax      # load
	subl	$1, %eax           # subtract
	movl	%eax, x(%rip)      # store

As you can see, the code is similar, but there is no lock xaddl instruction. The increment is done in three standard steps (load movl, add addl, and store movl), which are neither atomic nor synchronized.
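
For comparison, plain x++ and x-- on an atomic variable are just shorthand for explicit fetch-and-add operations. Here is a minimal sketch of the same loop body written with the explicit stdatomic API:

#include <stdatomic.h>

atomic_int_least32_t x = 0;

void step(int up)
{
	if (up)
		atomic_fetch_add(&x, 1); /* same as x++; compiled to lock xaddl above */
	else
		atomic_fetch_sub(&x, 1); /* same as x--; gcc negates and reuses lock xaddl here */
}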

In the next chapter, I will focus on stdatomic on the ARM architecture.

Formatting hexdump

Today I found out how useful the hexdump utility (with custom formatting) is. It took me some time to figure out how it works. Most of the tutorials, and the man page, are obvious only once you already know how to use it :). I will try to describe it in a really clear way.

Single format

Custom formatting is activated with -e or --format option. Right after that, you must fill the so-called format string. The pattern looks like this:

-e 'N/M "XXXXXXX"'
  • N – iteration count. It says how many subsequent times the pattern is applied. By default, its value is 1. This option becomes clearer in complicated patterns.
  • M – byte count. It says how many bytes the pattern “consumes” in a single iteration. The default value depends on the next part (XXXXXXX – the format).
  • XXXXXXX – format. This is the string telling how the M bytes are displayed in a single iteration. The manual page says that it’s fprintf-based formatting with some exceptions.

In other words, the pattern XXXXXXX will be applied N times in a row, consuming M bytes each time. Take a look at this simple example:

$ hexdump -e '1/4 " %10d\n"' file.bin
         1
         0
        93
   3279800
    131072
  65536000
   6094848

'1/4 " %10d\n"' means, in other words, take one time 4 bytes and format it like a signed integer, with places for ten digits. Put a new line after each iteration. This pattern is used till the end of the file. During your tests, you may see * signs in output. It means that the last pattern occurs many subsequent times. Like in this example:

$ hexdump -e '1/4 "%10d\n"' /dev/zero
         0
*

To avoid this effect, you can turn off squeezing with the -v or --no-squeezing option. The meaning of the iteration count may still be unclear in this example; it will become obvious after I show you the next one.
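
To see -v in action, pipe a few zero bytes through the same format; every repeated line is now printed instead of being squeezed:

$ head -c 16 /dev/zero | hexdump -v -e '1/4 "%10d\n"'
         0
         0
         0
         0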

Multiple formats

More advanced formatting allows combining several patterns. It’s nice when you are working with custom binary file formats. To use it, just put several format strings one after another, like in this example:

$ hexdump -e '2/4 " %10u" 2/2 " %04X" 1/0 "\n"' 20201217_095026.bin.dat
         1         0 8000 0000
         2         0 8000 0002
         3         0 8000 0002
         4         0 8000 0002
         5         0 0039 0002
         6     25000 0039 0003
         7    282000 0039 0001
         8    320000 0039 0000
         9    377000 003A 0000
        10   4619000 003A 0004
        11  10463000 003B 0004
        12  12380000 003B 0000
        13  20092000 003B 0004
        14  20503000 003C 0004

This example is more complicated. Let me first tell you about the file format used. It is called COMTRADE, and it is used in the power supply industry. Each row of such a binary file contains:

| 32-bit counter | 32-bit relative timestamp in ns | repeated 16-bit values |. The number of values per row is saved in a configuration file. In this example, there are two 16-bit values in each row.
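
For reference, a single row could be sketched as a C struct (the field names are mine; two values per row, as in this file):

#include <stdint.h>

struct comtrade_row {
	uint32_t counter;       /* 32-bit counter */
	uint32_t timestamp_ns;  /* 32-bit relative timestamp in ns */
	uint16_t values[2];     /* repeated 16-bit values, count taken from the configuration file */
};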

To format this thing in a nice, readable way, we tell hexdump:

  1. 2/4 " %10u" – take 4 bytes and format them with the %10u directive (as in the previous example, but as an unsigned decimal). Do it twice! This way we format the counter and the relative timestamp with one format string.
  2. 2/2 " %04X" – then take 2 bytes and format them as a hexadecimal number, with room for 4 digits and leading zeroes (%04X). As before, do it two times, because we have two 16-bit values in each row.
  3. 1/0 "\n" – this one is new. You may notice that we had no newline character in the previous format strings – it would break our nice row-oriented layout. This directive should make clear the difference between the iteration count and the bytes consumed by a format: after formatting points 1 and 2, put one newline character once, consuming 0 bytes of the input.

Special conversion strings

Last but not least, there is an option to show some extra information, for example the current file offset. Let’s see how to do it:

$ hexdump -e '1/0 "[%04_ax]" 2/4 " %10u" 2/2 " %04X" 1/0 "\n"' 20201217_095026.bin.dat
[0000]          1          0 8000 0000
[000c]          2          0 8000 0002
[0018]          3          0 8000 0002
[0024]          4          0 8000 0002
[0030]          5          0 0039 0002
[003c]          6      25000 0039 0003
[0048]          7     282000 0039 0001
[0054]          8     320000 0039 0000
[0060]          9     377000 003A 0000
[006c]         10    4619000 003A 0004
[0078]         11   10463000 003B 0004
[0084]         12   12380000 003B 0000
[0090]         13   20092000 003B 0004
[009c]         14   20503000 003C 0004

The difference between this and the previous example is 1/0 "[%04_ax]". It contains the _a input offset conversion string; the trailing x means hexadecimal formatting with lowercase letters. You can change it to decimal (d) or octal (o) form. Like the newline character, it is printed once and consumes no bytes (0).
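
For instance, replacing x with d in the same command prints the offsets in decimal (same file, output omitted):

$ hexdump -e '1/0 "[%04_ad]" 2/4 " %10u" 2/2 " %04X" 1/0 "\n"' 20201217_095026.bin.dat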

There are a few more types of conversion strings, like converting bytes to readable characters. For example, let’s say we want to display the 16-bit values as pairs of printable characters. It actually makes no sense, but let’s pretend it does :). To do this, we will use the _p conversion string, which stands for a character from the default character set, with non-printing characters displayed as a dot:

$ hexdump -e '1/0 "[%04_ax]" 2/4 " %10u" 4/1 " %_p" 1/0 "\n"' 20201217_095026.bin.dat
[0000]          1          0 . . . .
[000c]          2          0 . . . .
[0018]          3          0 . . . .
[0024]          4          0 . . . .
[0030]          5          0 9 . . .
[003c]          6      25000 9 . . .
[0048]          7     282000 9 . . .
[0054]          8     320000 9 . . .
[0060]          9     377000 : . . .
[006c]         10    4619000 : . . .
[0078]         11   10463000 ; . . .
[0084]         12   12380000 ; . . .
[0090]         13   20092000 ; . . .
[009c]         14   20503000 < . . .

This example differs from the previous one in one thing: 2/2 " %04X" changed to 4/1 " %_p". This conversion string says: take one byte, four times, and show each byte as a character from the default character set. Non-printable characters are displayed as dots. The output is based on the same file, so you can compare how the numbers are mapped to ASCII characters.

Hexdump also has an option to color the output. I don’t find it really useful, so I will not write about it. To use it, just add the _L suffix with color specifiers between square brackets.

Summary

I felt disappointed at some points while working on this post. I haven’t found an option to parse the same bytes twice, like in the canonical form (where hex output is followed by the ASCII representation on each line). Maybe there is an option, but I failed to find it.

Another thing is that hexdump has no option to change the endianness of the input data. If you parse 2-, 4-, or 8-byte values, they are interpreted in the platform’s endianness.

The next missing option is support for dynamic file formats, for example parsing M bytes, with M taken from the file content. I know it’s the 21st century and we have Python to do such things. But hexdump is a utility that can be found on every Linux distribution, especially on embedded systems, so it would be a nice option to avoid installing a Python runtime.
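
For the record, a minimal Python sketch of such a dynamic parser, assuming a hypothetical format where each record starts with a 16-bit little-endian length prefix:

import struct
import sys

# Hypothetical format: 16-bit little-endian length N, followed by N bytes of payload.
with open(sys.argv[1], 'rb') as f:
    while True:
        header = f.read(2)
        if len(header) < 2:
            break
        (length,) = struct.unpack('<H', header)
        print(length, f.read(length).hex())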

Nevertheless, hexdump is a nice tool for working with static binary formats. I hope this article helped you understand how to use it with custom formatting.

Favourite strace options

I will try to avoid writing yet another article about the Linux developer/admin Swiss Army knife, which is obviously strace. Instead, I would like to focus on its most useful options.

Follow forks -f

If you are trying to debug a multithreaded application, don’t forget about the -f option. With it, strace will track all new threads and child processes.

This is really useful for checking what’s going on after any sort of daemon, like sshd or getty, accepts a new client.

For example, I want to catch any file operations made by a user newly logged in over SSH:

# pgrep sshd # (1) returns sshd PID -> 256
256
# strace -p 256 -e trace=file -f # (2)
 ... a lot of spam (checking user authenticity in files like /etc/passwd) ...
[pid  9883] open("/root/.ash_history", O_WRONLY|O_CREAT|O_APPEND|O_LARGEFILE, 0600) = 3
[pid  9883] stat64("/bin/cat", {st_mode=S_IFREG|S_ISUID|0755, st_size=655180, ...}) = 0
Process 9894 attached
[pid  9894] execve("/bin/cat", ["cat", "/etc/network/interfaces"], [/* 20 vars */]) = 0
...ld and dynamic linking stuff...
[pid  9894] open("/etc/network/interfaces", O_RDONLY|O_LARGEFILE) = 3
[pid  9894] +++ exited with 0 +++
[pid  9883] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=9894, si_uid=0, si_status=0, si_utime=1, si_stime=3} ---
#
  1. If you don’t have pgrep, just use top or htop. pgrep might return several PIDs: the main sshd process and its children (existing sessions). The main process will probably be the first one.
  2. -e trace=file is my next favourite option. It traces all syscalls related to file manipulation and reading.
  3. The execve line shows that this user executed cat /etc/network/interfaces.

Filtering out syscalls

strace has great filtering options. strace builds on embedded systems quite often have some limits here, but it’s still quite useful:

  • Track file-related syscalls: -e trace=file
  • Trace one precise syscall: -e trace=mmap2
  • Track network stuff, but not sendto/recvfrom: -e trace=network -e trace=\!sendto -e trace=\!recvfrom – some shells need a backslash before the exclamation mark
  • As a last resort, if those filters don’t fit, you can use grep. This example shows syscalls that fail with errno EAGAIN: strace -p XXX 2>&1 | grep EAGAIN. Remember that strace writes its output to stderr; to pipe it together with stdout, you need to add 2>&1. grep is much more flexible, but there is a big efficiency impact compared with strace’s built-in filters – each process usually makes a huge number of syscalls, and all of them are piped.

Performance analysis

A really nice feature is measuring the time spent in each syscall. You can enable it with the -T option and check how many seconds each call lasted. Remember that if your process is preempted during a syscall, the time of other jobs is accumulated into the measurement; that’s because the time is measured between entering the syscall and returning its exit code.

It is usually useful to add the -tt option, which prefixes every record with a microsecond-resolution timestamp. It lets you correlate syscalls with other logs.

Worth noting here is the possibility of tracing several processes at once: you can pass the -p option multiple times with several PIDs. I’ve used this recently to track how threads communicating through IPC preempt each other.
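
A combined invocation of the options from this section might look like this (the PIDs are placeholders):

# -f: follow forks, -tt: microsecond timestamps, -T: time spent in each syscall
$ strace -f -tt -T -p 1234 -p 1235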

Don’t forget…

…that every debugging tool has an impact on the system you observe. In this case, every traced syscall generates console output, which slows down execution.

strace implementations differ – especially on embedded systems, not all options are available.

Obviously, you must have adequate privileges (usually root) to attach strace to processes that are not your own.

Eclipse project generator

A common problem in all cross-compiled projects is a well-configured IDE. Usually, the first thing done at the beginning of a project is cloning the repository and setting up a build environment. If you’re patient enough, you can probably configure the include settings of your favorite code editor too. For a simple project this approach may be suitable, but as your code gets more and more complicated and new libraries are attached, your IDE will be full of red-underlined code. Usually it’s treated as one-time work; however, what happens if your hard drive dies, or if you want to work on several repositories simultaneously? Simply copying the IDE configuration sometimes works, sometimes not. Especially when you are setting up an environment on a new machine, include-path problems may cost you many hours of stupid work. For some projects, a correct setup is almost impossible to do by hand.

Some time ago I realized that all you need to set up your environment is the build variables passed to the compiler. Usually, all of them are placed in the Makefile – a common tool for most cross-compiled projects, the place where the compiler, configuration parameters, library paths, and many other things meet to build your code. This implies that the Makefile is the best place from which to set up your IDE, and it is the reason for creating this simple Python script. I did some reverse engineering to see how the Eclipse CDT configuration is stored in its XML files (.cproject, .project, and the .settings directory). The general principle is simple, and filling it with automatically generated data was not a big deal.

The first thing, which is not part of the script, is copying the .settings folder with its contents to the project directory. I didn’t spend time investigating what it is about – it is a good googling topic – but it is an obligatory part of an Eclipse project. I suppose that local project settings not connected with the build system are stored there. So let’s execute our first step:

$ cp -r template/.settings-base project/.settings

After that, we can start the most interesting part – analyzing what’s important for Eclipse to parse your C/C++ code correctly. The script’s basic principle is loading bare .cproject and .project templates into an ElementTree structure and filling them with data taken from the parameters. The script will be executed from the Makefile, so all parameters are passed to it like to any other program, e.g. gcc. I will give short Python snippets throughout the article; the full script is at the bottom.

Loading .project file stub is done with the following commands:

import xml.etree.ElementTree as ET

tree = ET.parse("template/.project-stub")
root = tree.getroot()

At this point, we have the loaded and parsed XML structure of a standard .project file in the tree object. The root node handle is fetched in the last line of the snippet. The .project file contains language-neutral settings, like the name of the project – this is the first thing filled with our data:

projectNameNode = root.find('name')
projectNameNode.text = args.name[0]

The find method gets the node with the specified tag, which in our case is name. The second line sets its content. Worth noting here is the use of the argparse Python module. At the beginning of the script, you will find:

import argparse

parser = argparse.ArgumentParser(description='Eclipse project  generator, creates .project and .cproject files')
parser.add_argument('--name', nargs=1, metavar='NAME', help='Project name, will be displayed in project explorer')
parser.add_argument('--includes', nargs='+', metavar='PATH', help='Paths to include directories, might be relative')
parser.add_argument('--sources', nargs='+', metavar='PATH', help='Paths to source locations, relative to project only!')
parser.add_argument('--external', nargs='*', metavar='PATH', help='Source directiories outside project', default=[])
parser.add_argument('--defines', nargs='*', metavar="MACRO", help='Defined symbols', default=[])
parser.add_argument('--dest', nargs=1, metavar='DEST', help='Project destination folder', default='.')
args = parser.parse_args()

This is a really useful Python utility, which parses the script parameters and populates the args variable assigned at the bottom. After that, we can access a parameter in a human-readable form like args.name[0] (these objects are lists – that’s why they are indexed with 0).

The next thing to do with our .project template is to fill the linkedResources element. It is not the most important part – linkedResources are directories that are outside the project but linked into it. Not every project contains them, but such paths may be included with the --external parameter.

linkedResources = ET.SubElement(root, 'linkedResources')
for ex in args.external:
        opt = ET.SubElement(linkedResources, 'link')
        ET.SubElement(opt, 'name').text = os.path.basename(ex)
        ET.SubElement(opt, 'type').text = '2'
        ET.SubElement(opt, 'location').text = ex

External directories should be passed as space-separated paths. The for loop walks over all of them and creates new subnodes under linkedResources. That is everything configured in the .project file, so it is written to the project directory (the .settings folder is already there):

tree.write(args.dest[0] + '/.project', 'UTF-8', True)

Now it is time to set up the .cproject file. As its name suggests, it is related strictly to the C language. It contains things like source and header directories.

tree = ET.parse("template/.cproject-stub")
root = tree.getroot()
cdt = root.find(".//*[@moduleId='cdtBuildSystem']")

for incPath in cdt.findall('.//*/option[@valueType="includePath"]'):
        for path in args.includes:
                opt = ET.SubElement(incPath, 'listOptionValue')
                opt.set('builtIn', 'false')
                opt.set('value', os.path.abspath(path))

for defineNode in cdt.findall('.//*/option[@valueType="definedSymbols"]'):
        for define in args.defines:
                opt = ET.SubElement(defineNode, 'listOptionValue')
                opt.set('builtIn', 'false')
                opt.set('value', define)

sourcesNode = cdt.find('.//*/sourceEntries')
for source in args.sources:
        opt = ET.SubElement(sourcesNode, 'entry')
        opt.set('flags', 'VALUE_WORKSPACE_PATH')
        opt.set('kind', 'sourcePath')
        opt.set('name', source)
for ex in args.external:
        opt = ET.SubElement(sourcesNode, 'entry')
        opt.set('flags', 'VALUE_WORKSPACE_PATH|RESOLVED')
        opt.set('kind', 'sourcePath')
        opt.set('name', os.path.basename(ex))

Just like with the .project file, .cproject-stub is opened, loaded, and parsed into a tree structure with the root node object root. This time all operations are made on subnodes of storageModule with the attribute moduleId="cdtBuildSystem". Under this node, we can find the configuration of all languages added to the project – in an Eclipse CDT C/C++ project it is usually C, C++, and assembly. For the sake of simplicity, my script loops over the configuration options of all languages (the outer for loops). These options are includePath, definedSymbols, and sourceEntries. Just like in the previous parts, all arguments are translated from a space-separated list into an XML node list known to Eclipse. I’ve marked all of the symbols/paths the same way (builtIn="false"), even if that’s not strictly true for every entry – there is no sense in splitting them. Some attributes and subnodes were reverse-engineered; I think they are self-explanatory (like kind sourcePath). Worth noting is that the script links the external directories, already configured in the .project file, as source paths. The idea is that I pass library source paths as --external, so they can easily be browsed during development. It is really useful, but I had problems with naming: when I wrote the script, I assumed that a source directory’s name would distinguish it from other folders. That was a mistake – almost all source folders are named src. Fortunately, the script is really simple, and you can implement your own naming strategy if this is a problem for you :).

Last but not least, saving .cproject was a little problematic due to its non-standard XML header, which looks like this:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?fileVersion 4.0.0?>

ElementTree parsed the input file fine, but there was no way to write it back in the same form using the standard API. This is my workaround:

f = open(args.dest[0] + '/.cproject', 'w')
# encoding='unicode' makes tostring return str, so it can be concatenated with the header (required on Python 3)
f.write('<?xml version="1.0" encoding="UTF-8" standalone="no"?>\n<?fileVersion 4.0.0?>' + ET.tostring(root, encoding='unicode'))
f.close()

The script is ready; its full form is presented at the bottom of the article. Now it is time to use it. As I mentioned above, the best place to call it is the Makefile. It is really easy, and the nice help message generated by the argparse module is useful here:

usage: eclipse-generator.py [-h] [--name NAME] [--includes PATH [PATH ...]]
                            [--sources PATH [PATH ...]]
                            [--external [PATH [PATH ...]]]
                            [--defines [MACRO [MACRO ...]]] [--dest DEST]

Eclipse project generator, creates .project and .cproject files

optional arguments:
  -h, --help            show this help message and exit
  --name NAME           Project name, will be displayed in project explorer
  --includes PATH [PATH ...]
                        Paths to include directories, might be relative
  --sources PATH [PATH ...]
                        Paths to source locations, relative to project only!
  --external [PATH [PATH ...]]
                        Source directiories outside project
  --defines [MACRO [MACRO ...]]
                        Defined symbols
  --dest DEST           Project destination folder

Sources and includes are straightforward, but what about the defines parameter? Without it, perfectly indexing C code would be impossible – yes, we have all the proper include and source paths, but the code usually contains a lot of conditional-compilation statements. They are based on the version of the language standard, on POSIX conformance, etc. – all things handled by the toolchain. Luckily, gcc has an option that shows all built-in defined macros. The option we need is:

gcc -dM -E - < /dev/null

The -dM option outputs all defines known to the compiler, -E stops the build process after preprocessing, and - tells gcc to take code from stdin, which is redirected from /dev/null. There is no need to run the full compilation in this one-liner, so it is stopped at preprocessing. We are fetching built-in macros only, so passing C files is not necessary. Without the /dev/null redirection, the program would block waiting for input.

Unfortunately, that’s not all. gcc gives us all the #define directives in a format understood by C, which is not what our script expects. But with some sed knowledge we can easily handle this problem:

gcc -dM -E - < /dev/null | cut -c 9- | sed 's/ /=/' | sed 's/ /\\ /g'

The first element in the pipe – cut – strips the #define directive. sed 's/ /=/' substitutes the first occurrence of a space (after the macro name) with an = character (this is what CDT expects). The last sed call prefixes all remaining spaces with a backslash – without it, a space would be treated as a separator.
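
For example, a single macro travels through the first two stages of the pipeline like this:

$ echo '#define __STDC_VERSION__ 201112L' | cut -c 9- | sed 's/ /=/'
__STDC_VERSION__=201112L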

With all those pieces, joining everything together in the Makefile should be simple, but there is one problem with the last pipelined command: make has problems with parentheses in its variables. This is why the command inside the Makefile is not as nice as before:

LEFTPAREN := (
RIGHTPAREN := )
DEFINES := $(shell gcc -dM -E - < /dev/null | cut -c 9- | sed 's/ /=/' | sed 's/$(LEFTPAREN)/\\$(LEFTPAREN)/g' | sed 's/$(RIGHTPAREN)/\\$(RIGHTPAREN)/g' | sed 's/ /\\ /g')
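
With the DEFINES variable in place, a Makefile target calling the generator might look like this (PROJECT and INC_DIRS are hypothetical variables you would define for your project):

eclipse:
	python eclipse-generator.py --name $(PROJECT) --includes $(INC_DIRS) --sources src --defines $(DEFINES) --dest .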

And that’s all. I hope my script will save you plenty of time on Eclipse configuration, just as it did in my case. Any enhancements and comments are welcome. Below you can find the whole working script; the .settings directory and the .project/.cproject stubs can be easily extracted from a bare CDT project.

import xml.etree.ElementTree as ET
import os
import argparse

parser = argparse.ArgumentParser(description='Eclipse project generator, creates .project and .cproject files')
parser.add_argument('--name', nargs=1, metavar='NAME', help='Project name, will be displayed in project explorer')
parser.add_argument('--includes', nargs='+', metavar='PATH', help='Paths to include directories, might be relative')
parser.add_argument('--sources', nargs='+', metavar='PATH', help='Paths to source locations, relative to project only!')
parser.add_argument('--external', nargs='*', metavar='PATH', help='Source directiories outside project', default=[])
parser.add_argument('--defines', nargs='*', metavar="MACRO", help='Defined symbols', default=[])
parser.add_argument('--dest', nargs=1, metavar='DEST', help='Project destination folder', default='.')
args = parser.parse_args()

print("Generating .project file...")

tree = ET.parse("template/.project-stub")
root = tree.getroot()
projectNameNode = root.find('name')
projectNameNode.text = args.name[0]

linkedResources = ET.SubElement(root, 'linkedResources')
for ex in args.external:
	opt = ET.SubElement(linkedResources, 'link')
	ET.SubElement(opt, 'name').text = os.path.basename(ex)
	ET.SubElement(opt, 'type').text = '2'
	ET.SubElement(opt, 'location').text = ex

tree.write(args.dest[0] + '/.project', 'UTF-8', True)

print("Generating .cproject file...")
tree = ET.parse("template/.cproject-stub")
root = tree.getroot()
cdt = root.find(".//*[@moduleId='cdtBuildSystem']")

for incPath in cdt.findall('.//*/option[@valueType="includePath"]'):
	for path in args.includes:
		opt = ET.SubElement(incPath, 'listOptionValue')
		opt.set('builtIn', 'false')
		opt.set('value', os.path.abspath(path))

for defineNode in cdt.findall('.//*/option[@valueType="definedSymbols"]'):
	for define in args.defines:
		opt = ET.SubElement(defineNode, 'listOptionValue')
		opt.set('builtIn', 'false')
		opt.set('value', define)

sourcesNode = cdt.find('.//*/sourceEntries')
for source in args.sources:
	opt = ET.SubElement(sourcesNode, 'entry')
	opt.set('flags', 'VALUE_WORKSPACE_PATH')
	opt.set('kind', 'sourcePath')
	opt.set('name', source)
for ex in args.external:
	opt = ET.SubElement(sourcesNode, 'entry')
	opt.set('flags', 'VALUE_WORKSPACE_PATH|RESOLVED')
	opt.set('kind', 'sourcePath')
	opt.set('name', os.path.basename(ex))

f = open(args.dest[0] + '/.cproject', 'w')
# encoding='unicode' makes tostring return str, so it can be concatenated with the header (required on Python 3)
f.write('<?xml version="1.0" encoding="UTF-8" standalone="no"?>\n<?fileVersion 4.0.0?>' + ET.tostring(root, encoding='unicode'))
f.close()

Welcome to my technical blog

Hi all! This is my technical blog, where I will cover topics related to low-level programming. Hopefully, anyone who shares my interests will find something new. The most frequent topics will be:

  • C and C++ programming – especially looking behind the scenes
  • Work automation with Python, Makefile and other utilities
  • Linux usage and work analysis
  • Electronic design and fixing

I hope you will like it…