Following the kernel #2 – Kernel Build System

May 28, 2021 #following, #kbuild, #kconfig, #kernel, #linux, #makefile, #system, #the, AM335x by Rafał

Multi-platform projects like Linux or U-Boot need a flexible tool to configure all their conditional parts and to run the build in a concise form. Our series is still a long way from diving into Linux code. However, I will describe the Kbuild system using the Linux source code as an example. This is the original version without any modifications. If you want to follow along, I recommend cloning the repository given in the previous part. U-Boot works the same way, although it may differ at some points.

Makefile scripts show how complicated things can get, but make is a standard that has been around much longer than I have, and it has done its job all these years. Its strength is that behind the enigmatic form hide simple shell invocations, which are very flexible. I expect you to have basic knowledge of this topic.

scripts/Makefile.build

The central place of Kbuild is the Makefile.build script, placed under the scripts directory. It is a smart, handy helper used generically in every source compilation. It gathers the configuration and accepts one parameter, obj, which is the path to a directory with the source code to build.

If you want to build some kernel module in Kbuild, just call make -f scripts/Makefile.build obj=path/to/kernel/module, or, with the Kbuild helper variable, make $(build)=path/to/kernel/module. Of course, this is a simplification. I haven’t mentioned the configuration step yet, but this is the general idea.

# Init all relevant variables used in kbuild files so
# 1) they have correct type
# 2) they do not inherit any value from the environment
obj-y :=
obj-m :=
lib-y :=
lib-m :=
always :=
always-y :=
always-m :=
targets :=
subdir-y :=
subdir-m :=
EXTRA_AFLAGS   :=
EXTRA_CFLAGS   :=
EXTRA_CPPFLAGS :=
EXTRA_LDFLAGS  :=
asflags-y  :=
ccflags-y  :=
cppflags-y :=
ldflags-y  :=

subdir-asflags-y :=
subdir-ccflags-y :=

The first thing done in it is the initialization of the build variables. At this point, you can see the modular nature of Linux. The variables with the suffix -y correspond to objects compiled into the kernel (in the case of U-Boot everything is compiled and linked into one binary). The suffix -m, as you may expect, corresponds to modules loaded during Linux runtime. These variables tell Makefile.build what should be compiled and whether there are any local build options. They are supplied by the developer in the local module Makefile script. I will describe it later.

Below this code, there are several inclusions of other files. Now I will briefly describe what they do.

include/config/auto.conf

This one includes the generated, Makefile-readable configuration. It contains all the parameters that decide how the Linux code should be compiled. At this point I can tell you that the repository contains a lot of prepared configurations for specific boards, so you don’t have to create one on your own. Usually, you choose one of them and make small changes to suit your needs.

# Read auto.conf if it exists, otherwise ignore
-include include/config/auto.conf

The - before the include directive means there is no error if the file doesn’t exist. In the bare, cloned repository there is no include/config directory, since we haven’t specified the configuration yet. I will write more about it later in this article. Let’s assume that the configuration is already there. Below you can find a sample configuration, taken from the U-Boot repository:

#
# Automatically generated file; DO NOT EDIT.
# U-Boot 2021.04-rc1 Configuration
#
CONFIG_ENV_SUPPORT=y
CONFIG_DISPLAY_BOARDINFO=y
CONFIG_CMD_BOOTM=y
CONFIG_ENV_FAT_FILE="uboot.env"
CONFIG_CMD_EXT4=y
CONFIG_SYS_OMAP24_I2C_SPEED=100000
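
To get a feeling for how simple this format is, here is a minimal Python sketch that reads such a file into a dictionary. It is only an illustration and not part of Kbuild; the function name parse_auto_conf is my own, and the sample text is a shortened copy of the listing above.

```python
def parse_auto_conf(text):
    """Parse auto.conf-style text into a dict of CONFIG_ names to values."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comments such as the generated header
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if not sep:
            continue
        # String values are quoted in the file; strip one pair of quotes
        if len(value) >= 2 and value[0] == value[-1] == '"':
            value = value[1:-1]
        config[name] = value
    return config


SAMPLE = """\
#
# Automatically generated file; DO NOT EDIT.
# U-Boot 2021.04-rc1 Configuration
#
CONFIG_ENV_SUPPORT=y
CONFIG_ENV_FAT_FILE="uboot.env"
CONFIG_SYS_OMAP24_I2C_SPEED=100000
"""

config = parse_auto_conf(SAMPLE)
print(config["CONFIG_ENV_SUPPORT"])
print(config["CONFIG_ENV_FAT_FILE"])
# Options that are disabled simply do not appear, so we fall back to ""
print(repr(config.get("CONFIG_CMD_EXT4", "")))
```

Note that a disabled option is just missing from the file, which is why the lookup uses a default value.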

scripts/Kbuild.include

Now you see that the customization stuff is accessible at the very beginning of the Makefile analysis. Right after that comes the next inclusion. It contains Kbuild-specific Makefile helper definitions.

include scripts/Kbuild.include

If you are experienced with Makefile scripts, you may know that using special characters in them is problematic. The first thing done in this included script is defining such characters as variables:

# Convenient variables
comma   := ,
quote   := "
squote  := '
empty   :=
space   := $(empty) $(empty)
space_escape := _-_SPACE_-_
pound := \#

For example, the comma is used by make as an argument separator in many functions, such as subst or call. But if it is placed in a statement as the variable $(comma), make only substitutes it without interpreting it as a special character. After that, there are some other helpers commonly used in Kbuild scripts.

I will briefly describe more complicated ones. The first of them is filechk. It is quite complex, but let’s puzzle it out:

define filechk
	$(Q)set -e;						\
	mkdir -p $(dir $@);					\
	trap "rm -f $(dot-target).tmp" EXIT;			\
	{ $(filechk_$(1)); } > $(dot-target).tmp;		\
	if [ ! -r $@ ] || ! cmp -s $@ $(dot-target).tmp; then	\
		$(kecho) '  UPD     $@';			\
		mv -f $(dot-target).tmp $@;			\
	fi
endef

The define directive creates a Makefile macro, which can be called in this form:

filechk_sample = echo $(KERNELRELEASE)
version.h: FORCE
	$(call filechk,sample)

Pay attention to the ; \ sequences in the filechk definition. Normally, every recipe line in a Makefile is run in a separate shell. Here the lines are joined into a single logical line, so everything runs in one /bin/sh invocation and shares one shell environment. $(Q) is optional and controls whether make echoes the command to stdout. set -e causes an immediate exit on any command failure within this function. mkdir -p creates the target directory, and the -p parameter suppresses the error if the directory already exists.

The trap command might be new to you. It installs an exit handler in our /bin/sh invocation. For example, if we press CTRL+C during the build, make will be stopped, but before that, the command between the quotes will be executed: rm -f $(dot-target).tmp. The handler runs whenever the shell exits, whether normally or after a failure.

dot-target is another macro defined in the Kbuild.include file. It is quite simple, so I will not say much about it. Briefly speaking, it generates the name of the target prefixed with a dot, which is used as the base of the temporary file name. In our case it is .version.h, so the temporary file is .version.h.tmp. If the build is interrupted, we should remove it to allow a clean build in a later make invocation.

The last few lines are the most important. { $(filechk_$(1)); } > $(dot-target).tmp; executes the commands defined before in the recipe. $(1) is the argument passed in the call statement. In our example it is sample, so as a result our recipe executes the filechk_sample function, echo $(KERNELRELEASE), and redirects its output to the temporary file mentioned before.

Right after that, there is an if statement. It uses the Makefile recipe variable $@, which is the target file (version.h in our example), to check its existence and readability (-r). If the file exists, it also compares its content with the generated temporary file. If either condition holds (the file does not exist or it has different content), the file is updated with the new, generated content (mv -f $(dot-target).tmp $@;).

So the filechk macro defines a way of checking the content of a generated file before replacing it. It is very useful. If you have ever written more complex Makefile scripts that generate files, you have probably noticed that every regeneration of such a file triggers unnecessary rebuilds of other targets that depend on it, because make looks at the file's modification timestamp, not its content.
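
The same idea can be sketched in Python, which may be easier to read than the shell one-liner. This is only my illustration of the technique (write to a temporary file, replace the target only when the content differs, always clean up), not a translation used anywhere in Kbuild. The function names and the release strings 5.12.0 and 5.13.0 are assumptions made for the example.

```python
import os
import tempfile


def dot_target(target):
    """Mirror the dot-target idea: dir/name -> dir/.name"""
    directory, name = os.path.split(target)
    return os.path.join(directory, "." + name)


def filechk(target, new_content):
    """Write new_content to target only if it differs; return True if updated."""
    os.makedirs(os.path.dirname(target) or ".", exist_ok=True)
    tmp = dot_target(target) + ".tmp"
    try:
        with open(tmp, "w") as f:
            f.write(new_content)
        # Compare with the existing file, if there is one
        if os.path.exists(target):
            with open(target) as f:
                if f.read() == new_content:
                    return False  # unchanged: keep the old file and timestamp
        os.replace(tmp, target)
        return True
    finally:
        # Counterpart of the shell trap: never leave the temp file behind
        if os.path.exists(tmp):
            os.remove(tmp)


workdir = tempfile.mkdtemp()
header = os.path.join(workdir, "include", "version.h")
first = filechk(header, "5.12.0\n")   # file does not exist yet
second = filechk(header, "5.12.0\n")  # same content again
third = filechk(header, "5.13.0\n")   # content changed
print(first, second, third)
```

Only the first and the third call touch version.h, so anything depending on it is not rebuilt after the second call.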

The next function is try-run:

try-run = $(shell set -e;		\
	TMP=$(TMPOUT)/tmp;		\
	TMPO=$(TMPOUT)/tmp.o;		\
	mkdir -p $(TMPOUT);		\
	trap "rm -rf $(TMPOUT)" EXIT;	\
	if ($(1)) >/dev/null 2>&1;	\
	then echo "$(2)";		\
	else echo "$(3)";		\
	fi)

At the beginning, it sets the shell variables TMP and TMPO and creates the output directory $(TMPOUT); this variable should be set before the try-run call. You are already familiar with trap. Its purpose is the same, but in this case it removes the output directory on exit. As the first argument, the function gets the command to be executed. If it succeeds, the second argument is printed. If it fails, the third one is.

Worth noting here is that the TMP and TMPO variables are accessible in the wrapped command. They are valid only inside the try-run call. A good example of usage is the next definition:

as-option = $(call try-run,\
	$(CC) $(KBUILD_CFLAGS) $(1) -c -x assembler /dev/null -o "$$TMP",$(1),$(2))

It runs the compiler $(CC) with the default C-related flags $(KBUILD_CFLAGS) and the assembly-related flag $(1) passed as the first as-option parameter. It tells the compiler to stop after the compilation step, without linking (-c), since we only want to check the passed option. As the name of the function says, it is related to assembly, so we tell $(CC) that the language is assembler (-x assembler). /dev/null is given as the input file (as I have mentioned, it is just a compiler check). The output file is the temporary $$TMP file defined in try-run. We must write it here with a double $$ sign, because otherwise make would try to expand it as one of its own variables (like, for example, $(CC)) instead of passing $TMP to the shell.

A sample usage of as-option can be found in the arch/arm/Makefile script:

AFLAGS_NOWARN	:=$(call as-option,-Wa$(comma)-mno-warn-deprecated,-Wa$(comma)-W)

This statement assigns the result of the as-option call to the variable AFLAGS_NOWARN. It runs the compiler with the -Wa,-mno-warn-deprecated flag. If the flag is supported, that same flag is assigned to AFLAGS_NOWARN. Otherwise, the fallback value -Wa,-W is assigned.

The next important part is the shorthand definitions for common build statements:
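
The probe-and-fallback pattern behind try-run and as-option can be sketched in Python as follows. This is a rough model under my own assumptions: instead of a real compiler, the hypothetical helper fake_compiler starts a tiny Python process that just exits with the given status, so the example runs on any machine.

```python
import shutil
import subprocess
import sys
import tempfile


def try_run(command, if_ok, if_failed):
    """Run command in a scratch directory; return if_ok on exit status 0."""
    tmpout = tempfile.mkdtemp()
    try:
        result = subprocess.run(
            command,
            cwd=tmpout,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        return if_ok if result.returncode == 0 else if_failed
    finally:
        # Counterpart of the trap that removes $(TMPOUT) on exit
        shutil.rmtree(tmpout, ignore_errors=True)


def fake_compiler(exit_code):
    # Hypothetical stand-in for "$(CC) ... $(1) -c -x assembler /dev/null"
    return [sys.executable, "-c", f"import sys; sys.exit({exit_code})"]


preferred = "-Wa,-mno-warn-deprecated"
fallback = "-Wa,-W"
supported = try_run(fake_compiler(0), preferred, fallback)
unsupported = try_run(fake_compiler(1), preferred, fallback)
print(supported)
print(unsupported)
```

When the probe succeeds we get the probed flag itself, and when it fails we get the fallback, which is exactly how the second and third arguments of try-run are used by as-option.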

###
# Shorthand for $(Q)$(MAKE) -f scripts/Makefile.build obj=
# Usage:
# $(Q)$(MAKE) $(build)=dir
build := -f $(srctree)/scripts/Makefile.build obj

###
# Shorthand for $(Q)$(MAKE) -f scripts/Makefile.dtbinst obj=
# Usage:
# $(Q)$(MAKE) $(dtbinst)=dir
dtbinst := -f $(srctree)/scripts/Makefile.dtbinst obj

###
# Shorthand for $(Q)$(MAKE) -f scripts/Makefile.clean obj=
# Usage:
# $(Q)$(MAKE) $(clean)=dir
clean := -f $(srctree)/scripts/Makefile.clean obj

When we want to build an object (statically linked or loaded as a module), make -f scripts/Makefile.build obj=<module_dir> must be called. With the encapsulating build variable it is easier, since the call looks like this: make $(build)=<module_dir>. The dtbinst (device-tree related) and clean variables serve similar purposes.

I encourage you to read all of these Makefile helpers. They are commonly used around Kbuild scripts, and knowing them lets you read those scripts fluently. I have tried to describe a few, so it should be easier to read all the others.

Local Kbuild/Makefile

After Kbuild.include helpers, the local Kbuild/Makefile script is included:

# The filename Kbuild has precedence over Makefile
kbuild-dir := $(if $(filter /%,$(src)),$(src),$(srctree)/$(src))
kbuild-file := $(if $(wildcard $(kbuild-dir)/Kbuild),$(kbuild-dir)/Kbuild,$(kbuild-dir)/Makefile)
include $(kbuild-file)

This is the part filled in by the module developer. It defines the files that should be built into the kernel (obj-y) or as a loadable module (obj-m). These are the most common definitions. At the beginning of Makefile.build I have shown all the variables that might be supplied here. The lib- variables list objects that will be built as statically linked libraries. The always- variables are targets that, as the name says, are always triggered during the build of our module. Good examples are generated header files. The recipe for them should be described in our local Makefile.

The next variable is targets. It is related to the way Kbuild works and to the if_changed helper defined in Kbuild.include. Briefly, it handles the situation in which the command that builds something has changed in the Makefile, so this particular thing should be rebuilt. If you want to read more about it, please refer to https://www.kernel.org/doc/Documentation/kbuild/makefiles.rst chapter 3.12, Command change detection. The subdir- variables trigger descending into subdirectories in the Kbuild framework. I will tell you more about it later. All the other variables with flags in their names let us apply specific build flags locally in our module.

A simple example of such a local Makefile is placed in drivers/tty/serial/8250/Makefile:

obj-$(CONFIG_SERIAL_8250)		+= 8250.o 8250_base.o
8250-y					:= 8250_core.o
8250-$(CONFIG_SERIAL_8250_PNP)		+= 8250_pnp.o
8250_base-y				:= 8250_port.o
8250_base-$(CONFIG_SERIAL_8250_DMA)	+= 8250_dma.o
8250_base-$(CONFIG_SERIAL_8250_DWLIB)	+= 8250_dwlib.o
8250_base-$(CONFIG_SERIAL_8250_FINTEK)	+= 8250_fintek.o
obj-$(CONFIG_SERIAL_8250_GSC)		+= 8250_gsc.o
obj-$(CONFIG_SERIAL_8250_PCI)		+= 8250_pci.o
obj-$(CONFIG_SERIAL_8250_EXAR)		+= 8250_exar.o
obj-$(CONFIG_SERIAL_8250_HP300)		+= 8250_hp300.o

A common shorthand is concatenating obj- with a configuration variable like $(CONFIG_SERIAL_8250). If CONFIG_SERIAL_8250 is set to y, the listed files are built into the kernel. If this tristate variable is set to m, these files are built as a loadable module. If it is set to n, the variable is empty, the line expands to plain obj-, and nothing is built at all.

Some modules may be linked from several compilation units (.c files). This situation is shown by the 8250.o and 8250_base.o files. After placing them in the obj-y/m definition, the Makefile script defines which files must be compiled before linking into 8250.o/8250_base.o. It is done in the 8250-y and 8250_base-y variables.
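
The obj-$(CONFIG_...) trick is easy to model in Python. The sketch below is only an illustration: the function name collect_objects, the way the Makefile lines are represented, and the chosen configuration values are all my assumptions, while the object names come from the 8250 Makefile above.

```python
def collect_objects(makefile_lines, config):
    """Sort objects into obj-y / obj-m buckets based on CONFIG_ values."""
    buckets = {"y": [], "m": []}
    for option, objects in makefile_lines:
        # An unset option expands to "", i.e. plain "obj-", which is ignored
        state = config.get(option, "")
        if state in buckets:
            buckets[state].extend(objects)
    return buckets


# Three lines of the 8250 Makefile, written as (option, objects) pairs
lines = [
    ("CONFIG_SERIAL_8250", ["8250.o", "8250_base.o"]),
    ("CONFIG_SERIAL_8250_PCI", ["8250_pci.o"]),
    ("CONFIG_SERIAL_8250_EXAR", ["8250_exar.o"]),
]
# Hypothetical configuration: core built in, PCI as a module, EXAR disabled
config = {"CONFIG_SERIAL_8250": "y", "CONFIG_SERIAL_8250_PCI": "m"}

buckets = collect_objects(lines, config)
print("obj-y:", buckets["y"])
print("obj-m:", buckets["m"])
```

The EXAR driver lands in neither list, because its option is missing from the configuration.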

scripts/Makefile.lib

The next part of Makefile.build is the inclusion of the Makefile.lib script. It defines all the commands that are used to build the kernel. At the beginning we can find code supporting backward-compatibility variables:

asflags-y  += $(EXTRA_AFLAGS)
ccflags-y  += $(EXTRA_CFLAGS)
cppflags-y += $(EXTRA_CPPFLAGS)
ldflags-y  += $(EXTRA_LDFLAGS)

The EXTRA_ variables are now deprecated. New modules should use the xxflags-y variables in their Makefile scripts (described above). But old ones must also compile properly.

# When an object is listed to be built compiled-in and modular,
# only build the compiled-in version
obj-m := $(filter-out $(obj-y),$(obj-m))

The important thing here is the rule that if a module is configured both as built into the kernel and as a loadable module, the built-in option is chosen.

# Libraries are always collected in one lib file.
# Filter out objects already built-in
lib-y := $(filter-out $(obj-y), $(sort $(lib-y) $(lib-m)))

Some objects may be built as libraries, but similarly, if they are declared as obj-y, they are filtered out from lib-y.

# Subdirectories we need to descend into
subdir-ym := $(sort $(subdir-y) $(subdir-m) \
			$(patsubst %/,%, $(filter %/, $(obj-y) $(obj-m))))

This part needs more comment. The Kbuild system is designed to traverse all configured directories, from the root folders to more specialized ones, if they are configured. As we can see in the subdir-ym definition, these subdirectories might be passed from the local Makefile through subdir-y and subdir-m, but also through the obj-y and obj-m variables. The latter are treated that way only when they end with a /.

The repository structure reflects the logical design of the kernel. For example, device drivers are placed under the drivers folder, and under this location we can find the tty directory (teletypewriter). Nowadays it is a broad area of devices, mainly based on serial ports, but virtual devices are also inside this group.

The tty directory in turn has some generic stuff for all tty devices and a serial directory, which contains more specialized device drivers like omap-serial.c. Completely separate directories are arch, which includes stuff dependent on the core architecture, and fs, which contains file system-related code.

The general principle is to start building from the main Makefile, placed in the root directory, and descend to more specific directories chosen according to the configuration. An example is the Makefile in the drivers directory, which always includes the tty subsystem directory:
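
The three Makefile.lib assignments shown above (obj-m, lib-y and subdir-ym) are plain list operations, so they can be modelled in a few lines of Python. This is a simplified sketch: filter_out handles only literal words, not % patterns, and all the input values are hypothetical.

```python
def filter_out(exclude, words):
    """Like $(filter-out ...) for literal words (no % patterns)."""
    excluded = set(exclude)
    return [w for w in words if w not in excluded]


def make_sort(words):
    """Like $(sort ...): lexical order, duplicates removed."""
    return sorted(set(words))


# Hypothetical values, as a local Makefile might have left them
obj_y = ["core.o", "tty/", "char/"]
obj_m = ["core.o", "misc.o", "tty/"]
lib_y = ["core.o", "util.o"]
lib_m = ["util.o", "extra.o"]
subdir_y, subdir_m = ["tools"], []

# Built-in wins over modular
obj_m = filter_out(obj_y, obj_m)
# Libraries are merged, then objects that are already built in are dropped
lib_y = filter_out(obj_y, make_sort(lib_y + lib_m))
# $(filter %/, ...) keeps words ending in "/", $(patsubst %/,%, ...) strips it
dirs_from_objs = [w[:-1] for w in obj_y + obj_m if w.endswith("/")]
subdir_ym = make_sort(subdir_y + subdir_m + dirs_from_objs)

print(obj_m)
print(lib_y)
print(subdir_ym)
```

Note that $(sort ...) in make also removes duplicates, which is why a directory listed twice is visited only once.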

obj-y				+= tty/

Descending is implemented in the Makefile.build file, lines 488-503:

__build: $(if $(KBUILD_BUILTIN), $(targets-for-builtin)) \
	 $(if $(KBUILD_MODULES), $(targets-for-modules)) \
	 $(subdir-ym) $(always-y)
	@:

endif

# Descending
# ---------------------------------------------------------------------------

PHONY += $(subdir-ym)
$(subdir-ym):
	$(Q)$(MAKE) $(build)=$@ \
	$(if $(filter $@/, $(KBUILD_SINGLE_TARGETS)),single-build=) \
	need-builtin=$(if $(filter $@/built-in.a, $(subdir-builtin)),1) \
	need-modorder=$(if $(filter $@/modules.order, $(subdir-modorder)),1)

subdir-ym is one of the last prerequisites of the __build target. Its recipe takes each subdirectory one by one (the order is significant) and recursively calls make $(build)=$@ with some conditional parameters, where the current target variable $@ is the subdirectory.

Worth noting here is that descending is often related to configuration. We may add subdir- variables conditionally, according to CONFIG_ variables.
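
The recursion itself can be pictured with a short Python sketch. It is a toy model under my own assumptions: the tree dictionary is a hypothetical, heavily trimmed stand-in for the real directory Makefiles, and the function build only records which directories would be visited instead of compiling anything.

```python
def build(directory, tree, config, log):
    """Visit a directory, then recurse into every enabled entry ending in '/'."""
    log.append(directory)
    for option, entry in tree.get(directory, []):
        # option None means an unconditional obj-y; otherwise ask the config
        enabled = option is None or config.get(option) in ("y", "m")
        if enabled and entry.endswith("/"):
            build(directory + "/" + entry[:-1], tree, config, log)
    return log


# Hypothetical model: directory -> list of (CONFIG option or None, entry)
tree = {
    "drivers": [(None, "tty/"), ("CONFIG_USB", "usb/")],
    "drivers/tty": [(None, "serial/")],
    "drivers/tty/serial": [("CONFIG_SERIAL_8250", "8250/")],
}
config = {"CONFIG_SERIAL_8250": "y"}

visited = build("drivers", tree, config, [])
for directory in visited:
    print(directory)
```

With this configuration the walk reaches drivers/tty/serial/8250 but never enters drivers/usb, because its option is not set.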

The next part of Makefile.build that I’d like to highlight is the build rules for source code files. Kbuild supports code written in C and assembly. If you scroll through this file, you will find definitions like:

quiet_cmd_cc_s_c = CC $(quiet_modtag)  $@
      cmd_cc_s_c = $(CC) $(filter-out $(DEBUG_CFLAGS), $(c_flags)) $(DISABLE_LTO) -fverbose-asm -S -o $@ $<

$(obj)/%.s: $(src)/%.c FORCE
	$(call if_changed_dep,cc_s_c)

The cmd_cc_s_c abbreviation tells us that it is a command using the CC (C compiler) program to generate a .s file (assembly) from the C source. This is an intermediate step, which is sometimes helpful to debug problems and see what the human-readable machine code of our module looks like. Next, a similar command is cmd_cpp_i_c, which uses the C preprocessor to generate code in .i files (pure C after preprocessing). All other rules are written similarly.

In this example, you can see how the if_changed... helper is used. The build command is defined as cmd_cc_s_c and is passed (without the cmd_ prefix) to the helper. Besides that, quiet_cmd_cc_s_c is defined to tell Kbuild how this command should be displayed at some level of verbosity. FORCE is given as a prerequisite to let the if_changed_dep helper decide whether the file should be built or not.

The whole path between the C (or assembly) source code and the object file is presented below. As you can see, Kbuild adds some extra steps. Some of them are optional, some are essential for the kernel to work. We will not describe them in detail, because it is a topic for a whole series. Maybe I will come back here when it is needed for code analysis.
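
The decision made by the if_changed family can be summarized with a small Python sketch. It is a simplification under my own assumptions: the real helper keeps the previously used command in a file next to the target, while here a plain dictionary (saved_commands) plays that role, and the compiler command lines are made up.

```python
import os
import tempfile


def needs_rebuild(target, prerequisites, command, saved_commands):
    """Decide whether target must be rebuilt, in the spirit of if_changed."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    if any(os.path.getmtime(p) > target_mtime for p in prerequisites):
        return True
    # Plain make would stop here; Kbuild also compares the command line
    return saved_commands.get(target) != command


workdir = tempfile.mkdtemp()
source = os.path.join(workdir, "foo.c")
target = os.path.join(workdir, "foo.s")
# Create both files and make the target newer than the source
for path, mtime in ((source, 1000), (target, 2000)):
    with open(path, "w"):
        pass
    os.utime(path, (mtime, mtime))

saved = {target: "cc -O2 -S -o foo.s foo.c"}
same_cmd = needs_rebuild(target, [source], "cc -O2 -S -o foo.s foo.c", saved)
new_flags = needs_rebuild(target, [source], "cc -O0 -S -o foo.s foo.c", saved)
print(same_cmd, new_flags)
```

The target is up to date as far as timestamps go, yet the second call still asks for a rebuild because the flags changed. This is also why FORCE is needed: the recipe must run every time so that the helper gets a chance to make this decision.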

define rule_cc_o_c
	$(call cmd_and_fixdep,cc_o_c)
	$(call cmd,gen_ksymdeps)
	$(call cmd,checksrc)
	$(call cmd,checkdoc)
	$(call cmd,objtool)
	$(call cmd,modversions_c)
	$(call cmd,record_mcount)
endef

define rule_as_o_S
	$(call cmd_and_fixdep,as_o_S)
	$(call cmd,gen_ksymdeps)
	$(call cmd,objtool)
	$(call cmd,modversions_S)
endef

Root Makefile

Now, after reading all these raw definitions, it’s time to see how they fit together. This is the job of the main Makefile script in the kernel repo. It is the root caller of the Makefile.build recursion. Unfortunately, it’s long and messy, but after following each called rule we will know what is most essential.

Configuration

As I mentioned above, the CONFIG_ variables placed in auto.conf choose the drivers, the architecture, and all the customization stuff. At first sight, you might be afraid that the whole kernel configuration is in your hands. Fortunately, it is not that bad. There are predefined configuration files for each architecture and for specific boards. They are placed under arch/$ARCH/configs/ as $PLATFORM_defconfig files. For example, the defconfig dedicated to the BeagleBone Black (and some other boards) is arch/arm/configs/omap2plus_defconfig. The terminology here might be misleading. OMAP is a larger SoC containing an ARM processor (for example AM335x) and a multimedia coprocessor, which is probably why Linux puts the Sitara AM335x processor under this config file.

To configure our cross-compilation we need to set up a couple of things. The first one is assigning the ARCH variable (in the terminal, before executing make). In our case, it must be set to arm (export ARCH=arm). If it is not set, the Makefile script will choose the architecture of the underlying build machine. This variable lets the Makefile decide which arch/$(SRCARCH)/Makefile should be included:

include arch/$(SRCARCH)/Makefile
export KBUILD_DEFCONFIG KBUILD_KCONFIG CC_VERSION_TEXT

config: outputmakefile scripts_basic FORCE
	$(Q)$(MAKE) $(build)=scripts/kconfig $@

%config: outputmakefile scripts_basic FORCE
	$(Q)$(MAKE) $(build)=scripts/kconfig $@

The next thing is telling Kbuild which toolchain should be used. By default, the build machine's gcc is used. In embedded systems, however, that is rarely what we want. We must explicitly assign the chosen toolchain prefix to the CROSS_COMPILE variable. In my case, it is export CROSS_COMPILE=arm-linux-gnueabihf-. You can download such toolchains from Linaro: https://releases.linaro.org/components/toolchain/binaries/latest-5/arm-linux-gnueabihf/. Of course, after uncompressing it, you must add it to your local PATH variable.

When we are done, the next step is choosing the defconfig. So we execute make omap2plus_defconfig. This will run the recipe for the %config target shown in the Makefile snippet above. It will descend into scripts/kconfig and run Makefile.build on the local Makefile script. As a result, a host program called conf will be built and executed. This rule is implemented in the following part of scripts/kconfig/Makefile:

%_defconfig: $(obj)/conf
	$(Q)$< $(silent) --defconfig=arch/$(SRCARCH)/configs/$@ $(Kconfig)

The $< variable is the first prerequisite (the conf program), and it is called with the parameter --defconfig=arch/arm/configs/omap2plus_defconfig. This program gathers all the parameters supplied in omap2plus_defconfig and the environment variables declared by make.

The direct output of this step is the .config file in the root of the build tree. From it, at the start of the build, Kbuild generates the include/config directory with the auto.conf file mentioned before, as well as the same configuration (CONFIG_ variables) readable from C as a generated header file. The input data are the environment variables, the Kconfig files from all the Linux subdirectories (which declare the available options, their types, defaults, and dependencies), and, most importantly, the defconfig file, in our case omap2plus_defconfig.

Running the build

When these files are generated, we are almost there. The only thing left to do is to run our long-lasting kernel build. If our ARCH and CROSS_COMPILE variables are still set in the terminal session, we can just execute make. The default target is all, which triggers the vmlinux target.

# Final link of vmlinux with optional arch pass after final link
cmd_link-vmlinux =                                                 \
	$(CONFIG_SHELL) $< "$(LD)" "$(KBUILD_LDFLAGS)" "$(LDFLAGS_vmlinux)";    \
	$(if $(ARCH_POSTLINK), $(MAKE) -f $(ARCH_POSTLINK) $@, true)

vmlinux: scripts/link-vmlinux.sh autoksyms_recursive $(vmlinux-deps) FORCE
	+$(call if_changed,link-vmlinux)

We will focus on the most important parts – vmlinux-deps are all the objects linked into the kernel image, in a single variable. Let’s see how it’s defined.

$ cat Makefile | grep -w "vmlinux-deps .="
vmlinux-deps := $(KBUILD_LDS) $(KBUILD_VMLINUX_OBJS) $(KBUILD_VMLINUX_LIBS)

$ cat Makefile | grep -w "KBUILD_LDS \|KBUILD_VMLINUX_OBJS \|KBUILD_VMLINUX_LIBS "
KBUILD_VMLINUX_OBJS := $(head-y) $(patsubst %/,%/built-in.a, $(core-y))
KBUILD_VMLINUX_OBJS += $(addsuffix built-in.a, $(filter %/, $(libs-y)))
KBUILD_VMLINUX_OBJS += $(patsubst %/, %/lib.a, $(filter %/, $(libs-y)))
KBUILD_VMLINUX_LIBS := $(filter-out %/, $(libs-y))
KBUILD_VMLINUX_LIBS := $(patsubst %/,%/lib.a, $(libs-y))
KBUILD_VMLINUX_OBJS += $(patsubst %/,%/built-in.a, $(drivers-y))
export KBUILD_LDS          := arch/$(SRCARCH)/kernel/vmlinux.lds

$ cat Makefile | grep -w "core-y\|libs-y\|drivers-y"
core-y		:= init/ usr/
drivers-y	:= drivers/ sound/
drivers-y	+= net/ virt/
libs-y		:= lib/
core-y		+= kernel/ certs/ mm/ fs/ ipc/ security/ crypto/ block/

$ cat arch/arm/Makefile | grep -w "head-y"
head-y		:= arch/arm/kernel/head$(MMUEXT).o

This is the glue joining all the things described previously. After preparing the configuration (auto.conf and the generated headers), the main Makefile descends into the main source directories (init, drivers, etc.) and, after building all configured objects, links the built-in.a archives (each one contains all the kernel objects of its directory tree) using the scripts/link-vmlinux.sh script. The Makefile rule that does the descending is called descend:

vmlinux-dirs	:= $(patsubst %/,%,$(filter %/, \
		     $(core-y) $(core-m) $(drivers-y) $(drivers-m) \
		     $(libs-y) $(libs-m)))
...
build-dirs	:= $(vmlinux-dirs)
...
# Handle descending into subdirectories listed in $(build-dirs)
# Preset locale variables to speed up the build process. Limit locale
# tweaks to this spot to avoid wrong language settings when running
# make menuconfig etc.
# Error messages still appears in the original language
PHONY += descend $(build-dirs)
descend: $(build-dirs)
$(build-dirs): prepare
	$(Q)$(MAKE) $(build)=$@ \
	single-build=$(if $(filter-out $@/, $(filter $@/%, $(KBUILD_SINGLE_TARGETS))),1) \
	need-builtin=1 need-modorder=1
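
To see what these variables expand to, here is a rough Python model of the patsubst calls. It uses only the top-level core-y and drivers-y values printed by grep above (the architecture Makefile may append more entries), it ignores libs-y, and it assumes that $(MMUEXT) expands to nothing. The function name patsubst_dir is my own.

```python
def patsubst_dir(suffix, words):
    """Like $(patsubst %/,%/<suffix>, ...): only words ending in '/' change."""
    return [w + suffix if w.endswith("/") else w for w in words]


core_y = "init/ usr/ kernel/ certs/ mm/ fs/ ipc/ security/ crypto/ block/".split()
drivers_y = "drivers/ sound/ net/ virt/".split()
head_y = ["arch/arm/kernel/head.o"]  # assumes $(MMUEXT) is empty

# Objects that end up in the final link
vmlinux_objs = head_y + patsubst_dir("built-in.a", core_y)
vmlinux_objs += patsubst_dir("built-in.a", drivers_y)

# Directories the descend rule walks into (trailing slash stripped)
vmlinux_dirs = [w[:-1] for w in core_y + drivers_y if w.endswith("/")]

print(vmlinux_objs[:3])
print(vmlinux_dirs[:3])
```

Every directory listed in vmlinux_dirs is built with make $(build)=<dir> and leaves behind one built-in.a, which is then picked up through vmlinux_objs.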

Summary

The Kbuild system is a really broad topic. I have covered its essential skeleton, which hopefully eases the understanding of more complicated parts.

In the next chapter, I will describe the debug environment setup. And finally, we will dive into some interesting stuff.

Python

Eclipse project generator

Aug 21, 2020 #eclipse, #makefile, #python by Rafał

A common problem of all cross-compiled projects is getting a well-configured IDE. Usually, the first thing done at the beginning of a project is cloning the repository and setting up a build environment. If you’re patient enough, you can probably configure the include paths of your favorite code editor. If it’s a simple project, this approach may be suitable. But as your code gets more and more complicated and new libraries are attached, your IDE will be full of red-underlined code. Usually, it’s treated as one-time work. However, what will happen if your hard drive is gone, or if you want to work on several repositories simultaneously? Simply copying the IDE configuration sometimes works, sometimes not. Especially if you are setting up an environment on a new machine, include path problems may cost you many hours of tedious work. For some projects, setting this up by hand is almost impossible.

Some time ago I realized that all you need to set up your environment are the build variables passed to the compiler. Usually, all of them are placed in the Makefile script. It’s a common tool for most cross-compiled projects: the place where the compiler, configuration parameters, library paths, and many other things meet to build your code. This implies that the Makefile script is the best place to set up your IDE. This is the reason for creating this simple Python script. I did some reverse engineering to see how the Eclipse CDT configuration is stored in its XML files (.cproject, .project, and the .settings directory). The general principle is simple, and filling it with automatically generated data was not a big deal.

The first thing, which is not a part of the script, is copying the .settings folder with its content to the project directory. I didn’t spend time investigating what it is about (it is a good googling topic), but it is an obligatory part of the Eclipse project. I suppose that local project settings, not connected with the build system, are set up here. So let’s execute our first step:

$ cp -r template/.settings-base project/.settings

After that, we can start the most interesting part: analyzing what’s important for Eclipse to parse your C/C++ code correctly. The script’s basic principle is loading bare .cproject and .project templates into an ElementTree structure and filling it with data taken from parameters. The script will be executed from the Makefile, so all the parameters will be passed to it like to any other program, e.g. gcc. I will give short snippets of Python in the article; the full-featured script is added at the bottom.

Loading the .project file stub is done with the following commands:

import xml.etree.ElementTree as ET

tree = ET.parse("template/.project-stub")
root = tree.getroot()

At this point, we have loaded and parsed the XML structure of a standard .project file into a tree object. The root node handle is fetched in the last line of the snippet. The .project file contains language-neutral settings like the name of the project, which is the first thing filled with our data:

projectNameNode = root.find('name')
projectNameNode.text = args.name[0]

The find method gets the node with the specified identifier, which in our case is name. The second line sets its content. Worth noting here is the usage of the argparse Python module. At the beginning of the script, you will find:

import argparse

parser = argparse.ArgumentParser(description='Eclipse project generator, creates .project and .cproject files')
parser.add_argument('--name', nargs=1, metavar='NAME', help='Project name, will be displayed in project explorer')
parser.add_argument('--includes', nargs='+', metavar='PATH', help='Paths to include directories, might be relative')
parser.add_argument('--sources', nargs='+', metavar='PATH', help='Paths to source locations, relative to project only!')
parser.add_argument('--external', nargs='*', metavar='PATH', help='Source directories outside project', default=[])
parser.add_argument('--defines', nargs='*', metavar="MACRO", help='Defined symbols', default=[])
# nargs=1 yields a list, so the default must be a list too (args.dest[0] is used later)
parser.add_argument('--dest', nargs=1, metavar='DEST', help='Project destination folder', default=['.'])
args = parser.parse_args()

This is a really useful Python utility, which parses the script parameters and stores them in the args variable set at the bottom. After that, we can access a parameter in a human-readable form like args.name[0] (these objects are lists, which is why it is indexed with 0).

The next thing to do with our .project file template is to fill the linkedResources element. It is not the most important part: linkedResources are directories which are outside the project but are linked to it. Not every project contains them, but these paths may be included with the --external parameter.
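
If the list behaviour is not obvious, the following stripped-down sketch may help. It declares only three of the arguments and feeds parse_args() with a hand-written list instead of the real command line; the project name blinky and the paths are made up for the example.

```python
import argparse

parser = argparse.ArgumentParser(description="Demo of list-valued arguments")
parser.add_argument("--name", nargs=1, metavar="NAME")
parser.add_argument("--includes", nargs="+", metavar="PATH")
parser.add_argument("--defines", nargs="*", metavar="MACRO", default=[])

# Passing a list to parse_args() replaces sys.argv, which is handy for testing
args = parser.parse_args(
    ["--name", "blinky", "--includes", "inc", "lib/hal/inc", "--defines"]
)

print(args.name)      # nargs=1 still yields a one-element list
print(args.includes)  # nargs='+' collects one or more values
print(args.defines)   # nargs='*' accepts zero values as well
```

This is why the script reads the project name as args.name[0] and can iterate directly over args.includes.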

import os

# Add one <link> entry for every directory passed with --external
linkedResources = ET.SubElement(root, 'linkedResources')
for ex in args.external:
    opt = ET.SubElement(linkedResources, 'link')
    ET.SubElement(opt, 'name').text = os.path.basename(ex)
    ET.SubElement(opt, 'type').text = '2'
    ET.SubElement(opt, 'location').text = ex

External directories should be passed as space-separated paths. The for-loop goes through all of them and creates new subnodes under linkedResources. These are all the things configured in the .project file. It is then written to the project directory (the .settings folder is already there).

tree.write(args.dest[0] + '/.project', encoding='UTF-8', xml_declaration=True)
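
To see what the resulting XML looks like without having Eclipse or the template at hand, here is a self-contained sketch. The STUB string is a hypothetical, minimal stand-in for template/.project-stub, and the project name and the external path are made up.

```python
import os
import xml.etree.ElementTree as ET

# Hypothetical minimal stand-in for template/.project-stub
STUB = "<projectDescription><name></name></projectDescription>"
root = ET.fromstring(STUB)
root.find("name").text = "blinky"

linked = ET.SubElement(root, "linkedResources")
for path in ["/opt/libs/hal/src"]:  # would come from --external
    link = ET.SubElement(linked, "link")
    ET.SubElement(link, "name").text = os.path.basename(path)
    ET.SubElement(link, "type").text = "2"
    ET.SubElement(link, "location").text = path

# encoding="unicode" makes tostring() return str instead of bytes
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

The link is named after the last path component only, which is the source of the naming problem with many folders called src.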

Now it is time to set up the .cproject file. As its name says, it is related strictly to the C language. It contains things like source and header directories.

tree = ET.parse("template/.cproject-stub")
root = tree.getroot()
cdt = root.find(".//*[@moduleId='cdtBuildSystem']")

for incPath in cdt.findall('.//*/option[@valueType="includePath"]'):
        for path in args.includes:
                opt = ET.SubElement(incPath, 'listOptionValue')
                opt.set('builtIn', 'false')
                opt.set('value', os.path.abspath(path))

for defineNode in cdt.findall('.//*/option[@valueType="definedSymbols"]'):
        for define in args.defines:
                opt = ET.SubElement(defineNode, 'listOptionValue')
                opt.set('builtIn', 'false')
                opt.set('value', define)

sourcesNode = cdt.find('.//*/sourceEntries')
for source in args.sources:
        opt = ET.SubElement(sourcesNode, 'entry')
        opt.set('flags', 'VALUE_WORKSPACE_PATH')
        opt.set('kind', 'sourcePath')
        opt.set('name', source)
for ex in args.external:
        opt = ET.SubElement(sourcesNode, 'entry')
        opt.set('flags', 'VALUE_WORKSPACE_PATH|RESOLVED')
        opt.set('kind', 'sourcePath')
        opt.set('name', os.path.basename(ex))

Just like the .project file, .cproject-stub is opened and parsed into a tree structure with the root node object root. This time all operations are made on subnodes of the storageModule element with the attribute moduleId="cdtBuildSystem". Under this node we can find the configuration of all languages added to the project. In an Eclipse CDT C/C++ project these are usually C, C++, and assembly. For the sake of simplicity, my script loops over the configuration options of all languages (the outer for loops). The nodes it fills in are includePath, definedSymbols, and sourceEntries. Just like in the previous parts, all arguments are translated from a space-separated list to an XML node list that Eclipse understands. I’ve given all of them the same builtIn value (false in the code above), even if that is not accurate for every entry. There is no sense in splitting them. Some attributes and subnodes were reverse-engineered; I think they are self-explanatory (like kind sourcePath).

It is worth noting that the script also adds the external directories, already configured in the .project file, as source paths. The idea is that I pass library source paths as external paths, so they are easily accessible during development. It is really useful, but I had problems with naming. When I wrote the script, I assumed that the source directory name would distinguish it from other folders. That was a mistake: almost all source folders are named src. Fortunately, the script is really simple and you can apply your own naming strategy if this is a problem for you :).
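
If the XPath-like queries look cryptic, the following sketch runs the same expressions on a tiny document. The XML here is a hypothetical, heavily trimmed stand-in for a real .cproject (the tool names and the include paths are invented); only the moduleId and valueType attributes come from the script.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed stand-in for a real .cproject file
STUB = """
<cproject>
  <storageModule moduleId="org.eclipse.cdt.core.settings"/>
  <storageModule moduleId="cdtBuildSystem">
    <configuration>
      <tool name="C compiler">
        <option valueType="includePath"/>
        <option valueType="definedSymbols"/>
      </tool>
      <tool name="Assembler">
        <option valueType="includePath"/>
      </tool>
      <sourceEntries/>
    </configuration>
  </storageModule>
</cproject>
"""

root = ET.fromstring(STUB)

# First element, at any depth, whose moduleId attribute matches
cdt = root.find(".//*[@moduleId='cdtBuildSystem']")

# Every includePath option below cdt, one per tool (language)
inc_nodes = cdt.findall('.//*/option[@valueType="includePath"]')
for node in inc_nodes:
    for path in ['include', 'lib/include']:  # invented example paths
        opt = ET.SubElement(node, 'listOptionValue')
        opt.set('builtIn', 'false')
        opt.set('value', path)

print(ET.tostring(cdt, encoding='unicode'))
```

Because findall returns one option node per tool, the same include paths end up under both the C compiler and the assembler, which is exactly the "all languages" behaviour described above.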

Last but not least, saving .cproject was a little problematic because of its non-standard XML header. It looks like this:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?fileVersion 4.0.0?>

ElementTree parsed the input file well, but there was no way to write it back in the same form using the standard API. This is my workaround:

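# On Python 3, encoding='unicode' makes ET.tostring() return str instead of bytes
with open(args.dest[0] + '/.cproject', 'w', encoding='utf-8') as f:
        f.write('<?xml version="1.0" encoding="UTF-8" standalone="no"?>\n<?fileVersion 4.0.0?>' + ET.tostring(root, encoding='unicode'))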
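
The workaround can also be wrapped in a small helper and checked in isolation. This is only a sketch assuming Python 3; the names CPROJECT_HEADER and serialize_cproject are mine, the one-element document is a placeholder, and I add a newline after the fileVersion line purely for readability.

```python
import xml.etree.ElementTree as ET

# The two header lines copied from the .cproject file shown above
CPROJECT_HEADER = ('<?xml version="1.0" encoding="UTF-8" standalone="no"?>\n'
                   '<?fileVersion 4.0.0?>\n')


def serialize_cproject(root):
    """Return the tree as text, prefixed with the header lines above."""
    # encoding='unicode' makes tostring() return str rather than bytes
    return CPROJECT_HEADER + ET.tostring(root, encoding='unicode')


# Placeholder document standing in for the parsed .cproject-stub
root = ET.fromstring('<cproject><storageModule moduleId="cdtBuildSystem"/></cproject>')
text = serialize_cproject(root)

# The result is still well-formed XML and can be parsed again
reparsed = ET.fromstring(text.encode('utf-8'))
print(text)
```

Keeping the header in one constant means there is a single place to change if a different CDT version writes a different fileVersion.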

The script is ready; its full form is presented at the bottom of the article. Now it is time to use it. As I mentioned above, the best place to call it is a Makefile. It is really easy, and the help text generated by the argparse module comes in handy here:

usage: eclipse-generator.py [-h] [--name NAME] [--includes PATH [PATH ...]]
                            [--sources PATH [PATH ...]]
                            [--external [PATH [PATH ...]]]
                            [--defines [MACRO [MACRO ...]]] [--dest DEST]

Eclipse project generator, creates .project and .cproject files

optional arguments:
  -h, --help            show this help message and exit
  --name NAME           Project name, will be displayed in project explorer
  --includes PATH [PATH ...]
                        Paths to include directories, might be relative
  --sources PATH [PATH ...]
                        Paths to source locations, relative to project only!
  --external [PATH [PATH ...]]
                        Source directories outside project
  --defines [MACRO [MACRO ...]]
                        Defined symbols
  --dest DEST           Project destination folder
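
To see how these options arrive inside the script, here is a small sketch that parses a hand-written argument list instead of sys.argv. It repeats only four of the options, and the values (demo, src, drivers/src) are invented. Note that I use default=['.'] for --dest here, an assumption that keeps args.dest[0] consistent with the nargs=1 case.

```python
import argparse

parser = argparse.ArgumentParser(description='argparse nargs demo')
parser.add_argument('--name', nargs=1, metavar='NAME')
parser.add_argument('--sources', nargs='+', metavar='PATH')
parser.add_argument('--external', nargs='*', metavar='PATH', default=[])
parser.add_argument('--dest', nargs=1, metavar='DEST', default=['.'])

# Invented command line: --external is given without any path
args = parser.parse_args(
    ['--name', 'demo', '--sources', 'src', 'drivers/src', '--external'])

print(args.name)      # nargs=1 yields a one-element list
print(args.sources)   # nargs='+' yields a list with at least one item
print(args.external)  # nargs='*' accepts an empty list
print(args.dest)      # default is used when the option is absent
```

This is why the script reads args.name[0] and args.dest[0]: with nargs=1 argparse always wraps the value in a list.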

Sources and includes are straightforward, but what about the defines parameter? Without it, accurate indexing of C code would be impossible. Yes, we have all the proper include and source paths, but the code usually contains a lot of conditional compilation statements. They depend on the version of the language standard, on POSIX conformance, and so on: all the things that are handled by the toolchain. Luckily, gcc has an option that shows all built-in defined macros. The command we need is

gcc -dM -E - < /dev/null

The -dM option outputs all defines known to the preprocessor, the -E option stops the build process after preprocessing, and - tells gcc to read code from stdin, which is redirected from /dev/null. We are fetching built-in macros only, so passing C files is not necessary. Without the /dev/null redirection, the program would stop and wait for input.

Unfortunately, that’s not all. gcc gives us all #define directives in the format that C understands. That is not what our script wants. But with some sed knowledge we can easily handle this problem:

gcc -dM -E - < /dev/null | cut -c 9- | sed 's/ /=/' | sed 's/ /\\ /g'

The first element in the pipe, cut, strips the #define directive. sed 's/ /=/' substitutes the first occurrence of a space (the one after the macro name) with the = character (this is what CDT expects). The last sed call prefixes all remaining spaces with a backslash; without it, a space would be treated as a separator.
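
For readers who prefer Python to sed, the first two stages of the pipe can be sketched as a plain function. This is not part of my script, just an illustration; the function name defines_to_cdt and the sample macro values are made up. The space-escaping stage is left out on purpose, because it only matters when the result passes through a shell.

```python
def defines_to_cdt(preprocessor_output):
    """Convert '#define NAME VALUE' lines into 'NAME=VALUE' strings."""
    symbols = []
    for line in preprocessor_output.splitlines():
        if line.startswith('#define '):
            body = line[len('#define '):]              # like: cut -c 9-
            symbols.append(body.replace(' ', '=', 1))  # like: sed 's/ /=/'
    return symbols


# Made-up sample in the format printed by gcc -dM -E
sample = (
    '#define __STDC__ 1\n'
    '#define __VERSION__ "9.3.0 (demo)"\n'
    '#define __SIZE_TYPE__ long unsigned int\n'
)

symbols = defines_to_cdt(sample)
for symbol in symbols:
    print(symbol)
```

Only the first space is replaced, so values that contain spaces themselves, such as long unsigned int, stay intact after the = sign.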

With all those pieces, joining them together in a Makefile should be simple, but there is one problem with the last pipelined command. Make has problems with parentheses in its variables. This is why our command inside the Makefile is not as nice as it was:

LEFTPAREN := (
RIGHTPAREN := )
DEFINES := $(shell gcc -dM -E - < /dev/null | cut -c 9- | sed 's/ /=/' | sed 's/$(LEFTPAREN)/\\$(LEFTPAREN)/g' | sed 's/$(RIGHTPAREN)/\\$(RIGHTPAREN)/g' | sed 's/ /\\ /g')

And that’s all. I hope my script will save you plenty of time on Eclipse configuration, just as it did for me. Any enhancements and comments are welcome. Here you can find the whole working script. The .settings directory and the .(c)project stubs can be easily extracted from a bare CDT project.

import xml.etree.ElementTree as ET
import os
import argparse

parser = argparse.ArgumentParser(description='Eclipse project generator, creates .project and .cproject files')
parser.add_argument('--name', nargs=1, metavar='NAME', help='Project name, will be displayed in project explorer')
parser.add_argument('--includes', nargs='+', metavar='PATH', help='Paths to include directories, might be relative')
parser.add_argument('--sources', nargs='+', metavar='PATH', help='Paths to source locations, relative to project only!')
parser.add_argument('--external', nargs='*', metavar='PATH', help='Source directories outside project', default=[])
parser.add_argument('--defines', nargs='*', metavar="MACRO", help='Defined symbols', default=[])
# default is a one-element list so that args.dest[0] matches the nargs=1 case
parser.add_argument('--dest', nargs=1, metavar='DEST', help='Project destination folder', default=['.'])
args = parser.parse_args()

print("Generating .project file...")

tree = ET.parse("template/.project-stub")
root = tree.getroot()
projectNameNode = root.find('name')
projectNameNode.text = args.name[0]

linkedResources = ET.SubElement(root, 'linkedResources')
for ex in args.external:
	opt = ET.SubElement(linkedResources, 'link')
	ET.SubElement(opt, 'name').text = os.path.basename(ex)
	ET.SubElement(opt, 'type').text = '2'
	ET.SubElement(opt, 'location').text = ex

tree.write(args.dest[0] + '/.project', 'UTF-8', True)

print("Generating .cproject file...")
tree = ET.parse("template/.cproject-stub")
root = tree.getroot()
cdt = root.find(".//*[@moduleId='cdtBuildSystem']")

for incPath in cdt.findall('.//*/option[@valueType="includePath"]'):
	for path in args.includes:
		opt = ET.SubElement(incPath, 'listOptionValue')
		opt.set('builtIn', 'false')
		opt.set('value', os.path.abspath(path))

for defineNode in cdt.findall('.//*/option[@valueType="definedSymbols"]'):
	for define in args.defines:
		opt = ET.SubElement(defineNode, 'listOptionValue')
		opt.set('builtIn', 'false')
		opt.set('value', define)

sourcesNode = cdt.find('.//*/sourceEntries')
for source in args.sources:
	opt = ET.SubElement(sourcesNode, 'entry')
	opt.set('flags', 'VALUE_WORKSPACE_PATH')
	opt.set('kind', 'sourcePath')
	opt.set('name', source)
for ex in args.external:
	opt = ET.SubElement(sourcesNode, 'entry')
	opt.set('flags', 'VALUE_WORKSPACE_PATH|RESOLVED')
	opt.set('kind', 'sourcePath')
	opt.set('name', os.path.basename(ex))

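# On Python 3, encoding='unicode' makes ET.tostring() return str instead of bytes
with open(args.dest[0] + '/.cproject', 'w', encoding='utf-8') as f:
	f.write('<?xml version="1.0" encoding="UTF-8" standalone="no"?>\n<?fileVersion 4.0.0?>' + ET.tostring(root, encoding='unicode'))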
