Tag: #assembly

Following the kernel #4 – MLO part 1

We have finished last period on jumping into reset vector of the MLO code. If you haven’t done it yet, I encourage you to clone U-Boot repository from GitHub https://github.com/u-boot/u-boot. I’m working on the newest master commit a6ba59583abd4085db5ab00358d751f175e2a451. As I wrote, U-Boot is configured via KBuild with default am335x set.

Reset vectors of MLO are defined in common for U-Boot and MLO (TPL and SPL) place – arch/arm/lib/vectors.S file. If you were careful enough, your debugger should stop on the line 87:

/*
 *************************************************************************
 *
 * Exception vectors as described in ARM reference manuals
 *
 * Uses indirect branch to allow reaching handlers anywhere in memory.
 *
 *************************************************************************
 */

_start:
#ifdef CONFIG_SYS_DV_NOR_BOOT_CFG
	.word	CONFIG_SYS_DV_NOR_BOOT_CFG
#endif
	ARM_VECTORS // <--- right here
#endif /* !defined(CONFIG_ENABLE_ARM_SOC_BOOT0_HOOK) */

ARM_VECTORS is the macro defined few lines above (21-34).

/*
 * A macro to allow insertion of an ARM exception vector either
 * for the non-boot0 case or by a boot0-header.
 */
        .macro ARM_VECTORS
#ifdef CONFIG_ARCH_K3
	ldr     pc, _reset
#else
	b	reset
#endif
	ldr	pc, _undefined_instruction
	ldr	pc, _software_interrupt
	ldr	pc, _prefetch_abort
	ldr	pc, _data_abort
	ldr	pc, _not_used
	ldr	pc, _irq
	ldr	pc, _fiq
	.endm

Our architecture is OMAP2+ (CONFIG_ARCH_OMAP2PLUS – check it with make menuconfig for practice), so our code includes b reset part, and this is the instruction placed under our first breakpoint. We are jumping into the reset routine. This one is defined in arch/arm/cpu/armv7/start.S file. Reset routine starts in line 38.

reset:
	/* Allow the board to save important registers */
	b	save_boot_params
save_boot_params_ret:
#ifdef CONFIG_ARMV7_LPAE
/*
 * check for Hypervisor support
 */
	mrc	p15, 0, r0, c0, c1, 1		@ read ID_PFR1
	and	r0, r0, #CPUID_ARM_VIRT_MASK	@ mask virtualization bits
	cmp	r0, #(1 << CPUID_ARM_VIRT_SHIFT)
	beq	switch_to_hypervisor
switch_to_hypervisor_ret:
#endif
	/*
	 * disable interrupts (FIQ and IRQ), also set the cpu to SVC32 mode,
	 * except if in HYP mode already
	 */
	mrs	r0, cpsr
	and	r1, r0, #0x1f		@ mask mode bits
	teq	r1, #0x1a		@ test for HYP mode
	bicne	r0, r0, #0x1f		@ clear all mode bits
	orrne	r0, r0, #0x13		@ set SVC mode
	orr	r0, r0, #0xc0		@ disable FIQ and IRQ
	msr	cpsr,r0

/*
 * Setup vector:
 * (OMAP4 spl TEXT_BASE is not 32 byte aligned.
 * Continue to use ROM code vector only in OMAP4 spl)
 */
#if !(defined(CONFIG_OMAP44XX) && defined(CONFIG_SPL_BUILD))
	/* Set V=0 in CP15 SCTLR register - for VBAR to point to vector */
	mrc	p15, 0, r0, c1, c0, 0	@ Read CP15 SCTLR Register
	bic	r0, #CR_V		@ V = 0
	mcr	p15, 0, r0, c1, c0, 0	@ Write CP15 SCTLR Register

#ifdef CONFIG_HAS_VBAR
	/* Set vector address in CP15 VBAR register */
	ldr	r0, =_start
	mcr	p15, 0, r0, c12, c0, 0	@Set VBAR
#endif
#endif

	/* the mask ROM code should have PLL and others stable */
#ifndef CONFIG_SKIP_LOWLEVEL_INIT
#ifdef CONFIG_CPU_V7A
	bl	cpu_init_cp15
#endif
#ifndef CONFIG_SKIP_LOWLEVEL_INIT_ONLY
	bl	cpu_init_crit
#endif
#endif

	bl	_main

As we can see, here we have more code to focus on. This code prepares CPU before execution and call _main function at last stage. First thing is jumping into save_boot_params routine. This one is defined in arch/arm/mach-omap2/lowlevel_init.S.

ENTRY(save_boot_params)
	ldr	r1, =OMAP_SRAM_SCRATCH_BOOT_PARAMS
	str	r0, [r1]
	b	save_boot_params_ret
ENDPROC(save_boot_params)

Do you remember part 1 of this series? One of the last thing done by the ROM Code is passing boot device information via pointer passed to r0 register, just before running MLO. As we can see, this pointer is stored under OMAP_SRAM_SCRATCH_BOOT_PARAMS address. r0 for sure will be utilized later, so this is the first thing done. After that, flow is going back to rest of reset.

CONFIG_ARMV7_LPAE is turned off, so we are going to line 56. The Current Program State Register is set. If processor is in Hypervisor Mode it is left unchanged, otherwise Supervisor mode is set. After that all interrupts are masked with that register, no matter which mode is running. Supervisor mode (SVC) is state of processor proper for running kernel mode, this is worth noting. If you want to learn more about ARM processor modes, examine Reference Manual of ARMv7 architecture.

Right after that more sophisticated things are done. MCR and MRC instructions are used. These are instructions communicating with coprocessors attached on the silicon. There are maximum 16 coprocessors – p15 is so called Control Processor , described here https://developer.arm.com/documentation/ddi0360/e/control-coprocessor-cp15/about-control-coprocessor-cp15?lang=en. Instruction mnemonics supported by it are extensively described here https://developer.arm.com/documentation/ddi0360/e/control-coprocessor-cp15/summary-of-cp15-instructions. These sites refer to not our architecture – some things might be not the same as in our chip, but they are well explained.

As you could read in above links, first instruction mrc p15, 0, r0, c1, c0, 0 loads current value of MMU-related Control Register into local, core register r0. Then program clears V-bit (with bic instruction), which means, that exception vectors are kept in 0x00000000 address range and write it back to the Control Register with mcr.

Next instruction mcr p15, 0, r0, c12, c0, 0 sets the Vector Base Address Register inside Control Processor to the address of _start. This symbol was defined in our first listing in this article. Now all the exceptions caught by processor should be handled by the Vector defined in MLO (described in vector.s).

Before we jump to the long-awaited C code there are couple of assembly routines which must be executed. The first one is cpu_init_cp15. This one is defined in file arch/arm/cpu/armv7/start.S, line 143. As you can see, there are a lot of MCR transfers, I will try to explain general idea behind the whole procedure without analyzing every single mnemonic.

The name of routine tells us, that once again coprocessor number 15 will be initialized. As the comments says line 143-148 invalidates all data saved in L1 cache, which after restart may have wrong values. These are TLBs (Translation Lookaside Buffers – records which speed up resolving virtual address into physical memory), instruction caches which might buffer adjacent blocks of assembly code, to speed up instruction fetches, and lastly invalidation of BP (Branch Prediction) arrays is done. Branch prediction is another mechanism of speeding up pipeline processors to fetch the right instruction, even if it is not the next mnemonic of our assembly code. If you want learn more in this area, I encourage to read wiki article https://en.wikipedia.org/wiki/Branch_predictor.

After invalidating caches, Data Synchronization Barrier is done to explicitly wait for the end of all memory operations, and ISB (Instruction Synchronization Barrier), described in documentation as flushing prefetch buffer is done. As documentation says:

The Flush Prefetch Buffer instruction flushes the pipeline in the processor, so that all instructions following the pipeline flush are fetched from memory, (including the instruction cache), after the instruction has been completed. Combined with Data Synchronization Barrier, and potentially a memory barrier, this ensures that any instructions written by the processor are executed. This guarantee is required as part of the mechanism for handling self-modifying code. The execution of a Data Synchronization Barrier instruction and the invalidation of the Instruction Cache and Branch Target Cache are also required for the handling of self-modifying code. The Flush Prefetch Buffer is guaranteed to perform this function, while alternative methods of performing the same task, such as a branch instruction, can be optimized in the hardware to avoid the pipeline flush (for example, by using a branch predictor).

Next bunch of assembly is commented as disabling MMU stuff and caches. Once again c1,c0 operation (Read Control Register) is made, than V-bit (once again), and CAM bits are cleared. Then A (bit 1) and Z (bit 11) are set. V-bit was described, previously. I don’t know why it’s repeated. CAM bits (https://developer.arm.com/documentation/ddi0360/e/control-coprocessor-cp15/register-descriptions/c1–control-register?lang=en) disables data Cache, Alignment fault checking and MMU. According to the hash-defines, we can turn on or turn off instruction cache in this place. In our configuration it is turned on, so setting bit with orr instruction is made.

After that, bunch of Errata stuff is done. Fortunately, none of these are applied in our build beside CONFIG_ARM_CORTEX_A8_CVE_2017_5715. But before that more interesting operations are done

	mov	r5, lr			@ Store my Caller
	mrc	p15, 0, r1, c0, c0, 0	@ r1 has Read Main ID Register (MIDR)
	mov	r3, r1, lsr #20		@ get variant field
	and	r3, r3, #0xf		@ r3 has CPU variant
	and	r4, r1, #0xf		@ r4 has CPU revision
	mov	r2, r3, lsl #4		@ shift variant field for combined value
	orr	r2, r4, r2		@ r2 has combined CPU variant + revision

/* Early stack for ERRATA that needs into call C code */
#if defined(CONFIG_SPL_BUILD) && defined(CONFIG_SPL_STACK)
	ldr	r0, =(CONFIG_SPL_STACK)
#else
	ldr	r0, =(CONFIG_SYS_INIT_SP_ADDR)
#endif
	bic	r0, r0, #7	/* 8-byte alignment for ABI compliance */
	mov	sp, r0

Once again CP15 is used to read our silicon ID. CPU variant and revision is combined and stored in r2 register. After that, according to configuration stack pointer address is chosen – in our case it is 0x4030ff10 (the value of CONFIG_SYS_INIT_SP_ADDR). Check this address in part #1 of my series. You will find out, that it’s one of the highest accessible address on Internal SRAM memory. This address is aligned to 8-bytes with clearing 3 least significant bits. Finally stack pointer is initialized with movopcode. Wohoo! Doing it is last thing prepared by cpu_init_cp15 routine, so we are going back to the reset routine and jump to another place, called cpu_init_crit.

ENTRY(cpu_init_crit)
	/*
	 * Jump to board specific initialization...
	 * The Mask ROM will have already initialized
	 * basic memory. Go here to bump up clock rate and handle
	 * wake up conditions.
	 */
	b	lowlevel_init		@ go setup pll,mux,memory
ENDPROC(cpu_init_crit)

It is the place, where more specialized initialization might be done. Our lowlevel_init procedure is implemented in arch/arm/cpu/armv7/lowlevel_init.S

WEAK(s_init)
	bx	lr
ENDPROC(s_init)
.popsection

.pushsection .text.lowlevel_init, "ax"
WEAK(lowlevel_init)
	/*
	 * Setup a temporary stack. Global data is not available yet.
	 */
#if defined(CONFIG_SPL_BUILD) && defined(CONFIG_SPL_STACK)
	ldr	sp, =CONFIG_SPL_STACK
#else
	ldr	sp, =CONFIG_SYS_INIT_SP_ADDR
#endif
	bic	sp, sp, #7 /* 8-byte alignment for ABI compliance */
#ifdef CONFIG_SPL_DM
	mov	r9, #0
#else
	/*
	 * Set up global data for boards that still need it. This will be
	 * removed soon.
	 */
#ifdef CONFIG_SPL_BUILD
	ldr	r9, =gdata
#else
	sub	sp, sp, #GD_SIZE
	bic	sp, sp, #7
	mov	r9, sp
#endif
#endif
	/*
	 * Save the old lr(passed in ip) and the current lr to stack
	 */
	push	{ip, lr}

	/*
	 * Call the very early init function. This should do only the
	 * absolute bare minimum to get started. It should not:
	 *
	 * - set up DRAM
	 * - use global_data
	 * - clear BSS
	 * - try to start a console
	 *
	 * For boards with SPL this should be empty since SPL can do all of
	 * this init in the SPL board_init_f() function which is called
	 * immediately after this.
	 */
	bl	s_init
	pop	{ip, pc}
ENDPROC(lowlevel_init)

arch/armv7 directory contains generic implementation of low-level init. All routines are defined as so-called weak symbols. If linker will find other with the same name, it will leave weak symbol not used and choose the other. Basically it doesn’t implement nothing new. Once again stack pointer is initialized. CONFIG_SPL_DM is defined (Driver Model for SPL), so r9 is zeroed.

Our program stack is ready, so we can use it with a pushinstruction. Link-register and instruction pointer is passed, after that branch with copying return address to link-register is done. We are jumping to s_init, but as we can see in the listing, it is empty.

Finally after returing to resetroutine, we are jumping to the _main routine, which is really, really close to the C-world. It is defined in arch/arm/lib/crt0.S. As we can see, it is the code initializing C environment (clearing BSS, preparing heap etc. We will cover it in the next part of the series.