Following the kernel #4 – MLO part 1
Last time we finished by jumping to the reset vector of the MLO code. If you haven’t done it yet, I encourage you to clone the U-Boot repository from GitHub: https://github.com/u-boot/u-boot. I’m working on master commit a6ba59583abd4085db5ab00358d751f175e2a451, the newest one at the time of writing. As I wrote before, U-Boot is configured via KBuild with the default am335x configuration.
The reset vectors of MLO are defined in a place shared by U-Boot and MLO (TPL and SPL): the arch/arm/lib/vectors.S file. If you were careful enough, your debugger should stop at line 87:
/*
*************************************************************************
*
* Exception vectors as described in ARM reference manuals
*
* Uses indirect branch to allow reaching handlers anywhere in memory.
*
*************************************************************************
*/
_start:
#ifdef CONFIG_SYS_DV_NOR_BOOT_CFG
.word CONFIG_SYS_DV_NOR_BOOT_CFG
#endif
ARM_VECTORS // <--- right here
#endif /* !defined(CONFIG_ENABLE_ARM_SOC_BOOT0_HOOK) */
ARM_VECTORS is the macro defined a few lines above (lines 21-34).
/*
* A macro to allow insertion of an ARM exception vector either
* for the non-boot0 case or by a boot0-header.
*/
.macro ARM_VECTORS
#ifdef CONFIG_ARCH_K3
ldr pc, _reset
#else
b reset
#endif
ldr pc, _undefined_instruction
ldr pc, _software_interrupt
ldr pc, _prefetch_abort
ldr pc, _data_abort
ldr pc, _not_used
ldr pc, _irq
ldr pc, _fiq
.endm
Our architecture is OMAP2+ (CONFIG_ARCH_OMAP2PLUS; check it with make menuconfig for practice), not K3, so our code includes the b reset variant, and this is the instruction sitting under our first breakpoint. We branch to the reset routine, which is defined in the arch/arm/cpu/armv7/start.S file and starts at line 38.
reset:
/* Allow the board to save important registers */
b save_boot_params
save_boot_params_ret:
#ifdef CONFIG_ARMV7_LPAE
/*
* check for Hypervisor support
*/
mrc p15, 0, r0, c0, c1, 1 @ read ID_PFR1
and r0, r0, #CPUID_ARM_VIRT_MASK @ mask virtualization bits
cmp r0, #(1 << CPUID_ARM_VIRT_SHIFT)
beq switch_to_hypervisor
switch_to_hypervisor_ret:
#endif
/*
* disable interrupts (FIQ and IRQ), also set the cpu to SVC32 mode,
* except if in HYP mode already
*/
mrs r0, cpsr
and r1, r0, #0x1f @ mask mode bits
teq r1, #0x1a @ test for HYP mode
bicne r0, r0, #0x1f @ clear all mode bits
orrne r0, r0, #0x13 @ set SVC mode
orr r0, r0, #0xc0 @ disable FIQ and IRQ
msr cpsr,r0
/*
* Setup vector:
* (OMAP4 spl TEXT_BASE is not 32 byte aligned.
* Continue to use ROM code vector only in OMAP4 spl)
*/
#if !(defined(CONFIG_OMAP44XX) && defined(CONFIG_SPL_BUILD))
/* Set V=0 in CP15 SCTLR register - for VBAR to point to vector */
mrc p15, 0, r0, c1, c0, 0 @ Read CP15 SCTLR Register
bic r0, #CR_V @ V = 0
mcr p15, 0, r0, c1, c0, 0 @ Write CP15 SCTLR Register
#ifdef CONFIG_HAS_VBAR
/* Set vector address in CP15 VBAR register */
ldr r0, =_start
mcr p15, 0, r0, c12, c0, 0 @Set VBAR
#endif
#endif
/* the mask ROM code should have PLL and others stable */
#ifndef CONFIG_SKIP_LOWLEVEL_INIT
#ifdef CONFIG_CPU_V7A
bl cpu_init_cp15
#endif
#ifndef CONFIG_SKIP_LOWLEVEL_INIT_ONLY
bl cpu_init_crit
#endif
#endif
bl _main
As we can see, there is more code to focus on here. It prepares the CPU for execution and, as its last step, calls the _main function. The first thing it does is branch to the save_boot_params routine, which is defined in arch/arm/mach-omap2/lowlevel_init.S.
ENTRY(save_boot_params)
ldr r1, =OMAP_SRAM_SCRATCH_BOOT_PARAMS
str r0, [r1]
b save_boot_params_ret
ENDPROC(save_boot_params)
Do you remember part 1 of this series? One of the last things done by the ROM Code, just before running MLO, is passing the boot device information through a pointer placed in the r0 register. As we can see, this pointer is stored at the OMAP_SRAM_SCRATCH_BOOT_PARAMS address. r0 will certainly be reused for other purposes later, so saving it is the very first thing done. After that, the flow goes back to the rest of reset.
CONFIG_ARMV7_LPAE is turned off, so we skip to line 56. There the Current Program Status Register (CPSR) is updated. If the processor is in Hypervisor (HYP) mode, the mode bits are left unchanged; otherwise Supervisor (SVC) mode is set. After that, FIQ and IRQ interrupts are masked in the same register, no matter which mode is running. It is worth noting that Supervisor mode is the processor state intended for running kernel-mode code. If you want to learn more about ARM processor modes, consult the ARMv7 Architecture Reference Manual.
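To make the CPSR update easier to follow, here is a small C model of the mrs/and/teq/bicne/orrne/orr sequence from the listing. It is only a sketch: the function and macro names are mine, not U-Boot's, and it works on a plain integer instead of the real register.

```c
#include <stdint.h>

/* Names below are illustrative, they do not come from U-Boot. */
#define CPSR_MODE_MASK    0x1fu  /* mode bits M[4:0] */
#define CPSR_MODE_SVC     0x13u  /* Supervisor mode */
#define CPSR_MODE_HYP     0x1au  /* Hypervisor mode */
#define CPSR_IRQ_FIQ_MASK 0xc0u  /* I (bit 7) and F (bit 6) */

/* Returns the value that the msr instruction would write back to CPSR. */
uint32_t cpsr_after_reset(uint32_t cpsr)
{
    uint32_t mode = cpsr & CPSR_MODE_MASK;   /* and r1, r0, #0x1f */

    if (mode != CPSR_MODE_HYP) {             /* teq r1, #0x1a */
        cpsr &= ~CPSR_MODE_MASK;             /* bicne r0, r0, #0x1f */
        cpsr |= CPSR_MODE_SVC;               /* orrne r0, r0, #0x13 */
    }
    cpsr |= CPSR_IRQ_FIQ_MASK;               /* orr r0, r0, #0xc0 */
    return cpsr;
}
```

For example, a CPSR of 0x10 (User mode, interrupts enabled) becomes 0xd3, while 0x1a (HYP mode) only gets the two mask bits and becomes 0xda.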
Right after that, more sophisticated things are done using the MCR and MRC instructions. These instructions transfer data between core registers and the coprocessors attached on the silicon. There can be at most 16 coprocessors; p15 is the so-called Control Coprocessor, described here https://developer.arm.com/documentation/ddi0360/e/control-coprocessor-cp15/about-control-coprocessor-cp15?lang=en. The instructions it supports are extensively described here https://developer.arm.com/documentation/ddi0360/e/control-coprocessor-cp15/summary-of-cp15-instructions. These pages do not describe our exact architecture, so some details might differ from our chip, but they are well explained.
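If the long operand lists of MRC and MCR look cryptic, it may help to see how they end up in the instruction word. The sketch below packs the operands the way I understand the ARM-state (A32) encoding, with the condition fixed to "always". The function name is hypothetical and the code is an illustration, not something taken from U-Boot.

```c
#include <stdint.h>

/*
 * Encode "mrc/mcr pN, opc1, Rt, CRn, CRm, opc2" as a 32-bit ARM-state
 * instruction word. is_mrc != 0 selects MRC (coprocessor -> core register),
 * is_mrc == 0 selects MCR (core register -> coprocessor).
 */
uint32_t encode_coproc_xfer(int is_mrc, unsigned coproc, unsigned opc1,
                            unsigned rt, unsigned crn, unsigned crm,
                            unsigned opc2)
{
    return (0xeu << 28)                    /* condition field: always */
         | (0xeu << 24)                    /* fixed bits of MCR/MRC */
         | ((opc1 & 0x7u) << 21)
         | ((is_mrc ? 1u : 0u) << 20)      /* direction bit */
         | ((crn & 0xfu) << 16)            /* primary coprocessor register */
         | ((rt & 0xfu) << 12)             /* core register */
         | ((coproc & 0xfu) << 8)          /* coprocessor number, 15 for p15 */
         | ((opc2 & 0x7u) << 5)
         | (1u << 4)                       /* fixed bit of MCR/MRC */
         | (crm & 0xfu);                   /* secondary coprocessor register */
}
```

With these assumptions, mrc p15, 0, r0, c1, c0, 0 encodes to 0xee110f10 and the matching mcr differs only in the direction bit (0xee010f10), which you can compare against the disassembly in your debugger.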
As you could read in the links above, the first instruction, mrc p15, 0, r0, c1, c0, 0, loads the current value of the MMU-related Control Register (SCTLR) into the core register r0. Then the program clears the V bit (with the bic instruction), which selects the low exception vectors (by default at address 0x00000000, relocatable through VBAR), and writes the result back to the Control Register with mcr.
The next instruction, mcr p15, 0, r0, c12, c0, 0, sets the Vector Base Address Register (VBAR) inside the Control Coprocessor to the address of _start. This symbol was defined in the first listing of this article. From now on, all exceptions caught by the processor should be handled by the vectors defined in MLO (described in vectors.S).
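The comment in the listing mentions that the OMAP4 SPL TEXT_BASE is not 32-byte aligned. The reason becomes visible once you count the entries of ARM_VECTORS: eight vectors of one 32-bit word each. Below is a minimal sketch of that arithmetic; the enum and function names are mine, and the base address used in the usage note is only an example value.

```c
#include <stdint.h>

#define ARM_VECTOR_COUNT      8u   /* entries in the ARM_VECTORS macro */
#define ARM_VECTOR_SIZE       4u   /* one 32-bit instruction per entry */
#define ARM_VECTOR_TABLE_SIZE (ARM_VECTOR_COUNT * ARM_VECTOR_SIZE) /* 32 */

/* Same order as the entries of ARM_VECTORS. */
enum arm_vector {
    VEC_RESET = 0,
    VEC_UNDEFINED_INSTRUCTION,
    VEC_SOFTWARE_INTERRUPT,
    VEC_PREFETCH_ABORT,
    VEC_DATA_ABORT,
    VEC_NOT_USED,
    VEC_IRQ,
    VEC_FIQ
};

/* A vector base is usable for VBAR only if its low five bits are zero. */
int vbar_is_aligned(uint32_t base)
{
    return (base & (ARM_VECTOR_TABLE_SIZE - 1u)) == 0;
}

/* Address the core jumps to for a given exception. */
uint32_t vector_address(uint32_t base, enum arm_vector v)
{
    return base + (uint32_t)v * ARM_VECTOR_SIZE;
}
```

Assuming a base of 0x402f0400, an IRQ would land at 0x402f0418, while a base such as 0x402f0410 would be rejected by vbar_is_aligned().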
Before we jump to the long-awaited C code, there are a couple of assembly routines which must be executed. The first one is cpu_init_cp15, defined in arch/arm/cpu/armv7/start.S at line 143. As you can see, there are a lot of MCR transfers; I will try to explain the general idea behind the whole procedure without analyzing every single mnemonic.
The name of the routine tells us that coprocessor 15 will be initialized once again. As the comment says, lines 143-148 invalidate the contents of the L1 caches, which may hold stale values after a restart. This covers the TLBs (Translation Lookaside Buffers, records which speed up translating virtual addresses into physical ones) and the instruction cache, which buffers adjacent blocks of code to speed up instruction fetches; lastly, the BP (Branch Prediction) array is invalidated. Branch prediction is another mechanism that speeds up pipelined processors by fetching the right instruction early, even if it is not the next mnemonic in our assembly code. If you want to learn more about this area, I encourage you to read the wiki article https://en.wikipedia.org/wiki/Branch_predictor.
After invalidating the caches, a DSB (Data Synchronization Barrier) is issued to explicitly wait for all memory operations to complete, followed by an ISB (Instruction Synchronization Barrier), which the documentation describes as flushing the prefetch buffer. As the documentation says:
The Flush Prefetch Buffer instruction flushes the pipeline in the processor, so that all instructions following the pipeline flush are fetched from memory, (including the instruction cache), after the instruction has been completed. Combined with Data Synchronization Barrier, and potentially a memory barrier, this ensures that any instructions written by the processor are executed. This guarantee is required as part of the mechanism for handling self-modifying code. The execution of a Data Synchronization Barrier instruction and the invalidation of the Instruction Cache and Branch Target Cache are also required for the handling of self-modifying code. The Flush Prefetch Buffer is guaranteed to perform this function, while alternative methods of performing the same task, such as a branch instruction, can be optimized in the hardware to avoid the pipeline flush (for example, by using a branch predictor).
The next bunch of assembly is commented as disabling MMU stuff and caches. Once again the c1, c0 operation (Read Control Register) is performed, then the V bit (once again) and the CAM bits are cleared. Then A (bit 1) and Z (bit 11, branch prediction enable) are set. The V bit was described previously; I don’t know why it is cleared a second time. Clearing the CAM bits (https://developer.arm.com/documentation/ddi0360/e/control-coprocessor-cp15/register-descriptions/c1–control-register?lang=en) disables the data cache, alignment fault checking and the MMU; since A is set again right afterwards, alignment fault checking ends up enabled. Depending on the preprocessor defines, the instruction cache can be turned on or off at this point. In our configuration it is turned on, so the bit is set with the orr instruction.
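The order of these bit operations matters (A is cleared together with C and M, then set again), so here is a hedged C model of the sequence as I described it. The macro and function names are my own, the bit positions follow the ARMv7 SCTLR layout, and the icache_on parameter stands in for the preprocessor switch.

```c
#include <stdint.h>

/* SCTLR bits touched by this part of cpu_init_cp15 (names are mine). */
#define SCTLR_M (1u << 0)   /* MMU enable */
#define SCTLR_A (1u << 1)   /* alignment fault checking */
#define SCTLR_C (1u << 2)   /* data cache enable */
#define SCTLR_Z (1u << 11)  /* branch prediction enable */
#define SCTLR_I (1u << 12)  /* instruction cache enable */
#define SCTLR_V (1u << 13)  /* high exception vectors */

/* Returns the value written back to the Control Register. */
uint32_t sctlr_after_cpu_init_cp15(uint32_t sctlr, int icache_on)
{
    sctlr &= ~SCTLR_V;                          /* low vectors */
    sctlr &= ~(SCTLR_C | SCTLR_A | SCTLR_M);    /* clear the CAM bits */
    sctlr |= SCTLR_A;                           /* alignment checks back on */
    sctlr |= SCTLR_Z;                           /* branch prediction on */
    if (icache_on)
        sctlr |= SCTLR_I;                       /* our configuration */
    else
        sctlr &= ~SCTLR_I;
    return sctlr;
}
```

Starting from 0x2005 (V, C and M set) with the instruction cache enabled, the model yields 0x1802: only A, Z and I remain set.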
After that, a bunch of errata workarounds follow. Fortunately, none of them applies to our build besides CONFIG_ARM_CORTEX_A8_CVE_2017_5715. But before that, more interesting operations are performed:
mov r5, lr @ Store my Caller
mrc p15, 0, r1, c0, c0, 0 @ r1 has Read Main ID Register (MIDR)
mov r3, r1, lsr #20 @ get variant field
and r3, r3, #0xf @ r3 has CPU variant
and r4, r1, #0xf @ r4 has CPU revision
mov r2, r3, lsl #4 @ shift variant field for combined value
orr r2, r4, r2 @ r2 has combined CPU variant + revision
/* Early stack for ERRATA that needs into call C code */
#if defined(CONFIG_SPL_BUILD) && defined(CONFIG_SPL_STACK)
ldr r0, =(CONFIG_SPL_STACK)
#else
ldr r0, =(CONFIG_SYS_INIT_SP_ADDR)
#endif
bic r0, r0, #7 /* 8-byte alignment for ABI compliance */
mov sp, r0
Once again CP15 is used, this time to read our silicon ID. The CPU variant and revision are combined and stored in the r2 register. After that, the stack pointer address is chosen according to the configuration; in our case it is 0x4030ff10 (the value of CONFIG_SYS_INIT_SP_ADDR). Check this address against part #1 of my series. You will find that it is one of the highest accessible addresses in the internal SRAM. The address is aligned to 8 bytes by clearing its 3 least significant bits. Finally, the stack pointer is initialized with the mov opcode. Woohoo! This is the last piece of preparation done by the cpu_init_cp15 routine, so we go back to the reset routine and branch to another place, called cpu_init_crit.
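The ID decoding and the stack alignment from cpu_init_cp15 boil down to a few shifts and masks, which can be sketched in C as follows. The function names are hypothetical. The MIDR value in the usage note is an assumption on my side (what I would expect from a Cortex-A8 r3p2), so read the real one from your own board.

```c
#include <stdint.h>

/* mov r3, r1, lsr #20 ; and r3, r3, #0xf */
uint32_t midr_variant(uint32_t midr)
{
    return (midr >> 20) & 0xfu;
}

/* and r4, r1, #0xf */
uint32_t midr_revision(uint32_t midr)
{
    return midr & 0xfu;
}

/* mov r2, r3, lsl #4 ; orr r2, r4, r2 */
uint32_t midr_variant_revision(uint32_t midr)
{
    return (midr_variant(midr) << 4) | midr_revision(midr);
}

/* bic r0, r0, #7: round an address down to an 8-byte boundary */
uint32_t align_down_8(uint32_t addr)
{
    return addr & ~7u;
}
```

For an assumed MIDR of 0x413fc082 the combined value in r2 would be 0x32 (variant 3, revision 2). The stack address 0x4030ff10 is already 8-byte aligned, so align_down_8() leaves it unchanged.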
ENTRY(cpu_init_crit)
/*
* Jump to board specific initialization...
* The Mask ROM will have already initialized
* basic memory. Go here to bump up clock rate and handle
* wake up conditions.
*/
b lowlevel_init @ go setup pll,mux,memory
ENDPROC(cpu_init_crit)
This is the place where more specialized initialization may be done. Our lowlevel_init procedure is implemented in arch/arm/cpu/armv7/lowlevel_init.S:
WEAK(s_init)
bx lr
ENDPROC(s_init)
.popsection
.pushsection .text.lowlevel_init, "ax"
WEAK(lowlevel_init)
/*
* Setup a temporary stack. Global data is not available yet.
*/
#if defined(CONFIG_SPL_BUILD) && defined(CONFIG_SPL_STACK)
ldr sp, =CONFIG_SPL_STACK
#else
ldr sp, =CONFIG_SYS_INIT_SP_ADDR
#endif
bic sp, sp, #7 /* 8-byte alignment for ABI compliance */
#ifdef CONFIG_SPL_DM
mov r9, #0
#else
/*
* Set up global data for boards that still need it. This will be
* removed soon.
*/
#ifdef CONFIG_SPL_BUILD
ldr r9, =gdata
#else
sub sp, sp, #GD_SIZE
bic sp, sp, #7
mov r9, sp
#endif
#endif
/*
* Save the old lr(passed in ip) and the current lr to stack
*/
push {ip, lr}
/*
* Call the very early init function. This should do only the
* absolute bare minimum to get started. It should not:
*
* - set up DRAM
* - use global_data
* - clear BSS
* - try to start a console
*
* For boards with SPL this should be empty since SPL can do all of
* this init in the SPL board_init_f() function which is called
* immediately after this.
*/
bl s_init
pop {ip, pc}
ENDPROC(lowlevel_init)
The arch/arm/cpu/armv7 directory contains a generic implementation of low-level init. All its routines are defined as so-called weak symbols: if the linker finds another symbol with the same name, it leaves the weak one unused and chooses the other. Basically, this code does not implement anything new. Once again the stack pointer is initialized. CONFIG_SPL_DM (Driver Model for SPL) is defined, so r9 is zeroed.
Our program stack is ready, so we can use it with a push instruction. The ip register (r12, which according to the comment carries the old lr) and the link register are pushed; after that, a branch with link (bl), which copies the return address into the link register, is executed. We jump to s_init, but as we can see in the listing, it is empty.
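The weak-symbol mechanism described above also exists on the C side. Here is a minimal sketch using the GCC/Clang weak attribute; the function names are invented for this example and are not part of U-Boot.

```c
/*
 * Generic default, comparable to WEAK(s_init) in the assembly listing.
 * The linker uses it only when no strong definition with the same
 * name is linked in.
 */
__attribute__((weak)) int board_early_init_demo(void)
{
    return 0;   /* the generic version does nothing special */
}

/* The caller does not know (or care) which definition was chosen. */
int run_early_init_demo(void)
{
    return board_early_init_demo();
}
```

Built on its own, run_early_init_demo() returns 0. If another source file in the same link provided a regular (strong) int board_early_init_demo(void) returning, say, 1, the linker would pick that one instead and no change to the caller would be needed.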
Finally, after returning to the reset routine, we jump to the _main routine, which is really, really close to the C world. It is defined in arch/arm/lib/crt0.S. As we can see, it is the code initializing the C environment (clearing BSS, preparing the heap, etc.). We will cover it in the next part of the series.