

This article is a part of Arduino / ATmega328p Embedded C Firmware Programming Tutorial. Consider exploring the course home page for articles on similar topics.


Also visit the Release Page for Register Level Embedded C Hardware Abstraction Library and Code for AVR.


This section discusses the AVR CPU Core, ALU, Status Register, Stack Pointer, Interrupt handling, and Instruction processing. The main function of the CPU core is to ensure correct program execution. The CPU must therefore be able to access memories, perform calculations, control peripherals, and handle interrupts. The components of the CPU were discussed in the AVR Architecture article.

What You Will Learn

  • How the Arduino CPU works
  • How the CPU of the AVR ATmega328p chip works
  • What the CPU components of the ATmega328p are
  • Which CPU instructions the ATmega328p provides
  • How many CPU registers the Arduino/ATmega328p has

8-Bit AVR CPU Operation

AVR CPU Core Architecture

AVR uses a Harvard architecture – with separate memories and buses for program and data. Instructions in the program memory are executed with a single level pipelining. This improves performance and parallelism. While one instruction is being executed, the next instruction is pre-fetched from the program memory. This concept enables instructions to be executed in every clock cycle. The program memory is In-System Reprogrammable Flash memory.
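The effect of overlapping fetch and execute can be estimated with a very rough cycle count. The sketch below assumes every instruction executes in a single cycle and ignores branches and multi-cycle instructions; the function names are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

/* Cycles needed if fetch and execute never overlap:
 * each instruction costs one fetch cycle plus one execute cycle. */
uint32_t cycles_without_pipeline(uint32_t n_instructions)
{
    return 2u * n_instructions;
}

/* Cycles needed with single-level pipelining: the first fetch costs one
 * cycle, after which one instruction completes in every clock cycle
 * because the next instruction is fetched while the current one executes. */
uint32_t cycles_with_pipeline(uint32_t n_instructions)
{
    return n_instructions == 0u ? 0u : n_instructions + 1u;
}
```

Under these assumptions, 100 instructions take 101 cycles with the pipeline and 200 without it, which is where the "one instruction per clock cycle" figure comes from.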

The fast-access Register File contains 32 x 8-bit general purpose working registers with a single clock cycle access time. This allows single-cycle Arithmetic Logic Unit (ALU) operation. In a typical ALU operation, two operands are output from the Register File, the operation is executed, and the result is stored back in the Register File in one clock cycle.

Six of the 32 registers can be used as three 16-bit indirect address register pointers for Data Space addressing – enabling efficient address calculations. One of these address pointers can also be used as an address pointer for look up tables in Flash program memory. These added function registers are the 16-bit X-, Y-, and Z register described later in this section.

The ALU supports arithmetic and logic operations between registers or between a constant and a register. Single register operations can also be executed in the ALU. After an arithmetic operation, the Status Register is updated to reflect information about the result of the operation.

Program flow is provided by conditional and unconditional jump and call instructions, able to directly address the whole address space. Most AVR instructions have a single 16-bit word format. Every program memory address contains a 16- or 32-bit instruction.

Program Flash memory space is divided into two sections, the Boot Program section and the Application Program section. Both sections have dedicated Lock bits for write and read/write protection. The SPM instruction that writes into the Application Flash memory section must reside in the Boot Program section.

During interrupts and subroutine calls, the return address Program Counter (PC) is stored on the Stack. The Stack is effectively allocated in the general data SRAM, and consequently the Stack size is limited only by the total SRAM size and the usage of the SRAM. All user programs must initialize the SP in the Reset routine (before subroutines or interrupts are executed). The Stack Pointer (SP) is read/write accessible in the I/O space. The data SRAM can easily be accessed through the five different addressing modes supported in the AVR architecture.

The memory spaces in the AVR architecture are all linear and regular memory maps. A flexible interrupt module has its control registers in the I/O space with an additional Global Interrupt Enable bit in the Status Register. All interrupts have a separate Interrupt Vector in the Interrupt Vector table. The interrupts have priority in accordance with their Interrupt Vector position. The lower the Interrupt Vector address, the higher the priority.

8-Bit AVR Arithmetic Logic Unit (ALU)

The high-performance AVR ALU operates in direct connection with all the 32 general purpose working registers. Within a single clock cycle, arithmetic operations between general purpose registers or between a register and an immediate are executed. The ALU operations are divided into three main categories – arithmetic, logical, and bit functions.

Mnemonic  Operands  Description                               Operation                 Flags      Clocks
ADD       Rd, Rr    Add two Registers                         Rd <- Rd + Rr             Z,C,N,V,H  1
ADC       Rd, Rr    Add with Carry two Registers              Rd <- Rd + Rr + C         Z,C,N,V,H  1
ADIW      Rdl, K    Add Immediate to Word                     Rdh:Rdl <- Rdh:Rdl + K    Z,C,N,V,S  2
SUB       Rd, Rr    Subtract two Registers                    Rd <- Rd - Rr             Z,C,N,V,H  1
SUBI      Rd, K     Subtract Constant from Register           Rd <- Rd - K              Z,C,N,V,H  1
SBC       Rd, Rr    Subtract with Carry two Registers         Rd <- Rd - Rr - C         Z,C,N,V,H  1
SBCI      Rd, K     Subtract with Carry Constant from Reg.    Rd <- Rd - K - C          Z,C,N,V,H  1
SBIW      Rdl, K    Subtract Immediate from Word              Rdh:Rdl <- Rdh:Rdl - K    Z,C,N,V,S  2
AND       Rd, Rr    Logical AND Registers                     Rd <- Rd & Rr             Z,N,V      1
ANDI      Rd, K     Logical AND Register and Constant         Rd <- Rd & K              Z,N,V      1
OR        Rd, Rr    Logical OR Registers                      Rd <- Rd | Rr             Z,N,V      1
ORI       Rd, K     Logical OR Register and Constant          Rd <- Rd | K              Z,N,V      1
EOR       Rd, Rr    Exclusive OR Registers                    Rd <- Rd ⊕ Rr             Z,N,V      1
COM       Rd        One's Complement                          Rd <- 0xFF - Rd           Z,C,N,V    1
NEG       Rd        Two's Complement                          Rd <- 0x00 - Rd           Z,C,N,V,H  1
SBR       Rd, K     Set Bit(s) in Register                    Rd <- Rd | K              Z,N,V      1
CBR       Rd, K     Clear Bit(s) in Register                  Rd <- Rd & (0xFF - K)     Z,N,V      1
INC       Rd        Increment                                 Rd <- Rd + 1              Z,N,V      1
DEC       Rd        Decrement                                 Rd <- Rd - 1              Z,N,V      1
TST       Rd        Test for Zero or Minus                    Rd <- Rd & Rd             Z,N,V      1
CLR       Rd        Clear Register                            Rd <- Rd ⊕ Rd             Z,N,V      1
SER       Rd        Set Register                              Rd <- 0xFF                None       1
MUL       Rd, Rr    Multiply Unsigned                         R1:R0 <- Rd x Rr          Z,C        2
MULS      Rd, Rr    Multiply Signed                           R1:R0 <- Rd x Rr          Z,C        2
MULSU     Rd, Rr    Multiply Signed with Unsigned             R1:R0 <- Rd x Rr          Z,C        2
FMUL      Rd, Rr    Fractional Multiply Unsigned              R1:R0 <- (Rd x Rr) << 1   Z,C        2
FMULS     Rd, Rr    Fractional Multiply Signed                R1:R0 <- (Rd x Rr) << 1   Z,C        2
FMULSU    Rd, Rr    Fractional Multiply Signed with Unsigned  R1:R0 <- (Rd x Rr) << 1   Z,C        2
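
To make the Flags column concrete, here is a minimal host-side sketch of what an 8-bit ADD does to the Status Register. The helper `avr_add` is hypothetical (it is not part of any AVR library) and only models the arithmetic flags; the S flag is included because it always follows from N and V.

```c
#include <stdint.h>

/* SREG bit positions (bit 0 = C ... bit 5 = H). */
enum { SREG_C = 0, SREG_Z = 1, SREG_N = 2, SREG_V = 3, SREG_S = 4, SREG_H = 5 };

/* Hypothetical model of "ADD Rd, Rr" running on the host.
 * Returns the 8-bit result and updates the arithmetic flags in *sreg.
 * The I and T bits (bits 7 and 6) are left untouched. */
uint8_t avr_add(uint8_t rd, uint8_t rr, uint8_t *sreg)
{
    uint16_t wide = (uint16_t)rd + (uint16_t)rr;
    uint8_t  r    = (uint8_t)wide;

    uint8_t c = wide > 0xFF;                           /* carry out of bit 7 */
    uint8_t h = ((rd & 0x0F) + (rr & 0x0F)) > 0x0F;    /* carry out of bit 3 */
    uint8_t n = (r & 0x80) != 0;                       /* result bit 7       */
    uint8_t z = (r == 0);                              /* result is zero     */
    /* Signed overflow: operands share a sign that the result does not. */
    uint8_t v = ((uint8_t)(~(rd ^ rr) & (rd ^ r)) & 0x80) != 0;
    uint8_t s = n ^ v;                                 /* S = N xor V        */

    *sreg &= (uint8_t)~0x3F;                           /* clear C..H         */
    *sreg |= (uint8_t)((c << SREG_C) | (z << SREG_Z) | (n << SREG_N) |
                       (v << SREG_V) | (s << SREG_S) | (h << SREG_H));
    return r;
}
```

For example, adding 0x01 to 0x7F gives 0x80 with N, V and H set, because the signed result overflowed from +127 to -128 and there was a carry out of bit 3.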

8-Bit AVR Status Register

The Status Register contains information about the result of the most recently executed arithmetic instruction. This information can be used to alter program flow in order to perform conditional operations. The Status Register is updated after all ALU operations. In many cases this removes the need for dedicated compare instructions, resulting in faster and more compact code.
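
A typical case where no compare instruction is needed is a countdown loop: DEC already updates the Zero flag, so a conditional branch such as BRNE can test it directly. The following is a small host-side sketch of that idea; both function names are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical model of "DEC Rd": decrements the register and returns
 * the Zero flag as the ALU would leave it in the Status Register. */
bool avr_dec_sets_z(uint8_t *rd)
{
    *rd = (uint8_t)(*rd - 1u);
    return *rd == 0;
}

/* Models the assembly loop
 *     loop:  (body)
 *            DEC  r16
 *            BRNE loop
 * and returns how many times the body runs. No CP/CPI instruction is
 * needed because DEC has already updated Z. */
uint16_t count_loop_iterations(uint8_t start)
{
    uint8_t  r16 = start;
    uint16_t iterations = 0;
    bool z;

    do {
        iterations++;
        z = avr_dec_sets_z(&r16);
    } while (!z);                 /* BRNE: branch while Z is clear */

    return iterations;
}
```

Note that a start value of 0 wraps around to 255 on the first decrement, so the body runs 256 times in this model.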

The Status Register is not automatically stored when entering an interrupt routine, nor restored when returning from an interrupt. This must be handled by software.

When I/O Registers are addressed as data space using LD and ST instructions, the data-space address (the value shown in parentheses in the register listings) must be used. When using the I/O-specific instructions IN and OUT, the address is reduced by 0x20, resulting in an I/O address within 0x00-0x3F.
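
The 0x20 offset between the two address forms can be written down as a pair of small conversion helpers. This is only a sketch of the arithmetic; the function names are hypothetical.

```c
#include <stdint.h>

/* The 32 working registers occupy data addresses 0x00-0x1F, so the
 * I/O space starts 0x20 higher when seen through LD/ST. */
#define IO_DATA_OFFSET 0x20u

/* Data-space address (for LD/ST) of a register given its IN/OUT address. */
uint16_t io_to_data_address(uint8_t io_addr)
{
    return (uint16_t)(io_addr + IO_DATA_OFFSET);
}

/* Converts a data-space address back to an IN/OUT address.
 * Returns 1 and stores the result if the register is reachable with
 * IN/OUT (I/O addresses 0x00-0x3F), otherwise returns 0. */
int data_to_io_address(uint16_t data_addr, uint8_t *io_addr)
{
    if (data_addr < 0x20u || data_addr > 0x5Fu) {
        return 0;
    }
    *io_addr = (uint8_t)(data_addr - IO_DATA_OFFSET);
    return 1;
}
```

As a check, the Stack Pointer Low register listed as 0x3D (0x5D) maps from I/O address 0x3D to data address 0x5D.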

Bit             7  6  5  4  3  2  1  0
SREG            I  T  H  S  V  N  Z  C
Initial Value   0  0  0  0  0  0  0  0

• Bit 7 – I: Global Interrupt Enable
The Global Interrupt Enable bit must be set for interrupts to be enabled. The individual interrupt enable control is then performed in separate control registers. If the Global Interrupt Enable bit is cleared, none of the interrupts are enabled, regardless of the individual interrupt enable settings.

• Bit 6 – T: Bit Copy Storage
The Bit Copy instructions BLD (Bit LoaD) and BST (Bit STore) use the T-bit as source or destination for the operated bit.

• Bit 5 – H: Half Carry Flag
The Half Carry Flag H indicates a Half Carry in some arithmetic operations. Half Carry is useful in BCD arithmetic.

• Bit 4 – S: Sign Bit, S = N ⊕ V
The S-bit is always an exclusive OR between the Negative Flag N and the Two’s Complement Overflow Flag V.

• Bit 3 – V: Two’s Complement Overflow Flag
The Two’s Complement Overflow Flag V supports two’s complement arithmetic.

• Bit 2 – N: Negative Flag
The Negative Flag N indicates a negative result in an arithmetic or logic operation.

• Bit 1 – Z: Zero Flag
The Zero Flag Z indicates a zero result in an arithmetic or logic operation.

• Bit 0 – C: Carry Flag
The Carry Flag C indicates a carry in an arithmetic or logic operation.
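
The reason S exists alongside N becomes clearer with a signed comparison. The sketch below models the flags left by a compare (Rd - Rr with the result discarded); the type `cp_flags_t` and the function `avr_cp` are hypothetical names used only for this illustration.

```c
#include <stdint.h>
#include <stdbool.h>

/* Flags of interest after a compare. */
typedef struct {
    bool n;  /* bit 7 of the difference                 */
    bool v;  /* two's complement overflow               */
    bool s;  /* N xor V: true when Rd < Rr as signed    */
    bool z;  /* difference is zero                      */
    bool c;  /* borrow: true when Rd < Rr as unsigned   */
} cp_flags_t;

/* Hypothetical model of "CP Rd, Rr". */
cp_flags_t avr_cp(uint8_t rd, uint8_t rr)
{
    uint8_t r = (uint8_t)(rd - rr);
    cp_flags_t f;

    f.n = (r & 0x80) != 0;
    /* Overflow on subtraction: operands differ in sign and the result's
     * sign differs from Rd's sign. */
    f.v = (((rd ^ rr) & (rd ^ r)) & 0x80) != 0;
    f.s = (f.n != f.v);
    f.z = (r == 0);
    f.c = rr > rd;
    return f;
}
```

Comparing -128 (0x80) with 1 gives a difference of 0x7F, so N alone would wrongly suggest a non-negative result; V is set by the overflow, and S = N ⊕ V correctly reports that -128 is less than 1. Signed branches test S, while unsigned branches test C.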

8-Bit AVR General Purpose Registers

The Register File is optimized for the AVR Enhanced RISC instruction set. In order to achieve the required performance and flexibility, the following input/output schemes are supported by the Register File:

  • One 8-bit output operand and one 8-bit result input
  • Two 8-bit output operands and one 8-bit result input
  • Two 8-bit output operands and one 16-bit result input
  • One 16-bit output operand and one 16-bit result input
AVR 8-Bit CPU General Purpose Working Registers

Most of the instructions operating on the Register File have direct access to all registers, and most of them are single cycle instructions. Each register is also assigned a data memory address, mapping them directly into the first 32 locations of the user Data Space. Although not being physically implemented as SRAM locations, this memory organization provides great flexibility in access of the registers, as the X-, Y- and Z-pointer registers can be set to index any register in the file.

The registers R26…R31 have some added functions beyond their general purpose usage. Taken in pairs, they form the 16-bit address pointers X (R27:R26), Y (R29:R28), and Z (R31:R30), which are used for indirect addressing of the data space.
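
How two 8-bit registers combine into one 16-bit pointer can be shown with a small host-side model of the Register File. The struct and helper names below are hypothetical; only the register numbering comes from the AVR architecture.

```c
#include <stdint.h>

/* Host-side model of the 32-entry Register File. */
typedef struct {
    uint8_t r[32];
} regfile_t;

/* Index of the low byte of each pointer pair. */
enum { REG_X = 26, REG_Y = 28, REG_Z = 30 };

/* Reads a 16-bit pointer: low byte from r[low_index],
 * high byte from r[low_index + 1]. */
uint16_t regpair_read(const regfile_t *rf, int low_index)
{
    return (uint16_t)(rf->r[low_index] |
                      ((uint16_t)rf->r[low_index + 1] << 8));
}

/* Writes a 16-bit pointer into a register pair. */
void regpair_write(regfile_t *rf, int low_index, uint16_t value)
{
    rf->r[low_index]     = (uint8_t)(value & 0xFF);
    rf->r[low_index + 1] = (uint8_t)(value >> 8);
}
```

Writing 0x0123 to X in this model leaves 0x23 in R26 and 0x01 in R27.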

The X, Y, and Z Registers of AVR 8-Bit Core

8-Bit AVR Stack Pointer / Register

The stack is mainly used for storing temporary data, local variables, and return addresses after interrupts and subroutine calls. It is implemented as growing from higher to lower memory locations. The Stack Pointer Register always points to the top of the stack; it points to the data SRAM Stack area where the subroutine and interrupt stacks are located.

The Stack in the data SRAM must be defined by the program before any subroutine calls are executed or interrupts are enabled. The initial Stack Pointer value equals the last address of the internal SRAM (RAMEND), and the Stack Pointer must always be set to point above the start of the SRAM. The AVR Stack Pointer is implemented as two 8-bit registers in the I/O space. The number of bits actually used depends on the implementation. Data space in some implementations of the AVR architecture is so small that only the Stack Pointer Low (SPL) register is needed. In this case, the Stack Pointer High (SPH) register will not be present.
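
Because the Stack Pointer is split across two 8-bit registers, initializing it means writing the high and low bytes of the address separately. The sketch below shows only that split on the host; `sp_regs_t`, `sp_split` and `sp_join` are hypothetical names, and 0x08FF is the last internal SRAM address of the ATmega328P.

```c
#include <stdint.h>

/* Last internal SRAM address on the ATmega328P. */
#define RAMEND_ATMEGA328P 0x08FFu

/* The two 8-bit halves of the Stack Pointer. */
typedef struct {
    uint8_t sph;  /* Stack Pointer High */
    uint8_t spl;  /* Stack Pointer Low  */
} sp_regs_t;

/* Splits a 16-bit stack address into the values for SPH and SPL. */
sp_regs_t sp_split(uint16_t address)
{
    sp_regs_t sp;
    sp.sph = (uint8_t)(address >> 8);
    sp.spl = (uint8_t)(address & 0xFF);
    return sp;
}

/* Recombines SPH and SPL into the 16-bit stack address. */
uint16_t sp_join(sp_regs_t sp)
{
    return (uint16_t)(((uint16_t)sp.sph << 8) | sp.spl);
}
```

On real hardware the same two values would be written to the SPH and SPL registers in the Reset routine; toolchain start-up code commonly does this before `main` runs.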

Address        Register  Bit 7  6     5     4     3     2     1    0
0x3E (0x5E)    SPH       SP15   SP14  SP13  SP12  SP11  SP10  SP9  SP8
0x3D (0x5D)    SPL       SP7    SP6   SP5   SP4   SP3   SP2   SP1  SP0
Initial Value  RAMEND

Instruction         Stack pointer   Description
PUSH                Decrement by 1  Data is pushed onto the stack
CALL, ICALL, RCALL  Decrement by 2  The return address is pushed onto the stack with a subroutine call or interrupt
POP                 Increment by 1  Data is popped from the stack
RET, RETI           Increment by 2  The return address is popped from the stack with return from subroutine or return from interrupt
Stack Pointer Instructions
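
The Stack Pointer movements can be traced with a small host-side model. Everything here is a sketch: the type and function names are hypothetical, the data space size matches the ATmega328P (addresses 0x0000-0x08FF), and the order of the two return-address bytes is a modelling assumption rather than something stated in this article.

```c
#include <stdint.h>

/* Data space 0x0000-0x08FF, as on the ATmega328P (assumed for this model). */
#define DATA_SPACE_SIZE 0x0900u

typedef struct {
    uint8_t  mem[DATA_SPACE_SIZE];
    uint16_t sp;
} stack_model_t;

/* Points SP at the last SRAM address (RAMEND). */
void stack_init(stack_model_t *s)
{
    s->sp = DATA_SPACE_SIZE - 1u;
}

/* PUSH: store the byte, then decrement SP by 1. */
void stack_push(stack_model_t *s, uint8_t value)
{
    s->mem[s->sp] = value;
    s->sp--;
}

/* POP: increment SP by 1, then load the byte. */
uint8_t stack_pop(stack_model_t *s)
{
    s->sp++;
    return s->mem[s->sp];
}

/* Subroutine call or interrupt: push the 2-byte return address (SP -= 2). */
void stack_call(stack_model_t *s, uint16_t return_pc)
{
    stack_push(s, (uint8_t)(return_pc & 0xFF));  /* byte order assumed */
    stack_push(s, (uint8_t)(return_pc >> 8));
}

/* Return: pop the 2-byte return address (SP += 2). */
uint16_t stack_ret(stack_model_t *s)
{
    uint16_t hi = stack_pop(s);
    uint16_t lo = stack_pop(s);
    return (uint16_t)((hi << 8) | lo);
}
```

Starting from 0x08FF, one push followed by one call leaves SP at 0x08FC, which shows the stack growing from higher to lower addresses.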

8-Bit AVR Instruction Execution Timing

The AVR CPU is driven by the CPU clock clkCPU, directly generated from the selected clock source for the chip. No internal clock division is used. The basic pipelining concept is used to obtain up to 1 MIPS per MHz. The parallel instruction fetches and instruction executions enabled by the Harvard architecture and the fast-access Register File enable this performance.
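
Since clkCPU is not divided internally, converting a cycle count into time is a single division. The helper below is a hypothetical convenience function; the 16 MHz figure mentioned afterwards is the clock commonly fitted to Arduino Uno boards, not a property of the CPU core itself.

```c
#include <stdint.h>

/* Time in nanoseconds (truncated) taken by a number of CPU clock cycles
 * at a given CPU frequency in Hz. f_cpu_hz must be non-zero. */
uint64_t cycles_to_ns(uint32_t cycles, uint32_t f_cpu_hz)
{
    return ((uint64_t)cycles * 1000000000ull) / f_cpu_hz;
}
```

At 16 MHz one cycle lasts 62.5 ns (reported as 62 by this integer helper), so 16 single-cycle instructions take 1 µs, which corresponds to the peak of 1 MIPS per MHz.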

AVR 8-Bit Parallel Instruction Fetches and Instruction Executions

In a single clock cycle, an ALU operation using two register operands is executed, and the result is stored back to the destination register.

AVR 8-Bit Single Cycle ALU Operation

8-Bit AVR Reset and Interrupt Handling and Timing

The AVR provides several different interrupt sources. These interrupts and the separate Reset Vector each have a separate program vector in the program memory space. All interrupts are assigned individual enable bits which must be written logic one together with the Global Interrupt Enable bit in the Status Register in order to enable the interrupt.
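
The enable condition described above is a plain logical AND of three things: the global I-bit, the individual enable bit, and a pending request. A minimal sketch, with a hypothetical function name:

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 7 of the Status Register is the Global Interrupt Enable bit. */
#define SREG_I_MASK 0x80u

/* Returns true only if the interrupt is pending, its individual enable
 * bit is set, and the Global Interrupt Enable bit in SREG is set. */
bool interrupt_will_be_served(uint8_t sreg, bool individual_enable,
                              bool request_pending)
{
    bool global_enable = (sreg & SREG_I_MASK) != 0;
    return global_enable && individual_enable && request_pending;
}
```

Clearing the I-bit therefore masks every interrupt at once, whatever the individual enable bits say.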

The table below determines the priority levels of the different interrupts: the lower the address, the higher the priority level. The lowest addresses in the program memory space are by default defined as the Reset and Interrupt Vectors.

Vector No.  Program Address  Source        Interrupt Definition
1           0x0000           RESET         External Pin, Power-on Reset, Brown-out Reset, and Watchdog System Reset
2           0x0002           INT0          External Interrupt Request 0
3           0x0004           INT1          External Interrupt Request 1
4           0x0006           PCINT0        Pin Change Interrupt Request 0
5           0x0008           PCINT1        Pin Change Interrupt Request 1
6           0x000A           PCINT2        Pin Change Interrupt Request 2
7           0x000C           WDT           Watchdog Time-out Interrupt
8           0x000E           TIMER2 COMPA  Timer/Counter2 Compare Match A
9           0x0010           TIMER2 COMPB  Timer/Counter2 Compare Match B
10          0x0012           TIMER2 OVF    Timer/Counter2 Overflow
11          0x0014           TIMER1 CAPT   Timer/Counter1 Capture Event
12          0x0016           TIMER1 COMPA  Timer/Counter1 Compare Match A
13          0x0018           TIMER1 COMPB  Timer/Counter1 Compare Match B
14          0x001A           TIMER1 OVF    Timer/Counter1 Overflow
15          0x001C           TIMER0 COMPA  Timer/Counter0 Compare Match A
16          0x001E           TIMER0 COMPB  Timer/Counter0 Compare Match B
17          0x0020           TIMER0 OVF    Timer/Counter0 Overflow
18          0x0022           SPI, STC      SPI Serial Transfer Complete
19          0x0024           USART, RX     USART Rx Complete
20          0x0026           USART, UDRE   USART, Data Register Empty
21          0x0028           USART, TX     USART, Tx Complete
22          0x002A           ADC           ADC Conversion Complete
23          0x002C           EE READY      EEPROM Ready
24          0x002E           ANALOG COMP   Analog Comparator
25          0x0030           TWI           2-wire Serial Interface
26          0x0032           SPM READY     Store Program Memory Ready
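
The table follows a regular pattern: each vector occupies two program memory words, so the address of vector n is 2 × (n − 1). The sketch below encodes that pattern together with the "lowest address wins" priority rule; the function names and the bitmask representation of pending interrupts are assumptions made for this illustration.

```c
#include <stdint.h>

#define VECTOR_COUNT     26u
#define WORDS_PER_VECTOR 2u

/* Program address of vector number 1..26.
 * Returns 0xFFFF for a vector number outside the table. */
uint16_t vector_address(uint8_t vector_no)
{
    if (vector_no < 1u || vector_no > VECTOR_COUNT) {
        return 0xFFFFu;
    }
    return (uint16_t)((vector_no - 1u) * WORDS_PER_VECTOR);
}

/* pending_enabled has bit (n - 1) set when vector n is both pending and
 * enabled. Returns the vector number that is served first (the one with
 * the lowest address), or 0 if nothing is pending. */
uint8_t highest_priority_vector(uint32_t pending_enabled)
{
    for (uint8_t n = 1; n <= VECTOR_COUNT; n++) {
        if (pending_enabled & (UINT32_C(1) << (n - 1u))) {
            return n;
        }
    }
    return 0;
}
```

If INT0 (vector 2) and TIMER0 OVF (vector 17) are pending at the same time, this model picks vector 2, matching the rule that a lower vector address means a higher priority.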

When an interrupt occurs, the Global Interrupt Enable I-bit is cleared and all interrupts are disabled. The user software can write logic one to the I-bit to enable nested interrupts. All enabled interrupts can then interrupt the current interrupt routine. The I-bit is automatically set when a Return from Interrupt instruction – RETI – is executed. When the AVR exits from an interrupt, it will always return to the main program and execute one more instruction before any pending interrupt is served.

The interrupt execution response for all the enabled AVR interrupts is four clock cycles minimum. After four clock cycles, the program vector address for the actual interrupt handling routine is executed. During this four clock cycle period, the Program Counter is pushed onto the Stack. The vector is normally a jump to the interrupt routine, and this jump takes three clock cycles. If an interrupt occurs during the execution of a multi-cycle instruction, this instruction is completed before the interrupt is served. If an interrupt occurs when the MCU is in sleep mode, the interrupt execution response time is increased by four clock cycles. This increase comes in addition to the start-up time from the selected sleep mode.
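
Adding up the figures in this paragraph gives a rough estimate of how many cycles pass before the first instruction of the interrupt routine runs. The function below is a hypothetical helper; it deliberately leaves out the sleep-mode start-up time, which depends on the selected sleep mode and clock source.

```c
#include <stdint.h>
#include <stdbool.h>

enum {
    RESPONSE_CYCLES    = 4,  /* minimum interrupt execution response   */
    VECTOR_JUMP_CYCLES = 3,  /* jump from the vector to the routine    */
    SLEEP_EXTRA_CYCLES = 4   /* additional cycles when waking up       */
};

/* remaining_instr_cycles: cycles still needed by the instruction that is
 * executing when the interrupt request arrives (0 if none).
 * Start-up time from the selected sleep mode is NOT included. */
uint32_t isr_entry_cycles(uint32_t remaining_instr_cycles,
                          bool waking_from_sleep)
{
    uint32_t cycles = remaining_instr_cycles
                    + RESPONSE_CYCLES
                    + VECTOR_JUMP_CYCLES;
    if (waking_from_sleep) {
        cycles += SLEEP_EXTRA_CYCLES;
    }
    return cycles;
}
```

In the best case this gives 7 cycles from the request to the first instruction of the routine, and 11 cycles plus start-up time when the MCU was asleep.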

A return from an interrupt handling routine takes four clock cycles. During these four clock cycles, the Program Counter (two bytes) is popped back from the Stack, the Stack Pointer is incremented by two, and the I-bit in SREG is set.
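
Interrupt entry and RETI can be summarized as two mirror-image state changes. The model below is a host-side sketch with hypothetical names; it tracks only the Program Counter, the Stack Pointer and SREG, uses the ATmega328P data space size, and assumes a byte order for the saved PC. Note that it saves nothing but the PC, which is why SREG has to be preserved by software.

```c
#include <stdint.h>

#define SREG_I       0x80u    /* Global Interrupt Enable bit        */
#define MODEL_RAMEND 0x08FFu  /* last SRAM address (ATmega328P)     */

typedef struct {
    uint16_t pc;
    uint16_t sp;
    uint8_t  sreg;
    uint8_t  ram[MODEL_RAMEND + 1u];
} cpu_model_t;

/* Interrupt entry: push PC (SP -= 2), clear the I-bit, jump to the vector. */
void interrupt_enter(cpu_model_t *cpu, uint16_t vector_addr)
{
    cpu->ram[cpu->sp--] = (uint8_t)(cpu->pc & 0xFF);  /* byte order assumed */
    cpu->ram[cpu->sp--] = (uint8_t)(cpu->pc >> 8);
    cpu->sreg &= (uint8_t)~SREG_I;
    cpu->pc = vector_addr;
}

/* RETI: pop PC (SP += 2) and set the I-bit again.
 * The other SREG flags are left exactly as the routine left them. */
void interrupt_return(cpu_model_t *cpu)
{
    uint16_t hi = cpu->ram[++cpu->sp];
    uint16_t lo = cpu->ram[++cpu->sp];
    cpu->pc = (uint16_t)((hi << 8) | lo);
    cpu->sreg |= SREG_I;
}
```

After `interrupt_enter` followed by `interrupt_return`, the PC and SP are back at their original values and the I-bit is set, while any arithmetic flags changed in between would still be changed.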

Crazy Engineer


