Last modified: Jun 28, 2026


writing for x86 architecture, we’ll use x86 assembly language. here’s a breakdown of how a computer starts up and how we can write a minimal os to run on it.

how a computer starts up

when a computer powers on, the bios is copied from a rom chip into ram. the bios immediately starts executing code, initializing hardware and running a series of tests called post (power-on self test). once this is done, the bios searches for an operating system to load, then starts it, after which the os takes over.

how the bios finds an os

there are two main methods:

legacy method (what we’ll use)

the bios loads the first sector of each bootable device into memory at address 0x7c00. it then checks for the signature 0xaa55. if found, the bios starts executing the code in that sector.

efi method

the bios looks for special efi partitions. the os must be compiled as an efi program to run this way.

setting up our os

our plan:

write some code, assemble it, and place it in the first sector of a floppy disk
add the required bios signature
test that the os loads correctly

the bios always loads our os at memory address 0x7c00. to let the assembler know where our code will live, we use the org directive:

org 0x7c00

org tells the assembler the expected load address so it can calculate label offsets correctly. at this point, it’s important to understand the difference between a directive and an instruction:

directive: guides the assembler during compilation; not translated to machine code. assembler-specific.
instruction: translated into machine code executed by the cpu.
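
as a small illustration of that difference (a sketch only; the label name `marker` is made up for this example), the directive below emits no bytes, while the instruction assembles to a single opcode:

```nasm
org 0x7c00          ; directive: emits no bytes, only shifts label addresses

marker:             ; label: also emits nothing, it just names address 0x7c00
    nop             ; instruction: assembles to the single byte 0x90
```

assembled with `nasm -f bin`, this should produce a one-byte file containing only the `nop` opcode.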

next, we specify that we want 16-bit code, using the bits directive:

org 0x7c00
bits 16

x86 cpus maintain backward compatibility with the 8086 cpu, intel’s first 16-bit microprocessor. this ensures that code written for the 8086 still runs on modern processors. because of this, the cpu always starts in 16-bit mode.

our minimal os starts with a main label, where we simply halt the cpu:

org 0x7c00
bits 16

main:
    hlt

to make sure the cpu doesn’t run past our program, we add an infinite loop:

org 0x7c00
bits 16

main:
    hlt

.halt:
    jmp .halt

jmp jumps unconditionally to a label, like a goto in c. this keeps the cpu in a safe loop if it resumes execution.
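
the cpu can resume after hlt because an interrupt wakes it up. a common variant, shown here as a sketch rather than as the code this guide builds on, disables maskable interrupts first and puts the hlt inside the loop, so the cpu goes back to sleep if it is ever woken:

```nasm
org 0x7c00
bits 16

main:
    cli             ; disable maskable interrupts so hlt is rarely woken
.halt:
    hlt             ; sleep until an interrupt arrives
    jmp .halt       ; if woken anyway (e.g. by an nmi), halt again
```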

adding the bios signature

bios expects the last 2 bytes of the first sector (512 bytes) to be 0xaa55. we fill the remaining bytes with zeros using times and db:

$ = current line’s memory offset
$$ = start of current section
$-$$ = program size so far
times 510-($-$$) db 0 fills the program with zeros until byte 510, leaving the last 2 bytes for the signature.

org 0x7c00
bits 16

main:
    hlt

.halt:
    jmp .halt

times 510-($-$$) db 0
dw 0AA55h

dw defines a 2-byte word, stored in little-endian order: the byte 0x55 lands at offset 510 and 0xaa at offset 511, which together form the 0xaa55 signature the bios checks for.
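
to make the byte order explicit, here is the same boot sector with the signature written one byte at a time. this is only an equivalent sketch; `dw 0AA55h` is shorter and emits exactly the same two bytes:

```nasm
org 0x7c00
bits 16

main:
    hlt

.halt:
    jmp .halt

times 510-($-$$) db 0   ; pad with zeros up to offset 510
db 0x55, 0xaa           ; offset 510 = 0x55, offset 511 = 0xaa
```

either version should assemble to a file of exactly 512 bytes. if `xxd` is available, `xxd -s 510 -l 2` on the assembled binary should show the bytes `55 aa`.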

this completes a working minimal os: it loads correctly and halts the processor. progress, even if it does nothing yet.

building the project

make sure the project contains a Makefile at its root and the assembly source at src/main.asm.

don’t worry about build/main.bin and build/main_floppy.img; they are created when we run the following Makefile.

code for the Makefile to build the project:

add a rule to assemble main.asm using nasm and output it as a binary file.
add a rule to create a floppy disk image: take the previously built binary, copy it, and pad with zeros to reach 1.44 mb.

ASM=nasm

SRC_DIR=src
BUILD_DIR=build

# build floppy disk image by padding binary to 1.44 MB
$(BUILD_DIR)/main_floppy.img: $(BUILD_DIR)/main.bin
	cp $(BUILD_DIR)/main.bin $(BUILD_DIR)/main_floppy.img
	truncate -s 1440k $(BUILD_DIR)/main_floppy.img

# assemble main.asm to binary (recipe lines must start with a tab, not spaces)
$(BUILD_DIR)/main.bin: $(SRC_DIR)/main.asm
	mkdir -p $(BUILD_DIR)
	$(ASM) $(SRC_DIR)/main.asm -f bin -o $(BUILD_DIR)/main.bin

you can use any virtualization software, such as virtualbox or vmware. here, qemu is preferred because it’s easy to set up and can be run entirely from the command line.

on debian-based systems, qemu’s x86 emulator is provided by the qemu-system-x86 package.

install it with:

apt install qemu-system-x86

run the built image using:

qemu-system-x86_64 -fda build/main_floppy.img

once executed, the OS boots from the floppy and performs no operations, exactly as intended.

now that it works, let’s try to print a “hello world” message to the screen.

before doing this, let us first understand some concepts about the basics of the x86 architecture.

x86 cpu registers
- all processors have a number of registers, which are really small pieces of memory that can be written and read very fast and are built into the CPU.
- here is a diagram of all the registers on an x86 CPU:

There are several types of registers:
- the general-purpose registers can be used for almost any purpose (RAX, RBX, RCX, RDX, R8-R15, including their smaller counterparts EAX, AX, AL, AH, etc.)
- the index registers (RSI, RDI) are usually used for keeping indices and pointers; they can also be used for other purposes
- the program counter (RIP) is a special register which keeps track of which memory location the current instruction begins at
- the segment registers (CS, DS, ES, FS, GS, SS) are used to keep track of the currently active memory segments (which we will see in just a moment)
- there is also a flags register (RFLAGS) which contains some special flags set by various instructions
- there are a few more special purpose registers, but we will see them later only when we need them

Real memory model

now let’s talk a bit about RAM. the 8086 CPU had a 20-bit address bus, which meant that you could access up to 2²⁰ bytes, or 1 MB, of memory. at the time, typical computers had around 64 to 128 KB, so the engineers at intel thought this limit was huge. for various reasons, they decided to use a segment and offset addressing scheme for addressing memory.

              0x1234:0x5678
              segment:offset

in this scheme, you use two 16-bit values, the segment and the offset. each segment contains 64 KB of memory, where each byte can be accessed by using the offset value. segments overlap every 16 bytes.

this means that you can convert a segment:offset address to an absolute address by shifting the segment four bits to the left (or multiplying it by 16), and then adding the offset.

linear_address = (segment << 4) + offset;   // parentheses needed: + binds tighter than <<
// or
linear_address = segment * 16 + offset;

this also means that there are multiple ways of addressing the same location in memory. for example, the absolute address 0x7C00 (where the BIOS loads our operating system) can be written as any combination that you can see below:

segment:offset     linear_address
 0x0000:0x7C00         0x7C00
 0x0001:0x7BF0         0x7C00
 0x0010:0x7B00         0x7C00
 0x00C0:0x7000         0x7C00
 0x07C0:0x0000         0x7C00
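
the conversion can also be checked by the assembler itself. the sketch below uses a hypothetical macro named `LINEAR` (not part of this project) and nasm’s preprocessor to verify a few rows of the table at assembly time; note the parentheses, since nasm, like c, gives `+` higher precedence than `<<`:

```nasm
; LINEAR is a made-up helper for this example:
; linear address = (segment << 4) + offset
%define LINEAR(s, o) (((s) << 4) + (o))

%if LINEAR(0x0000, 0x7c00) != 0x7c00
  %error "0x0000:0x7C00 should map to 0x7C00"
%endif
%if LINEAR(0x00c0, 0x7000) != 0x7c00
  %error "0x00C0:0x7000 should map to 0x7C00"
%endif
%if LINEAR(0x07c0, 0x0000) != 0x7c00
  %error "0x07C0:0x0000 should map to 0x7C00"
%endif

dd LINEAR(0x1234, 0x5678)   ; 0x12340 + 0x5678 = 0x179b8
```

if any of the pairs did not map to 0x7c00, assembly would stop with the corresponding error message.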

there are some special registers which are used to specify the actively used segments:

CS contains the code segment, which is the segment the processor executes code from. the IP register (the program counter) only gives us the offset!
DS and ES are data segments. newer processors introduced the additional data segments FS and GS.
SS contains the current stack segment.

in order to access (read or write) any memory location, its segment needs to be loaded into one of these registers. the code segment register cannot be written with a mov; it changes only through a far control transfer, such as a far jump.
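
as a sketch of what such a far jump looks like (not code this guide relies on), the snippet below forces CS to 0 and then copies it into DS through AX:

```nasm
org 0x7c00
bits 16

start:
    jmp 0x0000:main     ; far jump: loads cs with 0x0000 and ip with the offset of main

main:
    mov ax, cs          ; segment registers can be read into a general-purpose register
    mov ds, ax          ; ds = cs = 0
    hlt

.halt:
    jmp .halt
```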

now, how do we reference a memory location from assembly? We use this syntax: [segment : base + index * scale + displacement]

where:

segment: one of CS, DS, ES, FS, GS, SS. Default: DS (SS if BP is used as base)
base:
- 16-bit: BP or BX
- 32/64-bit: any general purpose register
index:
- 16-bit: SI or DI
- 32/64-bit: any general purpose register
scale (32/64-bit only): 1, 2, 4 or 8
displacement: a signed constant number

the processor is capable of doing some arithmetic for us, as long as we use this expression.
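
a few forms that are valid in 16-bit mode are sketched below. the label `table` is made up for this example, and the fragment is meant to be assembled and inspected rather than booted:

```nasm
bits 16

    mov bx, table
    mov si, 1

    mov al, [bx]            ; base only, default segment ds
    mov al, [bx + si]       ; base + index
    mov al, [bx + si + 2]   ; base + index + displacement
    mov al, [table + 3]     ; displacement only
    mov al, [es:di]         ; index only, with an explicit segment override
    mov al, [bp - 2]        ; bp as base, so the default segment is ss

table: db 10, 20, 30, 40
```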

in 16-bit mode, there are a few limitations because that’s how the 8086 CPU was originally designed. this was probably done to keep the complexity and cost down. another example of one such limitation is that we can’t write constants to the segment registers directly, we have to use an intermediary register.

with the introduction of the 386 processor a few years later, 32-bit mode was introduced, which pretty much rendered 16-bit mode obsolete. a lot of newer CPU features were simply not added to 16-bit mode, because it only exists for backwards compatibility. however, it is still useful to learn, because most of the things that apply to 16-bit mode also apply to 32-bit and 64-bit modes. the main use of 16-bit mode today is in the startup sequence; most operating systems switch to 32 or 64-bit mode immediately after starting up.

we are limited to the first sector of a floppy disk (512 bytes), which is very little space. once we are able to load additional sectors from the disk, we can do a lot more.

all operating systems have to do the same thing in order to boot, but until we get there, let’s get back to referencing our memory locations

so, we already talked about the base and index operands. the scale and displacement operands are numerical constants; the scale can only be used in 32 and 64-bit modes, and it can only have a value of 1, 2, 4 or 8. the displacement can be any signed integer constant.

all the operands in a memory reference expression are optional, so we only have to use whatever we need.

examples:

example 1:

```nasm
var: dw 100

mov ax, var       ; copy offset to ax
mov ax, [var]     ; copy memory contents of ds:var to ax
```

first, we defined a label which points to a word having the value 100.

the first instruction mov ax, var puts the offset of the label into the ax register.

then the second instruction mov ax, [var] copies the memory contents that our label points to. since we didn’t specify a segment register, DS is going to be used. we haven’t used the base, index or scale, but only a constant, which is the offset denoted by the “var” label. in assembly, labels are simply constants which point to specific memory offsets.

example 2:

```nasm
array: dw 100, 200, 300

; read third element in array
mov bx, array     ; copy offset to bx
mov si, 2 * 2     ; array[2], words are 2 bytes wide
mov ax, [bx + si] ; copy memory contents
```

here’s a more complicated example, where we want to read the third element in an array. we put the offset of the array into BX, and the index of the third element in SI. since we use zero-based indexing, the third element is at array[2]; each element in the array is a word, which is 2 bytes wide, so we put in SI the value 4.

you can see here that we use the multiplication symbol. the assembler is capable of calculating the result of constant expressions and putting the result in the generated machine code. however, you can’t write mov bx, ax * 2. the value of AX is not known at assembly time, so it is not a constant.

to perform this multiplication, you have to use the MUL (multiply) instruction. referencing memory is the only place where you can put registers in an expression!

finally, we put into AX the third element in the array, by referencing the memory location at BX + SI. BX is our base register, and SI is our index register.
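
for the runtime multiplication mentioned above, a minimal sketch could look like this (the values are arbitrary). MUL with a 16-bit operand multiplies AX by that operand and leaves the 32-bit result in DX:AX; for doubling, a left shift by one bit is a common alternative:

```nasm
bits 16

    mov ax, 21
    mov cx, 2
    mul cx          ; dx:ax = ax * cx, so ax = 42 and dx = 0

    mov bx, 21
    shl bx, 1       ; shifting left by one bit doubles the value: bx = 42
```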

back to the OS - the initialization

back to our operating system, the code segment register has been set up for us by the BIOS and it points to segment 0. there are some BIOSes out there which actually jump to our code using a different segment and offset, such as 0x07C0:0x0000, but the standard behavior is to use 0x0000:0x7C00. we don’t know if DS and ES are properly initialized, so this is what we have to do next. since we can’t write a constant directly to a segment register, we have to use an intermediary register; we will use AX. the MOV (move) instruction copies data from the source operand on the right side to the destination operand on the left side.

main:
    ; setup data segments
    mov ax, 0           ; can't set ds/es directly
    mov ds, ax
    mov es, ax

    ; setup stack
    mov ss, ax
    mov sp, 0x7C00      ; stack grows downwards from where we are loaded in memory

we also set up the stack segment (SS) to 0, and the stack pointer (SP) to the beginning of our program.

hmm, so what exactly is this stack?

the stack is a piece of memory that we can access in a “first in last out” manner, using the PUSH and POP instructions. the stack also has a special purpose when using functions.

when you call a function, the return address is added to the stack, and when you return from a function, the processor will read the return address from the stack and then jump to it.

another thing to note about the stack is that it grows downwards! SP points to the top of the stack. when you push something, SP is decremented by the number of bytes pushed, and then the data is written to memory. this is why we set up the stack to point to the start of our operating system: because it grows downwards. if we set it up to the end of our program, it would overwrite our program. We don’t want that, so we just put it somewhere where it won’t overwrite anything. the beginning of our operating system is a pretty safe spot.
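
the sketch below traces SP through a couple of pushes and pops and one function call. the values and the `helper` label are made up for illustration:

```nasm
org 0x7c00
bits 16

main:
    xor ax, ax
    mov ss, ax
    mov sp, 0x7c00  ; empty stack, top at 0x7c00

    mov ax, 0x1111
    mov bx, 0x2222
    push ax         ; sp = 0x7bfe, word at 0x7bfe = 0x1111
    push bx         ; sp = 0x7bfc, word at 0x7bfc = 0x2222
    pop ax          ; ax = 0x2222, sp = 0x7bfe (last pushed, first popped)
    pop bx          ; bx = 0x1111, sp = 0x7c00

    call helper     ; pushes the return address, then jumps to helper

.halt:
    hlt
    jmp .halt

helper:
    ret             ; pops the return address and jumps back to it
```

because the values come back in reverse order, AX and BX end up swapped.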

now we’ll start coding a puts function which prints a string to the screen.

start:
    jmp main

;
; Prints a string to the screen
; Params:
;   - ds:si points to string
;
puts:

    ; .......


main:

our function will receive a pointer to a string in DS:SI and it will print characters until it encounters a null character. because we decided to write the function above main, we have to add a jump instruction above it, so main is still the entry point to our program.

first, we push the registers that we’re going to modify to the stack, after which we enter the main loop.

puts:
    ; save registers we will modify
    push si
    push ax
    push bx

.loop:
    lodsb               ; loads next character in al

the lodsb (load string byte) instruction loads a byte from the address DS:SI into the AL register, and then increments SI.
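
lodsb only increments SI when the direction flag is clear; if the flag is set, SI is decremented instead. the sketch below clears the flag explicitly with cld and then shows roughly what lodsb does when written out by hand:

```nasm
bits 16

    cld             ; clear the direction flag so string instructions move forward
    lodsb           ; al = byte at ds:si, then si = si + 1

    ; roughly the same effect written out by hand
    ; (inc also changes the arithmetic flags, lodsb does not)
    mov al, [si]
    inc si
```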

next, we write the loop exit condition; the or instruction performs a bit-wise “or” and stores the result in the left operand, in this case AL. OR-ing a value with itself won’t modify the value at all; what it will modify is the FLAGS register. if the result is 0, the “zero” flag (ZF) will be set, and jz (jump if zero) then jumps to .done.

    or al, al           ; is the next character null?
    jz .done            ; exit condition

; todo .....

    jmp .loop

.done:
    pop bx
    pop ax
    pop si
    ret
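
the todo above is left as the author wrote it. purely as an assumption about where this is heading, one common way to print a character from a boot sector is the BIOS video interrupt 0x10 with AH = 0x0E (teletype output). the sketch below fills the gap that way and adds a hypothetical `msg_hello` string to call puts with; none of this is confirmed by the text above:

```nasm
org 0x7c00
bits 16

start:
    jmp main

; prints the null-terminated string at ds:si
puts:
    push si
    push ax
    push bx

.loop:
    lodsb               ; al = next character
    or al, al           ; is it the null terminator?
    jz .done

    mov ah, 0x0e        ; assumption: bios teletype output service
    mov bh, 0           ; assumption: display page 0
    int 0x10            ; assumption: bios video interrupt

    jmp .loop

.done:
    pop bx
    pop ax
    pop si
    ret

main:
    mov ax, 0
    mov ds, ax
    mov es, ax
    mov ss, ax
    mov sp, 0x7c00

    mov si, msg_hello   ; hypothetical label, defined below
    call puts

    hlt

.halt:
    jmp .halt

msg_hello: db 'hello world', 0

times 510-($-$$) db 0
dw 0AA55h
```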