ARM64 floating point arithmetic
Project Overview
In previous posts, normally at last of execution we ask the Linux exit()
to return an integer result. But in this post we’ll do pointing point arithmetic, so we have to change the game plan – we’ll use the C function printf()
to print out the floating number.
We write out add.h
and main.c
:
add.h
:
float add(float a, float b);
main.c
:
#include "add.h"
#include <stdio.h>
int main() {
printf("%f\n", add(3.4, -2.7));
return 0;
}
what remains is to write an add.s
, and eventually link it with main.c
to produce the executable file.
The adding function in assembly
Recall that all constants are preceded by #
. 0x
denotes hexadecimal, otherwise decimal. So #0x10
is the hexadecimal 10, equivalent to decimal 16. #12
is just the decimal 12.
add.s
:
.global add
.text
add:
sub sp, sp, #0x10 // set sp to sp - 16 bytes
str s0, [sp, #12] // store s0 first param at sp + 12 bytes
str s1, [sp, #8] // store s1 second param at sp + 8 bytes
sub sp, sp, #0x10 // set sp to sp - 16 bytes once more
str lr, [sp, #8] // store link register lr at sp + 8 bytes
str fp, [sp] // store frame pointer fp at sp
mov fp, sp // set frame pointer fp to sp
fadd s0, s0, s1 // floating-point add
// store result in s0
ldr fp, [sp] // restore frame pointer fp from sp
ldr lr, [sp, #8] // restore link register lr from sp + 8 bytes
add sp, sp, #0x10 // set sp to sp + 16 bytes
// now, the params s0, s1 at function entry
// should be stored at sp + 8 bytes, sp + 16 bytes
// but we don't restore them
add sp, sp, #0x10 // set sp to sp + 16 bytes once more
// fully unwind stack used by this function
ret
Floating point registers
64-bit floating point registers are named d0
, d1
, … Their lower 32-bit parts are named s0
, s1
, …
The C float
type is typically 32-bit, or occupying 4 bytes. That’s why in the code above, the sp + 8 bytes to sp + 15 bytes on the stack are used to store the two parameters at entry s0
and s1
, while leaving sp to sp + 7 bytes empty:
sub sp, sp, #0x10 // set sp to sp - 16 bytes
str s0, [sp, #12] // store s0 first param at sp + 12 bytes
str s1, [sp, #8] // store s1 second param at sp + 8 bytes
Stack use
The stack grows downward: its pointer’s address decreases as more data is pushed onto it. Before the fadd
floating point operation, the stack would have been prepared and look like
s0 4 bytes
s1 4 bytes
[empty 8 bytes]
lr 8 bytes
old fp 8 bytes <-- sp
In the link register lr is stored the return address for the add()
function: somewhere inside the main()
function.
The frame pointer fp momentarily points to the current stack pointer sp during the life of the fadd
operation, before being restored to the old frame pointer, in preparation for returning to the main()
function.
Make the code run
We assemble the add.s
into a binary file add.o
:
$ as -o add.o add.s
then compile the remaing C files, and link with add.o
to produce the final executable file main
:
$ gcc -o main main.c add.o
We run it:
$ ./main
0.700000
References
Arm Compiler armasm User Guide. On https://developer.arm.com, search for “armasm user guide”. In the result list, find the latest version of “Arm Compiler armasm User Guide”.
ARM64 function calls, /2025/08/04/arm64-func.html.
Introduction to Aarch64 architecture, 8. The Stack, https://hrishim.github.io/llvl_prog1_book/stack.html.