Many assembly tutorials and books don't cover how to write a simple assembly program on Mac OS X. Here are some baby steps that can help people who are also interested in assembly get started more easily.
Mach-O file format
To get started writing OSX assembly, you need to understand the OSX executable file format: the Mach-O file format. It's similar to ELF, but instead of sections named data, bss, and text, it has segments that contain sections.
The section directives that common Linux assembly uses for data, bss, and text translate in Mach-O into a segment and section pair, such as __DATA,__data or __TEXT,__text.
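As a rough sketch of that translation (AT&T syntax, Apple's assembler assumed), the Linux directives appear as comments and the Mach-O equivalents as live directives. The exact Linux spelling in the comments is an assumption, since it varies between tutorials.

```asm
# Linux (ELF) style, kept as comments for comparison only:
#   .section .data
#   .section .bss
#   .section .text

# Mach-O style: name the segment first, then the section inside it.
.section __DATA,__data      # segment __DATA, section __data (initialized data)
.section __TEXT,__text      # segment __TEXT, section __text (code)

# Uninitialized data (bss) lives in the __DATA segment as __DATA,__bss.
```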
Mach-O is pretty flexible. You can embed a cstring section in your __TEXT segment instead of putting it in __DATA,__data. Actually, this is the default behavior of the compiler on your Mac.
Now that we know how to translate common Linux assembly to the Mac, let's write a basic program: a system call that exits with an exit code.
On 32-bit x86 you make a system call with the int $0x80 instruction. On a 64-bit machine, you use the syscall instruction instead.
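A minimal sketch of such an exit program for x86-64 OS X follows. The file name exit.s and the build commands in the comments are illustrative assumptions; recent versions of ld may need extra flags (for example an SDK path and -lSystem). The constant 0x2000001 is exit's number (1) plus the Unix class offset, as I understand it from the XNU headers.

```asm
# exit.s: leave the process with status 0 through a raw system call
#
# Possible build steps (assumed, adjust for your toolchain):
#   as exit.s -o exit.o
#   ld exit.o -e _main -o exit
#   ./exit; echo $?

.section __TEXT,__text
.globl _main
_main:
    movl $0x2000001, %eax   # system call 1 (exit) plus the 0x2000000 offset
    movl $0, %edi           # first syscall argument: the exit code
    syscall                 # enter the kernel; exit does not return
```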
You can build the code by assembling it and then linking the resulting object file with ld.
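To perform a system call, you put the system call number in %eax and the exit code in %edi, the first argument register for syscall on x86-64 (%ebx is what the 32-bit Linux int $0x80 convention uses). The system call numbers are listed in the system headers.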
The system call number needs a class offset added to it, because OSX has 4 different classes of system calls. You can find the reference in the XNU syscall source.
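One way to make the offset explicit is to build the number from named constants. The values below are assumptions taken from my reading of XNU's syscall_sw.h (class shift 24, Unix class 2) and sys/syscall.h (exit is 1); check them against your own headers. The symbol names are mine, not the kernel's assembly interface.

```asm
# Assumed constants; verify against the headers on your system.
.set SYSCALL_CLASS_SHIFT, 24
.set SYSCALL_CLASS_UNIX,  2
.set SYS_exit,            1

# 2 << 24 = 0x2000000, the offset for Unix-class system calls.
.set UNIX_CLASS_OFFSET, SYSCALL_CLASS_UNIX << SYSCALL_CLASS_SHIFT
.set SYSCALL_EXIT,      UNIX_CLASS_OFFSET + SYS_exit   # 0x2000001

.section __TEXT,__text
.globl _main
_main:
    movl $SYSCALL_EXIT, %eax    # class offset plus syscall number
    movl $0, %edi               # exit code
    syscall
```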
System call by using wrapper functions
If you're like me and had no assembly background, syscall might feel alien to you. In C, we usually use wrapper functions to perform the call.
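Here is a hedged sketch of the same exit, this time going through libc's exit() wrapper. The exit code 5 and the file layout are illustrative choices; the program has to be linked against libc for _exit to resolve.

```asm
# exit_libc.s: call the C library's exit() instead of using syscall directly
# (link against libc when building)

.section __TEXT,__text
.globl _main
_main:
    pushq %rbp              # save the caller's base pointer
    movq  %rsp, %rbp        # set up our own frame; stack is now 16-byte aligned
    movl  $5, %edi          # int argument for exit(): the exit code
    callq _exit             # C's exit() is the symbol _exit; it does not return
```

Running the result and printing the shell's last exit status is a quick way to confirm the argument reached exit().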
Now we call a libc function instead of performing a system call. To do this we need to link to libc by passing -lc to ld. There are several things you need to do to make a function call.
We need to prepare the stack before we call a function. Otherwise you would probably get a segmentation fault.
The value in %rbp is used to preserve frame information. To maintain the stack, you first push the base register onto the stack with pushq %rbp, then you copy the stack register %rsp to the base register. If you have local variables, you subtract from %rsp to make space for them. Remember, the stack grows down and the heap grows up. When releasing the frame, you add the space back to %rsp. Together, these steps make up the life cycle of a function.
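The frame steps described above can be sketched as follows. The helper _answer and its 16 bytes of local space are hypothetical, chosen only to show each step; _main ends with the raw exit syscall so the sketch does not depend on libc.

```asm
.section __TEXT,__text
.globl _main

# Hypothetical helper: stores 42 in a local variable and returns it.
_answer:
    pushq %rbp              # 1. save the caller's base pointer
    movq  %rsp, %rbp        # 2. the base pointer now marks this frame
    subq  $16, %rsp         # 3. reserve space for locals (stack grows down)
    movl  $42, -4(%rbp)     #    write a local variable
    movl  -4(%rbp), %eax    #    read it back as the return value
    addq  $16, %rsp         # 4. release the local space
    popq  %rbp              # 5. restore the caller's base pointer
    retq                    # 6. return to the caller

_main:
    pushq %rbp
    movq  %rsp, %rbp
    callq _answer           # return value arrives in %eax
    movl  %eax, %edi        # use it as the exit code
    movl  $0x2000001, %eax  # exit system call
    syscall
```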
The stack size can be set at link time. On OSX, ld takes the -stack_size option for this. When setting the stack size, you also have to set the stack address with -stack_addr. The System V Application Binary Interface says:
Although the AMD64 architecture uses 64-bit pointers, implementations are only required to handle 48-bit addresses. Therefore, conforming processes may only use addresses from 0x00000000 00000000 to 0x00007fff ffffffff.
I don't know a good answer for how to choose a good stack address. I just copy whatever normally compiled code produces.
The rules for parameter passing can be found in the System V Application Binary Interface:
- If the class is MEMORY, pass the argument on the stack. If the size of an object is larger than four eightbytes, or it contains unaligned fields, it has class MEMORY.
- If the class is INTEGER, the next available register of the sequence %rdi, %rsi, %rdx, %rcx, %r8 and %r9 is used.
- If the class is SSE, the next available vector register is used; the registers are taken in the order from %xmm0 to %xmm7.
The exit() function only needs one integer parameter, so we put the exit code in %edi. Since the parameter is of type int, we use the 32-bit variant of register %rdi, and the instruction is movl (mov long) instead of movq (mov quad).
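To illustrate the INTEGER rule with more than one argument, here is a small sketch. The function _sum3 is hypothetical (think of int sum3(int a, int b, int c)); because all three parameters are int, the 32-bit register names and movl/addl are used throughout.

```asm
.section __TEXT,__text
.globl _main

# Hypothetical leaf function: int sum3(int a, int b, int c)
_sum3:
    movl %edi, %eax         # a arrives in %edi (low half of %rdi)
    addl %esi, %eax         # b arrives in %esi (low half of %rsi)
    addl %edx, %eax         # c arrives in %edx (low half of %rdx)
    retq                    # int result is returned in %eax

_main:
    pushq %rbp
    movq  %rsp, %rbp
    movl  $1, %edi          # 1st INTEGER-class argument
    movl  $2, %esi          # 2nd
    movl  $3, %edx          # 3rd
    callq _sum3
    movl  %eax, %edi        # pass the sum on as the exit code
    movl  $0x2000001, %eax  # exit system call
    syscall
```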
Now we know the basics of how to perform a system call, and how to call a function. Let’s write a hello world program.
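A sketch of a hello world built on the write system call might look like this. The label str, the message, and the byte count 13 are illustrative; write is assumed to be Unix system call 4, so its full number is 0x2000004.

```asm
# hello.s: print a message with the write system call, then exit

.section __DATA,__data
str:
    .asciz "Hello world!\n"

.section __TEXT,__text
.globl _main
_main:
    movl $0x2000004, %eax           # write (4) plus the 0x2000000 offset
    movl $1, %edi                   # 1st argument: file descriptor 1 (stdout)
    movq str@GOTPCREL(%rip), %rsi   # 2nd argument: address of str, via the GOT
    movq $13, %rdx                  # 3rd argument: bytes to write (no NUL)
    syscall

    movl $0x2000001, %eax           # exit
    movl $0, %edi                   # exit code 0
    syscall
```

For a symbol defined in the same file, leaq str(%rip), %rsi is a common alternative to the GOT load.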
The global variable str can only be accessed through the GOT (Global Offset Table), and the GOT needs to be accessed relative to the instruction pointer %rip. If you are curious, you can read Mach-O Programming Topics: x86-64 Code Model.
The registers used for syscall parameters are a little bit different from those of a normal function call: the fourth parameter goes in %r10 instead of %rcx. You cannot pass more than 6 parameters in a syscall, nor can you put the parameters on the stack.
Hello world using printf
Now you know the basics of assembly. A hello world example using printf should be trivial to read.
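The following is a sketch of such a printf version, not necessarily the original listing. It assumes the string is placed in a cstring section of the __TEXT segment and loaded with plain RIP-relative addressing; the label msg is illustrative. It calls libc's exit() at the end so that buffered output is flushed, and it must be linked against libc.

```asm
# hello_printf.s: hello world through libc's printf

.section __TEXT,__cstring,cstring_literals
msg:
    .asciz "Hello world!\n"

.section __TEXT,__text
.globl _main
_main:
    pushq %rbp              # set up the frame; also aligns the stack to 16 bytes
    movq  %rsp, %rbp
    leaq  msg(%rip), %rdi   # 1st argument: pointer to the format string
    movb  $0, %al           # variadic call: no vector registers are used
    callq _printf
    movl  $0, %edi          # exit code 0
    callq _exit             # C's exit(): flushes stdio, does not return
```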
64-bit assembly can look more obscure than what tutorials written for 32-bit x86 show. Once you know these basic differences, it's easy to learn assembly in depth on your own, even if the material is designed for x86. I highly recommend the book “Programming from the Ground Up”. It is well written for self-study.