Many assembly tutorials and books don't cover how to write a simple assembly program on Mac OS X. Here are some baby steps to help people who are interested in assembly get started more easily.
Mach-O file format
To get started writing OS X assembly, you need to understand the OS X executable file format, Mach-O. It's similar to ELF, but instead of top-level data, bss, and text sections, it has segments that contain sections.
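A common Linux assembly file declares .data, .bss, and .text sections. In Mach-O these translate into the sections __DATA,__data, __DATA,__bss, and __TEXT,__text, where the first name is the segment and the second is the section inside it.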
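As a rough sketch of that mapping (the Linux directives in the comments are the assumed ELF equivalents), the Mach-O section directives could look like this:

```asm
# Linux (ELF) equivalents, shown only for comparison:
#   .section .data
#   .section .bss
#   .section .text

.section __DATA,__data    # segment __DATA, section __data: initialized data
.section __DATA,__bss     # segment __DATA, section __bss: uninitialized data
.section __TEXT,__text    # segment __TEXT, section __text: executable code
```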
Mach-O is pretty flexible. You can embed a cstring section in your __TEXT segment instead of putting the string in __DATA,__data. In fact, this is what the compiler does by default on your Mac.
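A minimal fragment showing such a declaration, assuming the section name and type that compilers on OS X typically emit for C strings; the label msg is a hypothetical name used only for illustration:

```asm
# Read-only C string placed in the __TEXT segment rather than __DATA.
.section __TEXT,__cstring,cstring_literals
msg:                          # hypothetical label name
    .asciz "Hello world!\n"   # NUL-terminated string
```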
Hello Assembly
Now that we know how to translate common Linux assembly to the Mac, let's write a basic program: one that makes a system call to exit with an exit code.
On 32-bit x86 you make a system call with the int $0x80 instruction. On a 64-bit machine, you use the syscall instruction instead.
You can build the program by assembling it with as and linking the resulting object file with ld.
To perform a system call with syscall, you put the system call number in %eax and the exit code in %edi, the first argument register. The system call numbers can be found in /usr/include/sys/syscall.h. For example, SYS_exit is 1 and SYS_write is 4.
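You need to add an offset of 0x2000000 to the system call number, because OS X has four different classes of system calls and the class is encoded in the upper bits of the number; 0x2000000 selects the Unix class. You can find the reference in the XNU syscall source.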
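A minimal sketch of such an exit program follows. The file name and the build commands in the comments are assumptions, and newer toolchains may need additional linker flags:

```asm
# exit.s: exit with status 0 through a raw system call.
# Assumed build steps:
#   as exit.s -o exit.o
#   ld exit.o -e _main -o exit
.section __TEXT,__text
.globl _main
_main:
    movl $0x2000001, %eax    # SYS_exit (1) plus the 0x2000000 Unix class offset
    movl $0, %edi            # first syscall argument: exit code 0
    syscall                  # enter the kernel; this call does not return
```

After running the program, echo $? in the shell shows the exit code that was passed in %edi.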
System call by using wrapper functions
If, like me, you have no assembly background, syscall might feel alien to you. In C, we usually use wrapper functions to make the call, and we can do the same from assembly.
Now we call a libc function instead of performing a system call directly. To do this we need to link against libc by passing -lc to the linker ld. There are several things you need to do to make a function call.
Call frame
We need to prepare the stack before we call a function. Otherwise you will probably get a segmentation fault.
The values in %rsp and %rbp are used to preserve frame information. To maintain the stack, you first push the base register %rbp onto the stack with pushq %rbp; then you copy the stack register %rsp to the base register.
If you have local variables, you subtract from %rsp to make space for them. Remember, the stack grows down and the heap grows up. When releasing the frame, you add the space back to %rsp.
The life cycle of a function is therefore: set up the frame, reserve space for locals, do the work, release the space, restore %rbp, and return.
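The frame handling described above can be sketched as follows. The function name _frame_demo and the 16 bytes of local space are arbitrary choices for illustration:

```asm
.section __TEXT,__text
.globl _frame_demo
_frame_demo:                 # hypothetical function name
    pushq %rbp               # save the caller's base pointer
    movq  %rsp, %rbp         # the new frame starts at the current stack top
    subq  $16, %rsp          # reserve 16 bytes for local variables

    movl  $42, -4(%rbp)      # store a local variable inside the frame
    movl  -4(%rbp), %eax     # load it back as the return value

    addq  $16, %rsp          # release the local variable space
    popq  %rbp               # restore the caller's base pointer
    ret                      # return to the caller
```

Reserving space in multiples of 16 bytes keeps %rsp aligned the way the System V ABI expects at the point of a call.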
The stack size can be set at link time. On OS X, you can pass the -stack_size option to ld to set it.
When setting the stack size, you also have to set the stack address. The System V Application Binary Interface says:
"Although the AMD64 architecture uses 64-bit pointers, implementations are only required to handle 48-bit addresses. Therefore, conforming processes may only use addresses from 0x00000000 00000000 to 0x00007fff ffffffff."
I don't have a good answer for how to choose a good stack address. I just copy whatever normally compiled code produces.
Parameter passing
The rules for parameter passing can be found in the System V Application Binary Interface:
- If the class is MEMORY, pass the argument on the stack. If the size of an object is larger than four eightbytes, or it contains unaligned fields, it has class MEMORY.
- If the class is INTEGER, the next available register of the sequence %rdi, %rsi, %rdx, %rcx, %r8 and %r9 is used.
- If the class is SSE, the next available vector register is used; the registers are taken in the order from %xmm0 to %xmm7.
The exit() function only needs one integer parameter, so we put the exit code in %edi. Since the parameter has type int, we use the 32-bit variant of register %rdi, and the instruction is movl (mov long) instead of movq (mov quad).
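Putting the call frame and the parameter rule together, a minimal sketch of calling exit() through libc might look like this. The file name and link command in the comments are assumptions, and the explicit alignment step is an extra precaution rather than something the text above requires:

```asm
# exit_libc.s: exit by calling the libc exit() function.
# Assumed link step: ld exit_libc.o -e _main -lc -o exit_libc
.section __TEXT,__text
.globl _main
_main:
    pushq %rbp               # set up the call frame
    movq  %rsp, %rbp
    andq  $-16, %rsp         # keep %rsp 16-byte aligned at the call
    movl  $0, %edi           # int argument of exit(): exit code 0
    callq _exit              # C's exit() has the symbol name _exit on OS X
```

Because exit() never returns, no frame teardown follows the call.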
Hello world
Now we know the basics of how to perform a system call, and how to call a function. Let’s write a hello world program.
The global variable str can only be accessed through the GOT (Global Offset Table), and the GOT needs to be accessed relative to the instruction pointer %rip. If you are curious, you can read Mach-O Programming Topics: x86-64 Code Model.
The registers used for syscall parameters are a little different from those of a normal function call: %rdi, %rsi, %rdx, %r10, %r8 and %r9. You cannot pass more than 6 parameters to syscall, nor can you put parameters on the stack.
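As an illustration of these points, here is a sketch of a hello world program built on the write system call (SYS_write is 4 in syscall.h). The file name is an assumption, and the byte count is hard-coded for this particular string:

```asm
# hello.s: print a string with the write system call, then exit.
.section __DATA,__data
str:
    .asciz "Hello world!\n"

.section __TEXT,__text
.globl _main
_main:
    movl $0x2000004, %eax            # SYS_write (4) plus the 0x2000000 offset
    movl $1, %edi                    # first argument: file descriptor 1 (stdout)
    movq str@GOTPCREL(%rip), %rsi    # second argument: address of str via the GOT
    movq $13, %rdx                   # third argument: 13 bytes, excluding the NUL
    syscall

    movl $0x2000001, %eax            # SYS_exit
    movl $0, %edi                    # exit code 0
    syscall
```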
Hello world using printf
Now you know the basics of assembly. A hello world example using printf should be easy to follow.
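A minimal sketch of such a program, assuming it is linked against libc with -lc as described earlier. Clearing %eax before the call follows the System V rule for variadic functions, and calling exit() at the end is a choice made here so that buffered output is flushed:

```asm
# hello_printf.s: print a string by calling printf from libc.
.section __DATA,__data
str:
    .asciz "Hello world!\n"

.section __TEXT,__text
.globl _main
_main:
    pushq %rbp                        # set up the call frame
    movq  %rsp, %rbp
    andq  $-16, %rsp                  # keep %rsp 16-byte aligned at each call

    movq  str@GOTPCREL(%rip), %rdi    # first argument: the format string
    xorl  %eax, %eax                  # %al = 0: no vector registers carry arguments
    callq _printf

    xorl  %edi, %edi                  # exit code 0
    callq _exit                       # exit() flushes stdout and does not return
```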
Conclusion
64-bit assembly on OS X can look less clear than the 32-bit x86 assembly that most tutorials use. Once you know these basic differences, it's easy to learn assembly in depth on your own, even if the material is designed for x86. I highly recommend the book "Programming from the Ground Up". It is well written for self-study.
References
- OS X ABI Mach-O File Format Reference
- System V Application Binary Interface
- OS X Assembler Reference Assembler Directives
- Mach-O Programming Topics
- Mach-O Executables - Build Tools
- Book: Programming from the ground up.