We will now return to the problem we left open earlier: why was the exit
value 97 instead of 5? We now know that volatile registers don't retain
their values across function calls, and when we called getchar(), the
contents of r3 got overwritten with getchar's own return value (97 is
the ASCII code for 'a', so presumably that was the key typed). How do we
overcome this? We can store the value of r3 in a non-volatile register
before calling getchar. By convention, non-volatile registers are used
in the order r31, r30 ... r13. Also, for each non-volatile register we
use, we must save its old value on the stack and restore it before we
return from the function. Hence the above program should be modified as
follows:
- .set r0, 0
- .set r1, 1
- .set r3, 3
- .set r31, 31
-
- .extern .getchar # Tell the assembler that .getchar is an external symbol
-
- .csect
- .globl .main
- .main:
- # Function prolog begins
- mflr r0 # Get the link register in r0
- stw r31, -4(r1) # Save caller's r31
- stw r0, 8(r1) # Save the link register
- stwu r1, -64(r1) # Store the stack pointer, and update. Create a frame of 64 bytes.
- # Function prolog ends
-
- li r3, 5
- mr r31, r3 # Copy contents of r3 into r31 for retention
- bl .getchar
- ori r0, r0, 0 # No-op, required by compiler/loader after a branch to an external function
- mr r3, r31 # Copy contents of r31 back to r3
-
- # Function epilog begins
- addi r1, r1, 64 # Restore the stack pointer
- lwz r31, -4(r1) # Restore caller's r31
- lwz r0, 8(r1) # Read the saved link register
- mtlr r0
- # Function epilog ends
-
- blr
In the prolog, stw r31, -4(r1) saves the caller's (in this case
__start's) r31 in the stack frame. After loading 5 into r3, mr r31, r3
copies the contents of r3 into r31, so that when getchar returns,
although r3 will have been overwritten, we still retain the value in
r31. After getchar returns, mr r3, r31 copies the contents of r31 back
to r3. In the epilog, lwz r31, -4(r1) restores the caller's r31 before
we return from this function.
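The effect of parking the value in r31 can also be modelled in C. This is only a rough sketch: the cpu_t register file and fake_getchar() are hypothetical stand-ins invented for illustration, not real AIX interfaces, and the stack is reduced to a single local variable.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register file: 32 general purpose registers. */
typedef struct {
    uint32_t gpr[32];
} cpu_t;

/* Stand-in for getchar(): the callee leaves its result in r3 and,
 * following the convention, leaves r31 untouched. */
static void fake_getchar(cpu_t *cpu, uint32_t key)
{
    assert(key <= 0xFF);      /* model a single byte read from stdin */
    cpu->gpr[3] = key;
}

/* First attempt: r3 is not preserved across the call. */
uint32_t exit_value_unsaved(uint32_t key)
{
    cpu_t cpu = {{0}};
    cpu.gpr[3] = 5;           /* li r3, 5    */
    fake_getchar(&cpu, key);  /* bl .getchar */
    return cpu.gpr[3];        /* blr         */
}

/* Fixed version: park the value in r31 around the call. */
uint32_t exit_value_saved(uint32_t key)
{
    cpu_t cpu = {{0}};
    uint32_t callers_r31 = cpu.gpr[31];  /* stw r31, -4(r1) */
    cpu.gpr[3] = 5;                      /* li  r3, 5       */
    cpu.gpr[31] = cpu.gpr[3];            /* mr  r31, r3     */
    fake_getchar(&cpu, key);             /* bl  .getchar    */
    cpu.gpr[3] = cpu.gpr[31];            /* mr  r3, r31     */
    uint32_t result = cpu.gpr[3];
    cpu.gpr[31] = callers_r31;           /* lwz r31, -4(r1) */
    return result;                       /* blr             */
}
```

With a key code of 97, exit_value_unsaved() hands back 97 while exit_value_saved() hands back 5, which mirrors the two assembly programs.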
We will digress a little and introduce the most frequently used
registers, before we deal with the question of why we got a return value
of 97 instead of 5 in the previous program.
In this section, we will not discuss all the registers of the PowerPC
architectures, but only the most frequently used registers of the UISA
(User Instruction Set Architecture) model. The UISA model defines the
architecture to which user level programs should conform.
In the UISA model, we have the following registers:
- 32 General Purpose Registers (GPRs)
- 32 Floating Point Registers (FPRs)
- Condition Register (CR)
- Floating Point Status and Control Register (FPSCR)
- Fixed-Point Exception Register (XER)
- Link Register (LR)
- Counter Register (CTR)
There are 32 GPRs, named GPR0 to GPR31. These registers are 64 bits wide
in 64-bit implementations, and 32 bits wide in 32-bit implementations.
They are used to manipulate integer data. GPR0 is used in function
prologs. GPR1 is used as the stack pointer and GPR2 as a pointer to the
TOC; these two registers should not be used for any other purpose.
GPR0-GPR12 are volatile registers, that is, their values are not
preserved across function calls. GPR13-GPR31 are non-volatile registers.
If a function wishes to overwrite these non-volatile registers, it must
first save the value in the stack, and restore the value before
returning.
GPR12 is also used in special handling in the glink code (We'll see what
the glink code is later). In 64-bit architectures, GPR13 contains the
thread pointer.
The link register is used to store the return address from a function
call, and is generally automatically updated by the bl instruction. To
return to the address contained in the LR, the blr instruction is used.
The instruction 'mtlr' (move to link register) can be used to modify the
link register to an arbitrary value.
The condition register (CR) is a 32-bit register divided into eight
4-bit fields, named CR0-CR7. The results of arithmetic and logical
operations are stored in the condition register fields, and they can be
used to perform conditional branches. CR0, CR1, CR5, CR6 and CR7 are
volatile, while CR2, CR3 and CR4 are non-volatile. Hence, if a function
changes any of the fields CR2-CR4, it must save their state and restore
it before returning to the caller.
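To make the field layout concrete, here is a small C sketch that picks one 4-bit field out of a 32-bit CR value. The helper name cr_field() and the CR_* constants are made up for this illustration. It relies on CR0 occupying the four most significant bits, and on the bits inside a field being LT, GT, EQ and SO (in that order) after an integer compare.

```c
#include <assert.h>
#include <stdint.h>

/* Bit meanings inside one 4-bit field after an integer compare. */
enum { CR_LT = 8, CR_GT = 4, CR_EQ = 2, CR_SO = 1 };

/* Return field CRn (n = 0..7) of a 32-bit condition register image.
 * CR0 is the most significant nibble, CR7 the least significant. */
unsigned cr_field(uint32_t cr, unsigned n)
{
    assert(n < 8);
    return (cr >> (28 - 4 * n)) & 0xFu;
}
```

For example, a CR image of 0x20000000 has only the EQ bit of CR0 set, so cr_field(0x20000000, 0) equals CR_EQ.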
The Counter Register (CTR) can hold the target address of a branch, and
is also used in loops to hold the loop count. The value of the counter
register may be modified by the 'mtctr' (move to counter register)
instruction.
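As a rough illustration of the loop-count use, the C sketch below imitates a loop closed by the bdnz instruction (decrement CTR, branch if the result is not zero). bdnz is not covered in the text above, and the function ctr_loop_sum() is a hypothetical example, so treat this as a model of the idea rather than generated code.

```c
#include <assert.h>
#include <stdint.h>

/* Sum count + (count-1) + ... + 1 using a CTR-style countdown loop. */
uint32_t ctr_loop_sum(uint32_t count)
{
    assert(count > 0);       /* a count of 0 would wrap around in CTR */
    uint32_t ctr = count;    /* mtctr: load the loop count into CTR   */
    uint32_t sum = 0;
    do {
        sum += ctr;          /* loop body                             */
    } while (--ctr != 0);    /* bdnz: decrement CTR, loop if non-zero */
    return sum;
}
```

The body always runs at least once, because the decrement and test happen at the bottom of the loop.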
Suppose you want your program to wait for the user to press a key before
exiting. To do this, you would call the getchar() function, which is
exported by libc. Calling getchar from our program is rather
straightforward; all we have to do is include the following line in our
program:
bl .getchar
Note that in the above line, we have used .getchar instead of getchar.
However, including this line alone in our program will not work, and in
all probability, this program will just dump core. Do you know why?
We have seen that when we issue the bl instruction, the link register
gets overwritten with the address of the instruction following the
current one. Hence, after the bl instruction, the link register will
contain the address of the following instruction (which is a part of
.main). After returning from getchar(), when we issue the instruction
blr from .main, we would not return to __start, because our own bl has
overwritten the value that was placed in the link register when __start
called .main.
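The problem can be traced with a tiny C model of bl and blr. The addresses 0x100, 0x200 and 0x300 are invented for illustration, as are the cpu_t type and the helper functions.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t pc;   /* address of the instruction being executed */
    uint32_t lr;   /* link register */
} cpu_t;

/* bl target: remember the next instruction, then branch. */
static void bl(cpu_t *cpu, uint32_t target)
{
    assert(target % 4 == 0);   /* instructions are 4 bytes, word aligned */
    cpu->lr = cpu->pc + 4;
    cpu->pc = target;
}

/* blr: branch to whatever address the link register holds. */
static void blr(cpu_t *cpu)
{
    cpu->pc = cpu->lr;
}

/* Where does .main end up if it never saves the link register? */
uint32_t return_address_without_saving_lr(void)
{
    cpu_t cpu = { .pc = 0x100, .lr = 0 };  /* somewhere inside __start */
    bl(&cpu, 0x200);   /* __start calls .main: LR = 0x104           */
    cpu.pc = 0x204;    /* .main reaches its own bl                  */
    bl(&cpu, 0x300);   /* .main calls .getchar: LR = 0x208          */
    blr(&cpu);         /* .getchar returns to 0x208, inside .main   */
    blr(&cpu);         /* .main's blr: LR still holds 0x208         */
    return cpu.pc;
}
```

The function returns 0x208 rather than 0x104: the final blr lands back inside .main instead of in __start.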
How do we solve the problem? We create a stack frame for main, and save the link register in the frame.
Here is what the program would look like:
- .set r0, 0
- .set r1, 1
- .set r3, 3
- .extern .getchar # Tell the assembler that
- # .getchar is an external symbol
- .csect
- .globl .main
- .main:
- #### Function prolog begins ####
- mflr r0 # Get the link register in r0
- stw r0, 8(r1) # Save the link register
- stwu r1, -64(r1) # Store the stack pointer, and
- # update. Create a frame of 64 bytes.
- #### Function prolog ends ####
- li r3, 5
- bl .getchar
- ori r0, r0, 0 # No-op, required by loader after a
- # branch to an external function
- #### Function epilog begins ####
- addi r1, r1, 64 # Restore the stack pointer
- lwz r0, 8(r1) # Read the saved link register
- mtlr r0
- #### Function epilog ends ####
- blr
In the above program, the .extern directive tells the assembler that .getchar is an external symbol that is not present in the current file.
Load and store operations cannot be performed directly on the link
register, and hence we have to copy the contents of the link register to
another general purpose register before storing it. The mflr (move from
link register) instruction takes as an argument another register, and
copies the contents of the link register to the specified register.
In PowerPC, the convention is to use the general purpose register r1 as
the stack pointer. With stw r0, 8(r1), we save the value of the link
register at an offset of 8 bytes from the stack pointer.
Next, we use the special instruction stwu (store word with update) to
advance the stack pointer and save the old stack pointer. Here,
stwu r1, -64(r1) stores the value of r1 at the address r1-64, and then
stores the value r1-64 in r1. Hence this single instruction performs two
tasks in one go: decrementing the stack pointer and storing the old
stack pointer.
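The two steps performed by stwu can be spelled out in C. The toy machine_t below (256 bytes of memory and a single register r1) is purely an assumption made for illustration. Words are stored in the host's byte order, which does not matter here because the same helpers do both the storing and the loading.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MEM_BYTES 256u

typedef struct {
    uint8_t  mem[MEM_BYTES];  /* toy memory, addresses 0..255 */
    uint32_t r1;              /* stack pointer */
} machine_t;

static void store_word(machine_t *m, uint32_t addr, uint32_t value)
{
    assert(addr <= MEM_BYTES - 4);
    memcpy(&m->mem[addr], &value, sizeof value);
}

uint32_t load_word(const machine_t *m, uint32_t addr)
{
    uint32_t value;
    assert(addr <= MEM_BYTES - 4);
    memcpy(&value, &m->mem[addr], sizeof value);
    return value;
}

/* Model of "stwu r1, d(r1)". */
void stwu_r1(machine_t *m, int32_t d)
{
    uint32_t old_sp = m->r1;
    uint32_t ea = old_sp + (uint32_t)d;  /* effective address r1 + d        */
    store_word(m, ea, old_sp);           /* save the old stack pointer there */
    m->r1 = ea;                          /* then make r1 point at it         */
}
```

Starting with r1 = 192, a call to stwu_r1(&m, -64) leaves r1 at 128 with the word at address 128 holding 192, so the new frame begins with a pointer back to the previous one.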
Having done this, we are ready to break into the main logic of the
program. We use the bl instruction to call getchar.
The assembler treats certain instructions specially. The
ori r0, r0, 0 that follows the bl is treated as a no-op. A no-op is
required by the loader after a call to an external function is made. We
shall see why it is required later. xlc will not compile the program
without the no-op, whereas 'as' will assemble it without complaint.
Having done our job, we now have to restore the old values of the stack
pointer (r1) and the link register. With addi r1, r1, 64, we restore the
stack pointer to its old value by adding the immediate value 64 to it.
We then load the stored link register value into r0 with lwz r0, 8(r1),
and use mtlr (move to link register) to copy the contents of r0 to the
link register. We finish by executing blr.
When we run this program, we see that it waits for us to enter something, and then returns.
So far, so good, but what happens when I check the exit value returned by this program?
$ ./a.out
$ echo $?
97
We are no longer getting 5!!!
Just a note of caution: in this post, I have not followed the
stack-linkage convention in its entirety. I have simplified things and
tried to capture only the essence of stack-linkage. I will probably
return to this topic in a later post.
You will have noticed that the load instruction li 3, 5 in 1_1.s is
rather confusing as to which operand is the register, and which is the
value being loaded. To make things clearer, we will use the .set
assembler pseudo-op. C programmers can think of .set as a #define. The
program will then look like this:
- #File. 1_2.s
- .set r3, 3
- .csect
- .globl .main
- .main:
- li r3, 5
- blr
The above program has the same effect as 1_1.s.
It's been quite some time since I last posted. In the next few posts,
I'll try to present an introduction to PowerPC assembly on AIX.
The motivation for this comes from my personal experience trying to
program in assembly on AIX. I found plenty of documentation on the
instruction set, assembler directives etc. However, what I couldn't find
was a step-by-step tutorial on how to write basic assembly programs.
True, there were some developerworks articles, but the code presented in
those articles hardly ever worked.
My endeavour is to present a primer into PowerPC assembly programming on
AIX. Most of my programs will be sub-optimal, and simplistic. The goal
is not to write perfect programs - rather to get someone started on
PowerPC assembly programming on AIX, so that he can go on from here and
take advantage of the large amount of material available on the web on
PowerPC programming.
The first program usually written in any programming language is the
hello world program. However, writing a hello world program in assembly
is certainly not the easiest first step. We will start with a much
simpler program: one that does nothing, or almost nothing. The program
just exits with an exit value.
The default extension of an assembly program is .s.
- #File. 1_1.s
- .csect
- .globl .main
- .main:
- li 3, 5
- blr
In this tutorial, we will use the xlc compiler to compile our first assembly program.
$ cc 1_1.s
And now, onto running this program:
$ ./a.out
$ echo $?
5
The first line in this program is a comment. Comments start with a #. A
comment can be placed anywhere in a line. Any text after the # in a
line is ignored by the assembler. The second line in the program tells
the assembler that this is a csect, or a relocatable module. We will
learn more about csects later.
The third line tells the assembler that .main is a global symbol, and other objects can link to it.
Line 4 is a label named .main. The assembler recognizes that this is a
label by the colon following the label name. Line 3 and line 4 work
together to signify that .main is a global symbol, and that its address
is specified by the label in line 4.
In PowerPC, the convention for a function to return a value is to store
it in register 3. Line 5 loads the value 5 into the register 3. 'li' is a
load instruction, and loads an immediate value into a register.
In AIX, whenever a binary is run, the function __start is automatically executed. __start then calls the symbol .main.
In PowerPC, whenever one function calls another, it does so by executing
the instruction bl, or branch and link. bl stores the address of the
next instruction to be executed in the link register, and then branches
to the specified address.
Therefore, when the callee function returns, execution should resume at
the instruction whose address is held in the link register. To return,
the callee simply executes the blr (branch to link register)
instruction, which loads the contents of the link register into the
program counter, so that execution continues from that address.
More posts on PowerPC assembly to come in the following weeks.
While doing source code debugging, one generally compiles with the -g
option, and assumes that all compiler optimizations have been turned
off. However, as far as the xlc compiler is concerned, this might not
necessarily be true. With -g the compiler puts in line-number
information and turns off some optimizations, but not all of them.
To tell the compiler to turn off ALL optimizations, the -qnoopt option
should be used.
I haven't posted in a while, due to lack of time and too much work, and
I think this post has been rather overdue.
One aspect of AIX's malloc (or, for that matter, malloc on any other
operating system) is that if you free memory you have allocated, the
change won't be reflected in the svmon output. This is because the
malloc subsystem caches the memory, to be used for further mallocs. An
easy way to see how much memory your application is using (the memory
malloced by it, plus the memory in the free pool maintained by the
malloc subsystem) is to use the variable process_brk, which is exported
by libc.
The way I usually go about it is to use the dbx subcommand
(dbx) p &process_brk
This gives me the address to dump, which I dump using a command similar
to the one below
(dbx) 0x12345678/3X
This will give an output of three words:
12345678 12345678 1
The first word signifies what the brk value was before the first malloc
was done, and the second word tells you what the brk value was after the
second allocation was done. The third word tells how many sbrk()s were
done. This gives me a very good estimate of the total memory used by my
program.
Another benefit of this is that it lets me check for heap/stack
collision. To check whether there has been a heap/stack collision in my
32-bit app, what I normally do is dump the stack pointer, and check
whether it falls within the process_brk minimum and maximum limits.
Hope this helps.
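The collision check itself is just a range comparison. Here is a minimal C sketch of it, assuming the two brk values and the stack pointer have already been read by hand from dbx. The function name is made up, and treating the upper limit as exclusive is my own assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Return 1 if the stack pointer lies inside the heap range
 * [brk_min, brk_max), which would indicate a heap/stack collision. */
int stack_pointer_in_heap(uintptr_t brk_min, uintptr_t brk_max, uintptr_t sp)
{
    assert(brk_min <= brk_max);
    return sp >= brk_min && sp < brk_max;
}
```

A result of 1 means the stack pointer has wandered into the region managed through sbrk(), while 0 means the two are still apart.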