Pointers to the Stack
Let's say I had this odd piece of code:
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int *arr[5]; //array of 5 integer pointers
    int i;
    for (i = 0; i < 5; i++) {
        int x = rand() % 10; //make numbers between 0-9
        printf("%d ", x); //print all of the numbers
        arr[i] = &x; //set pointer to stack variable
    }
    printf("\n");
    for (i = 0; i < 5; i++) {
        printf("%d ", *arr[i]);
    }
    printf("\n");
    return 0;
}
The results I get are:
3 6 7 5 3
3 3 3 3 3
Why? Shouldn't there be an error, because I'm using pointers that point to stack variables that have been erased? Instead, it appears that all of the pointers point at the last instance of the stack variable x.
Answer
tl;dr This is undefined behavior. Don't EVER do it. If you want to know a little more, keep reading. Otherwise that's all you really need to know.
The reason this happens without an error is that the stack doesn't go away when it rolls back after each iteration. The pointers still point to something, even though there is no longer any guarantee about what value is there. You should NEVER do this, for exactly that reason: there is no guarantee about the value of a variable once it goes out of scope.
You might get an error if you did something similar with a pointer into the heap (using it after you freed what it was pointing to), because the stack and the heap are managed quite differently.
If you're interested in what's actually happening, I have a (hopefully) understandable stack diagram below:
| ... |
| previous stack |
------------------
0x7FFF F0F8 | arr[4] |
0x7FFF F0F0 | arr[3] |
0x7FFF F0E8 | arr[2] | Function scope variables
0x7FFF F0E0 | arr[1] |
0x7FFF F0D8 | arr[0] |
0x7FFF F0D4 | i |
------------------
0x7FFF F0D0 | x | For loop scope variables
Space for the function-scope variables is allocated when you first enter the function. Then, each time you enter the for loop, space for `x` is allocated. That is, each `x` will have space allocated for it at 0x7FFFF0D0. Note that when you exit the for loop, you can think of `x` as going out of scope, meaning that the space held for it on the stack is no longer guaranteed. That is, 0x7FFFF0D0 could be handed to some other variable later, and that would be completely valid.
However, in your short little program, nothing is given address 0x7FFFF0D0 after the for loop, so when you get to the final `printf`, you will be printing whatever is at address 0x7FFFF0D0 (the leftover 3) five times.
It's VERY important to note that you cannot rely on it printing the last value of `x`. It just happens to work here, but it is not guaranteed. In fact, you could change the output by doing some non-trivial work between the for loop and the second set of `printf`s.
What direction do the stack and heap grow?
I've always been a little confused about the fact that the stack grows down, heap grows up, the program code is at the "bottom", the stack "rolls up" after exiting a block, etc. Specifically, what do the directions refer to? They are up and down relative to what? And what are the implications of these directions?
Googling seems to indicate that it has to do with memory addresses - memory address 0 is at the bottom, and the heap growing up means that every new allocation on the heap will have an increasing memory address. Is this true? And then, things allocated on the stack would have decreasing memory addresses (except for arrays in which each entry increases in memory address).
Answer
We generally consider low addresses to be at the bottom and high addresses to be at the top of a memory diagram. For our purposes we have a general memory diagram of:
-------------------0x7FFFFFFF (high address)
| |
|-----------------|
| top of stack |
| |
| |
| more stack |
| | | Stack moving downwards
| v |
| bottom of stack |
|-----------------|
| |
| PLENTY of space |
| |
|-----------------|
| top of heap |
| | Heap grows upward.
| NOT FILLED | Just because you malloc
| IN PRECISE | after something doesn't
| ORDER!! | necessarily mean it
| ^ | will be higher on heap.
| | |
| bottom of heap |
|-----------------|
|Static stuff: |
| Program code |
| Static vars |
| String literals|
|-----------------|
| |
------------------- 0x00000000 (low address)
How are string literals handled?
See here for a more detailed paper I wrote.
Arrays on the Stack
Consider:
int p[10];
printf("%p %p\n", &p[1], &p[7]);
Since p is a stack variable, shouldn't &p[7] have a lower memory address than &p[1], since the stack grows down?
Answer
You are correct that the stack grows down; however, arrays are laid out such that &p[7] has a higher address than &p[1]. In terms of the stack picture, p[0] is toward the bottom of the stack (the newest part) and p[9], the last element, is farther from the bottom of the stack.
Undefined Behavior?
Do I actually have to worry about undefined behavior?
Usually when we say things are undefined, they work out anyway. Does it actually matter?
Answer
Let's take an example:
Running `nc -l <positive, out of range port number>` yields an error on OS X, but Ubuntu (what the CLIC linux machines run on) will silently truncate the port number and continue (meaning you just set up netcat on a port you didn't directly specify!). If you're running OS X, try running the command below on your own computer and on CLIC:
nc -l 123456789
If you're more curious about why this is happening, it comes down to a call to `getaddrinfo`. If you read the man page for this function, you'll notice that there's no explicit mention of what should happen if an out-of-range port number (like 123456789) is entered (though there are arguably things that imply what should happen). This is undefined behavior! It just so happens that the OS X libc checks the port number, while Ubuntu's simply passes your out-of-range port number to another function in a way that automatically truncates it. (In this case, 123456789 mod 2^16 = 52501.)
It's important to note that both implementations can be considered valid (again, depending on your interpretation of the man page).
The moral of the story: don't rely on undefined behavior in your programs and always be sure to test your submissions on CLIC!
Miscellaneous Topics
Changing Pointer Types
double d = 3.14;
double *pd = &d;
int *pi;
pi = pd; //compiler warning or error: incompatible pointer types
pi = (int *)pd; //compiles, but you better know what you're doing....
*pi is now 1374389535 ... Why does the number change so drastically?
Answer
The value of a certain set of bits depends on how we interpret that set of bits. And the type of the pointer tells us how to interpret those bits. In lab 1, you sometimes saw a different result when you tried to print a set of bits as an unsigned integer and a signed integer.
In this situation, 3.14 is stored as a set of bits (specified by the IEEE 754 standard for floating point numbers, if you're interested or have taken Fundamentals). Because of how doubles are stored, when we reinterpret this set of bits as if it were an integer, we get a drastically different number! The reason the difference between signed and unsigned integers was (sometimes) less drastic is that there is more similarity between how signed and unsigned ints are stored than between floating point numbers and ints.
Why include 'core' in Makefile clean target?
When your program crashes in certain ways (e.g. segmentation fault, floating point exception, etc.), UNIX will attempt to leave what is called a core dump, which contains the memory state of your program at the time of the crash. You can then examine this in a debugger such as gdb, which may make it easier to find your error. You'll notice that when your program segfaults it says "Segmentation fault (core dumped)". That's this 'core' file.
At this point you're probably pretty confused, because you've never seen a core file when your program segfaults. That's because by default the size limit on this core file is set to 0, meaning you'll never see it. If you're interested in playing around more, you can run `ulimit -c unlimited`, which will remove any limits on the size of your core dump files. To get back to the old behavior of not leaving core dumps, you can run (you guessed it) `ulimit -c 0`.
Long story short, it's not a bad thing to have in your clean target in case someone who chooses to generate core dumps by default uses your program/Makefile, though it isn't strictly necessary.
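A typical clean target with this included might look like the following sketch (the program name `myprog` is made up for illustration):

```make
clean:
	rm -f myprog *.o core
```

The `-f` flag keeps `rm` quiet when `core` (or anything else in the list) doesn't exist, so the target works whether or not a dump was ever produced.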
Pre-processor Trickery
I noticed that you can comment out huge blocks of code using pre-processor directives. I also noticed that some open-source programs which you build yourself have things in the code like:
#ifdef OS_MSDOS
#include "msdos.h" /* hypothetical header name */
#elif defined(OS_UNIX)
#include "default.h"
#else
#error Wrong OS!!
#endif
Is this also the same as commenting out code?
Answer
Sorta. You can use pre-processor directives to achieve the same result as comments, but they're really fundamentally different.
Pre-processor directives are considered (as you may have guessed) during the pre-processing stage. This is before compilation actually begins. So blocks "commented out" with things like `#if 0` won't even make it to the compiler; they're stripped during pre-processing, the same stage at which `#include`s are expanded.
However, comments (both block /**/ and line //) do make it to the compilation stage, where the compiler ignores them.
Using "#if" 0 is usually considered pretty hacky but they can be useful when you want to get rid of a block of code containing a block comment (since they don't nest). I would recommend (though not require) that you don't submit submit code using comments of this style.