ROB 599: Programming for Robotics: Class 4

A note from class 3

The major thesis of class 3 was that data on a computer can mean anything and be represented in different ways. Because I told you all that the character ‘A’ was 65 and ‘Z’ was 90, I saw a lot of code like this:

if (character >= 65 && character <= 90) {
    ...
}

But really, if characters and numbers are the same (and they are), it is both clearer and equivalent to write:

if (character >= 'A' && character <= 'Z') {
    ...
}

And if you want to convert from lowercase to uppercase, try writing:

if (character >= 'a' && character <= 'z') {
    character += 'A' - 'a';
}

This way you can tell that what we want is to “add” uppercase-ness in place of lowercase-ness.

If we had instead written character -= 32, the 32 would be what we call a magic number, and magic numbers are frowned upon. If a specific number is important enough, it is generally better to use a named constant so that its meaning is clear.

In this case, writing 'A' - 'a' directly works, but we could also define a named constant:

#define LOWER_TO_UPPER ('A' - 'a')
...
character += LOWER_TO_UPPER;

Helpful gcc flags

There are many options we can give the gcc compiler to help us out. We have tried to curate a good set of them to use for the rest of the class. Essentially, these flags instruct the compiler to warn about various potential bugs or bad practices in your code, and -Werror turns those warnings into errors. In addition, the -fsanitize=address option is particularly sophisticated: it can detect many errors relating to memory, addresses, and pointers while your program runs, and give you feedback about what happened. While this tool (the AddressSanitizer) is very useful, it can sometimes clash with other tools such as GDB or Valgrind, so you should know how to disable it (comment out that line with a #). We will explore the AddressSanitizer and other tools more in the next class session.

CFLAGS = -ggdb3 -std=c11 -Wall -Wunused-parameter -Wstrict-prototypes -Werror -Wextra -Wshadow
CFLAGS += -fsanitize=signed-integer-overflow -Wfloat-conversion
CFLAGS += -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable
CFLAGS += -fsanitize=address -fsanitize=undefined

addresses: addresses.c
	gcc -o $@ $^ $(CFLAGS)

In the same way, you can define other variables in your makefile and use them with $(NAME)!

In-class problem 1: addresses

In this exercise we will be exploring the nature of “addresses”. On a computer, every piece of data lives somewhere in the computer’s memory and has an address specifying that location. In C, we can get the address of a piece of data by prefixing the variable with &. We can print out the address using the %p specifier of printf, after casting the address to void *, which is the type of a generic address.

For example:

#include <stdio.h>

int main(void) {
    int variable = 10;
    printf("%p\n", (void *)&variable); // prints the address, NOT 10!
    return 0;
}

Declare variables char char1 and int int1 at the top of your program (so not in the main function). Have your program print out (in order) the addresses of both variables. You will notice that the addresses are printed out in hexadecimal, which is the convention for addresses.

Next, output the “distance” (as a number, not a pointer!) between the char and int variables, in terms of their addresses. You can do this with plain subtraction. You may notice that the compiler at first does not like the idea of subtracting pointers of different types. We can make the subtraction work by casting each address to a long (long integer) before subtracting. Do not cast to a long pointer (long *), because we explicitly want to treat the address as a number and not an address.

Now also declare double doubles[3] at the top of your program. First print out the array itself (doubles) as an address, and then the addresses of each value in the array (with (void *)&doubles[0] and so forth).

Next, output the distance between two consecutive values in the array. Since the pointers are the same type, you can subtract them directly, without casting.

Also output the distance value you get by first casting each address to long before the subtraction.

Great! Now repeat all of the above steps, but with char char2, int int2, and float floats[3] all declared in your main function (not the top of the program).

Finally, let’s do one more! We’ll talk more about dynamic memory and malloc/free later, but add this last section too.

void *mem = malloc(1); // ask for memory
printf("%p\n", mem); // memory is already "given" to us as an address!
free(mem); // give memory back

What do these different memory addresses mean?

The variables at the top of your program exist in static memory, meaning that those memory addresses and their data are valid throughout the life of your program.

The variables declared inside of main exist on the stack. Whenever a function is called, it pushes a bunch of data/variables onto the stack, and when that function returns, it “pops” those values off of the stack. So these memory addresses are only valid for as long as main is running. When main returns, these memory addresses and their data may be reused for something else.

Finally, the memory address returned by malloc is located on the heap. This area of memory is used only when the program explicitly asks for memory, as with malloc. A heap address stays valid until that exact address is released with free.

Since all three of these types of memory and data have different lifetimes, it makes sense that the memory addresses themselves are located in different regions of memory.

In-class problem 2: substring

In this problem we will search a file for a keyword and then print the text that comes before each instance of that keyword, as shown below.

./substring
usage: ./substring <file> <key> <lines before>
./substring searchs.txt season 0
Could not open searchs.txt: No such file or directory
./substring search.txt season 0
season

season

./substring search.txt season 1
it was the season

it was the season

./substring search.txt season 2
it was the epoch of incredulity,
it was the season

it was the season of Light,
it was the season

./substring search.txt best 3
It was the best

The <lines before> parameter is the number of lines to print before the target word. Zero means we print only the word itself. One means we also print the part of the current line that comes before the target word. Two or greater also includes the full lines that come before it. Essentially, <lines before> is the number of newline characters we need to find as we go backwards from the target word.

There are many possible approaches to this problem, but since we are trying to get practice working with pointers, let’s follow the approach below, which avoids copying memory:

  • Use the strstr substring function with your text and key. It will return a pointer to the first instance of the key that it finds in the text. If it can’t find it, it returns NULL.
  • Write a “get context” function that takes an instance of the key found by strstr and looks back to find the start of the requested number of lines, making sure not to look back past the beginning of the string.
  • The program will first call strstr to find the key and then call your context function to get the starting location. It then prints the text between the starting location and the end of the key. It repeats these three steps as many times as necessary, each time calling strstr with a pointer into the text at a point just after the last keyword we found. This way, the text string always “looks” smaller, even though we aren’t actually modifying it.
  • When printing out the full context around each instance, make sure to stop right after the found keyword. You may need to temporarily add a null terminator to the string.

We have provided code that reads the file into memory.

A note about pointers

Although we can think about arrays and strings in C fairly naturally, we can also look at them as being pointers.
For example, the following equalities hold:

str[0]   == *str
str[1]   == *(str + 1)
str[-1]  == *(str - 1)
&str[0]  == str
&str[1]  == str + 1
&str[-1] == str - 1

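Although the index -1 makes no sense from the perspective of an array or string literal, as far as the computer is concerned it is simply an arithmetic operation on the address. It is only valid, however, when str - 1 still points inside the same array, for example when str points into the middle of a string (as the pointers returned by strstr often do). Reading before the start of the array is undefined behavior.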