ROB 502: Programming for Robotics: Class 14

Basic threading

Many simple programs can be viewed as a function that starts with some input, performs a computation, and finally delivers the output. As we progress to writing more complicated programs, we eventually have to deal with multiple sources of input and output with different requirements. We may also have to deal with long running computations that need to not freeze up the main program.

For example, I am currently writing this document in the Atom text editor. Although the program is open, if I am not currently typing or interacting with the editor, it uses very little processing power. As I type, though, it is able to immediately react to display my typing on the screen. It even considers what words I may want to type and begins to offer suggestions. As soon as those suggestions are no longer possible, however, it causes them to disappear. In addition, if I change any of the files in the project I currently have open, it will also immediately show those changes in the file panel on the left.

These behaviors are only possible because the Atom text editor is taking advantage of threading, or the concurrent use of multiple threads of execution. The programs we are used to writing have only a single thread. By contrast, at the moment I checked the process, Atom was using 28 distinct threads. Many of these threads were paying attention to different forms of input from the user. One thread was keeping track of files. Another was monitoring the GPU, because apparently the editor uses it. One thread must have been waiting for me to type. Probably the majority of the threads were waiting to be given jobs to complete by another thread, because if multiple threads can work on a problem together, they can take advantage of the many CPU cores most computers now have.

In-class problem 1: helloworlds

In this problem we are going to start 16 different threads, and each thread will print out “Hello world from %d!” message, with the number of that thread.

We are going to be using C’s pthread library to handle threading. So make sure to have #include <pthread.h> in your file and also add -lpthread (use library pthread) to your makefile.

Let’s also make a new thread_info_t structure to hold both the required pthread_t and also the id number we are assigning to each thread:

typedef struct thread_info {
    int num;
    pthread_t thread;
} thread_info_t;

We need to keep track of these pthread_t values so that we can control and interact with our threads later. In addition, our only way to communicate with the threads is through a pointer to some value. So this thread_info_t can conveniently be the structure that we use to communicate with our threads.

Now we can start each thread with pthread_create:

thread_info_t thread_infos[N_THREADS];
for (int i = 0; i < N_THREADS; i++) {
    thread_infos[i].num = i;
    pthread_create(&thread_infos[i].thread, NULL, thread_start, &thread_infos[i]);
}

The second argument can be null or used to set extra attributes for the thread. The third argument, thread_start needs to be a function that will be run by the new thread. The fourth argument is the pointer that will be given to the thread_start function when the new thread begins.

Any thread start function needs to have the following signature. The first thing we do is cast the void pointer (meaning a pointer that could be to anything) to the correct pointer type. Although we return null, it would be possible for the main program to read the return value of this function.

void *thread_start(void *user) {
    thread_info_t *info = user;
    printf("Hello world from %d!\n", info->num);
    return NULL;
}

You will likely get something like this when you run your program:

./helloworlds
Hello world from 0!
Hello world from 2!
Hello world from 1!
Hello world from 4!
Hello world from 3!
Hello world from 5!
Hello world from 6!
Hello world from 7!
Hello world from 8!
Hello world from 9!
Hello world from 10!
Hello world from 11!
Hello world from 12!
Hello world from 13!
Hello world from 14!
Hello world from 2101487808!
Hello world from 2101487808!

And each time you run it, the result should be slightly different. If the results are all too clean and perfect (everything in order and no big numbers), try commenting out the -fsanitize=address line in your makefile; it is slowing down your program too much.

The reason that some of the numbers are coming out of order is that Linux makes no guarentees about how it schedules when to run different threads or processes. Our program can make all these threads, but we can’t control the order that they run in. This is okay! We will learn more about the difficulties and solutions to this situation in the next class.

The only actual problem we have is that on occasion the last couple lines we sometimes get very large numbers… and then somehow the wrong number of lines! This is 17 not 16!

The problem here is that the last thread was created right before the main program/thread returns. The reason that it prints out the wrong number is that the number was being stored in the thread_infos local variable of the main thread. When the main thread returns, this local variable no longer exists and gets overwritten with different data.

The solution to this problem is that we need to prevent the main program from exiting unless all of the threads have finished first. The function pthread_join waits for a pthread to finish running. So we just have to call pthread_join on each thread! We pass null for the second parameter because we do not care about the return value of the thread.

pthread_join(thread_infos[i].thread, NULL);

Run this and you should only ever get the 16 correct messages, although their order should still be unpredictable.

In-class problem 2: parallelhashcomp

In this problem we will perform the exact same work as in the earlier hashcomp problem from the class on hashing and hash tables. The only difference is that now we want to perform the various benchmarks in parallel using threads. So that we can see whether using threads like this impacts the timing, your program will also be given the number of threads to use as a command line argument.

If you did not complete this assignment last week, come speak to me and I will share my hashcomp code with you so that you can finish this assignment without having to go back and finish it first yourself.

The basic idea is that instead of directly calling some evaluate_hash_reduce function to evaluate a given pair of hash and reduce functions, we will instead create a thread which will then call that evaluate_hash_reduce function from its thread_start function. And once we have created n threads, we will wait for them all to finish before creating the next batch. You should modify your thread_info_t structure to contain all of the parameters necessary to call evaluate_hash_reduce.

Your program must output the benchmarks in the same order as the original hashcomp, so we need to figure out how to guarantee consistent ordering.

The key to this is that only the main thread will actually call printf. All of the “worker” threads will simply put their calculated result back into their thread_info_t object. After the main thread calls pthread_join on a thread, it will know that thread has finished and it can read out the result from the info, and print it to the screen. As the hash comparison problem uses a lot of data, your thread info structure will have many entries for all of the data input as well as the output (number of collisions and average time used). You do want to make sure you are sharing all of the word entries between the threads, because it would be inefficient for each thread to have to read all that data from book.txt again and again.

Using clock_gettime() instead of clock()

Using clock() in the original problem was a good solution because this is a function that is guaranteed to exist by the C standard. Unfortunately, the standard does not say how precise it is and this could be a problem in some scenarios. The real problem is that clock() measures the total CPU time taken by all the program threads summed together, so it won’t work for each thread to individually perform its benchmark for 0.5 seconds.

Instead we will use clock_gettime(), which should be both more precise and also allow us to measure the real-world time independently for each thread. The only problem is that this function is only available on “POSIX” compliant systems, mostly meaning operating systems related to Linux. As we have done before, we will have to define a certain preprocessor variable _GNU_SOURCE to allow us to use this function. Use the following function to get time in seconds:

#define _GNU_SOURCE
#include <time.h>

double seconds_now(void) {
    struct timespec now;
    if (clock_gettime(CLOCK_MONOTONIC, &now)) {
        fprintf(stderr, "Retrieving system time failed.\n");
        exit(1);
    }
    return now.tv_sec + now.tv_nsec / 1000000000.0;
}

Evaluation

The output of your program will be in the same order as the original hashcomp no matter how many threads you give it to use at a time. The difference is that the lines will appear on the screen in batches according to the number of threads you are using.

./parallelhashcomp 1
# Runs exactly the same as ./hashcomp

./parallelhashcomp 2
# Runs hopefully about 2 times faster

# and so on
./parallelhashcomp 4