This exercise analyzes a program that processes a shared array concurrently. The program uses a simple concurrent programming pattern that distributes the load among the threads of an application, so that each thread processes one part of the data. Taking advantage of the multicore hardware available in most current machines helps reduce the execution time.
The basic idea is to take a large data structure and divide the work among several threads:
In the given example there are 5 threads, each of which processes one fifth of the array and stores its partial result in an intermediate array.
Once all threads have finished executing and returned from the do_work function, the main function adds up the partial contributions of each thread. To wait for them, the main thread calls pthread_join once per thread.
Note that the execution exhibits no anomalies (data races), because each thread works exclusively on its own block of memory (you can check the code with Helgrind).
It is also safe to combine the per-thread results, because the summation performed by the main function does not take place while the threads are still running.
    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define NTHREADS   5
    #define ARRAYSIZE  100000000
    #define ITERATIONS (ARRAYSIZE / NTHREADS)

    double sum = 0.0;
    double a[ARRAYSIZE];
    double mysums[NTHREADS];

    void *do_work(void *tid)
    {
        int i, start, *mytid, end;
        double mysum = 0.0;

        /* Initialize my part of the global array and keep local sum */
        mytid = (int *) tid;
        start = (*mytid * ITERATIONS);
        end = start + ITERATIONS;
        printf("\n[Thread %5d] Doing iterations \t%10d to \t %10d", *mytid, start, end - 1);
        for (i = start; i < end; i++) {
            a[i] = i * 1.0;
            mysum = mysum + a[i];
        }
        mysums[*mytid] = mysum;
        printf("\n[Thread %5d] Sum %e", *mytid, mysum);
        pthread_exit(NULL);
    }

    int main(int argc, char *argv[])
    {
        int i, start, tids[NTHREADS];
        pthread_t threads[NTHREADS];
        pthread_attr_t attr;

        pthread_attr_init(&attr);
        pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);
        for (i = 0; i < NTHREADS; i++) {
            tids[i] = i;
            pthread_create(&threads[i], &attr, do_work, (void *) &tids[i]);
        }

        /* Wait for all threads to complete, then print the global sum */
        for (i = 0; i < NTHREADS; i++) {
            pthread_join(threads[i], NULL);
        }

        /* Compute the output */
        sum = 0.0;
        for (i = 0; i < NTHREADS; i++) {
            sum = sum + mysums[i];
        }
        printf("\n[MAIN] Total Sum= %e\n", sum);

        /* Clean up and exit */
        pthread_attr_destroy(&attr);
        pthread_exit(NULL);
    }
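To check the program with Helgrind, as suggested above, run the executable under Valgrind. Assuming the executable is called concurrent_loop (the name used later in this exercise), a typical invocation is:

    valgrind --tool=helgrind ./concurrent_loop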
Now it is time to experience the performance benefits offered by a multicore machine. To do this, you are asked to make the following changes to the code:
Check your code, compile it with gcc, and run it, measuring how long it takes to execute. You can use the Linux command time for this: for example, if your executable is called concurrent_loop, running time ./concurrent_loop reports the total runtime of your program.
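For instance, assuming the source file is named concurrent_loop.c (the file name is only an example), a typical compile-and-measure sequence is:

    gcc -Wall -O2 -pthread concurrent_loop.c -o concurrent_loop
    time ./concurrent_loop

The -pthread flag is needed so that gcc compiles and links against the POSIX threads library.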
Modify the program so that the application uses only one thread. Measure again how long it takes to run. Why is that time higher than in the previous case?
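A minimal way to obtain this single-thread version, assuming the rest of the code is left unchanged, is simply to reduce the thread count:

    #define NTHREADS 1   /* a single worker thread now performs all ARRAYSIZE iterations */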
Change the code so that there are no global variables and all data is passed to each thread through its initialization pointer (the void * argument of pthread_create). All arrays should be created in the main function, and the data each thread needs should be packed into a single struct, as sketched below.
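One possible shape for this refactoring is sketched below. Names such as work_t are hypothetical, and this layout is only one of several valid options; compile it with gcc -pthread.

    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define NTHREADS  5
    #define ARRAYSIZE 100000000

    /* Hypothetical per-thread argument: everything do_work needs, no globals. */
    typedef struct {
        int     tid;     /* logical thread id                      */
        long    start;   /* first index this thread processes      */
        long    end;     /* one past the last index it processes   */
        double *a;       /* shared array, allocated in main        */
        double *mysum;   /* where to store this thread's result    */
    } work_t;

    static void *do_work(void *arg)
    {
        work_t *w = (work_t *) arg;
        double local = 0.0;
        for (long i = w->start; i < w->end; i++) {
            w->a[i] = i * 1.0;
            local += w->a[i];
        }
        *w->mysum = local;
        return NULL;
    }

    int main(void)
    {
        double *a = malloc(ARRAYSIZE * sizeof *a);  /* array created in main */
        double mysums[NTHREADS];
        work_t args[NTHREADS];
        pthread_t threads[NTHREADS];
        long chunk = ARRAYSIZE / NTHREADS;
        double sum = 0.0;

        if (a == NULL)
            return EXIT_FAILURE;

        for (int i = 0; i < NTHREADS; i++) {
            args[i] = (work_t){ .tid = i, .start = i * chunk,
                                .end = (i + 1) * chunk,
                                .a = a, .mysum = &mysums[i] };
            pthread_create(&threads[i], NULL, do_work, &args[i]);
        }

        for (int i = 0; i < NTHREADS; i++) {
            pthread_join(threads[i], NULL);
            sum += mysums[i];
        }
        printf("[MAIN] Total Sum= %e\n", sum);

        free(a);
        return 0;
    }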
Finally, modify the code once more so that, instead of concurrency (threads), a traditional sequential approach is used. A simple way to achieve this is to remove all references to the threads and call the do_work function directly from the body of the main function (see the sketch at the end of this section). Measure the time it takes to run this new version of your code.
Does this time fall between the two previous measurements? Why do you think you are getting these results?
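A minimal sketch of this sequential variant, reusing the hypothetical work_t structure and do_work function from the previous sketch, could look like the following: the loop calls do_work directly from main, once per chunk, with no pthread calls at all.

    /* Sequential variant: replaces the previous main. There is no
     * pthread_create/pthread_join; each chunk is processed in turn
     * by the main thread via an ordinary function call. */
    int main(void)
    {
        double *a = malloc(ARRAYSIZE * sizeof *a);
        double mysums[NTHREADS];
        long chunk = ARRAYSIZE / NTHREADS;
        double sum = 0.0;

        if (a == NULL)
            return EXIT_FAILURE;

        for (int i = 0; i < NTHREADS; i++) {
            work_t w = { .tid = i, .start = i * chunk,
                         .end = (i + 1) * chunk,
                         .a = a, .mysum = &mysums[i] };
            do_work(&w);              /* direct call, runs in the main thread */
            sum += mysums[i];
        }
        printf("[MAIN] Total Sum= %e\n", sum);

        free(a);
        return 0;
    }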