Thursday, February 16, 2012

The Power of Parallelism - Part III

In the last post we talked about multiprocessing, where a parent process forked children to work in parallel. However, creating multiple processes to share the load for a common task is often not worth the effort, because switching between processes is an expensive operation for the operating system. If you ever dive into the kernel source, you will see just how much work the scheduler has to do on every process switch.

In order to make parallel computing more efficient, lightweight units of execution called threads can be created within the scope of a process. Since threads are multiple workers running in parallel within one process, they share its logical address space, heap, memory maps, and even its open files. Thus, to switch between threads, only the stacks and the respective instruction pointers need to be swapped, which is far less overhead than switching between separate processes.

However, it should be kept in mind that threads run in parallel, independent of each other, over shared global data, and hence synchronizing their operations is just as important. Just as multiple processes use semaphores, threads can too! And what's more fun is that the semaphores don't need to live in shared memory, since every thread can access a global semaphore directly.

To work an example out, let's solve the problem stated in the last post using threads. To create a thread, we just need to call

pthread_create(&newThread, &threadAttribute, operatingFunction, argumentToFunction);

Here, threadAttribute is a pthread_attr_t variable which decides the behavior of the newly created thread. Now, how do we tell the thread what to do after it is born? Not much: it simply executes the code in the function named operatingFunction, which takes a void * argument and returns a void * value. Because the parameter and the return value are both void pointers, any type of data can be sent to and received from the thread. Here argumentToFunction is the argument we send to the thread. Once the thread is created, its ID is returned in newThread so that it can be controlled and monitored. More reference material on thread-related functions can be found here.

The worked-out solution to the problem discussed in the last post can be found here. One thing to note is that between successive attempts to lock the semaphores, the thread relinquishes the CPU using sched_yield. This stops it from continuously hogging the CPU while polling the lock, and so gives the other threads a chance to finish their work and eventually release the locks. I would suggest you comment out the two lines containing sched_yield, then compile and run to see the drastic fall in performance.


I hope it's now clear how easy it is to use threads to improve the response time of programs. Here you can find a worked-out (and well-commented) multi-threaded solution to counting the occurrences of a specific byte in a file. The screenshot below might tempt you to have a look at it ...


N.B. To all who had a gaze at the above program: it's pretty awkward to reserve enough memory to hold the entire file. It's equally awkward to read such a huge chunk of data in a single call to read. You might find it interesting to make the program a bit more savvy ... keeping the synchronization issues in mind, of course!

Well, that's all for this series of posts on parallel processing. I hope they prove helpful. Keep reading for more ...
