Thursday, February 16, 2012

The Power of Parallelism - Part III

In the last post we talked about multiprocessing, where a parent process forked children to work in parallel. However, creating multiple processes to share the load of a common task is often not worth the effort, as switching between processes is an expensive operation for the operating system. You can dive into the kernel source sometime to see all the operations the scheduler performs while switching between processes.

In order to make parallel computing more efficient, lightweight processes called threads can be created within the scope of a process. Since threads are multiple workers running in parallel within a process, they share the logical address space, heap, memory maps, and even the open files of the process. Thus, to switch between threads, only the stacks and the respective instruction pointers need to be swapped, which involves much less overhead than switching between separate processes.

However, it should be kept in mind that threads run in parallel and independently of each other, working over shared global data, and hence synchronizing their operations is equally important. Just as multiple processes use semaphores, threads can, too! And what's more fun is that the semaphores don't need to live in shared memory, since all threads can access the global variables of their process.

To work an example out, let's solve the problem stated in the last post using threads. To create a thread, we just need to call

pthread_create(&newThread, &threadAttribute, operatingFunction, argumentToFunction);

Here, threadAttribute is a pthread_attr_t variable which decides the behavior of the newly created thread (passing NULL gives the defaults). Now, how do we tell the thread what to do after it is born? Well, not much; it just has to execute the code in the function named operatingFunction, which takes a void * argument and returns a void * value. Because the parameter and the return value are both void pointers, any type of data can be sent to and received from the thread. Here argumentToFunction is the argument we send to the thread. Once the thread is created, its ID is stored in newThread so that it can be controlled and monitored. More documentation on thread-related functions can be found here.
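
To make this concrete, here is a minimal, self-contained sketch; the summing work done inside the thread is made up purely for illustration (compile with gcc -pthread):

#include <pthread.h>
#include <stdio.h>

/* The thread entry point: takes a void * and returns a void * */
void *operatingFunction(void *arg){
        int n=*(int *)arg;      /* unwrap the argument */
        long sum=0,i;

        for(i=1;i<=n;i++)
                sum+=i;
        return (void *)sum;     /* works where a long fits in a pointer */
}

int main(){
        pthread_t newThread;
        int argumentToFunction=100;
        void *result;

        /* passing NULL for the attributes gives the default behavior */
        if(pthread_create(&newThread,NULL,operatingFunction,&argumentToFunction)!=0){
                perror("pthread_create");
                return 1;
        }
        pthread_join(newThread,&result);        /* block until the thread exits */
        printf("The thread computed: %ld\n",(long)result);
        return 0;
}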

The worked-out solution to the problem discussed in the last post can be found here. One thing to note is that between multiple attempts to lock the semaphores, the thread relinquishes control of the CPU using sched_yield. This stops it from continuously hogging the CPU while polling the lock, and thus gives other threads a chance to finish their work and eventually release the locks. I would suggest you comment out the two lines containing sched_yield, compile and execute to see the drastic fall in performance.
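
Stripped to its essence, the polling pattern looks something like this (the semaphore name lock is hypothetical; the real names are in the linked code):

#include <semaphore.h>
#include <sched.h>

extern sem_t lock;      /* assumed to be initialized elsewhere */

void acquire_politely(){
        /* sem_trywait returns -1 instead of blocking if the lock is taken */
        while(sem_trywait(&lock)!=0)
                sched_yield();  /* give up the CPU instead of spinning hard */
}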


I hope it's now clear how easy it is to use threads to improve the response time of programs. Here you can find a worked-out (and well-commented, as well) multi-threaded solution to counting the occurrences of a specific byte in a file. The screenshot below might interest you enough to have a look into it ...


N.B. To all who had a gaze at the above program: it's pretty awkward to reserve memory to store the entire file. It's equally awkward to read such huge chunks of data in a single call to read. You might find it interesting to make the program a bit more savvy ... remember the synchronizing issues, of course! A sketch of such a chunked approach follows.
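
For instance, a chunked version might look something like this; the file name, chunk size and thread count are all made up, each thread works on its own slice of the file via pread, and a mutex guards the running total (a semaphore would do equally well). Compile with gcc -pthread.

#include <pthread.h>
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

#define NTHREADS 4
#define CHUNK 4096

int fd;
unsigned char needle='a';
long total=0;
pthread_mutex_t total_lock=PTHREAD_MUTEX_INITIALIZER;
off_t filesize;

void *count_slice(void *arg){
        long id=(long)arg;
        off_t begin=id*(filesize/NTHREADS);
        off_t end=(id==NTHREADS-1)?filesize:begin+filesize/NTHREADS;
        unsigned char buf[CHUNK];
        long found=0;
        off_t pos;
        ssize_t n,i;

        for(pos=begin;pos<end;pos+=n){
                /* pread takes an explicit offset, so no file position is shared */
                n=pread(fd,buf,(end-pos<CHUNK)?end-pos:CHUNK,pos);
                if(n<=0)
                        break;
                for(i=0;i<n;i++)
                        if(buf[i]==needle)
                                found++;
        }
        pthread_mutex_lock(&total_lock);        /* the synchronizing issue! */
        total+=found;
        pthread_mutex_unlock(&total_lock);
        return NULL;
}

int main(int argc,char *argv[]){
        pthread_t workers[NTHREADS];
        long i;

        fd=open(argc>1?argv[1]:"sample.txt",O_RDONLY);
        if(fd<0){
                perror("open");
                return 1;
        }
        filesize=lseek(fd,0,SEEK_END);  /* find the file size */

        for(i=0;i<NTHREADS;i++)
                pthread_create(&workers[i],NULL,count_slice,(void *)i);
        for(i=0;i<NTHREADS;i++)
                pthread_join(workers[i],NULL);

        printf("Found %ld occurrences of '%c'\n",total,needle);
        close(fd);
        return 0;
}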

Well, that's all for this series of posts related to parallel processing. I hope they prove to be helpful. Keep reading for more ...

Wednesday, February 15, 2012

The Power of Parallelism - Part II

So, we have all dealt with processes at some time or the other. Even the browser you are reading this in is a process in itself. If we have a look at the output of the ps command, we can find multiple processes running in a system at any given time. We have learnt to write programs which compile into executables and then become processes when launched. In the last post, we also talked about launching an executable file to create a new process.

In this post, we will see how we can create processes on the fly. In fact, it's quite an easy thing to do. Anywhere in your program, write this:

pid_t pid;      /* fork returns a pid_t; include <unistd.h> */

pid=fork();
if(pid==-1){
        /*The system failed to spawn a new process*/
}
if(pid==0){
        /*This part is executed by the child*/
}else{
        /*This is the parent's job to do*/
}

fork is a system call, and when called from a process, dives deep into the kernel to create a copy of the calling process and returns in both of them. If the process creation fails, it can return only to the parent, and hence returns -1. If, however, the child is born successfully, there are now two processes to return to! So it returns 0 to the child, and the child's process ID to the parent.

Now, being independent of each other, the two processes can finish their jobs at any time and exit. But if the parent process is really concerned about the child's exit status, and must wait to clean up anything its children might have messed up, it can do so with a call to wait. If any child forked by the caller is still running, the system call wait makes the caller sleep until one exits. If no such child exists, the call returns -1 immediately.
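
Putting fork and wait together, a minimal self-contained example might look like this (the exit status 42 is arbitrary):

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

int main(){
        int status;
        pid_t pid=fork();

        if(pid==-1){
                perror("fork");
                return 1;
        }
        if(pid==0){
                printf("Child %d doing its job\n",getpid());
                exit(42);       /* the child's exit status */
        }
        wait(&status);          /* sleep until the child exits */
        if(WIFEXITED(status))
                printf("Parent saw the child exit with %d\n",WEXITSTATUS(status));
        return 0;
}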

A much-awaited query inevitably follows the discussion ... now that we have two processes, can they talk to each other? Well, of course they can! Inter-process communication (IPC, in short) saves the day. Two processes can share data using sockets, message queues or shared memory, and in this post we will deal with IPC using shared memory.

A process can create a shared memory segment using

int shmid = shmget(key, size, shmflg);

which returns the ID of the newly created shared memory segment, or -1 if it fails; here key is a key_t identifier for the segment, size is its length in bytes, and shmflg carries creation flags and permissions. Such a segment can later be destroyed with

shmctl(shmid, IPC_RMID, NULL);

Once a shared memory segment is created, a process needs to attach to it to be able to use it. A call to shmat does this, returning the address at which the segment is mapped. Similarly, shmdt detaches the virtual memory area corresponding to the shared memory segment from the memory map of the calling process.
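
The whole life cycle fits in a few lines. Here is a sketch; the size and permissions are arbitrary, and IPC_PRIVATE asks the kernel for a fresh key:

#include <stdio.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/shm.h>

int main(){
        /* create a 4 KB segment, readable and writable by the owner */
        int shmid=shmget(IPC_PRIVATE,4096,IPC_CREAT|0600);
        char *shared;

        if(shmid==-1){
                perror("shmget");
                return 1;
        }
        /* attach it; passing NULL lets the kernel pick the address */
        shared=shmat(shmid,NULL,0);
        if(shared==(void *)-1){
                perror("shmat");
                return 1;
        }
        strcpy(shared,"visible to every process attached here");
        printf("%s\n",shared);

        shmdt(shared);                  /* detach from our address space */
        shmctl(shmid,IPC_RMID,NULL);    /* destroy the segment */
        return 0;
}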

Having said all this, it's time to bring it all into action. We can think of a problem where there is a shared variable, and each process monitors it. They check the value and go to sleep. If they find the value unchanged after waking up, they stay happy. Otherwise, hard luck! Writing a program to do that should not be that tough, right? So here it goes ....


As is clearly seen, all the children can read and write the shared memory just as expected, and that leads to yet another problem ... synchronizing their activities. This is required because all of them are now working on common data, and it's quite possible for one to overwrite data that is being used by another.

So, how do we synchronize access to the variables? Nothing much, we add two semaphores to protect the two variables. Semaphores can be thought of as counters: locking (or waiting upon) one counts it down, and once the value reaches zero, further callers are blocked until someone else unlocks (or posts) to increment the value again.
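
Note that for semaphores to work across forked processes, they must themselves live in the shared segment and be initialized as process-shared. A minimal sketch of the idea (the structure and names are made up; compile with gcc -pthread):

#include <stdio.h>
#include <semaphore.h>
#include <unistd.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/wait.h>

struct shared_area{
        sem_t protect;          /* guards value */
        int value;
};

int main(){
        int shmid=shmget(IPC_PRIVATE,sizeof(struct shared_area),IPC_CREAT|0600);
        struct shared_area *area=shmat(shmid,NULL,0);

        sem_init(&area->protect,1,1);   /* 1 = shared between processes */
        area->value=0;

        if(fork()==0){
                sem_wait(&area->protect);       /* lock */
                area->value++;
                sem_post(&area->protect);       /* unlock */
                return 0;
        }
        sem_wait(&area->protect);
        area->value++;
        sem_post(&area->protect);

        wait(NULL);
        printf("value = %d\n",area->value);     /* reliably 2 */
        sem_destroy(&area->protect);
        shmdt(area);
        shmctl(shmid,IPC_RMID,NULL);
        return 0;
}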

We therefore tune the code so that each process locks both semaphores first (with a small rest between the two locks, the reason for which will become clear shortly) and then goes to sleep. Upon rechecking the value after waking up, the semaphores are unlocked so that others can try their luck, too. However, which semaphore to lock first is decided at random. The modified code here was tried out a few times


What happened now? Both the children were happy in the first case, while in the second case, when 10 children were created, they just hung! We had to kill them!

This is because, thanks to the randomness in the order of locking and the sleep in between, a situation might have occurred where one child acquired the prot_one semaphore and requested prot_two, while another held prot_two while asking for prot_one! A deadlock!

Well, the writers of the real-time libraries must have thought of this in advance, and so we do have a workaround ... sem_trywait! It is not as aggressive as sem_wait, and hence does not block if the semaphore is already locked by someone else; it just returns -1 and lets the caller decide what to do next. Using this trick, we rewrite the code once again
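
The essence of the fix is a back-off loop: if the second lock cannot be taken, give the first one back instead of sitting on it. A sketch, using the semaphore names from above:

#include <semaphore.h>
#include <sched.h>

void lock_both(sem_t *prot_one,sem_t *prot_two){
        for(;;){
                sem_wait(prot_one);             /* take the first lock */
                if(sem_trywait(prot_two)==0)
                        return;                 /* got both, done */
                sem_post(prot_one);             /* back off: release the first */
                sched_yield();                  /* let someone else proceed */
        }
}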


Voila! It works perfectly now! All the children are finally happy ...

So, in this part, we have discussed creating children, sharing data, and synchronization ... the keys required for parallel processing. But creating processes to carry out multiple tasks is an inefficient way of doing things, because switching between processes takes up a lot of processing power. As we will see in the next post, there is a lightweight alternative ... so keep waiting for threads to take over ...

Sunday, February 12, 2012

The Power of Parallelism - Part I

It would not be nice if we were still living in the days when computers could do only one thing at a time. Most of us enjoy maintaining a messy taskbar, full of text editors, web browsers, file system explorers and what not. And the world of parallel computing does not limit itself to creating and running multiple tasks, but extends to multiple parallel works inside a single task. Don't you enjoy the Tabbed Browsing features of Firefox (or Chrome)?

So I decided to write a trilogy of posts covering the basics of multitasking. This is the first in the series, and it talks about spawning a process from another. The idea is to demonstrate what a shell (e.g. bash) does when we type in a command. Here, we will write a program which launches gcc to compile a C program and, if it compiles successfully, launches the output executable.

OK, to begin with, I wrote the C source code which will be compiled. It just needs to be a hello-world sort of program, so I typed the following down and named it sample.c

#include <stdio.h>

int main(){
        int data=0xdeadcafe;

        printf("This is the proof that I was executed: %x\n",data);
        return 1;
}


Alright, now we need to cook something up which can launch gcc to compile this. This can be done with the system call execve. The syscall execve launches an executable when the path to the executable, and the input arguments to the same, are passed as parameters to it. But there is a catch ... execve never returns if it successfully launches the new task. So, we need to call fork before we call execve [we will deal with fork in the next post]. To launch gcc, therefore, we need to do something like this

char *argv_to_gcc[]={"gcc","-o","sample.out","sample.c",NULL};
int pid,status;

pid=fork();
if(pid){
        wait(&status);
}else{
        /* envp is the environment pointer, e.g. the third argument
           of main(int argc, char *argv[], char *envp[]) */
        status=execve("/usr/bin/gcc",argv_to_gcc,envp);
        return status;
}


The return value from gcc will be present in status, which we can check to decide whether the executable sample.out was built or not. Note that the actual exit code sits in bits 8-15 of status, which is exactly what the WEXITSTATUS macro extracts. If gcc returned 0, we launch sample.out like this

char *argv_to_output_executable[]={"sample.out",NULL};
int pid,status;

pid=fork();
if(pid){
        wait(&status);
}else{
        status=execve("./sample.out",argv_to_output_executable,envp);
        return status;
}

Putting it all together, the program launcher.c was written to do all this.
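
For reference, here is a sketch of how the pieces might fit together; the real launcher.c is linked above, and this version leans on the widely supported (though non-standard) third argument of main to obtain envp:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

int main(int argc,char *argv[],char *envp[]){
        char *argv_to_gcc[]={"gcc","-o","sample.out","sample.c",NULL};
        char *argv_to_output_executable[]={"sample.out",NULL};
        int status;

        /* step 1: compile sample.c */
        if(fork()==0){
                execve("/usr/bin/gcc",argv_to_gcc,envp);
                perror("execve");       /* reached only if execve failed */
                exit(127);
        }
        wait(&status);
        if(!WIFEXITED(status)||WEXITSTATUS(status)!=0){
                fprintf(stderr,"compilation failed\n");
                return 1;
        }

        /* step 2: run the freshly built executable */
        if(fork()==0){
                execve("./sample.out",argv_to_output_executable,envp);
                perror("execve");
                exit(127);
        }
        wait(&status);
        printf("sample.out exited with status %d\n",WEXITSTATUS(status));
        return 0;
}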


I hope that makes it clear how shells work for us to launch programs. In the next post, we will talk about spawning processes using fork and how to synchronize parallel operations on shared memory.