Thursday, January 26, 2012

Compiling a Kernel

Any kernel-compile by a starter is not its worth unless you can see the changes for yourself. Really speaking, how many of us could really detect the changes from kernel version 2.6.xx that were brought into 2.8.xx [of course you are not allowed to peek into the change logs]? So I decided to edit a kernel source code and compile it, so that the changes are really visible to the user space.

We should be aware that any user space program, upon completion calls library function exit(), directly or indirectly, so that the kernel can spew it out of the list of processes. This function, is actually a wrapper around a system call sys_exit. So, how about we edit the function sys_exit so that it prints a message at the controlling terminal of the calling process? Now, wait! Not every process has a controlling terminal, right? Like daemons, and tasks like init? Well, they can print their goodbye messages in the system log, for that!

Having said all that, we now need the kernel source tree. But which one, then? There are loads of them ... Well, the safest way is to get the one nearest in version to your existing kernel. So, I just took a look at the kernel version of my system:

$ uname -r
2.6.32-32-generic

and went about to download a 2.6.32.xx kernel from here. The download will be a tarball of the source tree compressed with one of gzip, bzip2 or xz algorithms, so you can download any one of the .tar.gz, .tar.bz2 or .tar.xz files.

Once this was done, the source tree archive was extracted. This can be done by the traditional tar command as shown below


or in these days of X and gnome (or any other desktop manager), there will be many archive manager GUI tools out there to do that for you.

Ok, now we edit the function sys_exit(). It can be found in kernel/exit.c, well not exactly as sys_exit(), but as SYSCALL_DEFINE1(exit, int, code)! Any SYSCALL_DEFINEx's you find here are mere preprocessor directives [defined in /include/linux/syscalls.h] to make the code more readable.
Note
If you really want to dig into the kernel code to see how it works, make friends with grep -R.

We can see that sys_exit()  does most of the useful work in do_exit(). Lets insert this bit of code there at the start of do_exit()

1: char go_down_msg[64+TASK_COMM_LEN];
2: struct tty_struct *tty_curr=get_current_tty();
3: if(tty_curr==NULL){
4:         printk(KERN_ALERT "%s going down with code = %ld. no tty found.\n",current->comm,code);
5: }else if(tty_curr->ops==NULL){
6:         printk(KERN_ALERT "%s going down with code = %ld. no tty operations found.\n",current->comm,code);
7: }else if(tty_curr->ops->write==NULL){
8:         printk(KERN_ALERT "%s going down with code = %ld. no tty write operation found.\n",current->comm,code);
9: }else{
10:        snprintf(go_down_msg,64+TASK_COMM_LEN,"\r\n%s going down with code = %ld.\r\n",current->comm,code);
11:        tty_curr->ops->write(tty_curr,go_down_msg,strlen(go_down_msg));
12: }

Now we are ready to compile our new kernel. We open the Makefile and edit the first few lines to name our new kernel linux-2.6.32.27-sys_exit-modified. Also we need to configure our new kernel before we build it. The easiest way to do this will be to copy the configuration file of our existing kernel and use it.

Note
make oldconfig may ask for your choices for a few cases, and its better to google for a safe choice!

The next few steps are quite simple. We need to build the kernel now. And then install it, and the accompanying modules. So, let's do it. And yes, these are going to take quite some time; so let's get some popcorn!

$ make
$ sudo make modules_install
$ sudo make install

Now  got three files in /boot directory:
  • vmlinuz-2.6.32.27-sys_exit-modified
  • System.map-2.6.32.27-sys_exit-modified
  • config-2.6.32.27-sys_exit-modified
and the modules in /lib/modules/linux-2.6.32.27-sys_exit-modified/

Exited? Well, we are almost there to boot into our new kernel. Just a few more steps ... We need to create a ramfs file system for our kernel to boot. this is required to help the kernel with the initial stages of booting where the modules required for accessing the file system has not yet been loaded. After that, the grub menu needs to be updated so that we can boot into our new kernel. All these can be done by

$ cd /boot
$ sudo mkinitramfs -o initrd.img-2.6.32.27-sys_exit-modified 2.6.32.27-sys_exit-modified
$ sudo update-grub

Now, we can reboot and start our new kernel. And if you get to see a lot of messages rolling down the console, you have made it successfully! Once inside, we get to see our new kernel in action. So just open a terminal, and enter any command. And well, I got this! "ls going down with code = 0."! that was real nice a feeling!


Well, Thats all for now ... Keep exploring the kernel source tree!

Saturday, January 21, 2012

A Loadable Kernel Module ...

This is intended to be a motivation towards writing Loadable Kernel Modules (LKMs) and the power they can equip the user with, under the hood of course. The module itself has been kept as simple as possible, while codes for error checking and synchronization have completely been stripped off. You can definitely try this at home ... while learning the other important aspects all along.

To start with we write a simple C program named edit_passwd.c, which tries to open the file /etc/passwd in write mode. Provided the program is run under non-superuser privilege, it is sure to fail, as the file can be opened only by root. You can easily verify this


So, when edit_passwd was executed, we got something like this.

Needless to say, we did not insert any module before pressing the key. 

Now, we can write a module which would change the credentials of this task so as to enable it to open this file. A sample module which did this for me can be found here. Now it was time to compile this module. To do this, we create a Makefile [To know about makefiles and make utility, check this out] in the same directory as the module source file. The sample makefile can be downloaded from here. Once this is done, we can build the module by saying 

$ make


Now, its time to see this module in action. We start our demo program edit_passwd, and while it waits for the user to press a key, we insert the module by saying

$ sudo insmod module_demo.ko

and then press the key. And what! The program now opens the protected file!


So, what is it that we do in our module that empowers the task? Well, lets look into the module's init function hello() [A init function is called when the module is loaded] a bit.
  • When it is being inserted, it gets the pointer to the task_struct in current. Then it iterates over the list of all tasks to find the task named edit_passwd
  • Once the task is located, it changes fsuid element in the cred and real_cred structures. This element is checked by kernel for verifying privilege while doing VFS operations, i.e., file handling. Since it has been changed to the uid of root, any file which can be opened by root can now be opened by the task.
I hope you could see the catch by now. Everytime we need to do this for a file with a different name, the module will need to be recompiled. All these could be avoided if the task could directly communicate from user space to the module in kernel space. Well, not so tough, though ... we can do this by sysfs or procfs interfaces. Let's leave it to the readers, rite? And yes, this will help you for sure.

Wednesday, January 18, 2012

Function on the fly ...

When I was undergoing a course on Calculus in college, I often felt the urge to plot the graphs of various functions, just to check them out how they look like, to have a feel. So, back then I wrote a small program which evaluated the values of y for a set of values of x and then plotted them. The mathematical expression for y was written into a C function which was called by main() while iterating over the values of x.

But soon, it started to irritate me! Because, every time I wanted to do this for another expression, I had to rewrite the function, compile and link it! But I had no choice. The least I could do was to write a parser which read the expression from the input, build a parse tree (dont know much about parsing? check it out here), and then recursively traverse that to evaluate y. Well, no! I wanted something in machine code. Yes, you got it; I wanted a mini-compiler inside my program, which would read an expression string and create a function online to be called again and again.

The rest of this post illustrates an example of how to do it. To keep it short and simple, a binary operation will be parsed. Also, this example will be absolutely non-portable. That means, in short, that it will not even compile on a non-x86-non-unix machine. Well, casting it to suit a particular architecture/platform wont be very hard, I believe! For reference to x86 instructions, refer here.

And one more thing! This is going to be extremely dirty down there. Please hold on to it with patience (and this).

The first thing we require is allocate memory for the function that will be generated. Since we are going to use mprotect() to obtain write (and then read/exec) permissions to this memory, its better to allocate an entire page for this. posix_memalign() will do that for us.

#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>

void *ptr;
posix_memalign(&ptr, getpagesize(), getpagesize());
mprotect(ptr,1,PROT_WRITE);

The next job will be to write the function in the memory pointed to by ptr. create_function() will do it when called. We will deal with this function sometime later. After we manage to do that, the page should be made read and execute only and assigned to an appropriate function pointer.

char operator; //this can have '+' or '-'int (*operation_function_created_online)(int, int);

create_function(ptr, operator);
mprotect(ptr,1,PROT_READ|PROT_EXEC);
operation_function_created_online=ptr;

This done the function can be called just like any other C function.

#include <stdio.h>

int result,val1,val2;

result=operation_function_created_online(val1, val2);
printf("operation %s = %d\n", expr_string, result);

Okay, take a break, get a beer! The next part tells what create_function() does for us.


Coming back, a C function of the type as declared here

int function(int val1, int val2)

will be called like this:
  1. the arguments are passed in stack right to left. so val1will be pushed and then val2 will be.
  2. The function is called, so the next element in the stack is the return address.
  3. The function, in its first instruction, will push ebp to stack, to save its value, so that the stack can be addressed using this register.
  4. The return value is sent back in eax.
  5. The original value of ebp is popped back before returning.
So, the typical caller-callee instruction sequence is something like

Callee
Line numberOpcodeInstruction
155pushl %ebp
289 e5movl %esp, %ebp
... rest of the function ...
... store result in EAX ...
85dpopl %esp
9c3ret
Caller ... some text removed ... 6: mov op2, %eax 7: push %eax 8: mov op1, %eax 9: push %eax 10: call Callee ... do something with EAX ... ... some text removed ...

In our case, we need to have a function like this

1:  push %ebp  
2:  movl %esp, %ebp  
3:  movl 8(%ebp), %eax  
4:  movl 12(%ebp), %eax  
5:
6: ;do the operation here to store result in EAX
7: 8: pop %ebp 9: ret

All that we need to do in create_function() is to fill the memory with these instructions and the intended operation. Once we are done, the program is ready to compile our expression. A sample program can be downloaded from here (modified to parse all elementary binary operations).  The output from this program looks like


This is a very simple illustration of how a compiler works. Hope this motivates the reader towards further studying compiler design in depth.

Monday, January 16, 2012

Memory Protection

It was a lazy afternoon when I was reading a book on Vedic Mathematics. So I asked a friend sitting next to me:

"Hey, how fast can you add 123456789 and 987654321?"

Clever as he is, he quickly opened the calculator application in his desktop, and started typing. To that, I screamed:

"Hey, you are not supposed to use calculators! And anyways, do you think your computer is always right?"

"Well, of course! As far as calculations are concerned! And its just an addition!"

"Ok, how much for my computer adding 2 and 2 to give 5? or 6? or anything I want?"

"You can bet a 1000 bucks if you want! I just love easy money! :P"

So, I started off with it. I wrote a program to add 2 and 2 like this

1:   /*Listing of boozed_desktop.c
2:     It adds 2 and 2 and then faints*/
3:
4:   #include <stdio.h>
5:
6:   void do_magic(void);
7:   
8:   int main(){
9:           do_magic();
10: printf("I want to print 4, and printed %d\n",2+2);
11: }

Ok, so it adds 2 and 2 in the printf() statement. That should convince him. And that means I have to write the function do_magic() for the real trick.

Well, to start off, every operation must be having in its operands somewhere stored in memory. My job should be to change the value of any one of the two operands, so that the addition gives a result I want it to produce. To do that, I need to check where in memory are the two operands (the two 2's) stored. Well, a disassembly can do that for me. That's how the function main() looks when disassembled.

1:          pushl       %ebp  
2:          movl        %esp, %ebp  
3:          andl        $-16, %esp  
4:          subl        $16, %esp  
5:          call        do_magic  
6:          movl        $.LC0, %eax  
7:          movl        $4, 4(%esp)  
8:          movl        %eax, (%esp)  
9:          call        printf  
10:         leave  
11:         ret  

So, at line 5, I see do_magic is being called, and in line 6, .LC0 which must be containing the string literal "I want to print 4, and printed %d\n" to be passed to printf(), is being stored at eax. But I could not find any add operations in there. Well, then I realized that gcc must have added them automatically and put that statement

movl $4, 4(%esp)

at line 7. So, now I have to directly change this value inside the function do_magic()! But, how? How am I supposed to know the address of this movl statement in there?

Well, of course I can! When do_magic() will be called, the address to the next instruction will be pushed to the stack. Then, the first instruction in that function will push the value of ebp in the stack to save its contents. Next, the value of esp will be stored in ebp. Confused, how? Well, That is what every function is compiled to do. Check out lines 1 and 2 in the disassembly of main(). That says it, right?

So, after the first two instructions, ebp will contain the address of the beginning of the stack, where previous value of ebp is saved. Just above that (yes, stack grows downwards) is the address of line 6 of main(), that is the address of

movl $.LC0, %eax

Once I know the size of this instruction, I can calculate the location of my desired statement. Now, how can I find the size of this instruction? Ok, lets compile this code and get the hex dump of the object file. This can be done as

$ gcc -c -o boozed_desktop.o boozed_desktop.c
$ objdump --disassemble boozed_desktop.o

And the object dump will look like this


The highlighted line is the one we are looking for. So, its 5 bytes in length. And not just that, the next line is where we need to do the change. That instruction is of length 8 bytes. Now, a little intuition says that 0xC7 0x44 0x24 0x04 is the opcode to move an immediate value to the address specified by 4(%esp). the trailing 0x04 says it all. So, the "4" we need to change is written as a 32 bit int value in little-endian format as 0x04 0x00 0x00 0x00. That's it! So, summing it all up, do_magic has to do these in order:

  1. Get the return address from stack by using ebp.
  2. Add 5 bytes + 4 bytes to that address to get the address of "4".
  3. Modify it!
Is that all? No, not quite ... Actually, the memory region we are going to modify is holding the executable code. So this memory page must be read-and execute-only page. And we need to have the write permissions to this page. No problem! mprotect() can do that for me. And for that, we will send mprotect() the address of the starting location of this page [which is done by address & (~(getpagesize()-1))]. This will be done by 

mprotect((unsigned char *)((unsigned int)address & ~(getpagesize()-1)),1,PROT_READ|PROT_EXEC|PROT_WRITE);

Oh, you got to notice that! mprotect() was asked to grant all permissions on this page. This is because if do_magic() is in the same memory page as main() [which it will usually be], with only the PROT_WRITE permission (and hence PROT_READ and PROT_EXEC revoked), the call to mprotect() will never be able to return!

Okay, saying all that, I just wrote it down to a function, and added a bit of salt and sugar, to cook up this up

1:   #include <stdio.h>  
2:   #include <unistd.h>  
3:   #include <sys/mman.h>  
4:    
5:   void do_magic(void);  
6:    
7:   int main(){  
8:           do_magic();  
9:           printf("I want to print 4, and printed %d\n",2+2);  
10:  }  
11:    
12: void do_magic(void){ 13: int *var; 14: unsigned char *retloc; 15: int number; 16: 17: printf("How much do you want \"2+2\" to print?\n"); 18: scanf("%d",&number); 19: 20: __asm__ volatile("mov 4(%%ebp), %0\n\t":"=r"(retloc):); 21: mprotect((unsigned char *)((unsigned int)retloc & ~(getpagesize()-1)),1,PROT_READ|PROT_EXEC|PROT_WRITE); 22: *((int *)(retloc+(5+4)))=number; 23: }

Well, that should be enough to irritate my friend! Sure he will fire up when he gets to see this:


Now, just in case, if somebody is thinking that these can only be used to goof up with systems, in the next post, we will see how such memory protection schemes can be put to some good.

Saturday, January 14, 2012

The C in every Matlab: mex

Most of the engineers require to perform certain simulation experiments in their career, and Matlab proves to be the application of preferred choice to most (if not all) due to the ease of programming and the numerous libraries of matrix manipulation it exposes to the user. However, the only factor which hinders rapid development is the speed with which it executes (or rather say interprets) programs. This is because, Matlab reads a line of code, interprets it, and then executes it, which is much slower as compared to a compiled code.

So, one day I decided to make my Matlab programs execute faster. I read somewhere about mex and wanted to do it myself. The following discussion talks about how to use mex files and enhance the performance of a Matlab program. It is to be noted that in my Linux system running on an x86 machine, Matlab was installed at /matlab2009 and the programs were written and compiled keeping that in mind. One might need to make necessary modifications to those listings and commands to suit one's own bill.

Well, to begin with, what exactly is a mex file? Well, it is just a shared library that Matlab links with upon loading, and the functions contained inside can be accessed from Matlab. So, to start with, we require to configure Matlab to successfully compile a C file and generate a mex file. This is done by executing the following command in Matlab console

>> mex -setup

This will display the available C compiles in your computer and will ask the user to choose one from them. It then accordingly sets the mexopts.sh configuration file in the user's home directory which will be reffered to when compiling a C program.

Once this is done, we are all set for writing C programs to be compiled into mex files. A sample file can be downloaded from here.

This function takes two matrices as input, multiplies and divides them element wise and return the result in two output matrices. To test the performance enhancement, we wrote three .m files

The profiler is turned on, and the file Test.m is executed. This is done by 

>> mex sample_mex.c
>> profile on
>> Test
>> profile viewer

And we can see from the profiler output that the mex file has reduced execution time from 4.6 seconds to 0.13 seconds!


Now, lets have a look into the file sample_mex.c. The only function in the file is defined as

void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])

This is the entry function when the function sample_mex(inparm_one , inparm_two) is called from matlab. Any matrix in Matlab is represented as an mxArray inside a mex function. The input matrices passed from matlab are send in an array of mxArray pointers called prhs and the number of input arguments is sent in nrhs. The output matrices post calculation is sent back in plhs and the number of output variables passed back is indicated in nlhs. The dimensions of a matrix can be found out by using functions mxGetNumberOfDimensions() and mxGetDimensions().

In lines 31 to 42, a check is made whether the input matrices are acceptable to the function. Checks are made to ensure that the matrices

  • are of equal dimensions. 
  • are numeric. This is done by using mxIsNumeric().
  • are of type double precision floating point numbers. This is done by mxIsDouble().
It should be noted that the output matrices to be returned should be created inside the mex file by using mxCreateNumericArray(). Also, the data elements of an mxArray cannot be accessed directly. Therefore, it is required to get the storage address of the data by using mxGetPr() and upon calculation, set the data of the mxArray by using mxSetPr(). This is done in lines 44 to 58. 

One more point is to be made here. The data passed in a matrix is represented in the form of a linear array inside a mex file. The size of the array is equal to the product of the  dimension sizes of the matrix. So, if we create a matrix in Matlab,

>> a = ones(300 , 600 , 4);

the size of the array will be 300 x 600 x 4 = 720000. And an element a(i , j , k)  of the matrix can be accessed as this (keeping in mind that indexing starts from 0 in C):

*(data_of_a + ( k - 1 ) * 300 * 600 +( j - 1 ) * 300 + ( i -1 ))

So, putting them together, the calculation was done in lines 57 to 60.

That's it. Hope this motivates readers towards writing short mex files and save quite a few hours of their work. The complete reference to Matlab API for C can be obtained here for further details.



Well, if you think that's the end of the story, you got it all wrong. Just like you can call functions from libraries written in C inside Matlab, you can let Matlab do specific tasks for a program written in C. For this you require to prepare your system by

  • installing the C shell, csh. This is required as engOpen() executes Matlab from csh.
  • creating a link named matlab to MATLAB executable in one of the directories in the search path of csh (namely /usr/local/bin or /usr/bin). For example, in my system, this was done by 

$ ln -s /matlab2009/bin/glnx86/MATLAB /usr/bin/matlab

All this done, and you are ready to write C programs which use Matlab functions. For example, this program starts the Matlab engine and calls Matlab function plot to show a sine wave animation. All you need to remember is to include the headers from the correct directory and link against Matlab libraries. To know more about compiling in gcc, check this out. In my system it was done as:

$ gcc plotter.c -I/matlab2009/extern/include -lm -L/matlab2009/bin/glnx86 -lmat -lmex -leng -Wl,-rpath,/matlab2009/bin/glnx86 

And that's it! The output was there


Well, that's all for today. Keep reading ...



Thursday, January 12, 2012

Compiling a C program ...

Sometime back I got hold of a third-party library which had a file libsample.so (the shared library) and another named sample.h, the header containing the prototype declaration of the functions contained in the library. I put them both in my home directory /home/subhajit and started writing a program sampletest.c in my home directory to test it.

1:   #include <stdio.h>  
2:   #include <stdlib.h>  
3:   #include <sample.h>
4:   #include <math.h>  
5:    
6:   int main(){  
7:           int var1=2  
8:           int var2=4;  
9:           int var3=var2++var3;  
10:          double var4;  
11:    
12:          /*this inline assembly is of no use.*/  
13:          /*just added to have some flavour*/  
14:          __asm__("push %eax\n\t" "mov %eax,$1\n\t" "pop %eax");  
15:    
16:          var4=pow(add(2,6),0.25);  
17:          printf("%lf\n",var4);  
18:  }  

Once done, I start to put the code together. The first step would be to preprocess the code to expand all the header files and the preprocessor directives.


Whoa! I could very well see the file sample.h in the directory listing. Well, then gcc was never told to look for a header file in the current directory! We need to tell it now, and we do it like this


Now that the preprocessing succeeded, we can have a look into the output file sampletest.i.

1:   # 1 "sampletest.c"  
2:   # 1 "<built-in>"  
3:   # 1 "<command-line>"  
4:   # 1 "sampletest.c"  
5:   # 1 "/usr/include/stdio.h" 1 3 4  
6:    
7:   ... Many Lines Removed ...  
8:    
9:    
10: typedef _G_fpos_t fpos_t; 11: 12: ... Some Lines Removed ... 13: 14: extern struct _IO_FILE *stdin; 15: extern struct _IO_FILE *stdout; 16: extern struct _IO_FILE *stderr; 17: 18: ... Some Lines Removed ... 19: 20: extern int fprintf (FILE *__restrict __stream, 21: __const char *__restrict __format, ...); 22: 23: extern int printf (__const char *__restrict __format, ...);
24: 25: ... Many Lines Removed ... 26: 27: # 1 "/home/subhajit/sample.h" 1 28: 29: extern int add(int,int); 30: # 4 "sampletest.c" 2 31:
32: int main(){ 33: int var1=2 34: int var2=4; 35: int var3=var2++var3; 36: double var4; 37: 38: 39: 40: __asm__("push eax\n\t" "mov %eax,$1\n\t" "pop %eax"); 41: 42: var4=pow(add(2,6),0.25); 43: printf("%lf\n",var4); 44: }

Ok, so we can see the files stdio.h, stdlib.h and sample.h (and all the files they include) have been expanded (some of the prototype declarations are shown in gray). Also, the comment in our program has been removed.

Next, we need to compile the program to convert it from C to machine language, which in my case would be an x86 assembly program. we can do it like this:


Not again! Now, all these jargons of syntax and semantics ... So, the preprocessor can't detect this,rite? Alrite, we got the errors ...  the missing semicolon at line 7 and the extra '+' in line number 9. Corrected them, preprocessed and compiled again, all to have a sampletest.s in my home directory. WHOOO!!!

With that, I was ready for assembling. So I ran the assembler, as.


Just what I was waiting for! Another error! And this time its the assembler. Which waits to catch errors in inline assembly codes or any inappropriate changes made to the generated .s file. In this case, the error is just that we can not move the value of eax into an immediate operand, while the reverse is true. So we change "mov %eax, $1\n\t" to "mov $1, eax\n\t" in line 14 of sampletest.c and obtain the binary file in machine code with all the symbol tables and relocation directives. This is what we usually do by

gcc -c -o sampletest.o sampletest.c

Well, well! All that is left is linking the object file to the shared libraries and get the binary executable. This time I will be extra cautious and will run ld, the linker.


so the linker command is quite long and is given below

ld -dynamic-linker /lib/ld-linux.so.2 -o sampletest sampletest.o /usr/lib/crt1.o /usr/lib/crti.o /usr/lib/crtn.o `gcc -print-file-name=`/crtbegin.o `gcc -print-file-name=`/crtend.o -L/home/subhajit -lc -lm -lsample -rpath=/home/subhajit
See how the linker options have been written in separate lines using '\'. Lets take the options one by one.

  • -dynamic-linker /lib/ld-linux.so.2 is required to be linked to the executable, because this .so file loads the program into memory and takes care of the relocation of the sections when the program is executed.
  • -o sampletest is the executable name to be created and sampletest.o is the source object file
  • /usr/lib/crt1.o /usr/lib/crti.o /usr/lib/crtn.o are the C runtime object files which come with the routines _start, _init and _fini (actually a lot more functions, just do a objdump --disassemble /usr/lib/crt1.o to peek into its internals). These functions are the one which call main() and when main() returns, call _exit() to return control back to kernel.
  • `gcc -print-file-name=`/crtbegin.o `gcc -print-file-name=`/crtend.o are required to link the program against gcc object files. `gcc -print-file-name` gives the absolute path to these objects (in my case it was /usr/lib/gcc/i486-linux-gnu/4.4.3/).
  • -L/home/subhajit tells the linker to look for the libraries in /home/subhajit in addition to the default directory for libraries; and -lc -lm -lsample tell that the libc.a, libm.so (for pow() function) and libsample.so (for add() function) are the libraries to be linked.
  • Finally, when the program will be executed, it will require to load libsample.so into memory which is not present in the standard library path (which may be /usr/lib or /lib). So, -rpath=/home/subhajit tells that the program can load libraries from /home/subhajit at runtime.
This hectic task of linking the sampletest.o to sampletest (with execute permission, verify that with ls -l sampletest) is generally done by

gcc sampletest.o -L/home/subhajit -lsample -lm -Wl,-rpath,/home/subhajit

So, all we can say after this, is that gcc takes care of all these task by preprocessing, compiling, assembling by internally calling as and linking through ld, and thereby saves the day for us. and all we care to do is:



Tuesday, January 10, 2012

Using inline assembly ...

Disclaimer
This is not intended to be a guide towards using inline assembly, and therefore does not cover the topic in full. This can be treated as merely a motivation towards using inline assembly. To explore the topic in depth, please read this.

I am, and always was really scared of assembly language. When I was an under-grad student, my x86 assignments were mostly codes written in C and submission of the disassembly. Of course, with C being pretty low-level, one does not need to dirty their hands (no offence meant to assembly programmers) with the syntax of mnemonics and opcodes.

However, there are times when C fails to meet some requirements of a programmer. An example has already been put up in my last post, where a integer variable needs to be rotated bitwise. Or to perform particularly arithmetic or logical shifts to a variable, since the bitwise right shift operation in C (>>) is implementation specific.

This is where inline assembly saves the day for a C programmer. A few lines of assembly code can be embedded effortlessly in a C program to carry out specific architecture dependent tasks. The basic format for inline assembly is

__asm__ volatile ("your first assembly statement here\n\t"
"your second assembly statement here\n\t")

Of course I could have used the keyword asm instead of __asm__ , and the volatile keyword specifies that since I am using inline assembly, I know exactly what I intend to do, so no optimizing tricks, please!

So, now I can go for a logical shift to 0xAAAAAAAA to obtain 0x55555555, and I can be sure that I get what I want. A small code for this can be written as

1:   #include <stdio.h>  
2:     
3:   int var1;  
4:   int var2;
5:  
6:   int main(){  
7:           var=0xaaaaaaaa;  
8:           __asm__ volatile        ("pushl %eax\n\t"  
9:                                    "pushl %ebx\n\t"  
10:                                   "mov $var1, %ebx\n\t"  
11:                                   "movl (%ebx), %eax\n\t"  
12:                                   "shrl $1, %eax\n\t"
13:                                   "mov $var2, %ebx\n\t"  
14:                                   "mov %eax, (%ebx)\n\t"  
15:                                   "popl %ebx\n\t"  
16:                                   "popl %eax\n\t"  
17:                                  );  
18:    
19:          printf("%p  after 1 bit logical shift = %p\n",(unsigned char *)var1, (unsigned char *)var2);  
20:  }  

and the output is certainly


Needless to say, the size of the main function is increased significantly due to this. Much can be accounted to the push and pop statements (which are required to save the registers to be used in the inline assembly) and moving the data (0xAAAAAAAA in this case) to and from memory. Also, the variable var was declared global for making it accessible to the assembly code.

To get around these shortcomings, we generally use extended assembly, and the same code can there be written as

1:   #include <stdio.h>  
2:       
3:   int main(){   
4:           int var1=0xaaaaaaaa,var2; /*local variables now*/
5:   
6:           __asm__ volatile        ("movl %0, %%eax\n\t"   
7:                                    "shrl $1, %%eax\n\t"  
8:                                    "movl %%eax, %1\n\t"  
9:                                    :"=r"(var2):"r"(var1):"%eax"  
10:                                  );   
11:      
12:          printf("%p after 1 bit logical shift = %p\n",(unsigned char *)var1, (unsigned char *)var2);   
13:  }

to obtain


What was the code, anyway? Well, lets explain it in bits and pieces.


  • First of all, the %0 and %1 are placeholders. %0  is filled in from input variable var1 which will be stored in a register, as denoted by the constraint "r" preceding it. 
  • %1 placeholder represents an output variable in register (denoted by "=r") and will be placed in var2. The registers to be used for storing the variables is decided in compile time (in my case it was edx). 
  • In the entire operation, only the value placed in eax will be tinkered with, and hence it is placed in the list of clobbered registers. This instructs gcc not to use eax in other parts of the code, or to save the value before entering and restore the value after leaving the assembly code.

The disassembly of the main function looks like this

1:       pushl %ebp  
2:           movl        %esp, %ebp  
3:           andl        $-16, %esp  
4:           subl        $32, %esp  
5:           movl        $-1431655766, 28(%esp)  
6:           movl        28(%esp), %edx  
7:
8: movl %edx, %eax 9: shrl $1, %eax 10: movl %eax, %edx
11: 12: movl %edx, 24(%esp) 13: movl 24(%esp), %ecx 14: movl 28(%esp), %edx 15: movl $.LC0, %eax 16: movl %ecx, 8(%esp) 17: movl %edx, 4(%esp) 18: movl %eax, (%esp) 19: call printf 20: leave 21: ret

Hope this motivates you towards inline assembly. For a complete understanding, please read here. For a complete reference to x86 instructions and opcodes, check this out.

Monday, January 9, 2012

What is volatile ...

Of all the keywords in C, the one which intrigued me the most was this: volatile !! Most of the sources instructed use of volatile when a variable is suspected to be changed outside the context of a program. According to the K&R book:

The purpose of volatile is to force an implementation to suppress optimization that could otherwise occur. For example, for a machine with memory-mapped input/output, a pointer to a device register might be declared as a pointer to volatile, in order to prevent the compiler from removing apparently redundant references through the pointer.

Well, that was absurd! How can a variable change in a sequential program? Needless to say, little did I know how the computer hides a lot of details beyond the visibility of an user, or a user-level programmer !

So, finally I set foot to solve this mystery with inline assembly. Well, the story goes like this:

C language does not have an operator or a built-in function to perform bitwise rotations on variables, though a function for this can be easily written like this. However, as x86 architecture has a machine instruction to rotate bitwise, why not use that? All I needed was to insert an inline assembly in my C code. If you are not familiar with inline assembly, refer this.

So, I wrote a code like this:

1:   #include <stdio.h>  
2:    
3:   int main(){  
4:           int count;  
5:           int variable_one,variable_two;  
6:           variable_one=20;  
7:           variable_two=2;  
8:           __asm__("mov $0x80000001, %edx\n\t");  
9:           for(count=0;count<16;count++){  
10:                  variable_one=variable_one*variable_two;  
11:                  __asm__("roll $1,%%edx\n\t");  
12:          }  
13:          variable_two=variable_one+2;  
14:          printf("variable_one = %d\n",variable_one);
15:  }   

So, this code rotates edx 16 times so that at the end of the loop edx contains 0x00018000 (0x80000001 rotated left 16 times). Also, in the same loop, it multiplies 20 in  variable_one with 2 in variable_two to obtain 20 x (2^16).  Lastly, the value of 1310720 (= 20 x 2^16) contained in variable_one is printed. [Please understand that the loop was completely unnecessary and this code is particularly for demonstration purpose]


So, I compiled the program and ran it, just to get an erroneous result. To see what went wrong, I disassembled the code, (use gcc -S to get the disassembly of a C code) which is given below:

1:  .LC0:  
2:           .string        "variable_one = %d\n"  
3:    
4:  ... Some Text Removed ...  
5:    
6:  main:  
7:           pushl        %ebp  
8:           movl        %esp, %ebp  
9:           andl        $-16, %esp  
10:          subl        $16, %esp  
11:    
12: mov $0x80000001, %edx
13: 14: movl $20, %edx
15: movl $0, %eax 16: .L2: 17: addl %edx, %edx 18:
19: roll $1,%edx
20: 21: addl $1, %eax 22: cmpl $16, %eax 23: jne .L2
24: movl %edx, 8(%esp) 25: movl $.LC0, 4(%esp) 26: movl $1, (%esp) 27: call __printf_chk 28: leave 29: ret

The two inline assembly lines entered in the code have been highlighted in gray. The for loop has been highlighted in light blue. It can be seen that, during the loop, the value of variable_one is held in edx and the inline assembly is tinkering with that in the next line. gcc is not to be blamed, as it is completely unaware that the assembly will change the value contained in edx, it does nothing to protect it.

Not to worry, as volatile saves the day ... I just changed line 5 in hack_asm.c to look like this:

1:   #include <stdio.h>  
2:    
3:   int main(){  
4:           int count;  
5:           volatile int variable_one,variable_two;  
6:           variable_one=20; 
7: 
8:  ... Some Text Removed ...
9:
10:          printf("variable_one = %d\n",variable_one);
11:  }   

and recompiled the code and ran it again. This time, it was perfect!


So what did happen back there? A dive into the disassembly could reveal the caveats

1:   .LC0:  
2:           .string        "variable_one = %d\n"  
3:    
4:  ... Some Text Removed ...  
5:    
6:   main:  
7:           pushl        %ebp  
8:           movl        %esp, %ebp  
9:           andl        $-16, %esp  
10:          subl        $32, %esp  
11:          movl        $20, 28(%esp)  
12:          movl        $2, 24(%esp)  
13:    
14: mov $0x80000001, %edx
15:
16: movl $0, %eax 17: .L2: 18: movl 28(%esp), %edx 19: movl 24(%esp), %ecx 20: imull %ecx, %edx 21: movl %edx, 28(%esp) 22:
23: roll $1,%edx
24: 25: addl $1, %eax 26: cmpl $16, %eax 27: jne .L2
28: movl 28(%esp), %eax 29: addl $2, %eax 30: movl %eax, 24(%esp) 31: movl 28(%esp), %eax 32: movl %eax, 8(%esp) 33: movl $.LC0, 4(%esp) 34: movl $1, (%esp) 35: call __printf_chk 36: leave 37: ret

A quick view reveals that the code inside the loop has increased significantly due to the introduction of the word volatile. the keyword volatile told gcc that the variables variable_one and variable_two  can be changed and should always be saved and reloaded. So for each iteration, the value of variable_one is loaded from memory, multiplied and is put back to the memory. Though this increases program size and execution time significantly, the risk of the value being altered is avoided.

So it is advisable to use volatile keyword with variables when using inline assembly. Or no, there's another way? Well, we will deal with it next.

P.S. Your feedback is my most valuable reward. Regards, Subhajit


Introduction

I started developing software in C about 10 years since. The journey began with 16-bit Turbo C compiler which soon made its way to 32-bit GNU C compiler (gcc) under LINUX for conformation to standards. I must say that I have not grown into an expert yet, but over the past years I have experienced a lot of obstacles, and eventually made my way through by learning new tricks.

This blog is sincerely about demonstrating the various facts that seemed to be beyond comprehension, and thus to throw some light so as to aid others in grasping the facts quick and early.

My sincere note of appreciation goes to my friend Tanmay, without whose inspiration this would have still been in my 'extended To-Do list' ...

I would be highly grateful if readers kindly point out errors/faults and provide appropriate feedback to enhance the quality of this post.

The posts of this blog can be accessed here:

  1. What is volatile ...
  2. Using inline assembly ...
  3. Compiling a C program ...
  4. The C in every Matlab: mex
  5. Memory Protection
  6. Function on the fly ...
  7. A Loadable Kernel Module ...
  8. Compiling a kernel ...
  9. The Power of Parallelism - Part I
  10. The Power of Parallelism - Part II
  11. The Power of Parallelism - Part III

Thanks,
Subhajit