IA-32 & System Calls
	A system call is the mechanism an application program uses to request a
service from the operating system.  This often involves access to hardware or
to data structures that only sufficiently privileged code (such as kernel
code) is allowed to touch.

A few common examples of system calls are:
	* read, write
	* open, close
	* kill
	* fork
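On a current Linux system with glibc, the calls listed above can be made either through their C library wrappers or through glibc's generic syscall() function, which takes the system call number from <sys/syscall.h> explicitly. This is a sketch of the user-visible side only, not of the IA-32 mechanism itself (syscall() uses whatever entry instruction the architecture provides); pids_match() and chdir_errno() are hypothetical helper names.

```c
#define _GNU_SOURCE          /* make sure syscall() is declared */
#include <errno.h>
#include <sys/syscall.h>     /* SYS_getpid, SYS_chdir, ... */
#include <unistd.h>          /* getpid(), chdir(), syscall() */

/* getpid() through the C library wrapper and through syscall(), which takes
 * the system call number explicitly. Both reach the same kernel handler, so
 * the two results should agree. */
int pids_match(void)
{
    long via_wrapper = (long)getpid();
    long via_syscall = syscall(SYS_getpid);
    return via_wrapper == via_syscall;
}

/* chdir() both ways. On failure each returns -1 and leaves the kernel's
 * error code in errno; this helper returns that code (0 on success). */
int chdir_errno(const char *path, int use_raw_syscall)
{
    long rc;
    errno = 0;
    rc = use_raw_syscall ? syscall(SYS_chdir, path) : chdir(path);
    return rc == -1 ? errno : 0;
}
```

For example, chdir_errno("/no/such/directory", 0) and chdir_errno("/no/such/directory", 1) should both report ENOENT, because both paths end in the same kernel handler.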

	Intel's 32-bit processor architecture (IA-32) provides exception- and
interrupt-handling features that Linux uses both for interrupts and for
system calls.
	More specifically, for system calls and operating-system design,
Intel provides 4 code privilege levels (or protection rings), gates, software
interrupts and hardware interrupts.

	Applications usually run at privilege level (PL) 3, where only
general-purpose instructions can be executed; privileged instructions fault,
and so do I/O-port instructions and the instructions that enable and disable
interrupts (CLI, STI) unless the O.S. has granted the right to use them.  On
Linux a process obtains port access through ioperm(), which sets the I/O
permission bitmap in the task state segment, and broader rights through
iopl(), which sets the IOPL field of EFLAGS (IOPL also governs CLI and STI).
These settings must be made by the O.S. at PL 0 and remain in effect when
control returns to the process at PL 3.  The Current Privilege Level (CPL) is
stored in bits 0-1 of the CS (code segment selector) and SS (stack segment
selector) registers.
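The CPL is simply the low two bits of a segment selector. This sketch decodes the three fields of a 16-bit IA-32 selector; the function names are hypothetical, and on real hardware the selector value would first be read from CS (for example with a mov %cs instruction).

```c
#include <stdint.h>

/* IA-32 segment selector (16 bits):
 *   bits 0-1  RPL (requested privilege level); in CS this field is the CPL
 *   bit  2    TI  (table indicator: 0 = GDT, 1 = LDT)
 *   bits 3-15 index of the descriptor in that table */
unsigned selector_rpl(uint16_t sel)   { return sel & 0x3u; }
unsigned selector_ti(uint16_t sel)    { return (sel >> 2) & 0x1u; }
unsigned selector_index(uint16_t sel) { return sel >> 3; }
```

For example, a selector value of 0x23 decodes to RPL 3, table GDT, index 4, which is the shape of a user-mode code selector; 0x10 decodes to RPL 0, GDT, index 2.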

	Gates are special descriptors that provide protected entry points into
procedures running at a different (usually more privileged) level than the
calling program.  IA-32 defines call gates, interrupt gates, trap gates and
task gates.
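The protection a gate offers comes down to two comparisons. The sketch below models, in plain C, the privilege checks the processor makes for a software interrupt (INT n) through an interrupt or trap gate. It ignores task gates, call gates, and the other descriptor checks (present bit, segment type), and check_int_n() is a hypothetical name.

```c
/* Hypothetical model of the privilege checks an IA-32 CPU applies when a
 * program executes INT n through an interrupt or trap gate. Privilege levels
 * run from 0 (most privileged) to 3 (least privileged). */
enum gate_check { GATE_OK, GATE_GP_FAULT };

enum gate_check check_int_n(unsigned cpl, unsigned gate_dpl, unsigned handler_cs_dpl)
{
    /* The caller must be at least as privileged as the gate's DPL;
     * otherwise INT n raises a general-protection fault (#GP). */
    if (cpl > gate_dpl)
        return GATE_GP_FAULT;
    /* The handler's code segment may not be less privileged than the caller. */
    if (handler_cs_dpl > cpl)
        return GATE_GP_FAULT;
    return GATE_OK;
}
```

A gate with DPL 3 that points at a DPL 0 handler is therefore usable from user mode and lands in the kernel, while a gate with DPL 0 rejects INT n issued from PL 3.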

	Linux uses Intel's software interrupt feature for system calls: the
INT n instruction, where n is an interrupt vector number (0-255).  The vector
number is an index into the Interrupt Descriptor Table (IDT), which holds a
gate descriptor for each vector.  The gate descriptor contains the segment
selector and offset of the interrupt or exception handler, and its descriptor
privilege level (DPL) controls which privilege levels may invoke it, giving a
protected path to that handler.
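An IA-32 interrupt or trap gate descriptor is 8 bytes. The sketch below packs a 32-bit gate in the layout Intel documents and computes where vector n sits in the IDT. make_gate(), idt_entry_offset() and the handler address used in the usage note are hypothetical; real kernels build these entries in their own code.

```c
#include <stdint.h>

/* IA-32 interrupt/trap gate descriptor (8 bytes), lowest address first. */
struct idt_gate {
    uint16_t offset_low;   /* handler address bits 0-15 */
    uint16_t selector;     /* code segment selector of the handler */
    uint8_t  zero;         /* reserved, 0 for interrupt/trap gates */
    uint8_t  type_attr;    /* P (bit 7), DPL (bits 5-6), type (bits 0-3) */
    uint16_t offset_high;  /* handler address bits 16-31 */
};

#define GATE_TYPE_INTR32 0xEu  /* 32-bit interrupt gate: clears IF on entry */
#define GATE_TYPE_TRAP32 0xFu  /* 32-bit trap gate: leaves IF unchanged */

struct idt_gate make_gate(uint32_t handler, uint16_t selector,
                          unsigned type, unsigned dpl)
{
    struct idt_gate g;
    g.offset_low  = (uint16_t)(handler & 0xFFFFu);
    g.selector    = selector;
    g.zero        = 0;
    g.type_attr   = (uint8_t)(0x80u | ((dpl & 3u) << 5) | (type & 0xFu));
    g.offset_high = (uint16_t)(handler >> 16);
    return g;
}

/* The CPU finds the descriptor for vector n at byte offset n * 8 from the IDT base. */
uint32_t idt_entry_offset(unsigned vector)
{
    return vector * 8u;
}
```

For example, make_gate(0xC0101234, 0x10, GATE_TYPE_TRAP32, 3) yields the attribute byte 0xEF (present, DPL 3, trap gate), the kind of gate that lets user code execute INT 0x80, while an interrupt gate with DPL 0 has attribute byte 0x8E. The entry for vector 0x80 sits at byte offset 0x400 in the IDT.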




A System Call in Linux

1.	The initialization required to allow for system calls happens in
trap_init() (arch/i386/kernel/traps.c).  It sets up the IDT so that vector
0x80 points to the system_call entry point in arch/i386/kernel/entry.S, using
a gate whose DPL is 3 so that user-mode code is allowed to execute INT 0x80.

2.	Next, a user program (or the operating system itself) invokes the
system call, normally by calling a C library function such as chdir().

3.	The library function is built from a _syscallX macro in
include/asm-i386/unistd.h, where X is the number of arguments the system call
takes; the header defines one such macro for each possible argument count.
The one-argument version looks like this:


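		/* One-argument wrapper; the file using it must include <errno.h>.
		 * %eax carries the call number in and the result out; %ebx carries arg1. */
		#define _syscall1(type,name,type1,arg1) \
		type name(type1 arg1) \
		{ \
		long __res; \
		__asm__ volatile ("int $0x80" \
			: "=a" (__res) \
			: "0" (__NR_##name),"b" ((long)(arg1))); \
		if (__res >= 0) \
			return (type) __res; \
		errno = -__res; \
		return -1; \
		}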


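This macro expands into assembly roughly as follows.  The listing is for
setuid, whose system call number on i386 is 23 and whose argument in early
kernels was a 16-bit uid_t (hence the movzwl zero-extension):


		_syscall1(int,setuid,uid_t,uid);

		_setuid:
		         subl $4,%esp
		         pushl %ebx			; save the caller's %ebx (callee-saved)
		         movzwl 12(%esp),%eax		; zero-extend the 16-bit uid argument
		         movl %eax,4(%esp)
		         movl $23,%eax			; %eax = system call number (__NR_setuid)
		         movl 4(%esp),%ebx		; %ebx = first argument
		         int $0x80			; software interrupt: switch to kernel mode and jump to the handler
		         movl %eax,%edx
		         testl %edx,%edx		; negative result means error
		         jge L2				; if no error, go to L2
		         negl %edx
		         movl %edx,_errno		; errno = -result
		         movl $-1,%eax			; return -1
		         popl %ebx
		         addl $4,%esp
		         ret
		       L2:
		         movl %edx,%eax			; return the result unchanged
		         popl %ebx
		         addl $4,%esp
		         ret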
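Both the _syscallX macro and its assembly expansion end with the same return-value convention, which can be written as an ordinary C function. wrap_syscall_result() is a hypothetical name. Later versions of these headers treat only a small range of negative values as error codes; this sketch keeps the simple sign test used above.

```c
#include <errno.h>

/* Hypothetical helper mirroring the end of the wrapper: the kernel returns
 * either a non-negative result or a negated error code in %eax, and the
 * wrapper turns the latter into -1 with errno set. */
long wrap_syscall_result(long res)
{
    if (res >= 0)
        return res;          /* success: pass the value through */
    errno = (int)-res;       /* e.g. -ENOENT becomes errno = ENOENT */
    return -1;
}
```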


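When the program executes INT 0x80, the privilege level finally changes: the
CPU switches from PL 3 to PL 0, loads the kernel stack pointer from the task
state segment, and transfers control through the gate to the system_call
routine in arch/i386/kernel/entry.S (labelled _system_call in the listing
below).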


		_system_call:
			pushl %eax			; save orig_eax (the system call number)
			SAVE_ALL			; push the remaining registers
			movl $-ENOSYS,EAX(%esp)		; default return value
			cmpl $(NR_syscalls),%eax
			jae ret_from_sys_call		; number out of range
			movl _sys_call_table(,%eax,4),%eax	; load the handler address
			testl %eax,%eax
			je ret_from_sys_call		; empty table slot
			movl _current,%ebx		; %ebx = current task
			andl $~CF_MASK,EFLAGS(%esp)	; clear carry: assume no errors
			movl $0,errno(%ebx)
			movl %db6,%edx
			movl %edx,dbgreg6(%ebx) 	; save current hardware debugging status
			testb $0x20,flags(%ebx)		; PF_TRACESYS: traced process? (path at 1: not shown)
			jne 1f
			call *%eax			; run the handler
			movl %eax,EAX(%esp)		; save the return value
			movl errno(%ebx),%edx
			negl %edx
			je ret_from_sys_call		; errno was 0: success
			movl %edx,EAX(%esp)		; return -errno instead
			orl $(CF_MASK),EFLAGS(%esp)	; set carry to indicate error
			jmp ret_from_sys_call


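	This routine first saves the registers, stores -ENOSYS as the default
return value, and checks the system call number in %eax: if it is not below
NR_syscalls, or if the corresponding _sys_call_table entry is empty, it returns
immediately.  Otherwise it loads the handler's address from _sys_call_table
(each entry is 4 bytes, hence the scale factor 4) and calls it.  The handler's
result, or the negated errno if the handler reported an error, is written into
the saved EAX slot, so after ret_from_sys_call restores the registers and
returns from the interrupt, the user program finds it in %eax.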
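The dispatch logic can be modelled in user-space C. Everything here is hypothetical (the table, the handler names, MODEL_NR_SYSCALLS and model_dispatch()). The handlers report errors by returning a negative code, the convention later kernels adopted, rather than through the task's errno field as in the listing above.

```c
#include <errno.h>
#include <stddef.h>

typedef long (*syscall_fn)(long a1, long a2, long a3);

static long model_sys_answer(long a1, long a2, long a3)
{
    (void)a1; (void)a2; (void)a3;
    return 42;                    /* pretend success value */
}

static long model_sys_denied(long a1, long a2, long a3)
{
    (void)a1; (void)a2; (void)a3;
    return -EPERM;                /* errors come back as negative codes */
}

static const syscall_fn model_sys_call_table[] = {
    [0] = NULL,                   /* empty slot, like an unimplemented call */
    [1] = model_sys_answer,
    [2] = model_sys_denied,
};

#define MODEL_NR_SYSCALLS (sizeof model_sys_call_table / sizeof model_sys_call_table[0])

long model_dispatch(unsigned long nr, long a1, long a2, long a3)
{
    long ret = -ENOSYS;           /* default, like movl $-ENOSYS,EAX(%esp) */
    if (nr >= MODEL_NR_SYSCALLS)  /* like cmpl $(NR_syscalls),%eax; jae ... */
        return ret;
    syscall_fn fn = model_sys_call_table[nr];
    if (fn == NULL)               /* like testl %eax,%eax; je ... */
        return ret;
    return fn(a1, a2, a3);        /* like call *%eax */
}
```

Indexing an array of function pointers by the call number is what the scaled load from _sys_call_table does in the assembly; the bounds check comes first so that an out-of-range number can never read past the table.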
	Many handlers, such as sys_setuid(), are defined in kernel/sys.c; the
handler for chdir(), sys_chdir(), is in fs/open.c.
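Control gets back to the user program because INT 0x80, when it raises the privilege level from 3 to 0, pushes the user's SS, ESP, EFLAGS, CS and EIP onto the kernel stack, and the final return from the interrupt pops them again. The struct below sketches that frame as it sits in memory, lowest address first. The type and helper names are hypothetical, and in Linux the pushl %eax and SAVE_ALL in the listing above add further registers below it in memory.

```c
#include <stddef.h>
#include <stdint.h>

/* Frame pushed by the CPU on an INT n that changes privilege level from
 * 3 to 0, lowest address first. Each field occupies 32 bits. */
struct user_int_frame {
    uint32_t eip;     /* address of the instruction after INT 0x80 */
    uint32_t cs;      /* user code selector; bits 0-1 hold the caller's CPL */
    uint32_t eflags;  /* user flags; the listing above sets CF here on error */
    uint32_t esp;     /* user stack pointer */
    uint32_t ss;      /* user stack selector */
};

_Static_assert(sizeof(struct user_int_frame) == 20, "five 32-bit slots");
_Static_assert(offsetof(struct user_int_frame, ss) == 16,
               "SS is pushed first, so it sits at the highest address");

/* Privilege level the interrupted code was running at. */
unsigned frame_caller_cpl(const struct user_int_frame *f)
{
    return f->cs & 3u;
}
```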
