IA-32 & System Calls
A system call is the mechanism an application program uses to request a
service from the operating system. That service often involves access to
hardware or to data structures that only sufficiently privileged code (such
as kernel code) is allowed to touch.
A few common examples of system calls are:
* read, write
* open, close
* kill
* fork
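As a concrete illustration on a present-day Linux system, the sketch below asks the kernel for the process ID directly and shows how a failed call is reported. It assumes glibc's or musl's generic syscall() function, which on modern hardware does not necessarily use the IA-32 INT 0x80 path described below (x86-64, for example, uses the SYSCALL instruction). The helper names raw_getpid() and raw_close_errno() are hypothetical.

```c
#define _GNU_SOURCE          /* make sure syscall() is declared in <unistd.h> */
#include <errno.h>
#include <sys/syscall.h>     /* SYS_getpid, SYS_close */
#include <unistd.h>

/* Ask the kernel for the PID through the generic syscall() entry point
 * instead of the getpid() library function. */
static long raw_getpid(void)
{
    return syscall(SYS_getpid);
}

/* Closing an invalid descriptor fails; syscall() reports the kernel's
 * error the same way library wrappers do: a -1 return plus errno. */
static int raw_close_errno(void)
{
    errno = 0;
    if (syscall(SYS_close, (long)-1) == -1)
        return errno;        /* expected: EBADF */
    return 0;
}
```

Both helpers end in the same kernel services as getpid() and close(); the second one shows the "-1 and errno" convention that the library layer presents to programs.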
Intel's 32-bit processor architecture (IA-32) provides exception- and
interrupt-handling features that Linux uses for system calls and, of course,
for interrupts.
More specifically, for system calls and operating system design, IA-32
offers four privilege levels (protection rings), gates, software interrupts,
and hardware interrupts.
Applications usually run at privilege level (PL) 3, where only
general-purpose instructions can be executed and, by default, I/O ports
cannot be accessed. The O.S., running at PL 0, can grant a process access to
I/O ports and the right to enable and disable interrupts (through the IOPL
field of EFLAGS and the I/O permission bitmap in the TSS); these permissions
remain in effect when control returns to the process at PL 3. The Current
Privilege Level (CPL) is stored in bits 0-1 of the CS register (which holds
the code segment selector) and the SS register (which holds the stack segment
selector).
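A minimal sketch of how the fields of a segment selector, such as the value in CS, can be decoded. The struct and helper names are hypothetical, and the selector values used with them are illustrative rather than taken from a particular kernel.

```c
#include <stdint.h>

/* Fields of an IA-32 segment selector, e.g. the value held in CS or SS. */
struct selector {
    unsigned index;  /* bits 3-15: descriptor index in the GDT or LDT */
    unsigned ti;     /* bit 2: table indicator (0 = GDT, 1 = LDT) */
    unsigned rpl;    /* bits 0-1: requested privilege level */
};

/* Hypothetical helper: split a 16-bit selector into its fields. */
static struct selector decode_selector(uint16_t sel)
{
    struct selector s;
    s.index = sel >> 3;
    s.ti    = (sel >> 2) & 1u;
    s.rpl   = sel & 3u;
    return s;
}

/* For the CS (and SS) register, the low two bits are the CPL itself. */
static unsigned cpl_of(uint16_t cs_or_ss)
{
    return cs_or_ss & 3u;
}
```

A selector of 0x23, for instance, decodes to GDT entry 4 with RPL 3, so code running with that value in CS is at PL 3; a value of 0x10 would mean PL 0.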
Gates are special descriptors that provide protected gateways
to system procedures that operate at different privilege levels than application
programs.
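The sketch below packs a 32-bit interrupt or trap gate using the descriptor layout documented in Intel's manuals: the handler offset is split across bits 0-15 and 48-63, the segment selector occupies bits 16-31, and the high dword also carries the present bit, the DPL, and the gate type. make_gate() and the accessor names are hypothetical helpers, and the offset and selector values in the usage are arbitrary.

```c
#include <stdint.h>

#define GATE_INTR32 0xEu  /* 32-bit interrupt gate (clears IF on entry) */
#define GATE_TRAP32 0xFu  /* 32-bit trap gate (leaves IF unchanged) */

/* Hypothetical helper: build the 8-byte IDT descriptor for a gate. */
static uint64_t make_gate(uint32_t offset, uint16_t selector,
                          unsigned type, unsigned dpl)
{
    uint32_t low  = ((uint32_t)selector << 16)   /* bits 16-31: segment selector */
                  | (offset & 0xFFFFu);          /* bits 0-15: offset 15..0 */
    uint32_t high = (offset & 0xFFFF0000u)       /* bits 48-63: offset 31..16 */
                  | (1u << 15)                   /* bit 47: present */
                  | ((dpl & 3u) << 13)           /* bits 45-46: DPL */
                  | ((type & 0xFu) << 8);        /* bits 40-43: type; bit 44 (S) = 0 */
    return ((uint64_t)high << 32) | low;
}

/* Accessors that pull the fields back out of a packed descriptor. */
static uint32_t gate_offset(uint64_t g)
{
    return ((uint32_t)(g >> 32) & 0xFFFF0000u) | ((uint32_t)g & 0xFFFFu);
}

static uint16_t gate_selector(uint64_t g) { return (uint16_t)(g >> 16); }
static unsigned gate_dpl(uint64_t g)      { return (unsigned)(g >> 45) & 3u; }
static unsigned gate_type(uint64_t g)     { return (unsigned)(g >> 40) & 0xFu; }
```

With DPL 3 and type 0xF the attribute byte is 0xEF (present, DPL 3, trap gate), the combination that lets PL 3 code reach that vector with INT n.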
Linux uses Intel's software interrupt feature for system calls, via the
INT n instruction, where n is an interrupt vector. The vector number (0-255)
is the index of a gate descriptor in the Interrupt Descriptor Table (IDT).
The gate descriptor holds the address of the interrupt or exception handler
and also provides a protected (segment-privilege-checked) path to that
handler.
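To make the protection check concrete, here is a simplified model of INT n. It is a sketch, not kernel code: the table keeps only a handler pointer and a DPL per vector (a real IDT entry is an 8-byte descriptor located at the IDT base plus 8*n), and the handler names are hypothetical. The rule it models is that a software interrupt is allowed only when CPL <= gate DPL.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified IDT entry: a handler pointer and the gate DPL. */
typedef int (*handler_fn)(void);

struct gate {
    handler_fn handler;  /* NULL models a not-present gate */
    unsigned   dpl;      /* numerically largest CPL allowed to use INT n */
};

static int demo_page_fault(void)  { return 14; }
static int demo_system_call(void) { return 0x80; }

/* Roughly the arrangement trap_init() makes: a kernel-only gate for the
 * page-fault vector (14) and a DPL 3 gate for vector 0x80. */
static const struct gate idt[256] = {
    [14]   = { demo_page_fault,  0 },
    [0x80] = { demo_system_call, 3 },
};

/* Model of "INT n" executed at privilege level cpl. Returns the handler's
 * result, or -1 where the CPU would raise a fault instead. */
static int soft_int(uint8_t vector, unsigned cpl)
{
    const struct gate *g = &idt[vector];
    if (g->handler == NULL)
        return -1;           /* gate not present */
    if (cpl > g->dpl)
        return -1;           /* CPL > DPL: general-protection fault */
    return g->handler();     /* a real CPU also switches to the PL 0 stack */
}
```

So user code at PL 3 can execute INT 0x80, but an INT 14 from PL 3 is refused even though the kernel itself (PL 0) may use that gate.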
A System Call in Linux
1. The initialization required for system calls begins in trap_init()
(arch/i386/kernel/traps.c). It sets up the IDT so that vector 0x80 points to
the system_call entry point in arch/i386/kernel/entry.S. This gate is
installed with DPL 3, so user code running at PL 3 may execute INT 0x80.
2. Next, a user program (or the operating system itself) invokes the system
call, normally through a C library function.
3. The library function is built on one of the _syscallX macros in
include/asm-i386/unistd.h. _syscallX is a generic name for this family of
macros: there is one for each possible number of system call arguments
(_syscall0, _syscall1, and so on). The one-argument version is:
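#define _syscall1(type,name,type1,arg1) \
type name(type1 arg1) \
{ \
    long __res; \
    /* %eax carries the call number in and the result out; %ebx = arg1 */ \
    __asm__ volatile ("int $0x80" \
        : "=a" (__res) \
        : "0" (__NR_##name),"b" ((long)(arg1))); \
    if (__res >= 0) \
        return (type) __res; \
    /* the kernel returned -errno: record it and report failure */ \
    errno = -__res; \
    return -1; \
}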
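The errno convention in this macro can be exercised without a real trap. In this hypothetical sketch, fake_int80() stands in for int $0x80 and plays the role of a tiny kernel that returns either a non-negative result or a negated errno, while SIM_SYSCALL1 mirrors the shape of _syscall1. The call number 12 matches i386's __NR_chdir, but the fake kernel's behavior is invented for illustration.

```c
#include <errno.h>
#include <stdint.h>

#define SIM_NR_chdir 12   /* same value as i386 __NR_chdir */

/* Hypothetical stand-in for "int $0x80": a fake kernel that, like the real
 * one, returns a non-negative result or a negated errno value. */
static long fake_int80(long nr, intptr_t arg1)
{
    if (nr == SIM_NR_chdir) {
        const char *path = (const char *)arg1;
        return path[0] == '\0' ? -ENOENT : 0;   /* pretend "" never exists */
    }
    return -ENOSYS;                             /* unknown call number */
}

/* Same shape as _syscall1, with the trap replaced by fake_int80(). */
#define SIM_SYSCALL1(type, name, type1, arg1)                   \
type sim_##name(type1 arg1)                                     \
{                                                               \
    long res = fake_int80(SIM_NR_##name, (intptr_t)(arg1));     \
    if (res >= 0)                                               \
        return (type)res;                                       \
    errno = (int)-res;   /* kernel result was -errno */         \
    return -1;                                                  \
}

/* Defines: int sim_chdir(const char *path) */
SIM_SYSCALL1(int, chdir, const char *, path)
```

Calling sim_chdir("") returns -1 and leaves ENOENT in errno, exactly the split between return value and errno that the real macro performs.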
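Using the chdir system call as an example, the declaration below expands into
a C function whose compiled i386 code looks roughly like this (AT&T syntax;
the exact instructions depend on the compiler):
_syscall1(int,chdir,char*,path);
_chdir:
    subl $4,%esp            # reserve a local slot
    pushl %ebx              # save %ebx (callee-saved; it will carry arg1)
    movl 12(%esp),%eax      # fetch the path argument
    movl %eax,4(%esp)
    movl $12,%eax           # __NR_chdir (12 on i386)
    movl 4(%esp),%ebx       # arg1 -> %ebx
    int $0x80               # software interrupt: switch to kernel mode, jump to handler
    movl %eax,%edx
    testl %edx,%edx         # check for error (negative result)
    jge L2                  # if no error, go to L2
    negl %edx               # -(-errno) = errno
    movl %edx,_errno
    movl $-1,%eax           # return -1
    popl %ebx
    addl $4,%esp
    ret
L2:
    movl %edx,%eax          # return the non-negative result
    popl %ebx               # clean up
    addl $4,%esp
    ret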
When INT 0x80 executes, the privilege level finally changes: the CPU switches
from PL 3 to PL 0, loads the kernel stack pointer from the TSS, and transfers
control to the _system_call entry point defined in arch/i386/kernel/entry.S.
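_system_call:
    pushl %eax                      # save orig_eax
    SAVE_ALL                        # save the remaining user registers
    movl $-ENOSYS,EAX(%esp)         # default return value
    cmpl $(NR_syscalls),%eax        # is the call number in range?
    jae ret_from_sys_call
    movl _sys_call_table(,%eax,4),%eax  # look up the handler address
    testl %eax,%eax                 # empty table slot?
    je ret_from_sys_call
    movl _current,%ebx              # %ebx = current task
    andl $~CF_MASK,EFLAGS(%esp)     # clear carry: assume no errors
    movl $0,errno(%ebx)
    movl %db6,%edx
    movl %edx,dbgreg6(%ebx)         # save current hardware debugging status
    testb $0x20,flags(%ebx)         # traced process? (PF_TRACESYS)
    jne 1f                          # tracing path at label 1 (not shown)
    call *%eax                      # run the handler
    movl %eax,EAX(%esp)             # save the return value
    movl errno(%ebx),%edx
    negl %edx                       # %edx = -errno; zero means no error
    je ret_from_sys_call
    movl %edx,EAX(%esp)             # return -errno to user space
    orl $(CF_MASK),EFLAGS(%esp)     # set carry to indicate error
    jmp ret_from_sys_call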
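This routine saves the user registers, presets the return value to -ENOSYS,
and checks that the system call number in %eax is below NR_syscalls and has a
non-null entry in _sys_call_table. It then calls the handler at that address
and stores the result in the saved EAX slot; if the handler set an error, it
stores -errno there instead and sets the carry flag in the saved EFLAGS.
Finally, ret_from_sys_call restores the saved registers and returns control
to the user program at PL 3.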
Many handlers are defined in kernel/sys.c; the chdir handler, sys_chdir(), is
in fs/open.c.
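The dispatch logic of _system_call can be restated in C. This is a hypothetical model: the table size, contents, and handler name are made up, and the handler returns a negated errno directly (the convention of later kernels) instead of setting the task's errno field and the carry flag as the routine above does.

```c
#include <errno.h>
#include <stddef.h>

typedef long (*syscall_fn)(long arg1);

/* Hypothetical handler used only to populate the demo table. */
static long sys_demo_double(long x) { return 2 * x; }

#define NR_SYSCALLS 4
static syscall_fn sys_call_table[NR_SYSCALLS] = {
    NULL,              /* 0: empty slot */
    sys_demo_double,   /* 1 */
};                     /* 2, 3: implicitly NULL */

static long dispatch(unsigned long nr, long arg1)
{
    long ret = -ENOSYS;              /* like: movl $-ENOSYS,EAX(%esp) */
    if (nr >= NR_SYSCALLS)           /* like: cmpl $(NR_syscalls),%eax; jae */
        return ret;
    syscall_fn fn = sys_call_table[nr];
    if (fn == NULL)                  /* like: testl %eax,%eax; je */
        return ret;
    return fn(arg1);                 /* like: call *%eax */
}
```

The comparison is unsigned, matching the jae in the assembly, so a negative number passed in %eax is also rejected as out of range.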
Interrupts
A thorough paper on interrupts under Linux 2.4.
PowerPoint Slideshow: Features of Intel Processor Architectures that Lend to Operating System Design