Wednesday, February 8, 2012

Difference between Processes and Threads

A process has five fundamental parts:
  • code ("text")
  • data (VM)
  • stack
  • file I/O, and 
  • signal tables. 
Processes have a significant amount of overhead when switching: on each task switch the memory-mapping state held in the processor has to be swapped out or flushed. Also, the only way to share information between processes is through IPC mechanisms such as pipes and "shared memory". If a process spawns a child process using fork(), the child gets its own copy of the data and stack (and copies of the open file descriptors); the only part that stays shared is the read-only text.

Threads reduce overhead by sharing fundamental parts. Because these parts are shared, switching between threads is much cheaper and can therefore happen more frequently. Sharing information is also not so "difficult" anymore: everything in the address space can be shared. There are two types of threads: user-space and kernel-space.

When we create a new thread in a process, the new thread of execution gets its own stack (and hence its own local variables) but shares global variables, file descriptors, signal handlers, and the current directory state with the process that created it.

When a process executes a fork() call, a new copy of the process is created with its own variables and its own PID. The new process is scheduled independently and, in general, executes almost independently of the process that created it.
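A small sketch of this behaviour, assuming a Unix-like system where Python's os.fork() is available. The variable value and the function fork_and_modify are hypothetical names for the example, and the child's exit status is used here only as a simple way to report its copy of the variable back to the parent.

```python
import os

value = 10   # module-level variable; the child gets its own copy at fork time


def fork_and_modify():
    """Fork, change the variable in the child only, and compare both copies."""
    global value
    value = 10
    pid = os.fork()
    if pid == 0:
        # Child process: this change touches only the child's copy.
        value += 1
        os._exit(value)                  # report the child's value as exit status
    # Parent process: wait for the child and read back its exit status.
    _, status = os.waitpid(pid, 0)
    child_value = os.WEXITSTATUS(status)
    return pid, child_value, value


if __name__ == "__main__":
    child_pid, child_value, parent_value = fork_and_modify()
    print("parent pid:", os.getpid(), "child pid:", child_pid)
    print("value in child:", child_value, "value in parent:", parent_value)
```

The child sees 11 while the parent still sees 10, and the two processes have different PIDs.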

Every process is protected from every other process in the system by the kernel, using the Memory Management Unit (MMU). Since each process is independent of the others, the kernel can schedule several to execute in parallel when there are several CPUs or cores to schedule on.

Threads enhance the process model with multiple parallel flows of execution within a process. All threads within a process share the same memory space.

The kernel treats threads as separate and independent entities so it can schedule several threads to run in parallel, just as it can with complete processes.

So the key difference between processes and threads is the way memory is managed. The two most important things about threads are:
  • Inter-thread communication is fast
  • There is no protection between threads
Since processes do not naturally share memory, it is difficult for one process to communicate with another. Several Inter-Process Communication (IPC) methods exist, but they rely on passing data via some intermediary such as a pipe, the file system, or the network stack. Ultimately the kernel manages communication between them.
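As a rough illustration of IPC through a kernel-managed intermediary, here is a sketch that sends a message from a child process to its parent over a pipe. It assumes a Unix-like system (os.fork and os.pipe), and send_through_pipe is a made-up helper name for the example.

```python
import os


def send_through_pipe(message: bytes) -> bytes:
    """Fork a child that writes `message` into a pipe; the parent reads it back."""
    read_fd, write_fd = os.pipe()        # the kernel provides both ends
    pid = os.fork()
    if pid == 0:
        # Child: keep only the write end, send the data, and exit.
        os.close(read_fd)
        with os.fdopen(write_fd, "wb") as writer:
            writer.write(message)
        os._exit(0)
    # Parent: keep only the read end and read until the child closes its end.
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as reader:
        data = reader.read()
    os.waitpid(pid, 0)
    return data


if __name__ == "__main__":
    print(send_through_pipe(b"hello from the child"))
```

Every byte travels through the kernel: the child writes into the pipe and the parent reads from it, and neither process touches the other's memory.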

Threads, on the other hand, can communicate directly using shared memory objects such as arrays of data (buffers). The disadvantage of threads is that a bug in one thread can corrupt the memory being used by another thread.
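Both points can be seen in a short sketch. The names shared_buffer, writer, buggy_writer and demo are invented for the example, and the "bug" is simulated on purpose: one thread fills a shared buffer, and a second thread then overwrites one byte of it without anything stopping it.

```python
import threading

shared_buffer = bytearray(8)     # one buffer, visible to every thread


def writer():
    """Well-behaved thread: fills the first five bytes."""
    shared_buffer[0:5] = b"hello"


def buggy_writer():
    """Simulated bug: overwrites a byte that belongs to the writer's data."""
    shared_buffer[0] = ord("j")


def run_in_thread(target):
    """Run `target` in its own thread and wait for it to finish."""
    t = threading.Thread(target=target)
    t.start()
    t.join()


def demo():
    """Return the buffer contents before and after the buggy thread runs."""
    shared_buffer[:] = bytes(8)  # reset to zeros
    run_in_thread(writer)
    before = bytes(shared_buffer)
    run_in_thread(buggy_writer)
    after = bytes(shared_buffer)
    return before, after


if __name__ == "__main__":
    print(demo())
```

No copying or kernel call is needed to pass the data between threads, which is why this is fast, and for the same reason nothing protects the data from the second thread.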

POSIX is the standard for threads

POSIX.1 specifies a set of interfaces (functions, header files) for threaded programming commonly known as POSIX threads, or Pthreads. A single process can contain multiple threads, all of which are executing the same program. These threads share the same global memory (data and heap segments), but each thread has its own stack (automatic variables).

POSIX.1 also requires that threads share a range of other attributes (i.e. these attributes are process-wide rather than per-thread):

       -  process ID
       -  parent process ID
       -  process group ID and session ID
       -  controlling terminal
       -  user and group IDs
       -  open file descriptors
       -  record locks
       -  signal dispositions
       -  file mode creation mask
       -  current directory and root directory
       -  interval timers and POSIX timers
       -  nice value
       -  resource limits
       -  measurements of the consumption of CPU time and resources

As well as the stack, POSIX.1 specifies that various other attributes are distinct for each thread, including:

       -  thread ID
       -  signal mask
       -  the errno variable
       -  alternate signal stack
       -  real-time scheduling policy and priority
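The split between process-wide and per-thread attributes can be checked with a small sketch. It assumes Python 3.8 or later, where threading.get_native_id() returns the thread ID assigned by the kernel; collect_ids is a hypothetical helper name. Every thread records its process ID and its own thread ID.

```python
import os
import threading


def collect_ids(num_threads=3):
    """Return a list of (process ID, kernel thread ID) pairs, one per thread."""
    results = []
    results_lock = threading.Lock()
    # The barrier keeps all threads alive at the same moment, so the kernel
    # cannot hand a finished thread's ID to a later one.
    barrier = threading.Barrier(num_threads)

    def worker():
        barrier.wait()
        with results_lock:
            results.append((os.getpid(), threading.get_native_id()))

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    for pid, tid in collect_ids():
        print("process ID:", pid, "thread ID:", tid)
```

All lines show the same process ID, while each thread ID is different.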

How to view processes and threads in Linux?

To see every process on the system using standard syntax:
  • ps -e
  • ps -ef
  • ps -eF
  • ps -ely
To see every process on the system using BSD syntax:
  • ps ax
  • ps axu
To print a process tree:
  • ps -ejH
  • ps axjf
To get info about threads:
  • ps -eLf
  • ps axms
To get security info:
  • ps -eo euser,ruser,suser,fuser,f,comm,label
  • ps axZ
  • ps -eM
To see every process running as root (real & effective ID) in user format:
  • ps -U root -u root u
To see every process with a user-defined format:
  • ps -eo pid,tid,class,rtprio,ni,pri,psr,pcpu,stat,wchan:14,comm
  • ps axo stat,euid,ruid,tty,tpgid,sess,pgrp,ppid,pid,pcpu,comm
  • ps -eo pid,tt,user,fname,tmout,f,wchan
Print only the process IDs of syslogd:
  • ps -C syslogd -o pid=
Print only the name of PID 42:
  • ps -p 42 -o comm=

MySQL is a good example: it uses threads to manage client connections.

[root@dhcppc3 ~]# ps -eL | grep mysql
   2787    2787 pts/0    00:00:00 mysqld_safe
   2916    2916 pts/0    00:00:00 mysqld
   2916    2918 pts/0    00:00:00 mysqld
   2916    2919 pts/0    00:00:00 mysqld
   2916    2920 pts/0    00:00:00 mysqld
   2916    2921 pts/0    00:00:00 mysqld
   2916    2923 pts/0    00:00:00 mysqld
   2916    2924 pts/0    00:00:00 mysqld
   2916    2925 pts/0    00:00:00 mysqld
   2916    2926 pts/0    00:00:00 mysqld
   2916    2927 pts/0    00:00:00 mysqld

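So we observe that all the mysqld threads have the same process ID (2916 in this example), shown in the first column. The second column is the thread ID (LWP), which is different for each thread.
A more elaborate output: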

[root@dhcppc3 ~]# ps H -Le | grep mysql
   2787    2787 pts/0    S      0:00 /bin/sh /usr/bin/mysqld_safe --datadir=/var/lib/mysql --socket=/var/lib/mysql/mysql.sock --pid-file=/var/run/mysqld/mysqld.pid --basedir=/usr --user=mysql
   2916    2916 pts/0    Sl     0:00 /usr/libexec/mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --log-error=/var/log/mysqld.log --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock
   2916    2918 pts/0    Sl     0:00 /usr/libexec/mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --log-error=/var/log/mysqld.log --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock
   2916    2919 pts/0    Sl     0:00 /usr/libexec/mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --log-error=/var/log/mysqld.log --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock
   2916    2920 pts/0    Sl     0:00 /usr/libexec/mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --log-error=/var/log/mysqld.log --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock
   2916    2921 pts/0    Sl     0:00 /usr/libexec/mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --log-error=/var/log/mysqld.log --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock
   2916    2923 pts/0    Sl     0:00 /usr/libexec/mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --log-error=/var/log/mysqld.log --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock
   2916    2924 pts/0    Sl     0:00 /usr/libexec/mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --log-error=/var/log/mysqld.log --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock
   2916    2925 pts/0    Sl     0:00 /usr/libexec/mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --log-error=/var/log/mysqld.log --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock
   2916    2926 pts/0    Sl     0:00 /usr/libexec/mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --log-error=/var/log/mysqld.log --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock
   2916    2927 pts/0    Sl     0:00 /usr/libexec/mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --log-error=/var/log/mysqld.log --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock
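On Linux the same thread list can also be read straight from the /proc file system, where each thread of a process appears as a directory under /proc/<pid>/task. The sketch below assumes a Linux system with /proc mounted; list_thread_ids and demo are made-up names for the example.

```python
import os
import threading


def list_thread_ids(pid=None):
    """Return the thread IDs of a process by listing /proc/<pid>/task (Linux only)."""
    if pid is None:
        pid = os.getpid()
    return sorted(int(name) for name in os.listdir(f"/proc/{pid}/task"))


def demo(extra_threads=3):
    """Start some idle threads and return (pid, thread IDs seen in /proc)."""
    stop = threading.Event()
    threads = [threading.Thread(target=stop.wait) for _ in range(extra_threads)]
    for t in threads:
        t.start()
    try:
        return os.getpid(), list_thread_ids()
    finally:
        stop.set()               # let the idle threads finish
        for t in threads:
            t.join()


if __name__ == "__main__":
    pid, thread_ids = demo()
    print("process ID:", pid)
    print("thread IDs:", thread_ids)
```

As in the ps output above, one of the thread IDs is equal to the process ID (the first thread of the process) and the others are different.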




 

