Thursday, January 31, 2013

How to find the serial number of a SATA disk (sda) using hdparm?

The model number and serial number of a SATA disk (sda) can be found as follows:


# hdparm -I /dev/sd? | grep Number
        Model Number:       Hitachi HDS721616PLA380
        Serial Number:      PVF904Z9TN7VLN
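When several disks match /dev/sd?, hdparm prints one section per disk. Below is a minimal bash sketch that pairs each device with its serial number. disk_serials is a hypothetical helper name, and it assumes hdparm starts each device's section with a line such as "/dev/sda:".

```shell
# Hypothetical helper: pair each device with its serial number, reading
# `hdparm -I` output on stdin.
# Assumption: each device's section starts with a line such as "/dev/sda:".
disk_serials() {
    awk '
        /^\/dev\// { dev = $1; sub(/:$/, "", dev) }   # remember the current device
        /Serial Number:/ { print dev, $3 }            # fields: "Serial" "Number:" <serial>
    '
}

# Usage (as root):
#   hdparm -I /dev/sd? | disk_serials
```

For the disk above, this would print "/dev/sda PVF904Z9TN7VLN".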

Tuesday, January 29, 2013

Fragmentation


The ext2/ext3 filesystem is divided into block groups, mainly to reduce fragmentation by keeping related data close together.

When allocating space for a new file or extending an existing file, the ext2/ext3 filesystem preallocates up to eight data blocks for the file. Unused preallocated blocks are freed when the file is closed or truncated, or when a non-sequential write is detected. When extending a file, the filesystem tries to allocate the new block near the last block that was allocated for the file. If no free block can be found in the file's block group, a free block is allocated from another block group.

How to check fragmentation at various levels?

1) To check the fragmentation of a specific file, use the filefrag command:

# filefrag -v messages-20130106
Filesystem type is: ef53
File size of messages-20130106 is 1637646 (400 blocks, blocksize 4096)
 ext logical physical expected length flags
   0       0  2804193               1
   1       1  2786878  2804193      2
   2       3    25640  2786879     19
   3      22    25504    25658     42
   4      64  1279296    25545     60
   5     124  2636896  1279355    276 eof
messages-20130106: 6 extents found
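The table can also be summarised with awk. This is a rough sketch: filefrag_summary is a hypothetical helper, and it assumes the older column layout shown above (newer e2fsprogs releases print a different, colon-separated table). A row that has a value in the "expected" column marks a point where the file is not contiguous on disk.

```shell
# Hypothetical helper: summarise the older `filefrag -v` table read on stdin:
#   ext logical physical expected length flags
# Rows without an "expected" value have 4 numeric fields; rows with one have 5.
# An optional trailing flag (e.g. "eof") is skipped.
filefrag_summary() {
    awk '
        $1 ~ /^[0-9]+$/ {                   # data rows start with the extent number
            extents++
            n = NF
            if ($n !~ /^[0-9]+$/) n--       # drop a trailing flag such as "eof"
            blocks += $n                    # last numeric field is the length
            if (n >= 5) breaks++            # "expected" present: discontiguous here
        }
        END { printf "extents=%d blocks=%d breaks=%d\n", extents, blocks, breaks }
    '
}

# Usage:
#   filefrag -v messages-20130106 | filefrag_summary
```

For the output above it prints "extents=6 blocks=400 breaks=5", and the block total agrees with the "400 blocks" reported by filefrag.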


2) To check the fragmentation of a mounted filesystem, use the dumpe2fs command (partial output shown):

# dumpe2fs /dev/sda2

Group 143: (Blocks 4685824-4718591) [INODE_UNINIT, BLOCK_UNINIT, ITABLE_ZEROED]
  Checksum 0x4cd5, unused inodes 8096
  Block bitmap at 4194319 (+4294475791), Inode bitmap at 4194335 (+4294475807)
  Inode table at 4201926-4202431 (+4294483398)
  32768 free blocks, 8096 free inodes, 0 directories, 8096 unused inodes
  Free blocks: 4685824-4718591
  Free inodes: 1157729-1165824

Group 144: (Blocks 4718592-4751359) [INODE_UNINIT, ITABLE_ZEROED]
  Checksum 0x6a35, unused inodes 8096
  Block bitmap at 4718592 (+0), Inode bitmap at 4718608 (+16)
  Inode table at 4718624-4719129 (+32)
  26164 free blocks, 8096 free inodes, 0 directories, 8096 unused inodes
  Free blocks: 4718605-4718607, 4718621-4718623, 4725202-4751359
  Free inodes: 1165825-1173920

Because the ext2/ext3 filesystem is divided into block groups, dumpe2fs reports free space for each group. In the output above, Group 143 shows no fragmentation: its free blocks form a single contiguous stretch (Free blocks: 4685824-4718591). Group 144, however, has several small gaps in its free blocks (Free blocks: 4718605-4718607, 4718621-4718623, 4725202-4751359).
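On a large filesystem there are too many groups to inspect by eye. Here is a minimal sketch that counts the separate free-block ranges in each group; free_ranges_per_group is a hypothetical helper, and it assumes the "Group N:" and indented "Free blocks:" lines look like the excerpt above.

```shell
# Hypothetical helper: for each block group in `dumpe2fs` output (stdin), print
# the group number and how many separate free-block ranges it has.
# 1 means the free space is contiguous; higher numbers mean it is split up.
free_ranges_per_group() {
    awk '
        /^Group [0-9]+:/ { grp = $2; sub(/:$/, "", grp) }
        grp != "" && /^ *Free blocks:/ {          # skip the superblock summary line
            line = $0
            sub(/^ *Free blocks: */, "", line)     # keep only the list of ranges
            print grp, split(line, ranges, ", *")  # count comma-separated ranges
        }
    '
}

# Usage (as root): groups with the most fragmented free space first
#   dumpe2fs /dev/sda2 2>/dev/null | free_ranges_per_group | sort -k2,2nr | head
```

For the two groups above it prints "143 1" and "144 3".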

How to find the filesystem block size in Linux?

To find the block size of a Linux filesystem, use any of the commands dumpe2fs, tune2fs or blockdev; each reports the block size in bytes.

# dumpe2fs /dev/sda2 | grep "Block size"
dumpe2fs 1.41.12 (17-May-2010)
Block size:               4096


# tune2fs -l /dev/sda2 | grep -i "Block Size"
Block size:               4096


# blockdev --getbsz /dev/sda2
4096


Even a tiny non-empty file occupies at least one whole block. Create a non-empty file and check its disk usage:

# echo xyx > abc

# du -h abc
4.0K    abc
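du shows 4.0K because space is allocated in whole 4096-byte blocks. A small sketch of that rounding; blocks_needed is a hypothetical helper, and it ignores sparse files and metadata blocks.

```shell
# Hypothetical helper: number of BSIZE-byte blocks needed to hold SIZE bytes,
# rounding up to whole blocks (sparse files and metadata are ignored).
blocks_needed() {
    local size=$1 bsize=$2
    echo $(( (size + bsize - 1) / bsize ))
}
```

blocks_needed 4 4096 prints 1 ("xyx" plus its newline is 4 bytes, so one 4 KB block), and blocks_needed 1637646 4096 prints 400, the same block count filefrag reported for messages-20130106.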

Saturday, January 26, 2013

How to find the process causing IO wait?

To check whether the system is experiencing IO wait, run the top command:

# top


Cpu(s):  0.2%us,  0.5%sy,  0.0%ni, 98.8%id,  0.5%wa,  0.0%hi,  0.0%si,  0.0%st

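The %wa value is the percentage of CPU time spent idle while waiting for I/O to complete. A consistently high value indicates an IO wait problem.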
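top derives %wa from the cumulative counters on the "cpu" line of /proc/stat, roughly as follows. This is a minimal sketch under stated assumptions: iowait_pct is a hypothetical helper, the field order is assumed to be user nice system idle iowait irq softirq steal, and top's own arithmetic may differ slightly.

```shell
# Hypothetical helper: percentage of CPU time spent in iowait between two
# samples of the aggregate "cpu" line of /proc/stat. Assumed field order:
#   cpu user nice system idle iowait irq softirq steal [guest guest_nice]
iowait_pct() {
    local -a a=($1) b=($2)          # split each sample into fields; index 0 is "cpu"
    local i total=0
    for i in 1 2 3 4 5 6 7 8; do    # user..steal (guest time is already in user)
        total=$(( total + b[i] - a[i] ))
    done
    if (( total <= 0 )); then
        echo 0
        return
    fi
    echo $(( 100 * (b[5] - a[5]) / total ))   # integer percentage
}

# Usage:
#   s1=$(grep '^cpu ' /proc/stat); sleep 1; s2=$(grep '^cpu ' /proc/stat)
#   iowait_pct "$s1" "$s2"
```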

How to identify the process causing high IO wait?

1) Using the iotop command - iotop watches I/O usage information output by the Linux kernel (requires 2.6.20 or later) and displays a table of current I/O usage by processes or threads on the system.

# iotop


Total DISK READ: 0.00 B/s | Total DISK WRITE: 99.97 K/s
  TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND
  361 be/3 root        0.00 B/s    7.69 K/s  0.00 %  4.20 % [jbd2/sda2-8]
 6673 be/4 root        0.00 B/s  315.30 K/s  0.00 %  0.00 % wget -c http://mirror.steadfast.net/centos/6.3/isos/i386/CentOS-6.3-i386-bin-DVD1.iso

If the iotop command is not available, try the dstat or pidstat commands.

2) Using the dstat command - dstat is a versatile replacement for vmstat, iostat and ifstat that overcomes some of their limitations and adds extra features. With the --top-bio option it also shows the process doing the most block I/O.

#  dstat -ta --top-bio

----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system-- ----most-expensive----
  date/time   |usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw |  block i/o process
27-01 01:10:09|  0   0  99   0   0   0|  15k   27k|   0     0 |   0     0 |  81   124 |bash         11B   17k
27-01 01:10:10|  1   1  99   0   0   0|   0     0 | 187k  188k|   0     0 | 307   439 |wget          0   168k
27-01 01:10:11|  1   1  97   1   0   0|   0    12k| 179k  179k|   0     0 | 289   422 |wget          0   176k
27-01 01:10:12|  1   1  98   0   0   0|   0     0 | 195k  196k|   0     0 | 330   445 |wget          0   180k
27-01 01:10:13|  1   1  98   0   0   0|   0     0 | 177k  178k|   0     0 | 274   383 |wget          0   164k
27-01 01:10:14|  1   1  98   0   0   0|   0     0 | 189k  190k|   0     0 | 302   412 |wget          0   172k
27-01 01:10:15|  1   1  98   0   0   0|   0     0 | 189k  190k|   0     0 | 307   440 |wget          0   188k
27-01 01:10:16|  1   1  98   1   0   0|   0    12k| 188k  188k|   0     0 | 291   427 |wget          0   168k



3) Using the pidstat command - pidstat is provided by the sysstat package. The -d option reports per-process I/O statistics; 2 is the reporting interval in seconds.


# pidstat -d 2
Linux 2.6.32-042stab068.8 (dhcppc5)     01/27/2013      _x86_64_        (2 CPU)

12:39:02 AM       PID   kB_rd/s   kB_wr/s kB_ccwr/s  Command
12:39:04 AM       361      0.00      9.90      0.00  jbd2/sda2-8
12:39:04 AM       563      0.00      1.98      0.00  flush-8:0
12:39:04 AM      1466      0.00      3.96      0.00  java
12:39:04 AM      6519      0.00    178.22      0.00  wget

From the output above, the processes generating the I/O, and therefore the IO wait, are jbd2/sda2-8 (the journaling thread for sda2) and wget.
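With many processes, picking out the heaviest writer by eye gets tedious. A minimal sketch follows; top_writer is a hypothetical helper, and it assumes a pidstat -d header line containing "kB_wr/s" and "Command" columns, as in the output above.

```shell
# Hypothetical helper: report the process with the highest kB_wr/s in
# `pidstat -d` output (stdin). Column positions are taken from the header
# line, so extra columns in other sysstat versions do not break it.
top_writer() {
    awk '
        /kB_wr\/s/ {                          # header line: locate the columns
            for (i = 1; i <= NF; i++) {
                if ($i == "kB_wr/s") w = i
                if ($i == "Command") c = i
            }
            next
        }
        w && NF >= c && $w + 0 > max { max = $w + 0; cmd = $c }
        END { if (cmd != "") print cmd, max }
    '
}

# Usage: five 2-second samples
#   pidstat -d 2 5 | top_writer
```

For the pidstat sample above it prints "wget 178.22".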

Disks, Tuning Sequential Disk Access, IO Scheduling


Disks are electro-mechanical devices.

Disks are managed by disk controllers, which are connected to the processor (CPU) through a bus. Most PC-based systems use the PCI (Peripheral Component Interconnect) bus to connect peripheral devices such as hard disks and sound cards to the processor. There are other buses as well. For example, the Universal Serial Bus (USB) connects devices such as cameras, scanners and printers to a computer; it uses a thin cable, and many devices can share the same bus simultaneously. FireWire is another bus, used today mostly for video cameras and external hard drives.

lspci is a command on Unix-like operating systems that prints detailed information about all PCI buses and devices in the system.  It is based on a common portable library "libpci" which offers access to the PCI configuration space on a variety of operating systems.

$ lspci
00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] (rev 02)
00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]
00:01.1 IDE interface: Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II]
00:01.2 USB Controller: Intel Corporation 82371SB PIIX3 USB [Natoma/Triton II] (rev 01)
00:01.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 03)
00:02.0 VGA compatible controller: Cirrus Logic GD 5446
00:03.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 20)
00:04.0 SCSI storage controller: Qumranet, Inc. Virtio block device
00:05.0 SCSI storage controller: Qumranet, Inc. Virtio block device
00:06.0 SCSI storage controller: Qumranet, Inc. Virtio block device
00:07.0 RAM memory: Qumranet, Inc. Virtio memory balloon

lsusb is a similar command for USB buses and devices. To make use of all its features, you need a Linux kernel that supports the /proc/bus/usb interface (e.g., Linux kernel 2.3.15 or newer).

# lsusb
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 002 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 003 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 004 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 005 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 002 Device 002: ID 093a:2500 Pixart Imaging, Inc. USB Optical Mouse


Hard Disk Types


For a computer to communicate with a peripheral device such as a hard disk, the following hard disk interface (physical block device interface) standards are used:
  • IDE(ATA)
  • SATA
  • SCSI
  • SAS
  • USB
  • iSCSI+GigE
  • Fibre Channel

Why are serial devices replacing parallel devices?

Both SCSI and IDE (ATA) were originally parallel interfaces.

A parallel interface is a channel capable of transferring data in parallel mode, that is, transmitting multiple bits simultaneously over multiple wires. However, parallel technologies do not scale well to higher bandwidth. Why? The faster the bits are sent the better, but all the bits sent together must arrive at the same time, and keeping the signals on many wires in step becomes more difficult as connections get faster and longer. Serial technologies, which send bits one after another, are therefore better suited to high bandwidth. A serial bus design is much simpler, sending 1 bit at a time over a single wire at much higher rates than parallel. By combining multiple serial data paths, even faster speeds can be achieved that dramatically exceed the capabilities of traditional parallel buses.

As a result, serial devices are replacing parallel devices, and the following are now in common use:

  • SAS - Serial Attached SCSI
  • SATA - Serial ATA

IO Subsystem



The I/O subsystem is a series of processes responsible for moving blocks of data between disk and memory.

In general, each I/O task performed by either the kernel or a user process is one of the following:


  • Read - Reading a block of data from disk to memory
  • Write - Writing a block of data from memory to disk


Read or write requests are transformed into block device requests that go into a queue. The I/O subsystem then merges similar requests that arrive within a specific time window and processes them all at once. Which block device requests can be merged? Requests are merged when:


  1. They are the same type of operation (read or write).
  2. They belong to the same block device (i.e. they read from, or write to, the same block device).
  3. The merged request does not exceed the block device's maximum number of sectors allowed per request.
  4. They are adjacent on disk: one request immediately follows or precedes the other.
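The four conditions can be expressed as a small check. This is only an illustrative sketch: can_merge and its request format ("OP DEV START_SECTOR NUM_SECTORS") are hypothetical, and the kernel's real merge logic has further conditions.

```shell
# Hypothetical helper: could two block requests be merged under the four
# rules above? Each request is "OP DEV START_SECTOR NUM_SECTORS"; MAX is the
# device's maximum number of sectors per request. Prints "yes" or "no".
can_merge() {
    local -a r1=($1) r2=($2)
    local max=$3
    [[ ${r1[0]} == "${r2[0]}" ]] || { echo no; return; }    # 1. same operation
    [[ ${r1[1]} == "${r2[1]}" ]] || { echo no; return; }    # 2. same device
    (( r1[3] + r2[3] <= max )) || { echo no; return; }      # 3. size limit
    if (( r1[2] + r1[3] == r2[2] || r2[2] + r2[3] == r1[2] )); then
        echo yes                                            # 4. adjacent on disk
    else
        echo no
    fi
}
```

For example, can_merge "read sda 100 8" "read sda 108 8" 1024 prints yes, while changing the second request to a write, or to start at sector 200, prints no.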

Which has higher priority - read or write operations?

Read requests are crucial to system performance because a process that issues a read usually cannot continue until the request is serviced. This latency directly affects how fast the user perceives a process to be.

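Write requests are serviced in batches by the pdflush kernel threads (from kernel 2.6.32, by per-device flusher threads such as flush-8:0 in the pidstat output above). Since writes are normally buffered in memory and do not block the writing process (unlike reads), they are usually given lower priority than read requests.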

Tuning sequential read access


Read/Write requests can be either sequential or random. The speed of sequential requests is most directly affected by the transfer speed of a disk drive. Random requests, on the other hand, are most directly affected by disk drive seek time.

Sequential read requests can take advantage of read-aheads.

While reading files, the kernel tries to take advantage of sequential disk access. Read-ahead is based on the assumption that an application reading a block of data, block A, will most likely also read the adjacent blocks B, C, D and so on. So the kernel reads blocks B, C and D ahead of the application and caches those pages in memory. By doing this,

  1. the kernel can serve the application's requests for data more quickly, and
  2. the load on the disk controller is reduced.

This improves response time.

However, read-ahead does not help if the application reads random blocks of data or repeatedly re-reads the same block. The read-ahead algorithm is designed to turn itself off when it detects such access patterns.

The read-ahead window controls how much data the kernel prefetches when performing file I/O. Since the 2.6 kernel, read-ahead is managed with two internally calculated values: the current window and the ahead window. While an application is reading pages from the current window, the kernel performs I/O (reads from disk) to fill the ahead window. Once the application finishes reading the current window, the ahead window becomes the new current window. If the application keeps reading from the current window, the size of the next read-ahead window is increased by two pages; if it does not, the read-ahead window is gradually shrunk.

By default, how much data is read ahead by Linux kernel from a block device?

There are two ways to see how much data the kernel reads ahead from a block device:

1) # cat /sys/block/sda/queue/read_ahead_kb
128

2) The blockdev command reports read-ahead in sectors; in the 2.6 kernel, a sector in this context is always 512 bytes.

# blockdev --getra /dev/sda
256

So the read-ahead is 256 sectors, i.e. 256 * 512 = 131072 bytes, which is 131072 / 1024 = 128 KB.
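The same unit conversion as two one-line helpers (hypothetical names), assuming the 512-byte sector unit described above. Integer division rounds down, so an odd sector count loses half a kilobyte.

```shell
# Hypothetical helpers: blockdev --getra/--setra use 512-byte sectors, while
# /sys/block/<dev>/queue/read_ahead_kb uses kilobytes (1 KB = 2 sectors).
ra_sectors_to_kb() { echo $(( $1 * 512 / 1024 )); }
ra_kb_to_sectors() { echo $(( $1 * 1024 / 512 )); }

# Example (as root): both lines should print the same number.
#   ra_sectors_to_kb "$(blockdev --getra /dev/sda)"
#   cat /sys/block/sda/queue/read_ahead_kb
```

ra_sectors_to_kb 256 prints 128, and ra_kb_to_sectors 2048 prints 4096, the sector count needed for a 2 MB read-ahead.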

# blockdev --report /dev/sda
RO    RA   SSZ   BSZ   StartSec            Size   Device
rw   256   512  4096          0    160041885696   /dev/sda

How to change the read-ahead value?

Suppose we want to change the read-ahead value from the default 128 KB to 2 MB. Since the blockdev command works in sectors:

512 bytes = 1 sector
2 MB (2*1024*1024 = 2097152 bytes) = 2097152/512 = 4096 sectors

# blockdev --setra 4096 /dev/sda

Check that the read-ahead value has changed by running the following commands:

# blockdev --getra /dev/sda
4096

# cat /sys/block/sda/queue/read_ahead_kb
2048

# blockdev --report /dev/sda
RO    RA   SSZ   BSZ   StartSec            Size   Device
rw  4096   512  4096          0    160041885696   /dev/sda

The new value is lost on reboot. To make it permanent, add the blockdev --setra command to /etc/rc.local.

Setting the value with blockdev is equivalent to writing to /sys/block/sda/queue/read_ahead_kb, except that the unit is different: blockdev uses 512-byte sectors, while read_ahead_kb uses kilobytes.

There is a catch, however: setting the read-ahead to a value larger than max_sectors_kb (/sys/block/sda/queue/max_sectors_kb) has no effect; the smaller of the two values is used.

On our system:

# cat /sys/block/sda/queue/max_sectors_kb
512

I/O Scheduling Algorithms


When scheduling I/O requests, the kernel needs to balance two goals:
  1. Keep the disk access pattern as sequential as possible, to minimize seeks.
  2. Ensure that every process has its I/O serviced in a timely fashion, so that no process suffers IO starvation.

To manage different workloads, the kernel provides several IO scheduler algorithms, also called “elevators”. They are called elevators because they operate in the same manner as real-life building elevators. To be efficient, an elevator does not travel to floors in the order in which requests were made. Instead, it moves in one direction at a time, serving as many requests as it can until it reaches the highest or lowest floor, and then does the same in the opposite direction.
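The elevator idea can be sketched in a few lines. This is a simplified illustration of a SCAN-style ordering, not the code of any Linux scheduler; elevator_order is a hypothetical helper, and it assumes the head is at sector HEAD and currently moving towards higher sectors.

```shell
# Hypothetical helper sketching elevator (SCAN) ordering: the head is at
# sector HEAD and moving up. Pending requests at or above HEAD are served in
# ascending order; then the head reverses and serves the rest in descending order.
elevator_order() {
    local head=$1; shift
    local -a up=() down=()
    local s
    for s in $(printf '%s\n' "$@" | sort -n); do
        if (( s >= head )); then
            up+=("$s")                  # served on the way up
        else
            down=("$s" "${down[@]}")    # prepend, so "down" ends up descending
        fi
    done
    echo "${up[@]}" "${down[@]}"
}
```

For example, elevator_order 50 10 95 60 20 75 prints "60 75 95 20 10": the requests above sector 50 are served on the way up, and the rest on the way back down.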


The best-suited I/O elevator depends not only on the workload but also on the hardware. Single ATA disk systems, SSDs, RAID arrays and network storage systems, for example, each require different tuning strategies.

The default I/O scheduler is set as a kernel compile option. However, the IO scheduler can be changed on the fly for each block device. This makes it possible to use one algorithm for the device hosting the system partition and another for the device hosting a database.

What are the different IO scheduling algorithms?

  • deadline
  • anticipatory
  • noop
  • cfq (Completely Fair Queuing)

On this system, CFQ (Completely Fair Queuing) is the default scheduler, as the kernel configuration shows:

[root@dhcppc5 ~]#  grep -i "cfq" /boot/config-*
/boot/config-2.6.32-042stab068.8:CONFIG_IOSCHED_CFQ=y
/boot/config-2.6.32-042stab068.8:CONFIG_CFQ_GROUP_IOSCHED=y
/boot/config-2.6.32-042stab068.8:CONFIG_DEFAULT_CFQ=y
/boot/config-2.6.32-042stab068.8:CONFIG_DEFAULT_IOSCHED="cfq"

[root@dhcppc5 ~]# cat /sys/block/sda/queue/scheduler
noop anticipatory deadline [cfq]

The scheduler shown in square brackets is the one currently in use for the device.

To change the elevator for a specific device in the running system, run the following command:

echo <SCHEDULER> > /sys/block/<DEVICE>/queue/scheduler

For example:
echo deadline > /sys/block/sda/queue/scheduler
echo anticipatory > /sys/block/sda/queue/scheduler
echo noop > /sys/block/sda/queue/scheduler
echo cfq > /sys/block/sda/queue/scheduler
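After switching, it is worth confirming which scheduler is active. A small sketch that extracts the bracketed name; active_scheduler and set_scheduler are hypothetical helpers, and set_scheduler must run as root.

```shell
# Hypothetical helper: print the active scheduler (the name inside square
# brackets) from the contents of a queue/scheduler file read on stdin.
active_scheduler() {
    sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

# Hypothetical helper (run as root): switch DEVICE to SCHEDULER, then report
# what the kernel actually selected.
set_scheduler() {
    local dev=$1 sched=$2
    echo "$sched" > "/sys/block/$dev/queue/scheduler" || return 1
    printf '%s: ' "$dev"
    active_scheduler < "/sys/block/$dev/queue/scheduler"
}

# Usage:
#   set_scheduler sda deadline        # prints "sda: deadline" on success
```

For the file contents shown above ("noop anticipatory deadline [cfq]"), active_scheduler prints cfq.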

Deadline Scheduler

DEADLINE is a latency-oriented I/O scheduler: each I/O request is assigned a deadline (by default 500 ms for reads and 5 seconds for writes). Requests are kept in two queues (read and write) sorted by sector number. The DEADLINE algorithm maintains two additional queues (read and write) in which the requests are sorted by deadline. As long as no request has timed out, the “sector” queue is used. If timeouts occur, requests from the “deadline” queue are served until there are no more expired requests. In general, the algorithm prefers reads over writes.
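The core dispatch decision can be sketched as follows. This is a deliberately simplified, hypothetical model (a single queue, no read/write split, no batching), not the kernel's implementation: deadline_next and its "SECTOR:DEADLINE" request format are our own, and falling back to the earliest deadline when nothing lies ahead of the head is an assumption of the sketch.

```shell
# Hypothetical, simplified sketch of one DEADLINE dispatch decision (single
# queue, no read/write split, no batching). Each request is "SECTOR:DEADLINE";
# NOW is the current time and HEAD the current disk position.
deadline_next() {
    local now=$1 head=$2; shift 2
    local r sector dl first_dl="" first_sec="" next_sec=""
    for r in "$@"; do
        sector=${r%%:*}
        dl=${r##*:}
        # Earliest deadline seen so far (the head of the "deadline" queue).
        if [[ -z $first_dl ]] || (( dl < first_dl )); then
            first_dl=$dl
            first_sec=$sector
        fi
        # Nearest request at or beyond the head (the "sector" queue order).
        if (( sector >= head )) && { [[ -z $next_sec ]] || (( sector < next_sec )); }; then
            next_sec=$sector
        fi
    done
    if [[ -n $first_dl ]] && { (( first_dl <= now )) || [[ -z $next_sec ]]; }; then
        echo "$first_sec"    # a request has expired (or nothing lies ahead)
    else
        echo "$next_sec"     # nothing expired: continue in sector order
    fi
}
```

For example, deadline_next 10 50 60:500 20:90 80:300 prints 60 (nothing has expired, so the nearest sector ahead wins), while deadline_next 100 50 60:500 20:90 80:300 prints 20 (the request at sector 20 has passed its deadline).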

Anticipatory Scheduler

After servicing an IO request, the Anticipatory IO scheduler waits for a short time to see whether a request arrives for a disk block near the block that was just accessed, instead of immediately seeking elsewhere.

Noop Scheduler

The noop scheduler simply passes IO requests down to the disk in FIFO order: it queues them (merging adjacent requests) but does not sort them.

CFQ Scheduler

CFQ is a fairness-oriented scheduler: IO bandwidth is divided fairly among all processes that are doing IO. The algorithm assigns each thread a time slice in which it is allowed to submit I/O to disk, so each thread gets a fair share of I/O throughput. CFQ also supports per-task I/O priorities, which are taken into account during scheduling decisions.

simfs: OpenVZ container filesystem

OpenVZ guests get a filesystem called "simfs" for the root filesystem.

simfs is a proxy filesystem, not an actual filesystem: it maps the container's root to a directory on the host (by default /vz/private/). This allows a particular container (CT) to be isolated from other CTs.

Inside the container, the /proc/mounts file looks like this:


[root@centos32 /]# cat /proc/mounts
/dev/simfs / simfs rw,relatime 0 0
proc /proc proc rw,relatime 0 0
sysfs /sys sysfs rw,relatime 0 0
none /dev devtmpfs rw,relatime,mode=755 0 0
none /dev/pts devpts rw,relatime,gid=5,mode=620,ptmxmode=000 0 0
none /dev/shm tmpfs rw,relatime 0 0
none /proc/sys/fs/binfmt_misc binfmt_misc rw,relatime 0 0

The df command shows the root filesystem as follows:


[root@centos32 /]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/simfs             10G  1.2G  8.9G  12% /
none                  128M  4.0K  128M   1% /dev
none                  128M     0  128M   0% /dev/shm

Can we run fsck on the simfs filesystem?

No. fsck can only be run on filesystems that reside on block devices (such as /dev/sda). simfs is a proxy filesystem with no block device of its own, so fsck cannot be run on it.
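A script that runs fsck can check the filesystem type first. Below is a minimal sketch; fstype_of is a hypothetical helper, and it does not handle mount points containing spaces (escaped as \040 in /proc/mounts).

```shell
# Hypothetical helper: print the filesystem type mounted at MOUNTPOINT,
# reading /proc/mounts-format lines on stdin. When several entries match,
# the last one (the mount on top) wins.
fstype_of() {
    awk -v mp="$1" '$2 == mp { t = $3 } END { if (t != "") print t }'
}

# Usage: skip fsck inside an OpenVZ container
#   if [ "$(fstype_of / < /proc/mounts)" = simfs ]; then
#       echo "/ is simfs: fsck is not applicable here"
#   fi
```

With the /proc/mounts shown above, fstype_of / prints simfs.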

Monday, January 14, 2013

Valgrind - debugging and profiling tool


Valgrind is a suite of tools for memory debugging (including memory leak detection) and code profiling on Linux. In addition, it can serve as a framework for building new debugging tools.

Valgrind runs on the following architectures

  • i386
  • x86_64 (AMD-64)
  • ppc
  • ppc64
  • System z

Let us see how to use valgrind


The main advantage of Valgrind is that it works with existing compiled executables. We do not need to recompile or modify programs to make use of valgrind.

Valgrind usage is as follows

valgrind <valgrind_options> <your-prog> <your-program-options>

Valgrind consists of several tools, and one of its most important options is --tool, which tells valgrind which tool to run. If --tool is omitted, memcheck is used by default.

For example:

valgrind --tool=memcheck find . -mtime +10 -type f

If there is a memory leak, the number of allocs and the number of frees reported in memcheck's heap summary will differ. For a detailed analysis, rerun the program with the --leak-check=yes option:

valgrind --tool=memcheck --leak-check=yes <program>
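The alloc/free comparison can be automated by reading memcheck's "total heap usage: N allocs, M frees, ..." summary line. A minimal sketch: heap_check is a hypothetical helper and ./myprog a placeholder program. A mismatch is only a hint; the --leak-check report gives the details.

```shell
# Hypothetical helper: read memcheck output on stdin and compare the alloc and
# free counts from its "total heap usage: N allocs, M frees, ..." line.
# Large counts are printed with thousands separators (1,234), so commas are removed.
heap_check() {
    sed -n 's/.*total heap usage: \([0-9,]*\) allocs, \([0-9,]*\) frees.*/\1 \2/p' |
        tr -d ',' |
        {
            if read -r allocs frees; then
                if [ "$allocs" = "$frees" ]; then
                    echo balanced
                else
                    echo "allocs=$allocs frees=$frees"
                fi
            else
                echo "no heap summary found"
            fi
        }
}

# Usage (valgrind writes its report to stderr; ./myprog is a placeholder):
#   valgrind --tool=memcheck ./myprog 2>&1 | heap_check
```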

Here is a list of standard valgrind tools

  • memcheck - Detects memory errors. It helps you make your programs behave correctly.
  • cachegrind - A cache and branch-prediction profiler. It helps you make your programs run faster.
  • callgrind - Works like cachegrind but also records the call graph (which functions called which).
  • exp-drd - Detects thread errors. It helps you make your multi-threaded programs behave correctly.
  • helgrind - Another thread error detector, similar to exp-drd but using different analysis techniques.
  • massif - A heap profiler. The heap is the area of memory used for dynamic memory allocation. This tool helps you make your program use less memory.
  • lackey - An example tool that shows the basics of instrumentation.

Using valgrind, we can

  • Find invalid pointer use
  • Detect the use of uninitialized variables

Specifying options for valgrind tool in a file

The options for valgrind can be specified in the file .valgrindrc, placed in the home directory of the user who runs valgrind command.

For example, to make valgrind always write its report to /tmp/memcheck_PID.log, add the following line to the .valgrindrc file in our home directory (%p is replaced by the process ID; the option applies to every tool, not only memcheck):

--log-file=/tmp/memcheck_%p.log

How does valgrind work?

Valgrind needs a real executable (machine code) as its argument, and it takes control of the executable before it starts. The executable's code is passed to the selected valgrind tool, which adds (instruments) its own code for its analysis. The instrumented code is then handed back to the valgrind core, and execution continues.

For example, memcheck adds code that checks every memory access. As a consequence, the program runs much slower than it does natively.

Valgrind checks not only the program's own code but also all the libraries it uses.


ltrace and strace



  • ltrace is a tracing tool for the library function calls made by a running process.
  • strace is a tracing tool for the system calls made by a running process and the signals it receives.

Note: tracing a process with these tools greatly reduces its performance, so use them only while you need to collect data.

Before going further, let us see what library function calls and system function calls are.

Library and System function calls


Functions are divided into two categories:

  1. Library function calls
  2. System function calls

The functions that are part of the programming language's library are known as library functions. For example, in C, the standard library string functions strcmp() and strlen() are library functions.

System calls are part of the OS and are executed in the kernel. A system call switches the program's execution from user mode to kernel mode. System calls are entry points into the kernel and are therefore NOT linked into the program itself; the C library only provides thin wrapper functions, such as open() and read(), that invoke them. System calls are not portable across operating systems.

Because system calls are part of the OS, the CPU has to switch from user mode to kernel mode whenever a system call is made. The time spent executing the system call is accounted to the OS (system time) and not to the user program (user time).

Library functions can be debugged easily with a debugger such as gdb, while system calls cannot, because they are executed by the kernel.

Tracing system calls with strace

strace can either run a new command and trace its system calls, or attach to an already running process (with the -p option).

Each line of strace's output contains the system call name, followed by its arguments in parentheses and its return value.

$ strace pwd
execve("/bin/pwd", ["pwd"], [/* 27 vars */]) = 0
brk(0)                                  = 0x11cc000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f74274c9000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY)      = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=91089, ...}) = 0
mmap(NULL, 91089, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f74274b2000
close(3)                                = 0
open("/lib64/libc.so.6", O_RDONLY)      = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\360\355!M<\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=1922112, ...}) = 0
mmap(0x3c4d200000, 3745960, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x3c4d200000
mprotect(0x3c4d389000, 2097152, PROT_NONE) = 0
mmap(0x3c4d589000, 20480, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x189000) = 0x3c4d589000
mmap(0x3c4d58e000, 18600, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x3c4d58e000
close(3)                                = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f74274b1000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f74274b0000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f74274af000
arch_prctl(ARCH_SET_FS, 0x7f74274b0700) = 0
mprotect(0x3c4d589000, 16384, PROT_READ) = 0
mprotect(0x3c4d01f000, 4096, PROT_READ) = 0
munmap(0x7f74274b2000, 91089)           = 0
brk(0)                                  = 0x11cc000
brk(0x11ed000)                          = 0x11ed000
open("/usr/lib/locale/locale-archive", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=99158576, ...}) = 0
mmap(NULL, 99158576, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f742161e000
close(3)                                = 0
getcwd("/home/foo", 4096)              = 11
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 1), ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f74274c8000
write(1, "/home/foo\n", 11/home/foo
)            = 11
close(1)                                = 0
munmap(0x7f74274c8000, 4096)            = 0
close(2)                                = 0
exit_group(0)                           = ?
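Because every line has the same name(args) = result shape, failed calls (a return value of -1 followed by an errno name) are easy to pick out. A minimal sketch: strace_failures is a hypothetical helper, and it assumes plain strace lines without the "[pid N]" prefix that -f adds. Note that some -1 results are expected, e.g. EINPROGRESS from a non-blocking connect().

```shell
# Hypothetical helper: list failed system calls from strace output on stdin
# as "syscall ERRNO". Assumes no "[pid N]" prefixes (i.e. strace without -f).
strace_failures() {
    awk '
        / = -1 E[A-Z0-9]+/ {
            name = $0
            sub(/\(.*/, "", name)          # name = text before the first "("
            err = $0
            sub(/.* = -1 /, "", err)       # drop everything up to "= -1 "
            sub(/ .*/, "", err)            # keep only the errno symbol
            print name, err
        }
    '
}

# Usage: strace writes to stderr, so send the trace into the pipe and
# discard the command's own output.
#   strace pwd 2>&1 >/dev/null | strace_failures
```

For the pwd trace above it prints "access ENOENT".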

Attaching strace to an already running process

# strace -p 3126
Process 3126 attached - interrupt to quit
accept4(4, ^C <unfinished ...>
Process 3126 detached

# strace -p 1993
Process 1993 attached - interrupt to quit
select(0, NULL, NULL, NULL, {0, 553110}) = 0 (Timeout)
wait4(-1, 0x7fff99e4b67c, WNOHANG|WSTOPPED, NULL) = 0
select(0, NULL, NULL, NULL, {1, 0})     = 0 (Timeout)
wait4(-1, 0x7fff99e4b67c, WNOHANG|WSTOPPED, NULL) = 0

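The -e option takes qualifying expressions that select which events to trace. For example, -e poll,select,connect,recvfrom,sendto (short for -e trace=poll,select,connect,recvfrom,sendto) traces only the listed system calls: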

$ strace -e poll,select,connect,recvfrom,sendto telnet localhost 80
sendto(3, "\24\0\0\0\26\0\1\3\10\f\363P\0\0\0\0\0\0\0\0", 20, 0, {sa_family=AF_NETLINK, pid=0, groups=00000000}, 12) = 20
connect(3, {sa_family=AF_FILE, path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
connect(3, {sa_family=AF_FILE, path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
Trying 127.0.0.1...
connect(3, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("127.0.0.1")}, 16) = 0
Connected to localhost.
Escape character is '^]'.
select(4, [0 3], [], [3], {0, 0})       = 0 (Timeout)
select(4, [0 3], [], [3],

To trace only network-related system calls, use -e trace=network:

$ strace -e trace=network nc -v -z google.com 80
socket(PF_NETLINK, SOCK_RAW, 0)         = 3
bind(3, {sa_family=AF_NETLINK, pid=0, groups=00000000}, 12) = 0
getsockname(3, {sa_family=AF_NETLINK, pid=6792, groups=00000000}, [12]) = 0
sendto(3, "\24\0\0\0\26\0\1\3\217\r\363P\0\0\0\0\0\0\0\0", 20, 0, {sa_family=AF_NETLINK, pid=0, groups=00000000}, 12) = 20
recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"0\0\0\0\24\0\2\0\217\r\363P\210\32\0\0\2\10\200\376\1\0\0\0\10\0\1\0\177\0\0\1"..., 4096}], msg_controllen=0, msg_flags=0}, 0) = 108
recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"@\0\0\0\24\0\2\0\217\r\363P\210\32\0\0\n\200\200\376\1\0\0\0\24\0\1\0\0\0\0\0"..., 4096}], msg_controllen=0, msg_flags=0}, 0) = 192
recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\24\0\0\0\3\0\2\0\217\r\363P\210\32\0\0\0\0\0\0\1\0\0\0\24\0\1\0\0\0\0\0"..., 4096}], msg_controllen=0, msg_flags=0}, 0) = 20
socket(PF_FILE, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 3
connect(3, {sa_family=AF_FILE, path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
socket(PF_FILE, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 3
connect(3, {sa_family=AF_FILE, path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
socket(PF_INET, SOCK_DGRAM|SOCK_NONBLOCK, IPPROTO_IP) = 3
connect(3, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("8.8.8.8")}, 16) = 0
sendto(3, "\323\211\1\0\0\1\0\0\0\0\0\0\6google\3com\0\0\1\0\1", 28, MSG_NOSIGNAL, NULL, 0) = 28
sendto(3, "\251.\1\0\0\1\0\0\0\0\0\0\6google\3com\0\0\34\0\1", 28, MSG_NOSIGNAL, NULL, 0) = 28
recvfrom(3, "\323\211\201\200\0\1\0\v\0\0\0\0\6google\3com\0\0\1\0\1\300\f\0\1"..., 2048, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("8.8.8.8")}, [16]) = 204
recvfrom(3, "\251.\201\200\0\1\0\1\0\0\0\0\6google\3com\0\0\34\0\1\300\f\0\34"..., 1844, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("8.8.8.8")}, [16]) = 56
socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) = 3
connect(3, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("74.125.236.199")}, 16) = 0
getsockname(3, {sa_family=AF_INET, sin_port=htons(45205), sin_addr=inet_addr("192.168.1.7")}, [16]) = 0
connect(3, {sa_family=AF_UNSPEC, sa_data="\0\0\0\0\0\0\0\0\0\0\0\0\0\0"}, 16) = 0
connect(3, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("74.125.236.201")}, 16) = 0
getsockname(3, {sa_family=AF_INET, sin_port=htons(32846), sin_addr=inet_addr("192.168.1.7")}, [16]) = 0
connect(3, {sa_family=AF_UNSPEC, sa_data="\0\0\0\0\0\0\0\0\0\0\0\0\0\0"}, 16) = 0
connect(3, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("74.125.236.196")}, 16) = 0
getsockname(3, {sa_family=AF_INET, sin_port=htons(34565), sin_addr=inet_addr("192.168.1.7")}, [16]) = 0
connect(3, {sa_family=AF_UNSPEC, sa_data="\0\0\0\0\0\0\0\0\0\0\0\0\0\0"}, 16) = 0
connect(3, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("74.125.236.197")}, 16) = 0
getsockname(3, {sa_family=AF_INET, sin_port=htons(39205), sin_addr=inet_addr("192.168.1.7")}, [16]) = 0
connect(3, {sa_family=AF_UNSPEC, sa_data="\0\0\0\0\0\0\0\0\0\0\0\0\0\0"}, 16) = 0
connect(3, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("74.125.236.206")}, 16) = 0
getsockname(3, {sa_family=AF_INET, sin_port=htons(58868), sin_addr=inet_addr("192.168.1.7")}, [16]) = 0
connect(3, {sa_family=AF_UNSPEC, sa_data="\0\0\0\0\0\0\0\0\0\0\0\0\0\0"}, 16) = 0
connect(3, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("74.125.236.194")}, 16) = 0
getsockname(3, {sa_family=AF_INET, sin_port=htons(60579), sin_addr=inet_addr("192.168.1.7")}, [16]) = 0
connect(3, {sa_family=AF_UNSPEC, sa_data="\0\0\0\0\0\0\0\0\0\0\0\0\0\0"}, 16) = 0
connect(3, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("74.125.236.192")}, 16) = 0
getsockname(3, {sa_family=AF_INET, sin_port=htons(58725), sin_addr=inet_addr("192.168.1.7")}, [16]) = 0
connect(3, {sa_family=AF_UNSPEC, sa_data="\0\0\0\0\0\0\0\0\0\0\0\0\0\0"}, 16) = 0
connect(3, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("74.125.236.200")}, 16) = 0
getsockname(3, {sa_family=AF_INET, sin_port=htons(55188), sin_addr=inet_addr("192.168.1.7")}, [16]) = 0
connect(3, {sa_family=AF_UNSPEC, sa_data="\0\0\0\0\0\0\0\0\0\0\0\0\0\0"}, 16) = 0
connect(3, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("74.125.236.198")}, 16) = 0
getsockname(3, {sa_family=AF_INET, sin_port=htons(43411), sin_addr=inet_addr("192.168.1.7")}, [16]) = 0
connect(3, {sa_family=AF_UNSPEC, sa_data="\0\0\0\0\0\0\0\0\0\0\0\0\0\0"}, 16) = 0
connect(3, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("74.125.236.195")}, 16) = 0
getsockname(3, {sa_family=AF_INET, sin_port=htons(53734), sin_addr=inet_addr("192.168.1.7")}, [16]) = 0
connect(3, {sa_family=AF_UNSPEC, sa_data="\0\0\0\0\0\0\0\0\0\0\0\0\0\0"}, 16) = 0
connect(3, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("74.125.236.193")}, 16) = 0
getsockname(3, {sa_family=AF_INET, sin_port=htons(57004), sin_addr=inet_addr("192.168.1.7")}, [16]) = 0
socket(PF_INET6, SOCK_DGRAM, IPPROTO_IP) = 3
connect(3, {sa_family=AF_INET6, sin6_port=htons(80), inet_pton(AF_INET6, "2404:6800:4007:803::1000", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = -1 ENETUNREACH (Network is unreachable)
socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 3
connect(3, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("74.125.236.199")}, 16) = -1 EINPROGRESS (Operation now in progress)
getsockopt(3, SOL_SOCKET, SO_ERROR, [5573484639458164736], [4]) = 0
socket(PF_FILE, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 4
connect(4, {sa_family=AF_FILE, path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
socket(PF_FILE, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 4
connect(4, {sa_family=AF_FILE, path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
Connection to google.com 80 port [tcp/http] succeeded!

To also trace the child processes that a process creates (with fork, vfork or clone), use -f:

# strace -f httpd

The -c option counts the time, calls and errors for each system call and prints a summary when the program exits:

# strace -c pwd
/root
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
  -nan    0.000000           0         1           read
  -nan    0.000000           0         1           write
  -nan    0.000000           0         3           open
  -nan    0.000000           0         5           close
  -nan    0.000000           0         4           fstat
  -nan    0.000000           0        10           mmap
  -nan    0.000000           0         3           mprotect
  -nan    0.000000           0         2           munmap
  -nan    0.000000           0         3           brk
  -nan    0.000000           0         1         1 access
  -nan    0.000000           0         1           execve
  -nan    0.000000           0         1           getcwd
  -nan    0.000000           0         1           arch_prctl
------ ----------- ----------- --------- --------- ----------------
100.00    0.000000                    36         1 total


Tracing Library Calls with ltrace


ltrace traces the dynamic library calls made by a program. By default, ltrace reads the /etc/ltrace.conf or ~/.ltrace.conf configuration file; an alternative configuration file can be specified with the -F option.

# ltrace pwd
(0, 0, 0, 0x7f36de024000, 88)                                                                             = 0x3c4d021160
__libc_start_main(0x401f60, 1, 0x7fff5fbca8a8, 0x404890, 0x404880 <unfinished ...>
getenv("POSIXLY_CORRECT")                                                                                 = NULL
strrchr("pwd", '/')                                                                                       = NULL
setlocale(6, "")                                                                                          = "en_US.UTF-8"
bindtextdomain("coreutils", "/usr/share/locale")                                                          = "/usr/share/locale"
textdomain("coreutils")                                                                                   = "coreutils"
__cxa_atexit(0x4029b0, 0, 0, 0x736c6974756572, 0x3c4d58eee8)                                              = 0
getopt_long(1, 0x7fff5fbca8a8, "LP", 0x606040, NULL)                                                      = -1
getcwd(NULL, 0)                                                                                           = ""
puts("/root"/root
)                                                                                             = 6
free(0x1679030)                                                                                           = <void>
exit(0 <unfinished ...>
__fpending(0x3c4d58d780, 0, 0x3c4d58e330, 0x3c4d58e330, 0xffffffff)                                       = 0
fclose(0x3c4d58d780)                                                                                      = 0
__fpending(0x3c4d58d860, 0, 0x3c4d58ee10, 0, 0x7f36de023700)                                              = 0
fclose(0x3c4d58d860)                                                                                      = 0
+++ exited (status 0) +++

ltrace can also trace system calls with the -S option. System calls are prefixed with "SYS_" in the output:

# ltrace -S pwd
SYS_brk(NULL)                                                                                             = 0x1d5b000
SYS_mmap(0, 4096, 3, 34, 0xffffffff)                                                                      = 0x7f5ed7259000
SYS_access(0x3c4ce1cb00, 4, 0x3c4ce00158, 0, 0)                                                           = -2
SYS_open("/etc/ld.so.cache", 0, 01)                                                                       = 4
SYS_fstat(4, 0x7fffa4182e10, 0x7fffa4182e10, 0, 0xfefefefefefefeff)                                       = 0
SYS_mmap(0, 91089, 1, 2, 4)                                                                               = 0x7f5ed7242000
SYS_close(4)                                                                                              = 0
SYS_open("/lib64/libc.so.6", 0, 00)                                                                       = 4
SYS_read(4, "\177ELF\002\001\001\003", 832)                                                               = 832
SYS_fstat(4, 0x7fffa4182e60, 0x7fffa4182e60, 4, 0x3c4d021188)                                             = 0
SYS_mmap(0x3c4d200000, 0x3928a8, 5, 2050, 4)                                                              = 0x3c4d200000
SYS_mprotect(0x3c4d389000, 0x200000, 0, 1, 4)                                                             = 0
SYS_mmap(0x3c4d589000, 20480, 3, 2066, 4)                                                                 = 0x3c4d589000
SYS_mmap(0x3c4d58e000, 18600, 3, 50, 0xffffffff)                                                          = 0x3c4d58e000
SYS_close(4)                                                                                              = 0
SYS_mmap(0, 4096, 3, 34, 0xffffffff)                                                                      = 0x7f5ed7241000
SYS_mmap(0, 4096, 3, 34, 0xffffffff)                                                                      = 0x7f5ed7240000
SYS_mmap(0, 4096, 3, 34, 0xffffffff)                                                                      = 0x7f5ed723f000
SYS_arch_prctl(4098, 0x7f5ed7240700, 0x7f5ed723f000, 34, 0xffffffff)                                      = 0
SYS_mprotect(0x3c4d589000, 16384, 1, 34, 32768)                                                           = 0
SYS_mprotect(0x3c4d01f000, 4096, 1, 34, 32768)                                                            = 0
(0, 0, 0, 0x7f5ed7241000, 88)                                                                             = 0x3c4d021160
SYS_munmap(0x7f5ed7242000, 91089)                                                                         = 0
__libc_start_main(0x401f60, 1, 0x7fffa4183808, 0x404890, 0x404880 <unfinished ...>
getenv("POSIXLY_CORRECT")                                                                                 = NULL
strrchr("pwd", '/')                                                                                       = NULL
setlocale(6, "" <unfinished ...>
SYS_brk(NULL)                                                                                             = 0x1d5b000
SYS_brk(0x1d7c000)                                                                                        = 0x1d7c000
SYS_open("/usr/lib/locale/locale-archive", 0, 011526160020)                                               = 4
SYS_fstat(4, 0x3c4d58e040, 0x3c4d58e040, 5, 4)                                                            = 0
SYS_mmap(0, 0x5e90a30, 1, 2, 4)                                                                           = 0x7f5ed13ae000
SYS_close(4)                                                                                              = 0
<... setlocale resumed> )                                                                                 = "en_US.UTF-8"
bindtextdomain("coreutils", "/usr/share/locale")                                                          = "/usr/share/locale"
textdomain("coreutils")                                                                                   = "coreutils"
__cxa_atexit(0x4029b0, 0, 0, 0x736c6974756572, 0x3c4d58eee8)                                              = 0
getopt_long(1, 0x7fffa4183808, "LP", 0x606040, NULL)                                                      = -1
getcwd(NULL, 0 <unfinished ...>
SYS_getcwd("/root", 4096)                                                                                 = 6
<... getcwd resumed> )                                                                                    = ""
puts("/root" <unfinished ...>
SYS_fstat(1, 0x7fffa41835d0, 0x7fffa41835d0, 4096, 0x3c4d58d780)                                          = 0
SYS_mmap(0, 4096, 3, 34, 0xffffffff)                                                                      = 0x7f5ed7258000
SYS_write(1, "/root\n", 6/root
)                                                                                = 6
<... puts resumed> )                                                                                      = 6
free(0x1d5c030)                                                                                           = <void>
exit(0 <unfinished ...>
__fpending(0x3c4d58d780, 0, 0x3c4d58e330, 0x3c4d58e330, 0xffffffff)                                       = 0
fclose(0x3c4d58d780 <unfinished ...>
SYS_close(1)                                                                                              = 0
SYS_munmap(0x7f5ed7258000, 4096)                                                                          = 0
<... fclose resumed> )                                                                                    = 0
__fpending(0x3c4d58d860, 0, 0x3c4d58ee10, 0, 0x7f5ed7240700)                                              = 0
fclose(0x3c4d58d860 <unfinished ...>
SYS_close(2)                                                                                              = 0
<... fclose resumed> )                                                                                    = 0
SYS_exit_group(0 <no return ...>
+++ exited (status 0) +++

Sunday, January 13, 2013

What are fat pipes and long pipes in a network?

A high-bandwidth connection is referred to as a fat pipe.
A high-latency connection is sometimes called a long pipe.

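The capacity of a connection, meaning the amount of data that can be in transit at once, is its bandwidth multiplied by its latency (the bandwidth-delay product). Doubling either the bandwidth or the latency therefore doubles the capacity of the connection.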

Saturday, January 12, 2013

Interprocess Communication in Operating Systems


What is a process?

  • A process is a program that has been loaded into main memory for execution; at any moment it is either running or waiting (for example, for the CPU).
  • A process is not the same as a program: one program may be run as multiple processes, either one after another or concurrently (at the same time).
A process has:

  • A body of code to execute
  • A reserved piece of memory (its own address space)
  • Its own set of file descriptors
  • A unique process ID

How do processes communicate with each other?


  1. Shared Memory (common areas in RAM, i.e. main memory) - Shared memory allows processes to share parts of their virtual address space, so they can communicate by reading and writing the same region of memory.
  2. Signals (software interrupts such as SIGINT, SIGKILL, ...)
  3. Message Queues (messages held by the kernel until a process reads them) - Messages allow processes to send formatted data to arbitrary processes, so processes can cooperate by exchanging messages.
  4. Semaphores (mainly for resource control) - Allow processes to synchronize their execution, for example to control whether a process may currently read or write a shared resource. Semaphores allow two or more processes to coordinate access to shared resources.
  5. Pipes (the '|' character on the command line)
  6. FIFOs (named pipes, made with mknod or mkfifo)
  7. Sockets (point-to-point, two-way communication)

Shared Memory


With shared memory, two processes share a common segment of memory that they can both read from and write to in order to communicate with one another. It is just a chunk of memory.

Because it's just memory, shared memory is the fastest IPC mechanism of them all: once the segment is set up, processes exchange data without copying it through the kernel.

On Linux, /dev/shm is one implementation of shared memory. It is an efficient means of passing data between programs: one program creates a memory region, which other processes (if permitted) can access. Because the data stays in memory, this can speed things up considerably. Linux distributions based on the 2.6 kernel offer /dev/shm as shared memory in the form of a RAM disk (a tmpfs filesystem); more specifically, it is a world-writable directory (a directory in which every user of the system can create files) that is stored in memory. POSIX shared memory objects created with shm_open() appear as files there. Both Red Hat and Debian based distributions include it by default. Support for this type of RAM disk is optional in the kernel configuration.

The main problem with shared memory is race conditions: if two processes update the same data at the same time, the result can be corrupted. One very simple way to prevent this is to use semaphores.


Semaphores


They allow different processes to synchronize their access to certain resources.

In computer science, the simplest and most common kind of semaphore is the binary semaphore, so called because it has two states: locked and unlocked. It acts much like a traffic light.

When a process wants exclusive access to a resource (shared memory, for example), it attempts to lock the semaphore associated with that resource. If the semaphore is already locked, the caller is suspended; otherwise the lock is granted. When the process has finished with the resource, it unlocks the semaphore, and any processes that tried to lock it in the meantime are woken up to attempt the lock again. This way only one process can access the resource at a time.

Pipes

  • A pipe is a one-way byte stream for sending data between two processes, usually related ones (for example, a parent and its child).
  • The system guarantees that data is read in the same order in which it was written (FIFO) and that no data is lost in transit.
  • Pipes are created with the C function pipe() from <unistd.h>, or at the command line with "|".
  • Some implementations (4.2BSD, for example) built pipes on top of sockets.
Note: in a shell pipeline such as "ls | wc -l", the stdin of the second command is the read end of a pipe.

FIFO


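A FIFO is a named pipe. It works like a pipe, but it appears as a special file in the filesystem, so unrelated processes can open it by name. Like the device files in /dev (for example, /dev/urandom), a FIFO is a special file whose contents are not stored on disk. Named pipes can be created with mknod (mknod name p) or with mkfifo.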

Socket


They are often called BSD sockets, because the sockets API was introduced in BSD Unix.


  • Allow point-to-point, two way communication between processes.
  • They serve as endpoints in communication. An endpoint is identified by an IP address (and port) or by a filename.
  • All sockets have a type (selected when created) and options (which can be changed on the fly, for example how long an operation may block before it times out).
  • They exist in "domains", which dictate how they are addressed and which communication protocols they use.
  • There are many different socket domains (the exact set depends on the operating system). The most commonly used are the UNIX and INET domains.
  • Sockets are created without names; a socket that other processes must reach has to be bound to one.
  • Communication between sockets is not symmetrical, but follows a client/server model.


There are 4 major types of BSD sockets, which dictate how data is sent and received and which protocols can be used to govern this exchange of information.

BSD socket types

Full duplex means data can be sent in both directions at the same time.


  1. Stream - full duplex, reliable, sequenced stream of data.
  2. Datagram - full duplex, non-sequential data transmission; messages may arrive out of order. Used for intermittent transmission.
  3. Sequential Packet - full duplex, reliable, sequenced connection for datagrams of a fixed maximum length.
  4. Raw - an unregulated socket for constructing new socket types and protocols.


Sockets will only connect with others of the same type and protocol.

Let us look at two important socket domains: UNIX sockets and INET sockets.

Unix sockets


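  1. Used for local communication between processes.
  2. They are bound to pathnames that must not already exist in the filesystem (for example, under /tmp). Names that netstat shows with a leading @ belong to the Linux abstract namespace, which does not use the filesystem at all.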


How to find them?

# netstat -an --unix

Proto RefCnt Flags       Type       State         I-Node Path
unix  2      [ ACC ]     STREAM     LISTENING     14270  /tmp/orbit-gdm/linc-8db-0-41bf49d5a8a9b
unix  2      [ ACC ]     STREAM     LISTENING     14041  @/tmp/gdm-greeter-NhVFwyLS
unix  2      [ ACC ]     STREAM     LISTENING     14342  /tmp/orbit-gdm/linc-8d4-0-28534edab128c
unix  2      [ ACC ]     STREAM     LISTENING     12082  /var/run/acpid.socket
unix  2      [ ACC ]     STREAM     LISTENING     7945   @/com/ubuntu/upstart
unix  2      [ ACC ]     STREAM     LISTENING     14477  /tmp/orbit-gdm/linc-8ee-0-1aab978fd6906
unix  2      [ ACC ]     STREAM     LISTENING     14481  /tmp/orbit-gdm/linc-8f0-0-52db2ebee2075
unix  2      [ ACC ]     STREAM     LISTENING     14485  /tmp/orbit-gdm/linc-8ea-0-6c2623f0e22da
unix  2      [ ACC ]     STREAM     LISTENING     10925  /var/run/rpcbind.sock
unix  2      [ ACC ]     STREAM     LISTENING     14602  /tmp/orbit-gdm/linc-8fa-0-4933106a2345e
unix  2      [ ACC ]     STREAM     LISTENING     13129  public/cleanup
unix  2      [ ACC ]     STREAM     LISTENING     14184  @/tmp/gdm-session-sYKCyvao
unix  2      [ ACC ]     STREAM     LISTENING     13969  @/tmp/.X11-unix/X0
unix  2      [ ACC ]     STREAM     LISTENING     14162  @/tmp/dbus-2sHX0FJO2L
unix  2      [ ACC ]     STREAM     LISTENING     11167  /var/run/dbus/system_bus_socket
unix  2      [ ACC ]     STREAM     LISTENING     11387  /var/run/avahi-daemon/socket
unix  2      [ ACC ]     STREAM     LISTENING     14803  /tmp/orbit-gdm/linc-8fb-0-6c6f4de337f7f
unix  2      [ ACC ]     STREAM     LISTENING     14831  /tmp/orbit-gdm/linc-8fc-0-4e60774e3fdc5
unix  2      [ ]         DGRAM                    8114   @/org/kernel/udev/udevd
unix  2      [ ACC ]     STREAM     LISTENING     15045  /var/lib/gdm/.pulse/2d622416adabbaa7f7ffa3c10000000f-runtime/native
unix  23     [ ]         DGRAM                    10630  /dev/log
unix  2      [ ]         DGRAM                    11130  /var/run/fcm/fcm_clif
unix  2      [ ]         DGRAM                    12269  @/org/freedesktop/hal/udev_event
unix  2      [ ACC ]     STREAM     LISTENING     11886  /var/run/cups/cups.sock
unix  2      [ ACC ]     STREAM     LISTENING     12644  /var/run/pcscd.comm
unix  2      [ ACC ]     STREAM     LISTENING     14205  @/tmp/.ICE-unix/2260
unix  2      [ ACC ]     STREAM     LISTENING     13136  private/tlsmgr
unix  2      [ ]         DGRAM                    11128  @0001b
unix  2      [ ACC ]     STREAM     LISTENING     13140  private/rewrite
unix  2      [ ACC ]     STREAM     LISTENING     13144  private/bounce
unix  2      [ ACC ]     STREAM     LISTENING     13148  private/defer
unix  2      [ ACC ]     STREAM     LISTENING     13152  private/trace
unix  2      [ ACC ]     STREAM     LISTENING     13156  private/verify
unix  2      [ ACC ]     STREAM     LISTENING     13160  public/flush
unix  2      [ ACC ]     STREAM     LISTENING     13164  private/proxymap
unix  2      [ ACC ]     STREAM     LISTENING     13168  private/proxywrite
unix  2      [ ACC ]     STREAM     LISTENING     13172


INET domain sockets


  1. Used for communication between processes on different hosts (or on the same host, through the loopback address 127.0.0.1).
  2. They are bound to an IP address (in dotted-quad notation) and a port number (0 to 65535).


# netstat -an --inet

Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address               Foreign Address             State
tcp        0      0 127.0.0.1:199               0.0.0.0:*                   LISTEN
tcp        0      0 0.0.0.0:111                 0.0.0.0:*                   LISTEN
tcp        0      0 0.0.0.0:52053               0.0.0.0:*                   LISTEN
tcp        0      0 0.0.0.0:22                  0.0.0.0:*                   LISTEN
tcp        0      0 127.0.0.1:631               0.0.0.0:*                   LISTEN
tcp        0      0 127.0.0.1:25                0.0.0.0:*                   LISTEN
tcp        0     52 192.168.1.7:22              192.168.1.2:52471           ESTABLISHED
udp        0      0 0.0.0.0:5353                0.0.0.0:*
udp        0      0 0.0.0.0:111                 0.0.0.0:*
udp        0      0 0.0.0.0:631                 0.0.0.0:*
udp        0      0 0.0.0.0:768                 0.0.0.0:*
udp        0      0 0.0.0.0:901                 0.0.0.0:*
udp        0      0 0.0.0.0:51085               0.0.0.0:*
udp        0      0 0.0.0.0:43032               0.0.0.0:*
udp        0      0 0.0.0.0:161                 0.0.0.0:*
udp        0      0 0.0.0.0:68                  0.0.0.0:*

Socket Protocols

There are two major protocols: TCP for stream sockets and UDP for datagram sockets.