Wednesday, June 25, 2014

Internetworking with TCP/IP Notes (Chapter 12)

Chapter 12 Routing Architecture: Cores, Peers and Algorithms

12.1 Original Internet architecture and Cores
The advantage of a core routing architecture lies in autonomy: the manager of a noncore router can make changes locally. The chief disadvantage is inconsistency: an outlying site can introduce errors that make some destinations unreachable.

12.2 Automatic Route Propagation and a FIB

Routing protocols serve two important functions. First, they compute a set of shortest paths. Second, they respond to network failures or topology changes by continually updating the routing information.

Although a routing protocol computes shortest paths, the routing protocol software does not store information directly in the router’s forwarding table. Instead, routing software creates a Forwarding Information Base (FIB).

When the FIB changes, routing software recomputes a forwarding table for the router and installs the new forwarding table.

12.3 Distance-Vector (Bellman-Ford) Routing
Each router keeps a list of all known destinations in its FIB. When it boots, a router initializes its FIB to contain an entry for each directly connected network. Each entry in the FIB identifies a destination network, a next-hop router used to reach the destination, and the “distance” to the network (according to some measure of distance).

A directly-connected network is zero hops away; if a datagram must travel through N routers to reach a destination, the destination is N hops away.
When routes change rapidly, however, the computations may not stabilize. When a route changes (i.e., a new connection appears or an old one fails), the information propagates slowly from one router to another. Meanwhile, some routers may have incorrect routing information.
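The distance-vector update rule can be sketched in a few lines. This is a minimal illustration, not the protocol itself: the function name, the FIB representation (destination mapped to a next-hop/distance pair), and the unit link cost are all assumptions made for the example.

```python
def dv_update(fib, neighbor, neighbor_vector, link_cost=1):
    """Apply the distance-vector (Bellman-Ford) update rule to a FIB.

    fib maps destination -> (next_hop, distance); neighbor_vector maps
    destination -> distance as advertised by the neighbor.
    Returns True if any entry changed.
    """
    changed = False
    for dest, dist in neighbor_vector.items():
        candidate = dist + link_cost           # cost of reaching dest via this neighbor
        current = fib.get(dest)
        if current is None or candidate < current[1]:
            fib[dest] = (neighbor, candidate)  # new destination or shorter route
            changed = True
        elif current[0] == neighbor and candidate != current[1]:
            fib[dest] = (neighbor, candidate)  # current next hop advertised a new cost
            changed = True
    return changed

# A router boots knowing only its directly connected networks (distance 0).
fib = {"net1": (None, 0), "net2": (None, 0)}
dv_update(fib, "routerB", {"net3": 1, "net1": 5})
# net3 becomes reachable via routerB at distance 2; net1 keeps its direct route.
```

Note how the longer route to net1 advertised by routerB is ignored, while the previously unknown net3 is adopted.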
12.4 Reliability and Routing Protocols
Most routing protocols use connectionless transport — early protocols encapsulated messages directly in IP; modern routing protocols usually encapsulate in UDP.

Routing protocols use several techniques to handle reliability. First, checksums are used to handle corruption. Loss is either handled by soft state or through acknowledgements and retransmission. Sequence numbers are used to handle two problems.

First, sequence numbers allow a receiver to handle out-of-order delivery by placing incoming messages back in the correct order. Second, sequence numbers can be used to 
handle replay, a condition that can occur if a duplicate of a message is delayed and arrives long after newer updates have been processed.

12.5 Link-State (SPF) Routing
The primary alternative to distance-vector algorithms is a class of algorithms known as link state, link status, or Shortest Path First (SPF). An SPF algorithm requires each participating router to have complete topology information.

Each router participating in an SPF algorithm performs two tasks:
 * Actively test the status of each neighboring router. Two routers are considered neighbors if they attach to a common network.
 * Periodically broadcast link-state messages of the form, “The link between me and router X is up” or “The link between me and router X is down.”

To inform all other routers, each router periodically broadcasts a message that lists the status (state) of each of its links. A status message does not specify routes — it simply
reports whether communication is possible between pairs of routers.
Whenever a link-state message arrives, software running on the router uses the information to update its map of the internet. First, it extracts the pair of routers mentioned in the message and makes sure that the local graph contains an edge between the two. Second, it uses the status reported in the message to mark the link as up or down.
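The two steps above (update the graph, then compute shortest paths over it) can be sketched as follows. The message format, function names, and unit link cost are assumptions for illustration; the path computation is Dijkstra's algorithm, the usual choice for SPF.

```python
import heapq

def apply_link_state(graph, msg):
    """Update the local topology map from one link-state message.
    msg is a hypothetical (router_a, router_b, up) triple."""
    a, b, up = msg
    graph.setdefault(a, {})
    graph.setdefault(b, {})
    if up:
        graph[a][b] = graph[b][a] = 1   # mark the edge usable (unit cost)
    else:
        graph[a].pop(b, None)           # link reported down: remove the edge
        graph[b].pop(a, None)

def shortest_paths(graph, source):
    """Dijkstra's algorithm: distance from source to every reachable router."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue                    # stale heap entry
        for nbr, cost in graph.get(node, {}).items():
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return dist

graph = {}
for msg in [("A", "B", True), ("B", "C", True), ("A", "C", True), ("A", "C", False)]:
    apply_link_state(graph, msg)
# With the A-C link down, A reaches C through B: {"A": 0, "B": 1, "C": 2}
```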

Internetworking with TCP/IP Notes (Chapter 11)

Chapter 11 Reliable Stream Transport Service (TCP)

11.1 Properties of the Reliable Delivery Service
The reliable transfer service that TCP provides to applications can be characterized by five features that are discussed below:

* Stream Orientation

* Virtual Circuit Connection
* Buffered Transfer
* Unstructured Stream
* Full Duplex Communication

Stream Orientation. When two application programs use TCP to transfer large volumes of data, the data is viewed as a stream of octets. The application on the destination host receives exactly the same sequence of octets that was sent by the application on the source host.

Virtual Circuit Connection. Before data transfer starts, both the sending and receiving applications must agree to establish a TCP connection. TCP monitors data transfer; if communication fails for any reason, the application programs are informed.

Buffered Transfer.  To make transfer more efficient and to minimize network traffic, implementations usually collect enough data from a stream to fill a reasonably large datagram before transmitting it across an internet.

Unstructured Stream. The TCP/IP stream service does not provide structured data
streams. Application programs using the stream service must understand stream content and agree on a stream format before they initiate a connection.

Full Duplex Communication. Connections provided by the TCP/IP stream service 
allow concurrent transfer in both directions. The advantage of a full duplex connection is that the underlying protocol software can send control information for one stream back to the source in datagrams carrying data in the opposite direction. 

11.2 Reliability: Acknowledgements and Retransmission 

TCP relies on a fundamental technique known as positive acknowledgement with retransmission (PAR) to ensure reliability. The technique requires a recipient to communicate with the source, sending back an acknowledgement (ACK) each time data arrives successfully. When it sends a packet, the sending software starts a timer. If an acknowledgement arrives before the timer expires, the sender cancels the timer and prepares to send more data. If the timer expires before an acknowledgement arrives, the sender retransmits the packet.

A sender must retain a copy of a packet that has been transmitted in case the packet must be retransmitted. In practice, a sender only needs to retain the data that goes in the packet along with sufficient information to allow the sender to reconstruct the packet headers. The idea of keeping unacknowledged data is important in TCP.

Reliable protocols detect duplicate packets by assigning each packet a sequence number and requiring the receiver to remember which sequence numbers it has received. To avoid ambiguity, positive acknowledgement protocols arrange for each acknowledgement to contain the sequence number of the packet that arrived.


Illustration of timeout and retransmission when a packet is lost.


11.3 The Sliding Window Paradigm
A simple positive acknowledgement protocol wastes a substantial amount of network capacity because it must delay sending a new packet until it receives an acknowledgement for the previous packet.

The sliding window technique uses a more complex form of positive acknowledgement and retransmission. The key idea is that a sliding window allows a sender to transmit multiple packets before waiting for an acknowledgement. The protocol places a small, fixed-size window on the sequence and transmits all packets that lie inside the window.

Technically, the number of packets that can be unacknowledged at any given time is constrained by the window size, which is limited to a small, fixed number.

The window partitions the sequence of packets into three sets: 
* those packets to the left of the window have been successfully transmitted, received, and acknowledged; 
* those packets to the right have not yet been transmitted; and 
* those packets that lie in the window are being transmitted.

The lowest numbered packet in the window is the first packet in the sequence that has not been acknowledged.
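The three-way partition described above can be expressed directly. This is a toy illustration of the window bookkeeping (the function name and parameters are invented for the example), not protocol code.

```python
def window_partition(next_unacked, window_size, total):
    """Partition packet sequence numbers 0..total-1 around a sliding window.
    next_unacked is the lowest-numbered packet not yet acknowledged."""
    acked = list(range(0, next_unacked))                             # left of window
    in_window = list(range(next_unacked,
                           min(next_unacked + window_size, total)))  # being transmitted
    unsent = list(range(next_unacked + window_size, total))          # right of window
    return acked, in_window, unsent

acked, in_window, unsent = window_partition(next_unacked=3, window_size=4, total=10)
# acked=[0, 1, 2], in_window=[3, 4, 5, 6], unsent=[7, 8, 9]
```

When an acknowledgement for packet 3 arrives, `next_unacked` advances, the window "slides" right, and packet 7 becomes eligible for transmission.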


11.4 The Transmission Control Protocol
TCP is a communication protocol, not a piece of software. The protocol specifies the format of the data and acknowledgements that two computers exchange to achieve a reliable transfer, as well as the procedures the computers use to ensure that the data arrives correctly. 

It specifies how TCP software distinguishes 
among multiple destinations on a given machine, and how communicating machines recover from errors like lost or duplicated packets. The protocol also specifies how two computers initiate a TCP connection and how they agree when it is complete.

11.5 Layering, Ports, Connections and Endpoints
TCP, which resides in the transport layer just above IP, allows multiple application programs on a given computer to communicate concurrently, and it demultiplexes incoming TCP traffic among the applications.

Like the User Datagram Protocol, TCP uses protocol port numbers to identify application programs. Also like UDP, a TCP port number is sixteen bits long.  TCP 
ports are much more complex because a single port number does not identify an application. Instead, TCP has been designed on a connection abstraction in which the objects to be identified are TCP connections, not individual ports.

TCP uses the connection, not the protocol port, as its fundamental abstraction; connections are identified by a pair of endpoints.

TCP defines an endpoint to be a pair of integers (host, port), where host is the IP address for a host and port is a TCP port on that host. Because TCP identifies a connection by a pair of endpoints, a given TCP port number can be shared by multiple connections on the same machine.

11.6 Passive and Active Opens
TCP is a connection-oriented protocol that requires both endpoints to agree to participate. 

The application program on one end performs a passive open by contacting the local operating system and indicating that it will accept an incoming connection for a specific port number.

The application program on the other end can then perform an active open by requesting that a TCP connection be established.
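The passive/active distinction maps directly onto the socket API: `listen` performs the passive open and `connect` the active open. A minimal loopback sketch (port 0 asks the OS for a free port; the thread is only there so one script can play both roles):

```python
import socket
import threading

# Passive open: the server tells the local OS it will accept an
# incoming connection on a specific port.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))        # port 0: let the OS pick a free port
server.listen(1)
port = server.getsockname()[1]

def accept_one():
    conn, addr = server.accept()     # blocks until an active open arrives
    conn.sendall(b"hello")
    conn.close()

t = threading.Thread(target=accept_one)
t.start()

# Active open: the client requests that a TCP connection be established.
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(("127.0.0.1", port))
greeting = client.recv(16)
client.close()
t.join()
server.close()
# greeting == b"hello"
```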


11.7 Segments, Streams and Sequence Numbers
TCP views the data stream as a sequence of octets that it divides into segments for transmission. The TCP form of a sliding window protocol also solves the end-to-end flow control problem by allowing the receiver to restrict transmission until it has sufficient buffer space to accommodate more data.

The first pointer marks the left of the sliding window, separating octets that have been sent and acknowledged from octets yet to be acknowledged. 
A second pointer marks the right of the sliding window and defines the highest octet in the sequence that can be sent before more acknowledgements are received. 
The third pointer marks the boundary inside the window that separates those octets that have already been sent from those octets that have not been sent.

The protocol software sends all octets in the window without delay, so the boundary inside the 
window usually moves from left to right quickly.


We think of the transfers as completely independent because at any time data can flow across the connection in one direction, or in both directions. Thus, TCP software on a computer maintains two windows per connection: one window slides along as the data stream is sent, while the other slides along as data is received.

11.8 Variable Window Size and Flow Control
Each acknowledgement, which specifies how many octets have been received, contains a window advertisement that specifies how many additional octets of data the receiver is prepared to accept beyond the data being acknowledged. In response to an increased window advertisement, the sender increases the size of its sliding window and proceeds to send octets that have not been acknowledged.

TCP software must not contradict previous advertisements by shrinking the window past previously acceptable positions in the octet stream. Instead, smaller advertisements accompany acknowledgements, so the window size only changes at the time it slides forward.

Having a mechanism for flow control is essential in an environment where computers of various speeds and sizes communicate through networks and routers of various speeds and capacities. In fact, flow control involves two independent problems:

1. Protocols need to provide end-to-end flow control between the source and ultimate destination. 

2. A mechanism is needed that allows intermediate systems (i.e., routers) to control a source that sends more traffic than the machine can tolerate. When intermediate machines become overloaded, the condition is called congestion.

TCP uses its sliding window scheme to solve the end-to-end flow control problem. The congestion problem is handled by TCP’s congestion control mechanisms together with router queueing policies such as tail drop and RED, described later in the chapter.

11.9 TCP Segment Format
The unit of transfer between the TCP software on two machines is called a segment. Segments are exchanged to establish a connection, transfer data, send acknowledgements, advertise window sizes, and close connections. Because TCP allows piggybacking, an acknowledgement traveling from computer A to computer B may travel in the same segment as data traveling from computer A to computer B, even though the acknowledgement refers to data sent from B to A.


The TCP header consists of at least 20 octets and may contain more if the segment carries options; the maximum header length is 60 octets.

Fields SOURCE PORT and DESTINATION PORT contain the TCP port numbers that identify the application programs at the ends of the connection.

SEQUENCE NUMBER field identifies the position in the sender’s octet stream of the data in the segment.

ACKNOWLEDGEMENT NUMBER field identifies the 
number of the octet that the source expects to receive next.
Note that the sequence 
number refers to the stream flowing in the same direction as the segment, while the acknowledgement number refers to the stream flowing in the opposite direction from the segment.

HLEN field contains an integer that specifies the length of the segment header measured in 32-bit multiples. It is needed because the OPTIONS field varies in length, depending on which options are included.

The 6-bit field marked RESERVED is reserved for future use. TCP software uses the 6-bit field labeled CODE BITS to determine the purpose and contents of the segment. The figure below illustrates the usage.

TCP software advertises how much data it is willing to accept every time it sends a segment by specifying its buffer size in the WINDOW field. The field contains a 16-bit unsigned integer, so the maximum window size is 64 Kbytes.

Window advertisements provide an 
example of piggybacking because they accompany all segments, including those carrying data as well as those carrying only an acknowledgement.
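The fields discussed in this section can be packed into the classic 20-octet header layout (4-bit HLEN, 6 reserved bits, 6 code bits). A sketch, assuming the field order of the standard header with no options; the function name and the SYN example values are invented:

```python
import struct

def tcp_header(src_port, dst_port, seq, ack, code_bits, window,
               checksum=0, urgent=0):
    """Pack a 20-octet TCP header (no options), so HLEN = 5 32-bit words."""
    hlen_and_bits = (5 << 12) | (code_bits & 0x3F)  # 4-bit HLEN, 6 reserved, 6 code bits
    return struct.pack("!HHIIHHHH", src_port, dst_port, seq, ack,
                       hlen_and_bits, window, checksum, urgent)

hdr = tcp_header(src_port=1234, dst_port=80, seq=1, ack=0,
                 code_bits=0x02, window=65535)   # code bit 0x02 marks a SYN
# len(hdr) == 20: the minimum header, no OPTIONS or PADDING
```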

11.10 Out of Band Data (urgent bit)
Although TCP is a stream-oriented protocol, it is sometimes important for the program at one end of a connection to send data out of band, without waiting for the program at the other end of the connection to consume octets already in the stream.

To accommodate out-of-band signaling, TCP allows the sender to specify data as urgent, meaning that the receiving application should be notified of its arrival as quickly as possible, regardless of its position in the stream.

The mechanism used to mark urgent data when transmitting it in a segment consists of the URG code bit and the URGENT POINTER field in the segment header. When the URG bit is set, the URGENT POINTER field specifies the position in the segment where urgent data ends.

11.11 TCP Options
The TCP header can contain zero or more options. Recall that the header length is specified in 32-bit multiples. If the options do not occupy an exact multiple of 32 bits, PADDING is added to the end of the header.

11.11.1 Maximum Segment Size Option
TCP uses a maximum segment size (MSS) option to allow a receiver to specify the maximum size segment that it is willing to receive. MSS negotiation is especially significant because it permits heterogeneous systems to communicate.

Unlike a TCP segment, a fragment cannot be acknowledged or retransmitted independently; all fragments must arrive or the entire datagram must be retransmitted.

In theory, the optimum segment size, S, occurs when the IP datagrams carrying the segments are as large as possible without requiring fragmentation anywhere along the path from the source to the destination.

11.11.2 Window Scaling Option
Because the WINDOW field in the TCP header is 16 bits long, the maximum size window is 64 Kbytes.

To accommodate larger window sizes, a window scaling option was created for TCP. The option consists of three octets: a type, a length, and a shift value, S. In essence, the shift value specifies a binary scaling factor to be applied to the window value. When window scaling is in effect, a receiver extracts the value from the WINDOW field, W, and shifts W left S bits to obtain the actual window size.
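The scaling computation is a single shift. For example, with the maximum 16-bit value W = 65535 and a shift of S = 7, the effective window is 65535 × 2⁷ octets:

```python
def actual_window(window_field, shift):
    """Effective window when scaling is in effect: W shifted left S bits."""
    return window_field << shift

w = actual_window(65535, 7)   # 65535 * 2**7 = 8388480 octets (roughly 8 MB)
```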

11.11.3 Timestamp Option
The TCP timestamp option was invented to help TCP compute the delay on the underlying network.

11.12 TCP Checksum Computation
To compute the checksum, TCP software on the sending machine follows a procedure similar to UDP. Conceptually, TCP prepends a pseudo-header to the TCP segment, appends enough zero bits to make the segment a multiple of 16 bits, and computes the 16-bit checksum over the entire result.

The purpose of using a TCP pseudo-header is exactly the same as in UDP. It allows the receiver to verify that the segment has reached the correct endpoint, which includes both an IP address and a protocol port number.
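The computation itself is the standard 16-bit one's complement checksum. A sketch assuming IPv4 (the pseudo-header holds source address, destination address, a zero octet, the protocol number 6 for TCP, and the segment length); the function names and example addresses are invented. A receiver that recomputes the sum over the segment with the checksum field filled in obtains zero.

```python
import struct

def ones_complement_sum16(data):
    """16-bit one's complement checksum as used by TCP, UDP and IP."""
    if len(data) % 2:
        data += b"\x00"                           # pad to a multiple of 16 bits
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return (~total) & 0xFFFF

def tcp_checksum(src_ip, dst_ip, segment):
    """Checksum over an IPv4 pseudo-header prepended to the TCP segment.
    src_ip and dst_ip are 4-octet addresses; protocol 6 identifies TCP."""
    pseudo = struct.pack("!4s4sBBH", src_ip, dst_ip, 0, 6, len(segment))
    return ones_complement_sum16(pseudo + segment)

# Hypothetical all-zero 20-octet segment between 1.2.3.4 and 5.6.7.8:
csum = tcp_checksum(b"\x01\x02\x03\x04", b"\x05\x06\x07\x08", b"\x00" * 20)
```

Because the pseudo-header covers the IP addresses, a segment delivered to the wrong host fails the check even if the segment itself is intact.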
11.13 Acknowledgements, Retransmission and Timeouts
The receiver always acknowledges the longest contiguous prefix of the stream that has been received correctly. Each acknowledgement specifies a sequence value one greater than the highest octet position in the contiguous prefix it received. In short, a  TCP acknowledgement specifies the sequence number of the next octet that the receiver expects to receive.

This TCP acknowledgement scheme is called cumulative because it reports how much of the stream has accumulated. Cumulative acknowledgements have both advantages and disadvantages. One advantage is that acknowledgements are both easy to generate and unambiguous. A major disadvantage is that the sender does not receive information about all successful transmissions, but only about a single position in the stream that has been received. Thus, after a timeout, the sender must decide whether to retransmit only the first unacknowledged segment or all outstanding segments.

Every time it sends a segment, TCP starts a timer and waits for an acknowledgement. If the timer expires before data in the segment has been acknowledged, TCP assumes that the segment was lost or corrupted and retransmits it.

TCP uses an adaptive retransmission algorithm in which TCP monitors the round-trip time on each connection and computes reasonable values for timeouts. As the performance of a connection changes, TCP revises its timeout value (i.e., it adapts to the change).

TCP records the time at which each segment is sent and the time at which an acknowledgement arrives for the data in that segment. From the two times, TCP computes an elapsed time known as a round trip sample. Whenever it obtains a new round trip sample, TCP must adjust its notion of the average round trip time for the connection. 

To accommodate the varying delays encountered in an internet environment, TCP uses an adaptive retransmission algorithm that monitors delays on each connection and adjusts its timeout parameter accordingly.
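One widely used form of the adaptive algorithm keeps an exponentially weighted average of the samples plus a variation term, in the style standardized in RFC 6298. The class name, the 1-second initial timeout, and the 200 ms minimum variation term are assumptions for this sketch; the gains (1/8 for the average, 1/4 for the variation) follow that specification.

```python
class RttEstimator:
    """Adaptive retransmission timer sketch (RFC 6298 style).

    srtt: smoothed round-trip time; rttvar: round-trip variation;
    rto: retransmission timeout derived from both.
    """
    def __init__(self):
        self.srtt = None
        self.rttvar = None
        self.rto = 1.0                       # initial timeout, in seconds

    def sample(self, rtt):
        if self.srtt is None:                # first round trip sample
            self.srtt = rtt
            self.rttvar = rtt / 2
        else:                                # exponentially weighted averages
            self.rttvar = 0.75 * self.rttvar + 0.25 * abs(self.srtt - rtt)
            self.srtt = 0.875 * self.srtt + 0.125 * rtt
        self.rto = self.srtt + max(0.2, 4 * self.rttvar)

est = RttEstimator()
for r in (0.10, 0.12, 0.11):                 # three round trip samples, seconds
    est.sample(r)
# est.rto now tracks roughly srtt + 4 * rttvar for this connection
```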


11.14 Karn's Algorithm and Timer Backoff
Karn’s algorithm: when computing the round trip estimate, ignore samples that correspond to retransmitted segments, but use a backoff strategy and retain the timeout value from a retransmitted packet for subsequent packets until a valid sample is obtained.
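The two halves of Karn's algorithm (discard ambiguous samples, back off and retain the timeout) can be sketched as a small wrapper around whatever round-trip estimator is in use. The class name, the doubling factor, and the 64-second cap are assumptions for illustration.

```python
class KarnTimer:
    """Karn's algorithm sketch: ignore round-trip samples from retransmitted
    segments, and back off the timeout on each retransmission."""
    def __init__(self, rto=1.0):
        self.rto = rto

    def on_retransmit(self):
        self.rto = min(self.rto * 2, 64.0)  # exponential backoff, capped

    def on_ack(self, was_retransmitted, rtt, update):
        if was_retransmitted:
            return           # ambiguous sample: keep the backed-off timeout
        update(rtt)          # valid sample: let the estimator adjust the RTO

t = KarnTimer()
t.on_retransmit()
t.on_retransmit()            # t.rto == 4.0, retained until a valid sample

samples = []
t.on_ack(was_retransmitted=True, rtt=0.5, update=samples.append)
t.on_ack(was_retransmitted=False, rtt=0.5, update=samples.append)
# samples == [0.5]: only the unambiguous sample reaches the estimator
```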

11.15 Response to Congestion
TCP must also react to congestion in an internet. Congestion is a condition of severe delay caused by an overload of datagrams at one or more switching points.

TCP can help avoid congestion by reducing transmission rates when congestion occurs. In fact, TCP reacts quickly by reducing the transmission rate automatically whenever delays occur.

To avoid congestion, the TCP standard now recommends using two techniques: slow-start and multiplicative decrease. The two are related and can be implemented easily. We said that for each connection, TCP must remember the size of the receiver’s window. To control congestion, TCP maintains a second limit, called the congestion window size or congestion window that it uses to restrict data flow to less than the receiver’s buffer size when congestion occurs. Because TCP reduces the congestion window by half for every loss, it decreases the window exponentially if loss continues.

Multiplicative Decrease Congestion Avoidance: upon loss of a segment, reduce the congestion window by half (but never reduce the window to less than one segment). When transmitting segments that remain in the allowed window, back off the retransmission timer exponentially.

How can TCP recover when congestion ends? TCP uses a technique named slow-start 
to scale up transmission. Slow-Start (Additive) Recovery: whenever starting traffic on a new
connection or increasing traffic after a period of congestion, start the congestion window at the size of a single segment and increase the congestion window by one segment each time an acknowledgement arrives.

To avoid increasing the window size too quickly and causing additional congestion, TCP adds one additional restriction. Once the congestion window reaches one-half of its original size before congestion, TCP enters a congestion avoidance phase and slows down the rate of increase. During congestion avoidance, it increases the congestion window by 1 only if all segments in the window have been acknowledged. The overall approach is known as Additive Increase Multiplicative Decrease (AIMD).
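A per-round-trip simulation makes the AIMD shape visible. This is a deliberately simplified, Tahoe-style reading of the rules above (on loss, the threshold is set to half the window and the window restarts at one segment; slow start is approximated as doubling per round trip), with window sizes in whole segments; it is not an implementation of any particular TCP variant.

```python
def aimd_rtt(cwnd, ssthresh, loss):
    """Evolve the congestion window over one round trip (segment units).

    On loss: remember half the window as the threshold and restart slow
    start from one segment. Otherwise: slow start doubles the window up
    to ssthresh, then additive increase adds one segment per round trip.
    """
    if loss:
        return 1, max(cwnd // 2, 1)          # multiplicative decrease + restart
    if cwnd < ssthresh:
        return min(cwnd * 2, ssthresh), ssthresh   # slow start
    return cwnd + 1, ssthresh                # congestion avoidance

cwnd, ssthresh = 1, 16
trace = []
for loss in [False] * 5 + [True] + [False] * 4:
    cwnd, ssthresh = aimd_rtt(cwnd, ssthresh, loss)
    trace.append(cwnd)
# trace == [2, 4, 8, 16, 17, 1, 2, 4, 8, 9]: exponential growth, a loss,
# then slow start up to the new threshold (8) followed by additive increase.
```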

11.16 Fast Recovery and other response modifications
A solution known as TCP Friendly Rate Control (TFRC) was proposed. TFRC attempts to emulate TCP behavior by having a UDP receiver report datagram loss back to the sender and by having the sender use the reported loss to compute a rate at which UDP datagrams should be sent; TFRC has only been adopted for special cases.

11.17 Explicit Feedback mechanisms (SACK and ECN) 

11.17.1 Selective Acknowledgement (SACK) 
It allows a sender to know exactly which segments to retransmit. TCP includes two options for SACK. The first option is used when the connection is established to allow a sender to specify that SACK is permitted. The second option is used by a receiver when sending an acknowledgement to include information about specific blocks of data that were received. The information for each block includes the first sequence number in a block (called the left edge) and the sequence number immediately beyond the block (called the right edge). 

11.17.2 Explicit Congestion Notification 
This mechanism requires routers throughout an internet to notify TCP as congestion occurs. As a TCP segment passes through the internet, routers along the path use a pair of bits in the IP header to record congestion. Thus, when a segment arrives, the receiver knows whether the segment experienced congestion at any point.

ECN uses two bits in the IP header to allow routers to record congestion, and uses two bits in the TCP header (taken from the reserved area) to allow the sending and receiving TCP to communicate. One of the TCP header bits is used by a receiver to send congestion information back to a sender; the other bit allows a sender to inform the receiver that the congestion notification has been received.

11.18 Congestion, Tail Drop and TCP 
Tail-Drop Policy For Routers: if a packet queue is filled when a datagram must be placed on the queue, discard the datagram. The name tail-drop arises from the effect of the policy on an arriving sequence of datagrams. Once the queue fills, the router begins discarding all additional datagrams.

Tail-drop has an interesting effect on TCP. In the simple case where datagrams 
traveling through a router carry segments from a single TCP connection, the loss causes TCP to enter slow-start, which reduces throughput until TCP begins receiving ACKs and increases the congestion window. A more severe problem can occur, however, when the datagrams traveling through a router carry segments from many TCP connections because tail-drop can cause global synchronization.

11.19  Random Early Detection (RED) 
How can a router avoid global synchronization? RED is the answer.

Known as Random Early Drop or Random Early Discard, the scheme is more frequently referred to by its acronym, RED. The general idea behind RED lies in randomization: instead of waiting until a queue fills completely, a router monitors the queue size. As the queue begins to fill, the router chooses datagrams at random to drop.

A router uses two threshold values to mark positions in the queue: Tmin and Tmax.

The general operation of RED can be described by three rules that determine the disposition of a datagram that must be placed in the queue:


* If the queue currently contains fewer than Tmin datagrams, add the new datagram to the queue.
* If the queue contains more than Tmax datagrams, discard the new datagram.
* If the queue contains between Tmin and Tmax datagrams, randomly discard the datagram with a probability, p, that depends on the current queue size.

RED Policy For Routers: if the input queue is full when a datagram arrives, discard the datagram; if the input queue is below a minimum threshold, add the datagram to the queue; otherwise, discard the datagram with a probability that depends on the queue size.

However, a router should not drop datagrams unnecessarily, because doing so has a negative impact on TCP throughput. RED computes a weighted average queue size, avg, and uses the average size to determine the probability.
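The three rules can be written as a single decision function over the average queue size. This sketch assumes a linear probability ramp between Tmin and Tmax reaching pmax at Tmax (one common formulation); the function name, pmax default, and injectable random source are assumptions for the example.

```python
import random

def red_decision(avg_queue, tmin, tmax, pmax=0.1, rng=random.random):
    """RED disposition for an arriving datagram, based on the weighted
    average queue size. Returns True if the datagram should be dropped."""
    if avg_queue < tmin:
        return False                        # queue short: always enqueue
    if avg_queue >= tmax:
        return True                         # queue long: always discard
    p = pmax * (avg_queue - tmin) / (tmax - tmin)
    return rng() < p                        # random early drop with probability p

# Between the thresholds the drop probability grows with the average
# queue size, so heavier senders are statistically more likely to see
# a loss first, which avoids global synchronization.
```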

Both analysis and simulations show that RED works well. It handles congestion, avoids the synchronization that results from tail-drop, and allows short bursts without dropping datagrams unnecessarily. Consequently, the IETF now recommends that routers implement RED.

11.20 Establishing a TCP Connection
To establish a connection, TCP uses a three-way handshake.
The first segment of a handshake can be identified because it 
has the SYN bit set in the code field.
The second message has both the SYN and 
ACK bits set to indicate that it acknowledges the first SYN segment and continues the handshake.
The final handshake message is only an acknowledgement and is merely 
used to inform the destination that both sides agree that a connection has been established.

11.21 Initial Sequence Numbers

The three-way handshake accomplishes two important functions. It guarantees that both sides are ready to transfer data (and that they know they are both ready) and it allows both sides to agree on initial sequence numbers. Sequence numbers are sent and acknowledged during the handshake.

11.22 Closing a TCP Connection 
When an application program tells TCP that it has no more data to send, TCP will close the connection in one direction. To close its half of a connection, the sending TCP finishes transmitting the remaining data, waits for the receiver to acknowledge it, and then sends a segment with the FIN bit set. Upon receipt of a FIN, TCP sends an acknowledgement and then informs the application that the other side has finished sending data.

Once a connection has been closed in a given direction, TCP refuses to accept more data for that direction. Meanwhile, data can continue to flow in the opposite direction until the sender closes it. Of course, a TCP endpoint that is still receiving data must send acknowledgements, even if the data transmission in the reverse direction has terminated. When both directions have been closed, the TCP software at each endpoint deletes its record of the connection.


11.23 TCP Connection Reset
Sometimes abnormal conditions arise that force an application or the network software to break a connection without a graceful shutdown. TCP provides a reset facility to handle abnormal disconnections.

To reset a connection, one side initiates termination by sending a segment with the RST (RESET) bit in the CODE field set. The other side responds to a reset segment immediately by aborting the connection. When a reset occurs, TCP informs any local application that was using the connection.

11.24 TCP State Machine
Like most protocols, the operation of TCP can best be explained with a theoretical model called a finite state machine.

LISTEN (server) represents waiting for a connection request from any remote TCP and port.

SYN-SENT (client) represents waiting for a matching connection request after having sent a connection request.

SYN-RECEIVED (server) represents waiting for a confirming connection request acknowledgment after having both received and sent a connection request.

ESTABLISHED (both server and client) represents an open connection, data received can be delivered to the user. The normal state for the data transfer phase of the connection.

FIN-WAIT-1 (both server and client) represents waiting for a connection termination request from the remote TCP, or an acknowledgment of the connection termination request previously sent.

FIN-WAIT-2 (both server and client) represents waiting for a connection termination request from the remote TCP.

CLOSE-WAIT (both server and client) represents waiting for a connection termination request from the local user.

CLOSING (both server and client) represents waiting for a connection termination request acknowledgment from the remote TCP.

LAST-ACK (both server and client) represents waiting for an acknowledgment of the connection termination request previously sent to the remote TCP (which includes an acknowledgment of its connection termination request).

TIME-WAIT (either server or client) represents waiting for enough time to pass to be sure the remote TCP received the acknowledgment of its connection termination request. A connection stays in TIME-WAIT for twice the maximum segment lifetime (2 MSL), commonly about four minutes.

CLOSED (both server and client) represents no connection state at all.


11.25 Forcing Data Delivery 
TCP provides a push operation that an application can use to force delivery of octets currently in the stream without waiting for the buffer to fill. The push operation does more than force the local TCP to send a segment. It also requests TCP to set the PSH bit in the segment code field, so the data will be delivered to the application program on the receiving end.

11.26 Reserved TCP Port Numbers 
TCP uses a combination of statically and dynamically assigned protocol port numbers.

11.27 Silly Window Syndrome and Small Packets 
Transferring small segments unnecessarily consumes network bandwidth and introduces computational overhead. Small segments consume more network bandwidth per octet of data than large segments because each datagram has a header.

The problem of TCP sending small segments became known as the silly window syndrome (SWS). Early TCP implementations were plagued by SWS: each acknowledgement advertised a small amount of available space, and each segment carried a small amount of data.

11.28 Avoiding Silly Window Syndrome 

11.28.1 Receive-Side Silly Window Avoidance

Receive-Side Silly Window Avoidance: before sending an updated window advertisement after advertising a zero window, wait for space to become available that is either at least 50% of the total buffer size or equal to a maximum sized segment.

11.28.2 Delayed Acknowledgements
TCP delays sending an acknowledgement when silly window avoidance specifies that the window is not sufficiently large to advertise. To avoid potential problems, the TCP standards place a limit on the time TCP delays an acknowledgement. Implementations cannot delay an acknowledgement for more than 500 milliseconds.

11.28.3 Send-Side Silly Window Avoidance
A sending TCP must delay sending a segment until it can accumulate a reasonable amount of data. The technique is known as clumping.

Send-Side Silly Window Avoidance: when a sending application generates additional data to be sent over a connection for which previous data has been transmitted but not acknowledged, place the new data in the output buffer as usual, but do not send additional segments until there is sufficient data to fill a maximum-sized segment. If still waiting to send when an acknowledgement arrives, send all data that has accumulated in the buffer. Apply the rule even when the user requests a push operation.
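The rule above can be sketched as a simplified, Nagle-style algorithm. The Sender class, the MSS value, and the method names below are illustrative assumptions, not code from the book:

```python
MSS = 1460  # assumed maximum segment size in octets

class Sender:
    def __init__(self):
        self.buffer = b""        # data queued but not yet sent
        self.unacked = False     # True while sent data awaits an ACK

    def app_write(self, data: bytes):
        """Application hands data to TCP."""
        self.buffer += data
        # Send immediately only if nothing is outstanding or a full
        # segment has accumulated; otherwise wait (clump the data).
        if not self.unacked or len(self.buffer) >= MSS:
            self._send_segment()

    def ack_received(self):
        """An acknowledgement arrives: flush whatever has accumulated."""
        self.unacked = False
        if self.buffer:
            self._send_segment()

    def _send_segment(self):
        segment, self.buffer = self.buffer[:MSS], self.buffer[MSS:]
        self.unacked = True
        # (a real TCP would transmit `segment` here)
```

The key design point is that the delay is adaptive: data is never held once an acknowledgement arrives, so a fast network clumps little and a slow network clumps a lot.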

In summary, to overcome silly window syndrome, TCP now requires the sender and receiver to implement heuristics that avoid it. A receiver avoids advertising a small window, and a sender uses an adaptive scheme to delay transmission so it can clump data into large segments.

Tuesday, June 24, 2014

Internetworking with TCP/IP Notes (Chapter 10)

Chapter 10 User Datagram Protocol (UDP)

10.1 Using a protocol port as an ultimate destination
Each machine contains a set of abstract destination points called protocol ports. Each protocol port is identified by a positive integer. The local operating system provides an interface mechanism that processes use to specify a port or access it.

Most operating systems provide synchronous access to ports. From an application’s point of view, synchronous access means the computation stops when the application accesses the port: the application waits until a message arrives.

Each message carries two protocol port numbers: a destination port number specifies a port on the destination computer to which the message has been sent, and a source port number specifies a port on the sending machine from which the message has been sent. The destination needs the source port number to generate a reply and send it back.

10.2 The User Datagram Protocol
The User Datagram Protocol (UDP) provides an unreliable, best-effort, connectionless delivery service using IP to transport messages between machines. UDP uses IP to carry messages, but adds the ability to distinguish among multiple destinations within a given host computer.

10.3 UDP Message Format
The entire UDP header occupies a total of only eight octets.


UDP SOURCE PORT field contains a 16-bit protocol port number used by the sending application. This field is optional.
UDP DESTINATION PORT field contains the 16-bit UDP protocol port number of the receiving application.
UDP MESSAGE LENGTH field contains a count of octets in the UDP datagram, including the UDP header and the user data. Thus, the minimum value is eight, the length of the header alone. The field consists of sixteen bits, which means the maximum value that can be represented is 65,535.
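To make the layout concrete, the four 16-bit header fields can be packed and unpacked with Python's struct module; the function name and the example port numbers are arbitrary choices for illustration:

```python
import struct

def build_udp_header(src_port, dst_port, payload, checksum=0):
    """Pack the four 16-bit UDP header fields in network byte order."""
    length = 8 + len(payload)            # 8-octet header plus data
    return struct.pack("!HHHH", src_port, dst_port, length, checksum)

header = build_udp_header(50000, 53, b"query")
src, dst, length, csum = struct.unpack("!HHHH", header)
# length is 13: the 8-octet header plus 5 octets of data
```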

10.4 Interpretation of the UDP checksum
For IPv6, the UDP checksum is required. For IPv4, the UDP checksum is optional and need not be used at all; a value of zero in the CHECKSUM field means that no checksum has been computed (a computed checksum of zero is transmitted as all ones). Recall, however, that IP does not compute a checksum on the data portion of an IP datagram.

10.5 UDP Checksum Computation and the Pseudo-Header
The purpose of using a pseudo-header is to verify that a UDP datagram has reached its correct destination. It is important to understand that a pseudo-header is only used for the checksum computation.

The UDP header itself specifies only the protocol port number. Thus, to verify the destination, UDP includes the destination IP address in the checksum as well as the UDP header. At the ultimate destination, UDP software verifies the checksum using the destination IP address obtained from the header of the IP datagram that carried the UDP message.

10.6 IPv4 UDP Pseudo-Header Format
The pseudo-header used in the UDP checksum computation for IPv4 consists of 12 octets of data, arranged as the figure below shows.


SOURCE IP ADDRESS and DESTINATION IP ADDRESS contain the source and destination IPv4 addresses that will be placed in an IPv4 datagram when sending the UDP message. Field PROTO contains the IPv4 protocol type code (17 for UDP).
Field labeled UDP LENGTH contains the length of the UDP datagram (not including the pseudo-header).

To verify the checksum, the receiver must extract these fields from the IPv4 header, assemble them into the pseudo-header format, and compute the checksum.
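The arithmetic can be sketched in Python; checksum16 implements the standard Internet one's-complement checksum, and the function names here are illustrative:

```python
import socket
import struct

def checksum16(data: bytes) -> int:
    """Standard Internet one's-complement checksum over 16-bit words."""
    if len(data) % 2:
        data += b"\x00"                       # pad odd-length data
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:                        # fold carry bits back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def udp_checksum_ipv4(src_ip, dst_ip, udp_segment):
    """Checksum over the 12-octet pseudo-header plus the UDP segment."""
    pseudo = struct.pack("!4s4sBBH",
                         socket.inet_aton(src_ip),
                         socket.inet_aton(dst_ip),
                         0, 17, len(udp_segment))  # zero, PROTO=17, UDP LENGTH
    return checksum16(pseudo + udp_segment)
```

A useful property of the one's-complement sum: when the correct checksum is stored in the CHECKSUM field, recomputing over the whole segment yields zero, which is how a receiver verifies it.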

10.7 IPv6 UDP Pseudo-Header Format
The pseudo-header used in the UDP checksum computation for IPv6 consists of 40 octets of data, arranged as the figure below shows.

The pseudo-header for IPv6 uses IPv6 source and destination addresses. The other changes from IPv4 are that the PROTO field is replaced by the NEXT HEADER field and the order of fields has changed.

10.8 UDP Encapsulation and Protocol Layering
UDP lies in the transport layer above the internet layer.
Conceptually, applications access UDP, which uses IP to send and receive datagrams.

That is, because UDP is layered above IP, a complete UDP message, including the UDP header and payload, is encapsulated in an IP datagram as it travels across an internet. Of course, the datagram is encapsulated in a network frame as it travels across an underlying network, which means there are two levels of encapsulation.

Two levels of encapsulation used when a UDP message travels in an IP datagram, which travels in a network frame

The IP layer is responsible only for transferring data between a pair of hosts on an internet, while the UDP layer is responsible only for differentiating among multiple sources or destinations within one host. Thus, only the IP header identifies the source and destination hosts; only the UDP header identifies the source or destination ports within a host.

10.9 UDP Multiplexing, Demultiplexing and Protocol Ports
UDP software provides another example of multiplexing and demultiplexing.

* Multiplexing occurs on output. On a given host computer, multiple applications can use UDP simultaneously.

* Demultiplexing occurs on input. We can envision UDP accepting incoming UDP datagrams from IP, choosing the application to which the datagram has been sent, and passing the data to the application.
Conceptually, only the destination port number is needed to handle demultiplexing. When it processes an incoming datagram,  UDP accepts the datagram from the IP software, extracts the UDP DESTINATION PORT from the header, and passes the data to the application. 

It checks to see that the destination port number matches one of the ports currently in use. If it finds a match, UDP enqueues the new datagram at the port, where the application program can access it. If none of the allocated ports matches the incoming datagram, UDP sends an ICMP message to inform the source that the port was unreachable and discards the datagram.
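Conceptually, this demultiplexing step is a table lookup keyed by destination port. The sketch below is illustrative only; the port table, the return strings, and the function name are assumptions, not a real implementation:

```python
import struct

ports = {53: [], 123: []}     # allocated ports -> per-port message queues

def udp_input(segment: bytes):
    """Accept a UDP segment from IP and demultiplex on DESTINATION PORT."""
    _src, dst, _length, _csum = struct.unpack("!HHHH", segment[:8])
    queue = ports.get(dst)
    if queue is None:
        return "ICMP port unreachable"   # no matching port: report and discard
    queue.append(segment[8:])            # enqueue the payload for the app
    return "delivered"
```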


10.10 Reserved and Available UDP port numbers

The port numbers in the range from 0 to 1023 are the well-known ports. They are used by system processes that provide widely used types of network services.

The range of port numbers from 1024 to 49151 are the registered ports; they are registered with IANA. On most systems, registered ports can be used by ordinary users.

The range of port numbers from 49152 to 65535 are private (dynamic) ports, available for any use.
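As a quick illustration of dynamic ports, binding to port 0 asks the operating system to choose an ephemeral port; the exact range assigned varies by system:

```python
import socket

# Binding to port 0 requests an OS-assigned ephemeral (dynamic) port;
# many systems draw it from 49152-65535, but the range varies.
s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
s.bind(("127.0.0.1", 0))
addr, port = s.getsockname()
s.close()
```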

Internetworking with TCP/IP Notes (Chapter 9)

Chapter 9 Internet Protocol: Error and Control Messages (ICMP)

9.1 The Internet Control Message Protocol
The Internet Control Message Protocol allows routers to send error or control messages back to the source of a datagram that caused a problem. ICMP messages are not usually delivered to applications. ICMP messages are sent to Internet Protocol software on the source computer.

9.2 Error Reporting vs Error Correction

When a datagram causes an error, ICMP can only report the error condition back to the original source of the datagram; the source must relate the error to an individual application program or take other action to correct the problem.

9.3 ICMP Message Delivery
ICMP messages travel across the internet in the payload area of IP datagrams. Because each ICMP message travels in an IP datagram, two levels of encapsulation are required.
Each ICMP message travels across an internet in the payload portion of an IP datagram, which itself travels across an underlying network in the payload portion of a frame. IPv4 uses the PROTOCOL field in the datagram header as a type field; a value of 1 indicates that the payload is an ICMP message. When an ICMP message is carried in the payload area of an IPv6 datagram, the NEXT HEADER field of the header that precedes the ICMP message contains 58.

9.4 Conceptual Layering
Although each ICMP message is encapsulated in an IP datagram, ICMP is not considered a higher-level protocol. Instead, ICMP is a required part of IP, which means ICMP is classified as a Layer 3 protocol.ICMP must send error reports to the original source, so an ICMP message must travel across multiple underlying networks to reach its final destination. Thus, ICMP messages cannot be delivered by a Layer 2 transport alone.

9.5 ICMP Message Format
The standards define two sets of ICMP messages: a set for IPv4 and a larger set for IPv6. In both versions of IP, each ICMP message has its own format. However, all ICMP messages begin with the same three fields.


An ICMP message begins with an 8-bit integer ICMP message TYPE field. The TYPE field identifies the specific ICMP message that follows.

An 8-bit CODE field in an ICMP message provides further information about the message type. The third field in each ICMP message consists of a 16-bit CHECKSUM that is computed over the entire ICMP message.

The message body in an ICMP message depends entirely on the ICMP type. However, for ICMP messages that report an error, the message body always includes the IP header plus additional octets from the datagram that caused the problem.

9.6 Example ICMP Message Types Used with IPv4 and IPv6

ICMPv4 message types and the meaning of each.
Values not listed are unassigned or reserved.

Well-known ICMPv4 types are 0 and 8 (echo reply and echo request, used by ping), 3 (destination unreachable), 5 (redirect), and 11 (time exceeded).


ICMPv6 message types and the meaning of each.
Values not listed are unassigned or reserved.

IPv6 incorporates three major subsystems into ICMP: the Neighbor Discovery Protocol, Multicast support and IP mobility. ICMP messages have been defined for each of the subsystems.

9.7 Echo Request and Reply Message Format
Both IPv4 and IPv6 use a single format for all ICMP Echo Request and Echo Reply messages.

For IPv4, the TYPE is 8 in a request and 0 in a reply. For IPv6, the TYPE is 128 in a request and 129 in a reply. For any value in the TYPE field, the CODE is zero (i.e., echo requests and replies do not use the code field). Fields IDENTIFIER and SEQUENCE NUMBER are used by the sender to match replies to requests. A receiving ICMP does not interpret the two fields, but does return the same values in the reply that were found in the request.

The field labeled OPTIONAL DATA is a variable length field that contains data to be returned to the sender. An echo reply always returns exactly the same data as was received in the request.
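A minimal Python sketch of building an IPv4 echo request (TYPE 8, CODE 0); the helper names are illustrative, and the checksum is the standard Internet checksum computed over the entire ICMP message:

```python
import struct

def checksum16(data: bytes) -> int:
    """Standard Internet one's-complement checksum over 16-bit words."""
    if len(data) % 2:
        data += b"\x00"                  # pad odd-length data
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:                   # fold carry bits back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def build_echo_request(ident, seq, data=b""):
    """ICMPv4 echo request: TYPE=8, CODE=0, CHECKSUM over whole message."""
    header = struct.pack("!BBHHH", 8, 0, 0, ident, seq)   # checksum zeroed
    csum = checksum16(header + data)
    return struct.pack("!BBHHH", 8, 0, csum, ident, seq) + data

msg = build_echo_request(0x1234, 1, b"ping")
```

Note that the checksum is computed with the CHECKSUM field set to zero and then inserted; a receiver that checksums the complete message gets zero if it arrived intact.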

9.8 Reports of Unreachable Destinations
Whenever an error prevents a router from forwarding or delivering a datagram, the router sends an ICMP destination unreachable message back to the source and then drops (i.e., discards) the datagram. Both IPv4 and IPv6 use the same format for destination unreachable messages.





Although they use the same message format, the way IPv4 and IPv6 interpret fields in the message differs slightly. IPv4 sets the TYPE to 3, and IPv6 sets the TYPE to 1. The CODE field contains an integer that further describes the problem; codes for IPv4 and IPv6 differ.

The CODE values for an ICMP destination unreachable message.

9.9 ICMP Error Reports regarding Fragmentation
IPv4 sends a destination unreachable message with the CODE field set to 4, and IPv6 sends a packet too big message, which has a TYPE field of 2. In IPv4, a router can fragment the packet, but it is prohibited from doing so when the DF bit is set.

The reason IPv6 defines a separate ICMP message to report fragmentation problems is that routers are always prohibited from fragmenting an IPv6 datagram. A key part of path MTU discovery involves receiving information about the MTU of remote networks.

9.10 Route Change Requests from Routers
When a router detects a host using a nonoptimal first hop, the router sends the host an ICMP redirect message that instructs the host to change its forwarding table. The router also forwards the original datagram on to its destination.

The message begins with the requisite TYPE, CODE, and CHECKSUM fields. The message further contains two pieces of information: the IP address of a router to use as a first hop and the destination address that caused the problem. The message formats differ.

An IPv4 redirect message contains the 32-bit IPv4 address of a router followed by the prefix of the datagram that was incorrectly forwarded.

An IPv6 redirect message contains the IPv6 address of a router and the IPv6 destination address that should be forwarded through the router.

As a general rule, routers only send ICMP redirect requests to hosts and not to other routers.

9.11 Detecting Circular or Excessively Long Routes
A router does not merely discard a datagram that has exceeded its hop limit. Instead, a router takes the further action of sending the source an ICMP time exceeded message.

ICMP uses the CODE field in a time exceeded message to explain the nature of the timeout being reported, as the figure below shows.



9.12 Reporting other problems
When a router or host finds problems with a datagram not covered by previous ICMP error messages (e.g., an incorrect datagram header), it sends a parameter problem message to the original source.