Road to CCIE R&S: Internetworking with TCP/IP Notes (Chapter 11)

Chapter 11 Reliable Stream Transport Service (TCP)

11.1 Properties of the Reliable Delivery Service
The reliable transfer service that TCP provides to applications can be characterized by five features that are discussed below:

* Stream Orientation
* Virtual Circuit Connection
* Buffered Transfer
* Unstructured Stream
* Full Duplex Communication

Stream Orientation. When two application programs use TCP to transfer large volumes of data, the data is viewed as a stream of octets. The application on the destination host receives exactly the same sequence of octets that was sent by the application on the source host.

Virtual Circuit Connection. Before data transfer starts, both the sending and receiving applications must agree to establish a TCP connection. TCP monitors data transfer; if communication fails for any reason, the application programs are informed.

Buffered Transfer. To make transfer more efficient and to minimize network traffic, implementations usually collect enough data from a stream to fill a reasonably large datagram before transmitting it across an internet.

Unstructured Stream. The TCP/IP stream service does not provide structured data streams. Application programs using the stream service must understand stream content and agree on a stream format before they initiate a connection.

Full Duplex Communication. Connections provided by the TCP/IP stream service allow concurrent transfer in both directions. The advantage of a full duplex connection is that the underlying protocol software can send control information for one stream back to the source in datagrams carrying data in the opposite direction.

11.2 Reliability: Acknowledgements and Retransmission

TCP relies on a fundamental technique known as positive acknowledgement with retransmission (PAR) to ensure reliability. The technique requires a recipient to communicate with the source, sending back an acknowledgement (ACK) each time data arrives successfully. When it sends a packet, the sending software starts a timer. If an acknowledgement arrives before the timer expires, the sender cancels the timer and prepares to send more data. If the timer expires before an acknowledgement arrives, the sender retransmits the packet.

A sender must retain a copy of a packet that has been transmitted in case the packet must be retransmitted. In practice, a sender only needs to retain the data that goes in the packet along with sufficient information to allow the sender to reconstruct the packet headers. The idea of keeping unacknowledged data is important in TCP.

Reliable protocols detect duplicate packets by assigning each packet a sequence number and requiring the receiver to remember which sequence numbers it has received. To avoid ambiguity, positive acknowledgement protocols arrange for each acknowledgement to contain the sequence number of the packet that arrived.

Figure: timeout and retransmission when a packet is lost.
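The timeout and retransmission behaviour can be sketched as a small simulation. This is only an illustration under simplifying assumptions: the sender transmits one packet at a time, and the set `lost_attempts` is a hypothetical stand-in for a network that drops particular transmissions.

```python
def send_with_retransmission(packets, lost_attempts):
    """Simulate positive acknowledgement with retransmission.

    packets:       payloads to deliver in order
    lost_attempts: set of (sequence_number, attempt) pairs that the
                   simulated network drops (hypothetical test input)
    Returns a log of (event, sequence_number) tuples.
    """
    log = []
    for seq, _payload in enumerate(packets):
        attempt = 0
        while True:
            attempt += 1
            log.append(("send", seq))         # sender keeps a copy and starts a timer
            if (seq, attempt) in lost_attempts:
                log.append(("timeout", seq))  # timer expired before any ACK arrived
                continue                      # retransmit the retained copy
            log.append(("ack", seq))          # the ACK carries the sequence number
            break
    return log


# The first transmission of packet 1 is lost, so it is sent twice.
log = send_with_retransmission(["a", "b"], lost_attempts={(1, 1)})
```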

11.3 The Sliding Window Paradigm
A simple positive acknowledgement protocol wastes a substantial amount of network capacity because it must delay sending a new packet until it receives an acknowledgement for the previous packet.

The sliding window technique uses a more complex form of positive acknowledgement and retransmission. The key idea is that a sliding window allows a sender to transmit multiple packets before waiting for an acknowledgement. The protocol places a small, fixed-size window on the sequence and transmits all packets that lie inside the window.

Technically, the number of packets that can be unacknowledged at any given time is constrained by the window size, which is limited to a small, fixed number.

The window partitions the sequence of packets into three sets:
* those packets to the left of the window have been successfully transmitted, received, and acknowledged;
* those packets to the right have not yet been transmitted; and
* those packets that lie in the window are being transmitted.

The lowest numbered packet in the window is the first packet in the sequence that has not been acknowledged.
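As a rough sketch of the three sets, the functions below partition packet numbers around a window. The names `partition` and `slide` are illustrative only, and packets are numbered from zero for convenience.

```python
def partition(num_packets, window_start, window_size):
    """Split packet numbers 0..num_packets-1 into the three sets."""
    window_end = min(window_start + window_size, num_packets)
    acknowledged = list(range(0, window_start))           # left of the window
    in_window = list(range(window_start, window_end))     # being transmitted
    not_yet_sent = list(range(window_end, num_packets))   # right of the window
    return acknowledged, in_window, not_yet_sent


def slide(window_start, acked_packet):
    """Advance the window only when its lowest-numbered packet is acknowledged."""
    return window_start + 1 if acked_packet == window_start else window_start


acknowledged, in_window, not_yet_sent = partition(10, window_start=3, window_size=4)
```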

11.4 The Transmission Control Protocol
TCP is a communication protocol, not a piece of software. The protocol specifies the format of the data and acknowledgements that two computers exchange to achieve a reliable transfer, as well as the procedures the computers use to ensure that the data arrives correctly.

It specifies how TCP software distinguishes among multiple destinations on a given machine, and how communicating machines recover from errors like lost or duplicated packets. The protocol also specifies how two computers initiate a TCP connection and how they agree when it is complete.

11.5 Layering, Ports, Connections and Endpoints
TCP, which resides in the transport layer just above IP, allows multiple application programs on a given computer to communicate concurrently, and it demultiplexes incoming TCP traffic among the applications.

Like the User Datagram Protocol, TCP uses protocol port numbers to identify application programs. Also like UDP, a TCP port number is sixteen bits long. However, TCP ports are more complex than UDP ports because a single port number does not identify an application. Instead, TCP has been designed on a connection abstraction in which the objects to be identified are TCP connections, not individual ports.

TCP uses the connection, not the protocol port, as its fundamental abstraction; connections are identified by a pair of endpoints.

TCP defines an endpoint to be a pair of integers (host, port), where host is the IP address for a host and port is a TCP port on that host. Because TCP identifies a connection by a pair of endpoints, a given TCP port number can be shared by multiple connections on the same machine.
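One way to picture this is a table keyed by the pair of endpoints. The sketch below is a simplification, not how any particular TCP implementation stores connections; the addresses come from the documentation ranges and the handler names are made up.

```python
# Connection table keyed by (local endpoint, remote endpoint).
connections = {}


def register(local, remote, handler):
    connections[(local, remote)] = handler


def demultiplex(local, remote):
    """Find the handler for an incoming segment, or None if no connection matches."""
    return connections.get((local, remote))


server = ("192.0.2.10", 25)  # one local endpoint (host, port)

# Two connections share local port 25 because their remote endpoints differ.
register(server, ("198.51.100.7", 1069), "mail-session-A")
register(server, ("203.0.113.9", 1184), "mail-session-B")
```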

11.6 Passive and Active Opens
TCP is a connection-oriented protocol that requires both endpoints to agree to participate.

The application program on one end performs a passive open by contacting the local operating system and indicating that it will accept an incoming connection for a specific port number.

The application program on the other end can then perform an active open by requesting that a TCP connection be established.
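With the standard Python socket module, the two kinds of open look roughly like this. It is a minimal sketch that assumes a working loopback interface and runs both ends in one process purely for demonstration.

```python
import socket


def open_loopback_connection():
    # Passive open: tell the operating system we will accept connections.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))   # port 0 asks the OS to pick a free port
    listener.listen(1)

    # Active open: request that a connection be established.
    client = socket.create_connection(listener.getsockname())
    client_addr = client.getsockname()

    # The listener returns a new socket that represents the connection.
    server_side, peer = listener.accept()

    client.sendall(b"hello")
    data = b""
    while len(data) < 5:              # a stream may deliver fewer octets per call
        chunk = server_side.recv(5 - len(data))
        if not chunk:
            break
        data += chunk

    for s in (client, server_side, listener):
        s.close()
    return peer, client_addr, data


peer, client_addr, data = open_loopback_connection()
```

The address that `accept` reports for the peer is the client's endpoint, which together with the listener's endpoint identifies the connection.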

11.7 Segments, Streams and Sequence Numbers
TCP views the data stream as a sequence of octets that it divides into segments for transmission. The TCP form of a sliding window protocol also solves the end-to-end flow control problem by allowing the receiver to restrict transmission until it has sufficient buffer space to accommodate more data.

TCP keeps three pointers for each connection to define the sliding window. The first pointer marks the left of the sliding window, separating octets that have been sent and acknowledged from octets yet to be acknowledged.
A second pointer marks the right of the sliding window and defines the highest octet in the sequence that can be sent before more acknowledgements are received.
The third pointer marks the boundary inside the window that separates those octets that have already been sent from those octets that have not been sent.

The protocol software sends all octets in the window without delay, so the boundary inside the window usually moves from left to right quickly.

We think of the transfers as completely independent because at any time data can flow across the connection in one direction, or in both directions. Thus, TCP software on a computer maintains two windows per connection: one window slides along as the data stream is sent, while the other slides along as data is received.

11.8 Variable Window Size and Flow Control
Each acknowledgement, which specifies how many octets have been received, contains a window advertisement that specifies how many additional octets of data the receiver is prepared to accept beyond the data being acknowledged. In response to an increased window advertisement, the sender increases the size of its sliding window and proceeds to send octets that have not been acknowledged.

TCP software must not contradict previous advertisements by shrinking the window past previously acceptable positions in the octet stream. Instead, smaller advertisements accompany acknowledgements, so the window size only changes at the time it slides forward.
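The sender's side of this scheme can be sketched with the three pointers from section 11.7. The class below is an illustration under simplifying assumptions: octets are numbered from zero, sequence numbers never wrap, and the names `SendWindow`, `left`, `next_to_send`, and `right` are invented for the example.

```python
class SendWindow:
    def __init__(self, initial_window):
        self.left = 0                 # first octet sent but not yet acknowledged
        self.next_to_send = 0         # boundary inside the window
        self.right = initial_window   # first octet that may not be sent yet

    def sendable(self):
        return self.right - self.next_to_send

    def send(self, count):
        """Send up to count octets, limited by the window; return how many went out."""
        count = min(count, self.sendable())
        self.next_to_send += count
        return count

    def on_ack(self, ack_number, advertised_window):
        # The acknowledgement slides the left edge forward.
        self.left = max(self.left, ack_number)
        # The advertisement is measured from the acknowledged position.
        # The right edge never moves backward, so a smaller advertisement
        # takes effect only as the window slides forward.
        self.right = max(self.right, self.left + advertised_window)


w = SendWindow(initial_window=1000)
sent = w.send(1500)                   # only 1000 octets fit in the window
w.on_ack(ack_number=600, advertised_window=400)
```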

Having a mechanism for flow control is essential in an environment where computers of various speeds and sizes communicate through networks and routers of various speeds and capacities. Flow control actually involves two independent problems.

1. Protocols need to provide end-to-end flow control between the source and ultimate destination.

2. A mechanism is needed that allows intermediate systems (i.e., routers) to control a source that sends more traffic than the machine can tolerate. When intermediate machines become overloaded, the condition is called congestion.

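TCP uses its sliding window scheme to solve the end-to-end flow control problem. Congestion is handled separately: TCP itself reduces its transmission rate when it detects loss or delay (section 11.15), while routers decide which datagrams to discard using queue policies such as tail drop (section 11.18) and RED (section 11.19).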

11.9 TCP Segment Format
The unit of transfer between the TCP software on two machines is called a segment. Segments are exchanged to establish a connection, transfer data, send acknowledgements, advertise window sizes, and close connections. Because TCP allows piggybacking, an acknowledgement traveling from computer A to computer B may travel in the same segment as data traveling from computer A to computer B, even though the acknowledgement refers to data sent from B to A.

The TCP header consists of at least 20 octets and may contain more if the segment carries options. The maximum is 60 octets.

Fields SOURCE PORT and DESTINATION PORT contain the TCP port numbers that identify the application programs at the ends of the connection.

The SEQUENCE NUMBER field identifies the position in the sender’s octet stream of the data in the segment.

The ACKNOWLEDGEMENT NUMBER field identifies the number of the octet that the source of the segment expects to receive next.
Note that the sequence number refers to the stream flowing in the same direction as the segment, while the acknowledgement number refers to the stream flowing in the opposite direction from the segment.

The HLEN field contains an integer that specifies the length of the segment header measured in 32-bit multiples. It is needed because the OPTIONS field varies in length, depending on which options are included.

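The 6-bit field marked RESERVED is reserved for future use. TCP software uses the 6-bit field labeled CODE BITS to determine the purpose and contents of the segment. From left to right, the six bits are URG (the urgent pointer field is valid), ACK (the acknowledgement field is valid), PSH (the segment requests a push), RST (reset the connection), SYN (synchronize sequence numbers), and FIN (the sender has reached the end of its octet stream).

TCP software advertises how much data it is willing to accept every time it sends a segment by specifying its buffer size in the WINDOW field. The field contains a 16-bit unsigned integer, so the maximum window is 65,535 octets (just under 64 Kbytes).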

Window advertisements provide an example of piggybacking because they accompany all segments, including those carrying data as well as those carrying only an acknowledgement.
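To make the layout concrete, the fixed 20-octet part of the header can be decoded with Python's `struct` module. This is a sketch only: it ignores options, does not verify the checksum, and the sample segment is fabricated for the example.

```python
import struct

# Code bits, listed from the least significant bit upward.
CODE_BITS = ["FIN", "SYN", "RST", "PSH", "ACK", "URG"]


def parse_tcp_header(segment):
    """Decode the fixed 20-octet part of a TCP header (network byte order)."""
    (source_port, destination_port, sequence_number, acknowledgement_number,
     hlen_and_bits, window, checksum, urgent_pointer) = struct.unpack(
        "!HHIIHHHH", segment[:20])
    hlen = hlen_and_bits >> 12    # top 4 bits: header length in 32-bit words
    code = hlen_and_bits & 0x3F   # low 6 bits: code bits
    return {
        "source_port": source_port,
        "destination_port": destination_port,
        "sequence_number": sequence_number,
        "acknowledgement_number": acknowledgement_number,
        "header_octets": hlen * 4,
        "code_bits": [name for bit, name in enumerate(CODE_BITS) if code & (1 << bit)],
        "window": window,
        "checksum": checksum,
        "urgent_pointer": urgent_pointer,
    }


# A made-up SYN+ACK header with HLEN = 5 (20 octets) and no options.
sample = struct.pack("!HHIIHHHH", 1069, 80, 1000, 2000, (5 << 12) | 0x12, 65535, 0, 0)
header = parse_tcp_header(sample)
```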

11.10 Out of Band Data (urgent bit)
Although TCP is a stream-oriented protocol, it is sometimes important for the program at one end of a connection to send data out of band, without waiting for the program at the other end of the connection to consume octets already in the stream.

To accommodate out-of-band signaling, TCP allows the sender to specify data as urgent, meaning that the receiving application should be notified of its arrival as quickly as possible, regardless of its position in the stream.

The mechanism used to mark urgent data when transmitting it in a segment consists of the URG code bit and the URGENT POINTER field in the segment header. When the URG bit is set, the URGENT POINTER field specifies the position in the segment where urgent data ends.

11.11 TCP Options
A TCP header can contain zero or more options. Recall that the header length is specified in 32-bit multiples. If the options do not occupy an exact multiple of 32 bits, PADDING is added to the end of the header.

11.11.1 Maximum Segment Size Option
TCP uses a maximum segment size (MSS) option to allow a receiver to specify the maximum size segment that it is willing to receive. MSS negotiation is especially significant because it permits heterogeneous systems to communicate.

Unlike a TCP segment, a fragment cannot be acknowledged or retransmitted independently; all fragments must arrive or the entire datagram must be retransmitted.

In theory, the optimum segment size, S, occurs when the IP datagrams carrying the segments are as large as possible without requiring fragmentation anywhere along the path, from the source to the destination.

11.11.2 Window Scaling Option
Because the WINDOW field in the TCP header is 16 bits long, the maximum size window is 64 Kbytes.

To accommodate larger window sizes, a window scaling option was created for TCP. The option consists of three octets: a type, a length, and a shift value, S. In essence, the shift value specifies a binary scaling factor to be applied to the window value. When window scaling is in effect, a receiver extracts the value from the WINDOW field, W, and shifts W left S bits to obtain the actual window size.
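The shift is easy to check with a few lines of Python. The function name is illustrative; the upper bound of 14 on the shift value comes from RFC 7323 rather than from these notes.

```python
MAX_SHIFT = 14  # largest shift value permitted by RFC 7323


def actual_window(window_field, shift):
    """Return the window size obtained by shifting W left by S bits."""
    if not 0 <= window_field <= 0xFFFF:
        raise ValueError("WINDOW is a 16-bit unsigned field")
    if not 0 <= shift <= MAX_SHIFT:
        raise ValueError("shift value out of range")
    return window_field << shift


unscaled = actual_window(65535, 0)
scaled = actual_window(65535, 7)   # each extra bit of shift doubles the window
```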

11.11.3 Timestamp Option
The TCP timestamp option was invented to help TCP compute the delay on the underlying network.

11.12 TCP Checksum Computation
To compute the checksum, TCP software on the sending machine follows a procedure similar to UDP. Conceptually, TCP prepends a pseudo-header to the TCP segment, appends enough zero bits to make the segment a multiple of 16 bits, and computes the 16-bit checksum over the entire result.

The purpose of using a TCP pseudo-header is exactly the same as in UDP. It allows the receiver to verify that the segment has reached the correct endpoint, which includes both an IP address and a protocol port number.
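A minimal sketch of the computation for IPv4 follows. The pseudo-header layout used here (source address, destination address, a zero octet, protocol number 6, and the TCP length) is the standard IPv4 one; the function names and the sample segment are invented for the example.

```python
import socket
import struct


def ones_complement_sum16(data):
    """Add 16-bit words with end-around carry, padding with a zero octet if needed."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def tcp_checksum(src_ip, dst_ip, segment):
    """Checksum over pseudo-header plus segment (checksum field must be zero)."""
    pseudo = struct.pack("!4s4sBBH",
                         socket.inet_aton(src_ip), socket.inet_aton(dst_ip),
                         0, 6, len(segment))
    return (~ones_complement_sum16(pseudo + segment)) & 0xFFFF


# Made-up segment: 20-octet header with a zero checksum, then 5 octets of data.
header = struct.pack("!HHIIHHHH", 1069, 80, 1000, 2000, (5 << 12) | 0x18, 4096, 0, 0)
segment = header + b"hello"
checksum = tcp_checksum("192.0.2.1", "192.0.2.2", segment)

# Receiver's view: with the checksum filled in, the same computation yields zero.
filled = segment[:16] + struct.pack("!H", checksum) + segment[18:]
verify = tcp_checksum("192.0.2.1", "192.0.2.2", filled)
```

If the segment is delivered to the wrong address, the pseudo-header differs and the verification no longer yields zero.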

11.13 Acknowledgements, Retransmission and Timeouts
The receiver always acknowledges the longest contiguous prefix of the stream that has been received correctly. Each acknowledgement specifies a sequence value one greater than the highest octet position in the contiguous prefix it received. In short, a TCP acknowledgement specifies the sequence number of the next octet that the receiver expects to receive.

This TCP acknowledgement scheme is called cumulative because it reports how much of the stream has accumulated. Cumulative acknowledgements have both advantages and disadvantages. One advantage is that acknowledgements are both easy to generate and unambiguous. A major disadvantage is that the sender does not receive information about all successful transmissions, but only about a single position in the stream that has been received. Thus, after a loss the sender must decide whether to retransmit only the first unacknowledged segment or every segment in the window.
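The rule for generating a cumulative acknowledgement can be sketched as follows. Segments are represented as hypothetical (first octet, length) pairs, and sequence number wraparound is ignored.

```python
def cumulative_ack(initial_seq, received):
    """Return the sequence number of the next octet the receiver expects.

    received: iterable of (first_octet, length) pairs, in any order.
    """
    next_expected = initial_seq
    for first, length in sorted(received):
        if first > next_expected:
            break                     # gap: later data cannot be acknowledged yet
        next_expected = max(next_expected, first + length)
    return next_expected


# Octets 200-299 arrived, but 100-199 are missing, so the ACK stays at 100.
ack_with_gap = cumulative_ack(0, [(0, 100), (200, 100)])
ack_complete = cumulative_ack(0, [(0, 100), (200, 100), (100, 100)])
```

The first result shows the disadvantage described above: the sender learns nothing about octets 200-299 even though they arrived.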

Every time it sends a segment, TCP starts a timer and waits for an acknowledgement. If the timer expires before data in the segment has been acknowledged, TCP assumes that the segment was lost or corrupted and retransmits it.

TCP uses an adaptive retransmission algorithm in which it monitors the round trip time on each connection and computes reasonable values for timeouts. As the performance of a connection changes, TCP revises its timeout value (i.e., it adapts to the change).

TCP records the time at which each segment is sent and the time at which an acknowledgement arrives for the data in that segment. From the two times, TCP computes an elapsed time known as a round trip sample. Whenever it obtains a new round trip sample, TCP must adjust its notion of the average round trip time for the connection.

To accommodate the varying delays encountered in an internet environment, TCP uses an adaptive retransmission algorithm that monitors delays on each connection and adjusts its timeout parameter accordingly.
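One classic way to maintain the average is an exponentially weighted moving average, with the timeout set to a multiple of that average. The sketch below uses that approach; the constants `ALPHA` and `BETA` are assumed values chosen for illustration, and current implementations also track the variation in round trip time.

```python
ALPHA = 0.875   # weight given to the old average (assumed value)
BETA = 2.0      # timeout as a multiple of the average (assumed value)


def update_rtt(average_rtt, sample, alpha=ALPHA):
    """Blend a new round trip sample into the running average."""
    return alpha * average_rtt + (1 - alpha) * sample


def timeout(average_rtt, beta=BETA):
    return beta * average_rtt


average = 100.0                        # milliseconds
average = update_rtt(average, 200.0)   # one slow sample nudges the average up
rto = timeout(average)
```

A value of alpha close to 1 makes the average slow to react to a single unusual sample, while a smaller value makes it track changes in delay more quickly.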

11.14 Karn's Algorithm and Timer Backoff
Karn’s algorithm: when computing the round trip estimate, ignore samples that correspond to retransmitted segments, but use a backoff strategy and retain the timeout value from a retransmitted packet for subsequent packets until a valid sample is obtained.
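A sketch of how Karn's rule combines with timer backoff is shown below. Doubling the timeout on each expiry and the `alpha` and `beta` constants are assumptions made for the example, and the class name is invented.

```python
class RetransmitTimer:
    def __init__(self, initial_timeout, alpha=0.875, beta=2.0):
        self.alpha = alpha
        self.beta = beta
        self.average_rtt = None
        self.timeout = initial_timeout

    def on_timer_expired(self):
        # Backoff: lengthen the timeout each time a retransmission is needed.
        self.timeout *= 2

    def on_ack(self, sample, segment_was_retransmitted):
        if segment_was_retransmitted:
            # Ambiguous sample: the ACK may match either transmission.
            # Ignore it and keep the backed-off timeout.
            return
        if self.average_rtt is None:
            self.average_rtt = sample
        else:
            self.average_rtt = (self.alpha * self.average_rtt
                                + (1 - self.alpha) * sample)
        # A valid sample lets the timeout be recomputed from the estimate.
        self.timeout = self.beta * self.average_rtt


timer = RetransmitTimer(initial_timeout=1.0)
timer.on_timer_expired()                              # timeout doubles to 2.0
timer.on_ack(0.1, segment_was_retransmitted=True)     # sample ignored
```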

11.15 Response to Congestion
TCP must also react to congestion in an internet. Congestion is a condition of severe delay caused by an overload of datagrams at one or more switching points.

TCP can help avoid congestion by reducing transmission rates when congestion occurs. In fact, TCP reacts quickly by reducing the transmission rate automatically whenever delays occur.

To avoid congestion, the TCP standard now recommends using two techniques: slow-start and multiplicative decrease. The two are related and can be implemented easily. We said that for each connection, TCP must remember the size of the receiver’s window. To control congestion, TCP maintains a second limit, called the congestion window size or congestion window that it uses to restrict data flow to less than the receiver’s buffer size when congestion occurs. Because TCP reduces the congestion window by half for every loss, it decreases the window exponentially if loss continues.

Multiplicative Decrease Congestion Avoidance: upon loss of a segment, reduce the congestion window by half (but never reduce the window to less than one segment). When transmitting segments that remain in the allowed window, back off the retransmission timer exponentially.

How can TCP recover when congestion ends? TCP uses a technique named slow-start to scale up transmission. Slow-Start (Additive) Recovery: whenever starting traffic on a new connection or increasing traffic after a period of congestion, start the congestion window at the size of a single segment and increase the congestion window by one segment each time an acknowledgement arrives.

To avoid increasing the window size too quickly and causing additional congestion, TCP adds one additional restriction. Once the congestion window reaches one half of its original size before congestion, TCP enters a congestion avoidance phase and slows down the rate of increment. During congestion avoidance, it increases the congestion window by 1 only if all segments in the window have been acknowledged. The overall approach is known as Additive Increase Multiplicative Decrease (AIMD).
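The interaction of the three rules can be sketched with a small class that counts the congestion window in segments. This is one possible reading of the rules, not a definitive implementation: on loss it records half the current window as the threshold and restarts slow-start from one segment. The class and attribute names are invented.

```python
class CongestionControl:
    def __init__(self, threshold):
        self.cwnd = 1                 # congestion window, in segments
        self.threshold = threshold    # where slow-start gives way to avoidance
        self.acked_in_window = 0

    def on_ack(self):
        if self.cwnd < self.threshold:
            self.cwnd += 1            # slow-start: one segment per ACK
        else:
            # Congestion avoidance: grow by one only after a full window is ACKed.
            self.acked_in_window += 1
            if self.acked_in_window >= self.cwnd:
                self.cwnd += 1
                self.acked_in_window = 0

    def on_loss(self):
        # Multiplicative decrease: remember half the window, never below one.
        self.threshold = max(self.cwnd // 2, 1)
        self.cwnd = 1                 # restart with slow-start
        self.acked_in_window = 0


cc = CongestionControl(threshold=4)
for _ in range(3):
    cc.on_ack()                       # slow-start: 1 -> 2 -> 3 -> 4
window_after_slow_start = cc.cwnd
for _ in range(4):
    cc.on_ack()                       # a full window of ACKs adds one segment
window_after_avoidance = cc.cwnd
```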

11.16 Fast Recovery and Other Response Modifications
For traffic that does not use TCP, a solution known as TCP Friendly Rate Control (TFRC) was proposed. TFRC attempts to emulate TCP behavior by having a UDP receiver report datagram loss back to the sender and by having the sender use the reported loss to compute a rate at which UDP datagrams should be sent; TFRC has only been adopted for special cases.

11.17 Explicit Feedback Mechanisms (SACK and ECN)

11.17.1 Selective Acknowledgement (SACK)
SACK allows a sender to know exactly which segments to retransmit. TCP includes two options for SACK. The first option is used when the connection is established to allow a sender to specify that SACK is permitted. The second option is used by a receiver when sending an acknowledgement to include information about specific blocks of data that were received. The information for each block includes the first sequence number in a block (called the left edge) and the sequence number immediately beyond the block (called the right edge).
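The left and right edges can be derived from the out-of-order data a receiver is holding. The sketch below is illustrative only: segments are hypothetical (first octet, length) pairs, wraparound is ignored, and the limit on how many blocks fit in the option is not modelled.

```python
def sack_blocks(next_expected, received):
    """Return (left_edge, right_edge) pairs for data held beyond the cumulative ACK.

    next_expected: the cumulative acknowledgement number
    received:      iterable of (first_octet, length) pairs, in any order
    """
    blocks = []
    for first, length in sorted(received):
        right = first + length        # sequence number immediately beyond the block
        if right <= next_expected:
            continue                  # already covered by the cumulative ACK
        first = max(first, next_expected)
        if blocks and first <= blocks[-1][1]:
            # Adjacent or overlapping data extends the previous block.
            blocks[-1] = (blocks[-1][0], max(blocks[-1][1], right))
        else:
            blocks.append((first, right))
    return blocks


# Octets 100-199 and 400-499 are missing.
blocks = sack_blocks(100, [(0, 100), (200, 100), (300, 100), (500, 50)])
```

From these blocks the sender can tell that only the two missing ranges need to be retransmitted.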

11.17.2 Explicit Congestion Notification
This mechanism requires routers throughout an internet to notify TCP as congestion occurs. As a TCP segment passes through the internet, routers along the path use a pair of bits in the IP header to record congestion. Thus, when a segment arrives, the receiver knows whether the segment experienced congestion at any point.

ECN uses two bits in the IP header to allow routers to record congestion, and uses two bits in the TCP header (taken from the reserved area) to allow the sending and receiving TCP to communicate. One of the TCP header bits is used by a receiver to send congestion information back to a sender; the other bit allows a sender to inform the receiver that the congestion notification has been received.

11.18 Congestion, Tail Drop and TCP
Tail-Drop Policy For Routers: if a packet queue is filled when a datagram must be placed on the queue, discard the datagram. The name tail-drop arises from the effect of the policy on an arriving sequence of datagrams. Once the queue fills, the router begins discarding all additional datagrams.

Tail-drop has an interesting effect on TCP. In the simple case where datagrams traveling through a router carry segments from a single TCP connection, the loss causes TCP to enter slow-start, which reduces throughput until TCP begins receiving ACKs and increases the congestion window. A more severe problem can occur, however, when the datagrams traveling through a router carry segments from many TCP connections because tail-drop can cause global synchronization.

11.19 Random Early Detection (RED)
How can a router avoid global synchronization? RED is the answer.

Although it is also called Random Early Drop or Random Early Discard, the scheme is more frequently referred to by its acronym, RED. The general idea behind RED lies in randomization: instead of waiting until a queue fills completely, a router monitors the queue size. As the queue begins to fill, the router chooses datagrams at random to drop.

A router uses two threshold values to mark positions in the queue: Tmin and Tmax.

The general operation of RED can be described by three rules that determine the disposition of a datagram that must be placed in the queue:

* If the queue currently contains fewer than Tmin datagrams, add the new datagram to the queue.
* If the queue contains more than Tmax datagrams, discard the new datagram.
* If the queue contains between Tmin and Tmax datagrams, randomly discard the datagram with a probability, p, that depends on the current queue size.

RED Policy For Routers: if the input queue is full when a datagram arrives, discard the datagram; if the input queue is below a minimum threshold, add the datagram to the queue; otherwise, discard the datagram with a probability that depends on the queue size.

However, a router should not drop datagrams unnecessarily, because doing so has a negative impact on TCP throughput. RED computes a weighted average queue size, avg, and uses the average size to determine the probability.
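The three rules and the weighted average can be sketched as follows. The notes do not give the probability function, so the linear ramp up to `max_p` and the averaging `weight` are assumptions borrowed from common descriptions of RED; the sketch also treats an average exactly equal to Tmax as a drop and assumes Tmin is less than Tmax.

```python
import random


def update_average(avg, queue_len, weight=0.002):
    """Weighted average queue size; a small weight smooths out short bursts."""
    return (1 - weight) * avg + weight * queue_len


def red_should_drop(avg, t_min, t_max, max_p=0.1, rng=random.random):
    """Decide whether to discard an arriving datagram."""
    if avg < t_min:
        return False                  # queue is short: always enqueue
    if avg >= t_max:
        return True                   # queue is long: always discard
    # Between the thresholds, the drop probability grows with the average.
    p = max_p * (avg - t_min) / (t_max - t_min)
    return rng() < p


# A fixed "random" value makes the in-between case repeatable for the example.
dropped = red_should_drop(15, t_min=10, t_max=20, max_p=0.1, rng=lambda: 0.04)
```

Because the decision uses the average rather than the instantaneous queue length, a short burst that briefly fills the queue does not immediately trigger drops.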

Both analysis and simulations show that RED works well. It handles congestion, avoids the synchronization that results from tail-drop, and allows short bursts without dropping datagrams unnecessarily. Consequently, the IETF now recommends that routers implement RED.

11.20 Establishing a TCP Connection
To establish a connection, TCP uses a three-way handshake.
The first segment of a handshake can be identified because it has the SYN bit set in the code field.
The second message has both the SYN and ACK bits set to indicate that it acknowledges the first SYN segment and continues the handshake.
The final handshake message is only an acknowledgement and is merely used to inform the destination that both sides agree that a connection has been established.

11.21 Initial Sequence Numbers
The three-way handshake accomplishes two important functions. It guarantees that both sides are ready to transfer data (and that they know they are both ready) and it allows both sides to agree on initial sequence numbers. Sequence numbers are sent and acknowledged during the handshake.
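The exchange of sequence numbers during the handshake can be traced with a short function. The dictionary layout is invented for the example; the arithmetic reflects the fact that a SYN consumes one sequence number, so each side acknowledges the other's initial sequence number plus one.

```python
MODULUS = 2 ** 32   # sequence numbers are 32-bit values and wrap around


def three_way_handshake(client_isn, server_isn):
    """Return the three segments of a handshake as simple dictionaries."""
    return [
        {"from": "client", "code_bits": {"SYN"},
         "seq": client_isn, "ack": None},
        {"from": "server", "code_bits": {"SYN", "ACK"},
         "seq": server_isn, "ack": (client_isn + 1) % MODULUS},
        {"from": "client", "code_bits": {"ACK"},
         "seq": (client_isn + 1) % MODULUS, "ack": (server_isn + 1) % MODULUS},
    ]


segments = three_way_handshake(client_isn=1000, server_isn=5000)
```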

11.22 Closing a TCP Connection
When an application program tells TCP that it has no more data to send, TCP will close the connection in one direction. To close its half of a connection, the sending TCP finishes transmitting the remaining data, waits for the receiver to acknowledge it, and then sends a segment with the FIN bit set. Upon receipt of a FIN, TCP sends an acknowledgement and then informs the application that the other side has finished sending data.

Once a connection has been closed in a given direction, TCP refuses to accept more data for that direction. Meanwhile, data can continue to flow in the opposite direction until the sender closes it. Of course, a TCP endpoint that is still receiving data must send acknowledgements, even if the data transmission in the reverse direction has terminated. When both directions have been closed, the TCP software at each endpoint deletes its record of the connection.

11.23 TCP Connection Reset
Sometimes abnormal conditions arise that force an application or the network software to break a connection without a graceful shutdown. TCP provides a reset facility to handle abnormal disconnections.

To reset a connection, one side initiates termination by sending a segment with the RST (RESET) bit in the CODE field set. The other side responds to a reset segment immediately by aborting the connection. When a reset occurs, TCP informs any local application that was using the connection.

11.24 TCP State Machine
Like most protocols, the operation of TCP can best be explained with a theoretical model called a finite state machine.

LISTEN (server) represents waiting for a connection request from any remote TCP and port.

SYN-SENT (client) represents waiting for a matching connection request after having sent a connection request.

SYN-RECEIVED (server) represents waiting for a confirming connection request acknowledgment after having both received and sent a connection request.

ESTABLISHED (both server and client) represents an open connection in which data received can be delivered to the user. It is the normal state for the data transfer phase of the connection.

FIN-WAIT-1 (both server and client) represents waiting for a connection termination request from the remote TCP, or an acknowledgment of the connection termination request previously sent.

FIN-WAIT-2 (both server and client) represents waiting for a connection termination request from the remote TCP.

CLOSE-WAIT (both server and client) represents waiting for a connection termination request from the local user.

CLOSING (both server and client) represents waiting for a connection termination request acknowledgment from the remote TCP.

LAST-ACK (both server and client) represents waiting for an acknowledgment of the connection termination request previously sent to the remote TCP (which includes an acknowledgment of its connection termination request).

TIME-WAIT (either server or client) represents waiting for enough time to pass to be sure the remote TCP received the acknowledgment of its connection termination request. A connection can stay in TIME-WAIT for a maximum of four minutes, a period equal to twice the maximum segment lifetime (2 MSL).

CLOSED (both server and client) represents no connection state at all.
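The states above can be tied together with a transition table. The sketch below covers only the common paths (it omits resets, simultaneous open, and timeouts other than the TIME-WAIT timer), and the event names are invented labels rather than terms from the standard.

```python
TRANSITIONS = {
    ("CLOSED", "passive_open"): "LISTEN",
    ("CLOSED", "active_open"): "SYN-SENT",          # sends SYN
    ("LISTEN", "recv_syn"): "SYN-RECEIVED",         # sends SYN + ACK
    ("SYN-SENT", "recv_syn_ack"): "ESTABLISHED",    # sends ACK
    ("SYN-RECEIVED", "recv_ack"): "ESTABLISHED",
    ("ESTABLISHED", "close"): "FIN-WAIT-1",         # sends FIN
    ("ESTABLISHED", "recv_fin"): "CLOSE-WAIT",      # sends ACK
    ("FIN-WAIT-1", "recv_ack"): "FIN-WAIT-2",
    ("FIN-WAIT-1", "recv_fin"): "CLOSING",          # both sides closed at once
    ("FIN-WAIT-2", "recv_fin"): "TIME-WAIT",        # sends ACK
    ("CLOSING", "recv_ack"): "TIME-WAIT",
    ("CLOSE-WAIT", "close"): "LAST-ACK",            # sends FIN
    ("LAST-ACK", "recv_ack"): "CLOSED",
    ("TIME-WAIT", "timer_expired"): "CLOSED",
}


def run(events, state="CLOSED"):
    """Apply events in order and return the list of states visited."""
    visited = [state]
    for event in events:
        state = TRANSITIONS[(state, event)]   # KeyError means an unmodelled event
        visited.append(state)
    return visited


client_path = run(["active_open", "recv_syn_ack", "close",
                   "recv_ack", "recv_fin", "timer_expired"])
server_path = run(["passive_open", "recv_syn", "recv_ack",
                   "recv_fin", "close", "recv_ack"])
```

In this trace the client closes first and therefore passes through TIME-WAIT, while the server passes through CLOSE-WAIT and LAST-ACK.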

11.25 Forcing Data Delivery
TCP provides a push operation that an application can use to force delivery of octets currently in the stream without waiting for the buffer to fill. The push operation does more than force the local TCP to send a segment. It also requests TCP to set the PSH bit in the segment code field, so the data will be delivered to the application program on the receiving end.

11.26 Reserved TCP Port Numbers
TCP uses a combination of statically and dynamically assigned protocol port numbers.

11.27 Silly Window Syndrome and Small Packets
Transferring small segments unnecessarily consumes network bandwidth and introduces computational overhead. Small segments consume more network bandwidth per octet of data than large segments because each datagram has a header.

The problem of TCP sending small segments became known as the silly window syndrome (SWS). Early TCP implementations were plagued by SWS, in which each acknowledgement advertises a small amount of space available and each segment carries a small amount of data.

11.28 Avoiding Silly Window Syndrome

11.28.1 Receive-Side Silly Window Avoidance
Receive-Side Silly Window Avoidance: before sending an updated window advertisement after advertising a zero window, wait for space to become available that is either at least 50% of the total buffer size or equal to a maximum sized segment.
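The receive-side rule reduces to a single comparison. In the sketch below the function and parameter names are invented, and sizes are in octets.

```python
def window_to_advertise(free_space, buffer_size, mss):
    """Advertise the free space only once it is worth opening the window.

    After a zero window, keep advertising zero until the free space reaches
    half the buffer or one maximum-sized segment, whichever is smaller.
    """
    threshold = min(buffer_size // 2, mss)
    return free_space if free_space >= threshold else 0


small = window_to_advertise(free_space=100, buffer_size=8192, mss=1460)
enough = window_to_advertise(free_space=1460, buffer_size=8192, mss=1460)
```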

11.28.2 Delayed Acknowledgements
TCP delays sending an acknowledgement when silly window avoidance specifies that the window is not sufficiently large to advertise. To avoid potential problems, the TCP standards place a limit on the time TCP delays an acknowledgement. Implementations cannot delay an acknowledgement for more than 500 milliseconds.

11.28.3 Send-Side Silly Window Avoidance
A sending TCP must delay sending a segment until it can accumulate a reasonable amount of data. The technique is known as clumping.

Send-Side Silly Window Avoidance: when a sending application generates additional data to be sent over a connection for which previous data has been transmitted but not acknowledged, place the new data in the output buffer as usual, but do not send additional segments until there is sufficient data to fill a maximum-sized segment. If still waiting to send when an acknowledgement arrives, send all data that has accumulated in the buffer. Apply the rule even when the user requests a push operation.
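The send-side rule, commonly known as the Nagle algorithm, can be sketched as a small class. The sketch simplifies heavily: an acknowledgement is assumed to cover everything outstanding, the receiver's window is ignored, and the class and attribute names are invented.

```python
class ClumpingSender:
    def __init__(self, mss):
        self.mss = mss
        self.buffer = b""             # data written but not yet sent
        self.unacked = False          # True while sent data awaits an ACK
        self.sent = []                # segments handed to the network

    def write(self, data):
        self.buffer += data
        self._try_send()

    def on_ack(self):
        self.unacked = False
        self._try_send()              # an ACK releases whatever has accumulated

    def _emit(self, segment):
        self.sent.append(segment)
        self.unacked = True

    def _try_send(self):
        # Full-sized segments can always be sent.
        while len(self.buffer) >= self.mss:
            self._emit(self.buffer[:self.mss])
            self.buffer = self.buffer[self.mss:]
        # A partial segment goes out only when nothing is awaiting an ACK.
        if self.buffer and not self.unacked:
            self._emit(self.buffer)
            self.buffer = b""


sender = ClumpingSender(mss=4)
sender.write(b"a")                    # nothing outstanding: sent immediately
sender.write(b"b")                    # held, because "a" is unacknowledged
sender.write(b"c")                    # clumped together with "b"
sender.on_ack()                       # ACK arrives: "bc" goes out as one segment
sender.write(b"defgh")                # a full segment is sent; "h" waits
```

The scheme is self-clocking: on a fast network acknowledgements return quickly and little data is delayed, while on a slow network more data accumulates into each segment.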

In summary, TCP now requires the sender and receiver to implement heuristics that avoid the silly window syndrome. A receiver avoids advertising a small window, and a sender uses an adaptive scheme to delay transmission so it can clump data into large segments.

Wednesday, June 25, 2014