Transport Layer Topics: TCP, Multiplexing & Sockets
The lower layers of the Internet Protocol Suite and OSI network communication models are inherently unreliable. For example, the Ethernet and Internet Protocols use checksums to test for data corruption, but they simply discard the frame/packet if there is an issue and have no system of their own for replacing lost data. For this whole internet thing to work though, we need reliability. Network communication reliability is handled by Transport Layer protocols.
The Transport Layer services can include:
- Connection-oriented + connectionless communication
- End-to-end communication
- Reliability
- Flow control
- Congestion avoidance
- Multiplexing to allow a host to connect multiple services to other hosts with a single channel
Why do we need reliability?
To take two simple examples, imagine if emails arrived with paragraphs missing, or if websites rendered incorrectly: the data would be useless.
How do we transfer data reliably over an unreliable channel?
The answer is that we need a reliable protocol that ensures all data that is sent is received, in the right order and in usable form. Along with that:
- The route should be established for as long as needed
- Technology should permit dissimilar systems to exchange data
The General’s Dilemma (also known as the Two Generals’ Problem) is commonly used to think about data transmission and the reliability problems we might face. In it, two generals must attack a castle at the same time to ensure victory. If they attack at different times, they will be defeated. They are out of cell service 😉, so each side must send messengers on a perilous journey past the castle to agree upon a battle time. Upon receipt of a message, the other general can send an acknowledgement message. However, the general who sent the acknowledgement has no way to know it arrived, even if the other general follows up with an acknowledgement of the acknowledgement. No matter how many acknowledgements they send back and forth across the field, the last one sent could always have been lost, so they’d never be able to close the loop.
Thus, we can think of some additional options to make this communication method more reliable:
- Timeout and retransmission: If an acknowledgment is not received by the sender by a specific time, another message can be dispatched
- Detect message corruption: If a message’s checksum doesn’t match, it might have been corrupted in transit
- Ensure messages are received in order: If multiple messengers are sent to convey a message, give each one’s segment a sequence number
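The timeout-and-retransmission idea from the list above can be sketched in a few lines. This is a toy stop-and-wait sender, not how TCP is actually implemented: `send_reliably` and `make_lossy_channel` are hypothetical names, and the "channel" is just a scripted list of whether each acknowledgement arrives in time.

```python
def send_reliably(message, channel, max_attempts=5):
    """Stop-and-wait: keep resending until an acknowledgement comes back.

    `channel` is a hypothetical callable that returns True when the
    acknowledgement arrived before the timeout, and False otherwise.
    """
    for attempt in range(1, max_attempts + 1):
        if channel(message):
            return attempt  # how many transmissions it took
        # Timeout expired with no acknowledgement: loop and retransmit.
    raise TimeoutError(f"no acknowledgement after {max_attempts} attempts")


def make_lossy_channel(outcomes):
    """Build a channel that loses messengers according to a scripted pattern."""
    remaining = iter(outcomes)

    def channel(message):
        # Once the script runs out, every further attempt is treated as lost.
        return next(remaining, False)

    return channel


# The first two messengers are lost; the third one gets through.
channel = make_lossy_channel([False, False, True])
attempts = send_reliably("attack at dawn", channel)
print(attempts)  # 3
```

Note that the sender still cannot tell whether the message or only its acknowledgement was lost, which is why the receiver also has to cope with duplicates.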
Overall, the example illustrates several of the services that TCP provides:
- Data Integrity
- In-order delivery
- Error detection
- Retransmission of lost data
- De-duplication
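To make in-order delivery and de-duplication concrete, here is a minimal sketch of what a receiver might do with sequence numbers. The function name `reassemble` is made up for this example, and the sequence numbers count whole segments starting at 0, which is a simplification (real TCP numbers individual bytes).

```python
def reassemble(segments):
    """Deliver payloads in sequence order and drop duplicates.

    Each segment is a (sequence_number, payload) pair.
    """
    buffered = {}       # out-of-order segments waiting for a gap to fill
    delivered = []
    next_expected = 0
    for seq, payload in segments:
        if seq < next_expected or seq in buffered:
            continue    # duplicate: already delivered or already buffered
        buffered[seq] = payload
        # Hand over every segment that is now contiguous.
        while next_expected in buffered:
            delivered.append(buffered.pop(next_expected))
            next_expected += 1
    return delivered


# Segments arrive out of order, and two of them arrive twice.
arrived = [(1, "at"), (0, "attack"), (1, "at"), (2, "dawn"), (0, "attack")]
print(reassemble(arrived))  # ['attack', 'at', 'dawn']
```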
So what is TCP?
There are two types of transport service, connection-oriented and connectionless. This article covers the connection-oriented transport service, TCP.
Transmission Control Protocol (TCP) is responsible for controlling communication between two endpoints (hosts) on a network. TCP adds reliability on top of the unreliable layers below it, such as the Internet and Link layers.
TCP provides:
- Connection capability to connectionless Internet Protocol
- Error detection (Checksum)
- Sequencing and de-duplication
- Recovery from data loss through retransmission
- Flow control for host-to-host interaction
- Congestion control to avoid overloading the network between the hosts
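The checksum in the list above is a 16-bit ones' complement sum. Here is a small sketch of that calculation, assuming it is run over the payload alone; real TCP also covers the TCP header and a pseudo-header of IP fields, which is left out to keep the example short. The function name `internet_checksum` is chosen for this example.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum over a byte string."""
    if len(data) % 2:
        data += b"\x00"                 # pad to a whole number of 16-bit words
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:                  # fold any carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


payload = b"attack at dawn"             # even length, so no padding is needed
trailer = internet_checksum(payload).to_bytes(2, "big")

# The receiver recomputes over data plus checksum; an intact message gives 0.
print(internet_checksum(payload + trailer))      # 0

# Flip one bit in transit and the check no longer comes out to 0.
corrupted = bytes([payload[0] ^ 0x01]) + payload[1:]
print(internet_checksum(corrupted + trailer) != 0)  # True
```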
How do we initiate a Transport Layer Connection and manage data flow?
As a reminder, TCP is a connection-oriented protocol. What that means is that the protocol does not start sending application data until a connection is established between application processes. Connections are established with a “Three-Way Handshake”.
Here’s a text convo I found between Client and Server to handle a Three-Way Handshake. The green bubbles represent the start (above text) and end (below text) state of the sender or receiver at that step. A key characteristic of the process is that the sender cannot send any application data until after it has sent the ACK segment.
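The handshake can also be written out as a tiny simulation. This is only a sketch: the function name and the initial sequence numbers (100 and 300) are made up for illustration, and nothing is actually sent over a network. The state names follow the usual TCP state diagram.

```python
def three_way_handshake(client_isn, server_isn):
    """Return the three segments exchanged plus both final states.

    Each segment is recorded as (flags, sequence_number, ack_number).
    """
    client_state, server_state = "CLOSED", "LISTEN"
    segments = []

    # 1. Client -> Server: SYN carrying the client's initial sequence number.
    segments.append(("SYN", client_isn, None))
    client_state = "SYN-SENT"

    # 2. Server -> Client: SYN-ACK with the server's own initial sequence
    #    number, acknowledging the client's SYN by asking for client_isn + 1.
    segments.append(("SYN-ACK", server_isn, client_isn + 1))
    server_state = "SYN-RECEIVED"

    # 3. Client -> Server: ACK asking for server_isn + 1.
    segments.append(("ACK", client_isn + 1, server_isn + 1))
    client_state = "ESTABLISHED"   # the client may now send application data
    server_state = "ESTABLISHED"   # reached once the ACK arrives

    return segments, client_state, server_state


segments, client_state, server_state = three_way_handshake(client_isn=100, server_isn=300)
for segment in segments:
    print(segment)
```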
Flow Control
To prevent the sender from sending us too many emojis (or too much data) all at once, which can overwhelm the receiver, each side of the connection indicates the amount of data it is willing to accept via the WINDOW field of the TCP header. Note that this number can change during the course of a connection. For example, if a receiver’s buffer is getting full, it can decrease the amount in the WINDOW field so that the sender reduces the amount of data it sends in turn. The window can be thought of as a sliding window, as seen below. After a segment is sent and acknowledged, the window can slide forward one segment. The rest of the segments remain waiting for transmission in a buffer. This ability to have multiple segments in flight at once, without waiting for each one to be acknowledged before sending the next, is called pipelining. It works because there is a persistent connection, so the client can also send multiple requests for related resources in a single TCP session.
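Here is a rough sketch of the sliding window in code. It assumes a fixed window counted in whole segments and assumes the oldest in-flight segment is acknowledged at every step; real TCP windows are counted in bytes and change over time. The function name `sliding_window_send` is hypothetical.

```python
def sliding_window_send(segments, window_size):
    """Record which segments are in flight each time the window slides."""
    base = 0            # oldest segment that has not been acknowledged yet
    next_to_send = 0    # first segment that has not been sent yet
    history = []
    while base < len(segments):
        # Fill the window: keep sending without waiting for earlier ACKs.
        while next_to_send < len(segments) and next_to_send - base < window_size:
            next_to_send += 1
        history.append(segments[base:next_to_send])
        # Assume the oldest in-flight segment is acknowledged: slide by one.
        base += 1
    return history


for in_flight in sliding_window_send(["A", "B", "C", "D", "E"], window_size=3):
    print(in_flight)
# ['A', 'B', 'C']
# ['B', 'C', 'D']
# ['C', 'D', 'E']
# ['D', 'E']
# ['E']
```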
Network Congestion
Network congestion occurs when more data is being transmitted on a network than it has the capacity to process and forward. If things get backed up and a router’s buffer overflows, the excess data is dropped.
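A toy model shows how a full buffer leads to dropped data. Everything here is invented for illustration: `count_drops`, the buffer size, and the traffic pattern are assumptions, and a real router's queueing is far more involved.

```python
def count_drops(arrivals_per_tick, buffer_size, forwarded_per_tick):
    """Count packets a hypothetical router discards when its buffer is full."""
    queued = 0
    dropped = 0
    for arriving in arrivals_per_tick:
        space = buffer_size - queued
        accepted = min(arriving, space)
        dropped += arriving - accepted      # no room left: these are discarded
        queued += accepted
        queued -= min(queued, forwarded_per_tick)   # forward what the link allows
    return dropped


# A burst of traffic in ticks 2 and 3 is more than the router can buffer.
print(count_drops([2, 5, 5, 1], buffer_size=4, forwarded_per_tick=2))  # 4
```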
TCP Segment Deep Dive
Remember that in the TCP/IP and OSI layers, lower layers effectively provide a ‘service’ to the layer above them. As you work down the layers, each layer encapsulates data from the layer above it into its Data Payload and adds a Header, which together form that layer’s Protocol Data Unit (PDU).
The PDU for the Transport Layer/TCP is called a Segment. In the depiction below, we are particularly interested in the source and destination port addresses, sequence number, acknowledgement number, window size, and checksum fields that are in the header.
- Checksum: Error detection
- Sequence number & Acknowledgment number: Reliability features such as in-order delivery, handling data loss, and handling duplication
- Window size: Flow control
Note: This article pertains to IPv4. The IPv6 header doesn’t include a checksum of its own, because error detection is already handled at the Transport and Data Link/Link layers. The TCP segment carries its checksum either way.
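To see where these fields sit, here is a sketch that unpacks the fixed 20-byte TCP header with Python's `struct` module. The header bytes are built by hand with made-up values (ports, sequence numbers, checksum) rather than captured from a real connection, and any TCP options are ignored.

```python
import struct

# Network byte order: two 16-bit ports, two 32-bit numbers, then four 16-bit fields.
TCP_HEADER = struct.Struct("!HHIIHHHH")


def parse_tcp_header(segment: bytes) -> dict:
    """Pull the fields discussed above out of the first 20 bytes of a segment."""
    (src_port, dst_port, seq, ack,
     offset_and_flags, window, checksum, _urgent) = TCP_HEADER.unpack(segment[:20])
    return {
        "source_port": src_port,
        "destination_port": dst_port,
        "sequence_number": seq,
        "acknowledgment_number": ack,
        "header_length": (offset_and_flags >> 12) * 4,  # data offset is in 32-bit words
        "window_size": window,
        "checksum": checksum,
    }


# A hand-built example header: data offset 5 (20 bytes) with the ACK flag set.
example = TCP_HEADER.pack(50000, 80, 1000, 2000, (5 << 12) | 0x010, 65535, 0xABCD, 0)
fields = parse_tcp_header(example + b"payload")
print(fields["source_port"], fields["destination_port"], fields["window_size"])
```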
Why do we care about the source and destination port addresses in the TCP Segment?
The port addresses are key to end-to-end network communication. The combination of the IP address from the Internet Protocol Header and the Port number in the Transport Layer PDU can be thought of as defining a communication end-point called a socket.
Sockets
If client sessions were only distinguished by IP address or port number, the server wouldn’t know how to identify a unique process. Since a TCP connection is always identified by both the client and server IP address and the client and server port numbers, there is no confusion. Each TCP connection has exactly two end-points/sockets.
Looking back at the Segment diagram, recall that a socket is identified by that segment’s source IP address, source port number, destination IP address, and destination port number. This means that two TCP segments arriving at the same host with different source IP addresses or source port numbers will be directed to two different sockets, even if they have the same destination port number. This leads us to our last topic.
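One way to picture this is a lookup table keyed on all four values. The sketch below is only an illustration: `demultiplex` is a hypothetical helper, the IP addresses come from the ranges reserved for documentation, and a real operating system does this inside its network stack.

```python
from collections import defaultdict


def demultiplex(segments):
    """Group payloads by the four values that identify a TCP connection."""
    sockets = defaultdict(list)
    for src_ip, src_port, dst_ip, dst_port, payload in segments:
        sockets[(src_ip, src_port, dst_ip, dst_port)].append(payload)
    return dict(sockets)


# Every segment is headed for port 80 on the same server.
incoming = [
    ("192.0.2.1", 50000, "203.0.113.10", 80, "GET /a"),
    ("198.51.100.7", 50000, "203.0.113.10", 80, "GET /b"),  # different source IP
    ("192.0.2.1", 50001, "203.0.113.10", 80, "GET /c"),     # different source port
    ("192.0.2.1", 50000, "203.0.113.10", 80, "GET /d"),     # same connection as /a
]
sockets = demultiplex(incoming)
print(len(sockets))  # 3
```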
Multiplexing & De-Multiplexing
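Multiplexing is the notion that many applications or processes on a host share a single channel out of the host, with each one’s data still reaching the right destination without getting tangled. De-multiplexing is the reverse: the receiving host uses the port numbers in each incoming segment to hand the data to the right process.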
Why is multiplexing necessary?
Most hosts only have one network address available, so all transport connections on the machine have to use it. The host needs a way to send incoming segments to the right process or application.
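You can watch this happen with real sockets on your own machine. The sketch below assumes the loopback address 127.0.0.1 is available and lets the operating system pick the port numbers; the variable names are just for this example. Two clients connect to the same server address and port, and the server tells them apart by their source ports.

```python
import socket

# Listen on the loopback address; port 0 asks the OS for any free port.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen()
server_address = server.getsockname()

# Two clients connect to the same destination IP address and port.
client_a = socket.create_connection(server_address)
client_b = socket.create_connection(server_address)

# The server gets a separate socket for each connection.
conn_1, peer_1 = server.accept()
conn_2, peer_2 = server.accept()
by_peer = {peer_1: conn_1, peer_2: conn_2}

client_a.sendall(b"from A")
client_b.sendall(b"from B")

# Look up each connection by the client's (IP address, port) pair.
received_a = by_peer[client_a.getsockname()].recv(1024)
received_b = by_peer[client_b.getsockname()].recv(1024)

print(peer_1, peer_2)           # same IP address, different source ports
print(received_a, received_b)

for sock in (conn_1, conn_2, client_a, client_b, server):
    sock.close()
```

Both connections share the server's single address and listening port, yet each message lands on its own socket.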