With software protocols, we are typically interested only in point-to-point communication between two entities on the network, usually two programs. We are also interested in communication over channels, not physical lines. Recall that a channel is a virtual communication entity that might be composed of any combination of physical layer methods. For example, if you use a browser to access a web server at some other location, you aren't interested in which physical layers are involved or how your messages are handled, only in the results. The channel is a convenient way of discussing this higher layer of communication, which can be treated as a simple pipeline for data.
Our terminology is also extended here with the term packet, which simply means one communication unit. In the physical layer, these are typically called frames. A packet is simply one grouping of information, typically including some sort of data and the framing information needed to make the protocol work properly.
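As a small illustration of a packet as data plus framing information, the sketch below prefixes a payload with a made-up header. The header layout (a 1-byte sequence number and a 2-byte length) and the names `HEADER`, `build_packet` and `parse_packet` are assumptions chosen for this example, not part of any particular protocol.

```python
import struct
from typing import Tuple

# Hypothetical header layout, chosen only for illustration:
# 1-byte sequence number, then a 2-byte payload length, in network byte order.
HEADER = struct.Struct("!BH")

def build_packet(seq: int, payload: bytes) -> bytes:
    """Prefix the payload with its framing information."""
    return HEADER.pack(seq, len(payload)) + payload

def parse_packet(packet: bytes) -> Tuple[int, bytes]:
    """Split a packet back into its sequence number and payload."""
    seq, length = HEADER.unpack_from(packet)
    payload = packet[HEADER.size:HEADER.size + length]
    return seq, payload

packet = build_packet(1, b"hello")
# 3 bytes of framing plus 5 bytes of data
print(len(packet), parse_packet(packet))
```

Here 3 of the 8 bytes in the packet are overhead, which is the kind of cost the utilization calculations have to account for.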
Protocols can be differentiated from each other on the basis of the following properties:
Connectionality is concerned with how the two endpoints of the communication handle their relationship. Do they begin by making a connection agreement that allows them to establish the parameters of the communication to follow, or do they simply send data without any prior arrangement? The first method, called connection-oriented, is similar to a traditional telephone call, where the originator starts by dialing the number and waiting for a connection to be made when the receiver accepts the call. Along the way, the resources for the call are allocated and both endpoints have an agreed-upon methodology (although with people it's hard to tell, and it is constantly changing). Connectionless is more like traditional mail service, where you simply address an envelope and drop it in the mailbox to be delivered at the other end without prior arrangement. In the physical layer world, the SONET protocol is a connection-oriented protocol, while IEEE 802.3 is connectionless.
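To see the same distinction in software, the sketch below uses Python's standard `socket` module on the local machine. It assumes TCP and UDP as the examples, which the text above does not name: a UDP socket sends a datagram with no prior arrangement, while a TCP socket must connect and be accepted before any data moves.

```python
import socket

# Connectionless (UDP): no setup; each datagram carries its destination address.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))            # port 0 lets the OS pick a free port
receiver.settimeout(2.0)
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"postcard", receiver.getsockname())
udp_data, _ = receiver.recvfrom(1024)

# Connection-oriented (TCP): connect/accept establishes the connection first.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
client = socket.create_connection(listener.getsockname(), timeout=2.0)
server, _ = listener.accept()
server.settimeout(2.0)
client.sendall(b"phone call")
tcp_data = server.recv(1024)               # small message, read in one call here

for s in (receiver, sender, listener, client, server):
    s.close()
print(udp_data, tcp_data)
```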
Reliability is a measure of how well the protocol manages errors that might occur. Remember that a channel may well cross a large and complex subnetwork with many nodes and physical links, and bad things can happen to good packets. The things that can happen are:
While protocols can handle some subset of these, the general terminology is that a protocol that addresses all of them is reliable, while a protocol that addresses none or some is called unreliable.
If the sender didn't have enough to send to keep the channel busy, it might be given that the sender sends a packet every 2 seconds. Then we would have a 2 s delay, then a 0.1 s delay and then a 0.25 s send, so:
Finally, suppose the sender has to add some framing information. This typically wouldn't be synchronization bits or CRCs, but could include other forms of error control, addressing, or processing information for the receiver. If that information took 0.05 s of the 0.25 s send time, we have:
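The two cases above can be checked with a few lines of arithmetic. This is a sketch under the assumption that utilization is the useful send time divided by the whole cycle (the 2 s wait, the 0.1 s delay and the 0.25 s send); the function and variable names are illustrative.

```python
def utilization(useful_time: float, cycle_time: float) -> float:
    """Fraction of the cycle spent sending useful data."""
    return useful_time / cycle_time

idle = 2.0        # s between packets when the sender has little to send
delay = 0.1       # s of delay
send_time = 0.25  # s spent sending the packet
framing = 0.05    # s of the send time taken up by framing information

cycle = idle + delay + send_time                  # 2.35 s in total
without_framing = utilization(send_time, cycle)
with_framing = utilization(send_time - framing, cycle)

print(f"{without_framing:.3f}")  # 0.106
print(f"{with_framing:.3f}")     # 0.085
```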
The basics of utilization for software protocols are similar to those for physical layer protocols: what part of a given unit of time is spent doing useful work?
There are some sticky issues in this protocol. For example, what if a packet is lost and doesn't arrive at the receiver? How long does the sender wait for a response from the receiver? Or what if the message is received but the acknowledgement is lost? We will take these questions up later, but for now, consider the utilization properties of this protocol.
In general, if we let:
- L = the message length,
- B = the data bytes in a message, so L-B bytes are overhead,
- D = the network latency, which can include propagation delay, routing time through the subnet and processing time on each end,
- R = the bit or byte rate at which data is sent.
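Then the utilization, with the latency expressed as the number of bytes that could have been sent in that time and R in the same units as L and B, is:

$$U = \frac{B}{L + D \cdot R}$$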
For example, suppose that we have a stop and wait protocol that has a message length of 1000 bytes with 20 bytes of framing, a send rate of 80,000 bits per second, a propagation delay of 0.10 s and a processing time on each end of 0.01 s. Since this is a stop and wait protocol, the message cycle starts when the message is sent and ends when the ACK is received. It takes two times the propagation delay, plus four times the processing time, to get both parts of the exchange complete, so D is:

$$D = 2(0.10) + 4(0.01) = 0.24 \text{ s}$$
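The network latency can be viewed as a time (0.24 s) or as the number of bytes (2400) that can be sent in that time at 10,000 bytes per second. With 980 data bytes in each 1000-byte message, the utilization is:

$$U = \frac{980}{1000 + 2400} \approx 0.29$$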
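The same calculation can be written as a short function. This is a sketch that assumes utilization is the data bytes divided by the message bytes plus the latency expressed in bytes; the function name is illustrative.

```python
def stop_and_wait_utilization(L: float, B: float, D: float, R: float) -> float:
    """L: message bytes, B: data bytes, D: latency in s, R: rate in bytes/s."""
    return B / (L + D * R)

R = 80_000 / 8               # 80,000 bits per second as bytes per second
D = 2 * 0.10 + 4 * 0.01      # two propagation delays, four processing times
u = stop_and_wait_utilization(L=1000, B=1000 - 20, D=D, R=R)

print(f"D = {D:.2f} s, latency in bytes = {D * R:.0f}, utilization = {u:.3f}")
# D = 0.24 s, latency in bytes = 2400, utilization = 0.288
```

Even with only 2% framing overhead, the sender spends most of each cycle waiting rather than sending.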
In most cases, D is a given, rather than something that has to be calculated, and it is typically given in terms of a statistical distribution. For example, the network latency is normally distributed with a mean of 300 ms and a variance of 200 ms.
This method of automatically resending unacknowledged packets solves two problems. Lost packets will be resent and if the receiver gets a packet with a data error, it simply waits for the resend. You could use a NAK from the receiver to indicate an error, but that has its own set of problems.
So now we have a basic protocol for each end:
Sender
1. Send a packet with sequence number 0
2. Set the acknowledgement timer
3. If the timer expires, go to 1.
4. If the acknowledgement for 0 arrives, stop the timer
5. Send a packet with the sequence number 1
6. Set the acknowledgement timer
7. If the timer expires, go to 5.
8. If the acknowledgement for 1 arrives, stop the timer
9. Go to 1.
Receiver
1. Receive a packet.
2. If the sequence number is 0, send an ACK and process it, otherwise throw it away and go to 1.
3. Receive a packet.
4. If the sequence number is 1, send an ACK and process it, otherwise throw it away and go to 3.
5. Go to 1.
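The sender and receiver steps can be sketched as a small simulation. This is a minimal sketch, not a real network implementation: the lossy channel is modelled with a seeded random number generator, a timeout is modelled as simply going around the loop again, and the name `run_arq` is illustrative. It also makes one assumption that goes beyond the steps listed: the receiver acknowledges a duplicate packet again (while still discarding it), so that a lost ACK cannot stall the sender.

```python
import random

def run_arq(messages, loss_rate=0.2, seed=1):
    """Simulate stop-and-wait ARQ with a 1-bit sequence number.

    Returns (messages delivered to the receiver, transmissions by the sender).
    """
    rng = random.Random(seed)

    def lost():
        """True when the channel drops a packet or an ACK."""
        return rng.random() < loss_rate

    delivered = []
    expected = 0          # receiver: the sequence number it is waiting for
    transmissions = 0

    for index, data in enumerate(messages):
        seq = index % 2                    # sender alternates 0, 1, 0, 1, ...
        acked = False
        while not acked:
            transmissions += 1             # send, or resend after a timeout
            if lost():
                continue                   # packet lost: timer expires, resend
            # Receiver side
            if seq == expected:
                delivered.append(data)     # new packet: process it
                expected = 1 - expected
            # Assumption: a duplicate is thrown away but still acknowledged.
            if lost():
                continue                   # ACK lost: timer expires, resend
            acked = True                   # ACK arrived: stop the timer
    return delivered, transmissions

received, sent = run_arq(["a", "b", "c", "d"], loss_rate=0.3)
print(received, sent)
```

Every message is delivered exactly once and in order, at the cost of extra transmissions whenever a packet or an ACK is lost.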
The term Automatic Repeat Request (ARQ) for this protocol indicates that the protocol automatically resends any packet that is not acknowledged.
ARQ protocols are typically connection-oriented because the two ends need to synchronize their sequence numbers. They are usually full duplex, with a sender and a receiver at each end, and they can handle most of the reliability issues. Their flow control policy is very strict, since only one message at a time can be in the channel; we will take up that problem in the next unit.
Utilization for an ARQ protocol is similar to that for a stop and wait protocol, except that you must consider the possible expiration of the timer and the associated consequences. If we assume that we have some statistical information about packet loss we can estimate the likelihood that the timer expires and that we incur the cost of the timeout and retransmit. Using the notation from above, add: T, the timeout value and E, the probability of a packet loss in any exchange. This could be a packet corruption or a packet that simply doesn't arrive. Given T, we can estimate an additional cost for each packet that is the loss associated with resending a packet, plus the cost of waiting for the timer to expire. For example,
is the expected time lost for every packet sent (the probability of an error times the time to process the error). In the most complex sense, we would also have to consider that a resent packet could get an error, but we will ignore that problem. So now we have:
or using time instead of bytes:
For example, suppose that we have an ARQ network with a data rate (R) of 1 Mbps, a latency (D) of 0.01 s, a message length (L) of 1100 bytes, a header length of 100 bytes (so B is 1000 bytes), a timeout value (T) of 0.20 s and an error rate (E) of 0.05. So,
then
As you can see by the denominator, the latency and error cost dominate the actual cost of sending and drive the utilization to a very low value. This low utilization is a problem with an ARQ protocol of this form, so we need to find something better.
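The example can be worked through in code under one reading of the cost terms described above. The assumptions here are that the expected loss per packet is E times the timeout plus the time to resend the message, that this loss is added to the send time and latency in the denominator, and that 1 Mbps means 1,000,000 bits per second. The function name is illustrative, and the figure it prints follows from these assumptions only.

```python
def arq_utilization(L: float, B: float, D: float, R: float,
                    T: float, E: float) -> float:
    """Time-based utilization for stop-and-wait ARQ.

    L: message bytes, B: data bytes, D: latency in s, R: rate in bytes/s,
    T: timeout in s, E: probability of a loss in any exchange.
    """
    send_time = L / R
    expected_loss = E * (T + send_time)   # assumed: timeout wait plus one resend
    return (B / R) / (send_time + D + expected_loss)

R = 1_000_000 / 8    # assumed: 1 Mbps = 1,000,000 bits/s, in bytes per second
u = arq_utilization(L=1100, B=1100 - 100, D=0.01, R=R, T=0.20, E=0.05)

print(f"send time = {1100 / R:.4f} s, utilization = {u:.3f}")
# send time = 0.0088 s, utilization = 0.274
```

Under these assumptions the 0.0088 s send time is smaller than both the latency (0.01 s) and the expected error cost (about 0.0104 s), which is why those two terms dominate the denominator.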