How do I detect an abnormal network disconnect?
The previous question deals with detecting when a protocol connection is dropped normally, but what if you want to detect other problems, like unplugged network cables or crashed workstations? In these cases, the failure prevents notifying the remote peer that something is wrong. My feeling is that this is usually a feature, because the broken component might get fixed before anyone notices, so why force everyone to restart?
If you have a situation where you must be able to detect all network failures, you have two options:
The first option is to give the protocol a command/response structure: one host sends a command and expects a prompt response from the other host when the command is received or acted upon. If the response does not arrive, the connection is assumed to be dead, or at least faulty.
The second option is to add an "echo" function to your protocol, where one host (usually the client) is expected to periodically send an "are you still there?" packet to the other host, which must promptly acknowledge it. If the echo-sending host doesn't receive its response, or the receiving host fails to see an echo request for a certain period of time, the program can assume that the connection is bad or the remote host has gone down.
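As a rough illustration of the echo approach, here is a minimal Python sketch. The PING/PONG wire format and the names peer_alive, echo_responder, and demo are hypothetical, invented for this example, and the sketch assumes nothing else is in flight on the connection while the echo is outstanding. A real protocol would have to interleave echo replies with its other messages.

```python
import socket
import threading

ECHO_REQUEST = b"PING\n"  # hypothetical wire format, not from any real protocol
ECHO_REPLY = b"PONG\n"

def recv_exact(sock, n):
    """Read exactly n bytes, or return None if the peer closes first."""
    data = b""
    while len(data) < n:
        chunk = sock.recv(n - len(data))
        if not chunk:
            return None
        data += chunk
    return data

def peer_alive(sock, timeout):
    """Send one echo request; report whether the reply arrived in time.

    The timeout applies to each socket call, so this bounds the wait
    only approximately.
    """
    sock.settimeout(timeout)
    try:
        sock.sendall(ECHO_REQUEST)
        return recv_exact(sock, len(ECHO_REPLY)) == ECHO_REPLY
    except OSError:  # covers timeouts and connection resets
        return False

def echo_responder(sock, replies):
    """Answer the first `replies` echo requests, then go silent."""
    for _ in range(replies):
        if recv_exact(sock, len(ECHO_REQUEST)) != ECHO_REQUEST:
            return
        sock.sendall(ECHO_REPLY)

def demo():
    """Simulate a peer that answers once and then stops responding."""
    a, b = socket.socketpair()
    threading.Thread(target=echo_responder, args=(b, 1), daemon=True).start()
    first = peer_alive(a, timeout=1.0)   # responder answers this one
    second = peer_alive(a, timeout=0.2)  # responder has gone silent
    a.close()
    b.close()
    return first, second
```

In a real program you would call something like peer_alive on a timer and treat a False result as a dead or faulty connection. The same timeout logic also covers the command/response option: any command whose response fails to arrive in time is handled the same way.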
If you choose the "echo" alternative, avoid the temptation to use the ICMP "ping" facility for this. If you did it this way, you would have to send pings from both sides, because Microsoft stacks won't let you see the other side's echo requests, only responses to your own echo requests. Another problem with ping is that it's outside your protocol, so it won't detect a failed TCP connection if the hardware connection remains viable. A final problem with the ping technique is that ICMP is an unreliable protocol: does it make a whole lot of sense to use an unreliable protocol to add an assurance of reliability to another protocol?
Another option you should not bother with is the TCP keepalive mechanism. This is a way to tell the stack to send a packet out over the connection at specific intervals whether there's real data to send or not. If the remote host is up, it will send back a similar reply packet. If the TCP connection is no longer valid (e.g. the remote host has rebooted since the last keepalive), the remote host will send back a reset packet, killing the local host's connection. If the remote host is down, the local host's TCP stack will time out waiting for the reply and kill the connection.
There are two problems with keepalives:
Only Windows 2000 allows you to change the keepalive time on a per-process basis. On older versions of Windows, changing the keepalive time changes it for all applications on the machine that use keepalives. (Changing the keepalive time is almost a necessity since the default is 2 hours.)
Each keepalive packet is 40 bytes of more-or-less useless data, and one is sent in each direction for as long as the connection remains valid. Contrast this with a command/response type of protocol, where there is effectively no useless data: all packets are meaningful. In fairness, however, TCP keepalives are less wasteful on Windows 2000 than the "are you still there" strategy above.
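If you decide to try keepalives despite these drawbacks, the following Python sketch shows roughly how they are switched on for a single socket. The function name enable_keepalive and the 60-second and 10-second timer values are arbitrary choices for illustration. The timer-tuning calls are platform-specific, so each one is guarded: SIO_KEEPALIVE_VALS is the Windows control code (Windows 2000 and later), while TCP_KEEPIDLE and TCP_KEEPINTVL exist on Linux and some other Unix-like stacks.

```python
import socket

def enable_keepalive(sock, idle_s=60, interval_s=10):
    """Turn on TCP keepalives and, where the platform allows, shorten the timers.

    idle_s and interval_s are illustrative values, not recommendations.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    if hasattr(socket, "SIO_KEEPALIVE_VALS"):
        # Windows: (on/off, idle time in ms, probe interval in ms)
        sock.ioctl(socket.SIO_KEEPALIVE_VALS,
                   (1, idle_s * 1000, interval_s * 1000))
    else:
        if hasattr(socket, "TCP_KEEPIDLE"):
            # Seconds of idle time before the first probe
            sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
        if hasattr(socket, "TCP_KEEPINTVL"):
            # Seconds between successive probes
            sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)
    # Report whether the stack now has keepalives enabled on this socket
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
```

Once keepalives are on, a dead peer eventually shows up as an error on the next send or receive call, so the program still needs to handle that error path.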
Note that different types of networks handle physical disconnection differently. Ethernet, for example, establishes no link-level connection, so if you unplug the network cable, a remote host can't tell that its peer is physically unable to communicate. By contrast, a dropped PPP link causes a detectable failure at the link layer, which propagates up to the Winsock layer for your program to detect.