Chapter 14. Network I/O: Monitoring

Let's begin our discussion of network I/O monitoring by revisiting our old standby, netstat, which displays overall network statistics. Probably one of the most common commands you will type is netstat -in:
# netstat -in

Here is a key to the output fields:
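Whatever the absolute packet counts, the Ierrs and Oerrs columns are the ones to watch first. Here's a minimal awk sketch — run against hypothetical `netstat -in` output rather than a live system — that flags any interface with a non-zero error counter:

```shell
# Hypothetical sample of netstat -in output (columns follow the usual
# Name/Mtu/Network/Address/Ipkts/Ierrs/Opkts/Oerrs/Coll layout).
sample='Name  Mtu   Network  Address   Ipkts   Ierrs  Opkts   Oerrs  Coll
en0   1500  10.1.1   10.1.1.8  932412  0      845211  0      0
en1   1500  10.1.2   10.1.2.8  112904  37     98711   12     0'

# Print any interface whose input or output error counter is non-zero.
echo "$sample" | awk 'NR > 1 && ($6 > 0 || $8 > 0) { print $1, "Ierrs=" $6, "Oerrs=" $8 }'
```

On this sample the filter prints `en1 Ierrs=37 Oerrs=12`; on a healthy system it prints nothing. The interface names and counter values here are made up for illustration.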
Another handy netstat flag is -m. This option lets you view the kernel memory allocation statistics, including mbuf memory requests (and buffer size), amount of memory in use, and failures by CPU:
# netstat -m

If you're using Ethernet, you can also use the entstat command to display device driver statistics:

# entstat -d en1

The entstat output provides a potpourri of information. You won't see many collisions because you'll probably be working in a switched environment. Look for transmit errors, and make sure they're not increasing too fast. You need to learn to troubleshoot collision and error problems before you even begin to think about tuning. As an alternative, you can use netstat -v, which provides similar information.

14.1. netpmon

netpmon [-o File] [-d] [-T n] [-P] [-t] [-v] [-O ReportType ...]

The netpmon command reports information about CPU usage as it relates to the network. It also provides data about network device driver I/O, Internet socket calls, and various other statistics. Like its trace brethren tprof and filemon, netpmon starts a trace and runs in the background until you stop it with the trcstop command. I like netpmon because it really gives you a detailed overview of network activity and also captures data for trending and analysis (although it's not as useful as nmon for the latter purpose). In the following example, we'll use a trace buffer size of 2 million bytes:

# netpmon -T 2000000 -o /tmp/net.out

Run trcstop to signal the end of the trace:

# trcstop

Let's look at the data. Here is just a small sampling of the output:

# more net.out
gil    3870    0.0163    0.004    0.004

As you can see, little overall network I/O activity was going on during this time. The top section of the output is most important; it helps you understand which processes are eating up network I/O time. The lsattr command, which we used in Chapter 13 to view hardware parameters, is another tool you'll use frequently to display statistics about your interfaces. The attributes reported by this command are configured using either the chdev or the no command.
Let's display the driver parameters using lsattr:

# lsattr -El en0

Sometimes, I also like to use the spray command to troubleshoot possible problems (although oftentimes this command is blocked because it's not very secure). The spray command sends a one-way stream of packets from your host to a remote host and reports the number of packets dropped as well as the number transferred:

# /usr/etc/spray lpar8test -c 2000 -l 1400 -d 1

In the preceding example, 2,000 packets of 1,400 bytes each were sent to the lpar8test host, with a delay of one microsecond between packets. Before using spray, make sure the sprayd daemon isn't commented out of the inetd configuration (it is commented out by default in AIX), and don't forget to refresh inetd. If you're seeing a substantial number of dropped packets, that obviously is not good.

14.2. Monitoring NFS

This section covers the use of the nmon, topas, nfsstat, nfs, nfs4cl, and netpmon commands to monitor the Network File System (NFS). For NFS tuning, you could use a tool such as topas or nmon initially because these commands provide a nice dashboard view of what is happening in your system. Remember that NFS performance problems might not be related to your NFS subsystem at all; your bottleneck could be on the network or, from a server perspective, related to CPU or disk I/O. Running a tool such as topas or nmon can quickly help you get a sense of what the real issues are. Consider a system that has two CPUs and is running AIX 5.3 TL_6. The report in Figure 14.1 shows nmon output from an NFS perspective.

Figure 14.1. NFS nmon output

Look at all the information that is available to you from an NFS (client and server) perspective using nmon! There are no current bottlenecks at all on this system. Although topas has improved recently with its ability to capture data, nmon might still be a better first choice.
While topas provides a front end similar to nmon's, nmon is more useful for long-term trending and analysis.

14.3. nfsstat

The nfsstat tool is arguably the most important tool you'll work with as you monitor your network. This command displays all types of information about NFS and remote procedure calls (RPCs). You can use nfsstat as a monitoring tool to troubleshoot problems and also employ it for performance tuning. Depending on the flags you use, you can have nfsstat display NFS client or server information. The command can also show the actual usage count of file system operations. This detail helps you understand exactly how each file system is utilized, so you'll know how best to tune your system. Look at the client flag (-c) first. The -r flag generates the RPC information:

# nfsstat -cr

Here's a rundown of the connection-oriented parameters:
If you notice a large number of timeouts or badxids, you could benefit from increasing the timeo parameter with the mount command (details to come). Next, look at the NFS information by using the -n flag:

# nfsstat -cn

In NFS Version 3, the output fields include:
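The badxid/timeout relationship from the -cr output drives the classic tuning decision, and it's easy to check in a few lines of shell. This is a hedged sketch using made-up counter values in place of real nfsstat output:

```shell
# Made-up RPC counters of the kind nfsstat -cr reports; substitute the
# real values from your own output.
calls=72405
timeouts=1310
badxids=1250

# Rule of thumb: if badxid roughly tracks timeout, the server is answering
# retransmitted requests -- it's slow, so a larger timeo helps. If timeouts
# pile up with few badxids, suspect packet loss on the network instead.
if [ "$badxids" -ge $((timeouts / 2)) ]; then
    echo "server slow: consider raising timeo on the mount"
else
    echo "likely network loss: check interfaces and switches"
fi
```

With these sample numbers the sketch prints the "server slow" message, matching the advice above to increase timeo. The 50 percent threshold is an illustrative cutoff, not an AIX-documented constant.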
14.4. nfs4cl

If you're running NFS Version 4, you might be using the nfs4cl command more often. This command displays NFS 4 statistics and properties:

# nfs4cl showfs

If this command produces no output, run the mount command to obtain more detail:
# mount

As you can tell, in this example no file systems are mounted using NFS Version 4, only NFS Version 3. Unlike the vast majority of performance commands, nfs4cl can also be used to tune your system; you do this with the setfsoptions subcommand, which tunes NFS Version 4. Another parameter you can tune is the previously mentioned timeo, which specifies the timeout value for RPC calls to the server.

14.5. netpmon and NFS

The netpmon command can also help you troubleshoot NFS bottlenecks. In addition to monitoring many other types of network statistics, netpmon monitors read and write subroutines and NFS RPC requests on clients; for servers, it monitors read and write requests. The command starts a trace and runs in the background until you stop it. First, let's kick off the trace:

# netpmon -T 3000000 -o /tmp/nfrss.out

You run the trcstop command to signal the end of the trace, as netpmon's startup message informs you:

Sun Oct 7 07:06:14 2007

# trcstop

Now we can check out the NFS-specific information provided in the output file:

NFSv3 Client RPC Statistics (by Server):

In this case, you can see the NFS Version 3 client statistics by server. Although netpmon is a useful trace utility, its performance overhead can sometimes outweigh its benefits, particularly when you have other ways to obtain similar information. Keep this in mind when using the utility.

14.6. Monitoring Network Packets

Earlier, I addressed some of the very basic flags, such as -in, that you typically use with the netstat command. Using netstat, you can also monitor more detailed information about the packets themselves. For example, the -D option reports the overall number of packets received, transmitted, and dropped in your communications subsystem. The command output sorts the results by device, driver, and protocol:
# netstat -D

There are actually so many different ways to use netstat that the best place to start is the netstat man page and go from there. Don't be afraid to run these commands; they won't eat up disk space or affect performance.

14.7. iptrace, ipreport, and ipfilter

The tracing tools provided within AIX are used to record detailed information about packets; use these commands with more caution. They are extremely helpful when you're trying to determine the root cause of network performance problems. Check out iptrace and ipreport first. The iptrace command records all packets received on the network interfaces. The ipreport command formats the data generated by iptrace into a readable trace report. You can also use the ipfilter command to sort the output file created by ipreport. Let's try starting the trace and running it for one minute:

# /usr/sbin/iptrace -a -i en0 iptrace.out

Here, you can see the trace running:
# ps -ef | grep iptrace

When we're done with the trace, we need to kill the process:

# kill -1 77425

Next, let's sort the file:

# ipreport -r -s iptrace.out > ipreport.network

Now we can examine the output, which shows the captured information about each packet, including packet size and IP address information:

# more ipreport.network

As you can imagine, the trace file can become very large fairly quickly; the file for this example grew to 40 MB in less than a minute! Be very careful when running these traces, because you'll run out of disk space really fast if you don't have the capacity for these files. You can also start the trace using the System Resource Controller (SRC).

14.8. tcpdump

What about tcpdump? This command prints the headers of the packets captured on a network interface card (NIC). One important difference is that, unlike iptrace, tcpdump can look at only one network interface at a time. And because iptrace examines the entire packet from kernel space, its results can include lots of dropped packets. With tcpdump, you can limit the amount of data to be traced, and you don't need an ipreport-type command to format the binary data, because tcpdump performs both the trace and the output. Let's run tcpdump:

# tcpdump -w tcp.out

The utility continues to capture packets until you press Ctrl+C. If any packets were dropped due to a lack of buffer space, tcpdump reports that, too:

14755 packets received by filter
0 packets dropped by kernel

The preceding output shows that the kernel dropped no packets, which is a good thing.
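When tcpdump runs unattended, that closing summary is worth checking programmatically rather than by eye. A small sketch, assuming tcpdump's standard "received by filter" / "dropped by kernel" summary lines (the numbers here are hypothetical):

```shell
# Hypothetical tcpdump summary, as captured from the end of a real run.
summary='14755 packets received by filter
0 packets dropped by kernel'

# Pull the two counters out of the summary text.
received=$(echo "$summary" | awk '/received by filter/ { print $1 }')
dropped=$(echo "$summary" | awk '/dropped by kernel/ { print $1 }')

# A non-zero drop count means the capture missed traffic, so any rates
# you compute from the trace file would understate the real numbers.
if [ "$dropped" -eq 0 ]; then
    echo "clean capture: $received packets, none dropped"
else
    echo "warning: kernel dropped $dropped packets; trace is incomplete"
fi
```

On this sample the check reports a clean capture; in a script you would branch on the drop count before trusting the trace.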