Chapter 9. Networks

The subject of network configuration and performance has been extensively covered by other writers [1]. For that reason, this chapter concentrates on Sun-specific networking issues, such as the performance characteristics of the many network adapters and operating system releases.
New NFS Metrics

Local disk usage and NFS usage are functionally interchangeable, so Solaris 2.6 was changed to instrument NFS client mount points as if they were disks! NFS mounts are always shown by iostat and sar. Automounted directories come and go far more often than disks do, which may be an issue for performance tools that don’t expect the number of iostat or sar records to change often. The full instrumentation includes the wait queue for commands in the client (biod wait) that have not yet been sent to the server; the active queue for commands currently in the server; and utilization (%busy) for the server mount point activity level. Note that unlike the case with disks, 100% busy does not indicate that the server itself is saturated; it just indicates that the client always has outstanding requests to that server. An NFS server is much more complex than a disk drive and can handle many more simultaneous requests than a single disk drive can. Figure 9-1 shows the new -xnP option, although NFS mounts appear in all formats. Note that the P option suppresses disks and shows only disk partitions. The xn option breaks down the response time, svc_t, into wait and active times and puts the expanded device name at the end of the line so that long names don’t mess up the columns. The vold entry is used to mount floppy and CD-ROM devices.

Figure 9-1. Example iostat Output Showing NFS Mount Points

crun% iostat -xnP

New Network Metrics

The standard SNMP network management MIB for a network interface is supposed to contain IfInOctets and IfOutOctets counters that report the number of bytes input and output on the interface. These were not measured by network devices for Solaris 2, so the MIB always reported zero. Brian Wong and I filed bugs against all the different interfaces a few years ago, and bugs were filed more recently against the SNMP implementation. The result is that these counters have been added to the “le” and “hme” interfaces in Solaris 2.6, and the fix has been backported in patches for Solaris 2.5.1, as 103903-03 (le) and 104212-04 (hme). The new counters added are the per-interface input and output byte counts. The full set of data collected for each interface can be obtained as described in “The Solaris 2 “kstat” Interface” on page 387. An SE script, called dumpkstats.se, prints out all of the available data, and an undocumented option, netstat -k, prints out the data. In Solaris 2.6, netstat -k takes an optional kstat name, as shown in Figure 9-2, so you don’t have to search through the reams of data to find what you want.

Figure 9-2. Solaris 2.6 Example of netstat -k to See Network Interface Data in Detail
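As a quick sketch of how these new metrics can be viewed from the command line (the interface name hme0 and the 30-second interval are illustrative assumptions, not part of the original figures):

% iostat -xnP 30
% netstat -k hme0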
Virtual IP Addresses

You can configure more than one IP address on each interface, as shown in Figure 9-3. This is one way that a large machine can pretend to be many smaller machines consolidated together. It is also used in high-availability failover situations. In earlier releases, up to 256 addresses could be configured on each interface. Some large virtual web sites found this limiting, and now a new ndd tunable in Solaris 2.6 can be used to increase that limit. Up to about 8,000 addresses on a single interface have been tested. Some work was also done to speed up ifconfig of large numbers of interfaces. You configure a virtual IP address by using ifconfig on the interface, with the number separated by a colon. Solaris 2.6 also allows groups of interfaces to feed several ports on a network switch on the same network to get higher bandwidth.

Figure 9-3. Configuring More Than 256 IP Addresses Per Interface
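A minimal sketch of the kind of commands involved (the addresses, netmask, and logical instance numbers are placeholders, and the tunable name ip_addrs_per_if is my assumption for the Solaris 2.6 ndd variable mentioned above):

# ifconfig hme0:1 192.168.42.10 netmask 255.255.255.0 up
# ifconfig hme0:2 192.168.42.11 netmask 255.255.255.0 up
# ndd -set /dev/ip ip_addrs_per_if 8192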
Network Interface Types

There are many interface types in use on Sun systems. In this section, I discuss some of their distinguishing features.

10-Mbit SBus Interfaces — “le” and “qe”

The “le” interface is used on many SPARC desktop machines. The built-in Ethernet interface shares its direct memory access (DMA) connection to the SBus with the SCSI interface but has higher priority, so heavy Ethernet activity can reduce disk throughput. This can be a problem with the original DMA controller used in the SPARCstation 1, 1+, SLC, and IPC, but subsequent machines have enough DMA bandwidth to support both. The add-on SBus Ethernet card uses exactly the same interface as the built-in Ethernet but has an SBus DMA controller to itself. The more recent buffered Ethernet interfaces used in the SPARCserver 600, the SBE/S, the FSBE/S, and the DSBE/S have a 256-Kbyte buffer to provide a low-latency source and sink for the Ethernet. This buffer cuts down on dropped packets, especially when many Ethernets are configured in a system that also has multiple CPUs consuming the memory bandwidth. The disadvantage is increased CPU utilization as data is copied between the buffer and main memory. The most recent and efficient “qe” Ethernet interface uses a buffer but has a DMA mechanism to transfer data between the buffer and memory. This interface is found in the SQEC/S qe quadruple 10-Mbit Ethernet SBus card and the 100-Mbit “be” Ethernet interface SBus card.

100-Mbit Interfaces — “be” and “hme”

The 100baseT standard takes the approach of requiring shorter, higher-quality twisted-pair cables, then running the normal Ethernet standard at ten times the speed. Performance is similar to FDDI, but with the Ethernet characteristic of collisions under heavy load. It is most useful to connect a server to a hub, which converts the 100baseT signal into many conventional 10baseT signals for the client workstations.

FDDI Interfaces

Two FDDI interfaces have been produced by Sun, and several third-party PCIbus and SBus options are available as well. FDDI runs at 100 Mbits/s and so has ten times the bandwidth of standard Ethernet. The SBus FDDI/S 2.0 “bf” interface is the original Sun SBus FDDI board and driver. It is a single-width SBus card that provides single-attach only. The SBus FDDI/S 3.0, 4.0, and 5.0 “nf” software supports a range of SBus FDDI cards, including both single- and dual-attach types. These are OEM products from Network Peripherals Inc. The nf_stat command provided in /opt/SUNWconn/SUNWnf may be useful for monitoring the interface.

SBus ATM 155-Mbit Asynchronous Transfer Mode Cards

There are two versions of the SBus ATM 155-Mbit Asynchronous Transfer Mode card: one version uses a fiber interface, the other uses twisted-pair cables like the 100baseT card. The ATM standard allows isochronous connections to be set up (so audio and video data can be piped at a constant rate), but the AAL5 standard used to carry IP protocol data makes it behave like a slightly faster FDDI or 100baseT interface for general-purpose use. You can connect systems back-to-back with just a pair of ATM cards and no switch if you only need a high-speed link between two systems. ATM configures a 9-Kbyte segment size for TCP, which is much more efficient than Ethernet’s 1.5-Kbyte segment.

622-Mbit ATM Interface

The 622-Mbit ATM interface is one of the few cards that comes close to saturating an SBus. Over 500 Mbits/s of TCP traffic have been measured on a dual-CPU Ultra 2/2200.
The PCIbus version has a few refinements and a higher-bandwidth bus interface, so it runs a little more efficiently. It was used for the SPECweb96 benchmark results when the Enterprise 450 server was announced. The four-CPU E450 needed two 622-Mbit ATM interfaces to deliver maximum web server throughput. See “SPECweb96 Performance Results” on page 83.

Gigabit Ethernet Interfaces — “vge”

Gigabit Ethernet is the latest development. With the initial release, a single interface cannot completely fill the network, but this will be improved over time. If a server is feeding multiple 100-Mbit switches, then a gigabit interface may be useful because all the packets are the same 1.5-Kbyte size. Overall, Gigabit Ethernet is less efficient than ATM and slower than 622-Mbit ATM because of its small packet size and relative immaturity as a technology. If the ATM interface were going to be feeding many Ethernet networks, ATM’s large segment size would not be used, so Gigabit Ethernet may be a better choice for integrating into existing Ethernet networks.

Using NFS Effectively

The NFS protocol itself limits throughput to about 3 Mbytes/s per active client-side process because it has limited prefetch and small block sizes. The NFS version 3 protocol allows larger block sizes and other changes that improve performance on high-speed networks. This limit doesn’t apply to the aggregate throughput if you have many active client processes on a machine. First, some references:
How Many NFS Server Threads?

In SunOS 4, the NFS daemon nfsd services requests from the network, and several nfsd daemons are started so that a number of outstanding requests can be processed in parallel. Each nfsd takes one request off the network and passes it to the I/O subsystem. To cope with bursts of NFS traffic, you should configure a large number of nfsds, even on low-end machines. All the nfsds run in the kernel and do not context switch in the same way as user-level processes do, so the number of hardware contexts is not a limiting factor (despite folklore to the contrary!). If you want to “throttle back” the NFS load on a server so that it can do other things, you can reduce the number. If you configure too many nfsds, some may not be used, but it is unlikely that there will be any adverse side effects as long as you don’t run out of process table entries. Take the highest number you get by applying the following three rules:
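Whichever rule wins, the resulting count is simply the argument given to nfsd when it is started at boot time; a minimal sketch of where it is set (the count of 16 and the exact file contents are illustrative assumptions):

# SunOS 4: the nfsd line in /etc/rc.local
nfsd 16 &

# Solaris 2: the nfsd line in /etc/init.d/nfs.server
/usr/lib/nfs/nfsd -a 16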
What Is a Typical NFS Operation Mix?

There are characteristic NFS operation mixes for each environment. The SPECsfs mix is based on the load generated by slow diskless workstations with a small amount of memory that are doing intensive software development. It has a large proportion of writes compared to the typical load mix from a modern workstation. If workstations are using the cachefs option, then many reads will be avoided, so the total load is less, but the percentage of writes is more like the SPECsfs mix. Table 9-1 summarizes the information.

The nfsstat Command

The nfsstat -s command shows operation counts for the components of the NFS mix. This section is based upon the Solaris 2.4 SMCC NFS Server Performance and Tuning Guide. Figure 9-4 illustrates the results of an nfsstat -s command.

Figure 9-4. NFS Server Operation Counts
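To capture a representative sample of the mix rather than counts accumulated since boot, the counters can be zeroed first; a small sketch, assuming root access and an arbitrary ten-minute measurement interval:

# nfsstat -z
# sleep 600; nfsstat -s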
The meaning and interpretation of the measurements are as follows:
NFS Clients

On each client machine, use nfsstat -c to see the mix, as shown in Figure 9-5; for Solaris 2.6 or later clients, use iostat -xnP to see the response times.

Figure 9-5. NFS Client Operation Counts (Solaris 2.4 Version)
You can also view each UDP-based mount point by using the nfsstat -m command on a client, as shown in Figure 9-6. TCP-based NFS mounts do not use these timers.

Figure 9-6. NFS Operation Response Times Measured by Client
This output shows the smoothed round-trip times (srtt), the deviation or variability of this measure (dev), and the current time-out level for retransmission (cur). Values are converted into milliseconds and are quoted separately for read, write, lookup, and all types of calls. The system will seem slow if any of the round-trip times exceeds 50 ms. If you find a problem, watch the iostat -x measures on the server for the disks that export the slow file system, as described in “How iostat Uses the Underlying Disk Measurements” on page 194. If the write operations are much slower than the other operations, you may need a Prestoserve, assuming that writes are an important part of your mix.

NFS Server Not Responding

If you see the “not responding” message on clients and the server has been running without any coincident downtime, then you have a serious problem. Either the network connections or the network routing is having problems, or the NFS server is completely overloaded.

The netstat Command

Several options to the netstat command show various parts of the TCP/IP protocol parameters and counters. The most useful options are the basic netstat command, which monitors a single interface, and the netstat -i command, which summarizes all the interfaces. Figure 9-7 shows output from the netstat -i command.

Figure 9-7. netstat -i Output Showing Multiple Network Interfaces
From a single measurement, you can calculate the collision rate since boot time; by noting the difference in the packet and collision counts over time, you can calculate the ongoing collision rate as Collis * 100 / Opkts for each device. In this case, lo0 is the internal loopback device; bf0 is an FDDI interface, so it has no collisions; le1 has a 2.6 percent collision rate; le2 has 1.5 percent; and le3 has 1.2 percent. For more useful network performance summaries, see the network commands of the SE toolkit, as described starting with “net.se” on page 486.
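As a quick sketch of the since-boot calculation described above, the output of netstat -i can be fed through awk (this assumes the usual column order of Name, Mtu, Net/Dest, Address, Ipkts, Ierrs, Opkts, Oerrs, Collis, Queue):

% netstat -i | awk 'NR > 1 && $7 > 0 { printf "%-6s %6.2f%%\n", $1, $9 * 100 / $7 }'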