File System Primer
Introduction

Linux offers a number of file systems. This paper discusses these file systems, why there are so many, and which ones are best suited to which workloads and data. Not all data is the same, not all workloads are the same, and not all file systems are the same. Matching the file system to the data and workload allows customers to build efficient, scalable, and cost-effective solutions. The next section of this document describes four general workload areas. It is important to understand these workloads and their requirements, as they drive the requirements placed on file systems. They also serve as a guide for comparing and contrasting the various file systems available in the market today.
Workloads of Differing Needs

IT organizations typically divide workloads into four areas:
It is important to understand the difference between file systems and file access protocols. Both fall under the general concept of "file systems," but for the purposes of this document the distinction is as follows:

File Systems: Control the organization of data on storage media. File system software can be viewed as a filing cabinet that provides a structured container into which data is organized and stored. File systems do NOT include file access protocols.

File Access Protocols: Control the semantics of remote network access to data stored in file systems. File access protocols typically depend on file system features (file system semantics must match file protocol semantics).

It is extremely important to understand the priority of needs for each of these general workloads, as this drives the requirements for high availability, file systems, file access, and volume management storage throughout the IT organization. HA, file system, file access, and storage requirements per workload:
Business IT
HPC (High Performance Computational Clusters)
Workgroup
Desktop
Linux File Systems: Why So Many?

There are three main reasons why there are so many file systems on Linux:
First, open source means anyone can contribute their value, and they have. This has made roughly 20 different file systems available for Linux, ranging from very rudimentary, simple file systems to extremely complex and rich ones. Second, as storage needs have grown, file systems have needed increasing scalability. This has led to file systems that claim to run faster, handle more files, scale to larger volumes, and handle more concurrent access to data. Lastly, as mainframes and minicomputers have given way to less expensive Intel-architecture commodity PC servers running Linux, and as users have moved from non-Linux PC operating systems to Linux, the need to preserve access to data stored on those other systems has produced additional file systems that understand that data and storage.
File System Comparison

The following list describes the characteristics of each Linux file system and indicates when it is best used. The list is not exhaustive of all the file systems available in the world, but focuses on those with appreciable market share or attention in the market today. A detailed comparison of file system features can be found at http://en.wikipedia.org/wiki/Comparison_of_file_systems and in Linux Data Management and High Availability Features.

EXT2
The EXT2 file system is the predecessor to EXT3. EXT2 is not journaled and hence is no longer recommended (customers should move to EXT3).

EXT3
EXT3 is a journaled file system and the most widely used file system on Linux today; it is the "Linux" file system. It is quite robust and quick, although it scales well neither to large volumes nor to a great number of files. A scalability feature called htrees was recently added, significantly improving EXT3's scalability. However, even with htrees it is still not as scalable as some of the other file systems listed; with htrees it scales similarly to NTFS. Without htrees, EXT3 does not handle more than about 5,000 files in a directory.

FAT32
FAT32 is the crudest of the file systems listed. Its popularity stems from its widespread use in the Windows desktop world and from its adoption as the file system in flash RAM devices (digital cameras, USB memory sticks, etc.). It has no built-in security or access control, so it is small and works well in these portable and embedded applications. It scales the least of the file systems listed. Most systems support FAT32 due to its ubiquity.

GFS
The Red Hat Global File System (from the Sistina acquisition) was open sourced in mid-2004. It is a symmetric parallel cluster file system that allows multiple machines to access common data on a SAN (Storage Area Network). This eases management by giving multiple machines access to the same data (such as common configuration files shared between multiple web servers), and it allows applications and services written for direct disk access to be scaled out to multiple nodes. The practical limit, however, is 16 machines in a SAN cluster.

GPFS
IBM's General Parallel File System is, like GFS, a parallel cluster file system with similar characteristics. Video editing is the sweet spot for GPFS. It supports from 2 to thousands of nodes in a single cluster and includes very rich management features, such as Hierarchical Storage Management.

JFS
The IBM Journaled File System is the file system used by IBM in AIX and OS/2. It is a feature-rich file system ported to Linux to ease migration of existing data. It has been shown to provide excellent overall performance across a variety of workloads.

NSS
The Novell Storage Services file system is used in NetWare 5.0 and above; it was recently open sourced and included in Novell SUSE's SLES 9 SP1 and later (it is used in Novell's Open Enterprise Server Linux product). NSS is unique in many ways, most notably in its ability to manage and support shared file services simultaneously over different file access protocols. It is designed for access control in enterprise file sharing environments, using a unique model (the Trustee Model) that scales to hundreds of thousands of different users accessing the same storage securely. It and its predecessor (NWFS) are the only file systems that can restrict the visibility of the directory tree based on the user ID accessing the file system, and both have built-in ACL rights inheritance. NSS includes mature, robust features tailored to the file sharing environments of the largest enterprises, and it scales to millions of files in a single directory. It supports multiple data streams and rich metadata (its data stream, metadata, namespace, and attribute support is a superset of that of existing file systems on the market).

NTFS
NTFS is the Microsoft Windows file system for the Windows NT kernel (Windows NT, Windows 2000, Windows XP, and Windows 2003). The Linux open source version of this file system provides read-only access to existing NTFS data, which allows for migration from Windows and access to Windows disks. NTFS includes an ACL model that is not POSIX-compliant; it is unique to Microsoft, but derives from the Novell NetWare 2.x ACL model. NTFS is the default (and virtually only) option on Windows servers. It includes rich metadata and attribute features, and since its Windows 2000 implementation it has supported multiple data streams and ACL rights inheritance. In Windows 2003 R2, Microsoft added a feature called "Access Based Enumeration". This is similar to visibility in NSS and NWFS, but it is implemented as a feature of the CIFS protocol engine rather than in the file system layer, so it is only available when accessing Windows 2003 via the CIFS protocol. See CIFS below.

NWFS
The traditional NetWare File System is the default file system in NetWare 3.x through 5.x and is supported in NetWare 6.x for compatibility. It is one of the fastest file systems on the planet; however, it does not scale, nor is it journaled. An open source version is available on Linux to allow access to its file data, but it lacks the identity management tie-ins and so has found little utility. Customers using NWFS are encouraged to upgrade to NSS.

OCFS2
The Oracle Cluster File System v2 is a symmetric parallel cluster file system specifically designed to support the Oracle Real Application Clusters (RAC) database. While it supports general file access, it does not scale in number of files (like EXT3 without htrees). It is the first (and so far only) symmetric parallel cluster file system to be accepted into the Linux mainline kernel (January 2006).

PolyServe Matrix Server
Matrix Server is a symmetric parallel cluster file system for Linux (PolyServe also has a version for Windows servers). Rooted in technology from Sequent Computers, Matrix Server is the premier parallel cluster file system on Linux today, boasting order-of-magnitude performance advantages over competing parallel cluster file systems (GFS, GPFS, OCFS2, etc.). It should be used when parallel cluster file system scaling is needed.

ReiserFS
The Reiser File System is the default file system in SUSE Linux distributions. ReiserFS was designed to remove the scalability and performance limitations of EXT2 and EXT3. It scales and performs extremely well on Linux, outscaling EXT3 with htrees. In addition, ReiserFS was designed to use disk space very efficiently, making it the best file system on Linux where there are a great number of small files. Since collaboration (email) and many web serving applications involve lots of small files, ReiserFS is best suited to these workloads.

VxFS
The Veritas File System is closed source. The Veritas full storage suite is essentially the Veritas File System that is popular on Unix (including Solaris); approximately 70% of Unix deployments in the world run on top of it. As a result, it is one of the best file systems to use when data is to be migrated directly from Unix to Linux, or when IT staff training in volume and file system management is to be preserved. The Veritas File System has excellent scalability characteristics, just as it has on Unix systems. Veritas has recently ported its cluster version of VxFS to Linux. This parallel cluster file system (cVxFS) uses an asymmetric model: one node is the master, and all other nodes are effectively read-only slaves (they can write through the master node).

XFS
The XFS file system is open source and included in the major Linux distributions. It originated at SGI (Irix) and was designed specifically for large files and large-volume scalability; video and multimedia files are best handled by this file system. Scaling to petabyte volumes, it also handles large amounts of data. It is one of the few file systems on Linux that supports data migration (SGI contributed the Hierarchical Storage Management interfaces to the Linux kernel a number of years ago). SGI also offers a closed source parallel cluster version of XFS called cXFS which, like cVxFS, uses an asymmetric model. Uniquely, its slave nodes can run on Unix, Linux, and Windows, making it a cross-platform file system; its master node must run on SGI hardware.
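Large-file handling can be probed from user space. The sketch below (plain Python, not XFS-specific; file names and sizes are illustrative) creates a sparse 1 GiB file and compares its logical size with the disk blocks actually allocated, a quick way to see that modern Linux file systems track very large files without writing out the data blocks:

```python
import os
import tempfile

def make_sparse(path, size):
    """Create a file `size` bytes long without writing any data blocks."""
    with open(path, "wb") as f:
        f.truncate(size)  # extends the file; the hole stays unallocated
    st = os.stat(path)
    # st_blocks counts 512-byte blocks actually allocated on disk
    return st.st_size, st.st_blocks * 512

with tempfile.TemporaryDirectory() as d:
    logical, allocated = make_sparse(os.path.join(d, "big.dat"), 1024 ** 3)
    print(f"logical size: {logical} bytes, allocated: {allocated} bytes")
```

On a file system without sparse-file support (FAT32, for instance), the allocated figure would instead approach the logical size.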
File Access Protocols

There are fewer file access protocols than file systems, but their capabilities vary more widely than those of file systems. For the purposes of this discussion, only the popular file access protocols in production in the market are covered.

AFP
The Apple Filing Protocol was designed and developed by Apple for Macintosh networking (originally over AppleTalk on phone-wire hardware; since 1997 over TCP/IP on any hardware medium that supports it). This protocol is the best for supporting Apple MacOS desktop machines on a network. The specification is openly available from Apple. The NetAtalk modules on Linux implement the AFP protocol (and still implement the AppleTalk transport, even though Apple has end-of-lifed AppleTalk in favor of TCP/IP). The AFPD module in the NetAtalk package can use either TCP/IP or AppleTalk as a transport.

CIFS
The term CIFS, "Common Internet File System", was coined by Microsoft when it first introduced the workstation peer-to-peer file sharing protocol verbs to the open community. Subsequent protocol verbs have been held proprietary and include increased richness and management. CIFS (as implemented in Windows 2003) includes not only file access verbs but a whole suite of management verbs and other protocols used by Windows servers and client desktops. The CIFS protocol originally operated over the NetBEUI network protocol; tunneling through TCP/IP was added in the early 1990s, and in 2000 Microsoft introduced native TCP/IP support. Microsoft recently introduced an option in Release 2 of Windows Server 2003 called "Access Based Enumeration". When enabled, this feature restricts subdirectory visibility so that users see only the subdirectories to which they have rights, which increases security. The feature is enabled per network share on the Windows 2003 server. The full client desktop protocol suite specifications are available from Microsoft under a royalty license (the MCPP). For Linux, the Samba team has developed an open source implementation of CIFS based on reverse engineering the wire protocol of Microsoft Windows machines.

FTP
File Transfer Protocol is one of the most common and widely used simple protocols on the internet today; virtually all platforms and devices support FTP to some level. FTP is a very simple protocol for uploading and downloading files. There is no richness for sharing (locking, coordination, contention, etc.) in the protocol, but it is used broadly for transferring files. The specifications are openly available via the IETF.

HTTP
Hyper Text Transfer Protocol is the dominant protocol on the World Wide Web today, spoken by web browser clients and web servers. Like FTP, it is not rich: it is designed strictly for transfers of HTML (Hyper Text Markup Language), and it also transports the additional markup languages that have since been invented, such as XML (eXtensible Markup Language). The specifications are openly available via the IETF.

Lustre
Lustre is a unique distributed client-server protocol. It deliberately breaks up the functions of a file system at the protocol layer in order to gain huge scalability for great numbers of very large files (such as seismic data for petroleum exploration). Lustre is tied to the Linux EXT3 file system for disk storage, but it effectively builds a very large virtual file system out of many nodes in a cluster: some nodes are dedicated to holding metadata, others to holding specific parts of the greater virtual file system. HPC clusters require this in order to give thousands of compute nodes performant, simultaneous access to up to petabytes of data. Lustre is the dominant file system used in HPC clusters today. Cluster File Systems Inc. builds and maintains Lustre. It previously open sourced only the older version and kept the current version closed source, but it is changing this approach, looking to put the most recent version into open source in hopes of having it accepted into the Linux mainline kernel soon.

NCP
The Novell Core Protocol is the client-server protocol developed by Novell over its history to provide shared file services to DOS, Windows, OS/2, Macintosh, Unix (UnixWare), and Linux. It is a very rich file protocol, as it supports the semantics of all of these native operating systems. Novell has since reduced active support to Windows and Linux desktops with the NetWare client, and to the Xtier server for middle-tier file access. Originally supported only over the IPX network protocol, NCP was tunneled over IPX through TCP/IP in 1993, and native TCP/IP support was added in 1998. Novell has added NCP support to Linux desktops to allow the new Novell Linux Desktop to interoperate with the installed base of NetWare servers and to expose NetWare's unique capabilities to Linux desktops. As part of Open Enterprise Server, Novell also supports NCP on Linux servers to allow desktops running the Novell client to access data on Linux. The NCP server on Linux includes emulation of the Trustee rights model, inheritance, and visibility when run over traditional POSIX file systems (such as EXT3, Reiser, etc.); when run over NSS on Linux, these capabilities are synchronized with the NSS file system. Visibility in this mode is implemented much like Microsoft's Windows 2003 R2 "Access Based Enumeration": in the file access protocol, not the file system. The specification for this protocol is openly available from Novell.

NFS v3
Network File System version 3 was introduced as a standard via the IETF by Sun Microsystems in the mid-1990s. NFS v3, unlike the other file access protocols, is an exported file system: access and security are enforced at the NFS client, not the NFS server. As a result, NFS is easily hacked if not kept on a dedicated, secure network. NFS v3 is a stateless protocol like HTTP and FTP, so it suffers in performance because it must assert current state with each operation (for example, it does not define Open and Close of files, only Read and Write). File locking was added with sideband protocols, but it is only advisory in nature (not enforced, meaning it can be bypassed on a network). NFS has found its niche as the distributed exported file system protocol used inside the confines of a physical data center, hooking application servers and databases to storage. It has also seen use in smaller Unix and Linux workgroups where security between users is not an issue. Various IETF RFCs define NFS, so its specifications are freely available.

NFS v4
To address the security issues of NFS v3, and to define a network protocol specification that can handle future needs, the NFS v4 specification was proposed to the IETF. The effort was led by Sun and Network Appliance, with other vendors joining in. The specification was approved in late 2003; issues discovered during initial implementations then resulted in updated RFCs, bringing the specification effectively to v4.1. NFS v4 defines an extensible and rich set of file access verbs. The protocol is a shared file protocol, unlike NFS v3, so it is secure. It also specifies advanced features for Remote Direct Memory Access, delegations (equivalent to opportunistic locking), extensible rich metadata, and access naming. NFS v4 is still a work in development, as it is very new in the industry, but it holds great promise; 2006 will see the first commercial Linux offerings of NFS v4. NFS v4 requires Kerberos v5 authentication, but will also support other authentication methods supported under the GSSAPI RFCs. Authentication of some form is mandatory, as security and access control are enforced at the server in NFS v4. In summary, NFS v4 is the next key standards-based file access protocol to come.
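The advisory locking noted for NFS v3 mirrors ordinary POSIX behavior: a lock restrains only processes that ask for it. The local sketch below (plain Python `fcntl`, not an NFS client) takes an exclusive advisory lock and then shows that a writer which never checks locks modifies the file anyway:

```python
import fcntl
import os
import tempfile

tmp = tempfile.NamedTemporaryFile(delete=False)
tmp.write(b"original contents")
tmp.close()

# Take an exclusive advisory lock on the whole file.
locker = os.open(tmp.name, os.O_RDWR)
fcntl.lockf(locker, fcntl.LOCK_EX)

# A writer that never calls lockf() is not stopped by the kernel.
rogue = os.open(tmp.name, os.O_WRONLY)
os.write(rogue, b"OVERWRITTEN")
os.close(rogue)

with open(tmp.name, "rb") as f:
    data = f.read()
print(data)  # the locked file was modified regardless

fcntl.lockf(locker, fcntl.LOCK_UN)
os.close(locker)
os.unlink(tmp.name)
```

Both descriptors here live in one process for brevity, but the key point holds across processes too: plain write() never consults the lock table, so only cooperating applications are protected.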
Workload File System Recommendations

By this point it should be apparent that there is no single general-purpose file system and file access protocol. What matters is picking the right file system for the data and for the applications creating and accessing that data. This section lays out some guidelines for picking and building the right file system for a given workload.
Collaboration

GroupWise, Notes, Exchange, and other email/collaboration solutions typically deal with lots of little files. Since only the application process accesses the file system, the added overhead of the rich ACLs and file attributes found in NSS or NTFS is redundant. What is needed is a file system whose performance remains relatively constant regardless of the number of files in the volume and that performs well with small files. The best bets are ReiserFS, XFS, NSS, and VxFS. File systems to stay away from for large systems (more than 10,000 files) are EXT2/3, NWFS, and FAT32. On a Windows system you are essentially stuck with NTFS; it scales better than EXT2/3, NWFS, and FAT32, but not as well as the recommended list, so it works well for medium-sized systems.
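The directory-scaling concern is easy to observe. The sketch below (plain Python; the file counts are arbitrary) fills directories with many small files and times name lookups; on a file system with unindexed directories the lookup cost grows with directory size, while indexed file systems stay roughly flat:

```python
import os
import tempfile
import time

def populate(dirpath, count):
    """Create `count` small files in one directory."""
    for i in range(count):
        with open(os.path.join(dirpath, f"msg{i:06d}"), "w") as f:
            f.write("x")

def lookup_time(dirpath, name, repeats=200):
    """Time repeated stat() lookups of a single entry."""
    target = os.path.join(dirpath, name)
    start = time.perf_counter()
    for _ in range(repeats):
        os.stat(target)
    return time.perf_counter() - start

with tempfile.TemporaryDirectory() as d:
    for count in (100, 5000):
        sub = os.path.join(d, f"dir{count}")
        os.mkdir(sub)
        populate(sub, count)
        t = lookup_time(sub, f"msg{count - 1:06d}")
        print(f"{count:5d} files: {t:.5f}s for 200 lookups")
```

The absolute numbers depend entirely on the underlying file system and page cache; the relative trend between the two directory sizes is what matters.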
Database

MySQL, Oracle, SQL Server, Progress, etc. typically deal with very few, very large files which are left open most of the time. The best file systems for databases are those that know how to "get out of the way": virtually any file system with Direct I/O capabilities (APIs that let the database manipulate the file buffers directly) can be used. Since databases do not create many files, file systems that do not scale to many files but do offer Direct I/O interfaces will work fine. Essentially, only FAT32 (plus file systems whose support has been discontinued) should be avoided. Since databases do not need the added access control features, NSS and NTFS bring no inherent added benefit. VxFS, Reiser, EXT3, and XFS are all recommended for databases (your database vendor may specify a file system they have tested with; if so, go with that one, since they will know how to support it). MS SQL Server is again tied to NTFS (which does have Direct I/O capabilities that MS SQL Server leverages).
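The "few large files, left open" pattern can be sketched with positioned page I/O. The example below (plain Python; the page size and file name are arbitrary) keeps one descriptor open and reads and writes fixed-size pages at computed offsets, the way a database engine addresses its data file. A real engine on Linux might additionally open with O_DIRECT to bypass the page cache; that requires aligned buffers and is omitted in this sketch:

```python
import os
import tempfile

PAGE = 4096  # illustrative page size

def write_page(fd, page_no, payload):
    """Write one fixed-size page at its offset, zero-padded."""
    assert len(payload) <= PAGE
    os.pwrite(fd, payload.ljust(PAGE, b"\x00"), page_no * PAGE)

def read_page(fd, page_no):
    """Read one page back from its offset."""
    return os.pread(fd, PAGE, page_no * PAGE)

with tempfile.TemporaryDirectory() as d:
    fd = os.open(os.path.join(d, "table.dat"), os.O_RDWR | os.O_CREAT, 0o600)
    write_page(fd, 7, b"row data for page 7")
    page = read_page(fd, 7)
    os.close(fd)
    print(page.rstrip(b"\x00"))
```

Because every access is a whole page at a known offset, the file system's job reduces to mapping offsets to blocks, which is why directory and small-file scalability matter so little for this workload.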
Web Services

Web services encompass a broad set of workloads. For simple web services, virtually any file system will do. Since these typically do not need rich access control, you can avoid the extra overhead of NTFS or NSS and squeeze out a few more percentage points of performance. However, if the web services solution leverages identity and requires security between many users (more than 50 accounts), the management advantages of access control and security begin to outweigh the small performance gains, and NSS or NTFS become better choices. Even complex web services solutions typically do not require the file system scalability that collaboration applications do (unless the solution is a web-based collaboration package). Online merchandising sites typically use a relational database as the datastore; in those cases, choose a file system that supports your database.
File Serving (NAS)

Generally there are two types of NAS use cases: serving files to application servers in a tiered service-oriented architecture (SOA), and serving files to end users' desktops and workstations. The former has minimal access control requirements; the latter has quite heavy ones. For serving files to application servers (traditional NAS), choose a file system that is scalable and fast: Reiser, XFS, and VxFS come to mind for NFS file serving. For file serving to end-user workstations, the access control and security management capabilities of the NSS and NTFS file systems with the CIFS and NCP file access protocols become important; NSS's model does better than NTFS for very large numbers of users. These two file systems provide security between users while allowing very fine-grained sharing between given users and groups. NSS includes a visibility feature, implemented in the file system, which prevents unauthorized users from even seeing subdirectory structures they lack rights to. CIFS in Windows 2003 R2 includes a similar visibility feature called "Access Based Enumeration"; however, it is implemented in the file access protocol, not the NTFS file system, so it is only available when accessing the file system via CIFS (traditional Microsoft network shares).
Parallel Cluster File Systems

Parallel cluster file systems are relatively new in the market and offer the ability to scale out an application or service (increasing throughput). HOWEVER, it must be well understood that not all applications or services can take advantage of parallel cluster file systems for scale-out. Applications and services that have been properly designed can run simultaneously on two or more nodes accessing the same data in a parallel cluster file system; these are parallel cluster enabled. Others, which are not, can run on only one node at a time in the cluster, even though their data is accessible from all nodes simultaneously; if they attempt to run on more than one node at once, crashes or data corruption may occur. Your application or service vendor should know whether they support this. To assist in determining whether an application is parallel cluster enabled, the following points are helpful: