SAN Building Blocks
W. Curtis Preston
              Before I get into this month's topic, I'll review what 
              I've covered so far. In the first article of this series, I 
              discussed the reasons that SANs exist. Because this column is dedicated 
              mainly to backup and recovery, I covered the ways that SANs make 
              backup and recovery easier. The second and third articles in this 
              series explained the basics of Fibre Channel, starting with Fibre 
              Channel's advantages over parallel SCSI. 
              Although I did not use the term parallel SCSI in previous articles, 
              I'd like to introduce it now. Since SCSI refers to both the 
              physical medium and the protocol, we need a term that refers to 
              "traditional" SCSI. In traditional (i.e., bus-attached) 
              SCSI, SCSI data travels over several conductors in parallel. (SCSI 
cables range from 50 to 80 conductors in a single cable.) Therefore,
I will refer to traditional SCSI as parallel SCSI.
              In contrast, you will remember that Fibre Channel has only two 
              conductors, one for transmitting and one for receiving. If SCSI 
traffic was designed to travel across several conductors in parallel,
how does Fibre Channel carry SCSI data? The answer is the new SCSI-3
              specification for the SCSI architecture, which is very different 
from its predecessors. One of the main differences is that the SCSI-1
and SCSI-2 specifications were each laid out in a single document,
whereas the SCSI-3 specification consists of more than 30 documents
describing a multi-layered architecture.
              This allowed the specification of other layers, as long as each 
              new layer followed the communication specifications of the existing 
layers above and below it. The Fibre Channel protocol specifications
(FCP and FCP-2) were then added to the SCSI-3 family. (Read more
              about this at http://www.t10.org. T10 is the ANSI committee 
              that develops the SCSI standard.) Since Fibre Channel SCSI traffic 
              travels across only two wires (transmit and receive), we could refer 
              to it as serial SCSI. (To avoid confusion, I will only use this 
              term when necessary, and only when comparing Fibre Channel to parallel 
              SCSI.) Fibre Channel solves a number of problems with parallel SCSI, 
              such as: 
              
Address limitations -- Parallel SCSI is limited to
15 devices per HBA. Fibre Channel can address 16 million devices
per HBA in a switched fabric configuration. (The sketch after this
list shows where that number comes from.)
Logistical limitations -- You cannot easily share storage
resources via parallel SCSI. Although you can connect multiple hosts
to a single SCSI bus, doing so can be quite complicated, and many
operating systems do not support it. Fibre Channel, on the other
hand, can connect hundreds or thousands of hosts to the same storage network,
              allowing all of them to share the same storage resource. Fibre Channel 
              devices can also be located up to 10 km apart, whereas parallel 
              SCSI is limited to only 25 meters. 
Speed limitations -- With 640-MB/s parallel SCSI in
the works, this limitation is less pronounced than it once was. However,
faster Fibre Channel speeds are in progress as well, and Fibre Channel
connections can be aggregated, whereas parallel SCSI buses cannot. In
other words, you can "trunk" several 100-MB/s connections together,
yielding much more throughput than is possible with a single connection.
Backup/restore limitations -- Being able to back up
large amounts of data without using the LAN is one of the main advantages
of Fibre Channel and SANs. This is possible because a single device
can be made accessible to more than one computer, which is really
only practical with Fibre Channel.
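That 16-million figure falls out of Fibre Channel's addressing scheme: a fabric port ID is a 24-bit value, conventionally split into 8-bit Domain, Area, and Port fields. Here is a minimal sketch of the arithmetic in Python (the example address is hypothetical):

    def split_port_id(port_id):
        """Split a 24-bit Fibre Channel port ID into its 8-bit fields."""
        domain = (port_id >> 16) & 0xFF
        area = (port_id >> 8) & 0xFF
        port = port_id & 0xFF
        return domain, area, port

    print(2 ** 24)                    # 16777216 -- the full address space
    print(split_port_id(0x010200))    # (1, 2, 0) -- hypothetical port ID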
              
              I also covered the three different topologies of Fibre Channel 
              (point-to-point, arbitrated loop, and switched fabric), and explained 
              that switched fabric is the most expensive topology, but also the 
fastest. Switched fabric is also becoming more cost competitive with
arbitrated loop. Therefore, you should seriously consider using switched
fabric when installing a new SAN, if at all possible.
              The Building Blocks 
              Now that I've reviewed why we are here, and how the different 
              SAN elements communicate, I will discuss the elements of a SAN, 
              and how they work together. The main elements of a SAN are servers, 
              HBAs, switches, hubs, routers, disk systems, tape systems, cabling, 
              and software. These are all illustrated in Figure 1. 
              Servers 
              No storage area network would have any reason for being if there 
              weren't servers connected to it. The servers will use the SAN 
              to share storage resources. 
              Host Bus Adapters (HBAs) 
              Servers connect to the SAN via their Host Bus Adapter, or HBA. 
              This is often referred to as a "Fibre Channel card," or 
              "Fibre Channel NIC." It is simply the SAN equivalent of 
              a SCSI card (i.e., a SCSI HBA). Some HBAs may use fiber, and some 
              HBAs may use copper. Regardless of the physical layer, the HBA is 
              what is used to connect the servers to the SAN. 
              In Figure 1, you will see that two of the servers actually have 
              two connections to the SAN via two HBAs. Although this configuration 
is used quite a bit, I should explain that just because you are
using Fibre Channel and multiple connections does not mean that you
will have redundant paths to a given device. This is true for
several reasons. Notice in Figure 1 that, since the storage resources
              are not connected to multiple switches, each path coming out of 
              each server can only access one device. Another reason that you 
              may not have redundant paths is because of the limitations of the 
              drivers. Please realize that Fibre Channel disks are simply running 
              the SCSI-3 protocol that has been adapted to work in a serial architecture 
(see above). Since SCSI was historically written with the assumption
that a device plugs into exactly one bus, it is understandable that
SCSI-3 has no concept of a storage device that appears
on more than one HBA. Therefore, if you would like to have redundant
              paths, you will also need some sort of redundant pathing software. 
              Its job is to stand between the kernel and the SAN, so that requests 
              for storage resources are monitored, and directed to the appropriate 
              path. (These software products are covered again later in this article.) 
              Note that there are two main types of lasers in today's HBAs: 
OFC and non-OFC. OFC (open fibre control) devices use a handshaking
method to ensure that they do not transmit a laser pulse if nothing
is connected to the HBA. (This is for safety reasons, since
a high-powered laser can cause permanent damage to your eyesight.)
Non-OFC devices employ no such handshaking, and will transmit even
if a device is not connected. Believe it or not, non-OFC devices
are actually quite common, due to the cost associated with making
an OFC device. Therefore, please do not look directly into an HBA.
              You may regret it! 
              Switches 
              Figure 1 shows two servers connected to two switches. Remember 
              that when you are connecting to a switch, you are using the switched 
fabric topology -- not the arbitrated loop topology. (That is,
unless the device that you are connecting to the switch does not
support fabric login. If the switch supports arbitrated loop, it
will create a private arbitrated loop on the port to which you connect
that device.)
              Switches are "intelligent," and have many possible configurations. 
              Using software provided by the switch vendor, you could create zones 
              that allow only certain servers to see certain resources. This configuration 
              is usually done via a serial or Ethernet interface. 
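Conceptually, a zone is just a named set of devices, and two devices can see each other only if they share a zone. The following toy Python sketch illustrates the idea (the WWN-style names are hypothetical; real switches enforce zoning in the fabric, keyed by WWN or port):

    zones = {
        "backup_zone": {"wwn:server-a", "wwn:tape-library"},
        "db_zone": {"wwn:server-b", "wwn:disk-array"},
    }

    def can_see(device_a, device_b):
        """Two devices see each other only if some zone contains both."""
        return any(device_a in members and device_b in members
                   for members in zones.values())

    print(can_see("wwn:server-a", "wwn:tape-library"))  # True
    print(can_see("wwn:server-a", "wwn:disk-array"))    # False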
              This brings up an interesting and important topic -- security. 
              One major difference between parallel SCSI and Fibre Channel is 
              that most Fibre Channel devices have an RJ-45 port, allowing you 
              to connect your SAN devices to a LAN. This setup allows for much 
              easier configuration than what is possible through the serial interface. 
              It also allows your SAN devices to be monitored via SNMP-capable 
              monitoring software. However, it also opens a major security hole. 
              If you simply connect the RJ-45 port of your SAN devices directly 
              to your corporate LAN (or even worse, the Internet), then you have 
              a new way that "black hats" can take down your enterprise. 
              I suggest placing all LAN connections for SAN devices on a separate, 
well-protected LAN. To do otherwise is to invite disaster. Also,
              remember to change the default administrator passwords on these 
              devices! 
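The SNMP monitoring mentioned above can be as simple as polling a standard MIB-II object. Here is a minimal sketch, assuming the third-party pysnmp Python library and a hypothetical management address:

    from pysnmp.hlapi import (
        getCmd, SnmpEngine, CommunityData, UdpTransportTarget,
        ContextData, ObjectType, ObjectIdentity,
    )

    # sysDescr.0 -- a standard object that any SNMP agent will answer.
    error_indication, error_status, error_index, var_binds = next(getCmd(
        SnmpEngine(),
        CommunityData("public"),                # a default -- change it!
        UdpTransportTarget(("10.0.0.5", 161)),  # hypothetical switch address
        ContextData(),
        ObjectType(ObjectIdentity("1.3.6.1.2.1.1.1.0")),
    ))

    if error_indication:
        print(error_indication)
    else:
        for name, value in var_binds:
            print(name, "=", value)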
              Another interesting security ramification of SANs is the configuration 
              software that runs on servers connected to the SANs. Depending on 
              the product and its capabilities, a black hat breaking into the 
              wrong box can also wreak havoc on your SAN. Ask your configuration 
              software vendor how you can protect yourself from such a disaster. 
              The vendor will probably tell you to limit the number of boxes that 
              run the configuration software and to isolate them on a separate 
              LAN and secure them as much as possible. It's tempting to put 
              your management and configuration software in multiple places, because 
              it makes management of the SAN much easier; however, think about 
              the security implications before doing so! 
              Hubs 
              Hubs only understand the arbitrated loop topology. When you connect 
              a device to a hub, it will cause the arbitrated loop that the hub 
              is managing to re-initialize. The device will be assigned an AL_PA 
              (arbitrated loop physical address), and it will begin arbitration 
              when it needs to communicate with another device on the loop. 
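The arbitration rule itself is simple: lower AL_PA values carry higher priority, so when several ports arbitrate at once, the lowest AL_PA wins. A toy Python illustration (the values below are made up; on a real loop, only 127 specific byte values are valid AL_PAs):

    def arbitrate(requesting_al_pas):
        """Return the AL_PA that wins arbitration on the loop."""
        return min(requesting_al_pas)

    print(hex(arbitrate([0xE8, 0x01, 0x72])))  # 0x1 -- lowest value wins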
              There are managed (i.e., "smart") hubs and unmanaged 
              (i.e., "dumb") hubs. An unmanaged hub is unable to close 
              the loop when a device on the loop is malfunctioning; therefore, 
              a single bad device can disable the entire loop. A managed hub could 
              detect the bad device and remove it from the loop, allowing the 
              rest of the loop to function normally. Although there are plenty 
              of unmanaged hubs available, the cost difference between managed 
and unmanaged hubs is minimal, and the functionality difference
between them is substantial. When considering a new SAN, one should
              also consider whether a hub is even appropriate. The cost difference 
              between hubs and switches gets smaller and smaller every day, and 
              the functionality difference is even greater than the difference 
              between managed and unmanaged hubs. Since arbitrated loop is cheaper 
              than fabric, I have seen a number of sites build SANs based on hubs 
              and arbitrated loop -- only to rip out the hubs and replace 
              them with switches a year or two later. Consider purchasing a switch 
              if at all possible. 
              Routers and Bridges 
              There are two different types of routers. The first is one that 
              is sometimes referred to as a bridge (1) and is what is depicted 
              in Figure 1. This type of router converts the serial data stream 
              into a parallel data stream, and vice versa. It allows you to plug 
              parallel SCSI devices, such as tape and optical drives, into your 
              SAN. Once you have done so, you can share them just as you would 
              share a device that speaks serial SCSI natively. That is why, in 
              Figure 1, you see a tape library connected to the SAN via a router. 
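To picture what such a router does, consider this toy Python sketch: parallel SCSI presents many bits at once across many conductors, while Fibre Channel sends bits one at a time. (This illustrates the concept only; it is not the actual FC-1 encoding, which uses an 8b/10b scheme.)

    def to_serial(byte):
        """Emit one byte as eight individual bits, most significant first."""
        return [(byte >> i) & 1 for i in range(7, -1, -1)]

    def to_parallel(bits):
        """Reassemble eight bits into one byte."""
        value = 0
        for bit in bits:
            value = (value << 1) | bit
        return value

    assert to_parallel(to_serial(0xA5)) == 0xA5  # round-trips cleanly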
              The second type of router (not pictured) goes between the HBAs 
              and the switches. This type of router can actually route traffic 
              based on load, and finds alternate paths when necessary. This is 
              a relatively new type of router. 
              Disk Systems 
              As shown in Figure 1, disk systems come in many shapes and sizes. 
              While many people think it is necessary to buy a high-end disk array 
              to enter the SAN space, there are two other types of disk systems 
              on the SAN in Figure 1. The first is a "disk array," sometimes 
              referred to as "RAID in a box." These types of arrays 
              can typically be configured as a RAID 0+1, RAID 1+0, or RAID 5 array, 
              and can present the disks to the SAN in a number of ways. They will 
              often automatically pick a hot spare and perform other tasks that 
              JBOD just can't do. 
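The heart of the RAID 5 protection these boxes offer is simple parity arithmetic: the parity block is the XOR of the data blocks, so any single lost block can be rebuilt from the survivors. A minimal Python illustration (the stripe contents are made up):

    def xor_blocks(blocks):
        """XOR equal-length blocks together, byte by byte."""
        result = bytearray(len(blocks[0]))
        for block in blocks:
            for i, byte in enumerate(block):
                result[i] ^= byte
        return bytes(result)

    stripe = [b"\x0f\xf0", b"\xaa\x55", b"\x12\x34"]  # hypothetical data
    parity = xor_blocks(stripe)

    # Lose the middle block, then rebuild it from the rest plus parity.
    rebuilt = xor_blocks([stripe[0], stripe[2], parity])
    assert rebuilt == stripe[1]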
              The second main type of disk system is JBOD, which stands for 
              Just a Bunch Of Disks. These disks would either be parallel SCSI 
              disks plugged into a SAN router, or Fibre Channel disks plugged 
              directly into the switch. You can also plug several JBOD disks into 
              a hub, and then plug the hub into the switch. This is a more cost-effective 
              way to plug several smaller disks into the SAN. However, as discussed 
              above, you should perform a cost-benefit analysis when deciding 
              whether to just plug the disks into the switch, or to plug them 
              into a hub that gets plugged into the switch. 
              The final type of disk system is the high-end disk array. These 
              typically offer significant advantages over JBOD or "RAID in 
              a box" systems, but they do cost quite a bit more than the 
              other systems. Features that may be available in such systems are: 
              
               Creation of additional mirrors that can be split for backup 
                purposes 
               Proactive monitoring and notification of failed (or failing) 
                components 
               Multiple server connections (32, 64, or more servers connected 
                to a single array) 
               Internal zoning capabilities 
               Multi-pathing and failover software 
              Although some of these features may be available in the "RAID 
              in a box" products, a high-end array will probably offer all 
              of them in one box. 
              Cabling 
              Although cabling is often overlooked in discussions about SAN 
              architecture, it's obviously a very important part of the system. 
              These cables are typically fiber optic cables with SC connectors. 
(This is the same type of cable used for Gigabit Ethernet.) As
              discussed in previous articles, there are also DB9-style connectors, 
              which are less expensive, and may be more appropriate for some environments. 
Please remember that fiber optic cables are very fragile and should
be treated as such. Interestingly, I have heard their all-or-nothing
behavior described more than once as an advantage over SCSI: fiber
optic cables either work, or they don't. Either no data gets through,
or all the data gets through. In contrast, a SCSI cable may work fine
under some conditions, but not others.
              Software 
              There are many products in this category, and this is one of the 
fastest-growing areas of SAN products. Among other things, these
              products offer the following features: 
              Protocol Conversion 
              Suppose you'd like to address SSA disks (2) and Fibre Channel 
              disks from a single host. A product offering protocol conversion 
              could make this happen. 
              Zoning 
              Zoning is a very important aspect of SANs. Without zoning, every 
              host connected to the SAN can read and write to every disk in the 
              SAN. By separating the servers and disks into zones, you solve this 
              problem. 
              Device/Path Failover 
              Suppose you do have multiple Fibre Channel paths to the same device. 
              By default, Fibre Channel will not use one of those links as a failover 
              link if the other one fails. Software can make this happen. 
              Load Balancing 
              This is very similar to the failover feature. If you have multiple 
              paths to a single storage resource, wouldn't it be nice to 
              distribute the load between those paths? Often this is combined 
              with the failover feature, where traffic will be load balanced during 
normal operations, but will fail over in case of device failure.
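Here is a conceptual Python sketch of that combination (the path names are hypothetical; real multi-pathing products do this below the filesystem, not in application code):

    class MultiPath:
        """Round-robin I/O across healthy paths; drop paths that fail."""

        def __init__(self, paths):
            self.paths = list(paths)
            self.failed = set()
            self._next = 0

        def pick_path(self):
            healthy = [p for p in self.paths if p not in self.failed]
            if not healthy:
                raise IOError("no remaining paths to device")
            path = healthy[self._next % len(healthy)]
            self._next += 1
            return path

        def mark_failed(self, path):
            self.failed.add(path)

    mp = MultiPath(["hba0", "hba1"])
    print(mp.pick_path(), mp.pick_path())  # alternates: load balancing
    mp.mark_failed("hba0")
    print(mp.pick_path())                  # only hba1 now: failover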
              There's So Much More 
There is much more to SAN building blocks than one article can cover.
For one thing, you will notice that I have hardly mentioned
any vendors' names. The reason is that the SAN industry
is moving very fast right now. Given that this article is actually
written several months before you see it, a vendor list would be out
of date before it got to you. Therefore, please go to:
http://www.backupcentral.com/hardware-san.html for an updated list
of SAN vendors, organized by which element(s) of the SAN they
provide. (Please let me know if I am missing anyone!)
  In the coming months, I will explore what you can do with a SAN, 
              including backup and recovery, storage consolidation, and high-availability 
              applications. I'll see you soon! 
              1 In my opinion, router is a more appropriate name. A bridge communicates 
              on Layer 2 and a router on Layer 3. When mapped to the OSI model, 
              the Fibre Channel specification puts SCSI at Layer 3. The vendor 
              that owns the lion's share of the market (Crossroads) calls 
              them routers. 
              2 Serial Storage Architecture. This is a competing architecture 
              to Fibre Channel, but it has been around for a while and has not 
              gained much acceptance. That is not to say that there aren't 
              SSA devices out there, though! 
              W. Curtis Preston is a principal consultant at Collective Technologies 
              (http://www.colltech.com), and has specialized in designing 
              and implementing enterprise backup systems for Fortune 500 companies 
              for more than 7 years. Portions of this article are excerpted from 
              his O'Reilly book UNIX Backup & Recovery. Curtis 
              may be reached at: curtis@backupcentral.com. 