With the tremendous growth of the Internet and the rapid deployment of corporate intranets, IP "appliances" are becoming common, useful, and sometimes necessary parts of IP networks. IP load balancers are one class of these appliances, and they bring much-needed peace of mind to the overworked network administrator.

IP load balancers bring two major advantages to a multi-server environment. The first is scalability. As web sites and server farms field more and more clients, constantly upgrading server hardware becomes not only tedious but also economically unsound. With an IP load balancer, a cluster of identical servers can be built to appear as one very powerful server, managed by the load balancer. The load balancer acts as a front-end machine and directs clients to the servers according to the servers' capabilities and status. Client traffic is distributed among the servers, allowing each server to operate more efficiently.

The second advantage of an IP load balancer is fault tolerance. With the ever-growing importance placed on the "no down time" concept, the IP load balancer can monitor the health of the servers in the cluster and avoid directing client traffic to a server that is deemed "unhealthy" or "out of service". By providing this resilience within a server cluster, the load balancer lets the client receive smooth and continuous service, which is the ultimate goal of on-line services.

What networks can use load balancers? The answer is quite simple: any network that needs to be scalable for growth and robust for maximum up time. As soon as client traffic becomes overwhelming for a single server, a simple server cluster can be built. As soon as a server cluster is built, a load balancer becomes almost a necessity for smooth operation, providing load balancing and fault tolerance.

The purpose of this document is to present and examine feature sets that are important when selecting a load balancer.
The options are growing daily, and it is important to distinguish the characteristics that make a good load balancer.

Software vs. Hardware:

There are primarily two approaches to load balancing for a server cluster: software and hardware. A typical software solution requires additional software to be installed on the servers in the cluster. Standalone machines acting as "masters" may or may not also be required. A typical hardware solution, on the other hand, is a standalone unit (i.e. a "black box") that is physically positioned between the user community and the server cluster. This box handles the flow and direction of traffic between the clients and the servers. Let's examine each solution's advantages and disadvantages.

Software Solutions:

Software solutions typically involve installing one or more pieces of extra software on the servers in the cluster, which certainly has its advantages. Software on each server allows in-depth analysis of the server's operating system. This can give the solution the ability to look at vital statistics on the server, such as CPU or memory utilization. Some solutions even allow synchronization of data between servers in a cluster. Since it is not always safe to assume that servers within a cluster have identical content, such a capability can certainly prove useful.

However, software solutions also present some disadvantages. For example, an extra piece of software on each server may take up an unknown amount of that server's resources. The more "task-rich" the implementation, the more server power is consumed by the load balancing tasks. A new piece of software on the servers also presents an extra point of failure per server.
It would be ironic, yet possible, for the software installed to examine a server's health to end up causing its failure; that risk exists whenever a new piece of software or an unknown entity is introduced to an existing server platform.

Furthermore, software solutions may not prove very scalable. With every new server that is added to the cluster, load balancing software has to be installed. Depending on the size of the software, such a task may complicate the overall solution and its effective growth. Add to that possible "per server" licensing costs, software adjustments, and the complications of a software upgrade, and a software solution has the potential not to be very "growth friendly".

Finally, a software solution may introduce an operating system dependency. A software solution may only be compatible with a finite list of operating systems. This may undesirably lock a network into a single operating system, without any chance for heterogeneity among the servers.

Hardware Solutions:

Hardware solutions typically place a dedicated machine in the network between the users and the server cluster. The advantage here is dedicated hardware for a dedicated task. In other words, the server applications can continue to process data as before, without any need for external software, while a dedicated piece of hardware handles the traffic management. This approach results in higher performance for the site or server cluster as a whole. A dedicated piece of hardware also typically brings operating system independence, allowing it to operate in a heterogeneous server environment. This makes the typical hardware solution very flexible. Furthermore, the overall maintenance of hardware load balancers is simple and can be done with minimum effort.
For upgrades and periodic maintenance, only a single device on the network has to be touched, not every server. If a redundant unit is available, this can be done with zero down time.

Hardware solutions have their disadvantages, too. For example, a typical hardware solution can almost never "truly" know everything about the servers in its clusters. It is virtually impossible to gather information like CPU or memory utilization from the servers unless the hardware solution requires software agents on the servers. Also, a hardware solution can be considered a single point of failure. All traffic to and from the servers goes through a single piece of hardware, which may (depending on the hardware's reliability) introduce a potential hazard. However, almost all hardware solutions allow a redundant unit to be configured to take over if the main unit fails, so this is only a concern if redundant units are not used.

Hardware/Software combinations:

As alluded to above, some solutions use both hardware and software to obtain the desired result. It's possible to consider such an implementation a "best of both worlds" solution. However, such solutions also contain the "worst of both worlds". All relative advantages and disadvantages should be taken into consideration.

IP Protocol Support:

It is almost vital that a solution be flexible when it comes to supporting IP protocols. Solutions that support only HTTP, or only HTTP and FTP, become limiting if and when a need arises to use them in a "new" environment. The more flexible a solution, the better it can grow and develop with the network. The ideal solution is one that offers support for any IP protocol, whether TCP or UDP based. Furthermore, some IP protocols and applications need special attention.
For example, FTP, which can operate in one of two modes (active or passive), needs very special attention and care, especially when operating in passive mode (the mode Internet browsers use for FTP). Care must be taken when selecting a load balancing solution to make sure that delicate protocols such as FTP are fully supported.

At the same time, the behavior of the load balancing device with respect to the flow of traffic should be considered. Analysis shows that it is most efficient for the device to behave as a "pass through" device. This minimizes alteration of the packet flow and format and does not hinder the servers' operation. If the device instead handled all the sessions itself (acting as a TCP session "intermediator"), not only would more overhead be created on the load balancer, but the servers would also never know who the "real" users are. All servers may believe they have a single user: the load balancer. This has the potential to cause problems with accounting, record keeping, or any other service the server may be performing on the side.

Finally, a load balancer must allow state based protocols (such as SSL) to operate without any problems. A load balancing solution must have the option to maintain sessions between users and servers when the session is state based. In other words, it must be configurable so that a user continues to receive service from the same server until the end of the session, with no service interruption. This should hold for the duration of the session, no matter how long.

Performance:

The term "performance" has a very broad definition. When evaluating it on a platform, it is important to distinguish between the key terms and concepts commonly used. One way to specify performance is to use a PPS (packets per second) figure. In essence, this is the number of network packets (i.e. 
Ethernet packets) that a solution can forward per second. Another way to specify performance is by noting the number of concurrent sessions a device can handle, or by noting that it uses a "super-processor". Other concepts have also been used to derive performance figures.

So, what really matters? The bottom line is that a solution must be able to forward traffic to and from the servers at maximum efficiency, with the least possible effect on the overall performance of the server cluster. A solution may be able to handle millions of concurrent sessions but only forward packets at a rate of 2 per second. Obviously, such a solution would be very inefficient. The best indicator is actual operation in a network. If performance is a major concern, it's always a good idea to test a load balancer with realistic traffic patterns before deploying it in a working network.

Because a load balancer reduces the load per server, the overall performance and efficiency of a cluster is usually enhanced, assuming the load balancer and the transport medium (LAN and WAN) can handle the client traffic volume. In fact, strong performance is one of the reasons hardware solutions may be considered to have an advantage over software solutions. Dedicated hardware for a dedicated task will usually perform better than extra software on an already loaded server.

All this said, it should be remembered that a server cluster or "site" is always bottlenecked by the bandwidth into the site, no matter how fast the load balancer is. As a simple example, if a site has a single T1 (1.544 Mbps) connection to the user community (Internet/Intranet), a load balancing device that can handle wire speed traffic for 10 Mbps Ethernet may be just as sufficient as one that can handle 100 Mbps Ethernet. This is, of course, assuming that ALL the users are accessing the site through the single T1.
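
The bottleneck argument above can be made concrete with a little arithmetic. The sketch below is illustrative only: the function names and the 64-byte packet size are assumptions made for this example, not figures from any product. It converts a packets-per-second rating into megabits per second and treats the site's effective throughput as the slower of the access link and the load balancer.

```python
def pps_to_mbps(pps: float, packet_bytes: int) -> float:
    """Convert a packets-per-second rating to megabits per second."""
    return pps * packet_bytes * 8 / 1_000_000


def effective_throughput_mbps(access_link_mbps: float, balancer_mbps: float) -> float:
    """The slower of the access link and the load balancer bounds the site."""
    return min(access_link_mbps, balancer_mbps)


T1_MBPS = 1.544  # single T1 access link, as in the example in the text

# Whether the balancer handles 10 Mbps or 100 Mbps wire speed,
# the T1 remains the bottleneck.
for balancer_mbps in (10.0, 100.0):
    print(balancer_mbps, effective_throughput_mbps(T1_MBPS, balancer_mbps))

# A device that forwards only 2 packets per second is the bottleneck
# regardless of how many concurrent sessions it can track.
# (The 64-byte packet size is an assumption for illustration.)
slow_balancer_mbps = pps_to_mbps(2, packet_bytes=64)
print(slow_balancer_mbps, effective_throughput_mbps(T1_MBPS, slow_balancer_mbps))
```

In both of the first two cases the result is the T1 rate, which is why a faster load balancer buys nothing until the access bandwidth grows.
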
Server Cluster Flexibility:

As noted above, a solution should be able to work in a server cluster with any operating system. Once a solution is in use, it should not lock a network into one single platform. This allows for better expandability and the deployment of new hardware and software systems if and when necessary, without any change to the load balancing solution. Flexibility is an important issue to consider. An ideal solution is one that can work with any operating system and any server hardware platform. It should allow clusters to be fully heterogeneous, if necessary, in both operating systems (WinNT, Unix, BSD, HPUX, etc.) and hardware configurations (P166, P200, DEC Alpha, dual processors, etc.).

Load Balancing Schemes:

A solution must allow flexible load balancing "schemes" to be used for server clusters. In other words, an administrator should have multiple options for how the load balancer decides which server is best suited to a client query. Furthermore, an administrator should be able to choose different schemes for different server clusters. One cluster should be able to use scheme A while a second cluster (serviced by the same load balancing solution) uses scheme B. This is a necessity when different server clusters have different needs and requirements.

Also, a load balancer must allow for an uneven distribution of traffic among the servers in a cluster. The mechanism should allow the administrator to configure the system so that more powerful servers receive more user requests than less powerful servers. This is especially important if servers within the same cluster have different capabilities.

Sometimes, it's possible to gather important information from a server without installing an external software agent. Some operating systems (like Win NT) provide valuable information to authorized external agents.
Likewise, any server with an SNMP daemon can provide server statistics that can be used to evaluate the overall condition of a server. A load balancing solution has a strong advantage if it can use such information to dynamically adjust traffic loads to the servers.

Fault Tolerance And Overall Resilience:

Efficient server manageability and dynamic server cluster adjustments based on current circumstances are an absolute MUST for any load balancing solution. Useful and important features include:

· Server health monitoring - A solution must, at all times, be aware of the status and health of all the servers within the clusters it is responsible for. The solution must periodically monitor the servers' physical and application layer health and must be able to remove a server from service if there are any problems.

Redundancy:

As mentioned above, one of the main reasons load balancing solutions are deployed is to provide full fault tolerance and high availability for a server farm. One of their most important tasks is to keep the service available in case of server failure. But what protects a network from a failure of the load balancer itself? This question is why any legitimate load balancing solution must provide the option for redundancy between load balancers. With hardware solutions, this is typically done by using two units in parallel, where one is always active and one acts as a "hot spare". With software solutions, the server cluster must somehow provide for multiple "master" decision makers. Different software solutions may accomplish this differently.

When redundancy is used, peer units (hardware or software) must have an efficient way of monitoring each other. This is usually done in one of 2 ways:

· Through a serial cable - Mostly used by hardware solutions, a back-up unit monitors the health of the main unit through "heartbeat" messages sent over a serial cable connected between the 2 units. However, serious problems can occur if the serial cable itself is faulty or accidentally disconnected. Furthermore, serial cables typically detect only total unit failure. Protection is not provided against more common failures of components such as hub ports, switch ports, or Ethernet cables.

No matter how the monitoring is done, the ideal solution is one that protects the system against the widest possible range of failure scenarios. At first glance, it may seem sufficient for a redundant unit to take over when the main unit fails as a whole. But, as mentioned above, what if an Ethernet cable or a switch port fails? A fully robust solution should also protect against such failures, allowing failover to occur when any piece of the entire solution architecture fails.

A final and important note should be made about solutions in which redundant units can operate in parallel and simultaneously. Traditionally, and mostly with hardware solutions, redundancy implementations have offered a "main" unit and a "hot spare backup" unit. These solutions are very acceptable. However, a solution in which the two units work in parallel and simultaneously, while at the same time providing redundancy for each other, is even more attractive. Of course, IP rules do not allow two machines to act as a single IP address (i.e. one IP address can be in only one place at one time). However, if two server clusters are served by a single system, it is very beneficial for the two load balancers to work in parallel and simultaneously, each acting as the main unit for traffic management for one of the clusters. At the same time, each unit provides redundancy for the other. The overall result is very robust.

Also, we should always consider that a "hot spare" may itself fail before coming into action. In a main-backup scenario, we may never know this until it is too late. If two units are working together, such a situation does not cause a problem, since both units are always monitoring each other.
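
The arrangement in which each of two units is the main unit for one cluster and the backup for the other can be sketched in a few lines. This is a minimal model under stated assumptions, not any vendor's implementation: the `Unit` class, the `check_peer` function, the cluster names, and the 3-second timeout are all hypothetical, and the heartbeat transport itself (serial cable or otherwise) is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Unit:
    """One load balancer in a redundant pair (hypothetical model)."""
    name: str
    primary_for: set               # clusters this unit normally serves
    last_heartbeat: float = 0.0    # time the peer last heard from this unit
    active_for: set = field(default_factory=set, init=False)

    def __post_init__(self):
        # At start-up each unit serves only its own clusters.
        self.active_for = set(self.primary_for)


def check_peer(me: Unit, peer: Unit, now: float, timeout: float = 3.0) -> bool:
    """Take over the peer's clusters if its heartbeat is older than `timeout`.

    Returns True when a takeover happened.
    """
    if now - peer.last_heartbeat <= timeout:
        return False               # peer is still reporting in
    if not peer.active_for:
        return False               # nothing left to take over
    me.active_for |= peer.active_for
    peer.active_for = set()
    return True


a = Unit("lb-a", primary_for={"cluster-1"})
b = Unit("lb-b", primary_for={"cluster-2"})

# lb-a reports in at t=5; lb-b has been silent since t=0.
a.last_heartbeat = 5.0
b.last_heartbeat = 0.0

took_over = check_peer(a, b, now=5.0)
print(took_over, sorted(a.active_for))
```

Because both units carry live traffic and watch each other, a failed unit is noticed as soon as its heartbeats stop, rather than at the moment a dormant spare is first needed.
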
Installation and Management:

Sometimes, the practicalities of load balancing implementations, such as installation and management, are overlooked. A solution should install into an existing network with ease. Of course, when a new piece of hardware or software is installed, there is some inherent break in service. However, minimizing this break has its advantages. A solution should fit easily into an existing network and should not affect any networking operation that was intact before the load balancer's introduction. Furthermore, a solution should allow for an easy migration path. Once it is installed, the administrator should have the option to move the network gradually under the load balancer's control. The option should always be available for a gradual migration between the "old" and the "new" network.

Once installed, the load balancing solution should give the administrator convenient, strong, and flexible management options. For example, a GUI (Graphical User Interface) may make overall management simple. Alternatively, compliance with industry management standards, such as SNMP, can allow the solution to be managed from any compliant device. The solution should also provide some level of reporting on its tasks. A network administrator may find information such as the total number of users, or the total or average amount of traffic, very useful.

Final Thoughts:

New load balancing solutions are being introduced to the market on a monthly basis. At first glance, it's very easy to be overwhelmed by the offerings. It's important to distinguish fact from fiction and to distinguish the useful facts from the not-so-useful facts. Important product features, such as those mentioned in this document, should be taken into consideration. It's true that some features may be more important than others to some administrators.
However, solutions that allow for growth, flexibility, and resilience will most likely prove themselves superior.