High Availability Networking
Vincent C. Jones
This book explores and discusses a wide range of potential approaches to improving network availability, allowing you to choose those most appropriate for your organization and its unique needs and constraints. The goal is to show how to achieve higher network availability both in theory and in practice. In economic terms, this means pushing the design to the point where the cost of eliminating further unavailability exceeds the cost to the organization of the losses due to downtime.
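The economic stopping rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the cost per hour of downtime, and the upgrade figures are all hypothetical and are not taken from the book.

```python
# Illustrative sketch only; all names and figures are hypothetical assumptions.

def annual_downtime_cost(availability, cost_per_hour, hours_per_year=8760.0):
    """Expected yearly loss from downtime at a given availability (0..1)."""
    return (1.0 - availability) * hours_per_year * cost_per_hour


def worthwhile_upgrades(base_availability, cost_per_hour, upgrades):
    """Accept upgrades in order while each saves more per year than it costs.

    upgrades is a list of (name, new_availability, added_annual_cost) tuples,
    ordered from lowest to highest resulting availability. The loop stops at
    the first upgrade whose added cost is not recovered by the downtime
    losses it eliminates.
    """
    chosen = []
    current = base_availability
    for name, new_availability, added_cost in upgrades:
        saving = (annual_downtime_cost(current, cost_per_hour)
                  - annual_downtime_cost(new_availability, cost_per_hour))
        if saving <= added_cost:
            break
        chosen.append(name)
        current = new_availability
    return chosen


if __name__ == "__main__":
    # Hypothetical network: 99% available, downtime costs 1,000 per hour.
    options = [
        ("dial backup", 0.999, 20000.0),
        ("second router", 0.9995, 15000.0),
    ]
    print(worthwhile_upgrades(0.99, 1000.0, options))
```

With these made-up numbers, the first upgrade saves far more than it costs, while the second costs more than the remaining downtime it removes, so the design stops after the first.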
While the theoretical aspects apply to networks of all sizes and technologies, the example solutions provided focus on the needs of moderate-sized extended corporate networks using IP version 4 and stable, moderate-performance technologies such as frame relay, ISDN, and Ethernet. This is not because these technologies are fundamentally more or less reliable than others, but because these tend to be the networks that have grown to the point of being critical to the day-to-day operations of the organization without a staff of dedicated network designers and architects to provide optimization and support.
This book is written for those looking for design techniques to cost-effectively improve network availability. The reader is assumed to be knowledgeable in the fundamentals of large network design and comfortable going to other references for more details on specific protocols and functions.
The basic approach of this book is divide and conquer. Each chapter attacks a general need of high availability network design, from defining what high availability really means and requires in the first chapter to the final chapter's discussion of the essential commitment to a full range of network management capabilities. Within each chapter, the general need is broken down into specific requirements. Within each specific requirement, the problem being addressed and possible solutions are first discussed on a general theoretical level. Wherever practical, one or more specific scenarios are defined and example solutions implemented, typically using Cisco routers.
Please read through the example implementations even if you never expect to touch a Cisco system. The examples and their accompanying discussions serve to flesh out the theoretical framework, showing typical adjustments required to get the theory presented to actually work in a real world environment. Many of these adjustments have nothing to do with Cisco, but rather reflect limitations in the current implementations of network protocols.
Technical managers and others will find this book's survey of all aspects of high availability network design invaluable. There is a vast array of considerations that should be part of any design, and tunnel vision can be costly. It is very easy (and common) to implement point solutions which, in the process of eliminating one weakness, introduce other modes of failure. Choosing the best solution is rarely possible without a system-wide perspective.
Network implementors in a Cisco environment will find this book a cookbook of Cisco solutions that they can modify and install in their own network. These readers should still pay attention to the theoretical discussions preceding each example so they can identify modifications necessary to fit their unique environment.
It is essential for all to keep in mind that high availability is not just a design parameter; it is also an executive management commitment to funding adequate resources, staffing, and training for the life of the network. At the same time, even though this book focuses on enhancing the availability of the network, we must always keep our sense of perspective. From the user's viewpoint, it is immaterial whether it is the network, the server, the software, or the client platform that fails. Cost-effective availability improvement needs to be balanced across all causes of failure to ensure that the resources required are applied where they will have the most impact on the bottom line.
This is much easier to say than it is to do, as few organizations even know what their current availability is or keep any statistics on the causes of failure. Even fewer organizations have proceeded to the stage of analyzing their bottom line costs for various failure modes. But higher network availability remains indisputably important. Fortunately, it is never too late to start on the road to higher availability.
Chapter 1, "Reliability and Availability," introduces the theory and technology of high availability networks. First the stage is set with the potential cost of network downtime for mundane production as well as "must run" networks. The mathematical basis behind predicting availability, different approaches to providing higher availability, and the availability challenges unique to computer networks form the core of the chapter. The chapter ends with the need to provide physical diversity in multiply connected WANs and LANs, setting the stage for the rest of the book.
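As a taste of the kind of arithmetic involved in predicting availability, the following Python sketch uses the standard textbook formulas for a single component and for components combined in series and in parallel. It is not taken from the book: the function names and the MTBF/MTTR figures are hypothetical, and the parallel formula assumes failures are independent, an assumption that shared cable routes and other common failure modes can violate.

```python
# Illustrative sketch only; names and figures are hypothetical assumptions.

def availability(mtbf_hours, mttr_hours):
    """Steady-state availability from mean time between failures and
    mean time to repair."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def series(*parts):
    """Availability of a chain where every component must be working."""
    result = 1.0
    for a in parts:
        result *= a
    return result


def parallel(*parts):
    """Availability of redundant components where any one suffices.

    Assumes the components fail independently of one another.
    """
    all_failed = 1.0
    for a in parts:
        all_failed *= (1.0 - a)
    return 1.0 - all_failed


if __name__ == "__main__":
    link = availability(999.0, 1.0)   # a hypothetical link: 0.999
    print(series(link, link))         # two links end to end: about 0.998
    print(parallel(link, link))       # two redundant links: about 0.999999
```

The comparison shows why topology matters: chaining components lowers availability, while redundancy raises it only to the extent that the redundant paths really do fail independently.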
High availability is not an automatic result of adding redundant links and components to a network. Adding redundancy adds complexity to the network, which must be recognized and utilized. Chapter 2, "Bridging and Routing," starts out with a quick review of network terminology, then surveys the available layer two bridging approaches (simple learning, SR and TST) and popular layer three routing protocols for IP (static, RIP, OSPF, Integrated IS-IS, EIGRP and BGP), briefly discussing how each works and the strengths and weaknesses of each. Along the way, parameter tuning that may be appropriate to speed up response to failures is explored and examples are provided.
Chapter 3, "Multihomed Hosts," extends the availability benefits of redundant connectivity all the way to the end system in an IP network. Starting with the simple step of adding a second NIC to an end system, the challenges presented in supporting applications when the server (or client) has two IP addresses are explored. Then two approaches to giving the two NICs the appearance of a single IP address (proprietary and via routing) are examined, with full configuration examples for the latter. The chapter concludes with a discussion of server cluster terminology, techniques and limitations.
Dial backup is a popular alternative to installing additional "permanent" links. In Chapter 4, "Dial Backup for Permanent Links," the first of two dedicated to dial backup, three different dial backup approaches are introduced, distinguished by how the router determines the need to place a call. After exploring the underlying assumptions behind each and how those assumptions affect their suitability for various applications, a basic "how to" for IP dial backup is provided. Using examples of ISDN dial backup applied to leased lines, frame relay, and DSL, the critical factors requisite to successful implementation are highlighted.
Chapter 5, "Advanced Dial Backup," extends the general concepts introduced in Chapter 4 to meet the specific needs of a range of requirements. It starts with the challenge of using asynchronous modems rather than ISDN, then moves on to explore techniques for combining multiple dial links to provide higher bandwidth. After a brief look at providing IPX support, the chapter concludes with how to use BGP with generic dial-on-demand routing to provide dial backup driven by routing table changes without the limitations associated with Cisco's proprietary dialer watch facility.
Chapter 6, "Multiple Routers at a Single Site," focuses on eliminating the router as a single point of failure from the viewpoint of preventing end-systems at a location from being isolated from the rest of the WAN. Starting with solutions to the limitations inherent in the IP concept of a default gateway, the chapter then explores how to provide a second router without doubling the WAN communications costs by getting one router to provide dial backup for a link on another router. It then finishes with how to configure the routers on a physically extended LAN so that even if the LAN is split in two by a failure, IP systems on both halves of the LAN can still communicate with the outside world.
Chapter 7, "Hub and Spokes Topology," explores the unique requirements of hub and spokes networks. Hub and spokes is a popular topology for HQ data center and other applications because it allows major simplifications in the routing structure, but it can also introduce complications. The chapter starts with a discussion on how to get around limitations on the number of peers supportable on a single router and how to scale a hub and spokes design to handle an arbitrary number of spokes without requiring the spoke routers to maintain more than a handful of routes each. The focus then shifts to configuring dial-on-demand routing so that a spoke router can dial any of several routers at the hub without concern for which answers. Finally, critical considerations when the hub expands to actually be multiple sites, such as a primary and backup data center, are explored.
Chapter 8, "Connecting to Service Providers," looks at the special considerations which apply when connecting with networks outside our control. The challenge is split into two levels: connecting to well-defined, relatively trustworthy external networks and connecting to "The Internet." For the former, the focus is on solving the problem of redundant connectivity using floating static routes driven by an IGP. For the latter, the focus is on the limits of static routing and how to make the most of BGP for both multiple connections to a single ISP and redundant connections to multiple ISPs.
Any time we can't trust "the other network" we should firewall it. The problem is that a firewall not only blocks traffic from "the bad guys," it also blocks desirable traffic such as routing information. Chapter 9, "Connecting through Firewalls," focuses on how firewalls are integrated into a network and the various ways traffic can be routed to the appropriate firewall. Starting with an example of a fully redundant network with no firewall failover capability, it looks at different ways to provide useful redundancy for the firewalls without sacrificing security.
Many organizations also use DLSw to support IBM SNA communications. Chapter 10, "IBM Mainframe Connections," looks at the challenges of supporting DLSw in a fully redundant manner, starting with token ring LAN support where support of redundant DLSw links is automatic. Then the challenge of providing working redundant fallback in support of Ethernet attached devices is addressed. The chapter ends with how to implement high availability redundant DLSw in a firewalled environment.
Chapter 11, "Disaster Recovery," scales the survey of high availability design techniques to consider continued operations in the event of a site-wide or regional disaster. The cost of full scale disaster recovery combined with fundamental weaknesses in the IPv4 protocol suite means every solution is a compromise. The chapter looks at some of the key considerations in planning for disaster recovery with little or no service interruption, some of the potential failure modes that need to be designed around, and the use of commercially available load sharing approaches to implement high availability disaster recovery.
Chapter 12, "Management Considerations," finishes the book with a look at network management. In a network design with redundant capabilities for high availability, network management is not an option. It is essential that faults be found and fixed as quickly as possible, even though they may have no immediate impact on network functionality. Surveying the technical and management skills and discipline required to run a high availability network, the chapter focuses on the critical roles of network monitoring, configuration management, and total quality control.
An extensive Glossary is also included in recognition of the huge range of acronyms and technical terms required in a book covering as wide a variety of concerns as this one.
For those of you not using Cisco products, I apologize for the exclusive use of Cisco routers in the router configuration examples provided. I strongly encourage you to continue reading regardless, as many of the challenges and tradeoffs discussed as part of the Cisco configuration apply to routers and switches from other vendors as well. Even though the terminology, command syntax, and feature sets differ, the underlying protocols are based on the same standards and the same set of constraints.
For those of you who are using Cisco products, I have tried to indicate in each configuration the oldest version of Cisco IOS that the particular configuration can be expected to run on. In indicating a minimum IOS, I ignored interim release trains (such as 11.1 and 11.3) and all releases prior to 11.0. My assumption is that in a network requiring maximum availability, only mainstream IOS releases which are actively supported and in General Distribution status will be of interest.
This book reflects techniques I have used over the years to help my clients minimize the impact of failure on network operations. Almost all the examples are extracted from working configurations that have been proven in production environments, then adapted and sanitized for publication. As a service to readers, all the example configurations are available on the Networking Unlimited, Inc. web site at http://www.networkingunlimited.com. All feedback is welcome.
Copyright 1999-2000 © Networking Unlimited Inc. All rights reserved.