CCAMP Working Group                                  T. Soumiya (Editor)
Internet Draft                                      Fujitsu Laboratories
Expires: August 2003                               P. Czezowski (Editor)
                                                 Fujitsu Labs of America
                                                           February 2003


        Extensions to LMP for Flooding-based Fault Notification
            draft-soumiya-lmp-fault-notification-ext-00.txt

Status of this Memo

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026 [1].

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

Abstract

This draft describes extensions to the Link Management Protocol (LMP) for use in flooding-based fault notification in pre-OTN networks. Pre-OTN networks are transport networks that have a GMPLS-based control plane and various transport plane technologies (such as Optical Cross Connects and Optical Add/Drop Multiplexers). An important feature of these networks is timely recovery from failures, using either a protection or a restoration scheme. The recovery schemes should also be resource efficient and flexible to meet operator requirements. Once a fault is detected, fault notification is one of a series of phases needed to achieve recovery. We prefer a flooding-based approach to the notification phase because it may offer speed and flexibility advantages over using RSVP-TE signaling for notification.
Extending LMP to include fault notification is a good fit for the problem because fault management is already one of its features and many of its protocol objects can be reused.

Soumiya & Czezowski (Eds.)    Expires - August 2003            [Page 1]
draft-soumiya-lmp-fault-notification-ext-00.txt           February 2003

Table of Contents

1. Overview
   1.1 Terminology
   1.2 Glossary of Terms Used
2. Fault Recovery Scenario
3. Additional LMP Message Formats
   3.1 FaultNotify Message (Msg Type = TBD)
   3.2 FaultNotifyAck Message (Msg Type = TBD)
4. Additional LMP Object Definitions
   4.1 TTL Class (Class = TBD)
   4.2 FAULT_ID Class (Class = TBD)
5. Priority-Based Recovery
6. Security Considerations
7. Conclusion
References
Acknowledgments
Editors' Addresses
Contributing Authors

1. Overview

This draft describes extensions to the Link Management Protocol (LMP) for use in flooding-based fault notification in pre-OTN networks. Pre-OTN networks are transport networks that have a GMPLS-based control plane and various transport plane technologies (such as Optical Cross Connects and Optical Add/Drop Multiplexers). An important feature of these networks is timely recovery from failures, using either a protection or a restoration scheme.
The recovery schemes should also be resource efficient and flexible to meet operator requirements. Once a fault is detected, fault notification is one of a series of phases needed to achieve recovery. We prefer a flooding-based approach to the notification phase because it may offer speed and flexibility advantages over using RSVP-TE signaling for notification. Extending LMP to include fault notification is a good fit for the problem because fault management is already one of its features and many of its protocol objects can be reused.

Currently, there are several Internet Drafts related to recovery in networks featuring a GMPLS control plane. They cover the topics of terminology [2], requirements [3], functional specification [4], and mechanisms analysis [5].

The requirements for control plane-based recovery were found to fall into four main categories:

   o Meeting timing requirements
   o Efficient usage of data plane resources
   o Efficient usage of control plane resources
   o Supporting flexible design of recovery schemes

Reference [3] discusses how controlled flooding meets the requirements on fault notification mechanisms and what beneficial side effects it has compared with notification via GMPLS signaling. Flooding-based notification is also appropriate for shared mesh-based recovery schemes, which are promoted for their resource efficiency and flexibility.

Generic mechanisms for implementing a flooding-based fault notification protocol are proposed in [6]. In this draft, we describe the implementation of flooding-based fault notification in a recovery scenario, and provide the necessary extensions to LMP message formats and data object definitions.

1.1 Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [7].
1.2 Glossary of Terms Used

In addition to the terminology for GMPLS-based recovery that is documented in [2], this draft uses the following acronyms:

   o GMPLS:   Generalized Multiprotocol Label Switching [8]
   o LMP:     Link Management Protocol [9]
   o LSP:     Label Switched Path
   o NMS:     Network Management System
   o OTN:     Optical Transport Network
   o RSVP-TE: Resource Reservation Protocol - Traffic Engineering [10]

2. Fault Recovery Scenario

In this section, a fault recovery scenario is described based on shared mesh recovery. Every node maintains an adjacency with each of its neighbors via at least one LMP control channel. A pre-planned recovery path table is configured using extended GMPLS signaling messages or through a Network Management System (NMS).

When a failure occurs, the following procedure is carried out:

1. A downstream node close to the failure detects it. This node is called the detecting node. If the path is bidirectional, an upstream node also detects it. The detecting node should report the failure, at which point it becomes the reporting node.

2. The reporting node unicasts FaultNotify messages to all its immediate neighbor nodes. The node continues sending unicast FaultNotify messages periodically until it receives FaultNotifyAck messages from its neighbors, or a timer to retry sending has expired. A FaultNotify message contains the node ID of the reporting node, the link ID of the failed link, and a sequence number. The message may optionally contain a TTL, failed wavelength ID, or failed SRLG ID.

3. A neighbor node receives a FaultNotify message with failure data and a sequence number.

4. The receiving node checks whether it has already received a message about this failure. For this purpose, it searches a database indexed on the failure data and sequence number. The database stores the failure data and sequence numbers from the received messages.

4a.
If it has not yet received a message about this failure, it adds the failure data and sequence number to the database. The node then unicasts FaultNotify messages to all its neighbors, except the node that sent the message.

4b. If it has already received a message about this failure, it goes to step 6.

5. The receiving node possibly sets up one or more protection paths according to a pre-planned protection table and the failure data.

6. The receiving node sends back a FaultNotifyAck message to the node that sent the FaultNotify message.

[Optional]: If the receiving node has set up one or more protection paths, it sends a ProtectionCompleteNotify message to either the egress node of the protection path or to the NMS. It continues sending ProtectionCompleteNotify messages periodically until it receives a ProtectionCompleteNotifyAck message or a timer to retry sending has expired.

[Optional]: The receiving node at the egress of the protection path, or the NMS, sends back a ProtectionCompleteNotifyAck message.

   [R1]---[R2]---[R3]---[R4]
      \                 /
      [R5]-------------[R6]
      /                 \
     /                   \
   [R7]---[R8]---[R9]---[R10]

   Working LSP1:  [R1->R2->R3->R4]
   Working LSP2:  [R7->R8->R9->R10]
   Recovery LSP1: [R1->R5->R6->R4]
   Recovery LSP2: [R7->R5->R6->R10]

   Figure 1: Shared Mesh Recovery

Figure 1 shows an example of shared mesh-based recovery. One working path, W-LSP1, runs from R1 to R4 via R2 and R3, and another working path, W-LSP2, runs from R7 to R10 via R8 and R9. R1 can provide user traffic protection by creating a backup LSP that merges with the working LSP at R4. We refer to R-LSP1 [R1->R5->R6->R4] as the recovery LSP of W-LSP1. In the same manner, we refer to R-LSP2 [R7->R5->R6->R10] as the recovery LSP of W-LSP2. In this situation, if it can be assumed that multiple failures do not occur at the same time, the resources for recovering the working LSPs can be shared.
In other words, the resource between R5 and R6 can be shared between recovery LSP1 and recovery LSP2. By setting up these recovery LSPs, the spare/work capacity ratio in the network can be reduced.

When a failure occurs on the link between R8 and R9 on W-LSP2, the endpoint nodes of the link (both R8 and R9) will detect the failure (if the link is bidirectional). Then, the detecting nodes start sending FaultNotify messages in a flooding-based manner. In the case of R9, the messages are sent to R6 and R10. When a FaultNotify message is received, these nodes send back a FaultNotifyAck message to the sending node. In the same manner, they flood the messages to their immediate neighbors. The FaultNotify message includes information regarding the failure, such as the FAULT_ID. When nodes on the recovery LSPs receive the FaultNotify message, they activate the pre-calculated recovery path. In this example, R-LSP2 is activated and R7 switches W-LSP2 traffic onto R-LSP2. R10 also merges R-LSP2 into the original route. As a result, traffic takes the path [R7->R5->R6->R10].

3. Additional LMP Message Formats

LMP is a good candidate protocol to extend for the purposes of fault notification. Flooding-based fault notification is quite simple, and only two messages (FaultNotify and FaultNotifyAck) need to be defined. Furthermore, most of the necessary data objects are already defined in LMP [9].

3.1 FaultNotify Message (Msg Type = TBD)

   ::= [] { [] ...}

or

   ::= [ ...]

3.2 FaultNotifyAck Message (Msg Type = TBD)

   ::=

The contents of the MESSAGE_ID_ACK object MUST be obtained from the FaultNotify message being acknowledged.

4. Additional LMP Object Definitions

The formats for the Common Header at the beginning of LMP messages and the LMP objects used to build the messages are defined in [9].
That document also defines the MESSAGE_ID, MESSAGE_ID_ACK, LOCAL_NODE_ID, and CHANNEL_STATUS data objects used in our extended messages. The SRLG_ID data object is defined in [11]. This leaves us to define data objects for TTL and FAULT_ID.

4.1 TTL Class (Class = TBD)

o C-Type = 1, Time to Live (= Hop Count)

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      TTL      |                    Reserved                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

TTL: 8 bits

   This is an unsigned integer that indicates the remaining hop count. A node receiving a FaultNotify message having a TTL of zero MUST silently discard the message.

This object is non-negotiable.

4.2 FAULT_ID Class (Class = TBD)

o C-Type = 1, Failure Identifier

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Reserved           |            FaultId            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

FaultId: 16 bits

   This MUST be a node-wide unique unsigned integer. The FaultId identifies the sequence of failures. A node increases the value when it detects a failure.

This object is non-negotiable.

5. Priority-Based Recovery

Fault recovery schemes typically assume single failure events. However, multiple failures may occur within a short time interval. Protecting against every such multiple-failure scenario would require exorbitant spare capacity. Ideally, the network should at least save some of the working paths in this situation.

For example, consider Figure 2, in which two failures occur at the same time. One failure is a link failure between R3 and R4, and the other is a link failure between R7 and R8. In this example, R4 detects a failure and sends a fault notification message using flooding to R6.
Also, R8 detects a failure and sends a message to R5. At R6, a recovery path switches traffic from R6 to R4 because the fault notification message is for W-LSP1. On the other hand, at R5, a recovery path switches traffic from R5 to R6 because the fault notification message is for W-LSP2. As a result, an invalid recovery path is set up that follows [R7->R5->R6->R4].

   [R1]---[R2]---[R3]-X-[R4]
      \                 /
      [R5]-------------[R6]
      /                 \
     /                   \
   [R7]-X-[R8]---[R9]---[R10]

   Working LSP1:  [R1->R2->R3->R4]
   Working LSP2:  [R7->R8->R9->R10]
   Recovery LSP1: [R1->R5->R6->R4]
   Recovery LSP2: [R7->R5->R6->R10]

   Figure 2: Multiple failure scenario.

Priority-based control is an effective way to save specific working paths under multiple-failure conditions. In the above example, if the priority of W-LSP1 is higher than that of W-LSP2, then the fault notification messages for W-LSP1 are preferred. In other words, the system checks the priority of the protection path and changes the setting according to priority. In that case, the setting of R6 to R4 takes precedence over R6 to R10. By adopting priority-based control, the misbehavior described above can be avoided, and the high-priority protection path is set up. This priority should be set according to a network operator's policy and/or network service.

6. Security Considerations

Security requirements depend on the level of trust between nodes that exchange fault notification messages. In general, security requirements may be more relaxed when the nodes in a pre-OTN network are in the same administrative domain than when they communicate with nodes in a different administrative domain.

When a flooding-based fault notification mechanism is implemented based on LMP [9], the security mechanisms of LMP can be adopted. All LMP messages should be sent over an IPsec channel that has been either pre-established or is set up on an as-needed basis.
Note, however, that the fault recovery protocol itself introduces no new security considerations.

7. Conclusion

This draft describes extensions to the Link Management Protocol (LMP) for use in flooding-based fault notification in pre-OTN networks. While there are currently several Internet Drafts in the Sub-IP Area related to service recovery in GMPLS networks, a fault notification method for control plane-based networks has not been specifically detailed in any one document. We believe that a flooding-based fault notification method is the best way to satisfy fault recovery requirements. We show how the notification functions in fault recovery scenarios.

References

[1] Bradner, S., "The Internet Standards Process -- Revision 3", BCP 9, RFC 2026, October 1996.

[2] Mannie, E., et al, "Recovery (Protection and Restoration) Terminology for GMPLS", Internet Draft, work in progress, draft-ietf-ccamp-gmpls-recovery-terminology-01.txt, November 2002.

[3] Czezowski, P., and T. Soumiya (Eds.), "Optical network failure recovery requirements", Internet Draft, work in progress, draft-czezowski-optical-recovery-reqs-01.txt, February 2003.

[4] Lang, J.P. and B. Rajagopalan (Eds.), "Generalized MPLS Recovery Functional Specification", Internet Draft, work in progress, draft-ietf-ccamp-gmpls-recovery-functional-00.txt, January 2003.

[5] Papadimitriou, D., et al, "Analysis of Generalized MPLS-based Recovery Mechanisms (including Protection and Restoration)", Internet Draft, work in progress, draft-ietf-ccamp-gmpls-recovery-analysis-00.txt, January 2003.

[6] Rabbat, R., and V. Sharma (Eds.), "Fault Notification Protocol for GMPLS-based Recovery", Internet Draft, work in progress, draft-rabbat-fault-notification-protocol-02.txt, February 2003.

[7] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

[8] Mannie, E.
(Ed.), "Generalized Multi-Protocol Label Switching (GMPLS) Architecture", Internet Draft, work in progress, draft-ietf-ccamp-gmpls-architecture-03.txt, August 2002.

[9] Lang, J. (Ed.), "Link Management Protocol (LMP)", Internet Draft, draft-ietf-ccamp-lmp-07.txt, November 2002.

[10] Berger, L. (Ed.), "Generalized MPLS Signaling - RSVP-TE Extensions", Internet Draft, work in progress, draft-ietf-mpls-generalized-rsvp-te-09.txt, September 2002.

[11] Fredette, A., and J. Lang (Eds.), "Link Management Protocol (LMP) for DWDM Optical Line Systems", Internet Draft, work in progress, draft-ietf-ccamp-lmp-wdm-01.txt, September 2002.

Acknowledgments

The following individuals provided valuable input to this draft: Richard Rabbat, Ching-Fong Su and Takafumi Chujo of Fujitsu Labs of America, and Masafumi Katoh and Akira Chugo of Fujitsu Laboratories, Ltd.

Editors' Addresses

   Toshio Soumiya                         Peter Czezowski
   Fujitsu Laboratories Ltd.              Fujitsu Labs of America, Inc.
   1-1, Kamikodanaka 4-Chome              595 Lawrence Expressway
   Nakahara-ku, Kawasaki                  Sunnyvale, CA 94085
   211-8588, Japan                        United States of America
   Phone: +81-44-754-2765                 Phone: +1-408-530-4516
   Email: soumiya.toshio@jp.fujitsu.com   Email: peterc@fla.fujitsu.com

Contributing Authors

   Shinya Kanoh                           Takeo Hamada
   Fujitsu Laboratories Ltd.              Fujitsu Labs of America, Inc.
   1-1, Kamikodanaka 4-Chome              595 Lawrence Expressway
   Nakahara-ku, Kawasaki                  Sunnyvale, CA 94085
   211-8588, Japan                        United States of America
   Phone: +81-44-754-2765                 Phone: +1-408-530-4516
   Email: kanoh@jp.fujitsu.com            Email: thamada@fla.fujitsu.com
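
Informal Illustration of the Flooding Procedure

The receive-side procedure of Section 2 (steps 2 through 6) can be illustrated with a small simulation. The following Python sketch is not part of the protocol definition. The names Node, connect, and flood are hypothetical, the topology is supplied by the caller, and retransmission timers, delivery of FaultNotifyAck messages, and protection path setup (step 5) are left out. Decrementing the TTL at each forwarding node is an assumption made for the sketch; Section 4.1 only specifies that a message with a TTL of zero is discarded.

```python
from collections import deque


class Node:
    """Hypothetical stand-in for an LMP-capable node (not an LMP object)."""

    def __init__(self, name):
        self.name = name
        self.neighbors = []   # adjacent nodes, one control channel each
        self.seen = set()     # (reporting node ID, failed link ID, sequence number)
        self.acks_sent = 0    # FaultNotifyAck messages this node would send


def connect(a, b):
    """Create a bidirectional adjacency between two nodes."""
    a.neighbors.append(b)
    b.neighbors.append(a)


def flood(reporter, link_id, seq, ttl=None):
    """Flood one fault report; return the number of FaultNotify messages sent."""
    key = (reporter.name, link_id, seq)
    reporter.seen.add(key)
    # Step 2: the reporting node unicasts to every immediate neighbor.
    queue = deque((reporter, nbr, ttl) for nbr in reporter.neighbors)
    sent = 0
    while queue:
        sender, node, hops = queue.popleft()
        sent += 1
        if hops == 0:
            continue  # Section 4.1: a TTL of zero is silently discarded
        if key not in node.seen:
            # Step 4a: record the failure, then forward to all neighbors
            # except the node the message came from.
            node.seen.add(key)
            next_hops = None if hops is None else hops - 1  # assumed decrement
            for nbr in node.neighbors:
                if nbr is not sender:
                    queue.append((node, nbr, next_hops))
        # Step 6: acknowledge, whether the message was new or a duplicate.
        node.acks_sent += 1
    return sent


if __name__ == "__main__":
    a, b, c, d = (Node(n) for n in "ABCD")
    for x, y in ((a, b), (b, c), (c, d), (d, a)):
        connect(x, y)
    print(flood(a, "link-1", seq=1))
```

On the four-node ring built in the usage example, every node learns of the failure after five FaultNotify messages, two of which are duplicates that are acknowledged but not forwarded. This is the duplicate suppression that keeps the flooding bounded.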