- BPDU frames are sent with a destination address of the well-known STP multicast address 01-80-c2-00-00-00
- By default, BPDUs are sent out all switch ports every 2 seconds
- The STP election process by itself only identifies the root bridge and the port roles. Until the remaining ports are put into the Blocking state, all ports are still active, and bridging loops still might lurk in the network.
- In Cisco switches, PVST+ is the default STP that is enabled when a VLAN is created.
- The IEEE recommendation is to consider a maximum diameter of seven bridges for the default STP timers.
2 x forward_delay = end-to-end_BPDU_propagation_delay + message_age_overestimate + maximum_frame_lifetime + maximum_transmission_halt_delay = 14 + 6 + 7.5 + 1 = 28.5
forward_delay = 28.5 / 2 = 14.25, rounded up to 15 seconds
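The arithmetic above can be checked with a few lines of Python. This sketch only plugs in the four component values quoted above (it does not derive them), and using `math.ceil` to turn 14.25 seconds into the 15-second default is an assumption about how the rounding is done.

```python
import math

# Components of the forward-delay budget for a 7-bridge diameter,
# using the values quoted above (all in seconds).
end_to_end_bpdu_propagation_delay = 14
message_age_overestimate = 6
maximum_frame_lifetime = 7.5
maximum_transmission_halt_delay = 1

two_x_forward_delay = (end_to_end_bpdu_propagation_delay
                       + message_age_overestimate
                       + maximum_frame_lifetime
                       + maximum_transmission_halt_delay)
forward_delay_exact = two_x_forward_delay / 2           # 14.25
forward_delay_default = math.ceil(forward_delay_exact)  # rounded up to 15

print(two_x_forward_delay, forward_delay_exact, forward_delay_default)
```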
http://www.cisco.com/c/en/us/support/docs/lan-switching/spanning-tree-protocol/19120-122.html
STP Animation @ cisco.com
Cisco switches actually send several different BPDUs on an 802.1Q trunk port:
* An 802.1d BPDU for VLAN 1 is sent untagged to the IEEE MAC address (0180.c200.0000) on the native VLAN of the trunk
* A Cisco SSTP BPDU for the native VLAN of the trunk is sent untagged to the SSTP MAC address (0100.0ccc.cccd) on the native VLAN of the trunk
* Additional Cisco SSTP BPDUs, one for each of the other VLANs carried on the trunk, are sent 802.1Q tagged to the SSTP MAC address with the appropriate VLAN ID
The format of the SSTP BPDU is 100% identical to the 802.1d BPDU after the SNAP header, except that Cisco also adds a "PVID" TLV field at the end of the frame, which identifies the VLAN ID of the source port (e.g., if an SSTP BPDU is sent on VLAN 10, the TLV contains VLAN 10).
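A minimal Python sketch of which BPDUs a PVST+ switch sends on one 802.1Q trunk, following the three rules listed above literally. `TrunkBpdu` and `bpdus_on_trunk` are hypothetical names used for illustration; the sketch only models the destination MAC, the tagging and the VLAN each BPDU describes, not the frame contents or the PVID TLV.

```python
from dataclasses import dataclass
from typing import List, Optional

IEEE_STP_MAC = "0180.c200.0000"    # IEEE 802.1D BPDU destination
CISCO_SSTP_MAC = "0100.0ccc.cccd"  # Cisco SSTP (PVST+) BPDU destination


@dataclass(frozen=True)
class TrunkBpdu:
    kind: str                 # "802.1D" or "SSTP"
    stp_vlan: int             # VLAN whose spanning tree this BPDU describes
    dst_mac: str
    dot1q_tag: Optional[int]  # None means sent untagged on the native VLAN


def bpdus_on_trunk(native_vlan: int, trunk_vlans: List[int]) -> List[TrunkBpdu]:
    bpdus = [
        # 1) IEEE BPDU for VLAN 1, untagged, on the native VLAN
        TrunkBpdu("802.1D", 1, IEEE_STP_MAC, None),
        # 2) SSTP BPDU for the native VLAN, untagged
        TrunkBpdu("SSTP", native_vlan, CISCO_SSTP_MAC, None),
    ]
    # 3) One tagged SSTP BPDU for every other VLAN carried on the trunk
    for vlan in trunk_vlans:
        if vlan != native_vlan:
            bpdus.append(TrunkBpdu("SSTP", vlan, CISCO_SSTP_MAC, vlan))
    return bpdus


for bpdu in bpdus_on_trunk(native_vlan=1, trunk_vlans=[1, 10, 20]):
    print(bpdu)
```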
STP History
- The first STP, called DEC STP, was invented in 1985 by Radia Perlman.
- In 1990, the IEEE published the first standard for the protocol as 802.1D based on the algorithm designed by Perlman.
Subsequent versions were published in 1998 and 2004 incorporating various extensions.
IEEE 802.1D - CSTP aka 802.1d-1998 (no VLANs) - Common Spanning Tree assumes one 802.1D spanning-tree instance for the entire bridged network, regardless of the number of VLANs.
IEEE 802.1Q - STP (with VLAN support)
- PVST+ is a Cisco enhancement of STP that provides a separate 802.1D spanning-tree instance for each VLAN configured in the network.
- IEEE 802.1W - RSTP aka 802.1d-2004 Rapid STP - still had a single instance of STP
- PVRST+ is a Cisco enhancement of RSTP that is similar to PVST+.
- IEEE 802.1S - MSTP - IEEE standard inspired by the earlier Cisco proprietary Multi-Instance Spanning Tree Protocol (MISTP) implementation.
MST maps multiple VLANs that have the same traffic flow requirements into the same spanning-tree instance.
- From the STP point of view, IEEE 802.1D is not VLAN-aware and IEEE 802.1Q is VLAN-aware, but it uses a single STP instance for all VLANs.
- That is, if the port is blocking then it is blocking for all VLANs on that port. The same is true for forwarding.
http://www.cisco.com/c/en/us/support/docs/lan-switching/spanning-tree-protocol/24063-pvid-inconsistency-24063.html
STP Terminology
Inferior BPDU - a BPDU is inferior if it carries information about a root bridge that is worse than the one currently stored for the port, or if it advertises a longer distance (higher root path cost) to the current root bridge. Inferior BPDUs may appear when a neighboring switch suddenly loses its uplink and claims itself the new root of the topology. By default, every switch should ignore inferior BPDUs until the currently stored BPDU expires (time = Max_Age - Message_Age). This behavior is intended to stabilize the STP topology in situations where an uplink on some switch flaps (goes down and up frequently because of some malfunction).
A superior BPDU is one that has a lower Bridge ID. An inferior BPDU would have a higher Bridge ID. This can't be judged on a single BPDU basis; only in comparison can one be considered superior or inferior. Receiving a superior BPDU typically means that a switch received a BPDU with a lower Bridge ID than the Bridge ID of the currently elected root bridge.
If a switch receives an inferior BPDU, nothing changes. Receiving a superior BPDU kicks off a reconvergence of the STP topology.
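The superior/inferior comparison and the "ignore inferior BPDUs until the stored one expires" rule can be sketched in Python. The sketch assumes the usual comparison order (root Bridge ID, then root path cost, then sender Bridge ID, then sender port ID, lower is better) and represents IDs as plain integers; `Bpdu` and `PortInfo` are hypothetical names, and the model is simplified to the behavior described in these notes.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

MAX_AGE = 20.0  # seconds, 802.1D default


@dataclass(frozen=True)
class Bpdu:
    root_id: int
    root_path_cost: int
    sender_bid: int
    sender_port_id: int
    message_age: float = 0.0

    def vector(self) -> Tuple[int, int, int, int]:
        # Lower tuple = better BPDU (compared field by field)
        return (self.root_id, self.root_path_cost,
                self.sender_bid, self.sender_port_id)


def is_superior(new: Bpdu, stored: Bpdu) -> bool:
    return new.vector() < stored.vector()


class PortInfo:
    """Best BPDU stored for one port (simplified model)."""

    def __init__(self) -> None:
        self.stored: Optional[Bpdu] = None
        self.stored_at = 0.0

    def expired(self, now: float) -> bool:
        # Stored information lives for Max_Age - Message_Age seconds
        if self.stored is None:
            return True
        return now - self.stored_at >= MAX_AGE - self.stored.message_age

    def receive(self, bpdu: Bpdu, now: float) -> bool:
        """Return True if the BPDU is accepted, False if it is ignored."""
        if self.expired(now) or not is_superior(self.stored, bpdu):
            # Nothing valid stored, or a superior/equal (refresh) BPDU
            self.stored, self.stored_at = bpdu, now
            return True
        return False  # inferior BPDU: ignored until the stored one expires


port = PortInfo()
from_root = Bpdu(root_id=100, root_path_cost=19, sender_bid=200, sender_port_id=0x8001)
fake_root = Bpdu(root_id=300, root_path_cost=0, sender_bid=300, sender_port_id=0x8001)
print(port.receive(from_root, now=0.0))   # True: first BPDU is stored
print(port.receive(fake_root, now=5.0))   # False: inferior, ignored
print(port.receive(fake_root, now=20.0))  # True: stored BPDU has aged out
```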
Legacy STP - IEEE 802.1D Overview
In a Layer 3 environment, the routing protocols in use keep track of redundant paths to a destination network so that a secondary path can be used quickly if the primary path fails. Layer 3 routing allows many paths to a destination to remain up and active, and allows load sharing across multiple paths.
In a Layer 2 environment (switching or bridging), however, no routing protocols are used, and active redundant paths are neither allowed nor desirable. Instead, some form of bridging provides data transport between networks or switch ports. The Spanning Tree Protocol (STP) provides network link redundancy so that a Layer 2 switched network can recover from failures without intervention in a timely manner.
Bridging Loops
A Layer 2 switch mimics the function of a transparent bridge, which must offer segmentation between two networks while remaining transparent to all the end devices connected to it.
A transparent bridge (and the Ethernet switch) must operate as follows:
- It has no initial knowledge of any end device’s location; therefore, the bridge must “listen” to frames coming into each of its ports and build a table that correlates their source MAC addresses with the bridge port numbers.
- It must constantly update its bridging table on detecting the presence of a new MAC address or on detecting a MAC address that has changed.
- If a frame arrives with the broadcast address as the destination address, the bridge must forward, or flood, the frame out all available ports, except the port that initially received the frame.
- if a frame arrives with a destination address that is not found in the bridge table, the bridge cannot determine which port to forward the frame to for transmission (unknown unicast). The bridge treats the frame as if it were a broadcast and floods it out all remaining ports.
- Frames forwarded across the bridge cannot be modified by the bridge itself. Therefore, the bridging process is effectively transparent.
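The transparent-bridge rules above map onto a small Python class. This is an illustrative sketch only: `TransparentBridge` is a hypothetical name, MAC addresses are plain strings, and dropping a frame whose destination sits on the port it arrived on (filtering) is standard bridge behavior added here even though it is not listed above.

```python
from typing import Dict, List

BROADCAST = "ffff.ffff.ffff"


class TransparentBridge:
    def __init__(self, ports: List[int]) -> None:
        self.ports = ports
        self.mac_table: Dict[str, int] = {}  # source MAC -> port

    def receive(self, src: str, dst: str, in_port: int) -> List[int]:
        """Return the ports the (unmodified) frame is sent out of."""
        # Learn or update the location of the sender
        self.mac_table[src] = in_port
        # Broadcast or unknown unicast: flood out all other ports
        if dst == BROADCAST or dst not in self.mac_table:
            return [p for p in self.ports if p != in_port]
        out_port = self.mac_table[dst]
        # Known unicast: forward, or filter if it is on the same segment
        return [] if out_port == in_port else [out_port]


bridge = TransparentBridge([1, 2, 3])
print(bridge.receive("aaaa.aaaa.aaaa", "bbbb.bbbb.bbbb", 1))  # [2, 3] (unknown unicast)
print(bridge.receive("bbbb.bbbb.bbbb", "aaaa.aaaa.aaaa", 2))  # [1]
```

Nothing in these rules ever discards a flooded frame, which is why two such bridges connected by parallel links keep forwarding the same broadcast indefinitely.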
This process of forwarding a single frame around and around between two switches is known as a bridging loop.
Nothing stops the frame from being forwarded in this fashion forever, short of physically breaking the loop (disconnecting switch ports or shutting down a switch).
Bridging loops form because parallel switches (or bridges) are unaware of each other.
STP was developed to overcome the possibility of bridging loops so that redundant switches and switch paths could be used for their benefits. Basically, the protocol enables switches to become aware of each other so they can negotiate a loop-free path through the network.
How STP Works
* STP computes a tree structure that spans all switches in a subnet or network.
* Redundant paths are placed in a Blocking or Standby state to prevent frame forwarding.
* The switched network is then in a loop-free condition.
* However, if a forwarding port fails or becomes disconnected, the spanning-tree algorithm recomputes the spanning-tree topology so that the appropriate blocked links can be reactivated.
Bridge Protocol Data Unit
STP data messages are exchanged in the form of bridge protocol data units (BPDUs). A switch sends a BPDU frame out a port, using the unique MAC address of the port itself as the source address.
BPDU frames are sent with a destination address of the well-known STP multicast address 01-80-c2-00-00-00.
Two types of BPDU exist:
■ Configuration BPDU, used for spanning-tree computation
■ Topology Change Notification (TCN) BPDU, used to announce changes in the network topology
BPDU Frame formats (univercd/cc/td/doc/product/lan/trsrb2/frames.pdf): https://www.dropbox.com/s/bexm7of295n7yxh/l2_frames.pdf?dl=0
BPDU Frame - from the link above
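As a sketch of the BPDU layout, the Python below packs the 35-byte 802.1D Configuration BPDU payload (without the Ethernet/LLC headers) with `struct`, plus the 4-byte TCN BPDU. Field order and sizes follow the IEEE 802.1D format, and timer fields are encoded in units of 1/256 second. The helper names `bridge_id` and `config_bpdu` and the sample MAC address are hypothetical.

```python
import struct

# protocol ID, version, type, flags, root ID, root path cost, bridge ID,
# port ID, message age, max age, hello time, forward delay (big-endian)
CONFIG_BPDU_FORMAT = ">HBBB8sI8sHHHHH"
TC_FLAG, TCA_FLAG = 0x01, 0x80  # topology change / TC acknowledgment bits

# TCN BPDU: protocol ID 0, version 0, type 0x80, nothing else
TCN_BPDU = struct.pack(">HBB", 0x0000, 0x00, 0x80)


def bridge_id(priority: int, mac: str) -> bytes:
    """8-byte Bridge ID: 2-byte priority followed by 6-byte MAC address."""
    return struct.pack(">H", priority) + bytes.fromhex(mac.replace(".", ""))


def config_bpdu(root: bytes, cost: int, sender: bytes, port_id: int,
                message_age: float = 0, max_age: float = 20,
                hello: float = 2, forward_delay: float = 15,
                flags: int = 0) -> bytes:
    def units(seconds: float) -> int:
        return int(seconds * 256)  # timers are carried in 1/256 s

    return struct.pack(CONFIG_BPDU_FORMAT,
                       0x0000, 0x00, 0x00,  # protocol 0, version 0, type 0 = Configuration
                       flags, root, cost, sender, port_id,
                       units(message_age), units(max_age),
                       units(hello), units(forward_delay))


root = bridge_id(32768, "000a.f40a.2980")
bpdu = config_bpdu(root, 0, root, 0x8001)  # sent by the root: cost 0, port 128.1
print(len(bpdu), bpdu.hex())
```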
1. Electing a Root Bridge
For all switches in a network to agree on a loop-free topology, a common reference (root point) must exist.
An election process among all connected switches chooses the root bridge.
Each switch has a unique Bridge ID (BID) that identifies it to other switches. The bridge ID is an 8-byte value consisting of the following fields:
■ Bridge Priority (2 bytes) - The priority or weight of a switch in relation to all other switches. The Priority field can have a value of 0 to 65,535 and defaults to 32,768 (or 0x8000) on every Catalyst switch.
■ MAC Address (6 bytes) - The MAC address used by a switch can come from the Supervisor module, the backplane, or a pool of 1,024 addresses that are assigned to every supervisor or backplane, depending on the switch model. In any event, this address is hard-coded and unique, and the user cannot change it.
Every switch begins by sending out BPDUs with a root bridge ID equal to its own bridge ID and a sender bridge ID that is its own bridge ID.
Received BPDU messages are analyzed to see if a “better” root bridge is being announced.
A root bridge is considered better if the root bridge ID value is lower than another.
Sooner or later, the election converges and all switches agree on the notion that one of them is the root bridge.
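The election can be sketched in Python as repeated BPDU exchange: every switch starts by claiming itself as root and, on each exchange, adopts any lower root Bridge ID heard from a neighbor until nothing changes. Bridge IDs are modeled as (priority, MAC-as-integer) tuples so that priority is compared first and the MAC address breaks ties; the switch names, MACs and links are hypothetical.

```python
from typing import Dict, List, Tuple

BridgeId = Tuple[int, int]  # (priority, MAC address as an integer)


def bid(priority: int, mac: str) -> BridgeId:
    return (priority, int(mac.replace(".", ""), 16))


def elect_root(bids: Dict[str, BridgeId],
               links: List[Tuple[str, str]]) -> Dict[str, BridgeId]:
    """Return the root Bridge ID each switch believes in after convergence."""
    believed_root = dict(bids)  # every switch first claims to be the root
    changed = True
    while changed:
        changed = False
        for a, b in links:
            for receiver, sender in ((a, b), (b, a)):
                # A "better" root is one with a lower Bridge ID
                if believed_root[sender] < believed_root[receiver]:
                    believed_root[receiver] = believed_root[sender]
                    changed = True
    return believed_root


bids = {
    "SW1": bid(32768, "0000.0c11.1111"),
    "SW2": bid(32768, "0000.0c00.0001"),  # same priority, lowest MAC
    "SW3": bid(32768, "0000.0cff.ffff"),
}
print(elect_root(bids, [("SW1", "SW3"), ("SW3", "SW2")]))  # everyone agrees on SW2
```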
2. Electing Root Ports
Each nonroot switch must figure out where it is in relation to the root bridge.
This action can be performed by selecting only one root port on each nonroot switch. The root port always points toward the current root bridge.
STP uses the concept of cost to determine many things. Selecting a root port involves evaluating the root path cost.
Root path cost = the cumulative cost of all the links leading to the root bridge.
A particular switch link also has a cost associated with it, called the path cost. Path costs are carried as numeric values (the Root Path Cost field of a BPDU is 4 bytes long); the default values are listed below.
STP Path Costs
1. The root bridge sends out a BPDU with a root path cost value of 0 because its ports sit directly on the root bridge.
2. When the next-closest neighbor receives the BPDU, it adds the path cost of its own port where the BPDU arrived. (This is done as the BPDU is received.)
3. The neighbor sends out BPDUs with this new cumulative value as the root path cost.
4. The root path cost is incremented by the ingress port path cost as the BPDU is received at each switch down the line.
5. Notice the emphasis on incrementing the root path cost as BPDUs are received. When computing the spanning-tree algorithm manually, remember to compute a new root path cost as BPDUs come into a switch port, not as they go out.
A lower root path cost tells the switch that the path to the root bridge is better through this port than through its other ports.
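The cost accumulation in the numbered steps can be sketched in Python: each port adds its own (ingress) path cost to the root path cost advertised by its neighbor, and a nonroot switch keeps the port with the lowest result as its root port. The topology, port names and costs (taken from the default cost table below) are hypothetical, and tie-breaking is left out here.

```python
import math
from typing import Dict, Tuple

Port = Tuple[str, str]  # (switch, port name)

# ports[(switch, port)] = (neighbor switch, cost of THIS port)
ports: Dict[Port, Tuple[str, int]] = {
    ("R", "Fa0/1"): ("A", 19),    # R-A: 100 Mbps
    ("A", "Fa0/1"): ("R", 19),
    ("A", "Gi0/2"): ("B", 4),     # A-B: 1 Gbps
    ("B", "Gi0/2"): ("A", 4),
    ("R", "Eth0/3"): ("B", 100),  # R-B: 10 Mbps
    ("B", "Eth0/3"): ("R", 100),
}


def root_path_costs(root: str, ports: Dict[Port, Tuple[str, int]]):
    cost = {sw: math.inf for sw, _ in ports}
    cost[root] = 0  # the root advertises a root path cost of 0
    root_port: Dict[str, str] = {}
    changed = True
    while changed:
        changed = False
        for (sw, port), (neighbor, port_cost) in ports.items():
            if sw == root:
                continue
            # Cost is added as the BPDU is RECEIVED on this port
            received = cost[neighbor] + port_cost
            if received < cost[sw]:
                cost[sw], root_port[sw] = received, port
                changed = True
    return cost, root_port


print(root_path_costs("R", ports))
```

Here B reaches the root through A (19 + 4 = 23) rather than over its direct 10 Mbps link (100), so Gi0/2 becomes B's root port.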
Default Observed STP Cost on Cisco Switches - from http://www.hojmark.net/stp-port-cost.html

| Speed | Port Cost | Comment |
|---|---|---|
| 10 Mbps | 100 | Ethernet |
| 20 Mbps | 56 | EtherChannel |
| 30 Mbps | 47 | EtherChannel |
| 40 Mbps | 41 | EtherChannel |
| 50 Mbps | 35 | EtherChannel |
| 54 Mbps | 33 | 802.11 wireless |
| 60 Mbps | 30 | EtherChannel |
| 70 Mbps | 26 | EtherChannel |
| 80 Mbps | 23 | EtherChannel |
| 100 Mbps | 19 | Fast Ethernet |
| 200 Mbps | 12 | Fast EtherChannel |
| 300 Mbps | 9 | Fast EtherChannel |
| 400 Mbps | 8 | Fast EtherChannel |
| 500 Mbps | 7 | Fast EtherChannel |
| 600 Mbps | 6 | Fast EtherChannel |
| 700 Mbps | 5 | Fast EtherChannel |
| 800 Mbps | 5 | Fast EtherChannel |
| 1 Gbps | 4 | Gigabit Ethernet |
| 2 Gbps | 3 | Gigabit EtherChannel |
| 10 Gbps | 2 | 10G Ethernet |
| 20 Gbps | 1 | 20G EtherChannel |
| 40 Gbps | 1 | 40G EtherChannel |
3. Electing Designated Ports
Only one designated port is enabled per segment.
To remove the possibility of bridging loops, STP makes a final computation to identify one designated port on each network segment.
Switches choose a designated port based on the lowest cumulative root path cost to the root bridge.
If a neighboring switch on a shared LAN segment sends a BPDU announcing a lower root path cost, the neighbor must have the designated port.
If a switch learns only of higher root path costs from other BPDUs received on a port, however, it then correctly assumes that its own receiving port is the designated port for the segment.
Designated port selection
Two or more links might have identical root path costs. This results in a tie condition, which STP resolves by comparing the following values in order (lowest wins):
1. Lowest Root Bridge-ID
2. Lowest Root Path-Cost to Root Bridge
3. Lowest sender Bridge-ID
4. Lowest sender Port-ID
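Because the four tie-breakers are compared strictly in order, a Python `NamedTuple` whose fields follow that order can be compared directly, lowest wins. The sketch models Switch-2 from the UPSTREAM example below (two equal-cost links from the root, which sends on ports 128.1 and 64.2) and a B-C segment like the one in these notes; all Bridge IDs are made up.

```python
from typing import Dict, NamedTuple, Tuple


class PriorityVector(NamedTuple):
    # Field order = tie-breaker order; NamedTuples compare field by field
    root_bid: Tuple[int, int]        # 1. root Bridge ID (priority, MAC)
    root_path_cost: int              # 2. root path cost
    sender_bid: Tuple[int, int]      # 3. sender Bridge ID
    sender_port_id: Tuple[int, int]  # 4. sender port ID (priority, number)


def best(candidates: Dict[str, PriorityVector]) -> str:
    return min(candidates, key=lambda name: candidates[name])


ROOT = (32768, 0x0000_0C00_0001)  # SW-1, the root bridge (hypothetical BID)

# Root port on Switch-2: both links cost 19 and come from the same root,
# so the sender port ID decides (64.2 beats 128.1).
received_on_sw2 = {
    "Fa0/1": PriorityVector(ROOT, 19, ROOT, (128, 1)),
    "Fa0/2": PriorityVector(ROOT, 19, ROOT, (64, 2)),
}
print(best(received_on_sw2))  # Fa0/2

# Designated port on segment B-C: equal cost 19, lower sender BID wins.
segment_b_c = {
    "B port 1/2": PriorityVector(ROOT, 19, (32768, 0x0B), (128, 2)),
    "C port 1/2": PriorityVector(ROOT, 19, (32768, 0x0C), (128, 2)),
}
print(best(segment_b_c))  # B port 1/2
```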
Influence interface selection
For UPSTREAM interfaces to Root-Bridge
- To influence how the local switch elects its root port, change the cost on its links. Cost is cumulative throughout the STP domain; a higher cost is less preferred.
For DOWNSTREAM switches
- To influence how a downstream switch elects its root port, change the port priority on the upstream switch. This is only locally significant between the two directly connected switches. A higher priority value is less preferred.
UPSTREAM
SW-1-root (fa0/1, fa0/2) <====> SW-2 (fa0/1, fa0/2)
Switch-2#sh spanning-tree
Interface Role Sts Cost Prio.Nbr Type
---------------- ---- --- --------- -------- --------------------------------
Fa0/1 Root FWD 19 128.1 P2p <--- Fa0/1 (128.1) on the upstream switch (SW-1-root) has a lower port ID than Fa0/2 (128.2)
Fa0/2 Altn BLK 19 128.2 P2p
Switch-1-root#
int fa0/2
spanning-tree vlan 1 port-priority 64
Switch-2#sh spanning-tree
Interface Role Sts Cost Prio.Nbr Type
---------------- ---- --- --------- -------- --------------------------------
Fa0/1 Altn BLK 19 128.1 P2p
Fa0/2 Root FWD 19 64.2 P2p <--- Fa0/2 (64.2) on the upstream switch has a lower port priority than Fa0/1 (128.1)
DOWNSTREAM
Switch-2#
int Fa0/2
spanning-tree vlan 1 cost 1
Switch-2#sh spanning-tree
Interface Role Sts Cost Prio.Nbr Type
---------------- ---- --- --------- -------- --------------------------------
Fa0/1 Altn BLK 19 128.1 P2p
Fa0/2 Root FWD 1 128.2 P2p <--- selected as FWD because Fa0/2 has the lower root path cost (1 instead of 19) toward the upstream root bridge
Catalyst B port 1/2 is the DP for Segment B–C. Therefore, Catalyst C port 1/2 will be neither a root port nor a designated port.
Any port that is not elected to either position enters the Blocking state. Where blocking occurs, bridging loops are broken.
STP States
STP port states are as follows:
■ Disabled—Ports that are administratively shut down by the network administrator (not part of the normal STP progression for a port)
■ Blocking—After a port initializes, it begins in the Blocking state so that no bridging loops can form. Cannot receive or transmit data and cannot add MAC addresses to its address table. Allowed to receive only BPDUs so that the switch can hear from other neighboring switches.
■ Listening—A port is moved from Blocking to Listening if the switch thinks that the port can be selected as a root port or designated port.
The port is on its way to begin forwarding traffic. Still cannot send or receive data frames. Allowed to receive and send BPDUs so that it can actively participate in the STP.
The port finally is allowed to become a root port or designated port because the switch can advertise the port by sending BPDUs to other switches.
If the port loses its root port or designated port status, it returns to the Blocking state.
■ Learning—After a period of time called the Forward Delay (15 seconds) in the Listening state, the port is allowed to move into the Learning state.
The port still sends and receives BPDUs as before. In addition, the switch now can learn new MAC addresses to add to its address table.
This gives the port an extra period of silent participation and allows the switch to assemble at least some address information.
The port cannot yet send any data frames, however.
■ Forwarding—After another Forward Delay (15 seconds) period of time in the Learning state, the port is allowed to move into the Forwarding state.
The port now can send and receive data frames, collect MAC addresses in its address table, and send and receive BPDUs.
The port is now a fully functioning switch port within the spanning-tree topology.
Switch# show spanning-tree interface fastethernet 0/1
Vlan Port ID Designated Port ID
Name Prio.Nbr Cost Sts Cost Bridge ID Prio.Nbr
----------------- ------------- -------------- -------------------------- --------
VLAN0001 128.1 19 LIS 0 32769 000a.f40a.2980 128.1
When a port is first activated, it transitions through the following stages:
1) Blocking - with Max Age (20 sec) Discards frames, does not learn MAC addresses, receives BPDUs
2) Listening - with Forward Delay (15 sec) Discards frames, does not learn MAC addresses, receives BPDUs to determine its role in the network
3) Learning - with Forward Delay (15 sec) Discards frames, does learn MAC addresses, receives and transmits BPDUs
4) Forwarding Accepts frames, learns MAC addresses, receives and transmits BPDUs
It can take up to 50 seconds (20 + 15 + 15) for a port to transition from Blocking to Forwarding.
This is not always observed: if the root port itself goes down, the information learned on it is aged out immediately, without waiting for Max Age, so the transition takes only about 30 seconds.
The 20-second Blocking (Max Age) wait applies only when the port is already blocking and the root port failure is not seen as a local link-down, so the stored BPDU has to age out (see Indirect Topology Changes).
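A small Python sketch of the timeline, assuming default timers and ignoring the extra wait for the next Hello (up to 2 seconds) that appears in the indirect-failure example later. `port_timeline` is a hypothetical helper that returns the second at which each state is entered.

```python
from typing import List, Tuple

MAX_AGE = 20
FORWARD_DELAY = 15


def port_timeline(wait_for_max_age: bool) -> List[Tuple[str, int]]:
    t = 0
    states: List[Tuple[str, int]] = []
    if wait_for_max_age:
        # Stored BPDU must age out first (indirect failure)
        states.append(("Blocking", t))
        t += MAX_AGE
    states.append(("Listening", t))
    t += FORWARD_DELAY
    states.append(("Learning", t))
    t += FORWARD_DELAY
    states.append(("Forwarding", t))
    return states


print(port_timeline(wait_for_max_age=True))   # Forwarding at 50 s
print(port_timeline(wait_for_max_age=False))  # Forwarding at 30 s
```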
STP Computation
1. Identify path costs on links.
For each link between switches, write the path cost that each switch uses for the link.
2. Identify the root bridge.
Find the switch with the lowest bridge ID; mark it on the drawing.
3. Select root ports (1 per switch).
For each switch, find the one port that has the best path to the root bridge. This is the one with the lowest root path cost. Mark the port with an RP label.
4. Select designated ports (1 per segment).
For each link between switches, identify which end of the link will be the designated port. This is the one with the lowest root path cost; if equal on both ends, use STP tie-breakers. Mark the port with a DP label.
5. Identify the blocking ports.
Every switch port that is neither a root nor a designated port will be put into the Blocking state. Mark these with an X.
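The five steps can be strung together in one Python sketch. It models a three-switch topology like the Catalyst A/B/C example in these notes (A 1/1 to B 1/1, A 1/2 to C 1/1, B 1/2 to C 1/2, all 100 Mbps links with cost 19; this port layout is an assumption). The Bridge IDs are hypothetical, and comparing port names as strings is a simplification standing in for real port IDs.

```python
from typing import Dict, List, Tuple

BridgeId = Tuple[int, int]             # (priority, MAC as integer)
Link = Tuple[str, str, str, str, int]  # (switch A, port A, switch B, port B, cost)


def compute_stp(bids: Dict[str, BridgeId], links: List[Link]):
    # Step 1: link path costs are given in `links`
    # Step 2: the root bridge has the lowest Bridge ID
    root = min(bids, key=lambda sw: bids[sw])

    # Lowest cumulative root path cost of every switch (Bellman-Ford)
    cost = {sw: float("inf") for sw in bids}
    cost[root] = 0
    for _ in range(len(bids)):
        for a, _pa, b, _pb, c in links:
            cost[b] = min(cost[b], cost[a] + c)
            cost[a] = min(cost[a], cost[b] + c)

    roles: Dict[Tuple[str, str], str] = {}

    # Step 3: one root port per nonroot switch
    # (lowest cost via the port, then neighbor BID, then neighbor port)
    for sw in bids:
        if sw == root:
            continue
        options = []
        for a, pa, b, pb, c in links:
            if a == sw:
                options.append(((cost[b] + c, bids[b], pb), pa))
            elif b == sw:
                options.append(((cost[a] + c, bids[a], pa), pb))
        roles[(sw, min(options)[1])] = "RP"

    # Step 4: one designated port per segment; each end advertises
    # (its root path cost, its BID, its port), lowest wins
    for a, pa, b, pb, _c in links:
        if (cost[a], bids[a], pa) < (cost[b], bids[b], pb):
            roles[(a, pa)] = "DP"
        else:
            roles[(b, pb)] = "DP"

    # Step 5: every other port blocks
    for a, pa, b, pb, _c in links:
        roles.setdefault((a, pa), "BLK")
        roles.setdefault((b, pb), "BLK")
    return root, roles


bids = {"A": (32768, 0x0A), "B": (32768, 0x0B), "C": (32768, 0x0C)}
links = [("A", "1/1", "B", "1/1", 19),
         ("A", "1/2", "C", "1/1", 19),
         ("B", "1/2", "C", "1/2", 19)]
root, roles = compute_stp(bids, links)
print(root)
for port, role in sorted(roles.items()):
    print(port, role)
```

With equal costs on segment B-C, the lower sender Bridge ID makes B port 1/2 the designated port, so C port 1/2 blocks.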
STP Timers
The STP timers can be configured or adjusted from the switch command line.
The timer values never should be changed from the defaults without careful consideration.
The timers and their default values are as follows:
■ Hello Time (2 seconds) - The time interval between Configuration BPDUs sent by the root bridge.
The Hello Time value configured in the root bridge switch determines the Hello Time for all nonroot switches because they just relay the Configuration BPDUs as they are received from the root. However, all switches have a locally configured Hello Time that is used to time TCN BPDUs when they are retransmitted.
■ Forward Delay (15 seconds) - The time interval that a switch port spends in both the Listening and Learning states.
■ Max (maximum) Age (20 seconds) - Maximum length of time a BPDU can be stored without receiving an update. Timer expiration signals an indirect failure with the designated or root bridge.
Bridge keeps a copy of the “best” BPDU that it has heard. If the switch port loses contact with the BPDU’s source (no more BPDUs are received from it), the switch assumes that a topology change must have occurred after the Max Age time elapsed and so the BPDU is aged out.
These values are derived from a reference model of a network with a diameter of seven switches.
The diameter is measured from the root bridge switch outward, including the root bridge.
The Hello Time is based on the time it takes for a BPDU to travel from the root bridge to a point seven switches away. This computation uses a Hello Time of 2 seconds.
Topology Changes
To announce a change in the active network topology, switches send a TCN BPDU.
When the root bridge receives a TCN, it starts sending configuration BPDUs with the TCN bit set for a period of time equal to max age plus forward delay.
A topology change (a port on an active switch comes up or goes down) occurs when a switch either moves a port into the Forwarding state or moves a port from the Forwarding or Learning states into the Blocking state.
The switch sends a TCN BPDU out its root port so that, ultimately, the root bridge receives news of the topology change.
Also notice that the switch will not send TCN BPDUs if the port has been configured with PortFast enabled.
The switch continues sending TCN BPDUs every Hello Time interval until it gets an acknowledgment from its upstream neighbor.
As the upstream neighbors receive the TCN BPDU, they propagate it on toward the root bridge and send their own acknowledgments.
When the root bridge receives the TCN BPDU, it also sends out an acknowledgment.
The root bridge sets the Topology Change flag in its Configuration BPDU, which is relayed to every other bridge in the network. This is done to signal the topology change and cause all other bridges to shorten their bridge table aging times from the default (300 seconds) to the Forward Delay value (default 15 seconds).
This condition causes the learned locations of MAC addresses to be flushed out much sooner than they normally would, easing the bridge table corruption that might occur because of the change in topology.
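The effect of the Topology Change flag on the bridge table can be sketched in Python: while the flag is being received, entries age out after Forward Delay (15 s) instead of the normal 300 s. `CamTable` is a hypothetical class and the sample MACs are made up; real switches age entries with their own internal timers.

```python
from typing import Dict, Tuple

NORMAL_AGING = 300  # seconds
FORWARD_DELAY = 15  # seconds


class CamTable:
    def __init__(self) -> None:
        self.entries: Dict[str, Tuple[int, float]] = {}  # MAC -> (port, last seen)
        self.aging_time = NORMAL_AGING

    def learn(self, mac: str, port: int, now: float) -> None:
        self.entries[mac] = (port, now)

    def topology_change(self, active: bool) -> None:
        # TC flag seen in Configuration BPDUs: shorten aging to Forward Delay
        self.aging_time = FORWARD_DELAY if active else NORMAL_AGING

    def age_out(self, now: float) -> None:
        self.entries = {mac: (port, seen)
                        for mac, (port, seen) in self.entries.items()
                        if now - seen < self.aging_time}


cam = CamTable()
cam.learn("aaaa.aaaa.aaaa", 1, now=0)    # idle since t = 0
cam.learn("bbbb.bbbb.bbbb", 2, now=100)  # transmitted recently
cam.topology_change(active=True)
cam.age_out(now=110)
print(sorted(cam.entries))  # only the recently active station remains
```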
http://www.cisco.com/c/en/us/support/docs/lan-switching/spanning-tree-protocol/12013-17.html
A direct topology change is one that can be detected on a switch interface (switch can immediately detect a link failure).
Direct TCN on Catalyst A
Sw A is the STP root, so it sends a Configuration BPDU with the Topology Change flag set down to Sw B, and Sw B relays it to Sw C.
Sw C receives this BPDU on port 1/2. It becomes the “best” BPDU received from the root, so port 1/2 becomes the new root port.
With the default STP timers, Sw C takes about two times the Forward Delay period (15 seconds), or 30 seconds total, to bring port 1/2 into the Forwarding state.
Indirect Topology Changes
Indirect TCN
The link status at each switch stays up, but something between them has failed or is filtering traffic (could be another device, such as a service provider’s switch, a firewall, and so on).
As a result, no data (including BPDUs) can pass between those switches.
STP can detect and recover from indirect failures, thanks to timer mechanisms.
The sequence of events unfolds as follows:
1. Catalysts A and C both show a link up condition; data begins to be filtered elsewhere on the link.
2. No link failure is detected, so no TCN messages are sent.
3. Catalyst C already has stored the “best” BPDU it had received from the root over port 1/1. No further BPDUs are received from the root over that port. After the Max Age timer expires, no other BPDU is available to refresh the “best” entry, so it is flushed. Catalyst C now must wait to hear from the root again on any of its ports.
4. The next Configuration BPDU from the root is heard on Catalyst C port 1/2. This BPDU becomes the new “best” entry, and port 1/2 becomes the root port. Now the port is progressed from Blocking through the Listening, Learning, and finally Forwarding states.
As a result of the indirect link failure, the topology doesn’t change immediately.
The absence of BPDUs from the root causes Catalyst C to take some action. Because this type of failure relies on STP timer activity, it generally takes longer to detect and mitigate.
In this example, the total time that users on Catalyst C lost connectivity was roughly the time until the Max Age timer expired (20 seconds), plus the time until the next Configuration BPDU was received (2 seconds) on port 1/2, plus the time that port 1/2 spent in the Listening (15 seconds) and Learning (15 seconds) states. In other words, 52 seconds elapse if the default timer values are used.
Insignificant Topology Changes
No actual topology change occurred because none of the switches had to change port states to reach the root bridge.
Same network topology, with the addition of a user PC on access-layer switch Catalyst C.
Insignificant Topology Change
Sequence of events:
1. The PC on Catalyst C port 2/12 is turned off. The switch detects the link status going down.
2. Catalyst C begins sending TCN BPDUs toward the root, over its root port (1/1).
3. The root sends a TCN acknowledgment back to Catalyst C and then sends a Configuration BPDU with the TCN bit set to all downstream switches. This is done to inform every switch of a topology change somewhere in the network.
4. The TCN flag is received from the root, and both Catalysts B and C shorten their bridge table aging times. This causes recently idle entries to be flushed, leaving only the actively transmitting stations in the table. The aging time stays short for the duration of the Forward Delay and Max Age timers.
Consequently, every time any PC in the network powers up or down, every switch in the network must age out CAM table entries.
Fortunately, Catalyst switches have a feature that can designate a port as a special case. You can enable the STP PortFast feature on a port with a single attached PC. As a result, TCNs aren’t sent when the port changes state, and the port is brought right into the Forwarding state when the link comes up.
Types of STP
Implementing STP into a switched environment has required additional consideration and modification to support multiple VLANs, because the IEEE and Cisco have approached STP differently.
CST - Common Spanning Tree
The IEEE 802.1Q standard specifies how VLANs are to be trunked between switches. It also specifies only a single instance of STP that encompasses all VLANs.
All CST BPDUs are transmitted over trunk links using the native VLAN with untagged frames.
Having a single STP for many VLANs simplifies switch configuration and reduces switch CPU load during STP calculations. However, having only one STP instance can cause limitations, too.
PVST - Per-VLAN Spanning Tree
Cisco has a proprietary version of STP that offers more flexibility than the CST version.
PVST operates a separate instance of STP for each individual VLAN. This allows the STP on each VLAN to be configured independently, offering better performance and tuning for specific conditions.
Multiple spanning trees also make load balancing possible over redundant links when the links are assigned to different VLANs. One link might forward one set of VLANs, while another redundant link might forward a different set.
PVST requires the use of Cisco Inter-Switch Link (ISL) trunking encapsulation between switches. In networks where PVST and CST coexist, interoperability problems occur. Each requires a different trunking method, so BPDUs are never exchanged between STP types.
PVST+ - Per-VLAN Spanning Tree Plus
Cisco has a second proprietary version of STP that allows devices to interoperate with both PVST and CST. Operates over both 802.1Q and ISL.
- To communicate with CST, PVST+ exchanges BPDUs with CST as untagged frames over the native VLAN.
- BPDUs from other instances of STP (other VLANs) are propagated across the CST portions of the network by tunneling.
- PVST+ sends these BPDUs by using a unique multicast address so that the CST switches forward them on to downstream neighbors without interpreting them first. Eventually, the tunneled BPDUs reach other PVST+ switches where they are understood.