CCNP Tshoot - Advanced Cisco Catalyst Switch Troubleshooting

Many modern switches, known as multilayer switches (MLS), can also route; that is, they make forwarding decisions based on Layer 3 information (for example, IP address information).

Resolving InterVLAN Routing Issues
 - Several years ago, one popular approach to performing interVLAN routing with a Layer 2 switch was to create a router-on-a-stick topology, where a Layer 2 switch is interconnected with a router via a trunk connection (the router interface has subinterfaces, one for each VLAN).
 - More recently, many switches have risen above their humble Layer 2 beginnings and started to route traffic (Layer 3 switches, also called multilayer switches or MLS).

Contrasting Layer 3 Switches with Routers
Layer 3 Switch/Router Shared Characteristics


 - Both can build and maintain a routing table using both statically configured routes and dynamic routing protocols.
 - Both can make packet forwarding decisions based on Layer 3 information (for example, IP addresses).

Layer 3 Switch/Router Differentiating Characteristics
 - Routers usually support a wider selection of interface types (for example, non-Ethernet interfaces).
 - Switches leverage application-specific integrated circuits (ASICs) to approach wire-speed throughput. Therefore, most Layer 3 switches can forward traffic faster than their router counterparts.
 - A Cisco IOS version running on routers typically supports more features than a Cisco IOS version running on a Layer 3 switch, because many switches lack the specialized hardware required to run many of the features available on a router.

Control Plane and Data Plane Troubleshooting
 - Many router and Layer 3 switch operations can be categorized as control plane or data plane operations.
 - The processes involved in troubleshooting control plane operations are identical on both Layer 3 switch and router platforms (for example, the same CLI commands could be used to troubleshoot an OSPF issue on both types of platforms).
 - Data plane troubleshooting, however, can vary between Layer 3 switches and routers. For example, when troubleshooting data throughput issues, the commands you issue might vary between types of platforms, because Layer 3 switches and routers have fundamental differences in the way traffic is forwarded through the device.

A router can use Cisco Express Forwarding (CEF) to efficiently forward traffic:
 - The forwarding information base (FIB) and the adjacency table are constructed from information collected from the router’s control plane (the RIB and ARP cache).

Check control plane operations with commands such as:
show ip route
Examine information contained in the router’s CEF FIB and adjacency tables:
show ip cef   <-- Layer 3 forwarding information, in addition to multicast, broadcast, and local IP addresses.
show adjacency  <-- Verifies that a valid adjacency exists for a connected host.

Some Cisco Catalyst switches take the information contained in CEF’s FIB and adjacency table and compile that information into Ternary Content Addressable Memory (TCAM).
The specific way a switch’s TCAM operates depends on the switch platform.
show platform  <-- Cisco Catalyst 3560, 3750, and 4500 switches
show mls cef   <--  Cisco Catalyst 6500
Cisco IOS Software, C3560 Software:
Switch#show platform  tcam  utilization
CAM Utilization for ASIC# 0                      Max            Used
                                             Masks/Values    Masks/values
 Unicast mac addresses:                        784/6272         14/36   
 IPv4 IGMP groups + multicast routes:          152/1216          6/26   
 IPv4 unicast directly-connected routes:       784/6272         14/36   
 IPv4 unicast indirectly-connected routes:     272/2176          8/44   
 IPv4 policy based routing aces:                 0/0             0/0    
 IPv4 qos aces:                                768/768         260/260  
 IPv4 security aces:                          1024/1024         39/39   
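The utilization table above can be turned into per-feature percentages for quick comparison. The following is a minimal sketch, not a Cisco tool: the function name tcam_value_utilization and the regular expression are hypothetical, and it assumes each row has the "name: max masks/values used masks/values" layout shown in the sample output.

```python
import re

# Rows copied from the sample "show platform tcam utilization" output above.
SAMPLE = """\
 Unicast mac addresses:                        784/6272         14/36
 IPv4 IGMP groups + multicast routes:          152/1216          6/26
 IPv4 unicast directly-connected routes:       784/6272         14/36
 IPv4 unicast indirectly-connected routes:     272/2176          8/44
 IPv4 policy based routing aces:                 0/0             0/0
 IPv4 qos aces:                                768/768         260/260
 IPv4 security aces:                          1024/1024         39/39
"""

# Assumed row layout: "<feature>:  <max masks>/<max values>  <used masks>/<used values>"
ROW_RE = re.compile(
    r"^\s*(?P<name>.+?):\s+(?P<max_m>\d+)/(?P<max_v>\d+)\s+(?P<used_m>\d+)/(?P<used_v>\d+)\s*$"
)


def tcam_value_utilization(text):
    """Return {feature: percent of value entries in use} for every parsed row."""
    usage = {}
    for line in text.splitlines():
        match = ROW_RE.match(line)
        if match is None:
            continue  # skip headers and anything that is not a data row
        max_values = int(match.group("max_v"))
        used_values = int(match.group("used_v"))
        # A region with zero capacity is reported as 0.0 to avoid dividing by zero.
        usage[match.group("name")] = (
            round(100.0 * used_values / max_values, 1) if max_values else 0.0
        )
    return usage


if __name__ == "__main__":
    for feature, percent in tcam_value_utilization(SAMPLE).items():
        print(f"{feature}: {percent}%")
```

A region whose percentage approaches 100 is a candidate for closer inspection on that platform.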
Comparing Routed Switch Ports and Switched Virtual Interfaces
 - You can configure the IP address for a collection of ports belonging to a VLAN under a virtual VLAN interface (Switched Virtual Interface, or SVI).
 - Although SVIs can route between VLANs configured on a switch, a Layer 3 switch can be configured to act more as a router by using routed ports on the switch.
 - Use the no switchport command in interface configuration mode to convert a switch port to a routed port.

Router Redundancy Troubleshooting
 - Cisco offers technologies that provide next-hop gateway redundancy. These technologies include HSRP, VRRP, and GLBP.

HSRP
 - Hot Standby Router Protocol (HSRP) uses virtual IP and MAC addresses.
 - The virtual MAC address for an HSRP group begins with a vendor code of 0000.0c, followed by a well-known HSRP code of 07.ac.
 - The last two hexadecimal digits are the hexadecimal representation of the HSRP group number.
 - One router, known as the active router, services requests destined for the virtual IP and MAC addresses.
 - Another router, known as the standby router, can service such requests in the event the active router becomes unavailable.
 - By default, HSRP sends hello messages every three seconds.
 - If the standby router does not hear a hello message within ten seconds (the default), it considers the active router to be down and assumes the active role. Convergence happens more rapidly if an interface is administratively shut down, because an active router sends a resign message if its active HSRP interface is shut down.
 - If a newly added router with a higher priority were configured for preemption, it would send a coup message to inform the active router that it was going to take on the active role.
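The virtual MAC rule and the default timers described above can be sketched in a few lines. This is an illustration only: the function names are hypothetical, and the MAC format assumes the HSRP version 1 layout given in the text (0000.0c07.ac followed by the group number in hex).

```python
HSRP_DEFAULT_HELLO_SECONDS = 3   # default hello interval from the text
HSRP_DEFAULT_HOLD_SECONDS = 10   # default time before the active router is declared down


def hsrp_v1_virtual_mac(group):
    """Build the virtual MAC for an HSRP version 1 group (0 to 255)."""
    if not 0 <= group <= 255:
        raise ValueError("HSRP version 1 group numbers range from 0 to 255")
    # Vendor code 0000.0c, HSRP code 07.ac, then the group as two hex digits.
    return "0000.0c07.ac{:02x}".format(group)


def active_router_considered_down(seconds_since_last_hello,
                                  hold_seconds=HSRP_DEFAULT_HOLD_SECONDS):
    """Return True once the standby router has waited out the hold time."""
    return seconds_since_last_hello >= hold_seconds


if __name__ == "__main__":
    print(hsrp_v1_virtual_mac(10))            # group 10
    print(active_router_considered_down(9))   # still inside the hold time
```

For group 10 the helper yields 0000.0c07.ac0a, which matches the "Active virtual MAC address" value reported by show standby later in these notes.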

HSRP Virtual MAC
Collect the following information about the HSRP group under inspection:
 - Which router is the active router
 - Which routers, if any, are configured with the preempt option
 - What is the virtual IP address
 - What is the virtual MAC address
 - Check to see if a host on the HSRP virtual IP address’ subnet can ping the virtual IP address. (ping / arp -a)
show standby brief
show standby <interface_id>   <--- Active virtual MAC address is 0000.0c07.ac0a
debug standby terse
VRRP
 - Virtual Router Redundancy Protocol (VRRP) allows a collection of routers to service traffic destined for a single IP address.
 - Unlike HSRP, the IP address serviced by a VRRP group does not have to be a virtual IP address.
 - The IP address can be the address of a physical interface on the virtual router master.
 - A VRRP group can have multiple routers acting as virtual router backups.

GLBP
 - Gateway Load Balancing Protocol (GLBP) can load balance traffic destined for a next-hop gateway across a collection of routers, known as a GLBP group.
 - When a client sends an ARP request in an attempt to determine the MAC address corresponding to a known IP address, GLBP can respond with the MAC address of one member of the GLBP group.
 - GLBP has one active virtual gateway (AVG), which is responsible for replying to ARP requests from hosts. However, multiple routers acting as active virtual forwarders (AVFs) can forward traffic.
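The AVG and AVF roles above can be modeled as a small sketch. It assumes the AVG hands out forwarder MAC addresses in round-robin order, which is only one possible load-balancing method; the class name and the MAC labels are hypothetical placeholders, not real GLBP addresses.

```python
from itertools import cycle


class ActiveVirtualGateway:
    """Toy model of a GLBP AVG answering ARP requests for the virtual IP."""

    def __init__(self, forwarder_macs):
        if not forwarder_macs:
            raise ValueError("a GLBP group needs at least one forwarder")
        # Assumption: replies rotate through the AVFs in round-robin order.
        self._next_mac = cycle(forwarder_macs)

    def answer_arp(self):
        """Return the MAC address handed to the next host that sends an ARP request."""
        return next(self._next_mac)


if __name__ == "__main__":
    avg = ActiveVirtualGateway(["mac-of-avf-1", "mac-of-avf-2"])
    for host in ("host-a", "host-b", "host-c"):
        print(host, "->", avg.answer_arp())
```

Because different hosts learn different MAC addresses for the same gateway IP, their traffic is spread across the forwarders.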

Troubleshooting VRRP and GLBP
 - Because VRRP and GLBP perform a similar function to HSRP, a similar troubleshooting approach applies. The following commands summarize group state:
show vrrp brief
show glbp brief
Cisco Catalyst Switch Performance Troubleshooting
 - Determine which network component is responsible for the poor performance.
 - Rather than a switch or a router, the user’s client, server, or application could be the cause of the performance issue.

Cisco Catalyst Switch Troubleshooting Targets
 - Troubleshooting one of these switches can be platform dependent.
 - Many similarities do exist, however.

Cisco Catalyst switches include the following hardware components:
■ Ports (also known as interfaces): A switch’s ports physically connect the switch to other network devices.
■ Forwarding logic: A switch contains hardware that makes forwarding decisions. This hardware rewrites a frame’s headers.
■ Backplane: A switch’s backplane physically interconnects a switch’s ports. Therefore, depending on the specific switch architecture, frames flowing through a switch enter via a port (that is, the ingress port), flow across the switch’s backplane, and are forwarded out of another port (that is, an egress port).
■ Control plane: A switch’s CPU and memory reside in a control plane. This control plane is responsible for running the switch’s operating system and does not directly participate in frame forwarding. However, the forwarding logic contained in the forwarding hardware comes from the control plane.

 - As a result, a continuous load on the control plane could, over time, impact the rate at which the switch forwards frames.
 - Also, if the forwarding hardware is operating at maximum capacity, the control plane begins to provide the forwarding logic.

Port Errors
Troubleshooting Ethernet - http://sclabs.blogspot.com/2014/09/show-interface-in-depth.html
 - A good first step is to check port statistics (for example, to see whether an excessive number of frames are being dropped).
 - Dropped frames can send TCP flows into TCP slow start, which reduces the window size, and therefore the bandwidth efficiency, of those flows.
 - Packet drops for a UDP flow used for voice or video could result in noticeable quality degradation, because dropped UDP segments are not retransmitted.
 - Another possibility is that the cabling could be bad.
show interfaces <interface_id> counters
Mismatched Duplex Settings
 - Duplex mismatches can cause a wide variety of port errors.
 - Almost all network devices, other than shared media hubs, can run in full-duplex mode. Therefore, if you have no hubs in your network, all devices should be running in full-duplex mode.
 - A new recommendation from Cisco is that switch ports be configured to autonegotiate both speed and duplex.
 - mdix auto - The automatic medium-dependent interface crossover (auto-MDIX) feature can automatically detect if a port needs a crossover or a straight-through cable to interconnect with an attached device and adjust the port to work regardless of which cable type is connected.

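 - A duplex mismatch can be detected by comparing the error counters and duplex settings on both ends of the link: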
SW1:
SW1# show interfaces gig 0/9 counters errors
Port   Single-Col Multi-Col  Late-Col  Excess-Col  Carri-Sen  Runts  Giants
Gi0/9  5603       0          5373      0           0          0      0
SW1# show interfaces gig 0/9 | include duplex
Half-duplex, 100Mb/s, link type is auto, media type is 10/100/1000BaseTX
SW2:
SW2# show interfaces fa 5/47 counters errors
Port   Align-Err  FCS-Err  Xmit-Err Rcv-Err UnderSize OutDiscards
Fa5/47 0          5248     0        5603    27        0
SW2# show interfaces fa 5/47 | include duplex
Full-duplex, 100Mb/s
 - You could change the duplex settings on the switch over which you do have control.
 - Then, you could clear the interface counters to see if the errors continue to increment.
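The pattern in the SW1 and SW2 output above (late collisions on the half-duplex end, FCS errors on the full-duplex end) can be expressed as a simple check. This is a rough heuristic sketch, not a definitive test: the function name and dictionary keys are hypothetical, and the counter values are copied by hand from the sample output.

```python
def likely_duplex_mismatch(side_a, side_b):
    """Flag a probable duplex mismatch between the two ends of one link.

    Each side is a dict with a 'duplex' key ('half' or 'full') and optional
    'late_col' and 'fcs_err' counters; missing counters are treated as 0.
    """
    if side_a["duplex"] == side_b["duplex"]:
        return False  # both ends agree, so the errors have another cause
    half, full = (side_a, side_b) if side_a["duplex"] == "half" else (side_b, side_a)
    # Heuristic: the half-duplex end logs late collisions while the
    # full-duplex end logs FCS errors.
    return half.get("late_col", 0) > 0 and full.get("fcs_err", 0) > 0


if __name__ == "__main__":
    sw1_gi0_9 = {"duplex": "half", "late_col": 5373}
    sw2_fa5_47 = {"duplex": "full", "fcs_err": 5248}
    print(likely_duplex_mismatch(sw1_gi0_9, sw2_fa5_47))
```

After correcting the duplex setting and clearing the counters, the same check can be repeated against fresh counter values.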

TCAM Troubleshooting
 - The two primary components of forwarding hardware are forwarding logic and backplane.
 - A switch’s backplane, however, is rarely the cause of a switch performance issue, because most Cisco Catalyst switches have high-capacity backplanes.
 - However, it is conceivable that in a modular switch chassis, the backplane will not have the throughput to support a fully populated modular chassis, where each card in the chassis supports the highest combination of port densities and port speeds.
 - A switch’s forwarding logic is compiled into a special type of memory called ternary content addressable memory (TCAM).
 - TCAM works with a switch’s CEF feature to provide extremely fast forwarding decisions.
 - If a switch’s TCAM is unable, for whatever reason, to forward traffic, that traffic is forwarded by the switch’s CPU, which has a limited forwarding capability.
 - The process of the TCAM sending packets to a switch’s CPU is called punting.
 - On most switch platforms, TCAMs cannot be upgraded. Instead, you could either use a switch with higher-capacity TCAMs or reduce the number of entries in a switch’s TCAM.
 - Some switches (for example, Cisco Catalyst 3560 or 3750 Series switches) enable you to change the amount of TCAM memory allocated to different switch features. (L2/L3)

Reasons why a packet might be punted from a TCAM to its CPU:
 - Routing protocols, STP, and other protocols that send multicast or broadcast traffic will have that traffic sent to the CPU.
 - Someone connecting to a switch administratively (Telnet session with the switch) will have their packets sent to the CPU.
 - Packets using a feature not supported in hardware (packets traveling over a GRE tunnel) are sent to the CPU.
 - If a switch’s TCAM has reached capacity, additional packets will be punted to the CPU.
A TCAM might reach capacity if it has too many installed routes or configured access control lists.
From the events listed, the event most likely to cause a switch performance issue is a TCAM filling to capacity.
Please be sure to check documentation for your switch model, because TCAM verification commands can vary between platforms.
show tcam    <--- for Cisco Catalyst 3550 Series switch
show platform tcam    <--- Cisco Catalyst 3560 and 3750 Series switches
Example:
Cisco Catalyst 3550 has three TCAMs.  

Cat3550# show tcam inacl 1 statistics     <---  inacl - access control lists applied in the ingress direction
Ingress ACL TCAM#1: Number of active labels: 3
Ingress ACL TCAM#1: Number of masks allocated: 14, available: 402
Ingress ACL TCAM#1: Number of entries allocated: 17, available: 3311
Conclusion: TCAM number one is not approaching capacity.
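The conclusion above can be backed with a quick percentage calculation. The sketch below assumes the total capacity of each resource is the sum of the "allocated" and "available" figures, which the output does not state explicitly; the function name is hypothetical.

```python
import re

# Lines copied from the sample "show tcam inacl 1 statistics" output above.
SAMPLE = """\
Ingress ACL TCAM#1: Number of active labels: 3
Ingress ACL TCAM#1: Number of masks allocated: 14, available: 402
Ingress ACL TCAM#1: Number of entries allocated: 17, available: 3311
"""

STAT_RE = re.compile(r"Number of (masks|entries) allocated: (\d+), available: (\d+)")


def tcam_percent_allocated(text):
    """Return {'masks': pct, 'entries': pct}, assuming total = allocated + available."""
    usage = {}
    for kind, allocated, available in STAT_RE.findall(text):
        allocated, available = int(allocated), int(available)
        total = allocated + available
        usage[kind] = round(100.0 * allocated / total, 1) if total else 0.0
    return usage


if __name__ == "__main__":
    print(tcam_percent_allocated(SAMPLE))
```

Both figures come out in the low single digits, which is consistent with the TCAM not approaching capacity.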

On some switch models (for example, a Cisco Catalyst 3750 platform):
show platform ip unicast counts   <--- see if a TCAM allocation has failed
show controllers cpu-interface   <--- display a count of packets being forwarded to a switch’s CPU

High CPU Utilization Level Troubleshooting
 - The load on a switch’s CPU is often low, even when the switch is forwarding a heavy traffic load, because the TCAM handles forwarding in hardware. Use show processes cpu to check the CPU load.

Cat3550# show processes cpu
CPU utilization for five seconds: 19%/15%; one minute: 20%; five minutes: 13%
PID Runtime(ms) Invoked uSecs  5Sec    1Min    5Min    TTY   Process
1   0           4       0      0.00%   0.00%   0.00%   0     Chunk Manager
The output shows a 19 percent CPU load over the last five seconds, with 15 percent of the CPU load used for interrupt processing (so 4 percent of the CPU load is consumed by control plane processing).
 - For interrupt processing, a value as high as ten percent is considered acceptable, so the 15 percent shown here deserves a closer look.
 - Periodic spikes in processor utilization are also not a major cause for concern if such spikes can be explained (for example, processing routing updates, running debug commands, or answering SNMP polls).
 - If you determine that a switch’s high CPU load is primarily the result of interrupts, you should examine the switch’s packet switching patterns and check the TCAM utilization.
 - A high CPU utilization on a switch might be a result of STP.
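The two figures in the "five seconds" field of show processes cpu can be split programmatically. This is a minimal sketch with hypothetical function names; the ten percent threshold is taken from the note above and is applied here to the interrupt figure as an assumption.

```python
import re

# First line of the sample "show processes cpu" output above.
SAMPLE_LINE = "CPU utilization for five seconds: 19%/15%; one minute: 20%; five minutes: 13%"

CPU_RE = re.compile(r"CPU utilization for five seconds: (\d+)%/(\d+)%")


def split_cpu_load(line):
    """Split the five-second figure into total, interrupt, and process-level load."""
    match = CPU_RE.search(line)
    if match is None:
        raise ValueError("line does not look like show processes cpu output")
    total, interrupt = int(match.group(1)), int(match.group(2))
    # The first number is the total load; the second is the interrupt share.
    return {"total": total, "interrupt": interrupt, "process": total - interrupt}


def interrupt_load_is_high(line, threshold_percent=10):
    """Return True when interrupt processing exceeds the assumed threshold."""
    return split_cpu_load(line)["interrupt"] > threshold_percent


if __name__ == "__main__":
    print(split_cpu_load(SAMPLE_LINE))
    print(interrupt_load_is_high(SAMPLE_LINE))
```

When the interrupt share dominates, as it does in the sample, the next step described above is to examine packet switching patterns and TCAM utilization.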

Cisco Catalyst 3750 Series Switches - Troubleshooting High CPU Utilization
http://www.cisco.com/c/en/us/td/docs/switches/lan/catalyst3750/software/troubleshooting/cpu_util.html