Ex4 Chapter 8 - Network Troubleshooting

Establish and document a network baseline.
Describe the various troubleshooting methodologies and troubleshooting tools.
Describe the common issues that occur during WAN implementation.
Identify and troubleshoot common enterprise network implementation issues using a layered model approach.

Ethernet Standards

Standard      Cabling                                    Maximum length
1000BASE-CX   Twinaxial cabling                          25 m
100BASE-FX    Two strands, multimode fiber               400 m
1000BASE-LX   Long-wavelength laser, MM or SM fiber      10 km (SM); 3 km (MM)
1000BASE-SX   Short-wavelength laser, MM fiber           220 m with 62.5-micron fiber; 550 m with 50-micron fiber
1000BASE-ZX   Extended wavelength, SM fiber              100 km
1000BASE-BX   Single SM fiber strand, a different light wavelength in each direction (1310 nm/1490 nm)
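
As a rough illustration of how these limits can be used during a physical-layer check, the sketch below looks up a standard and reports whether a planned cable run stays within the listed maximum. The dictionary layout, the medium labels, and the `fits_within_limit` helper are assumptions made for this example; the distances are the ones listed above.

```python
# Maximum segment lengths in metres, copied from the list above.
# The (standard, medium) key layout is an assumption of this sketch.
MAX_LENGTH_M = {
    ("1000BASE-CX", "twinax"): 25,
    ("100BASE-FX", "mm"): 400,
    ("1000BASE-LX", "sm"): 10_000,
    ("1000BASE-LX", "mm"): 3_000,
    ("1000BASE-SX", "mm-62.5"): 220,
    ("1000BASE-SX", "mm-50"): 550,
    ("1000BASE-ZX", "sm"): 100_000,
}


def fits_within_limit(standard: str, medium: str, run_length_m: float) -> bool:
    """Return True if the run is no longer than the listed maximum."""
    limit = MAX_LENGTH_M[(standard, medium)]  # KeyError for unknown combinations
    return run_length_m <= limit


print(fits_within_limit("1000BASE-SX", "mm-62.5", 300))  # False
print(fits_within_limit("1000BASE-SX", "mm-50", 300))    # True
```

The same 300 m run fails on 62.5-micron fiber but passes on 50-micron fiber, which is why the fiber type belongs in the documentation.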

Troubleshooting steps

1) Documenting Network
   a) router / switch documentation
     - Type of device, model designation
     - IOS image name
     - Device network hostname
     - Location of the device (building, floor, room, rack, panel)
     - If it is a modular device, include all module types and in which module slot they are located
     - Data Link layer addresses
     - Network layer addresses
     - Any additional important information about physical aspects of the device
   b) End-system Configuration Table (servers, network management consoles, and desktop workstations)
     - Device name (purpose)
     - Operating system and version
     - IP address
     - Subnet mask
     - Default gateway, DNS server, and WINS server addresses
     - Any high-bandwidth network applications that the end-system runs
   c) Network Topology Diagram
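
One possible way to keep the router/switch documentation machine-readable is a small record type with a completeness check. This is only a sketch: the class name, field names, and sample values are hypothetical and simply mirror the items listed above.

```python
from dataclasses import dataclass, field


@dataclass
class DeviceRecord:
    """One entry of router/switch documentation (field names are illustrative)."""
    hostname: str
    device_type: str
    model: str
    ios_image: str
    location: str
    modules: dict = field(default_factory=dict)       # slot -> module type
    l2_addresses: dict = field(default_factory=dict)  # interface -> MAC address
    l3_addresses: dict = field(default_factory=dict)  # interface -> IP/prefix
    notes: str = ""

    def missing_fields(self) -> list:
        """Names of required text fields that were left empty."""
        required = ("hostname", "device_type", "model", "ios_image", "location")
        return [name for name in required if not getattr(self, name).strip()]


# Hypothetical device with two fields not yet documented.
r1 = DeviceRecord(
    hostname="R1", device_type="router", model="", ios_image="",
    location="Building A, floor 2, room 210, rack 3",
    l3_addresses={"FastEthernet0/0": "192.168.10.1/24"},
)
print(r1.missing_fields())  # ['model', 'ios_image']
```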

The OSI model provides a common language for network engineers and is commonly used in troubleshooting networks:
The upper layers (5-7) of the OSI model deal with application issues and generally are implemented only in software
The lower layers (1-4) of the OSI model handle data-transport issues. The Physical layer (Layer 1) and Data Link layer (Layer 2) are implemented in hardware and software.

Commands that are useful to the network documentation process include:
ping {host | ip-address} - Sends an echo request packet to an address, then waits for a reply. The host | ip-address variable is the IP alias or IP address of the target system. Use it to test connectivity with neighboring devices before logging in to them. Pinging other PCs in the network also initiates the MAC address auto-discovery process.

traceroute {destination} - Identifies the path a packet takes through the networks. The destination variable is the hostname or IP address of the target system.

telnet {host | ip-address} - log in remotely to a device for accessing configuration information.

show ip interface brief - display the up or down status and IP address of all interfaces on a device.
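
When the output of show ip interface brief has been captured to text, the up/down columns can be checked with a few lines of code. The sketch below assumes the usual six-column layout (Interface, IP-Address, OK?, Method, Status, Protocol); the sample output and both function names are made up for illustration.

```python
# Hypothetical captured output; real devices will list different interfaces.
SAMPLE_OUTPUT = """\
Interface              IP-Address      OK? Method Status                Protocol
FastEthernet0/0        192.168.10.1    YES manual up                    up
FastEthernet0/1        unassigned      YES unset  administratively down down
Serial0/0/0            10.1.1.1        YES manual up                    down
"""


def parse_interface_brief(text: str) -> list:
    """Turn each data row into a dict; the header row is skipped."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 6 or fields[0] == "Interface":
            continue
        rows.append({
            "interface": fields[0],
            "ip_address": fields[1],
            # Status may be two words ("administratively down").
            "status": " ".join(fields[4:-1]),
            "protocol": fields[-1],
        })
    return rows


def not_fully_up(rows: list) -> list:
    """Interfaces whose status or line protocol is anything other than up."""
    return [r["interface"] for r in rows
            if r["status"] != "up" or r["protocol"] != "up"]


print(not_fully_up(parse_interface_brief(SAMPLE_OUTPUT)))
# ['FastEthernet0/1', 'Serial0/0/0']
```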

show ip route - display the routing table in a router to learn the directly connected neighbors, more remote devices (through learned routes), and the routing protocols that have been configured.

show running-config interface - displays contents of currently running configuration file for a particular interface.

[no] debug ? - Displays a list of options for enabling or disabling debugging events on a device.

show protocols - Displays the configured protocols and shows the global and interface-specific status of any configured Layer 3 protocol.

show cdp neighbors detail - obtain detailed information about directly connected Cisco neighbor devices.

Questions to ask when gathering symptoms:
 - What does not work?
 - Are the things that do work and the things that do not work related?
 - Has the thing that does not work ever worked?
 - When was the problem first noticed?
 - What has changed since the last time it did work?
 - Can you reproduce the problem?
 - When exactly does the problem occur?

Software Troubleshooting Tools
- Network management system (NMS) tools include device-level monitoring, configuration, and fault management tools. (WhatsUp Gold, Cacti, Zabbix)
- Knowledge Bases (Cisco.com)
- Baselining Tools (SolarWinds LANsurveyor)
- Protocol Analyzers (Wireshark)

Hardware Troubleshooting Tools
- Network Analysis Module (provide a graphical representation of traffic from local and remote switches and routers)
- Digital Multimeters (measure electrical values of voltage, current, and resistance, for example when checking the power supply of network devices)
- Cable Testers - specialized, handheld devices designed for testing the various types of data communication cabling.
- Cable Analyzers - used to test and certify copper and fiber cables for different services and standards.
- Portable Network Analyzers - used for troubleshooting switched networks and VLANs.


Network Troubleshooting
Jitter is often used as a measure of the variability over time of the packet latency across a network. A network with constant latency has no variation (or jitter).
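
One simple way to put a number on jitter is the average absolute difference between consecutive latency samples. This is a sketch of that one definition only (other definitions exist); the function name and sample values are assumptions.

```python
from statistics import mean


def mean_jitter(latencies_ms: list) -> float:
    """Average absolute difference between consecutive latency samples."""
    if len(latencies_ms) < 2:
        return 0.0  # a single sample has no variation to measure
    deltas = [abs(b - a) for a, b in zip(latencies_ms, latencies_ms[1:])]
    return float(mean(deltas))


print(mean_jitter([20, 20, 20, 20]))  # 0.0 (constant latency, no jitter)
print(mean_jitter([20, 24, 20, 24]))  # 4.0
```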

Physical Layer Problems

 - Performance lower than baseline (common causes of slow or poor performance include overloaded or underpowered servers, unsuitable switch or router configurations, traffic congestion on a low-capacity link, and chronic frame loss)
 - Loss of connectivity - If a cable or device fails, the most obvious symptom is a loss of connectivity between the devices that communicate over that link or with the failed device.
 - High collision counts - Collision domain problems affect the local medium and disrupt communications to Layer 2 or Layer 3 infrastructure devices, local servers, or services.
 - Network bottlenecks or congestion - If a router, interface, or cable fails, routing protocols may redirect traffic to other routes that are not designed to carry the extra capacity. This can result in congestion or bottlenecks in those parts of the network.
 - High CPU utilization rates - a symptom that a device, such as a router, switch, or server, is operating at or exceeding its design limits. If not addressed quickly, CPU overloading can cause a device to shut down or fail.
 - Console error messages - Error messages reported on the device console can indicate a Physical layer problem.
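
Several of these symptoms only mean something relative to the documented baseline. The sketch below flags metrics that are worse than baseline by more than a chosen margin; the metric names, the 20 percent tolerance, and the function name are all assumptions, and every metric is treated as "higher is worse".

```python
def worse_than_baseline(baseline: dict, current: dict, tolerance: float = 0.20) -> dict:
    """Return metrics that exceed their baseline by more than `tolerance`.

    All metrics are assumed to be "higher is worse" (utilization, latency).
    """
    flagged = {}
    for name, base_value in baseline.items():
        value = current.get(name)
        if value is None:
            continue  # metric was not measured this time
        if value > base_value * (1 + tolerance):
            flagged[name] = (base_value, value)
    return flagged


# Hypothetical baseline and current measurements.
baseline = {"cpu_percent": 30, "link_util_percent": 40, "latency_ms": 12}
current = {"cpu_percent": 33, "link_util_percent": 85, "latency_ms": 40}
print(worse_than_baseline(baseline, current))
# {'link_util_percent': (40, 85), 'latency_ms': (12, 40)}
```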

Causes of Physical Layer Problems
- Power-related (most fundamental reason for network failure)
- Hardware faults - Faulty network interface cards (NICs) can cause network transmission errors such as late collisions, short frames, and jabber.
- Cabling faults - Many problems can be corrected by simply reseating cables that have become partially disconnected. Problems with fiber-optic cables may be caused by dirty connectors, excessively tight bends, and swapped RX/TX connections when polarized.
- Attenuation and noise - Attenuation is the loss of communication signal energy along the medium. Noise includes induced voltage fluctuations or current spikes, random (white) noise generated by many sources, noise induced by other cables in the same pathway, crosstalk from adjacent cables, and noise from nearby electric cables, devices with large electric motors, or other strong sources of electrical interference.
- Interface configuration errors - Serial links reconfigured as asynchronous instead of synchronous, incorrect clock rate, incorrect clock source, interface not turned on.
- Exceeding design limits - A component is operating at or near its maximum capacity, and the number of interface errors increases.
- CPU overload - Symptoms include input queue drops, slow performance, router services such as Telnet and ping that are slow or fail to respond, and missing routing updates. One of the causes of CPU overload in a router is high traffic. If some interfaces are regularly overloaded with traffic, consider redesigning the traffic flow in the network or upgrading the hardware.

Symptoms of Data Link Layer Problems

 - No functionality or connectivity at the Network layer or above - Some Layer 2 problems can stop the exchange of frames across a link, while others only cause network performance to degrade.
 - Network is operating below baseline performance levels
   a) Frames take an illogical path to their destination but do arrive. An example of a problem which could cause frames to take a suboptimal path is a poorly designed Layer 2 spanning-tree topology. In this case, the network might experience high-bandwidth usage on links that should not have that level of traffic.
   b) Some frames are dropped. These problems can be identified through error counter statistics and console error messages that appear on the switch or router. In an Ethernet environment, an extended or continuous ping also reveals if frames are being dropped.
 - Excessive broadcasts - It is important to identify the source of the broadcasts (poorly programmed or configured applications; large Layer 2 broadcast domains; underlying network problems, such as STP loops or route flapping).
 - Console messages - The most common console message that indicates a Layer 2 problem is a line protocol down message.
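
To turn an extended ping into a loss figure, the result marks printed by IOS can be counted, where "!" is a reply and "." is a timeout. A minimal sketch, with a made-up function name and sample strings:

```python
def ping_success_rate(marks: str) -> float:
    """Percentage of probes answered, given IOS-style ping result marks.

    '!' is a reply; any other mark (such as '.' for a timeout) counts as a loss.
    """
    marks = "".join(marks.split())  # drop spaces and line breaks
    if not marks:
        raise ValueError("no ping results supplied")
    return 100.0 * marks.count("!") / len(marks)


print(ping_success_rate("!!!!!.!!!!"))  # 90.0
```

A success rate that stays below 100 percent across repeated runs is worth comparing against the interface error counters.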

Causes of Data Link Layer Problems
 - Encapsulation errors - occur when the encapsulation at one end of a WAN link is configured differently from the encapsulation used at the other end.
 - Address mapping errors - In Frame Relay, an incorrect map is a common mistake. With dynamic L2/L3 mapping, devices may have been specifically configured not to respond to ARP or Inverse ARP requests, the cached Layer 2 or Layer 3 information may be stale because the device has physically changed, or invalid ARP replies may be received because of a misconfiguration or a security attack.
 - Framing errors - can be caused by a noisy serial line, an improperly designed cable (too long or not properly shielded), or an incorrectly configured channel service unit (CSU) line clock.
 - STP failures or loops - The purpose of Spanning Tree Protocol (STP) is to resolve a redundant physical topology into a tree-like topology by blocking redundant ports.
Most STP problems revolve around these issues:
   a) Forwarding loops that occur when no port in a redundant topology is blocked and traffic is forwarded in circles indefinitely. When the forwarding loop starts, it usually congests the lowest bandwidth links along its path. If all the links are of the same bandwidth, all links are congested. This congestion causes packet loss and leads to a downed network in the affected L2 domain.
   b) Excessive flooding because of a high rate of STP topology changes. The role of the topology change mechanism is to correct Layer 2 forwarding tables after the forwarding topology has changed. This is necessary to avoid a connectivity outage because, after a topology change, some MAC addresses previously accessible through particular ports might become accessible through different ports. A topology change should be a rare event in a well-configured network. When a link on a switch port goes up or down, there is eventually a topology change when the STP state of the port is changing to or from forwarding. However, when a port is flapping (oscillating between up and down states), this causes repetitive topology changes and flooding.
   c) Slow STP convergence or reconvergence, which can be caused by a mismatch between the real and documented topology, a configuration error, such as an inconsistent configuration of STP timers, an overloaded switch CPU during convergence, or a software defect.
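
A flapping port can be spotted in a link-state log by counting how often the state changes within a time window. The sketch below assumes a time-ordered list of (timestamp, state) events; the event format, the threshold of four transitions, and the function names are illustrative choices, not values from any standard.

```python
def count_transitions(events: list, window_start: float, window_end: float) -> int:
    """Count up/down state changes whose timestamp falls inside the window.

    `events` is a time-ordered list of (timestamp_seconds, state) tuples,
    where state is "up" or "down".
    """
    transitions = 0
    previous_state = None
    for timestamp, state in events:
        changed = previous_state is not None and state != previous_state
        if changed and window_start <= timestamp <= window_end:
            transitions += 1
        previous_state = state
    return transitions


def is_flapping(events: list, window_start: float, window_end: float,
                threshold: int = 4) -> bool:
    """Treat `threshold` or more transitions in the window as flapping."""
    return count_transitions(events, window_start, window_end) >= threshold


# Hypothetical log for one switch port.
log = [(0, "up"), (5, "down"), (7, "up"), (12, "down"), (13, "up"), (60, "up")]
print(count_transitions(log, 0, 30))  # 4
print(is_flapping(log, 0, 30))        # True
```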

Symptoms of Network (L3) Layer Problems
In most networks, static routes are used in combination with dynamic routing protocols. Improper configuration of static routes can lead to less than optimal routing and, in some cases, create routing loops or cause parts of the network to become unreachable.
 - General network issues - a change in the topology, such as a downed link, may affect other areas of the network, for example through the installation of new routes (static or dynamic) or the removal of other routes.
 - Connectivity issues - equipment and connectivity problems, including power problems such as outages, environmental problems such as overheating, bad ports, cabling faults, and ISP problems.
 - Neighbor issues - any problems with the routers forming neighbor relationships.
 - Topology database - if the routing protocol uses a topology table or database, check the table for anything unexpected, such as missing entries or unexpected entries.
 - Routing table - check the routing table for anything unexpected, such as missing routes or unexpected routes. Use debug commands to view routing updates and routing table maintenance.
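
Checking a routing table for missing or unexpected routes is a set comparison against what the documentation says should be there. A minimal sketch, assuming the prefixes have already been extracted as strings; the function name and the sample prefixes are hypothetical.

```python
def compare_routes(expected: set, actual: set) -> dict:
    """Report prefixes that are documented but absent, and present but undocumented."""
    return {
        "missing": sorted(expected - actual),
        "unexpected": sorted(actual - expected),
    }


# Hypothetical documented routes versus routes seen in the table.
expected = {"10.1.1.0/30", "192.168.10.0/24", "192.168.20.0/24"}
actual = {"10.1.1.0/30", "192.168.10.0/24", "172.16.0.0/16"}
print(compare_routes(expected, actual))
# {'missing': ['192.168.20.0/24'], 'unexpected': ['172.16.0.0/16']}
```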

Transport (L4) Layer Troubleshooting
Common Access List Issues - The most common issues with ACLs are caused by improper configuration. A useful tool for viewing ACL operation is the log keyword on ACL entries.
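
One typical example of improper ACL configuration is entry order: entries are evaluated top-down, the first match wins, and anything that matches no entry is dropped by the implicit deny. The sketch below models only source-address matching, in the style of a standard ACL; the function name and addresses are made up, and it is an illustration of the matching logic rather than of IOS itself.

```python
from ipaddress import ip_address, ip_network


def evaluate_acl(entries: list, source_ip: str) -> str:
    """Return the action of the first entry matching the source address.

    `entries` is an ordered list of (action, network) tuples. If nothing
    matches, the packet is denied, mirroring the implicit deny at the end
    of an access list.
    """
    address = ip_address(source_ip)
    for action, network in entries:
        if address in ip_network(network):
            return action
    return "deny"


# The broad deny placed first hides the more specific permit.
misordered = [("deny", "192.168.10.0/24"), ("permit", "192.168.10.5/32")]
corrected = [("permit", "192.168.10.5/32"), ("deny", "192.168.10.0/24")]
print(evaluate_acl(misordered, "192.168.10.5"))  # deny
print(evaluate_acl(corrected, "192.168.10.5"))   # permit
print(evaluate_acl(corrected, "10.0.0.1"))       # deny (implicit)
```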
Common NAT Issues - The biggest problem with all NAT technologies is interoperability with other network technologies, especially those that contain or derive information from host network addressing in the packet (BOOTP and DHCP, DNS and WINS, SNMP, Tunneling and encryption protocols).
One of the more common NAT configuration errors is forgetting that NAT affects both inbound and outbound traffic. Improperly configured timers can also result in unexpected network behavior and suboptimal operation of dynamic NAT.

Application Layer Overview
Application layer protocols are typically used for network management, file transfer, distributed file services, terminal emulation, and e-mail. However, new user services are often added, such as VPNs, VoIP, and so on.
It is possible to have full network connectivity while the application still cannot provide data.

Troubleshooting Application Layer Problems
1) ping default gateway
2) ping needed host
3) verify ACLs (show access-lists, clear access-list counters)
4) verify NAT (show ip nat translations, clear ip nat translation *, debug ip nat)
5) troubleshoot upper layer
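
The point of the ordering is to stop at the first check that fails, since later checks depend on the earlier ones. A small sketch of that idea, with stand-in results instead of real tests; the function name and the `results` dictionary are hypothetical.

```python
def first_failure(checks: list):
    """Run (name, check) pairs in order; return the first failing name or None."""
    for name, check in checks:
        if not check():
            return name
    return None


# Stand-in results; in practice each callable would wrap a real test.
results = {"gateway": True, "host": True, "acl": False, "nat": True}
checks = [
    ("ping default gateway", lambda: results["gateway"]),
    ("ping needed host", lambda: results["host"]),
    ("verify ACLs", lambda: results["acl"]),
    ("verify NAT", lambda: results["nat"]),
]
print(first_failure(checks))  # verify ACLs
```

If every check passes, the function returns None, which corresponds to moving on to the upper layers.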

Troubleshooting Commands for IOS Devices

# General Information
show version
show inventory
show environment
show tech-support
show debug

# CPU Utilization
show processes
show processes cpu
show processes cpu history
show platform cpu packet buffered
debug platform packet all

# Memory Utilization
show memory
show processes memory
show region
show buffers
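
The first line of show processes cpu output summarizes utilization over five seconds, one minute, and five minutes, and it is easy to extract for baselining. The sketch below assumes the usual form of that summary line; the sample line, the key names, and the function name are made up for illustration.

```python
import re

# Matches the summary line; the first five-second figure is total CPU,
# the second is the share spent at interrupt level.
CPU_LINE = re.compile(
    r"CPU utilization for five seconds: (\d+)%/(\d+)%; "
    r"one minute: (\d+)%; five minutes: (\d+)%"
)


def parse_cpu_summary(line: str) -> dict:
    """Extract the four percentages from the CPU summary line."""
    match = CPU_LINE.search(line)
    if match is None:
        raise ValueError("not a CPU utilization summary line")
    total, interrupt, one_min, five_min = map(int, match.groups())
    return {
        "five_sec_total": total,
        "five_sec_interrupt": interrupt,
        "one_min": one_min,
        "five_min": five_min,
    }


# Hypothetical captured line.
sample = "CPU utilization for five seconds: 72%/30%; one minute: 65%; five minutes: 40%"
print(parse_cpu_summary(sample))
# {'five_sec_total': 72, 'five_sec_interrupt': 30, 'one_min': 65, 'five_min': 40}
```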
