Troubleshooting

Once you have learned the fundamentals of routing and bridging and begin to implement them in your network, it is almost inevitable that you will encounter a situation where you need to employ your troubleshooting skills. In this section, we will consider how to troubleshoot both routing and bridging.

The pfSense routing table, which can be found by navigating to Diagnostics | Routes, is a good starting point for learning which routes exist, how they are configured, and how many times each route has been used. The table is divided into two sections, one for IPv4 routes and the other for IPv6 routes. Each entry in the table has several columns: Destination is the route's destination, Gateway is the gateway through which the route travels, Use is the number of times the route has been used, Mtu is the maximum transmission unit, Netif is the network interface the route uses, and Expire indicates when a temporary route (such as one created by an ICMP redirect) is due to expire. There is also a column called Flags, which lists the flags that are set for the route. The netstat man page provides a complete listing of the flags and what they mean, but some of the more common ones are:

  • U = RTF_UP: Route is usable
  • G = RTF_GATEWAY: Destination requires forwarding by an intermediary
  • H = RTF_HOST: Host entry
  • S = RTF_STATIC: Manually added entry
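
As a small illustration, the flags listed above can be decoded with a lookup table. This is only a sketch: the function name describe_flags is hypothetical, the table covers just the four flags from the list, and any other flag is deferred to the netstat man page.

```python
# Meanings copied from the list above; the netstat man page has the full set.
ROUTE_FLAGS = {
    "U": "RTF_UP: route is usable",
    "G": "RTF_GATEWAY: destination requires forwarding by an intermediary",
    "H": "RTF_HOST: host entry",
    "S": "RTF_STATIC: manually added entry",
}


def describe_flags(flags):
    """Return one description per character of a Flags column value."""
    return [ROUTE_FLAGS.get(ch, f"{ch}: see the netstat man page") for ch in flags]


# A static route through a gateway typically shows all three of these flags.
for line in describe_flags("UGS"):
    print(line)
```

A route whose Flags column reads UGS is therefore up, goes through a gateway, and was added manually, which is what we would expect for the static route to the DMZ network.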

Since routing encompasses both static routing and dynamic routing, we will first consider static routing. Let's consider a simple example, using the network we considered in the section on static routes. As you might recall, we had a DMZ network with a subnet of 192.168.2.0 which was connected to the LAN network. Assume we also have an OPT1 network directly connected to pfSense with a subnet of 192.168.3.0. Thus, the LAN and OPT1 networks are known to pfSense, while the DMZ network is only known to pfSense via a static route (the DMZ router's IP address of 192.168.1.2 is configured as a gateway). A node on the DMZ network with an IP address of 192.168.2.10 is unable to establish a session with a node on OPT1 with an IP address of 192.168.3.10.

First, we should consider obvious potential issues, such as an interface that is in shutdown mode, or a misconfigured interface. In this case, we begin at 192.168.2.10. The DMZ router's WAN IP address is 192.168.1.2, and its LAN-side IP address (not to be confused with the LAN network) is 192.168.2.1. Therefore, the default gateway on 192.168.2.10 should be 192.168.2.1. If it is not configured as such, we need to change it.
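
One way to sanity-check a default gateway setting is to confirm that the gateway lies in the host's own subnet, since a host can only use a gateway it can reach directly. The following is a minimal sketch using Python's standard ipaddress module. The /24 prefix length is an assumption (the text gives only the network addresses), and the function name is hypothetical.

```python
import ipaddress


def gateway_is_on_link(host_ip, gateway_ip, prefix_len=24):
    """True if the gateway is inside the host's subnet (prefix length assumed)."""
    network = ipaddress.ip_interface(f"{host_ip}/{prefix_len}").network
    return ipaddress.ip_address(gateway_ip) in network


# The router's LAN-side address is usable as a gateway for 192.168.2.10.
print(gateway_is_on_link("192.168.2.10", "192.168.2.1"))
# The router's WAN address is not: it sits on the 192.168.1.0 network.
print(gateway_is_on_link("192.168.2.10", "192.168.1.2"))
```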

Now we need to confirm connectivity with the router, and we can do that by pinging 192.168.2.1 from 192.168.2.10. If the ping fails, then the problem is likely a local issue, and either the router is malfunctioning or is misconfigured. If we can ping the router, however, we can begin to look elsewhere.

You could use the traceroute command (or tracert under Windows) to trace the route to 192.168.3.10, which might be the better solution in a more complex networking scenario, but since our network is fairly simple, pinging pfSense's LAN address (192.168.1.1) will tell us a good deal. If the ping is unsuccessful, there are two likely possibilities:

  • The LAN interface is down or is misconfigured
  • The static route to the DMZ is misconfigured, and therefore pfSense doesn't know where to send the ping replies, or is sending them to the wrong address

One way to rule out the first of these is to try to ping 192.168.1.2 from pfSense, which we can do from the web GUI by navigating to Diagnostics | Ping. If we can ping the router in this manner, then we have proven that the LAN interface is up and running and that there is a path to the DMZ router from pfSense. If we can ping 192.168.1.2 from pfSense but not 192.168.2.1 or 192.168.2.10, however, then there is a good possibility that the static route to the DMZ network is misconfigured.
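
To see why a missing or wrong static route sends replies to the wrong place, it may help to model how a route is chosen: the most specific matching entry wins. The sketch below is a simplified illustration, not pfSense's actual implementation. The /24 prefix lengths and the default route entry are assumptions added for the example.

```python
import ipaddress


def lookup(routes, destination):
    """Return the next hop of the longest-prefix route containing destination."""
    dest = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(net), next_hop)
        for net, next_hop in routes
        if dest in ipaddress.ip_network(net)
    ]
    if not matches:
        return None
    return max(matches, key=lambda match: match[0].prefixlen)[1]


# Directly connected networks plus an assumed (hypothetical) default route.
without_static = [
    ("192.168.1.0/24", "LAN (directly connected)"),
    ("192.168.3.0/24", "OPT1 (directly connected)"),
    ("0.0.0.0/0", "default gateway (hypothetical)"),
]
# The same table with the static route to the DMZ network added.
with_static = without_static + [("192.168.2.0/24", "192.168.1.2")]

print(lookup(with_static, "192.168.2.10"))
print(lookup(without_static, "192.168.2.10"))
```

With the static route in place, replies to 192.168.2.10 are handed to the DMZ router at 192.168.1.2. Without it, they fall through to the default route and never reach the DMZ.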

If the pfSense firewall is reachable from 192.168.2.10, however, then we may need to consider problems with the OPT1 interface. If we can ping 192.168.3.10 from pfSense, then OPT1 is up and running and there is connectivity with the node that 192.168.2.10 is trying to reach. If not, then we have isolated the problem to the OPT1 network.
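
The checks described so far form a simple decision procedure. The sketch below encodes it as a function that takes the outcome of each ping as a boolean. The function and parameter names are hypothetical, and the conclusions are only the likely causes named in the text, not a definitive diagnosis.

```python
def diagnose(host_pings_dmz_router, host_pings_pfsense_lan,
             pfsense_pings_dmz_router, pfsense_pings_opt1_node):
    """Map ping results to the most likely problem area (a rough guide only)."""
    if not host_pings_dmz_router:
        # 192.168.2.10 cannot reach 192.168.2.1.
        return "local issue: check the host's gateway setting and the DMZ router"
    if not host_pings_pfsense_lan:
        # 192.168.2.10 cannot reach 192.168.1.1.
        if not pfsense_pings_dmz_router:
            return "LAN interface is down or misconfigured"
        return "static route to the DMZ network is probably misconfigured"
    if not pfsense_pings_opt1_node:
        # pfSense cannot reach 192.168.3.10.
        return "problem is isolated to the OPT1 network"
    return "both nodes can reach pfSense: look beyond basic connectivity"


print(diagnose(True, False, True, True))
```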

But what if pfSense can be pinged from 192.168.2.10, and pfSense can ping 192.168.3.10? If so, we have proven both nodes have connectivity with pfSense, and that the static route to the DMZ network is configured correctly. Keep in mind, however, that this is inter-network traffic, and therefore firewall rules apply. We navigate to Firewall | Rules, and click on the LAN tab, since DMZ traffic is coming in through the LAN interface. It is here that we discover the problem: the Allow LAN to any rule only allows traffic to pass if its source is the LAN subnet. Since DMZ traffic doesn't match this requirement, the rule doesn't apply and traffic to OPT1 is not allowed to pass. We need to either modify this rule to allow traffic from the LAN interface to anywhere else to pass regardless of its source, or create another rule to allow traffic from the DMZ network (192.168.2.0) to pass.
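
The rule mismatch can be illustrated with a few lines of Python. This is a deliberately simplified sketch that checks only the source address, whereas a real rule also matches on protocol, destination, and port. The /24 prefix lengths are assumptions, and rule_allows is a hypothetical name.

```python
import ipaddress


def rule_allows(source_ip, allowed_sources):
    """True if source_ip falls inside any network the rule accepts as a source."""
    src = ipaddress.ip_address(source_ip)
    return any(src in ipaddress.ip_network(net) for net in allowed_sources)


lan_only = ["192.168.1.0/24"]                 # source restricted to the LAN subnet
lan_and_dmz = lan_only + ["192.168.2.0/24"]   # after adding a rule for the DMZ

print(rule_allows("192.168.2.10", lan_only))     # DMZ host does not match
print(rule_allows("192.168.2.10", lan_and_dmz))  # DMZ host now matches
```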

Your network topology may be more complex than this, but the same basic troubleshooting techniques apply. Try to employ a divide and conquer approach; keep in mind that ping and traceroute are our friends, and if you are using Cisco switches, you have other command-line tools at your disposal, such as:

  • show ip route: This command shows, at the very least, the next hop, as well as other information such as the route metric, total delay, and reliability.
  • show ip interface brief: This command can be used to display a summary of the status information for each router interface.
  • show cdp neighbors: This command can be used to display information about neighboring devices discovered via the Cisco Discovery Protocol (CDP). Adding detail to the command will cause it to display such information as network address, protocols, hold time, and the software version.

As our networks get more complex, static routing tends to prove inadequate and we opt for more elegant solutions, such as dynamic routing. But dynamic routing brings with it a whole set of issues which we have to consider. Keep in mind, however, that just as with static routes, we have to consider the obvious issues such as an incorrect gateway setting, or a port being down. Another common issue is that often there is connectivity between routers, but one or more routers (or more likely, specific ports on the router) are not configured to use the routing protocol being used by the rest of the network.

Another potential issue is that as your networks grow, your routers may require more CPU and RAM in order to hold routing tables and calculate dynamic routes. This can be avoided by choosing the right hardware for your network, and upgrading equipment as needed.

More likely, however, you may encounter a looping issue. This may be the case even if you are running STP or RSTP. A misconfigured or malfunctioning switch could still bring your network to its knees. You should make sure all devices are running the same version of STP (either legacy STP or RSTP, but not both). Once you have confirmed this, you can begin to look into other problems. If a switch recently went down, and you are having problems, perhaps the interval for calculating a new spanning tree is too long. It is also possible that there is an issue with convergence: not every device has yet recognized that the switch is down, hence the delay in incorporating this information into the spanning tree.

Another possibility is an incompatibility between versions of the routing protocol being used. For example, RIP v2 is backwards compatible with RIP v1, but not all subsequent versions of routing protocols may be backwards compatible with older versions. If you are running different versions of the same protocol, check your documentation to ensure they are compatible.

Bridging interfaces can cause a number of problems which require troubleshooting. There are several common problems with bridges:

  • The bridge may not be forwarding traffic, or may only be forwarding traffic intermittently
  • The opposite may occur: the bridge causes a storm of duplicate traffic, flooding the network
  • After adding the bridge, the network seems unstable, and it might even cause pfSense to freeze

If the bridge is not forwarding traffic, then it's possible the bridge has not been created properly, or it was created, but one or both of the member interfaces are disabled. Since firewall rules still apply to traffic between interfaces in a bridge, it is also possible that the firewall rules are blocking traffic, so you need to check the rules for every interface participating in the bridge.

If the bridge is forwarding traffic intermittently, then there are several possibilities. One is that STP is running on the bridge, and there are so many network topology changes that the spanning tree has to be constantly recalculated. Another possible reason is that there are equipment outages. The bridge forwarding delay adds at least 15 seconds to even the briefest of outages.
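
To put rough numbers on this delay: under legacy STP, a port passes through a listening state and a learning state before it forwards traffic, and each state lasts one forward delay. The sketch below assumes the standard 802.1D default timers (forward delay of 15 seconds, maximum age of 20 seconds). These are assumptions, not values read from a pfSense configuration, so check the timers actually set on your bridge.

```python
def stp_recovery_seconds(forward_delay=15, max_age=20, direct_failure=True):
    """Estimate how long legacy STP takes to resume forwarding after a failure."""
    # Listening and learning each last one forward delay.
    listening_and_learning = 2 * forward_delay
    if direct_failure:
        # The bridge sees the link go down and starts reconverging at once.
        return listening_and_learning
    # Otherwise stale topology information must first age out.
    return max_age + listening_and_learning


print(stp_recovery_seconds())
print(stp_recovery_seconds(direct_failure=False))
```

RSTP was designed to converge much faster than this, which is one reason to prefer it when every device in the topology supports it.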

If there is a storm of traffic, then the cause is almost certainly a loop. One of the ways of solving this problem is to manually determine where the loops are and break them. The preferable way, however, would be to run STP or RSTP, but running these protocols can sometimes create pitfalls, as outlined above in the discussion of dynamic routing.
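
If you have a list of the physical links between your bridges and switches, finding loops by hand amounts to finding links that connect two devices which are already connected by some other path. The following sketch does this with a union-find structure. The device names are hypothetical, and this is an illustration of the idea rather than the algorithm STP itself uses.

```python
def find_loop_links(links):
    """Return the links that close a loop, given (device, device) pairs."""
    parent = {}

    def find(node):
        # Follow parent pointers to the representative of node's group.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # shorten the path as we go
            node = parent[node]
        return node

    redundant = []
    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            # a and b are already connected, so this link creates a loop.
            redundant.append((a, b))
        else:
            parent[root_a] = root_b
    return redundant


# Hypothetical topology: three devices cabled in a triangle.
links = [("pfsense", "switch1"), ("switch1", "switch2"), ("switch2", "pfsense")]
print(find_loop_links(links))
```

Each link the function reports is a candidate to unplug or disable. Which link of a loop gets reported depends on the order in which the links are listed.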

If the network is unstable and/or pfSense freezes, it is possible this happened because you are using one of the bridged interfaces for remote administration. There is also a possibility that it happened because users are relying on the bridged interfaces for essential network services, such as file sharing. Or it could be that it is a hardware issue with one or more of the bridged interfaces.

Bridging network interfaces is usually not a good idea, especially if there is a more elegant and straightforward solution available. But by employing some common-sense troubleshooting techniques, you should be able to get your bridged interfaces to work.
