|
|
A
new traceroute
Paris traceroute
is a new version of the well-known network diagnosis and measurement tool.
It addresses problems caused by load balancers with the initial implementation
of traceroute.
Traceroute's
deficiencies
Routers can
spread their traffic across multiple paths using a per-packet, per-flow,
or per-destination policy. In per-flow load balancing, packet header
information ascribes each packet to a flow, and the router forwards all
packets belonging to a same flow to the same interface. A natural flow
identifier is the classic five-tuple of fields from the IP header
and either the TCP or UDP headers: Source IP address, Destination
IP address, Protocol, Source port, and Destination
port. We found through our experiments that load balancers actually
use combinations of these fields, as well as three other fields: the IP
Type of Service (or TOS), and the ICMP Code and ICMP
header checksum fields.
Per-flow load balancing ensures that packets from the same flow are delivered
in order. Per-packet load balancing makes no attempt to keep packets
from the same flow together, and focuses purely on maintaining an even
load. Per-destination load balancing could be seen as a coarse
form of per-flow load balancing, as it directs packets based upon the
destination IP address. But, as it disregards source information, there
is no notion of a flow per se. As seen from the measurement point of view,
per-destination load balancing is equivalent to classic routing which
is also per destination, and so we will not explore it here. Whether a
router balances load per-flow or per-packet depends on the router manufacturer,
the OS version, and how the network operator configures it.
Where there
is load balancing, there is no longer a single route from a source to
a destination. In the case of per-packet load balancing, a given packet
might take any one of a number of possible routes. With per-flow load
balancing, the notion of a single route persists for packets belonging
to a given flow, but different flows for the same source-destination pair
can follow different routes. In this regard, designing a new traceroute
able to uncover all routes from a source to a given destination would
be a significant improvement.
Indeed, the current traceroute is not adequate to that task, as it cannot
definitively identify one single route from among many. It suffers from
two systematic problems: it fails to discover true nodes and links, and
it may report false links. These problems are due to the random manner
in which its probe packets are directed by a load-balancing router, or
load balancer. Our explanation of these two problems draws on the
example in Figure 1. In this example, L is a load balancer at hop 1 from
the traceroute source. The true router topology from hops 1 through 4
is shown on the left. Routers are represented as circles and each of their
interfaces is numbered. We also show the probe packets sent with TTL 1
to 4. The packets are depicted as yellow arrows, either above the topology,
if L directs them to A, or below, if L directs them to B. At the right
side, we present the topology that would be inferred given these probe
packets.
Figure 1 Traceroute's deficiencies under load balancing
Missing
nodes and links
By default, traceroute sends three probes per hop. We can imagine a scenario
where all three probes to distance 2 are directed to A. Device B is not
discovered, and the links (L,B) and (B,D) can thus not be inferred. With
purely random load balancing, there is a significant probability, of 0.53
x 2 = 0.25, that one of the two devices, A or B, would be left undiscovered
by a traceroute through this topology. In the general case, there may
be more than two next-hop interfaces following a load balancer. For instance,
the newer Juniper routers permit up to sixteen equal-cost paths. It is
certain that classic traceroute misses many nodes, and the possibility
to infer many links.
False
links
In our example, L directs the probe dedicated to hop 2 to A and the one
to hop 3 to B, which leads to the mistaken inference of a link between
A and D. Many topology inference tools send one probe packet per hop to
reduce the probing overhead. Consequently, these tools cannot easily detect
that a single hop responds with multiple interfaces. In the default usage
of traceroute (i.e., three probes per hop), it is easier to identify instances
of hops that respond with multiple interfaces. In our example, let us
assume that by sending three probes we find both A and B at hop 2, and
C and D at hop 3. A liberal approach may infer the existence of links
(A,C), (A,D), (B,C), and (B,D), which would lead to two false links. On
the other hand, a more conservative approach would require ignoring much
data, which would lead to more missing nodes and links. There is no satisfactory
solution with the existing traceroute.
False links may be caused by either per-packet or per-flow load balancing.
In the latter case, false links arise because traceroute relies upon the
ICMP replies to its probe packets in order to identify devices along a
route. A router that sends a Time Exceeded message encapsulates the IP
header of the packet that it discarded, plus the first 8 octets of data
which, in the case of UDP, TCP, or ICMP probes, means the first 8 octets
of the relevant transport-layer header.
Traceroute considers these 28 octets in order to match replies to probes.
In the UDP probes that it sends out, traceroute sets a high destination
port number, and systematically varies this number with each probe
so that there will be a unique identifier in the returning ICMP packet.
For ICMP probes, traceroute uses the sequence number field as the probe
identifier, which in turn impacts the checksum of the ICMP header. Unfortunately,
as we have found, varying any field in the first four octets of the transport-layer
header amounts to systematically varying the flow identifier with each
probe.
How we correct it
We introduce
Paris traceroute, a new traceroute to respond to load balancing routers.
Its key innovation is to control the probe packet header fields in a manner
that allows all probes towards a destination to follow the same path in
the presence of per-flow load balancing. It also allows a user to distinguish
between the presence of per-flow load balancing and per-packet load balancing.
Unfortunately, due to the random nature of per-packet load balancing,
Paris traceroute cannot perfectly enumerate all paths in all situations.
But it can do considerably better than the classic traceroute, and it
can flag those instances where there are doubts. The problem, if one wishes
to maintain certain header fields constant, is that traceroute still needs
to be able to match response packets to their corresponding probe packets.

Figure 2 The IP, UDP and
ICMP headers. Per-flow load balancers
use the grey fields to identify a flow. Red arrows
show the fields incremented by classic traceroute.
Paris traceroute uses the green fields to identify probes
Paris traceroute
does this by varying header fields that are within the first 28 octets,
but are not used for load balancing. In the case of TCP probes, Paris
traceroute varies the sequence number. In UDP probes, it is the checksum
field. This requires manipulating the payload to yield the desired checksum,
as packets with an incorrect checksum are liable to be discarded. In ICMP
probes, it is a combination of the ICMP identifier and the sequence number.
We carefully set the value of the ICMP identifier and sequence number
to keep constant the header checksum of all probes to a destination. Figure
2 summarizes the IP, UDP, and ICMP header fields that are used by load
balancers, classic traceroute, and Paris traceroute. We include the ICMP
type field as used for load balancing, even if we cannot verify it experimentally
(routers only repond to one type of probes, which is the ICMP echo request).
Our experiments with UDP, TCP, and IPSec probes do give strong evidence
that routers blindly use the first four octets after the IP header combined
with the IP fields to do load balancing.
Download the Paris traceroute's
source code
|
|
|