Title: Impact of Configuration Errors on DNS Robustness
1Impact of Configuration Errors on DNS Robustness
- V. Pappas
- Z. Xu, S. Lu, D. Massey, A. Terzis, L. Zhang
- UCLA, Colorado State, Johns Hopkins
2Motivation
- DNS part of the Internet core infrastructure
- Applications: web, e-mail, e164, CDNs
- DNS considered as a very reliable system
- Works almost always
- Question: is DNS a robust system?
- User-perceived robustness
- System robustness
3Motivation: Short Answer
"Microsoft's websites were offline for up to 23 hours -- the most dramatic snafu to date on the Internet -- because of an equipment misconfiguration"
-- Wired News, Jan 2001
- Thousands or even millions of users affected
- All due to a single DNS configuration error
4Related Work
- Traffic and implementation error studies
- Danzig et al., SIGCOMM92: bugs
- CAIDA: traffic and bugs
- Performance studies
- Jung et al., IMW01: caching
- Cohen et al., SAINT01: proactive caching
- Liston et al., IMW02: diversity
- Server availability
- To appear: OSDI04, IMC04
5Our Work: Study DNS Robustness
- Classify DNS operational errors
- Study known errors
- Identify new types of errors
- Measure their pervasiveness
- Quantify their impact on DNS
- availability
- performance
6Outline
- DNS Overview
- Measurement Methodology
- DNS Configuration Errors
- Example Cases
- Measurement Results
- Discussion and Summary
7Background
[Figure: DNS name hierarchy with nodes com, foo, buz, bar, bar1, bar2, bar3]
8asking for www.bar.foo.com
client
9Infrastructure RRs
- NS Resource Record
- Provides the names of a zone's authoritative servers
- Stored both at the parent and at the child zone
- A Resource Record
- Associated with an NS resource record
- Stored at the parent zone (glue A record)
[Figure: parent zone com and child zone foo.com]
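The two infrastructure record types can be sketched as plain data. This is a minimal illustration, not the authors' tooling: the `RR` tuple and the helpers `ns_names` and `glue_for` are hypothetical names, and the records reuse the deck's foo.com example.

```python
from typing import NamedTuple

class RR(NamedTuple):
    owner: str   # name the record is attached to
    rtype: str   # "NS" or "A"
    rdata: str   # server name (NS) or IPv4 address (A)

# Records the parent zone (com) holds for the delegation of foo.com.
parent_rrs = [
    RR("foo.com.", "NS", "A.foo.com."),
    RR("foo.com.", "NS", "B.foo.com."),
    RR("A.foo.com.", "A", "1.1.1.1"),  # glue A record
    RR("B.foo.com.", "A", "2.2.2.2"),  # glue A record
]

# NS set as published by the child zone (foo.com) itself.
child_rrs = [
    RR("foo.com.", "NS", "A.foo.com."),
    RR("foo.com.", "NS", "B.foo.com."),
]

def ns_names(rrs, zone):
    """Lower-cased server names listed in the zone's NS records."""
    return {rr.rdata.lower() for rr in rrs
            if rr.rtype == "NS" and rr.owner.lower() == zone.lower()}

def glue_for(rrs, zone):
    """Map each NS name of the zone to the glue addresses found in rrs."""
    glue = {name: [] for name in ns_names(rrs, zone)}
    for rr in rrs:
        if rr.rtype == "A" and rr.owner.lower() in glue:
            glue[rr.owner.lower()].append(rr.rdata)
    return glue

# The parent and child copies of the NS set should agree.
consistent = ns_names(parent_rrs, "foo.com.") == ns_names(child_rrs, "foo.com.")
```

Comparing the two NS sets is the simplest consistency check between parent and child; a server name with an empty list in `glue_for` has no glue at the parent.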
10What Affects DNS Availability
- Name Servers
- Software failures
- Network failures
- Scheduled maintenance tasks
- Infrastructure Resource Records
- Availability of these records
- Configuration errors
11Classification of Measured Errors
Inconsistency
Dependency
12What is Measured?
- Frequency of configuration errors
- System parameters: TLDs, DNS level, zone size (i.e. the number of delegations)
- Impact on availability
- Number of servers lost due to these errors
- Zone's availability: probability of resolving a name
- Impact on performance
- Total time to resolve a query
- Starting from the query issuing time
- Finishing at the query final answer time
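One way to read the availability metric is as the fraction of probe rounds in which at least one of a zone's servers answered. The sketch below assumes that reading; the function name `zone_availability` and the probe data are hypothetical and are not taken from the study.

```python
def zone_availability(rounds):
    """Fraction of probe rounds in which at least one server answered.

    rounds: list of probe rounds; each round is a list of booleans,
    one per authoritative server (True means the server answered).
    """
    if not rounds:
        raise ValueError("need at least one probe round")
    resolvable = sum(1 for servers in rounds if any(servers))
    return resolvable / len(rounds)

# Hypothetical data: two servers probed four times.
probes = [
    [True, True],
    [False, True],
    [False, False],  # both servers down: the name cannot be resolved
    [True, False],
]
availability = zone_availability(probes)
```

Under this reading, servers that fail together (for example, because they share a subnet) lower the result more than servers that fail independently.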
13Measurement Methodology
- Error frequency and availability impact
- 3 sets of active measurements
- Random set of 50K zones
- 20K zones that allow zone transfers
- 500 popular zones
- Performance impact
- 2 sets of passive measurements: 1-week DNS packet traces
14Lame Delegation
foo.com. NS A.foo.com.
foo.com. NS B.foo.com.
A.foo.com. A 1.1.1.1
B.foo.com. A 2.2.2.2
1) Non-existing server -- 3 seconds perf. penalty
2) DNS error code -- 1 RTT perf. penalty
3) Useless referral -- 1 RTT perf. penalty
4) Non-authoritative answer (cached)
[Figure: delegation from com to foo, with servers A.foo.com and B.foo.com]
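The four lame delegation types listed on this slide can be sketched as a classifier over a simplified view of a server's reply. The `Reply` fields and the `classify` function are hypothetical, and treating every non-authoritative reply without an answer as a useless referral is a simplifying assumption.

```python
from dataclasses import dataclass
from enum import Enum

class Lame(Enum):
    NOT_LAME = 0
    NON_EXISTING_SERVER = 1
    ERROR_CODE = 2
    USELESS_REFERRAL = 3
    NON_AUTHORITATIVE_ANSWER = 4

# Performance penalties as stated on the slide (none is given for type 4).
PENALTY = {
    Lame.NON_EXISTING_SERVER: "3 seconds",
    Lame.ERROR_CODE: "1 RTT",
    Lame.USELESS_REFERRAL: "1 RTT",
}

@dataclass
class Reply:
    timed_out: bool = False   # no reply at all from the listed address
    rcode: str = "NOERROR"    # DNS response code, e.g. "SERVFAIL"
    aa: bool = False          # authoritative-answer flag
    answers: int = 0          # number of records in the answer section

def classify(reply):
    """Classify a reply from a server that the parent lists for the zone."""
    if reply.timed_out:
        return Lame.NON_EXISTING_SERVER
    if reply.rcode != "NOERROR":
        return Lame.ERROR_CODE
    if reply.aa:
        return Lame.NOT_LAME
    if reply.answers > 0:
        return Lame.NON_AUTHORITATIVE_ANSWER  # e.g. served from a cache
    return Lame.USELESS_REFERRAL              # assumption: no answer, no authority
```

For example, `classify(Reply(rcode="SERVFAIL"))` yields `Lame.ERROR_CODE`, which the slide associates with a penalty of one round-trip time.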
15Lame Delegation Results
16Lame Delegation Results
17Lame Delegation Results
- Error Frequency
- 15% of the zones
- 8% for the 500 most popular zones
- Independent of the zone's size, varies a lot per TLD
- Impact
- 70% of the zones with errors lose half or more of the authoritative servers
- 8% of the queries experience increased response times (up to an order of magnitude) due to lame delegation
18Diminished Server Redundancy
foo.com. NS A.foo.com.
foo.com. NS B.foo.com.
A.foo.com. A 1.1.1.1
B.foo.com. A 2.2.2.2
A) Network level - belong to the same subnet
B) Autonomous system level - belong to the same AS
C) Geographic location level - belong to the same city
[Figure: delegation from com to foo, with servers A.foo.com and B.foo.com]
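The network-level check can be sketched with Python's standard `ipaddress` module, and the same grouping idea extends to the AS and city levels if a mapping is available. The helper names `slash24` and `all_in_same_group` are hypothetical, as are the AS numbers in the example mapping.

```python
import ipaddress

def slash24(addr):
    """Return the /24 network that contains an IPv4 address."""
    return ipaddress.ip_network(f"{addr}/24", strict=False)

def all_in_same_group(addresses, group_of):
    """True if every address maps to the same group (subnet, AS, city, ...)."""
    return len({group_of(a) for a in addresses}) == 1

# The deck's example servers sit in different /24 subnets.
diverse = not all_in_same_group(["1.1.1.1", "2.2.2.2"], slash24)

# Two addresses from the same documentation prefix share a /24.
same_subnet = all_in_same_group(["192.0.2.10", "192.0.2.200"], slash24)

# AS-level check with a hypothetical address-to-AS mapping.
asn_of = {"1.1.1.1": 64500, "2.2.2.2": 64500}
same_as = all_in_same_group(["1.1.1.1", "2.2.2.2"], asn_of.get)
```

Passing the grouping function as a parameter keeps one check for all three levels; only the source of the mapping changes.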
19Diminished Server Redundancy Results
- Error Frequency
- 45% of all zones have all servers in the same /24 subnet
- 75% of all zones have servers in the same AS
- Large and popular zones: better AS and geo diversity
- Impact
- Less than 99.9% availability: all servers in the same /24 subnet
- More than 99.99% availability: 3 servers at different ASs or different cities
20Cyclic Zone Dependency (1)
foo.com. NS A.foo.com.
foo.com. NS B.foo.com.
A.foo.com. A 1.1.1.1
[Figure: delegation from com to foo, with servers A.foo.com and B.foo.com]
21Cyclic Zone Dependency (2)
foo.com. NS A.foo.com.
foo.com. NS B.bar.com.
A.foo.com. A 1.1.1.1
The combination of foo.com and bar.com zones is wrongly configured
[Figure: delegation from com to foo, with servers B.bar.com and A.foo.com]
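Both cyclic cases can be sketched as cycle detection on a zone dependency graph: a zone depends on whichever zone contains one of its server names that has no glue A record. All function names here are hypothetical, and the NS record assumed for bar.com in the test data is an illustration, since the slide does not show that zone's configuration.

```python
def norm(name):
    """Lower-case a DNS name and give it exactly one trailing dot."""
    return name.lower().rstrip(".") + "."

def enclosing_zone(name, known):
    """Longest known zone that is a suffix of the name, or None."""
    labels = norm(name).rstrip(".").split(".")
    for i in range(len(labels)):
        candidate = ".".join(labels[i:]) + "."
        if candidate in known:
            return candidate
    return None

def dependencies(zones, glue):
    """zones: zone -> NS names; glue: NS names that have a glue A record."""
    known = {norm(z) for z in zones}
    with_glue = {norm(g) for g in glue}
    deps = {z: set() for z in known}
    for zone, servers in zones.items():
        for server in servers:
            if norm(server) in with_glue:
                continue  # reachable without resolving any further name
            target = enclosing_zone(server, known)
            if target is not None:
                deps[norm(zone)].add(target)
    return deps

def find_cycle(deps):
    """Return the zones on one dependency cycle, or None if there is none."""
    state, path = {}, []

    def visit(zone):
        state[zone] = "active"
        path.append(zone)
        for nxt in sorted(deps.get(zone, ())):
            if state.get(nxt) == "active":
                return path[path.index(nxt):]
            if nxt not in state:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        path.pop()
        state[zone] = "done"
        return None

    for zone in sorted(deps):
        if zone not in state:
            cycle = visit(zone)
            if cycle:
                return cycle
    return None

# Case (1): B.foo.com has no glue, so foo.com depends on itself.
case1 = find_cycle(dependencies(
    {"foo.com.": ["A.foo.com.", "B.foo.com."]}, glue={"A.foo.com."}))

# Case (2): the bar.com server name below is an assumed example.
case2 = find_cycle(dependencies(
    {"foo.com.": ["A.foo.com.", "B.bar.com."], "bar.com.": ["C.foo.com."]},
    glue={"A.foo.com."}))
```

A reported cycle means at least one listed server can only be located through the zone it is supposed to serve, so that server is lost whenever the glued servers are unreachable.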
22Cyclic Zone Dependency Results
- Error Frequency
- 2% of the zones
- None of the 500 most popular zones
- Impact
- 90% of the zones with cyclic dependency errors lose 25% (or even more) of their servers
- 2 or 4 zones are involved in most errors
23Discussion: User-Perceived ≠ System Robustness
- User-perceived robustness
- Data replication: only one server is needed
- Data caching: temporarily masks infrastructure failures
- Popular zones: fewer configuration errors
- System robustness
- Fewer available servers due to inconsistency errors
- Fewer redundant servers due to dependency errors
24Discussion: Why so many errors?
- Superficially, errors are due to operators
- Unaware of these errors
- Lack of coordination
- parent-child zone, secondary servers hosting
- Fundamentally, errors are due to protocol design
- Lack of mechanisms to handle these errors
- proactively or reactively
- Design choices that embrace some of them
- Name servers are identified by names
- Glue NS and A records are necessary to set up the DNS tree
25Summary
- DNS operational errors are widespread
- DNS operational errors affect availability
- 50% of the servers lost
- less than 99.9% availability
- DNS operational errors affect performance
- 1 or even 2 orders of magnitude
- DNS system robustness lower than user perception
- Due to protocol design, not just due to operator errors
26Ongoing Work
- Reactive mechanisms
- DNS Troubleshooting (NetTs 04)
- Proactive mechanisms
- Enhancing DNS replication and caching
27Thank You!!!