On the Internet, the Domain Name System (DNS) is a critically important directory service that translates between raw IP addresses (such as 207.46.197.32) and domain names (such as microsoft.com). This allows people to interact with software via domain names, which are easier to remember than numerical IP addresses.
More importantly, it allows computer-friendly but user-unfriendly IP addresses to change without affecting human users. Thus people can still expect to find the same information behind the user-friendly domain names, and need not be concerned if Microsoft Corporation changes the IP address on one of its host computers, as the domain name microsoft.com is sufficient, thanks to DNS, to find their computers regardless of which IP address the Microsoft administrator has assigned to those hosts.
DNS is a hierarchical federated database, distributed widely across many host computers on the public Internet, and it also has a set of application protocols for interacting with the database. DNS names must comply with standards on the public Internet, but need not do so in a private internet where DNS is still useful. The original purpose of DNS was to translate a domain name to an IP address (forward DNS), and an IP address to a domain name (reverse DNS),[1] but in recent years there have been ongoing attempts to expand the purpose and functionality of DNS in the public Internet.
Further, because the lookup process for DNS superficially resembles the lookup process for searching on the World Wide Web, it has become easy to confuse the purposes of a DNS lookup with a search-engine lookup. These two kinds of lookups have very different goals and occur at vastly different levels within the Internet protocol stack.
This article will explain the functions and purposes of the Domain Name System, the nature of its distributed and hierarchical database, and the protocols for accessing it. It will also note how the functions of DNS differ markedly from those of search engines, since this seems to be a matter of frequent confusion on the part of learners. In lay terms, you might think of DNS as the white pages in a traditional phone book, and search engines as more like the yellow pages.
As the white-pages lookup service of the public Internet, DNS has been attacked by hostile programs attempting either to disrupt Internet traffic or to divert users to illicit host machines. The distributed and simple approach taken by DNS has proved, historically, surprisingly resilient against such attacks, but as the size and importance of the public Internet have grown, so have the security concerns related to DNS. This article, or its related sub-articles, will also address basic DNS security issues.
DNS was first introduced for use on the Internet in 1983, with the first specification written by Paul Mockapetris.[2] Mockapetris' first DNS implementation was called JEEVES. It replaced the arrangement used in the ARPANET (pre-Internet) environment, which had few enough computers that a single file, hosts.txt, was sufficient to contain all connected computer names and their numeric addresses.[3] Its designers, however, did not think of it as anything like a search engine, with the ability to seek a name corresponding to an idea (e.g. "pizza"), but as a way to work with explicit names already known by the application. Manually maintaining and sharing host files became impractical as the scale of the Internet grew, and DNS was designed and implemented as the solution to the problem of scalable host name resolution.
Note well: all DNS was designed to do was replace the hosts.txt file that had the name-to-address mappings for every computer in the ARPANET. That's all. DNS was not designed to be a search engine. Search engines hadn't been invented, since, after all, the Web had not been invented.
Protocol designers | Name & address authorities | System administrators |
---|---|---|
Standard formats for resource data. | Addresses for the root servers | The definition of zone boundaries |
Standard methods for querying the database | Unique assignments of domain names | Master files of data (i.e., sets of Resource Records (RRs)) |
Standard methods for name servers to refresh local data from foreign name servers. | Operation, perhaps with delegation of the root servers and top-level domain servers | Statements of the refresh policies desired |
Over the years, DNS has taken on more technical and administrative roles. These include providing additional information for the names and addresses, especially for security; the DNS infrastructure itself needed to be enhanced to be secure and trusted.[4] DNS originally was manually configured, but there have been a variety of extensions to allow dynamic operation, such as the temporary binding of an address to a name.
The domain name space, as well as the address spaces both for Internet Protocol version 4 (IPv4) and Internet Protocol version 6 (IPv6), are under the authority of the Internet Corporation for Assigned Names and Numbers (ICANN), with much delegation of administration. The original system only handled IPv4, so one of the first steps for IPv6 support was defining how to represent IPv6 addresses in DNS.[5] Berkeley Internet Name Domain (BIND), first deployed in BSD 4.3 UNIX and written by Kevin Dunlap, was the first widespread DNS implementation. BIND is now open-source code maintained by the Internet Systems Consortium (formerly the Internet Software Consortium).[6]
In the years DNS has served, Internet technology and operational issues have changed. When the new IPv6 address format came into use, the need to change name-to-address mapping tools to handle that format was understandable.
Less obvious, but still necessary, is the new requirement to track dynamically assigned addresses when there is no central address server. Domain Name System dynamic update can do such tracking, but dynamic update at this level is a security vulnerability. Address assignment spoofing is by no means the only threat to DNS, and an entire set of Domain Name System security (DNSSEC) extensions is being deployed.[4]
The U.S. government now requires DNSSEC for all Federal information systems, effective December 2009.[7]
The DNS namespace is hierarchical. Individual domain and host names within it have a textual representation, from right to left, which mirrors the tree that makes up the schema of the DNS:
en.citizendium.com
appears to have three components, but actually has four. The naming hierarchy is a tree, with increasingly specific levels reading right to left.
From what can be seen in the textual example,
citizendium.com
technical administrator. What cannot be seen is the hierarchically "zeroth" highest part, the root. If that usually suppressed part were displayed, the name would read:
en.citizendium.com.
The rightmost dot identifies the root of the DNS tree. In actual practice, there are multiple root servers, for which addresses are in an explicit file, a representative of which is found at http://www.internic.net/zones/named.root
It is defined as:
This file holds the information on root name servers needed to initialize cache of Internet domain name servers (e.g. reference this file in the "cache . <file>" configuration file of BIND domain name servers).
A fully qualified domain name can be traced from the hierarchically lowest host name to the root. For example, en.citizendium.org goes from the host en all the way up to the top-level domain .org, which is connected to the root.
A computer within the second-level domain citizendium.org could refer to the subdomain en, which would be a relative domain name; most DNS applications would append the current domain to the right of the host name. k12.en.citizendium.org is a hypothetical subdomain of en.citizendium.org; an arbitrary host could be larry.en.citizendium.org and the DNS software would understand whether it is dealing with a host or a domain.
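The right-to-left hierarchy and the handling of relative names described above can be sketched in a few lines of Python. This is only an illustration: the function names `ancestors` and `qualify` are hypothetical, and the sketch assumes plain ASCII names with no escaped dots.

```python
def ancestors(name: str) -> list:
    """Return the chain of domains from a name up to the root "."."""
    # Drop the optional trailing dot that denotes the root, then split into labels.
    labels = [label for label in name.rstrip(".").split(".") if label]
    # Each step removes the leftmost (most specific) label.
    chain = [".".join(labels[i:]) + "." for i in range(len(labels))]
    chain.append(".")  # the root itself
    return chain


def qualify(name: str, current_domain: str) -> str:
    """Make a relative name fully qualified by appending the current domain."""
    if name.endswith("."):
        return name  # a trailing dot marks the name as already fully qualified
    return f"{name}.{current_domain.rstrip('.')}."


print(ancestors("en.citizendium.org"))
print(qualify("en", "citizendium.org"))
```

Treating a trailing dot as the marker of a fully qualified name mirrors the convention used in zone files; a real resolver may apply a search list of several domains rather than a single current domain.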
The administrative process of DNS name assignment involves both DNS registries and DNS registrars.
DNS registries' fundamental role is to operate the data base for their top-level domain (TLD), and authorize registrars as "retail" agents to provide customer service. The bulk of TLDs are national, and use International Organization for Standardization (ISO) two-letter country codes (e.g., Canada = .ca, China = .cn, Germany = .de). In the majority of cases these country codes must be from the ISO 3166-1 list. However, there have been a few exceptions, usually for historical reasons. For example the ISO 3166-1 code for the United Kingdom is gb, but for historical reasons the assigned TLD is .uk. While the .gb TLD does exist, it has only one subdomain and does not accept new registrations. A few country codes, such as Tuvalu's .tv, form attractive branding, and the country has few internal registrants but considerable income from outside registrants.
New TLDs are created by the Internet Corporation for Assigned Names and Numbers (ICANN), which then delegates the registry function to an organization that contracts with ICANN. Some new or proposed TLDs have been quite controversial, such as the .xxx domain for pornography. Others, which offer some competitive commercial service, may take much time and effort to create, since multiple organizations may want to be the registry.
Remember that the public Internet, while international from the start, began as a U.S. project. A small set of non-national TLDs were created for early convenience. Country codes were not, at first, used, and the majority of registrations still go into the best-known .com. While the ".cc" country codes had gradually been used, they were formalized in the 1998 U.S. Department of Commerce White Paper about moving the U.S. government out of Internet operations.
Some countries have a rational system where they use the "traditional" major suffix, or a variant of it, as a second-level domain, such as .co.uk or .ac.uk. This, however, has not always been done in an intuitive or consistent manner. A relatively naive user might expect .com.uk to be correct in line with the international .com, but .co.uk is in fact correct. Based on this the user may then think that .or.uk would be the equivalent of .org, but in this case .org.uk is correct.[8] Similarly one would expect that either .edu.uk or .ed.uk would correspond to .edu. But neither of these is correct; instead, .ac.uk is used for higher education colleges and universities, and .sch.uk for primary and secondary schools.
Top-level domain | Registry | Comments |
---|---|---|
.aero | Société Internationale de Télécommunications Aéronautiques SC, (SITA) | Sponsored by air transport industry |
.com | Verisign | Unsponsored |
.edu | Educause | Under U.S. government agreement, ending in 2011 |
.net | Verisign | Unsponsored |
.mil | Defense Information Systems Agency | U.S. government agency |
.org | Public Interest Registry (PIR) | Unsponsored; not-for-profit |
.biz | NeuLevel, Inc. | Unsponsored |
There is a continuing business, political, and technical argument about the desirability of more TLDs, especially from those that want TLDs that are suggestive of the business purpose of a registrant. From a technical standpoint, while a proliferation of TLDs would not, as once suspected, seriously impact DNS performance, it would be likely to increase customer support cost due to the likelihood of making mistakes and getting the wrong domain.
There are also legal issues of intellectual property involved in domain disputes.
Registrars are the "retail" side of DNS operation. In .com and many other TLDs, they are profit-making entities. They deal with organizations that wish to acquire particular domain names, verifying the name is available, and then handling the administrative interaction with the domain registry.
Most registrars are reasonable and ethical. They may be subdivisions of companies that can sell additional services, such as web server hosting, to domain registrants. Frequently, they have user support functions that will help new DNS administrators set up their zone files, or they may actually operate name servers on behalf of registrants. If there is a dispute over the rights to a domain name, one's registrar can be a valuable ally.
There are registrars that compete for the business of large hosting centers and other organizations that need many domain names, typically discounting the registration fee to multiple-domain customers. It is to the advantage of a registrar to keep its existing customers, as most domains will be renewed, producing a continuing income stream. Registrars want to avoid "churn", a name for customers changing to other registrars.
Some registrars, unfortunately, act against the original Internet tradition of it being a shared resource, and DNS being a service. Domain registrations expire annually, although one can pay the registrar to renew them automatically. It is not uncommon for certain registrars to look for domain names that expire in the near term and were registered through a different registrar, and to send the domain administrators what appear to be legitimate renewal notices. If the notice is completed and returned with payment, such a registrar will indeed renew the domain name, but will also transfer it away from the existing registrar.
When the ARPANET, and then the Internet, were new, DNS was seen as a simple mechanism to avoid memorizing or typing host addresses. As the Internet became more commercial, domain names acquired business value, since new users were apt to look for "company" at company.com. Indeed, as unpleasant to the DNS-knowledgeable ear as it may be, there are a substantial number of enterprises that have "dot-com", or sometimes other TLDs, as part of their corporate name.
Another argument, the details of which involve intellectual property issues beyond the scope of this article, is the legal theory that a trademark must be "defended" or risk going into the public domain. If a second-level domain is identical to a trademarked company name, does the company have exclusive rights to it? Intellectual property attorneys have often argued that a well-known company is not "defending" its trademark if it allows a domain to be created with its name, so whenever some TLD ".new" is created, trademark holders tend to rush to register "well-known-company.new". Speculators, meanwhile, rush to register the name before the trademark holder can, and, if successful, sell the rights to the domain at a very high price.
One especially hotly argued issue is whether sexually-oriented businesses should have a .xxx TLD; some of those arguing for it also want to restrict access to sexually-oriented content, which would be identified by the TLD. Obviously, there would be no way to enforce keeping sexually-oriented content in .xxx, but it could reasonably be assumed that, if a domain were in .xxx, it was sexually-oriented. After six years of debate the .xxx TLD was approved in June 2010, and is expected to be launched in early 2011.[9]
One of the most confusing things to newcomers to DNS is the difference between a domain and a zone. One way to look at it is that a domain declares a range of potential names, while the zone defines the names actually in use. Formally, a [sub]domain is a namespace that need not have names in it. The basic source of name information that goes into a particular space is a zone file, created manually or with software assistance.
Let us consider citizendium.org, which could have every valid character string as a subdomain, from the shortened aaaa.citizendium.org to zzzz.citizendium.org. These are domains, comparable to the Citizendium name spaces such as Main, Talk, User, and CZ, in the sense that, ignoring lengths, the Main or Talk name spaces can have articles from Aaaa to Zzzz. Not all those article names, however, are meaningful.
If, however, there are only actual hosts named en.citizendium.org, test.citizendium.org, reid.citizendium.org, and locke.citizendium.org, Citizendium's zone file would have only four host entries. To continue the analogy with CZ name spaces, the zone file would be the set of articles, in each name space, which actually exist. Main: Zzzz is not an article; Main: Zero is an article.
Just as the DNS namespace is a tree of domains, the actual information in that namespace can be regarded as a tree of zone files.
Name servers are computers that contain information about domains, all the way up to the root. Be sure to understand the difference between the abstraction of a domain or subdomain namespace, and the zone file that describes the contents of that namespace and is actually loaded by a name server. The primary name server is authoritative for a domain, and contains the master copy of the zone file for that domain.
Name servers can contain more than one zone file; indeed, this is the usual case when there are domains with subdomains.
Depending on the implementation, a name server may cache information in addition to what it learned from the zone file. For example, a local cache file in a name server could contain data about name-address relationships outside the domain, but which have been needed by a client within that domain. The name server may also contain limited-lifetime dynamic name updates, which might or might not be accessible from outside the domain.
RFC 1034, the basic DNS conceptual specification, describes two ways, one optional and one required, for looking up names.[10] The same logic is relevant inside a domain that has caching nameservers.
At each of the levels of the DNS hierarchy — top-level, second level, etc. — is an abstract namespace. No other second-level domain could have notcz.citizendium.org, but the administrator of citizendium.org is not obligated to have any number of subordinate hosts or domains. There is a subtle distinction between the abstraction of a name space, and a zone file that actually defines the hosts and subdomains in the zone. Name spaces define possible records; zone files contain actual records within that space, plus a few special cases such as "glue" records to name servers outside that space. wikipedia.citizendium.org is part of the citizendium.org namespace, but, since there is no such host, it is not in any zone file.
Zone files are made up of resource records (RR). All RRs have several common properties: an owner name, a type, a class, a time to live (TTL), and type-specific resource data (RDATA).
While there are many graphic tools for creating RRs, the basic textual syntax, in which IN denotes the Internet class, is:
[owner] IN [type] [rdata]
For example, the RR defining the address associated with the name XX.LCS.MIT.EDU is:[11]
XX.LCS.MIT.EDU. IN A 10.0.0.44
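To make the field layout concrete, the following minimal sketch splits a one-line record into owner, class, type, and RDATA. It assumes the simplest form of a zone-file line: all four fields present, no TTL, no comments, and no multi-line parentheses. The names `ResourceRecord` and `parse_rr` are hypothetical, not part of any DNS library.

```python
from typing import NamedTuple


class ResourceRecord(NamedTuple):
    owner: str
    rr_class: str
    rr_type: str
    rdata: str


def parse_rr(line: str) -> ResourceRecord:
    """Parse a one-line record of the form: owner class type rdata."""
    # Split into at most four fields so that RDATA may itself contain spaces.
    owner, rr_class, rr_type, rdata = line.split(None, 3)
    return ResourceRecord(owner, rr_class.upper(), rr_type.upper(), rdata)


record = parse_rr("XX.LCS.MIT.EDU. IN A 10.0.0.44")
print(record.owner, record.rr_class, record.rr_type, record.rdata)
```

Real zone-file parsers also handle omitted owners, explicit TTLs, and directives, none of which this sketch attempts.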
Type | RR Name | Function | Typical RDATA |
---|---|---|---|
SOA | Start Of Authority | Defines the start of a zone or a subzone; subordinate records inherit parameters | Multiple fields |
A | Address IPv4 | Specifies the IPv4 address for a host | IPv4 Address |
AAAA | Address IPv6 | Specifies the IPv6 address for a host | IPv6 Address |
PTR | "Pointer" | Reverse mapping of address to name | Name |
CNAME | Canonical name | Marks the owner name as an alias and gives the canonical name it stands for | Name |
NS | Name server | Names an authoritative name server for a domain; used in particular to delegate a subdomain to its own name servers | Name |
MX | Mail exchanger | Identifies a host that accepts mail for the owner domain | A 16-bit preference value (lower is better) followed by a host name willing to act as a mail exchange for the owner domain. |
An additional complexity of RRs is that they may contain wildcards. The simplest example is a " * " character that will match any string in a name expression. In specific situations, this is an extremely useful function, but it can complicate troubleshooting.[12]
In 2003, Verisign, which operates the .com registry, inserted a wildcard into the master DNS files, so that an undefined name, rather than returning an error message, would be redirected to one of the registry's commercial search engines.[13] If the World Wide Web were the only function on the Internet, this might, although revenue-generating, have been useful. Unfortunately, there are many other functions on the Internet. In particular, messaging application protocols such as the Simple Mail Transfer Protocol (SMTP) would use the "host not found" information to conclude that mail to that host was undeliverable.
A quite useful application of a wildcard, however, would be in a split DNS application, with different name resolution policies on different sides of a firewall. On the public Internet side of the firewall, the DNS server for example.com would have explicit records for the organization's public web server, mail server, and other public servers. Any reference to "inside" addresses, however, would be handled by the record:
*.example.com IN A [outside address of the firewall]
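The effect of such a record can be sketched as a lookup that prefers explicit names and falls back to the wildcard. This is a deliberately simplified model: real name servers apply more detailed wildcard-matching rules, and the function name `lookup`, the host names, and the addresses (taken from ranges reserved for documentation) are assumptions made for illustration.

```python
from typing import Dict, Optional


def lookup(name: str, zone: Dict[str, str]) -> Optional[str]:
    """Return the address for a name, preferring explicit records over a wildcard."""
    name = name.rstrip(".").lower()
    if name in zone:
        return zone[name]  # an explicit record always wins
    # Replace leading labels with "*" one at a time, looking for a covering wildcard.
    labels = name.split(".")
    for i in range(1, len(labels)):
        candidate = "*." + ".".join(labels[i:])
        if candidate in zone:
            return zone[candidate]
    return None  # corresponds to a "name not found" answer


# Hypothetical outside-the-firewall view of example.com.
public_zone = {
    "www.example.com": "192.0.2.10",
    "mail.example.com": "192.0.2.25",
    "*.example.com": "192.0.2.1",  # stands in for the outside address of the firewall
}

print(lookup("www.example.com", public_zone))
print(lookup("payroll.example.com", public_zone))
print(lookup("www.example.org", public_zone))
```

Because every undefined name under example.com now resolves, a mistyped host name no longer produces an error, which is the troubleshooting cost mentioned earlier.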
Domain Name System security, however, does not have a complete solution to working with wildcarded RRs.
To understand basic DNS, assume that it is being used in a single organization, which has one technical and administrative authority in control. In other words, the domain and its subdomains are homogeneous. While there may be minor exceptions due to the existence of temporarily cached data in individual clients and servers, and not all clients and servers may be able to view all parts of the highest-level domain, a single organization's DNS is essentially a distributed database, where there are multiple copies of a single "golden copy" of information.
Once one starts interconnecting domains under different authority, as in the Internet, both administrative and technical aspects change. First, it is understood that while the total collection of all domains conceptually has access to all public name information, no single domain will have a copy of all information. Rather than being a distributed data base, it has become a federated data base, where there is a common indexing and retrieval model, but requests may need to go to multiple servers, in multiple domains and subdomains, before the request is satisfied.
Second, even between well-recognized business partner organizations, there are trust issues. Third, there are miscreants actively attacking the DNS, for reasons from ideology to technical status to pure criminal revenue.
The administrator of a homogeneous domain (and its subdomains) starts by building a zone file that defines the names and addresses of hosts in that zone, optional additional information to be added to the responses, and a reference to a higher-level nameserver that helps connect the domain of the zone to other domains. For example, if one was in a.com, one would have to go to the nameserver of .com to find the address of the b.com nameserver.
The zone/domain name starts the Start of Authority (SOA) record; it must end with a trailing period. Assume that it is sub.example.com.
In the resource data, the first field is the primary name server that is in this domain, as opposed to the name server in the NS record, which is above and outside the current domain. In this case, it might be ns1.sub.example.com.
Next comes the mail address of the person or role responsible for the data in this domain, written not in the conventional user@domain form, but in the syntax of a DNS name in a zone file. To create a mail address, replace the leftmost period with an "@" symbol and remove the trailing period. For example, "administrator.sub.example.com." is changed to "administrator@sub.example.com".
Following the administrator are several parameters that may have defaults, but should be known. The first is the serial number of this version of the zone file, which will increase whenever this file is updated.
The next four are timers for the domain, specified in seconds: refresh, retry, expire, and minimum.
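The SOA fields walked through above can be gathered into a small structure. In this sketch the field names `mname`, `rname`, `refresh`, `retry`, `expire`, and `minimum` follow the names used in RFC 1035; the serial number and timer values are invented for illustration, and the mailbox conversion assumes the local part of the address contains no period.

```python
from dataclasses import dataclass


def rname_to_email(rname: str) -> str:
    """Convert a zone-file mailbox name into user@domain form."""
    name = rname.rstrip(".")                # remove the trailing period
    local, _, domain = name.partition(".")  # split at the leftmost period
    if not domain:
        raise ValueError(f"no domain part in {rname!r}")
    return f"{local}@{domain}"


@dataclass
class SOA:
    zone: str     # owner name of the record
    mname: str    # primary name server for the zone
    rname: str    # responsible mailbox, in DNS-name syntax
    serial: int   # version of the zone file; increases on every update
    refresh: int  # seconds between a secondary's checks for a new serial
    retry: int    # seconds to wait before retrying a failed refresh
    expire: int   # seconds after which unrefreshed secondary data is discarded
    minimum: int  # the MINIMUM timer field, in seconds


def parse_soa(line: str) -> SOA:
    """Parse a single-line SOA record with all fields present."""
    fields = line.split()
    if len(fields) != 10 or fields[1].upper() != "IN" or fields[2].upper() != "SOA":
        raise ValueError("expected: zone IN SOA mname rname serial refresh retry expire minimum")
    numbers = [int(f) for f in fields[5:]]
    return SOA(fields[0], fields[3], fields[4], *numbers)


soa = parse_soa(
    "sub.example.com. IN SOA ns1.sub.example.com. administrator.sub.example.com. "
    "2010060101 3600 900 604800 86400"
)
print(soa.mname, rname_to_email(soa.rname), soa.serial)
```

Zone files normally spread the SOA record over several lines inside parentheses; the single-line form is used here only to keep the parser short.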
While no two RRs should have the same label and type and data all equal, it is perfectly possible to have RRs with the same label and type, but different RDATA. For example, a physically multihomed server could have four network interface cards (NIC), each on a different subnet. The set of addresses for this host name (i.e., label) would reasonably form a set of four A records with different address data. Such a set of records is called a Resource Record Set (RRSet). [14]
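As a rough illustration of the RRSet idea, the sketch below groups records by label and type, so that a multihomed host ends up with one set holding its four addresses. The function name `group_rrsets`, the host name, and the addresses are hypothetical.

```python
from collections import defaultdict


def group_rrsets(records):
    """Group (label, type, rdata) tuples into RRSets keyed by (label, type)."""
    rrsets = defaultdict(set)
    for label, rr_type, rdata in records:
        # Using a set means a record repeated with identical data is stored once.
        rrsets[(label.lower(), rr_type.upper())].add(rdata)
    return dict(rrsets)


records = [
    ("server.example.com.", "A", "192.0.2.1"),
    ("server.example.com.", "A", "192.0.2.129"),
    ("server.example.com.", "A", "198.51.100.1"),
    ("server.example.com.", "A", "203.0.113.1"),
    ("server.example.com.", "A", "192.0.2.1"),  # exact duplicate, collapses
    ("server.example.com.", "MX", "10 mail.example.com."),
]

rrsets = group_rrsets(records)
print(sorted(rrsets[("server.example.com.", "A")]))
```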
The root name server zone file is expected to be retrieved, by anonymous FTP, from various well-known sites approved by ICANN. In practice, most DNS implementations ship with a recent copy. Root servers remain very busy.[3] In fact, while the root server zone file mentioned above will give the names and addresses of root servers in the general form a.root-servers.net, the address of a particular server is of the anycast type;[15] there are multiple physical computers with that address, for fault tolerance and load sharing.
For each domain, there must be at least one, and preferably more than one name server that holds the zone files. Primary domain servers have the authoritative zone files, and secondary domain servers keep an exact copy of the primary's zone file. Both types are assumed to have a disk or other storage from which they can restore the domain information.
A secondary server will use a zone transfer to obtain the primary zone file for its domain. There are various operational reasons why a physical server might act as primary and secondary for multiple zones; the important point here is that a zone transfer, as opposed to ordinary DNS retrieval, alters the contents of the definitions and must be treated as a sensitive operation.
The nameserver can also accept dynamic updates, which, strictly speaking, do not have to be secured. Dynamic update, however, especially in an IPv6 environment, is so open an invitation to miscreants that it should never be considered without being secured. DNS security is the normal way this might be done, but there are other alternatives, such as an encrypted link between the update source and the nameserver.
There are also caching-only servers that contain only the names and addresses that have been recently looked up, and are still valid with respect to the TTL parameter in the relevant records.
The program, on a host, which is the client of DNS servers is most often called a resolver. Depending on the local network architectural implementation, a resolver may go to a caching-only server, a secondary server, or the primary server for its information. It may retain a cache of recently retrieved DNS information, clearing items from cache as their TTLs expire.
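The caching behaviour of a resolver or caching-only server can be sketched as a table whose entries disappear once their TTL has elapsed. The class name `TTLCache` and its methods are hypothetical, and the injectable clock is an assumption made so that expiry can be demonstrated without waiting.

```python
import time


class TTLCache:
    """A minimal name-to-address cache that honours per-record TTLs."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # name -> (address, expiry time)

    def put(self, name, address, ttl):
        """Store an answer that stays valid for ttl seconds."""
        self._entries[name.lower()] = (address, self._clock() + ttl)

    def get(self, name):
        """Return the cached address, or None if absent or expired."""
        key = name.lower()
        entry = self._entries.get(key)
        if entry is None:
            return None
        address, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # clear the item once its TTL has run out
            return None
        return address


# Demonstration with a fake clock so the example runs instantly.
now = [0.0]
cache = TTLCache(clock=lambda: now[0])
cache.put("en.citizendium.org", "192.0.2.7", ttl=60)
print(cache.get("en.citizendium.org"))  # still within the TTL
now[0] = 61.0
print(cache.get("en.citizendium.org"))  # expired, so a fresh query would be needed
```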
While there will be different federated databases, DNS is certainly not limited to the public Internet. It is quite common for organizations to have split DNS "inside the firewall" and "outside the firewall". An inside user will query local DNS for the address of an internal machine and get the address of the actual host, but, if it asks for the address of citizendium.com, the address returned by DNS may well be that of the "inside" interface of a firewall, or other security middlebox.[16] Depending on the firewall implementation, it may deny access, or create a proxy connection to the outside host. To establish that connection, the middlebox will query an "outside" DNS, which contains the addresses of the organization's public hosts, but primarily contains the addresses of external hosts. In some cases, that outside DNS enjoys some trust with an external organization, and may do secured zone transfers. More often, however, the outside DNS is primarily a cache of name-address information that it obtained by queries to the nameservers of other domains.
The most basic DNS protocols are the lookup service, which runs over port 53 of the connectionless User Datagram Protocol (UDP), and the zone transfer service, which runs over port 53 of the connection-oriented Transmission Control Protocol (TCP).[17] Lookup is a read-only function, while a zone transfer hands over an entire zone and rewrites the data held by the receiving server, so it should be implemented as a privileged, authenticated operation. Otherwise any client on a DNS server's network could request a zone transfer and receive a complete copy of a zone file, which is a security risk.
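To give a sense of what a lookup carries on the wire, the following sketch builds the bytes of a simple query message in the format defined by RFC 1035: a 12-byte header followed by one question. The function names and the query ID are arbitrary choices for illustration, and the message is only constructed here, not sent.

```python
import struct


def encode_name(name: str) -> bytes:
    """Encode a domain name as length-prefixed labels ending with a zero byte."""
    out = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 0 < len(raw) <= 63:
            raise ValueError(f"invalid label length in {name!r}")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"  # the zero-length label stands for the root


def build_query(name: str, query_id: int = 0x1234, qtype: int = 1) -> bytes:
    """Build a standard query; qtype 1 asks for an A record."""
    # Header fields: ID, flags (0x0100 = recursion desired), one question,
    # and zero answer, authority, and additional records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    question = encode_name(name) + struct.pack("!HH", qtype, 1)  # class 1 = IN
    return header + question


packet = build_query("example.com")
print(len(packet), packet.hex())
```

A client would hand such a message to a name server in a single UDP datagram addressed to port 53 and then parse the reply; that network step is omitted so the example stays self-contained.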
There are also protocols for dynamic update, so that network clients can automatically update their DNS servers to reflect correct hostnames (e.g. if they dynamically receive a different IP address via DHCP). This concept is also known as Dynamic DNS. [18]
Extensions to the basic protocols include Domain Name System dynamic update, use of the DNS as a data base in Public Key Infrastructure (PKI) for general security, Domain Name System security (DNSSEC), and name-based routing and load distribution.