The Domain Name System (DNS) is a hierarchical naming system for computers, services, or any resource connected to the Internet. Because it helps Internet users locate resources such as web servers, mail hosts, and other online services, DNS is one of the core components of the Internet. Unfortunately, besides their obvious benign uses, domain names are also popular for malicious purposes. For example, domain names increasingly play a role in the management of botnet command and control servers, download sites where malicious code is hosted, and phishing pages that aim to steal sensitive information from unsuspecting victims.
In a typical Internet attack scenario, whenever an attacker manages to compromise and infect the computer of an end user, the machine is silently transformed into a bot that listens and reacts to remote commands issued by the so-called botmaster. Such collections of compromised, remotely controlled hosts are common on the Internet, and are often used to launch denial-of-service (DoS) attacks, steal sensitive user information, and send large numbers of spam messages with the aim of making a financial profit. In another typical Internet attack scenario, attackers set up a phishing website and lure unsuspecting users into entering sensitive information such as online banking credentials and credit card numbers. The phishing website often has the look and feel of the targeted legitimate website (e.g., an online banking service) and a domain name that sounds similar.
One of the technical problems that attackers face when designing their malicious infrastructures is how to implement a reliable and flexible server infrastructure and command and control mechanism. Ironically, the attackers face the same engineering challenges as global enterprises that need to maintain a large, distributed, and reliable service infrastructure for their customers. For example, in the case of botnets, which are arguably one of the most serious threats on the Internet today, the attackers need to efficiently manage remote hosts that may easily consist of thousands of compromised end-user machines. If the IP address of the command and control server is hard-coded into the bot binary, the botnet has a single point of failure. That is, from the point of view of the attacker, whenever this address is identified and taken down, the botnet is lost.
Analogously, in other common Internet attacks that target a large number of users, sophisticated hosting infrastructures are typically required that allow the attackers to conduct activities such as collecting the stolen information, distributing their malware, launching social engineering attempts, and hosting other malicious services such as phishing pages. In order to better deal with the complexity of a large, distributed infrastructure, attackers have been increasingly making use of domain names. By using DNS, they acquire the flexibility to change the IP address of the malicious servers that they manage. Furthermore, they can hide their critical servers behind proxy services (e.g., using Fast-Flux) so that their malicious server is more difficult to identify and take down.
The goal of passive DNS analysis is to detect malicious domains that are used as part of malicious operations on the Internet. To this end, the researchers perform a passive analysis of the DNS traffic that they have at their disposal. Since the traffic they monitor is generated by real users, they assume that some of these users are infected with malicious content, and that some malware components will be running on their systems. These components are likely to contact domains that have been found to be malicious by various sources, such as public malware domain lists and spam blacklists. Hence, by studying the DNS behavior of known malicious and benign domains, the goal is to identify distinguishing, generic features that indicate the maliciousness of a given domain.
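The labeling step described above can be sketched as follows. This is only an illustration, assuming the malicious and benign domain lists have already been loaded into sets; the function names `label_domain` and `build_training_labels` and the example domains are hypothetical and are not taken from the researchers' system.

```python
from typing import Dict, Iterable, Optional, Set


def normalize(domain: str) -> str:
    """Lower-case the name and drop a trailing root dot."""
    return domain.lower().rstrip(".")


def label_domain(domain: str, malicious: Set[str], benign: Set[str]) -> Optional[str]:
    """Return 'malicious', 'benign', or None if the domain is on neither list."""
    name = normalize(domain)
    if name in malicious:
        return "malicious"
    if name in benign:
        return "benign"
    return None


def build_training_labels(
    observed: Iterable[str], malicious: Set[str], benign: Set[str]
) -> Dict[str, str]:
    """Keep only the observed domains whose label is known from a list."""
    labels: Dict[str, str] = {}
    for domain in observed:
        label = label_domain(domain, malicious, benign)
        if label is not None:
            labels[normalize(domain)] = label
    return labels


# Hypothetical lists; real ones would come from malware domain lists,
# spam blacklists, and a set of well-known benign domains.
malicious_list = {"bad.example"}
benign_list = {"good.example"}
observed_domains = ["Bad.Example.", "good.example", "unknown.example"]
training_labels = build_training_labels(observed_domains, malicious_list, benign_list)
```

Domains that appear on neither list are left unlabeled here; they are the ones a trained classifier would later have to judge.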
Identifying DNS features that distinguish benign from malicious domains, and that allow a classifier to work well in practice, requires large amounts of training data. As the offline dataset, the researchers recorded the recursive DNS (RDNS) traffic from the Security Information Exchange (SIE). They performed offline analysis on this data and used it to determine DNS features that can be used to distinguish malicious domains from benign ones. The part of the RDNS traffic they used as initial input to their system consisted of the DNS answers returned from the authoritative DNS servers to the RDNS servers. An RDNS answer consists of the name of the domain queried, the time the query is issued, the duration for which the answer may be cached (i.e., the TTL), and the list of IP addresses that are associated with the queried domain. Note that the RDNS servers do not share the source of the DNS query (i.e., the IP address of the user that issues the query) due to privacy concerns.
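To make the shape of this input concrete, the following minimal sketch models an RDNS answer with the four fields listed above and groups answers per domain. The class name `RdnsAnswer`, the field names, and the summary statistics are assumptions chosen for illustration; they are not claimed to be the researchers' data format or their actual feature set.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class RdnsAnswer:
    """One answer as described in the text (field names are hypothetical)."""
    domain: str            # name of the domain queried
    timestamp: float       # time the query was issued, seconds since the epoch
    ttl: int               # how long the answer may be cached, in seconds
    ips: Tuple[str, ...]   # IP addresses associated with the domain


def summarize(answers: Iterable[RdnsAnswer]) -> Dict[str, Dict[str, float]]:
    """Group answers by domain and compute a few illustrative statistics."""
    grouped: Dict[str, List[RdnsAnswer]] = defaultdict(list)
    for answer in answers:
        grouped[answer.domain].append(answer)

    summary: Dict[str, Dict[str, float]] = {}
    for domain, items in grouped.items():
        ttls = [a.ttl for a in items]
        distinct_ips = {ip for a in items for ip in a.ips}
        summary[domain] = {
            "answers": len(items),
            "distinct_ips": len(distinct_ips),
            "mean_ttl": mean(ttls),
            "distinct_ttls": len(set(ttls)),
        }
    return summary


# Example with documentation-range addresses (RFC 5737).
sample = [
    RdnsAnswer("example.com", 0.0, 300, ("192.0.2.1",)),
    RdnsAnswer("example.com", 60.0, 100, ("192.0.2.1", "192.0.2.2")),
    RdnsAnswer("example.org", 90.0, 3600, ("198.51.100.7",)),
]
stats = summarize(sample)
```

Note that the record has no field for the client that issued the query, which matches the privacy constraint mentioned above.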
By studying large amounts of DNS data, the researchers defined 15 different features that they use in the detection of malicious domains. Six of these features have been used in previous research, in particular in detecting malicious Fast-Flux services or in classifying malicious URLs.
To determine the DNS features that are indicative of malicious behavior, the researchers tracked and studied the DNS usage of several thousand well-known benign and malicious domains for a period of several months. After this analysis period, they identified 15 features that are able to characterize malicious DNS usage. The table taken from the scientific publication gives an overview of the components of the DNS requests that they analyzed (i.e., feature sets) and the features that they identified. The complete list of features used in the detection, and the rationale for selecting them, is explained in detail in the full scientific publication.