This is part 3 of a multi-part series on web performance improvements. The first, introductory post can be found here. In the previous part we discussed performance measurement tools and plugins for web applications. In this part we will look at what a DNS lookup is and how it affects web application performance.
Whenever we type a URL (the hostname of a web application or page) into a browser connected to the Internet, the hostname has to be converted to an IP address. It is quite difficult for a user to remember the IP address of a website, but quite easy to remember its name. The browser, however, needs the IP address to make its request, because servers on the Internet are located by IP address. This is where the Domain Name System (DNS) comes into the picture. DNS maps hostnames to IP addresses, just as phonebooks map people’s names to their phone numbers. When we type http://www.abc.com into the browser, the browser asks a DNS resolver, which returns that server’s IP address.
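To make the hostname-to-IP mapping concrete, here is a minimal Python sketch that asks the operating system’s resolver for the addresses of a hostname. The function name `resolve` and the default port are illustrative assumptions; `socket.getaddrinfo` is the standard library call that does the actual work.

```python
import socket

def resolve(hostname, port=80):
    """Return the unique IP addresses the system resolver reports for hostname.

    'resolve' is an illustrative helper name, not something from the article.
    """
    infos = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    addresses = []
    for info in infos:
        # Each entry is (family, type, proto, canonname, sockaddr);
        # the first element of sockaddr is the IP address as a string.
        ip = info[4][0]
        if ip not in addresses:
            addresses.append(ip)
    return addresses
```

Calling `resolve("www.abc.com")` on a machine with network access would return whatever addresses the resolver currently knows for that name; the result depends on the network and is not shown here.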
The biggest benefit of DNS is that it adds a layer of indirection between URLs and the actual servers that host them. If the IP address of the server changes for any reason, DNS allows users to keep using the same hostname to connect to the new server.
If we want to improve the performance or availability of a website, we can host the application on multiple servers, and DNS can help direct users to the closest one. By associating multiple IP addresses with a hostname, we can achieve a high degree of redundancy for a website.
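The redundancy idea can be sketched as follows: given several IP addresses for one hostname, a client tries them in order until one accepts the connection. This is only an illustration under assumptions. The function `connect_with_failover` and the `connect` callback are hypothetical names, and real clients may use more elaborate strategies.

```python
def connect_with_failover(addresses, connect):
    """Try each IP address in order and return (ip, connection) for the first success.

    'connect' is a caller-supplied function (an assumption for this sketch)
    that takes an IP address and raises OSError when the server is unreachable.
    """
    last_error = None
    for ip in addresses:
        try:
            return ip, connect(ip)
        except OSError as error:
            # This server did not answer; remember why and try the next one.
            last_error = error
    raise ConnectionError("no address could be reached") from last_error
```

With a real network, `connect` could wrap `socket.create_connection((ip, 80))`, which raises `OSError` on failure.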
The biggest drawback of DNS is the time taken to look up the IP address for a given hostname, called the DNS lookup. Typically, this takes 20-120 milliseconds. The browser can’t download anything from the hostname until the DNS lookup has completed. The response time depends on the DNS resolver (typically provided by the ISP), the load of requests on it, the user’s proximity to it, and the connection’s bandwidth.
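A rough way to observe this cost is to time a single lookup. The sketch below is a minimal example, not a benchmarking tool: `time_lookup` is a hypothetical helper, and the `resolver` parameter is an assumption added so that the function can be exercised without network access. By default it uses `socket.gethostbyname`.

```python
import socket
import time

def time_lookup(hostname, resolver=socket.gethostbyname):
    """Return (ip, elapsed_ms) for one call to resolver(hostname)."""
    start = time.perf_counter()
    ip = resolver(hostname)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return ip, elapsed_ms
```

Note that the measured time includes any caching done by the operating system, so a second call for the same hostname is usually much faster than the first.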
DNS lookups are cached in several places for better performance. This caching can occur on a special caching server maintained by the user’s ISP or local area network. As shown in the figure below, after a user requests a hostname, the DNS information also remains in the operating system’s DNS cache (the “DNS Client service” on Microsoft Windows). Thanks to this caching, future requests for that hostname will not require a DNS lookup, at least for a while.
Most browsers have their own caches, which are separate from the operating system’s cache. As long as the browser keeps a DNS record in its own cache, it doesn’t bother the operating system with a request for the record.
If the DNS entry for the requested hostname is not found in the browser’s cache, the browser asks the operating system’s DNS cache. If the operating system has an entry for the hostname, it is returned to the browser. Otherwise, the request is forwarded to a remote DNS server.
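The lookup order described above (browser cache, then operating system cache, then remote server) can be sketched in a few lines of Python. The two caches are modelled as plain dictionaries and `remote_resolver` is a caller-supplied function; all of these names are assumptions made for illustration, not how any particular browser is implemented.

```python
def lookup(hostname, browser_cache, os_cache, remote_resolver):
    """Resolve hostname through the browser cache, the OS cache, then a remote resolver.

    Returns (ip, source), where source names the level that answered.
    """
    if hostname in browser_cache:
        return browser_cache[hostname], "browser cache"
    if hostname in os_cache:
        ip = os_cache[hostname]
        browser_cache[hostname] = ip  # remember the answer at the browser level
        return ip, "os cache"
    ip = remote_resolver(hostname)    # slowest path: a query over the network
    os_cache[hostname] = ip
    browser_cache[hostname] = ip
    return ip, "remote resolver"
```

The first request for a hostname falls through to the remote resolver; repeated requests are answered from the nearest cache that still holds the record.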
Each additional level a request has to travel through adds to the delay, and therefore to the impact on application performance. Caching is also complicated by the fact that IP addresses change and that caches consume memory. Therefore, DNS records have to be flushed from the cache periodically. Several configuration settings determine how often they are discarded.
How long a DNS record may be cached is decided by the server through a time-to-live (TTL) value. The DNS record returned from a lookup contains a TTL value that tells the client how long the record can be cached. Operating system caches generally respect the TTL, but browser caches often ignore it and set their own time limits. Furthermore, the Keep-Alive feature of the HTTP protocol can override both the TTL and the browser’s time limit: as long as the browser and the server keep a connection open, no new DNS lookup is needed for that hostname.
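As an illustration of TTL-based expiry, here is a minimal sketch of a cache that forgets a record once its TTL has passed. The class name `TtlDnsCache` and the injectable `clock` parameter are assumptions made for this example; real operating system and browser caches are more involved.

```python
import time

class TtlDnsCache:
    """A tiny DNS cache that honors a per-record time-to-live (in seconds)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock      # injectable so expiry can be simulated
        self._records = {}       # hostname -> (ip, expires_at)

    def put(self, hostname, ip, ttl_seconds):
        self._records[hostname] = (ip, self._clock() + ttl_seconds)

    def get(self, hostname):
        record = self._records.get(hostname)
        if record is None:
            return None
        ip, expires_at = record
        if self._clock() >= expires_at:
            # The TTL has elapsed: discard the record so that the caller
            # performs a fresh lookup.
            del self._records[hostname]
            return None
        return ip
```

A record stored with a TTL of 300 seconds is returned for the next five minutes and then treated as missing.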
Browsers put a limit on the number of DNS records cached, regardless of the time the records have been in the cache. If the user visits many different sites with different domain names in a short period of time, earlier DNS records are discarded and the domain must be looked up again. If the browser discards a DNS record, the operating system cache might still have it, and that saves the day because no query has to be sent over the network, thereby avoiding what could be noticeable delays.
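The size limit can be sketched in the same spirit. The example below assumes the simplest possible policy, discarding the oldest record first; actual browsers may use a different eviction policy, and the class name `BoundedDnsCache` is hypothetical.

```python
from collections import OrderedDict

class BoundedDnsCache:
    """A DNS cache that holds at most max_entries records, oldest discarded first."""

    def __init__(self, max_entries):
        if max_entries < 1:
            raise ValueError("max_entries must be at least 1")
        self._max_entries = max_entries
        self._records = OrderedDict()  # keeps insertion order

    def put(self, hostname, ip):
        # Re-inserting a hostname moves it to the newest position.
        self._records.pop(hostname, None)
        self._records[hostname] = ip
        while len(self._records) > self._max_entries:
            self._records.popitem(last=False)  # drop the oldest record

    def get(self, hostname):
        return self._records.get(hostname)
```

With a limit of two entries, visiting a third hostname pushes the first one out, so returning to the first site requires a new lookup (or an answer from the operating system cache).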
When the client’s DNS cache is empty (both browser and operating system), the number of DNS lookups is equal to the number of unique hostnames in the web page. This includes the hostnames used in the page’s URL, images, script files, stylesheets, Flash objects, etc. Reducing the number of unique hostnames reduces the number of DNS lookups. A perfect example is Google (http://www.google.com), which needs only one DNS lookup for the page.
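Counting the unique hostnames a page refers to is straightforward once the component URLs are known. The following sketch uses `urllib.parse.urlparse` from the standard library; the helper name `unique_hostnames` and the sample URLs in the usage note are made up for illustration.

```python
from urllib.parse import urlparse

def unique_hostnames(urls):
    """Return the set of distinct hostnames found in a list of component URLs."""
    hosts = set()
    for url in urls:
        host = urlparse(url).hostname  # already lowercased; None if absent
        if host:
            hosts.add(host)
    return hosts
```

For a page at http://www.abc.com/ that loads a stylesheet from the same host, an image from a hypothetical img.abc.com and a script from a hypothetical cdn.example.com, the function reports three hostnames, which means up to three DNS lookups with an empty cache.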
But the Google page has only a couple of downloads, so a single hostname is ideal for it. Most websites are not so lucky. Reducing the number of unique hostnames also reduces the amount of parallel downloading that can take place in the page. In other words, avoiding DNS lookups cuts response time, but reduced parallel downloading may increase it. Some amount of parallelization is always good for speeding things up, even if it increases the number of hostnames.
Take the Google.com example: there are only two components in the page. Browsers following the HTTP/1.1 guideline download two components per hostname in parallel. So in this case, using one hostname minimizes the number of DNS lookups while still maximizing parallel downloads.
Most pages today have many components and are not as lean as Google’s. For such pages we can split the components across at least two, but no more than four, hostnames. This results in a good compromise between reducing DNS lookups and allowing a high degree of parallel downloads. Using Keep-Alive helps as well: it reuses an existing connection, which improves response times by avoiding TCP/IP overhead.
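The trade-off can be made tangible with a deliberately simplified model. It assumes that components are spread evenly across hostnames, that every component takes the same time to download, and that two downloads run in parallel per hostname. The function names and the 20-component page in the usage note are hypothetical, and the model ignores everything else that affects real page loads.

```python
import math

def download_rounds(num_components, num_hostnames, per_host_parallel=2):
    """Estimate how many sequential download 'rounds' a page needs.

    Simplified model: components are spread evenly across hostnames and
    each hostname serves per_host_parallel components at a time.
    """
    if num_hostnames < 1 or per_host_parallel < 1:
        raise ValueError("num_hostnames and per_host_parallel must be at least 1")
    busiest_host = math.ceil(num_components / num_hostnames)
    return math.ceil(busiest_host / per_host_parallel)

def compare_hostname_counts(num_components, max_hostnames=4):
    """Map each hostname count (= DNS lookups on an empty cache) to its rounds."""
    return {n: download_rounds(num_components, n)
            for n in range(1, max_hostnames + 1)}
```

For a two-component page, one hostname already gives a single round, which matches the Google example. For a hypothetical page with 20 components, going from one to four hostnames cuts the estimate from 10 rounds to 3, at the price of three extra DNS lookups when the cache is empty.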