This post is based on the commonly referenced question, "What happens when you type www.google.com into the browser and hit enter?", but uses https://www.holbertonschool.com/ as the example input.
Before we begin, most browsers have developer tools, which help you understand more about the code that powers a website. Since, for the time being, I most often use Google Chrome, I’ve provided an example below of a section of the Chrome developer tools that I will reference in this post. To learn more about Chrome Dev Tools, visit Network Analysis Reference.
- Browser, Cache, Hosts File
- DNS request, TCP/IP
- Firewall, HTTPS/SSL, Load-balancer
- Web server, Application server, Database
Local Machine and Local Area Network:
Browser auto-complete: Your browser has its own auto-complete system based on history, cache, and bookmarks. Assuming the site has never been visited, auto-complete will not be used, and the user will input the entire string.
FQDN: The user input in this case happens to be an FQDN (Fully Qualified Domain Name) because it uses the format [host name].[domain].[tld]. where TLD stands for Top Level Domain, and the . at the end of the URL is implied by the browser.
Keyboard, Machine, Application: The keyboard registers that the Enter key (i.e. '\r', carriage return, ASCII value 13) is pressed and sends a signal to the listening applications on the computer, which handle the event according to their instructions.
Parse URL: The browser parses the URL (Uniform Resource Locator) for the protocol, domain, and resource. In this instance, the protocol is HTTPS (Hyper Text Transfer Protocol Secure), the domain is holbertonschool.com, and the resource is the main index page, /. If neither a protocol nor a domain is input, the browser will use the input as a search term for the default search engine.
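As a rough illustration of the parsing step, here is a minimal Python sketch using the standard library's urllib.parse. It is not how any particular browser implements URL parsing; the helper name parse_url is made up for this example.

```python
from urllib.parse import urlsplit


def parse_url(url):
    """Split a URL into the parts described above (hypothetical helper)."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,
        "domain": parts.hostname,
        "port": parts.port,             # None when the URL has no explicit port
        "resource": parts.path or "/",  # an empty path means the root index
    }


if __name__ == "__main__":
    print(parse_url("https://www.holbertonschool.com/"))
```

An empty path is treated as / here, which mirrors the idea that the main index page is requested when no resource is given.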
DNS Local lookup: The browser checks whether the domain is in its cache of domain and IP address pairs. If it is not there, the browser checks the machine's local hosts file to attempt to resolve the IP address. If the IP address is still not found, the browser proceeds to query a DNS server. In Linux, the hosts file is located at the path /etc/hosts and has this default content:
$ cat /etc/hosts
127.0.0.1 localhost
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts
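The local lookup order described above (browser cache first, then the hosts file, then DNS) can be sketched in Python. This is only a toy model under simple assumptions: the function names are invented for illustration, and the sample text is a shortened copy of the hosts file content.

```python
def parse_hosts(text):
    """Map each hostname in hosts-file text to its IP address."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            table.setdefault(name, ip)  # the first entry for a name wins
    return table


def local_lookup(domain, browser_cache, hosts):
    """Return a cached IP, else a hosts-file IP, else None (go ask DNS)."""
    return browser_cache.get(domain) or hosts.get(domain)


SAMPLE_HOSTS = """\
127.0.0.1 localhost
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
"""

if __name__ == "__main__":
    hosts = parse_hosts(SAMPLE_HOSTS)
    print(local_lookup("localhost", {}, hosts))
    print(local_lookup("holbertonschool.com", {}, hosts))
```

A result of None for holbertonschool.com corresponds to the case where the browser has to move on to a DNS query.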
Browser HSTS list: The browser checks the cached HSTS (HTTP Strict Transport Security) list of websites that have requested to be contacted via HTTPS only. If the website is in the list, the browser sends its request via HTTPS; otherwise, the initial request is sent via HTTP. HTTP (Hyper Text Transfer Protocol) is essentially the set of guidelines for sending and receiving data (or strings of hypertext) over the internet.
Wide Area Network:
Start of Authority: If the domain is not found in the browser cache or the local hosts file, the browser begins a search for the SOA (Start of Authority). The SOA is simply a record of information; this is some of the information in the SOA:
- NS: The primary name server for the domain
- Email: A domain-name (FQDN) for the party responsible for this zone
- TTL: Time to Live, time in seconds the record will be cached in servers
- Refresh: time, in seconds, before the zone will be refreshed
- Retry: time, in seconds, before a failed refresh is retried
- Negative Cache: time, in seconds, an unfound record is cached
A quick way to find the SOA is with the host command:
$ host -t soa holbertonschool.com
holbertonschool.com has SOA record ns-1455.awsdns-53.org. awsdns-hostmaster.amazon.com. 1 7200 900 1209600 86400
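To make the numbers in that output easier to read, here is a small Python sketch that splits the line into named fields. The field order (primary name server, responsible mailbox, serial, refresh, retry, expire, minimum) follows the standard SOA record layout; the class and function names are my own for this example.

```python
from typing import NamedTuple


class SOA(NamedTuple):
    mname: str    # primary name server for the zone
    rname: str    # responsible party's mailbox, written as a domain name
    serial: int   # zone version number
    refresh: int  # seconds before the zone is refreshed
    retry: int    # seconds before a failed refresh is retried
    expire: int   # seconds after which unrefreshed zone data is discarded
    minimum: int  # seconds an unfound (negative) answer is cached


def parse_soa(line):
    """Parse one line of `host -t soa` output into an SOA tuple."""
    fields = line.split("has SOA record", 1)[1].split()
    mname, rname, *numbers = fields
    return SOA(mname, rname, *map(int, numbers))


LINE = (
    "holbertonschool.com has SOA record ns-1455.awsdns-53.org. "
    "awsdns-hostmaster.amazon.com. 1 7200 900 1209600 86400"
)

if __name__ == "__main__":
    print(parse_soa(LINE))
```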
DNS Search: The search for the appropriate IP address begins at the DNS resolver. For most users, the DNS resolver is run by their ISP (Internet Service Provider), or they may use a public alternative such as Google DNS (8.8.8.8) or OpenDNS (208.67.222.222). Google's NameBench provides analytics on the DNS servers available for your computer to use.
Root DNS: If the IP address is not found at the ISP or whichever other DNS server is referenced first, the request is forwarded to one of the 13 Root DNS Servers. In my case, the request is sent to be resolved by: l.root-servers.net, 199.7.83.42, 2001:500:9f::42, ICANN. The Root Server responds with the address of the .com TLD (Top Level Domain) servers. The TLD servers are queried until the primary Name Servers for holbertonschool.com are found. This Name Server is what is stored in the Start of Authority record.
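The chain of referrals (resolver cache, then root, then TLD servers, then the domain's name servers) can be modeled with a few dictionaries. This is only a toy sketch: every server name and the address below are hypothetical stand-ins (192.0.2.10 is a documentation-only address), and real resolvers exchange DNS messages over the network instead of reading tables.

```python
# Hypothetical delegation data standing in for the real DNS hierarchy.
ROOT_SERVER = {"com.": "com-tld-server"}
TLD_SERVERS = {"com-tld-server": {"example.com.": "example-name-server"}}
NAME_SERVERS = {"example-name-server": {"www.example.com.": "192.0.2.10"}}


def resolve(name, cache):
    """Follow root -> TLD -> name server referrals, caching the answer."""
    fqdn = name if name.endswith(".") else name + "."  # add the implied dot
    if fqdn in cache:
        return cache[fqdn]  # answered from the resolver's cache

    labels = fqdn.rstrip(".").split(".")
    tld = labels[-1] + "."
    zone = ".".join(labels[-2:]) + "."

    tld_server = ROOT_SERVER[tld]                # root refers us to the TLD servers
    name_server = TLD_SERVERS[tld_server][zone]  # TLD refers us to the zone's name server
    ip = NAME_SERVERS[name_server][fqdn]         # the name server holds the address
    cache[fqdn] = ip
    return ip


if __name__ == "__main__":
    cache = {}
    print(resolve("www.example.com", cache))
    print(cache)
```

A second call with the same cache returns immediately from the first branch, which is the reason resolvers close to the user usually answer without touching the root servers.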
WHOIS Record: holbertonschool.com is listed as being registered with gandi.net whois servers, and therefore, the DNS with gandi.net has a zone file with configurations for the domain holbertonschool.com. The whois information for this domain can be found at whois.icann.org. You may also run the command $ whois holbertonschool.com to find this information. holbertonschool.com is configured to point to Amazon Name Servers, such as NS-1455.AWSDNS-53.ORG. A brief excerpt from the WHOIS record below shows more of the listing for holbertonschool.com:
Raw WHOIS Record
Domain Name: holbertonschool.com
Registry Domain ID: 1950068353_DOMAIN_COM-VRSN
Registrar WHOIS Server: whois.gandi.net
Registrar URL: http://www.gandi.net
Updated Date: 2017-07-11T18:25:57Z
Creation Date: 2015-07-30T09:53:51Z
Registrar Registration Expiration Date: 2018-07-30T09:53:51Z
Registrar: GANDI SAS
Registrar IANA ID: 81
Registrar Abuse Contact Email: email@example.com
Registrar Abuse Contact Phone: +33.170377661
Reseller: Holberton Education
Domain Status: clientTransferProhibited http://www.icann.org/epp#clientTransferProhibited
Registry Registrant ID:
Registrant Name: Julien Barbier
Registrant Organization: Holberton Education
Name Server: NS-1455.AWSDNS-53.ORG
Name Server: NS-792.AWSDNS-35.NET
Name Server: NS-176.AWSDNS-22.COM
Name Server: NS-1619.AWSDNS-10.CO.UK
IP Address Resolved: From Amazon’s name servers, the IP address that serves the actual website content for holbertonschool.com can be resolved. The IP happens to be 18.104.22.168, as you can tell from the image above of the Chrome Dev Tools.
Internet Protocol: There are two main protocols that transfer data using "packets": TCP/IP (Transmission Control Protocol / Internet Protocol) and UDP (User Datagram Protocol). Packets contain the requested data, including the actual hypertext, plus more information about it. To learn more about packets, visit What is a Data Packet? With TCP, data flows in both directions and the receiver acknowledges what it receives, so anything lost in transit can be detected and sent again. UDP is a less complex protocol in which data is sent without any acknowledgment. UDP may be faster, but it is also more prone to errors if there are interruptions in the transfer. TCP is commonly referred to as TCP/IP because TCP is responsible for the reliable delivery of the data and IP is responsible for addressing and routing each packet to its destination.
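The difference between the two protocols shows up directly in the socket API. The sketch below sends the same bytes over the loopback interface, once as a single UDP datagram and once over a TCP connection. It is a minimal illustration that assumes a working 127.0.0.1 interface; the function names are invented for this example.

```python
import socket


def udp_roundtrip(payload):
    """Send one datagram to ourselves: no connection, no acknowledgment."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        receiver.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
        receiver.settimeout(2)
        sender.sendto(payload, receiver.getsockname())
        data, _ = receiver.recvfrom(4096)
        return data
    finally:
        sender.close()
        receiver.close()


def tcp_roundtrip(payload):
    """Open a connection first (TCP handshake), then stream the bytes."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        listener.bind(("127.0.0.1", 0))
        listener.listen(1)
        client = socket.create_connection(listener.getsockname(), timeout=2)
        server_side, _ = listener.accept()
        server_side.settimeout(2)
        try:
            client.sendall(payload)
            client.shutdown(socket.SHUT_WR)  # signal that we are done sending
            chunks = []
            while True:
                chunk = server_side.recv(4096)
                if not chunk:  # empty read means the peer finished sending
                    break
                chunks.append(chunk)
            return b"".join(chunks)
        finally:
            client.close()
            server_side.close()
    finally:
        listener.close()


if __name__ == "__main__":
    print(udp_roundtrip(b"ping"))
    print(tcp_roundtrip(b"GET / HTTP/1.1"))
```

On loopback the UDP datagram practically always arrives, but nothing in the protocol guarantees it; the TCP version relies on the operating system to acknowledge and retransmit as needed.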
Request and TLS Handshake: Once the browser has the IP (Internet Protocol) address and the destination port (the port defaults to 443 for HTTPS, or 80 for plain HTTP), the browser opens a connection to the IP address of holbertonschool.com and starts the TLS (Transport Layer Security) protocol, beginning with what is known as the TLS handshake. Since holbertonschool.com uses the HTTPS protocol, the browser and the server will attempt to establish a secure connection before transferring any data.
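For readers who want to watch a handshake happen, Python's ssl module can play the role of the browser. This is a hedged sketch, not what Chrome does internally: the function names are my own, and fetch_tls_info needs network access, so it is only defined here and not called.

```python
import socket
import ssl


def make_client_context():
    """Build a client-side TLS context with certificate checks enabled."""
    # create_default_context() turns on certificate and hostname verification.
    return ssl.create_default_context()


def fetch_tls_info(host, port=443, timeout=5):
    """Connect, perform the TLS handshake, and report what was negotiated."""
    context = make_client_context()
    with socket.create_connection((host, port), timeout=timeout) as raw:
        # wrap_socket() performs the handshake before returning.
        with context.wrap_socket(raw, server_hostname=host) as tls:
            return tls.version(), tls.cipher()[0]
```

Calling fetch_tls_info("www.holbertonschool.com") on a machine with internet access should return the negotiated protocol version and cipher suite name, which will vary with the server's configuration.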
Physical Server: Once the IP address is resolved, the browser directs the request for /, the root index, to the resolved IP address (i.e. 22.214.171.124). Since Amazon Web Services hosts holbertonschool.com's data, the request is sent to a physical machine in one of Amazon's data centers listed on the AWS Global Infrastructure diagram. From there, AWS likely has a virtual server that contains the holbertonschool.com network information.
Firewall: Even before a connection to the IP address can be made, however, the request must pass through a firewall. A firewall is a network security device that monitors incoming and outgoing network traffic and decides whether to allow or block specific traffic based on certain criteria, most often the number of requests per second and the IP address of the request. For more on firewalls, check out What is a Firewall?
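The two criteria mentioned above, the source IP address and the number of requests per second, can be sketched as a toy filter. This is an illustration only, with made-up class and parameter names and documentation-range IP addresses; real firewalls work on packets and far richer rule sets.

```python
from collections import defaultdict, deque


class ToyFirewall:
    """Allow or block requests by source IP and request rate."""

    def __init__(self, blocked_ips, max_requests, window_seconds=1.0):
        self.blocked = set(blocked_ips)
        self.max_requests = max_requests
        self.window = window_seconds
        self.history = defaultdict(deque)  # ip -> timestamps of allowed requests

    def allow(self, ip, now):
        if ip in self.blocked:
            return False
        recent = self.history[ip]
        # Forget requests that fell out of the sliding window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.max_requests:
            return False  # too many requests inside the window
        recent.append(now)
        return True


if __name__ == "__main__":
    firewall = ToyFirewall(blocked_ips={"203.0.113.9"}, max_requests=2)
    for moment in (0.0, 0.1, 0.2, 1.5):
        print(moment, firewall.allow("198.51.100.7", now=moment))
```

Passing the timestamp in explicitly keeps the example deterministic; a real deployment would read the clock itself.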
HTTPS/SSL: Since holbertonschool.com uses SSL (Secure Sockets Layer), an encrypted link between the browser and the server will need to be established (the SSL handshake). The securely encrypted connection via HTTPS (Hyper Text Transfer Protocol Secure) uses a cryptographic technique known as public key encryption. Essentially, a public key encrypts plaintext into ciphertext, which is then sent to the server holding the matching private key, and only that private key can decrypt the ciphertext back into plaintext. HTTPS also ensures the authenticity of the server hosting holbertonschool.com using a certificate issued by a Certificate Authority. For more on how HTTPS establishes a secure connection using a key exchange built on prime numbers, such as the Diffie-Hellman algorithm, check out How HTTPS Secures Connections: What Every Web Dev Should Know.
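The idea behind a Diffie-Hellman key exchange fits in a few lines of Python. The numbers below are a textbook toy example (prime 23, generator 5) and are far too small to be secure; real handshakes use very large primes or elliptic curves. The function names are invented for this sketch.

```python
def public_value(generator, private_key, prime):
    """Compute the value that is safe to send over the open network."""
    return pow(generator, private_key, prime)


def shared_secret(their_public, my_private, prime):
    """Combine the other side's public value with our own private key."""
    return pow(their_public, my_private, prime)


PRIME, GENERATOR = 23, 5   # agreed on in the open
BROWSER_PRIVATE = 4        # never leaves the browser
SERVER_PRIVATE = 3         # never leaves the server

browser_public = public_value(GENERATOR, BROWSER_PRIVATE, PRIME)
server_public = public_value(GENERATOR, SERVER_PRIVATE, PRIME)

browser_secret = shared_secret(server_public, BROWSER_PRIVATE, PRIME)
server_secret = shared_secret(browser_public, SERVER_PRIVATE, PRIME)

if __name__ == "__main__":
    print(browser_secret, server_secret)
```

Both sides arrive at the same secret even though only the public values crossed the network, and that shared secret is what the rest of the session's encryption keys are derived from.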
Load-balancer: Assuming that the SSL encrypted link has been established and the request passes through the firewall, the load balancer will decide which local port, IP, or location within Holberton's network will process the request. A load balancer distributes traffic across a network of servers based on an algorithm that determines the optimal server to process the request. The algorithm could distribute the requests equally, or base the distribution on server processing speed and other factors. For more on load balancer algorithms, check out kemptechnologies.com. A commonly used load balancer is HAProxy, which happens to be open-source technology. There may be another SSL-encrypted hop between the load balancer and the next step, the web server.
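Two common distribution strategies, equal rotation (round robin) and choosing the least busy server (least connections), can be sketched as follows. The server names are hypothetical, and this is not how HAProxy is implemented; it only illustrates the selection logic.

```python
from itertools import cycle


class RoundRobin:
    """Hand out servers in a fixed rotation so each gets an equal share."""

    def __init__(self, servers):
        self._rotation = cycle(servers)

    def pick(self):
        return next(self._rotation)


def least_connections(active_connections):
    """Pick the server currently handling the fewest connections."""
    return min(active_connections, key=active_connections.get)


if __name__ == "__main__":
    balancer = RoundRobin(["web-01", "web-02"])
    print([balancer.pick() for _ in range(3)])
    print(least_connections({"web-01": 5, "web-02": 2}))
```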
Web server: The web server that Holberton uses is nginx/1.10.2, as identified by Chrome dev tools. The web server serves HTML files, images, and other types of data that may be transferred through HTTPS. Nginx has configurations that instruct it on what "Response" to send based on the input "Request." If the website has dynamic content produced by, for example, a Node.js or Python application, then an application server is used to coordinate with the web server.
Application server: An application server, which typically runs an application or web framework, is a more complex system for taking requests and serving content. An application server is more complex because it can use server-side code libraries and APIs to produce various interactions and responses to requests, and it can even interact with a database or other servers. Gunicorn is an example of a Python application server. nginx is versatile enough to act both as a web server and as a reverse proxy that passes requests on to an application server. For more information on application servers, check out this thread at stackoverflow.com.
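To give a feel for what a Python application server such as Gunicorn actually runs, here is a minimal WSGI application. It is a sketch only, with an invented page body, and it says nothing about how holbertonschool.com itself is built.

```python
def app(environ, start_response):
    """Minimal WSGI application: build a response from the request path."""
    path = environ.get("PATH_INFO", "/")
    if path == "/":
        status = "200 OK"
        body = b"<h1>Hello from the application server</h1>"
    else:
        status = "404 Not Found"
        body = b"Not Found"
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response(status, headers)
    return [body]
```

If this were saved in a file named myapp.py (a hypothetical name), Gunicorn could serve it with gunicorn myapp:app, and a web server such as nginx would typically sit in front of it and forward requests.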
Database: A database is simply an organized way of storing data. In the case of holbertonschool.com, they have various users who log in during the application process. The database could store all user information, including passwords, usernames, ids, permissions, and the steps that each user has completed in the application process. An example of a database that we use is MySQL. According to mysql.com:
A database is a structured collection of data. It may be anything from a simple shopping list to a picture gallery or the vast amounts of information in a corporate network. To add, access, and process data stored in a computer database, you need a database management system such as MySQL Server. Since computers are very good at handling large amounts of data, database management systems play a central role in computing, as standalone utilities, or as parts of other applications.
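As a small illustration of storing and updating applicant data, the sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL, since it needs no server to run. The table and column names are hypothetical and are not Holberton's actual schema.

```python
import sqlite3


def open_db():
    """Create an in-memory database with a hypothetical applicants table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE applicants ("
        " id INTEGER PRIMARY KEY,"
        " username TEXT UNIQUE NOT NULL,"
        " step INTEGER NOT NULL DEFAULT 0)"
    )
    return db


def add_applicant(db, username):
    cursor = db.execute("INSERT INTO applicants (username) VALUES (?)", (username,))
    db.commit()
    return cursor.lastrowid


def advance(db, username):
    """Record that the applicant completed one more step."""
    db.execute("UPDATE applicants SET step = step + 1 WHERE username = ?", (username,))
    db.commit()


def current_step(db, username):
    row = db.execute(
        "SELECT step FROM applicants WHERE username = ?", (username,)
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    db = open_db()
    add_applicant(db, "ada")
    advance(db, "ada")
    print(current_step(db, "ada"))
```

The ? placeholders let the database driver handle quoting, which avoids building SQL strings from user input by hand.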
Response: Based on the request, a response is constructed and returned to the requesting IP address. In the case of the request for holbertonschool.com, the HTML (Hyper Text Markup Language) for index.html is returned through the firewall. Assuming the response passes the firewall, the browser making the request will receive the response and begin to interpret it. In this case, the browser will make 71 other requests for images and all the other files needed by the loaded and interpreted HTML.
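The follow-up requests come from references inside the HTML itself. The sketch below uses Python's html.parser to list the stylesheets, scripts, and images a page points to. The sample page is made up, and a real browser discovers many more kinds of resources than these three tags.

```python
from html.parser import HTMLParser


class ResourceFinder(HTMLParser):
    """Collect the URLs a browser would need to request next."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag in ("img", "script") and attributes.get("src"):
            self.urls.append(attributes["src"])
        elif tag == "link" and attributes.get("href"):
            self.urls.append(attributes["href"])


def find_resources(html):
    finder = ResourceFinder()
    finder.feed(html)
    return finder.urls


PAGE = (
    '<html><head><link rel="stylesheet" href="/main.css">'
    '<script src="/app.js"></script></head>'
    '<body><img src="/logo.png"></body></html>'
)

if __name__ == "__main__":
    print(find_resources(PAGE))
```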
Network Infrastructure Stats:
The examples that I’ve used are generally classified as a LAMP software bundle, which uses open-source software solutions: L for Linux, A for Apache (or nginx), M for MySQL, and P for Python (or PHP, Perl). Because the two load balancers are on separate servers and there is monitoring of all components, there is no single point of failure. This setup would be considered a high availability network, as it would very rarely have any downtime, if ever. During maintenance or updates, the load balancer is switched to Active/Passive mode, in which the passive server is used for updates or maintenance. Once that is complete, the servers switch roles until Active/Active is achievable, which is then established as the more optimal mode. In this way, there will be no downtime for patches or bug fixes. In the event of errors, this structure would trigger automatic protocols and notifications. For example, in the event that the database does not function, there can be systems in place to restart it, or to spawn an entirely new instance of the database in a new Docker container.
Finally, the browser receives the response, and the data is interpreted to load a neat, clean website with informative and engaging content. To learn more about how the browser parses the response and renders clean content, check out How Browsers Work: Behind the scenes of modern web browsers.