We all take the Internet for granted. To some extent, even our
parents understand how the Internet works. You type an address
and you get access to the platform. You’re already logged in and are
ready to browse hundreds of photos.
At the snap of your fingers.
Who would have thought how much work went into making it so simple for you?
What is happening behind the scenes?
For the purpose of this article, we will assume 2 pieces of prior knowledge:
- our machine (from now on called the client), given an HTML file, can display its content on the screen. This makes the parents happy, as they can see a new portion of memes and pictures of babies.
- we can request this page by typing the proper URI and don’t need to worry about the whole DNS thingy.
facebook.com connects me to Facebook’s machines, period.
This gives us a nice abstraction to focus on web servers and on how Facebook actually exposes its machines to the world. What does it take to expose a machine to the world? How does it send the HTML file to the client? Let’s find out!
IP is that magical thing you know from your childhood gaming days, and from the fear of being tracked down if you made somebody angry during a game.
In reality, IP (the Internet Protocol) is a formalized way of sending data across a network. IP uses packets, which are the pieces the data is split into. An IP packet consists of a header and a payload. The header says where the packet comes from, where it is going, which protocol it carries, and so on. The payload carries the actual data, for example the HTML file that we want to display.
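To make the header/payload split concrete, here is a minimal sketch in Python that packs and unpacks a simplified IPv4 header. It assumes a 20-byte header without options and leaves the checksum at zero; the function names and the example addresses are made up for illustration, and real packets are built by the operating system, not by hand like this.

```python
import socket
import struct

# Simplified IPv4 header: 20 bytes, no options, network byte order.
IPV4_HEADER = struct.Struct("!BBHHHBBH4s4s")


def build_packet(src, dst, payload, protocol=6):
    """Return header + payload. protocol 6 means TCP, 17 means UDP."""
    version_ihl = (4 << 4) | 5  # version 4, header length 5 * 4 = 20 bytes
    total_length = IPV4_HEADER.size + len(payload)
    header = IPV4_HEADER.pack(
        version_ihl,
        0,              # type of service
        total_length,   # header plus payload, in bytes
        0,              # identification
        0,              # flags and fragment offset
        64,             # time to live
        protocol,
        0,              # checksum, left at 0 in this sketch
        socket.inet_aton(src),
        socket.inet_aton(dst),
    )
    return header + payload


def parse_packet(packet):
    """Split a packet back into a few header fields and the payload."""
    fields = IPV4_HEADER.unpack(packet[:IPV4_HEADER.size])
    version_ihl, _tos, total_length, _ident, _frag, ttl, protocol, _csum, src, dst = fields
    header_length = (version_ihl & 0x0F) * 4
    return {
        "version": version_ihl >> 4,
        "ttl": ttl,
        "protocol": protocol,
        "source": socket.inet_ntoa(src),
        "destination": socket.inet_ntoa(dst),
        "payload": packet[header_length:total_length],
    }


packet = build_packet("192.0.2.10", "198.51.100.20", b"<html>hello</html>")
info = parse_packet(packet)
print(info["source"], "->", info["destination"], info["payload"])
```

The receiver only needs the header to know who sent the packet and what kind of data follows; everything after the first 20 bytes is the payload.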
These are other terms that you might have encountered somewhere in the past. Let’s quickly go through them:
- IPv4 and IPv6 are versions of the Internet Protocol. IPv4 is still the most widely used, but because it supports only 32-bit addresses, it is gradually being replaced by IPv6 and its 128-bit addresses as IoT and general internet usage grow. Both versions allow packets of up to about 64 KB, although the guaranteed minimum is much smaller: every IPv4 host must accept packets of at least 576 bytes, and IPv6 requires every link to carry packets of 1280 bytes.
- UDP and TCP are transport protocols that run on top of IP and define how data is sent and received. In general, TCP has a larger header (at least 20 bytes) and is more reliable. A TCP connection starts with a handshake and lost data is retransmitted, so at the cost of more bandwidth the information arrives complete and in order. UDP, on the other hand, uses a much smaller (8-byte) header and doesn’t check whether packets actually reached the destination. It is faster this way (and more suitable for audio/video transmission), but you can lose some data. Think of the glitches during a video call.
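The difference between the two is easy to see in code. The sketch below, using Python’s standard socket module, sends the same message once over TCP and once over UDP on the local machine. The helper names are hypothetical, and because everything runs on loopback the UDP datagram is practically always delivered here, which is not guaranteed on a real network.

```python
import socket
import threading


def tcp_roundtrip(message):
    """Send message to a local TCP echo server and return the reply."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]

    def echo_once():
        conn, _addr = server.accept()  # take the established connection
        with conn:
            conn.sendall(conn.recv(1024))

    worker = threading.Thread(target=echo_once)
    worker.start()
    # Connecting performs the TCP handshake before any data is sent.
    with socket.create_connection(("127.0.0.1", port)) as client:
        client.sendall(message)
        reply = client.recv(1024)
    worker.join()
    server.close()
    return reply


def udp_send(message):
    """Send one UDP datagram to a local socket and return what arrived."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver:
        receiver.bind(("127.0.0.1", 0))
        receiver.settimeout(2.0)  # UDP gives no delivery guarantee, so do not wait forever
        port = receiver.getsockname()[1]
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
            # No connection and no handshake: the datagram is simply sent.
            sender.sendto(message, ("127.0.0.1", port))
        data, _addr = receiver.recvfrom(1024)
    return data


if __name__ == "__main__":
    print(tcp_roundtrip(b"hello over TCP"))
    print(udp_send(b"hello over UDP"))
```

Note that the TCP version needs a listening server and a connection before a single byte moves, while the UDP version just fires the datagram at an address and hopes for the best.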
This is certainly out of scope for this blog post, but if you’re curious, feel free to check out this article.
Here we come! Finally, the more familiar part. After our data reaches the server, how does the server know what to do with it? We may have thousands of processes running, listening on many ports. Fortunately, there is a standard for that as well.
As humanity (and IANA in particular) we decided that HTTP requests go to port 80 and HTTPS requests go to port 443 by default. These ports act as a gateway between your application and the outside world. If you put a web server between them, it can receive the requests on port 443 and forward them to port 8000, which is home to your shiny Django app. In 2020, we have 2 major web servers on the market.
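To see what “a process listening on a port” means in practice, here is a small sketch of an application that serves one HTML page, built only on Python’s standard library. It is a stand-in for the app behind the web server, not a replacement for Nginx or Django; the handler class, the page content and the helper function are all invented for this example. The port defaults to 0 so the operating system picks a free one, and passing 8000 would mirror the setup described above.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

PAGE = b"<html><body><h1>Hello from the app</h1></body></html>"


class AppHandler(BaseHTTPRequestHandler):
    """Answer every GET request with the same HTML page."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()
        self.wfile.write(PAGE)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet instead of logging each request


def fetch_from_local_app(port=0):
    """Start the app on a local port, request "/" once, return (status, body)."""
    server = HTTPServer(("127.0.0.1", port), AppHandler)
    actual_port = server.server_address[1]
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", actual_port, timeout=5)
        conn.request("GET", "/")
        response = conn.getresponse()
        body = response.read()
        conn.close()
        return response.status, body
    finally:
        server.shutdown()      # stop the serve_forever loop
        server.server_close()  # release the port


if __name__ == "__main__":
    status, body = fetch_from_local_app()
    print(status, body.decode())
```

A web server such as Nginx sits in front of a process like this one: it owns the public ports 80 and 443 and passes requests on to whatever port the application listens on.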
Apache released its web server in 1995 and it quickly took over the whole market. It was open source, easy to configure and covered all the needs of a LAMP stack application. However, nowadays it has lost much of its popularity to…
Nginx is a more modern competitor to Apache. First released in 2004, it was built by a team that wanted a server which would outperform Apache. They succeeded (and this is why we will use Nginx in the demo below): in April 2020 its market share is expected to reach 36.9%, while Apache sits at 24.7%.
Let’s put it all together - Hosting a website on AWS EC2
So, having all this knowledge, let’s build our own Facebook! To do it, we will use the most bare-bones service offered by AWS: EC2. It is perfect for us, because:
- we can rent access to a real computer, with an operating system of our choice
- it’s already connected to the public network and has a public IP address
- it doesn’t serve any specific purpose (unlike something like AWS Elastic Beanstalk), so we can install everything on our own
Let’s build Facebook!
First, we need to launch an EC2 instance. We start with no instances…
and after a short setup we end up with one!
At this stage it is really important to open the instance up to public HTTP traffic.
By default, the instance’s security group only allows inbound traffic on port 22, which is devoted to SSH
connections. To accept HTTP, go to Security Groups and edit the inbound rules to allow port 80.
Now, let’s set up Nginx. First, we need to make sure that the .pem file
downloaded during the setup has the proper access rights (SSH refuses
private keys that other users can read). Then, we use it to log in to
the machine.
Now, we need to install nginx. On popular systems like Ubuntu, it is as simple as installing it with the package manager:
sudo apt update
sudo apt install nginx
and then enter
sudo systemctl start nginx
Now, when we enter the public address provided by AWS in the browser, we should get the nginx welcome page!
And with slight modifications to the page source, which we can open with
sudo vim /var/www/html/index.nginx-debian.html
we can start building our own Facebook!
Not that hard, right?
OK, I have bad news: these 1000 words are a big oversimplification, and there is still a lot more going on. Just look at this GitHub repo trying to capture the whole process in one document. It is huge! But let’s not get discouraged. Computers are cool and not magic, so we can all learn them together 🤞🏻