What is serving the Web?

Wojtek Kulikowski
February 22nd, 2020 · 4 min read

Intro

We all take the Internet for granted. To some extent, even our parents understand how the Internet works. You type facebook.com and you get access to the platform. You’re already logged in and ready to browse hundreds of photos.

For free.

At the snap of your fingers.

Who would have thought how much work went into making it so simple for you?

What is happening behind the scenes?

For the purpose of this article, we will assume 2 pieces of prior knowledge:

  • our machine (from now on called the client), given an HTML file, can display its content on the screen. This makes the parents happy, as they can see a new portion of memes and pictures of babies.
  • we can request this page by typing the proper URI and don’t need to worry about the whole DNS thingy. facebook.com connects me to Facebook’s machines, period.

This gives us a nice abstraction to focus on web servers and on how Facebook actually exposes its machines to the world. What does it take to expose a machine to the world? How does it send the HTML file to the client? Let’s find out!

Internet Protocol

IP is this magical thing that you know from your childhood gaming and the fear of being tracked down if you made somebody angry during the game.

In reality, it is a formalized way of sending data across a network. IP uses packets, which are data split into pieces. An IP packet consists of a header and a payload. The header provides information about where the packet comes from, where it travels to, which protocol it carries, etc. The payload carries the actual data, for example a piece of the HTML file that we want to display.
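To make the header/payload split concrete, here is a minimal Python sketch that builds a toy IPv4 packet by hand and then takes it apart again. The function name `parse_ipv4_header`, the sample addresses and the sample payload are made up for illustration, and the checksum is left at zero. In real traffic the payload would be a TCP or UDP segment rather than raw HTML.

```python
import socket
import struct

# Layout of the fixed 20-byte part of an IPv4 header, in network byte order.
IPV4_HEADER_FORMAT = "!BBHHHBBH4s4s"


def parse_ipv4_header(packet: bytes) -> dict:
    """Split a raw IPv4 packet into a few header fields and its payload."""
    (version_ihl, _tos, total_length, _ident, _flags_fragment,
     ttl, protocol, _checksum, src, dst) = struct.unpack(
        IPV4_HEADER_FORMAT, packet[:20])
    # The low 4 bits hold the header length, counted in 32-bit words.
    header_length = (version_ihl & 0x0F) * 4
    return {
        "version": version_ihl >> 4,
        "header_length": header_length,
        "total_length": total_length,
        "ttl": ttl,
        "protocol": protocol,  # 6 means TCP, 17 means UDP
        "source": socket.inet_ntoa(src),
        "destination": socket.inet_ntoa(dst),
        "payload": packet[header_length:total_length],
    }


# A hand-built sample packet (hypothetical addresses, checksum left at 0).
sample_payload = b"<html>hi</html>"
sample_header = struct.pack(
    IPV4_HEADER_FORMAT,
    (4 << 4) | 5,               # version 4, header length 5 words = 20 bytes
    0,                          # type of service
    20 + len(sample_payload),   # total length: header plus payload
    0, 0,                       # identification, flags and fragment offset
    64,                         # time to live
    6,                          # protocol number for TCP
    0,                          # header checksum (not computed in this sketch)
    socket.inet_aton("192.0.2.1"),
    socket.inet_aton("198.51.100.7"),
)

parsed = parse_ipv4_header(sample_header + sample_payload)
print(parsed["source"], "->", parsed["destination"], parsed["payload"])
```

Everything the network needs to deliver the packet lives in those first 20 bytes; the rest is simply carried along.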

IPv4/IPv6, UDP/TCP

These are other terms that you might have encountered somewhere in the past. Let’s quickly go through them:

  • IPv4 and IPv6 are versions of the Internet Protocol. IPv4 is the most popular choice right now, but because it supports only 32-bit addresses (about 4.3 billion of them), it will not keep up with IoT and wider internet usage; IPv6 uses 128-bit addresses. Both versions support up to about 64 KB (65,535 bytes) per packet, although an IPv4 host is only required to accept packets of 576 bytes, and IPv6 requires every link to carry packets of at least 1280 bytes.
  • UDP and TCP are transport protocols that sit on top of IP and define how the data in IP packets is sent and received. In general, TCP has a larger header (at least 20 bytes) and is more reliable. A TCP connection starts with a handshake, and lost data is retransmitted, so while it uses more bandwidth, the information arrives complete and in order as long as the connection holds. UDP, on the other hand, has a much smaller (8-byte) header and doesn’t check whether packets actually reached the destination. It is much faster this way (and more suitable for audio/video transmission), but you can lose some data. Think of it as lags during a video call.
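The difference between the two is easy to see in code. Below is a small Python sketch, run entirely on the local machine (127.0.0.1), that sends the same message once over UDP and once over TCP. The helper names `udp_roundtrip` and `tcp_roundtrip` are made up for this example. On a loopback interface nothing gets lost in practice, so the point here is the shape of the code rather than the reliability: UDP just fires a datagram, while TCP has to establish a connection first.

```python
import socket


def udp_roundtrip(message: bytes) -> bytes:
    """Send one datagram to a local UDP socket; no connection is set up."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    receiver.settimeout(2)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Fire and forget: UDP gives the sender no delivery confirmation.
        sender.sendto(message, receiver.getsockname())
        data, _address = receiver.recvfrom(4096)
        return data
    finally:
        sender.close()
        receiver.close()


def tcp_roundtrip(message: bytes) -> bytes:
    """Send the same bytes over TCP, which needs a connection first."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    listener.settimeout(2)
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.settimeout(2)
    try:
        # connect() triggers the TCP handshake with the listening socket.
        client.connect(listener.getsockname())
        connection, _address = listener.accept()
        with connection:
            connection.settimeout(2)
            client.sendall(message)
            return connection.recv(4096)
    finally:
        client.close()
        listener.close()


print("UDP:", udp_roundtrip(b"ping"))
print("TCP:", tcp_roundtrip(b"ping"))
```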

HTTP

//TODO Explain

Physics

This is certainly out of scope of this blogpost, but if you’re curious feel free to check out this article.

Software

Here we come! Finally, the more familiar part. After our data reaches the server, how does the server know what to do with it? We may have thousands of processes running, listening on many ports. Fortunately, there is a standard for that as well.

As humanity (and IANA in particular) we decided that HTTP requests go to port 80 and HTTPS requests to port 443 by default. These ports are the gateway between your application and the outer world. If you put a web server between them, it can accept the requests on port 443 and forward them to port 8000, which is home to your shiny Django app. In 2020, there are 2 major web servers on the market.
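To get a feeling for what "listening on a port" means, here is a minimal Python sketch using only the standard library. It starts a tiny HTTP server on whatever free local port the OS hands out and fetches one page from it. The names `HelloHandler` and `fetch_once` are made up for this example, and this is a toy stand-in for an application server, not a replacement for Apache or Nginx.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Answer every GET request with one small HTML page."""

    def do_GET(self):
        body = b"<h1>Hello from the app</h1>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch_once():
    """Start the server on a free local port, request '/', then shut down."""
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0 = any free port
    port = server.server_address[1]
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        connection = http.client.HTTPConnection("127.0.0.1", port, timeout=2)
        connection.request("GET", "/")
        response = connection.getresponse()
        result = (response.status, response.read())
        connection.close()
        return result
    finally:
        server.shutdown()
        server.server_close()


status, body = fetch_once()
print(status, body)
```

A real application would bind to a fixed port such as 8000 instead of port 0. On most systems, binding directly to port 80 or 443 requires root privileges, which is one more reason to leave those ports to a dedicated web server.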

Apache HTTP Server

Apache released its web server in 1995 and quickly took over the whole market. It was open source, easy to configure and covered all the needs of a LAMP stack application. However, nowadays it has lost much of its popularity to…

Nginx

Nginx is a more modern competitor of Apache. First released in 2004, it was built to outperform Apache. The team succeeded (and this is why we will use Nginx in the demo below): in April 2020 its market share is expected to reach 36.9%, while Apache sits at 24.7%.

Let’s put it all together - Hosting a website on AWS EC2

So, having all this knowledge, let’s build our own Facebook! To do it, we will use the most bare-bones service offered in AWS – EC2. It is perfect for us, because:

  • we can rent access to a real computer, with an operating system of our choice
  • it’s already connected to the public network and has a public IP address
  • it isn’t tied to any specific purpose (unlike something like AWS Elastic Beanstalk), so we can install everything on our own

Let’s build Facebook!

First, we need to launch an EC2 instance. We start with no instances…

[Image: blank EC2 interface]

and after a short setup we end up with one!

[Image: AWS EC2 panel]

At this stage it is really important to open the instance up to public HTTP traffic. By default, the instance only accepts inbound traffic on port 22, which is devoted to SSH connections. To accept HTTP, go to Security Groups and edit the inbound rules to allow port 80.

[Image: EC2 interface with 1 instance]

Now, let’s set up Nginx. First, we need to make sure that the .pem file downloaded during the setup has the right access permissions. Then, we use it to log in to the machine.

[Image: terminal logging in to the instance]

Now, we need to install nginx. On popular systems like Ubuntu, it is as simple as downloading it from the package manager:

sudo apt update
sudo apt install nginx

and then start it:

sudo systemctl start nginx

Now, when we enter the public address provided by AWS, we should get the starting page of nginx!

[Image: nginx default page]

And with slight modifications to the page source, which we can open with

sudo vim /var/www/html/index.nginx-debian.html

we can start building our own Facebook!

[Image: new Facebook landing page]

Not that hard, right?

Summary

OK, I have bad news: these 1000 words are a great oversimplification, and there is still a lot more going on. Just look at this GitHub repo trying to capture the whole process in one document. It is huge! But let’s not get discouraged. Computers are cool and not magic, so we can all learn them together 🤞🏻


© 2019–2020 Software Engineering by Wojtek