How the web works

As an April Fools' joke in 1990, a technical proposal humorously suggested sending internet data by carrier pigeon, with each packet printed on a small scroll of paper wrapped around the bird's leg. Known officially as RFC 1149 and informally as "IP over Avian Carriers", the proposal was published as an RFC, or Request for Comments, a type of document that internet engineers use to propose technical standards. Surprisingly, a group of Norwegian Linux enthusiasts actually tested the joke in 2001 by sending ping packets via real pigeons over a distance of about five kilometers. Only four of the nine packets were answered, and each round trip took between roughly one and two hours.

While focusing on application logic is practical, understanding the internet's underlying details provides a more complete picture, which is essential for full-stack development. We often think of the internet as an abstract 'cloud,' a space where data exists and which somehow enables us to share information instantly across the globe. However, as the carrier pigeon proposal demonstrates, the internet relies on a very real physical infrastructure. In this topic, we will explore, piece by piece, the different components and concepts to provide an overview of how this remarkable human invention works.

Network

A network, in its simplest form, is just two or more computers linked together so they can share information. Think about your home network: your laptop, phone, and perhaps a printer are connected, either wirelessly via Wi-Fi or through Ethernet cables, to a central box called a router. This arrangement creates a Local Area Network, or LAN.

[Figure: LAN]

Within this local network, each device needs a unique identifier to be recognized. This is the MAC address (Media Access Control address), a permanent hardware address assigned to the network interface, looking something like 00:1A:2B:3C:4D:5E. Devices on your home LAN use these MAC addresses to send data directly to each other, ensuring information reaches the correct computer or printer within that specific local area. But what happens when you need to reach a computer that's not on your home network, maybe one located halfway across the country or even the world?

Network of Networks: The Internet

The core idea of the internet is simply expanding the concept of a local network, like the one in your home, to a global scale. It's still fundamentally about connecting computers so they can share information, whether wirelessly or through physical cables. While a MAC address is like a device's permanent serial number, it's only useful for identifying devices on the same local network. It doesn't contain any information about where the network is located in the world. To send data globally, we need an address that routers can use to navigate the vast web of interconnected networks. This is the job of IP addresses (Internet Protocol addresses). An IP address identifies a device on the internet in a way that can be located from anywhere in the world, allowing data packets to be routed across continents to the correct destination.

IP Addresses and ISP

You will encounter two versions of IP addresses. The older, more familiar version is IPv4, which looks like four numbers separated by dots (e.g., 192.0.2.10). When it was created, its structure allowed for about 4.3 billion unique addresses, which seemed like an enormous number at the time. However, with the explosive growth of internet-connected devices, from computers and phones to smart TVs and watches, the world has essentially run out of available IPv4 addresses.

To solve this problem, a new version was created: IPv6. Its addresses are much longer and more complex, looking like 2001:0db8:85a3::8a2e:0370:7334. The number of unique addresses IPv6 can provide is staggering: roughly 340 undecillion (that's 340 followed by 36 zeros). This massive address space ensures we won't run out again, easily accommodating the billions of new devices connecting to the internet every year. Today, both systems run in parallel, and your internet traffic likely uses a mix of both.
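The sizes of the two address spaces can be checked with a few lines of Python. This is a small sketch using the standard `ipaddress` module and the two example addresses from the text; the variable names are chosen here for illustration.

```python
import ipaddress

# Parse the example addresses used in the text.
v4 = ipaddress.ip_address("192.0.2.10")
v6 = ipaddress.ip_address("2001:0db8:85a3::8a2e:0370:7334")

# An IPv4 address is 32 bits long, an IPv6 address is 128 bits long.
ipv4_total = 2 ** v4.max_prefixlen   # 2**32
ipv6_total = 2 ** v6.max_prefixlen   # 2**128

print(v4.version, f"{ipv4_total:,}")     # 4 4,294,967,296
print(v6.version, f"{ipv6_total:.3e}")   # 6 3.403e+38
print(v6.compressed)                     # shortest spelling of the same address
print(v6.exploded)                       # all eight groups written out in full
```

The `::` in an IPv6 address is only shorthand for a run of all-zero groups, which is why the compressed and exploded forms refer to the same address.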

So, how does your home network get one of these crucial public IP addresses? This is handled by your Internet Service Provider (ISP). The ISP is the company you pay for your internet connection (like Comcast, AT&T, or Verizon). They operate a large regional network and manage the physical connection that links your home's local network to the global internet. When you connect, the ISP assigns your router a public IP address from their pool of available addresses. For most home users, this is a dynamic IP address, meaning it can change from time to time. Businesses that need a consistent address for their servers will often pay extra for a static IP address, which does not change.

Physical Infrastructure

The internet's origins trace back to ARPANET, a research network funded by the US Department of Defense and designed for resilient communication. As it evolved and became commercialized, the network expanded rapidly. This growth led to the development of specialized, powerful computers optimized not for general tasks but specifically for storing large amounts of data (like websites, videos, and emails) and handling requests from many users simultaneously. We call these specialized computers servers. Often, hundreds or thousands of these servers are housed together in large, secure, purpose-built facilities called data centers.

[Figure: data center]

These servers, data centers, and countless individual networks are all physically linked together, primarily through massive networks of fiber optic cables. Unlike the pigeons from RFC 1149, today’s data travels through these cables as pulses of light, at roughly two-thirds of the speed of light in a vacuum. The cables run underground and stretch across ocean floors to connect continents. Along these physical pathways, devices called routers act like postal sorting offices, reading the destination IP address on incoming data and directing it along the most efficient route to its destination.

[Figure: fiber optics and the Internet]
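To get a feel for how fast a fiber link is, the travel time of a signal can be estimated from the cable length. The sketch below is a rough calculation under stated assumptions: the refractive index of 1.47 is a typical value for glass fiber (light in the fiber moves at about two-thirds of its vacuum speed), and the 6,000 km cable length is invented for illustration.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458   # speed of light in a vacuum, km/s
FIBER_REFRACTIVE_INDEX = 1.47       # assumed typical value for glass fiber


def one_way_delay_ms(distance_km: float) -> float:
    """Propagation delay over a fiber link of the given length, in milliseconds."""
    speed_in_fiber = SPEED_OF_LIGHT_KM_S / FIBER_REFRACTIVE_INDEX
    return distance_km / speed_in_fiber * 1000


# 5 km matches the pigeon experiment; 6,000 km is an assumed ocean-crossing cable.
for distance in (5, 6_000):
    print(f"{distance:>5} km: {one_way_delay_ms(distance):.3f} ms one way")
```

Under these assumptions the ocean crossing takes roughly 29 milliseconds one way. Real connections are slower, because routers along the way add processing and queuing time.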

How?

Internet Data Transfer: Packets and Protocols

A key design principle of the Internet is reliability. Information needs to find its way to the destination even if some network paths are damaged or busy. The solution is to break data down into smaller pieces. Instead of sending a large file or webpage all at once, it's divided into many small, numbered units called packets. Think of it like sending a long letter as a series of numbered postcards. Each packet contains not only a piece of the original data but also crucial addressing information in its header, including the sender's and receiver's IP addresses and the packet's sequence number.

This packet approach offers major advantages. If one path becomes congested or fails, routers can send subsequent packets via different physical routes. If a single packet gets lost or corrupted along the way, only that small piece needs to be resent, not the entire message. But how do computers manage this complex process of breaking down, sending, receiving, and reassembling data correctly? They follow a set of strict rules, known as protocols.
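The postcard analogy can be sketched in a few lines of Python. This is a simplified illustration, not how a real network stack is written: the `Packet` class, the 8-byte packet size, and both IP addresses are assumptions made for the example.

```python
import random
from dataclasses import dataclass


@dataclass
class Packet:
    src: str        # sender's IP address
    dst: str        # receiver's IP address
    seq: int        # position of this piece in the original message
    payload: bytes  # the piece of data itself


def packetize(data: bytes, src: str, dst: str, size: int = 8) -> list[Packet]:
    """Split data into numbered packets of at most `size` bytes each."""
    return [
        Packet(src, dst, seq, data[start:start + size])
        for seq, start in enumerate(range(0, len(data), size))
    ]


def reassemble(packets: list[Packet]) -> bytes:
    """Put packets back in sequence order and join their payloads."""
    return b"".join(p.payload for p in sorted(packets, key=lambda p: p.seq))


message = b"A long letter sent as numbered postcards."
packets = packetize(message, src="192.0.2.10", dst="198.51.100.7")
random.shuffle(packets)   # packets may arrive in any order
print(reassemble(packets) == message)   # True
```

Because every packet carries its own sequence number, the order of arrival does not matter: the receiver can always sort the pieces back into place.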

You can think of a protocol as a common language and a set of instructions that all connected devices have agreed to use. Just as two people need to speak the same language to have a conversation, different computers made by different manufacturers need to follow the same protocol to communicate. This ensures that a device in one part of the world can understand the data sent from another, regardless of what hardware or software each is running. Without these standardized rules, the global internet would be chaotic.

The most fundamental protocols underpinning the internet are TCP/IP (Transmission Control Protocol / Internet Protocol).

  1. IP (Internet Protocol): This is the part responsible for addressing and routing. Its job is simply to get each individual packet from its source to the correct destination IP address. It acts like the postal service, focused only on delivering the envelope to the right mailbox, not on what's inside or in what order the envelopes arrive.

  2. TCP (Transmission Control Protocol): This protocol provides reliability. It works at both the sending and receiving ends. Before sending, TCP chops the data into numbered packets. At the destination, TCP collects the packets, checks the sequence numbers to ensure none are missing, and reassembles them in the correct order to reconstruct the original data exactly. The receiver acknowledges what it has received, so if a packet goes missing, the sender notices the gap in acknowledgments and resends just that packet.

Together, the TCP/IP suite provides the reliable, ordered data stream that makes the Internet work. IP handles the delivery of individual packets, and TCP makes sure the entire message gets there intact.
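The recovery of a lost packet can be illustrated with a toy simulation. This is a deliberately simplified sketch: real TCP numbers individual bytes rather than whole packets and relies on acknowledgments and timers, and the function name and sample data below are made up for the example.

```python
def missing_sequence_numbers(received: dict[int, bytes], total: int) -> list[int]:
    """Return the sequence numbers that have not arrived yet."""
    return [seq for seq in range(total) if seq not in received]


# The sender keeps every chunk until it knows the receiver has it.
sent = {0: b"How ", 1: b"the ", 2: b"web ", 3: b"works"}

# Simulate the network delivering packets out of order and dropping packet 2.
received = {3: sent[3], 0: sent[0], 1: sent[1]}

for seq in missing_sequence_numbers(received, total=len(sent)):
    received[seq] = sent[seq]   # the sender retransmits only the missing chunk

message = b"".join(received[seq] for seq in sorted(received))
print(message)   # b'How the web works'
```

Only the one missing chunk crosses the network a second time; the three packets that arrived safely are not sent again.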

Domain Name System

Of course, when you want to visit a website, you don't type in a numerical IP address like 172.217.160.142. Instead, you use a human-friendly domain name like www.jetbrains.com. The system that translates these easy-to-remember names into the IP addresses computers need is called the Domain Name System (DNS). You might imagine it as a globally distributed, constantly updated phonebook for the internet.

When you enter a domain name like www.jetbrains.com, your computer first contacts a DNS resolver, which is a server (often operated by your ISP) whose job is to find the IP address you need. If this resolver has recently looked up the same address, it will have the answer stored in its memory (a "cache") and can return it instantly.

If not, the resolver begins a query, asking a series of increasingly authoritative servers. Think of it like a chain of command. The resolver might first ask a root server, which acts like a master index for the entire internet. The root server won't know the IP for jetbrains.com, but it knows which server manages all .com addresses and will point the resolver there. The resolver then asks the .com server, which in turn knows which specific name server is the official record-keeper for the jetbrains.com domain. Finally, the resolver asks the JetBrains name server, which provides the correct IP address. This IP is then sent back to your computer so it knows where to send its packets.
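The chain of lookups described above can be modeled with plain dictionaries. This is a sketch only: no real DNS queries are made, the server names are invented placeholders, and 192.0.2.10 is a documentation address standing in for the real answer.

```python
# Hypothetical in-memory stand-ins for real DNS servers.
ROOT_SERVER = {"com": "com-tld-server"}
TLD_SERVERS = {"com-tld-server": {"jetbrains.com": "jetbrains-name-server"}}
AUTHORITATIVE_SERVERS = {
    "jetbrains-name-server": {"www.jetbrains.com": "192.0.2.10"},  # made-up IP
}

cache: dict[str, str] = {}


def resolve(hostname: str) -> str:
    """Look up a hostname step by step, the way a DNS resolver would."""
    if hostname in cache:                # answered recently: reuse the result
        return cache[hostname]
    labels = hostname.split(".")
    tld = labels[-1]                     # "com"
    domain = ".".join(labels[-2:])       # "jetbrains.com"
    # 1. The root server points to the server responsible for .com.
    tld_server = ROOT_SERVER[tld]
    # 2. The .com server points to the name server for the domain.
    name_server = TLD_SERVERS[tld_server][domain]
    # 3. The domain's name server returns the IP address.
    address = AUTHORITATIVE_SERVERS[name_server][hostname]
    cache[hostname] = address
    return address


print(resolve("www.jetbrains.com"))   # 192.0.2.10 (walks the whole chain)
print(resolve("www.jetbrains.com"))   # 192.0.2.10 (served from the cache)
```

A real resolver also discards cached answers after a time limit set by the domain's owner, so that changes to an address eventually reach everyone.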

To deliver content even faster once the address is known, especially popular content, companies often use a Content Delivery Network (CDN). A CDN is a network of servers placed in various locations around the world. These servers store copies, or a cache, of frequently accessed data like images and videos. When you request a video, a CDN can deliver it from a server that is physically much closer to you, rather than from the original server halfway across the world. This significantly reduces the travel time for data packets, making websites and applications feel much faster.
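The idea of serving a file from the closest available copy can be sketched as follows. Everything here is hypothetical: the server names, the latency figures, and the cache contents are invented, and real CDNs choose a server through mechanisms such as DNS-based steering rather than a lookup table like this one.

```python
# Hypothetical round-trip times, in milliseconds, from one user to each server.
ORIGIN = "origin-us-east"
latency_ms = {
    "origin-us-east": 140,
    "edge-frankfurt": 12,
    "edge-singapore": 180,
}

# Which files each edge server currently holds a cached copy of.
edge_cache = {
    "edge-frankfurt": {"intro.mp4"},
    "edge-singapore": {"intro.mp4", "logo.png"},
}


def pick_server(filename: str) -> str:
    """Return the lowest-latency server that can serve the file."""
    candidates = [edge for edge, files in edge_cache.items() if filename in files]
    candidates.append(ORIGIN)   # the origin server always has the file
    return min(candidates, key=latency_ms.__getitem__)


print(pick_server("intro.mp4"))   # edge-frankfurt
print(pick_server("logo.png"))    # origin-us-east
```

In the second call the only edge server holding the file is farther away than the origin, so the origin still wins. A cached copy helps only when it is actually closer to the user.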

Conclusion

In this topic, we saw how a request from your computer follows a physical path. The Domain Name System acts as the internet's phonebook to find a server's IP address, and your data is then broken down into packets. These packets travel from your local network, through an ISP, and onto the network of fiber optic cables. This process shows that the internet is not an abstract 'cloud' but a real, physical system. The carrier pigeon story, while a joke, correctly illustrated the core idea: information must be physically delivered.

For a full-stack developer, understanding this physical layer provides a crucial foundation. It explains why some connections are slow, how data integrity is maintained with protocols like TCP, and what is happening when a user can't reach a server. Knowing that the internet is a tangible system of hardware and agreed-upon rules is the first step to truly understanding how the web works.
