
File descriptors and pipes


When working with files, you often need to write information to them, and scripts can make this task much easier. But before we learn how to write data to files, let's get to know what file descriptors are. In Unix systems, a program refers to every input/output (IO) resource it has open through a descriptor, which tells the system where an IO operation should read data from or write data to. Now, let's take a closer look at what a descriptor is and how to use it.

What is a descriptor

A descriptor is a non-negative integer that identifies a file or other IO resource opened by a program. For the rest of this topic, we'll refer to all these resources simply as 'files'. Descriptors are allocated as files are opened: each newly opened file gets the lowest-numbered descriptor that is not currently in use. A program normally starts with three descriptors already open, which give it access to the standard IO streams: standard input, standard output, and standard error:

Standard Input (stdin) - Descriptor 0: This is the channel through which a program receives data to process. In simple terms, stdin is like a "mailbox" where the program collects the incoming data it needs to work on. For example, when you type commands into a terminal, you are sending data to the shell through its stdin.
Standard Output (stdout) - Descriptor 1: This is where the program sends its results after processing the data.
Standard Error (stderr) - Descriptor 2: This stream is used for error messages and diagnostics, so they can be kept separate from the regular output.
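To see all three streams in action, here is a minimal bash sketch. The function name process_lines is hypothetical and chosen only for illustration: it reads lines from stdin (descriptor 0), copies non-empty lines to stdout (descriptor 1), and reports empty lines on stderr (descriptor 2) using the >&2 syntax that is explained later in this topic.

```shell
#!/usr/bin/env bash
# Hypothetical helper that touches all three standard streams.
process_lines() {
  # `read` takes its data from descriptor 0 (stdin)
  while IFS= read -r line; do
    if [ -n "$line" ]; then
      printf '%s\n' "$line"                  # descriptor 1: stdout
    else
      printf 'skipped an empty line\n' >&2   # descriptor 2: stderr
    fi
  done
}

# Feed three lines (the middle one is empty) into the function's stdin
printf 'first\n\nsecond\n' | process_lines
```

By default, both stdout and stderr are shown in the same terminal, which is exactly why being able to redirect them separately is useful.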

File redirections

Redirection is a widely used feature of Unix/Linux shells: instead of printing a command's output to the terminal, you can write it to a file. Two operators, >> and >, serve this purpose, and both create the file if it does not exist yet:

>> appends new content to the end of the file without removing the old information.

# adding new information to a diary
echo "Also my favorite things are ..." >> diary.txt

> overwrites the file, removing any old content.

# rewriting the file content
echo "My new everlasting love is ..." > secret.txt
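A quick way to see the difference is to use both operators on the same file. This bash sketch works inside a temporary directory created with mktemp -d, so no real files are touched; the file name diary.txt is only an example.

```shell
#!/usr/bin/env bash
# Work in a throwaway directory
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

echo "old entry" > diary.txt              # creates diary.txt
echo "Day 1: a fresh start" > diary.txt   # > replaces "old entry"
echo "Day 2: more notes" >> diary.txt     # >> keeps Day 1 and adds Day 2

cat diary.txt   # shows the Day 1 and Day 2 lines only
```

The first line disappears because > truncates the file before writing, while >> only ever adds to the end.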

Great, now you know how to write information to files. But what if you need to control where a particular stream, such as stderr, is sent?

Redirections of descriptors

As mentioned earlier, using scripts can sometimes simplify writing information to files. Let's consider an example.

Let's say you need to send two different messages to two different destinations with a single command. You can create a program.sh script for this task. The script contains two echo commands: one prints a normal message and the other prints an error message.

echo "Just a normal message"
echo "ERROR!" >&2

In this example, >&2 directs the "ERROR!" message to Standard Error (stderr), represented by descriptor 2.

To execute this, you can run:

$ bash program.sh 1> /tmp/log.txt 2> /dev/null

Here, 1> /tmp/log.txt redirects the normal message (stdout) to /tmp/log.txt, while 2> /dev/null sends the error message (stderr) to /dev/null, a special file that discards any data written to it.
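The whole example can be reproduced end to end with the sketch below. To avoid overwriting anything, it assumes a temporary directory from mktemp -d instead of /tmp/log.txt; the script body is the same two echo commands shown above.

```shell
#!/usr/bin/env bash
# Recreate program.sh in a temporary directory
work_dir=$(mktemp -d)
cat > "$work_dir/program.sh" <<'EOF'
echo "Just a normal message"
echo "ERROR!" >&2
EOF

# stdout (1) goes to log.txt, stderr (2) is discarded
bash "$work_dir/program.sh" 1> "$work_dir/log.txt" 2> /dev/null

cat "$work_dir/log.txt"   # prints: Just a normal message
```

Nothing appears in the terminal when the script runs: the normal message lands in log.txt and the error message is thrown away.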

You can also redirect stderr on its own. If you want to collect all error messages in one place, run bash program.sh 2> errors.txt: the error message goes to errors.txt, while the normal message still appears in the terminal.

If you omit the descriptor number, > redirects descriptor 1 by default, so bash program.sh 1> file and bash program.sh > file do exactly the same thing: both redirect stdout.
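Both points can be checked with a short bash sketch. It writes the same two-line program.sh into a temporary directory (an assumption made to keep the example self-contained), captures only stderr, and then compares the files produced by 1> and >.

```shell
#!/usr/bin/env bash
work_dir=$(mktemp -d)
cd "$work_dir" || exit 1
# The same two echo commands as in program.sh
printf '%s\n' 'echo "Just a normal message"' 'echo "ERROR!" >&2' > program.sh

# Only stderr is captured; the normal message still reaches the terminal
bash program.sh 2> errors.txt
cat errors.txt                      # prints: ERROR!

# With no number, > means 1>, so both files get identical content
bash program.sh 1> out1.txt 2> /dev/null
bash program.sh > out2.txt 2> /dev/null
cmp -s out1.txt out2.txt && echo "same output"
```

cmp -s compares the two files silently and succeeds only if they are byte-for-byte identical.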

This covers the basics of redirecting output with descriptors. There is one more way to redirect IO streams: a pipeline, built with the pipe operator |.

Pipe operator

The pipe (|) connects the standard output of one command to the standard input of another. You can chain multiple commands like this: command1 | command2 | command3 .... In this arrangement, everything the first command writes to stdout becomes the input of the second command, as if you were typing it from the keyboard. Each command processes its input and passes its own output to the next one, and so on. Note that the commands in a pipeline run concurrently: the second command can start reading data as soon as the first one writes it, without waiting for the first command to finish.
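Concurrency is easiest to see with a command that never stops on its own. In this bash sketch, yes prints "y" endlessly; if the pipe waited for it to finish, the first pipeline would hang forever. The second pipeline shows a three-stage chain with made-up input lines.

```shell
#!/usr/bin/env bash
# `yes` never finishes by itself, yet this pipeline ends at once:
# head reads lines while yes is writing them, exits after three,
# and yes is then stopped (by SIGPIPE) on its next write to the closed pipe.
yes | head -n 3

# Three stages: generate lines, sort them, keep the first two
printf 'cherry\napple\nbanana\n' | sort | head -n 2   # apple, banana
```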

For example, let's look at a simple pipeline that uses echo and grep. The echo command prints the three lines "Hmm...", "Brr...", and "Mmm..." (the -e option makes echo interpret \n as a newline), and the grep command keeps only the lines that contain the lowercase letter "m":

$ echo -e "Hmm...\nBrr...\nMmm..." | grep "m"

The output will be:

Hmm...
Mmm...

We'll cover the grep command in more detail later. Pipelines can chain many more commands into complex processing steps, but for now the main thing is to understand what a pipeline is for.
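Keep in mind that only stdout travels through the pipe. In this bash sketch, the function both_streams is hypothetical: it writes one line to stdout and one to stderr, and both lines contain the letter "m".

```shell
#!/usr/bin/env bash
# Hypothetical function that writes one line to each output stream
both_streams() {
  echo "Hmm... (stdout)"
  echo "Mmm... (stderr)" >&2
}

# grep -c counts matching lines. It prints 1: the stderr line also
# contains "m", but it goes straight to the terminal, not into the pipe.
both_streams | grep -c "m"
```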

Conclusion

To sum up:

Descriptors allow us to access IO streams;
The > and >> operators can be used for file redirection;
> overwrites file content, while >> appends to it;
The output of a specific descriptor can be redirected by putting its number before the operator (for example, 2> errors.txt);
A pipeline (|) lets us send the standard output (stdout) of one command to the standard input (stdin) of another.
