Pipe to multiple commands

Building pipelines out of different commands is the bread and butter of every Unix/Linux shell user. But what if you want to reuse the output of one command in multiple other commands?

Pipes

Arguably the most famous element of the Unix philosophy is the demand to

write programs that do one thing and do it well.

Complex applications can be built from these simple components, sticking them together to match their respective needs. But in order to do so, the building blocks have to communicate over a very generic interface. This is where pipes come in: they connect the standard output (stdout) of one component to the standard input (stdin) of another. I won’t go into the details here [1], but the important point is that normally you have one command which generates some content and sends it via a pipe to the next command down the line. For instance:

cat my_lengthy_treatise_on_unix | grep unix

cat prints the content of a file (here my_lengthy_treatise_on_unix), while grep scans its input for a certain pattern (here simply unix) and outputs the matching lines.

Tee

This is a very common pattern in Unix shell scripts. But it gets more complex if you want to apply multiple operations to the same input. For example, suppose you want to analyze the content of the text file mentioned before by filtering it with several patterns. But instead of concatenating the matching lines, you want to write the matches for each pattern to a separate file:

              +----------------+     +---------+
    +-------->+ grep pattern1  +---->+ output1 |
    |         +----------------+     +---------+
+---+---+
| Input |
+---+---+
    |         +----------------+     +---------+
    +-------->+ grep pattern2  +---->+ output2 |
              +----------------+     +---------+

There is a command made for exactly this purpose: tee. As its man page states:

read from standard input and write to standard output and files

And that’s almost all there is to it. But mind the plural form of “files”: tee allows you to send stdin to as many files as you want, and finally also to stdout. Since there is only one stdout, it is of no interest for us here, so we can get rid of it by redirecting it to /dev/null:

cat input | tee file1 file2 >/dev/null
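
If you want to convince yourself that tee really duplicates its input, here is a minimal check (the file names are just examples):

printf 'line1\nline2\n' | tee copy1.txt copy2.txt >/dev/null
diff copy1.txt copy2.txt   # prints nothing: both copies are identical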

Unfortunately, redirecting to files isn’t quite what we want either, because we need to further process the streaming data instead of just dumping it to files. But what if we could “disguise” commands as files? That’s where process substitution comes into play.

Process substitution

Process substitution [2] is a mechanism that some shells (like Bash, Zsh or the Korn shell) use to trick commands into thinking that they communicate with a file (either as input or output), which is in reality replaced by the shell with a command or a pipeline of commands. The syntax is as follows:

command_reading_data <(command_producing_input)   # Process substitution for input
command_writing_data >(command_processing_output) # Process substitution for output
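
You can even see the “disguise” directly. On Linux, Bash typically implements process substitution with a path under /dev/fd, and that path is what the command receives as an ordinary argument:

echo <(true)   # prints something like /dev/fd/63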

A quite common use case is to normalize two data sources before comparing them:

diff <(sort file1) <(sort file2)
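
The output form works analogously: the substituted command consumes whatever is written to the “file”. A minimal sketch (note the required space between the redirection operator and the opening parenthesis):

echo "hello" > >(tr a-z A-Z)   # prints HELLO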

As you may have guessed by now, we can use process substitution for the searching and dumping part of our example. And so we end up with the following solution:

cat input | tee >(grep "pattern1" >output_file_1) >(grep "pattern2" >output_file_2) >/dev/null
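
To try it out end to end, here is a small self-contained run (file names and patterns are made up):

printf 'foo\nbar\nfoobar\n' >input
cat input | tee >(grep "foo" >matches_foo) >(grep "bar" >matches_bar) >/dev/null
cat matches_foo   # foo, foobar
cat matches_bar   # bar, foobar

One caveat: the substituted grep processes run asynchronously, so a script that reads the output files immediately afterwards may have to give them a moment to finish.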

  1. Wikipedia gives an overview of the inner workings of pipes.

  2. For many more examples, see the respective chapter of the Advanced Bash-Scripting Guide.

