Pipes are cool! We saw how handy they are in a previous blog post. Let’s look at a typical way to use the pipe operator. We have some output, and we want to look at the first lines of the output. Let’s download The Brothers Karamazov by Fyodor Dostoevsky, a fairly long novel.
> wc -l karamazov.txt
36484 karamazov.txt
If we cat this file, it will be printed to the terminal.
> cat karamazov.txt
... many lines of text!
***FINIS***
It takes a noticeable amount of time to finish.
Now let’s just look at the first two lines by piping it into head.
> cat karamazov.txt | head -n 2
The Project Gutenberg EBook of The Brothers Karamazov by Fyodor
Dostoyevsky
Now it’s done in an instant.
It seems that the cat operation terminates when head is done!
Of course, head -n 2 only needs two lines of input to output what it’s supposed to output.
But how does cat know to stop when head is finished?
In this blog post, we’ll learn a bit about how pipes work, and write a small cat clone in Python and Go.
Processes in a pipeline are started simultaneously. We can confirm this by running
> sleep 100 | head
in one terminal window, and taking a look at running processes with ps in another.
> ps
PID TTY TIME CMD
52892 ttys007 0:00.00 sleep 100
52893 ttys007 0:00.00 head
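The same check can be sketched from Python instead of with ps. This is only an illustration: the helper names start_pipeline and stop_pipeline are made up here, and two small Python child processes stand in for sleep and head.

```python
import subprocess
import sys


def start_pipeline():
    """Start a sleeping producer piped into a reading consumer, like sleep | head."""
    producer = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(30)"],
        stdout=subprocess.PIPE,
    )
    consumer = subprocess.Popen(
        [sys.executable, "-c", "import sys; sys.stdin.read()"],
        stdin=producer.stdout,
    )
    # Drop the parent's copy of the read end so only the consumer holds it.
    producer.stdout.close()
    return producer, consumer


def stop_pipeline(producer, consumer):
    """Kill the producer; the consumer then sees end-of-file and exits on its own."""
    producer.kill()
    producer.wait()
    consumer.wait()
```

Right after start_pipeline() returns, poll() on both processes gives None, meaning both are alive at the same time: the consumer is simply blocked waiting for input that the producer has not written yet.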
From the Unix manual page on pipes, man 7 pipe, we learn that if we pipe process1 into process2, the second process will wait (block) until it receives input.
Furthermore, when process2 finishes, it closes its end of the pipe.
If process1 then writes to the pipe, a SIGPIPE signal is generated for it.
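To see this in isolation, here is a minimal sketch that plays both roles inside one Python process. The function name write_to_closed_pipe is made up for illustration. Python ignores SIGPIPE at startup, so the failed write shows up as a BrokenPipeError carrying errno EPIPE rather than killing the process.

```python
import errno
import os


def write_to_closed_pipe() -> int:
    """Write to a pipe whose read end is closed and return the resulting errno."""
    read_fd, write_fd = os.pipe()
    os.close(read_fd)  # the reading side (think: head) goes away
    try:
        os.write(write_fd, b"hello\n")
    except BrokenPipeError as exc:
        return exc.errno
    finally:
        os.close(write_fd)
    return 0  # only reached if the write unexpectedly succeeded
```

Calling write_to_closed_pipe() should give back errno.EPIPE, which is the 32 that shows up in the [Errno 32] Broken pipe messages in the next section.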
Writing a cat clone in Python
Let’s try to write a simple cat clone that mimics this behavior.
Instead of reading from a file, let’s read from standard input.
That means we should be able to pipe into it.
In Python, standard input is found in sys.stdin.
# cat.py
import sys
for line in sys.stdin:
    print(line, end='')
This behaves the same as the regular cat command if we pipe into it.
> cat karamazov.txt | python cat.py
... many lines of text!
***FINIS***
But does it also stop when we pipe it into head?
> cat karamazov.txt | python cat.py | head -n 2
The Project Gutenberg EBook of The Brothers Karamazov by Fyodor
Dostoyevsky
Traceback (most recent call last):
  File "cat.py", line 4, in <module>
    print(line, end='')
BrokenPipeError: [Errno 32] Broken pipe
Exception ignored in: <_io.TextIOWrapper name='<stdout>' mode='w' encoding='UTF-8'>
BrokenPipeError: [Errno 32] Broken pipe
We see that we get an error BrokenPipeError.
But if we look closely at the output here, we see that it’s printed twice.
We can try catching the error with a try/except.
# cat.py
import sys
for line in sys.stdin:
    try:
        print(line, end='')
    except BrokenPipeError:
        pass
Now we only get the second part of the error output from before.
> cat karamazov.txt | python cat.py | head -n 2
The Project Gutenberg EBook of The Brothers Karamazov by Fyodor
Dostoyevsky
Exception ignored in: <_io.TextIOWrapper name='<stdout>' mode='w' encoding='UTF-8'>
BrokenPipeError: [Errno 32] Broken pipe
Since we know that a process gets a SIGPIPE when it writes to a pipe whose reader has terminated, maybe this means that this signal isn’t handled properly?
Indeed, a quick search leads us to a section of the Python documentation, namely Note on SIGPIPE.
It gives a recipe for handling it.
Our program now becomes
# cat.py
import sys
import os
for line in sys.stdin:
    try:
        print(line, end='')
    except BrokenPipeError:
        devnull = os.open(os.devnull, os.O_WRONLY)
        os.dup2(devnull, sys.stdout.fileno())
        break  # the reader is gone, so stop reading input
and it works as we want, by getting rid of the pesky error message.
> cat karamazov.txt | python cat.py | head -n 2
The Project Gutenberg EBook of The Brothers Karamazov by Fyodor
Dostoyevsky
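One possible restructuring, sketched here as an assumption rather than as the documentation's exact code, is to wrap the whole loop in a single try and flush inside it, so that a broken pipe surfaces in one place and the program stops reading as soon as it happens. The names copy_lines and main are made up for illustration.

```python
import os
import sys


def copy_lines(src, dst) -> bool:
    """Copy src to dst line by line; return False if the reader went away."""
    try:
        for line in src:
            dst.write(line)
        dst.flush()  # force a pending BrokenPipeError to surface inside the try
    except BrokenPipeError:
        return False
    return True


def main() -> int:
    """Copy standard input to standard output; return a process exit code."""
    if copy_lines(sys.stdin, sys.stdout):
        return 0
    # Point file descriptor 1 at /dev/null so the flush at exit cannot fail again.
    devnull = os.open(os.devnull, os.O_WRONLY)
    os.dup2(devnull, sys.stdout.fileno())
    return 1
```

To use it as a script, end the file with sys.exit(main()). Passing the streams in as parameters keeps copy_lines easy to exercise with in-memory objects.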
But an explanation of how this works would be nice. The recipe has a comment saying: “Python flushes standard streams on exit; redirect remaining output to devnull to avoid another BrokenPipeError at shutdown.”
What we do here, apparently, is redirect whatever output is still waiting to be flushed to /dev/null.
/dev/null refers to [the null device](https://en.wikipedia.org/wiki/Null_device), which accepts output, but discards it all.
In other words, to not print something, we can redirect output there.
The first line in the except BrokenPipeError block opens /dev/null.
The second line calls os.dup2, which takes two file descriptors.
It “duplicate[s] file descriptor fd to fd2, closing the latter first if necessary.”
So in the second line we duplicate /dev/null into standard output, and possibly close standard output first.
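A small experiment may make os.dup2 less mysterious. In this sketch (the function name redirect_demo is made up, and a temporary file stands in for /dev/null so that we can look at what arrives), a descriptor that used to be the write end of a pipe is re-pointed at the file. Anything written to that descriptor afterwards lands in the file.

```python
import os
import tempfile


def redirect_demo() -> bytes:
    """After os.dup2(src, dst), writes to dst end up wherever src points."""
    with tempfile.TemporaryFile() as target:
        read_fd, write_fd = os.pipe()
        # write_fd stops being a pipe end and becomes a second name for the file.
        os.dup2(target.fileno(), write_fd)
        os.write(write_fd, b"redirected\n")
        os.close(write_fd)
        os.close(read_fd)
        target.seek(0)
        return target.read()
```

The cat clone does the same thing with standard output's descriptor and /dev/null: when Python flushes at exit, the write goes to a descriptor that accepts it and throws it away, so no second BrokenPipeError is raised.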
But I don’t really understand this. It turns out that another way of silencing the output is to close standard error.
import sys
for line in sys.stdin:
    try:
        print(line, end='')
    except BrokenPipeError:
        sys.stderr.close()
Writing a cat clone in Go
I want to learn some Go, so let’s try writing the same program in it. In Go, we can apparently simply copy from standard input to standard output. The first program becomes a tidy one-liner.
// cat.go
package main

import (
	"io"
	"os"
)

func main() {
	io.Copy(os.Stdout, os.Stdin)
}
When we run this, it also complains about SIGPIPE:
> cat karamazov.txt | go run cat.go | head -n 2
The Project Gutenberg EBook of The Brothers Karamazov by Fyodor
Dostoyevsky
signal: broken pipe
Turns out ignoring the SIGPIPE signal is also one line.
// cat.go
package main

import (
	"io"
	"os"
	"os/signal"
	"syscall"
)

func main() {
	signal.Ignore(syscall.SIGPIPE)
	io.Copy(os.Stdout, os.Stdin)
}
I hope you learned something reading this!
This post got some attention on Hacker News, and many valid concerns were raised. You should read the comments there!
If you liked this post, you might like my other post Problem solving with Unix commands.