This is the third in a series of articles praising some of the tools that I have found to be “Secret Weapons”—tools that have proven unreasonably and repeatedly effective throughout my career, and that I think are not as well known or appreciated as they should be.
Let us now praise pv, which seems to stand for either “pipe viewer” or “progress viewer.”
From the man page:
pv shows the progress of data through a pipeline by giving information such as time elapsed, percentage completed (with progress bar), current throughput rate, total data transferred, and ETA.
You can insert it at any point in a shell pipeline to get realtime feedback about how much data has moved through that point in the pipeline. For example, if you are kicking off a big, long-running shell process, and you want to see progress as you go, you can run:
generate_input.sh | pv | expensive_computation.sh
As the process runs, pv will refresh a status line on standard error (the data on standard output passes through untouched) that looks something like this:
66.0MiB 0:00:06 [10.1MiB/s] [<=> ]
The numbers (indicating how much data has passed through pv so far, how long it has been running, and the rate of data per second) will update, and the <=> will bounce back and forth to show you that it’s working.
If you’re operating on a file, you can have pv read it directly. Because pv then knows the size of the input, it will print a “percent done” indicator and ETA instead of the bouncing arrow:
pv big_file.txt | expensive_computation.sh
9.07M 0:00:02 [4.48M/s] [=> ] 9% ETA 0:00:20
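As a minimal, self-contained sketch of the file case, the script below builds a throwaway file and checks that reading it through pv delivers the same bytes as reading it directly. The temporary file, the `progress` helper, and the fallback to cat when pv is not installed are assumptions made for illustration; they are not part of pv itself.

```shell
#!/bin/sh
# Hypothetical stand-in for big_file.txt: a temporary file of numbered lines.
big_file=$(mktemp)
seq 1 300000 > "$big_file"

# Illustrative helper: use pv when it is installed, otherwise fall back to cat
# so the pipeline still runs (just without a progress display).
progress() {
    if command -v pv >/dev/null 2>&1; then
        pv "$@"
    else
        cat "$@"
    fi
}

# Checksum of the file read directly.
checksum_direct=$(cksum < "$big_file")

# Checksum of the same file read through pv. The command substitution
# captures only standard output; pv draws its status line on standard error.
checksum_via_pv=$(progress "$big_file" | cksum)

rm -f "$big_file"

if [ "$checksum_direct" = "$checksum_via_pv" ]; then
    echo "data passed through unchanged"
else
    echo "data was altered" >&2
fi
```

Because the status line goes to standard error, the pipeline's output can be captured or redirected without the progress display getting mixed into the data.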
Even if you’re not dealing with files, you can still get the percent done & ETA if you know the size of the input, in lines or bytes. E.g., if you know generate_input.sh will produce 100,000,000 lines of input:
generate_input.sh | pv -l -s 100000000 | expensive_computation.sh
-l indicates “line mode,” in which pv counts lines instead of bytes, and -s gives the expected size (interpreted as a number of lines when -l is set).
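The same idea can be tried at a smaller scale. In this sketch, `generate_input` is a hypothetical stand-in for generate_input.sh that emits a known number of lines, and the `progress` helper (which silently falls back to cat when pv is missing) is likewise an assumption added so the script runs anywhere.

```shell
#!/bin/sh
# Assumed line count for the demo; the article's example uses 100,000,000.
total_lines=100000

# Hypothetical stand-in for generate_input.sh: emits exactly total_lines lines.
generate_input() {
    seq 1 "$total_lines"
}

# Illustrative helper: pass the flags to pv when it is installed; otherwise
# ignore them and copy standard input to standard output with cat.
progress() {
    if command -v pv >/dev/null 2>&1; then
        pv "$@"
    else
        cat
    fi
}

# -l counts lines rather than bytes; -s tells pv how many lines to expect,
# which lets it show a percentage and an ETA for piped input.
last_line=$(generate_input | progress -l -s "$total_lines" | tail -n 1)
echo "last line seen: $last_line"
```

If the value given to -s is only an estimate, the percentage and ETA will be correspondingly approximate, but the data itself is unaffected.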
When I used to run long processing jobs in the days before a colleague introduced me to pv, I either had to build in my own progress indicators all through the pipeline (expensive & tedious, and so rarely done), or else watch the silent command line wondering whether the command was frozen or working and, if it was working, how long it would take to finish. Now that doesn’t happen any more.
pv is an incredibly valuable tool to have in your long-running job belt, especially when paired with tmux (the subject of an upcoming article).