Is there a way to get the min, max, median, and average of a list of numbers in a single command?

  • I have a list of numbers in a file, one per line. How can I get the minimum, maximum, median and average values? I want to use the results in a bash script.

    Although my immediate situation is for integers, a solution for floating-point numbers would be useful down the line, but a simple integer method is fine.

  • lesmana

    lesmana Correct answer

    9 years ago

    You can use the R programming language.

    Here is a quick and dirty R script:

    #! /usr/bin/env Rscript
    d<-scan("stdin", quiet=TRUE)
    cat(min(d), max(d), median(d), mean(d), sep="\n")
    

    Note the "stdin" argument to scan: it is a special filename that makes scan read from standard input (that is, from pipes or redirections).

    Now you can redirect your data over stdin to the R script:

    $ cat datafile
    1
    2
    4
    $ ./mmmm.r < datafile
    1
    4
    2
    2.333333
    

    Also works for floating points:

    $ cat datafile2
    1.1
    2.2
    4.4
    $ ./mmmm.r < datafile2
    1.1
    4.4
    2.2
    2.566667
    

    If you don't want to write an R script file, you can invoke a true one-liner (the line break is only for readability) on the command line using Rscript:

    $ Rscript -e 'd<-scan("stdin", quiet=TRUE)' \
              -e 'cat(min(d), max(d), median(d), mean(d), sep="\n")' < datafile
    1
    4
    2
    2.333333
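
Since the question asks for the results inside a bash script, here is a minimal sketch of capturing the four output lines in shell variables. The function name `get_stats` is hypothetical, and its body is a stand-in that prints the sample output shown above so that the sketch runs without R installed; in a real script the body would be the Rscript one-liner.

```shell
# Hypothetical helper: a stand-in for the Rscript one-liner shown above.
# In a real script, replace the printf with something like:
#   Rscript -e 'd<-scan("stdin", quiet=TRUE)' \
#           -e 'cat(min(d), max(d), median(d), mean(d), sep="\n")' < datafile
get_stats() {
    printf '1\n4\n2\n2.333333\n'
}

# Capture all four lines at once, then split them on whitespace
# into the positional parameters (this relies on the default IFS).
stats=$(get_stats)
set -- $stats

min=$1
max=$2
median=$3
mean=$4

echo "min=$min max=$max median=$median mean=$mean"
```

The command substitution keeps the assignments in the current shell; piping straight into `read` would set the variables in a subshell in bash and lose them afterwards.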
    

    Read the fine R manuals at http://cran.r-project.org/manuals.html.

    Unfortunately the full reference is only available in PDF. Another way to read the reference is by typing ?topicname in the prompt of an interactive R session.


    For completeness: there is an R command which outputs all the values you want and more. Unfortunately, it prints them in a human-friendly format that is hard to parse programmatically.

    > summary(c(1,2,4))
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
      1.000   1.500   2.000   2.333   3.000   4.000 
    

    It looks interesting. I'll have a closer look at it tomorrow. According to Wikipedia's page, "R has become a de facto standard among statisticians", which is a significant accolade. I actually tried to download it the other day (I kept seeing it mentioned), but I couldn't find it in the Ubuntu repo. I'll follow it up tomorrow.

    In the Ubuntu (and Debian?) repo the package is named `r-base`.

    Thanks, I needed that name reference :) I didn't think of searching for r- in the Synaptic search field, and it doesn't act on a lone character. I've tried it out now, and it looks ideal. The `R` language is clearly the best for my requirement in this situation. As per Gilles' answer, the `Rscript` interface to script files is most appropriate (vs. `R`, which is the interactive interface), and R in the terminal makes for a handy calculator or test environment (like Python :)

    (+1) I love R. I can't recommend it enough.

    If you have data on stdin, you can use such a one-liner: `{ echo 'd<-scan()'; cat; echo; echo 'summary(d)'; } | R --slave`

    or just `cat datafile | Rscript -e 'print(summary(scan("stdin")));'`

    If you want to parse the output of `summary()`, use `tail -n +2 | awk '{print $N}'` where `N` is the column you want
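
The column-picking idea in the comment above can be sketched as follows. To keep the example runnable without R, the variable `summary_output` (a hypothetical name) holds the `summary(c(1,2,4))` text copied from the answer; in practice it would come from an Rscript call, whose output may contain extra lines that need skipping as well.

```shell
# Sample text copied from the summary(c(1,2,4)) output shown in the answer.
summary_output='   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  1.000   1.500   2.000   2.333   3.000   4.000 '

# Skip the header line, then pick fields from the value line by position:
#   1=Min.  2=1st Qu.  3=Median  4=Mean  5=3rd Qu.  6=Max.
# (Count columns on the value line: the header has extra words
# because "1st Qu." and "3rd Qu." each contain a space.)
median=$(printf '%s\n' "$summary_output" | tail -n +2 | awk '{print $3}')
mean=$(printf '%s\n' "$summary_output" | tail -n +2 | awk '{print $4}')

echo "median=$median mean=$mean"
```

Note that `summary()` rounds its printed values, so the mean appears as 2.333 here rather than the 2.333333 that `mean(d)` prints via `cat`.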

    Very nice, actually my very first R script ever run :)

Licensed under CC-BY-SA with attribution


Content dated before 6/26/2020 9:53 AM