How to use a shell command to only show the first column and last column in a text file?

  • I need some help to figure out how to use the sed command to only show the first column and last column in a text file. Here is what I have so far for column 1:

    cat logfile | sed 's/\|/ /'|awk '{print $1}'
    

    My feeble attempt at getting the last column to show as well was:

    cat logfile | sed 's/\|/ /'|awk '{print $1}{print $8}'
    

    However, this takes the first and last columns and merges them into a single list. Is there a way to print the first and last columns clearly, side by side, with sed and awk commands?

    Sample input:

    foo|dog|cat|mouse|lion|ox|tiger|bar
    

    Please provide some sample input.

  • Almost there. Just put both column references next to each other.

    cat logfile | sed 's/|/ /g' | awk '{print $1, $8}'
    

    Also note that you don't need cat here.

    sed 's/|/ /g' logfile | awk '{print $1, $8}'
    

    Also note that you can tell awk that the column separator is |, instead of blanks, so you don't need sed either.

    awk -F '|' '{print $1, $8}' logfile
    

    As per suggestions by Caleb, if you want a solution that still outputs the last field, even if there are not exactly eight, you can use $NF.

    awk -F '|' '{print $1, $NF}' logfile
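
    To see what $NF buys you, here is a minimal sketch run on made-up lines with eight, three, and one field (the sample data is an assumption, not taken from the question's log file):

    ```shell
    # Hypothetical sample data: three lines with 8, 3, and 1 fields.
    result=$(printf '%s\n' 'foo|dog|cat|mouse|lion|ox|tiger|bar' 'a|b|c' 'single' |
        awk -F '|' '{print $1, $NF}')

    # NF is the number of fields on the current line, so $NF is always the last one.
    printf '%s\n' "$result"
    ```

    Note that on a line with a single field, $1 and $NF are the same field, so it is printed twice.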
    

    Also, if you want the output to retain the | separators, instead of using a space, you can specify the output field separator. Unfortunately, it's a bit more clumsy than just using the -F flag, but here are three approaches.

    • You can assign the input and output field separators in awk itself, in the BEGIN block.

      awk 'BEGIN {FS = OFS = "|"} {print $1, $8}' logfile
      
    • You can assign these variables when calling awk from the command line, via the -v flag.

      awk -v 'FS=|' -v 'OFS=|' '{print $1, $8}' logfile
      
    • or simply:

      awk -F '|' '{print $1 "|" $8}' logfile
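
    As a quick check, the three approaches can be run side by side on the sample line from the question. This is only a sketch: the variable names are made up, and it uses $NF in place of $8 so that it does not depend on the field count.

    ```shell
    line='foo|dog|cat|mouse|lion|ox|tiger|bar'   # sample line from the question

    # 1. Set FS and OFS in a BEGIN block.
    out_begin=$(printf '%s\n' "$line" | awk 'BEGIN {FS = OFS = "|"} {print $1, $NF}')
    # 2. Set FS and OFS with -v on the command line.
    out_v=$(printf '%s\n' "$line" | awk -v 'FS=|' -v 'OFS=|' '{print $1, $NF}')
    # 3. Concatenate the fields with a literal "|" in between.
    out_concat=$(printf '%s\n' "$line" | awk -F '|' '{print $1 "|" $NF}')

    printf '%s\n' "$out_begin" "$out_v" "$out_concat"
    ```

    All three should print foo|bar for this input. The difference is mostly one of taste: OFS applies to every comma in every print statement, while the concatenation only affects that one spot.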
      

    Good job breaking down how this problem can be simplified. You might add a note about how to use `|` as an output separator instead of the default space for string concatenation. You could also explain that `$NF` can be used instead of hard-coding `$8` to get the last column.

  • You are using awk anyway:

    awk '{ print $1, $NF }' file
    

    Wouldn't you need to specify the input field separator (since in this case it seems to be `|` rather than space) with `-F\|` or similar? Also what if he wanted to use the same delimiter for output?

    @Caleb Probably: I was waiting for the OP to confirm what *exactly* the input looked like, rather than trying to guess based on the non-working examples...

    Note that that assumes the input contains at least 2 fields.

    @StéphaneChazelas OP clearly stated in code that it has eight fields, always.

    @michaelb958 I think "clearly" is overstating the case, just a little :)

    @michaelb958, though I'd agree that will probably address the OP's specific requirements, I think it's still worth mentioning for anyone coming here wanting to retain the first and last field on the input. Leaving it as a comment (as I did) is probably enough.

  • Just replace from the first to last | with a | (or space if you prefer):

    sed 's/|.*|/|/'
    

    Note that although no sed implementation treats | as special (unless extended regular expressions are enabled via -E or -r, where supported), \| is itself special in some implementations, such as GNU sed. So you should not escape | if you intend it to match the | character.

    If replacing with space and if the input may already contain lines with only one |, then, you'll have to treat that specially as |.*| won't match on those. That could be:

    sed 's/|\(.*|\)\{0,1\}/ /'
    

    (that is make the .*| part optional) Or:

    sed 's/|.*|/ /;s/|/ /'
    

    or:

    sed 's/\([^|]*\).*|/\1 /'
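
    A small sketch to compare these on a line with many | and a line with just one (the helper name `try` and the two test lines are made up for illustration):

    ```shell
    # Hypothetical helper: run one sed script over a many-field line and a two-field line.
    try() {
        printf '%s\n' 'foo|dog|cat|mouse|lion|ox|tiger|bar' 'foo|bar' | sed "$1"
    }

    out_naive=$(try 's/|.*|/ /')                 # needs two | to match
    out_optional=$(try 's/|\(.*|\)\{0,1\}/ /')   # the ".*|" part is optional
    out_two_step=$(try 's/|.*|/ /;s/|/ /')       # second s catches a lone |
    out_capture=$(try 's/\([^|]*\).*|/\1 /')     # keep first field, drop up to last |

    printf '%s\n' "$out_naive" "$out_optional" "$out_two_step" "$out_capture"
    ```

    The naive form leaves foo|bar untouched, while the other three turn both lines into foo bar.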
    

    If you want the first and eighth fields regardless of the number of fields in the input, then it's just:

    cut -d'|' -f1,8
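
    For illustration, here is roughly what cut does with lines that have eight fields, fewer than eight, or no | at all (the short sample lines are invented):

    ```shell
    # Hypothetical sample data: 8 fields, 3 fields, and a line with no delimiter.
    cut_out=$(printf '%s\n' 'foo|dog|cat|mouse|lion|ox|tiger|bar' 'a|b|c' 'single' |
        cut -d'|' -f1,8)

    printf '%s\n' "$cut_out"
    ```

    Unlike the awk version with the default output separator, cut keeps the | between the selected fields. A line with fewer than eight fields yields only the first field, and a line without any | is passed through whole unless you add -s.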
    


    (all those would work with any POSIX compliant utility assuming the input forms valid text (in particular, the sed ones will generally not work if the input has bytes or sequences of bytes that don't form valid characters in the current locale like for instance printf 'unix|St\351phane|Chazelas\n' | sed 's/|.*|/|/' in a UTF-8 locale)).

  • If you find yourself awk- and sed-less, you can achieve the same thing with coreutils:

    paste <(           cut -d'|' -f1  file) \
          <(rev file | cut -d'|' -f1 | rev)
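
    The same trick, sketched for a single line with plain pipes instead of process substitution (which needs a shell such as bash, ksh, or zsh). The variable names are made up, and rev is assumed to be installed:

    ```shell
    line='foo|dog|cat|mouse|lion|ox|tiger|bar'   # sample line from the question

    # First field: a plain cut.
    first=$(printf '%s\n' "$line" | cut -d'|' -f1)
    # Last field: reverse the line so the last field comes first, cut it, reverse it back.
    last=$(printf '%s\n' "$line" | rev | cut -d'|' -f1 | rev)

    # paste joins its inputs with a tab by default, so mimic that here.
    printf '%s\t%s\n' "$first" "$last"
    ```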
    

    `cut` is cleaner and more compact than awk/sed when you are just interested in the first column, or if the delimiters are fixed (i.e. not a variable number of spaces).

  • It seems like you are trying to get the first and last fields of text which are delimited by |.

    I assume your log file contains text like this:

    foo|dog|cat|mouse|lion|ox|tiger|bar
    bar|dog|cat|mouse|lion|ox|tiger|foo
    

    And you want output like this:

    foo bar
    bar foo
    

    If so, here is a command for that.

    With GNU sed:

    sed -r 's~^([^|]*).*\|(.*)$~\1 \2~' file
    

    Example:

    $ echo 'foo|dog|cat|mouse|lion|ox|tiger|bar' | sed -r 's~^([^|]*).*\|(.*)$~\1 \2~'
    foo bar
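
    If -r is not available, the same substitution can be sketched with basic regular expressions, where \( \) do the grouping and a bare | is a literal pipe. This is a rewrite of the command above under that assumption, not something tested against the OP's real log:

    ```shell
    # BRE form: no -r or -E needed.
    bre=$(printf '%s\n' 'foo|dog|cat|mouse|lion|ox|tiger|bar' 'bar|dog|cat|mouse|lion|ox|tiger|foo' |
        sed 's/^\([^|]*\).*|\(.*\)$/\1 \2/')

    # -E is the more widely accepted spelling of GNU's -r.
    ere=$(printf '%s\n' 'foo|dog|cat|mouse|lion|ox|tiger|bar' 'bar|dog|cat|mouse|lion|ox|tiger|foo' |
        sed -E 's~^([^|]*).*\|(.*)$~\1 \2~')

    printf '%s\n' "$bre" "$ere"
    ```

    In both forms the greedy .* runs up to the last |, so the second group captures only the final field.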
    

    The columns are not delimited by a pipe | but they are in columns, I am interested in using sed but not using the awk command like you did in your command: sed -r 's~^([^|]*).*\|(.*)$~\1 \2~' file

    "The columns are not delimited by a pipe | but they are in columns", you mean columns are separated by spaces?

    A sample input and an output would be better.

  • You should probably do it with sed - I would anyway - but, just because no one has written this one yet:

    while IFS=\| read col1 cols
    do  printf %10s%-s\\n "$col1 |" " ${cols##*|}"
    done <<\INPUT
    foo|dog|cat|mouse|lion|ox|tiger|bar
    INPUT
    

    OUTPUT

         foo | bar
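
    The same loop can be wrapped in a function that reads standard input, which makes it easier to point at a real file. A rough sketch: the function name `first_last` is made up, read -r is added so backslashes in the input are kept as-is, and the output is a plain space-separated pair rather than the padded layout above.

    ```shell
    # Hypothetical helper: print the first and last |-separated field of each input line.
    first_last() {
        while IFS='|' read -r first rest; do
            # ${rest##*|} strips the longest prefix ending in |, leaving the last field.
            printf '%s %s\n' "$first" "${rest##*|}"
        done
    }

    loop_out=$(printf '%s\n' 'foo|dog|cat|mouse|lion|ox|tiger|bar' | first_last)
    printf '%s\n' "$loop_out"
    ```

    With a file, that would be called as first_last < logfile.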
    

License under CC-BY-SA with attribution


Content dated before 6/26/2020 9:53 AM