How to grep for tabs without using literal tabs and why does \t not work?

  • When I search for tabs in a file with (e)grep I use the literal tab (^V + <tab>). I cannot use \t as a replacement for tabs in regular expressions. With sed, for example, this expression works very well.

    So is there a way to use a non-literal replacement for <tab>, and what is the reason that \t is not interpreted?

  • lesmana

    lesmana Correct answer

    9 years ago

    grep uses regular expressions as defined by POSIX. For whatever reason, POSIX does not define \t as tab.

    You have several alternatives:

    • tell grep to use the regular expressions as defined by perl (perl has \t as tab):

      grep -P "\t" foo.txt
      

      The man page warns that this is an "experimental" feature. At least \t seems to work fine, but more advanced Perl regex features may not.

    • use printf to print a tab character for you:

      grep "$(printf '\t')" foo.txt
      
    • use the literal tab character:

      grep "^V<tab>" foo.txt
      

      That is: type grep ", then press ctrl+v, then press tab, then type " foo.txt. Pressing ctrl+v in the terminal causes the next key to be taken verbatim, so the terminal inserts a tab character instead of triggering whatever function is bound to the tab key.

    • use the ansi c quoting feature of bash:

      grep $'\t' foo.txt
      

      This does not work in all shells.

    • use awk:

      awk '/\t/'
      
    • use sed:

      sed -n '/\t/p'
      

    See the Wikipedia article on regular expressions for an overview of the character classes defined in POSIX and other systems.
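
As a rough illustration of the POSIX character classes mentioned above: `[[:blank:]]` matches a space or a tab, so it is not a tab-only pattern. The sketch below compares it with a literal tab produced by printf. The sample file and the variable names (`sample`, `tab`, `blank_count`, `tab_count`) are made up for this example, and `mktemp` is assumed to be available.

```shell
# Hypothetical sample file: line 1 has a tab, line 2 a space, line 3 neither.
sample=$(mktemp)
printf 'tab\there\nspace here\nnowhitespace\n' > "$sample"

# [[:blank:]] matches space OR tab, so it counts lines 1 and 2.
blank_count=$(grep -c '[[:blank:]]' "$sample")

# A literal tab (generated by printf) counts only line 1.
tab=$(printf '\t')
tab_count=$(grep -c -- "$tab" "$sample")

printf 'blank: %s, tab only: %s\n' "$blank_count" "$tab_count"
rm -f "$sample"
```

So the character class is only a substitute for a tab when spaces cannot occur in the input, or when matching them is acceptable.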

    Based on enzotib's answer, let me add the following: `grep $'\t' foo.txt` (but I would usually write `fgrep` instead of `grep`)

    I needed this, combined with using the value of an environment variable. I used `grep "$(printf '\t')${myvar}" foo.txt`. It worked fine. With a few tries, I could not get the last form to work.
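
A small sketch of the tab-plus-variable combination described in this comment. One thing to keep in mind is that the variable's value is interpreted as a regular expression unless grep is given -F. The sample data and the value of `myvar` are hypothetical, and `mktemp` is assumed to be available.

```shell
# Hypothetical sample file: two tab-separated lines and one space-separated line.
sample=$(mktemp)
printf 'id\tfoo.bar\nid\tfooXbar\nid foo.bar\n' > "$sample"

myvar='foo.bar'        # hypothetical value; note that '.' is a regex metacharacter
tab=$(printf '\t')

# As a regular expression, '.' matches any character: both tabbed lines match.
regex_count=$(grep -c -- "${tab}${myvar}" "$sample")

# As a fixed string (-F), only the line with a literal "foo.bar" after the tab matches.
fixed_count=$(grep -c -F -- "${tab}${myvar}" "$sample")

printf 'regex: %s, fixed: %s\n' "$regex_count" "$fixed_count"
rm -f "$sample"
```

Whether -F is the right choice depends on whether the variable is meant to hold a pattern or plain text.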

    Is there any reason that plain `grep` couldn't silently interpret `\t` as tab? Does POSIX require that `\t` mean something else? Perhaps it's supposed to match only a literal `\` followed by a `t`?

    Perhaps worth noting that BSD grep (including the one on OS X) lacks the -P option.

    From the man page: `This is highly experimental and grep -P may warn of unimplemented features.` It is probably not a good idea to use `-P` on legacy systems. The `printf` choice is better.

  • It is not exactly the answer you would want to hear, but a possible use of escape sequences is provided by bash:

    command | grep $'\t'
    

    (do not put it into double quotes!).

    There is no need for -E (what is searched for is not a regex). There is also no need to pipe from a command. That said, thank you for pointing out this often overlooked feature of bash (single-quoted strings preceded by $).

    Indeed, I suggest that @enzotib edit the answer to be simply `grep $'\t'`.

    It should be stressed that this is a feature of bash and will (silently!) do the wrong thing if executed by some other shell (such as dash, which is the default for shell scripts on Ubuntu and others)
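
A minimal sketch of how a script could guard against this, assuming only a POSIX shell with printf. In a shell without support for `$'...'`, the probe below ends up as a literal `$` followed by `\t` rather than a tab, so comparing it with a printf-generated tab shows which case applies. The variable names (`tab`, `probe`, `ansi_c`) are made up for this example.

```shell
# Tab produced by printf: works in any POSIX shell.
tab=$(printf '\t')

# In a shell without $'...' support this is "$" followed by backslash-t, not a tab.
probe=$'\t'

if [ "$probe" = "$tab" ]; then
    ansi_c=yes
else
    ansi_c=no
fi
printf 'ANSI-C quoting supported: %s\n' "$ansi_c"
```

In a script that has to run under several shells, using `"$tab"` everywhere avoids the question entirely.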

  • awk '/\t/' is my favorite workaround:

    printf 'a\t\nb' | awk '/\t/'
    

    Output: a\t.
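
Since a tab is invisible in terminal output, one way to check that the matched line really contains one is to append `od -c` to the pipeline above, which renders the tab as `\t`. This is only a sketch; the variable name `visible` is made up, and the exact column layout of the od output may differ between systems.

```shell
# Same pipeline as above, with od -c appended to make the tab visible as \t.
visible=$(printf 'a\t\nb' | awk '/\t/' | od -c)
printf '%s\n' "$visible"
```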

  • One can always resort to using the ASCII hex code for tab:

    $ echo "one"$'\t'"two" > input.txt                                 
    
    $ grep -P "\x9" input.txt                                          
    one two
    
    $ grep $'\x9' input.txt                                            
    one two
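
Both commands above depend on either grep -P or bash. A sketch of a variant that needs neither, under the assumption that only a POSIX printf is available: POSIX printf understands octal escapes of the form `\ddd`, and 011 is the octal code for tab. The sample file and the variable names are hypothetical, and `mktemp` is assumed to be available.

```shell
# Hypothetical sample file: line 1 contains a tab, line 2 does not.
sample=$(mktemp)
printf 'one\ttwo\nthree four\n' > "$sample"

# Octal 011 is the tab character (hex 09).
tab=$(printf '\011')

octal_matches=$(grep -- "$tab" "$sample")
printf '%s\n' "$octal_matches"
rm -f "$sample"
```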
    

Licensed under CC-BY-SA with attribution


Content dated before 6/26/2020 9:53 AM