Bash test: what does "=~" do?

  • #!/bin/bash
    if [[ "$INT" =~ ^-?[0-9]+$ ]]; then
        echo "INT is an integer."
    else
        echo "INT is not an integer." >&2
        exit 1
    fi

    What does the leading ~ do in the starting regular expression?

  • The ~ is actually part of the operator =~, which matches the string on its left against the extended regular expression on its right.

    [[ "string" =~ pattern ]]

    Note that the string should be quoted, and that the regular expression shouldn't be quoted.
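
    As a minimal sketch of why the quoting matters (assuming bash 3.2 or later, where a quoted right-hand side is matched as a literal string; the variable names are only for illustration):

    ```shell
    #!/bin/bash
    t='abc'

    # Unquoted: the dot is a regex metacharacter and matches any character.
    if [[ "$t" =~ ^a.c$ ]]; then unquoted='match'; else unquoted='no match'; fi

    # Quoted: the dot is taken literally, so "abc" no longer matches "a.c".
    if [[ "$t" =~ ^"a.c"$ ]]; then quoted='match'; else quoted='no match'; fi

    printf 'unquoted: %s, quoted: %s\n' "$unquoted" "$quoted"
    # prints: unquoted: match, quoted: no match
    ```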

    A similar operator is used in the Perl programming language.

    The regular expressions understood by bash are the same as those that GNU grep understands with the -E flag, i.e. the extended set of regular expressions.

    Somewhat off-topic, but good to know:

    When matching against a regular expression containing capturing groups, the part of the string captured by each group is available in the BASH_REMATCH array. The entry at index 0 corresponds to & in the replacement pattern of sed's substitution command (or $& in Perl), i.e. the part of the string that matches the whole pattern, while the entries at index 1 onwards correspond to \1, \2, etc. in a sed replacement pattern (or $1, $2, etc. in Perl), i.e. the parts matched by each parenthesised group.


    string=$( date +%T )
    if [[ "$string" =~ ^([0-9][0-9]):([0-9][0-9]):([0-9][0-9])$ ]]; then
      printf 'Got %s, %s and %s\n' \
        "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" "${BASH_REMATCH[3]}"
    fi

    This may output

    Got 09, 19 and 14

    if the current time happens to be 09:19:14.

    The REMATCH bit of the BASH_REMATCH array name comes from "Regular Expression Match", i.e. "RE-Match".
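
    To make the role of index 0 concrete, here is a small sketch with a fixed input string (the string and the variable names are made up for illustration) in which the pattern is not anchored, so the whole match differs from the whole string:

    ```shell
    #!/bin/bash
    string='time=09:19:14;'

    if [[ "$string" =~ ([0-9][0-9]):([0-9][0-9]):([0-9][0-9]) ]]; then
      whole=${BASH_REMATCH[0]}     # text matched by the entire pattern
      hours=${BASH_REMATCH[1]}     # text captured by the first group
      count=${#BASH_REMATCH[@]}    # one entry for the whole match plus one per group
      printf 'whole=%s hours=%s entries=%s\n' "$whole" "$hours" "$count"
    fi
    # prints: whole=09:19:14 hours=09 entries=4
    ```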

    In non-bash Bourne-like shells, one may also use expr for limited regular expression matching (using only basic regular expressions).

    A small example:

    $ string="hello 123 world"
    $ expr "$string" : ".*[^0-9]\([0-9][0-9]*\)"
    123
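
    As a rough sketch of the two ways expr reports a match (the variable names are only illustrative): with a \( \) group it prints the captured text, and without one it prints the number of characters matched. The pattern is always anchored at the start of the string.

    ```shell
    #!/bin/sh
    string="hello 123 world"

    # With a \( \) group, expr prints the text captured by the group.
    digits=$(expr "$string" : '.*[^0-9]\([0-9][0-9]*\)')

    # Without a group, expr prints the number of characters matched.
    length=$(expr "$string" : 'hello')

    printf '%s %s\n' "$digits" "$length"
    # prints: 123 5
    ```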

    It's the same as what `grep -E` understands only on GNU systems and only when using an unquoted variable as the pattern, as in `[[ $var =~ $pattern ]]` (see `[[ 'a b' =~ a\sb ]]` vs `p='a\sb'; [[ 'a b' =~ $p ]]`). Also beware that shell quoting affects the meaning of RE operators, and that some characters need to be quoted for the shell tokenising, which may affect the RE processing. `[[ '\' =~ [\/] ]]` returns false. `ksh93` has even worse issues. See `zsh` (or bash 3.1) for a saner approach where shell and RE quoting are clearly separate. The `[` builtins of `zsh` and `yash` also have a `=~` operator.

    Very cool `off-topic`! +1

    @StéphaneChazelas How is it "saner" that both of these match in zsh: `[[ "This is a fine mess." =~ T.........fin*es* ]]; [[ "This is a fine mess." =~ T.........fin\*es\* ]]`? Or that a quoted `*` also matches: `[[ "This is a fine mess." =~ "T.........fin*es*" ]]`?

    It's saner (IMO) in that the rules are much simpler. Shell quoting and RE escaping are clearly separate. In `[[ a =~ .* ]]` or `[[ a =~ '.*' ]]` or `[[ a =~ \.\* ]]`, the same `.*` RE is passed to the `=~` operator. OTOH, in `bash`, `[[ '\' =~ [)] ]]` returns an error. Would you know without trying it whether `[[ '\' =~ [\)] ]]` matches? How about `[[ '\' =~ [\/] ]]` (it does in ksh93)? How about `c='a-z'; [[ a =~ ["$c"] ]]` (compare with the `=` operator)? See also `[[ '\' =~ [^]"."] ]]`, which returns false... Note that you can do `shopt -s compat31` in `bash` to get the `zsh` behaviour.

    The behaviour of `zsh` and `bash -O compat31` for `[[ a =~ '.*' ]]` is also consistent with `[ a '=~' '.*' ]` (for `[` implementations that support `=~`) or `expr a : '.*'`. OTOH, it's not consistent with `[[ a = '*' ]]` vs `[[ a = * ]]` (but then, globs are part of the shell language, while REs are not).

    To deal with characters in the pattern that might be interpreted by the shell, it's often recommended to do something like this: `pat="..."; if [[ "$string" =~ $pat ]]; then ...`. (@StéphaneChazelas's topmost comment suggested it, I'm just emphasizing it.)
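
    A minimal sketch of that approach, using a made-up pattern and string: the pattern is written once inside single quotes, so the space, the parentheses and the backslash need no extra shell escaping, and it is then expanded unquoted on the right of `=~` so that it is still treated as a regular expression.

    ```shell
    #!/bin/bash
    # Hypothetical pattern: it contains a space, a group and an escaped dot.
    pat='^(foo|bar) baz\.txt$'
    string='bar baz.txt'

    # $pat is left unquoted so that bash treats its value as a regex.
    if [[ "$string" =~ $pat ]]; then
      result="matched, group 1 is ${BASH_REMATCH[1]}"
    else
      result='no match'
    fi
    printf '%s\n' "$result"
    # prints: matched, group 1 is bar
    ```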

  • You should read the bash man pages, under the [[ expression ]] section.

    An additional binary operator, =~, is available, with the same precedence as == and !=. When it is used, the string to the right of the operator is considered an extended regular expression and matched accordingly (as in regex(3)).

    Long story short, =~ is an operator, just like == and !=. It has nothing to do with the actual regex in the string to its right.

    Can you figure out some examples demonstrating the use of `=~` in real life...?

    @GeorgeVasiliou I use it fairly often in scripts that put the output from a command into a variable. Then the variable is checked to see if it matches some string pattern. This is useful for example if you want to take some action based on some error output from that command.
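
    A rough sketch of that kind of use, where `fake_backup` is a hypothetical stand-in for whatever real command produces the error output, and the error messages and follow-up actions are only examples:

    ```shell
    #!/bin/bash
    # fake_backup is a hypothetical placeholder for a real command that fails.
    fake_backup() {
      echo 'backup: cannot write /mnt/data: No space left on device' >&2
      return 1
    }

    # Capture stdout and stderr together, and remember the exit status.
    status=0
    output=$(fake_backup 2>&1) || status=$?

    # Choose an action depending on which error message was printed.
    pat='No space left on device|Disk quota exceeded'
    if [[ "$output" =~ $pat ]]; then
      action='free some space and retry'
    else
      action='report unexpected error'
    fi
    printf 'exit status %s: %s\n' "$status" "$action"
    # prints: exit status 1: free some space and retry
    ```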

    @Sokel For some, “RTFM” is easier said than done. ⋯ `man [[ expression ]]` and `man [[` return nothing. `help [[` returns useful information (since `[[` is an internal bash command) but does not say whether `=~` uses basic or extended regex syntax. ⋯ The text you quoted is from the **bash** man page. I realize you said “read the bash man pages”, but at first I thought you meant read the man pages within bash. At any rate, `man bash` returns a huge file, which is 4139 lines (72 pages) long. It can be searched by pressing `/▒▒▒`, which takes a regex, the flavor of which (like that of `=~`) is not specified.

Licensed under CC-BY-SA with attribution

Content dated before 6/26/2020 9:53 AM