How to find lines containing a string and then print those specific lines and something else

  • I use the following command to recursively search multiple files and find the line number in each file in which the string is found.

        grep -nr "the_string" /media/slowly/DATA/lots_of_files > output.txt
    

    The output is as follows:

        /media/slowly/DATA/lots_of_files/lots_of_files/file_3.txt:3:the_string
        /media/slowly/DATA/lots_of_files/lots_of_files/file_7.txt:6:the_string is in this sentence.
        /media/slowly/DATA/lots_of_files/lots_of_files/file_7.txt:9:the_string is in this sentence too.
    

    As shown above, the output includes the filename, line number and all the text in that line including the string.

    I have also figured out how to print just the specific lines of a file containing the string using the following commands:

        sed '3!d' /media/slowly/DATA/lots_of_files/lots_of_files/file_3.txt > print.txt
        sed '6!d' /media/slowly/DATA/lots_of_files/lots_of_files/file_7.txt >> print.txt
        sed '9!d' /media/slowly/DATA/lots_of_files/lots_of_files/file_7.txt >> print.txt
    

    I created the above commands manually by reading the line numbers and filenames from the grep output.

    Here's my question.

    Q1a

    Is there a way to combine both steps into one command? I'm thinking of piping the line number and the filename into sed and printing the line. I'm having a problem with the order in which the grep output is generated.

    Q1b

    Same as above, but also print the 2 lines before and the 2 lines after the line containing the string (5 lines in total). I'm thinking of piping the line number and the filename into sed and printing all the required lines somehow.

    Big thanks.

    What is the issue with grep output order? This sounds like an XY problem

  • If I am understanding the question correctly, you can accomplish this with one grep command.

    For Q1a, you can suppress the filename in grep's output with -h, e.g.:

    grep -hnr "the_string" /media/slowly/DATA/lots_of_files > output.txt
    

    For Q1b, your grep output can include lines preceding and following matched lines using -A and -B, e.g.:

    grep -hnr -A2 -B2 "the_string" /media/slowly/DATA/lots_of_files > output.txt
    

    The output will contain a separator line (--) between groups of matches, which you can suppress with the GNU grep option --no-group-separator, e.g.:

    grep -hnr -A2 -B2 --no-group-separator "the_string" /media/slowly/DATA/lots_of_files > output.txt
    

    Note that the output uses a different delimiter for matching lines (:) and context lines (-).
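
    To make the two delimiters concrete, here is a minimal sketch on throwaway data. The directory `demo_dir` and the file `sample.txt` are made up for illustration, and a single line of context is used to keep the output short:

    ```shell
    # Hypothetical sample data in a temporary directory.
    demo_dir=$(mktemp -d)
    printf '%s\n' one two the_string four five > "$demo_dir/sample.txt"

    # -n adds line numbers; -A1 -B1 add one line of context on each side.
    context_output=$(grep -n -A1 -B1 'the_string' "$demo_dir/sample.txt")
    printf '%s\n' "$context_output"
    # 2-two
    # 3:the_string
    # 4-four

    rm -rf "$demo_dir"
    ```

    The matching line carries a colon after its line number, while the context lines carry a hyphen, so the two kinds of line can be told apart later if needed.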

  • Your first question, as far as I know, can be answered by coming at grep a different way. When you give it several files (or a directory to recurse through with -r or -R), it prefixes every match with the name of the file it was found in, plus the line number when -n is given. The filename can be dropped with -h. Another way to feed grep the files is a construct such as:

    find /path/to/files -type f -print0 | xargs -0 grep -n 'the_pattern' # -print0 and -0 keep filenames containing spaces intact
    

    As for your second question, if you want to see the lines before and after a match, you can use the -C (for Context) switch:

    grep -C2 'pattern' /path/to/file # displays the two lines before and after a match
    

    Related to -C are -A (for After), and -B (for Before), which only give the specified number of lines after or before a match, respectively.

    You can combine the two answers thusly:

    find /path/to/files -type f -print0 | xargs -0 grep -n -C2 'the_pattern' # -print0 and -0 keep filenames containing spaces intact
    

    As for your question about sed, the example you gave only works if you already know the line numbers. To print matching lines with sed without knowing them, you can do something like:

    sed -n '/the_pattern/p' /path/to/files/*
    

    (but it will not recurse into subdirectories)
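
    For completeness, the idea from the question (feeding grep's filenames and line numbers into sed) can be sketched roughly as below. This is only an illustration under some assumptions: the tree under `search_dir` is made-up sample data, and splitting on `:` assumes that no filename contains a colon or a newline.

    ```shell
    # Hypothetical sample tree; point search_dir at the real directory instead.
    search_dir=$(mktemp -d)
    mkdir "$search_dir/sub"
    printf '%s\n' a b the_string d e f > "$search_dir/sub/file_3.txt"

    # grep -rn prints file:line:text. Split each record on ':' and let sed
    # print the matching line plus 2 lines on either side of it.
    printed=$(
      grep -rn 'the_string' "$search_dir" |
      while IFS=: read -r file line rest; do
        start=$((line - 2))
        if [ "$start" -lt 1 ]; then start=1; fi   # clamp at the first line
        sed -n "${start},$((line + 2))p" "$file"
      done
    )
    printf '%s\n' "$printed"
    # a
    # b
    # the_string
    # d
    # e

    rm -rf "$search_dir"
    ```

    Unlike grep -C2, this loop prints overlapping context twice when two matches sit close together, which is one reason the grep-only commands are usually the simpler choice.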

  • find /media/slowly/DATA/lots_of_files -type f -exec grep -h -C2 'the_pattern' {} +
    

    This will find things which are regular files (as opposed to directories or symbolic links) under the /media/slowly/DATA/lots_of_files directory. The + terminator makes find group them up and run grep on each batch (no need for xargs this decade). grep will not print the filenames (-h) but will give 2 lines of context before and after the matching lines (-C2, use -A and -B for more precise control).

    The advantage of this command over the one from @cherdt is that you can add further filters to the find command. For example, you can choose not to descend into directories like .git.
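
    As a rough sketch of such a filter, the following skips any directory named .git by using find's -prune. The sample tree under `search_dir` is invented for the demonstration:

    ```shell
    # Hypothetical sample tree with a .git directory that should be skipped.
    search_dir=$(mktemp -d)
    mkdir "$search_dir/.git" "$search_dir/src"
    printf '%s\n' 'the_pattern in git metadata' > "$search_dir/.git/config"
    printf '%s\n' 'the_pattern in source' > "$search_dir/src/notes.txt"

    # -name .git -prune stops find from descending into .git directories;
    # every other regular file is handed to grep in batches.
    matches=$(find "$search_dir" -name .git -prune -o -type f -exec grep -h 'the_pattern' {} +)
    printf '%s\n' "$matches"
    # the_pattern in source

    rm -rf "$search_dir"
    ```

    Further tests such as -name '*.txt' or -size can be added in front of -exec in the same way.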

    If grep takes its input from `standard input`, is there a way to print the entire contents of standard input when a match is found in it?

    @alpha_989 Sorry, I am not sure exactly what you are asking. If the input is of reasonable size, so that you can expect it to fit in memory, then you can just pass a very large number to the `-C` parameter. If the input size is going to be essentially unlimited, you will need to copy the input to disk and then show it. So something like `T=$(mktemp); cat > "$T"; grep -q pattern "$T" && cat "$T"; rm "$T"`.
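
    The copy-to-disk idea can also be wrapped in a small function. This is a sketch only; the name `print_all_if_match` is made up here and is not a standard tool:

    ```shell
    # Hypothetical helper: echo all of standard input if any line matches.
    print_all_if_match() {
      pattern=$1
      buffer=$(mktemp) || return 1
      cat > "$buffer"                      # spool standard input to disk
      if grep -q -e "$pattern" "$buffer"; then
        cat "$buffer"                      # a match exists: show everything
        rc=0
      else
        rc=1                               # no match: print nothing
      fi
      rm -f "$buffer"
      return "$rc"
    }

    hit=$(printf '%s\n' first the_string last | print_all_if_match 'the_string')
    miss=$(printf '%s\n' first second | print_all_if_match 'the_string' || true)
    printf '%s\n' "$hit"
    # first
    # the_string
    # last
    ```

    The function returns 0 when a match was found and 1 otherwise, so it can be used in an `if` or `&&` chain like grep itself.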

Licensed under CC-BY-SA with attribution


Content dated before 6/26/2020 9:53 AM