How to delete the rest of each line after a certain pattern or a string in a file?

  • Suppose I have a list of URLs in a text file:

    google.com/funny
    unix.stackexchange.com/questions
    isuckatunix.com/ireallydo
    

    I want to delete everything that comes after '.com'.

    Expected Results:

    google.com
    unix.stackexchange.com
    isuckatunix.com
    

    I tried

    sed 's/.com*//' file.txt 
    

    but it deleted .com as well.

    Is there a specific reason you want to search for `.com` only, instead of removing everything from the first `/` character onward? What if you had a URL like `en.wikipedia.org/wiki/Ubuntu` in your list?

  • To explicitly delete everything that comes after ".com", just tweak your existing sed solution to replace ".com(anything)" with ".com":

    sed 's/\.com.*/.com/' file.txt
    

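    I escaped the first period in your regex; unescaped, it matches any character, so a line like "thisiscommon.com/something" would be cut at the "scom" inside "thisiscommon". Your original attempt deleted ".com" because the pattern `.com*` matches only that text (the "*" applies to the preceding "m", not to "anything") and the replacement was empty.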
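
    As a quick check, here is a minimal sketch that runs the command above on the sample URLs from the question. The function name `trim_after_com` is made up for this illustration, and the input is piped in on stdin rather than read from file.txt.

```shell
# Hypothetical helper: keep each line up to and including the first literal ".com".
trim_after_com() {
  sed 's/\.com.*/.com/'
}

# Feed the sample URLs from the question on stdin instead of file.txt.
printf '%s\n' \
  google.com/funny \
  unix.stackexchange.com/questions \
  isuckatunix.com/ireallydo |
  trim_after_com
```

    This should print the three bare hostnames listed under "Expected Results" in the question.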

    Note that you may want to further anchor the ".com" pattern with a trailing forward-slash so that you don't accidentally trim something like "sub.com.domain.com/foo":

    sed 's/\.com\/.*/.com/' file.txt
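
    To see how the two patterns differ, here is a small sketch run against that tricky hostname. The function names are hypothetical and the input is piped in rather than read from file.txt. It uses `|` as the delimiter of the `s` command so the slash needs no escaping. The last function follows the comment under the question and cuts at the first `/` whatever the domain suffix is.

```shell
# Hypothetical helpers, named only for this illustration.

# Cut after the first literal ".com", wherever it appears in the line.
trim_after_com() {
  sed 's/\.com.*/.com/'
}

# Cut only where ".com" is directly followed by "/".
# "|" is the s-command delimiter here, so "/" is an ordinary character.
trim_after_com_slash() {
  sed 's|\.com/.*|.com|'
}

# Cut at the first "/", independent of the domain suffix.
trim_path() {
  sed 's|/.*||'
}

printf '%s\n' sub.com.domain.com/foo | trim_after_com        # prints: sub.com
printf '%s\n' sub.com.domain.com/foo | trim_after_com_slash  # prints: sub.com.domain.com
printf '%s\n' en.wikipedia.org/wiki/Ubuntu | trim_path       # prints: en.wikipedia.org
```

    Note that the slash-anchored pattern leaves a line unchanged when ".com" is not followed by "/", which is harmless for a bare hostname such as "google.com".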
    

Licensed under CC BY-SA with attribution


Content dated before 6/26/2020 9:53 AM