Removing duplicated rows from a data frame in R

  • How can I remove duplicate rows from this example data frame?

    A   1
    A   1
    A   2
    B   4
    B   1
    B   1
    C   2
    C   2
    

    I would like to remove rows that are duplicated across both columns, so that the result is:

    A   1
    A   2
    B   4
    B   1
    C   2
    

    Order is not important.

    @whuber shouldn't that be moved to SO?

    @Llopis Yes, but it's too late to do that now--and it was too late when we originally closed it. This kind of question was considered (borderline) on-topic many years ago but nowadays it would be migrated quickly.

  • Rahul

    Correct answer

    10 years ago

    unique() indeed answers your question, but another related and interesting function to achieve the same end is duplicated().

    It returns a logical vector that marks each row repeating an earlier row, so you can look up which rows are duplicated before deciding what to drop.

    > a <- c(rep("A", 3), rep("B", 3), rep("C", 2))
    > b <- c(1, 1, 2, 4, 1, 1, 2, 2)
    > df <- data.frame(a, b)
    
    > duplicated(df)
    [1] FALSE  TRUE FALSE FALSE FALSE  TRUE FALSE  TRUE
    
    > df[duplicated(df), ]
      a b
    2 A 1
    6 B 1
    8 C 2
    
    > df[!duplicated(df), ]
      a b
    1 A 1
    3 A 2
    4 B 4
    5 B 1
    7 C 2
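
    As a quick check that unique() and duplicated() reach the same end, the sketch below rebuilds the example data frame and compares the two results. It uses only base R; the names dedup_unique and dedup_flag are illustrative and do not come from the original answer.

    ```r
    # Rebuild the example data frame from the answer above.
    a <- c(rep("A", 3), rep("B", 3), rep("C", 2))
    b <- c(1, 1, 2, 4, 1, 1, 2, 2)
    df <- data.frame(a, b)

    # Two routes to the de-duplicated rows (variable names are illustrative).
    dedup_unique <- unique(df)
    dedup_flag   <- df[!duplicated(df), ]

    # Compare the two results; both keep the first occurrence of each (a, b) pair.
    identical(dedup_unique, dedup_flag)
    ```

    The duplicated() form is the more flexible of the two, since the logical vector can be inspected or combined with other conditions before subsetting.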
    

    Thanks for mentioning the 'duplicated' function. It can be used to delete duplicated rows based on a subset of the columns.
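
    A minimal sketch of that idea, assuming we want at most one row per value of column a. The choice of key column and the names first_per_a and first_per_ab are illustrative, not part of the original thread.

    ```r
    # Same example data frame as in the answer.
    a <- c(rep("A", 3), rep("B", 3), rep("C", 2))
    b <- c(1, 1, 2, 4, 1, 1, 2, 2)
    df <- data.frame(a, b)

    # Flag duplicates using column a only; every column is kept in the result.
    first_per_a <- df[!duplicated(df$a), ]

    # For several key columns, pass a sub-data-frame instead of a single vector.
    first_per_ab <- df[!duplicated(df[, c("a", "b")]), ]
    ```

    Because duplicated() marks the later occurrences, the first row of each group survives; passing fromLast = TRUE keeps the last one instead.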

Licensed under CC-BY-SA with attribution


Content dated before 6/26/2020 9:53 AM