select rows where column contains same data in more than one record

  • I have a table that has a column called article_title. Let's say the table name is articles. I need to find out the records where the article_title data is the same on more than one record.

    Here's what I've got:

    select a.* 
    from articles a 
    where a.article_title = (select article_title 
                             from articles 
                             where article_title = a.article_title 
                             AND a.id <> articles.id)
    
  • HAVING is a great aggregate filter. (http://dev.mysql.com/doc/refman/5.6/en/group-by-extensions.html) For example, select the article_titles with more than on occurrence:

    SELECT count(*), article_title
    FROM articles
    GROUP BY article_title
    HAVING COUNT(*) > 1;
    

    Adding columns to the SELECT and GROUP BY clauses allow you to locate duplicates based on a composite key of multiple columns.

    @jkushner: but beware of the quirks (or bugs as some may see it) in MySQL's implementation of `GROUP BY`: http://www.mysqlperformanceblog.com/2006/09/06/wrong-group-by-makes-your-queries-fragile/

  • Your problem can be solved with this query:

    SELECT *
    FROM article
    WHERE article_title IN (SELECT *
                            FROM (SELECT article_title
                                  FROM article
                                  GROUP BY article_title
                                  HAVING COUNT(article_title) > 1)
                            AS a);
    

    This is the correct answer because it actually returns the rows that have the duplicates

  • have a table that has a column called article_title. Let's say the table name is articles. I need to find out the records where the article_title data is the same on more than one record.

    Sound like to me you also need to have the id because you want to find records based on article_title because you have duplicates

    Basic MIN/MAX with GROUP BY (you will miss id's when more then 2 duplicates)

    SELECT 
      MIN(id) -- for FIFO id's (first id by duplicate)
    , MAX(id) -- for LIFO id's (last id by duplicate)
    , article_title
    , COUNT(*)  
    FROM
     articles
    WHERE -- Maybe to filter out '' or IS NOT NULL
     article_title != '' AND article_title IS NOT NULL
    GROUP BY
     article_title ASC
    HAVING
     COUNT(*) >= 2
    ;
    

    Or back to denormalisation to generate an CSV for LIFO id's (older id's by duplicates) but you know all id's here..

    SELECT 
      GROUP_CONCAT(id ORDER BY ASC SEPARATOR ',') -- change to DESC if want the last record first
    , article_title
    , COUNT(*)  
    FROM
     articles
    GROUP BY
     article_title ASC
    HAVING
     COUNT(*) >= 2
    ;
    

License under CC-BY-SA with attribution


Content dated before 6/26/2020 9:53 AM