How to select the first row of each group?

  • I have a table like this:

     ID |  Val   |  Kind
    ----------------------
     1  |  1337  |   2
     2  |  1337  |   1
     3  |   3    |   4
     4  |   3    |   4
    

    I want to make a SELECT that will return just the first row for each Val, ordering by Kind.

    Sample output:

     ID |  Val   |  Kind
    ----------------------
     2  |  1337  |   1
     3  |   3    |   4
    

    How can I build this query?

    why 3|3|4 and not 4|3|4 - what is the tie-break or do you not care?

    @JackDouglas Actually I have a `ORDER BY ID DESC`, but that is not relevant for the question. In this example I do not care.

  • mik

    mik Correct answer

    7 years ago

    This solution also uses keep, but val and kind can also be simply calculated for each group without a subquery:

    select min(id) keep(dense_rank first order by kind) id
         , val
         , min(kind) kind
      from mytable
     group by val;
    
    ID |  VAL | KIND
    -: | ---: | ---:
     3 |    3 |    4
     2 | 1337 |    1
    

    dbfiddle here

    KEEP…FIRST and KEEP…LAST are an Oracle-specific feature of aggregates — you can read about then here in the Oracle docs, or on ORACLE_BASE:

    The FIRST and LAST functions can be used to return the first or last value from an ordered sequence

  • Use a common table expression (CTE) and a windowing/ranking/partitioning function like ROW_NUMBER.

    This query will create an in-memory table called ORDERED and add an additional column of rn which is a sequence of numbers from 1 to N. The PARTITION BY indicates it should restart at 1 every time the value of Val changes and we want to order rows by the smallest value of Kind.

    WITH ORDERED AS
    (
    SELECT
        ID
    ,   Val
    ,   kind
    ,   ROW_NUMBER() OVER (PARTITION BY Val ORDER BY Kind ASC) AS rn
    FROM
        mytable
    )
    SELECT
        ID
    ,   Val
    ,   Kind
    FROM
        ORDERED
    WHERE
        rn = 1;
    

    The above approach should work with any RDBMS that has implemented the ROW_NUMBER() function. Oracle has some elegant functionality as expressed in mik's answer that will generally yield better performance than this answer.

  • bilinkc's solution works fine, but I thought I'd toss mine out as well. It has the same cost, but might be faster (or slower, I haven't tested it). The difference is that it uses the First_Value instead of Row_Number. Since we are only interested in the first value, in my mind it is more straightforward.

    SELECT ID, Val, Kind FROM
    (
       SELECT First_Value(ID) OVER (PARTITION BY Val ORDER BY Kind) First, ID, Val, Kind 
       FROM mytable
    )
    WHERE ID = First;
    

    Test Data.

    --drop table mytable;
    create table mytable (ID Number(5) Primary Key, Val Number(5), Kind Number(5));
    
    insert into mytable values (1,1337,2);
    insert into mytable values (2,1337,1);
    insert into mytable values (3,3,4);
    insert into mytable values (4,3,4);
    

    If you prefer, here is the CTE equivalent.

    WITH FirstIDentified AS (
       SELECT First_Value(ID) OVER (PARTITION BY Val ORDER BY Kind) First, ID, Val, Kind 
       FROM mytable
       )
    SELECT ID, Val, Kind FROM FirstIdentified
    WHERE ID = First;
    

    +1 but I just thought it worth emphasising that your answer and billinkc's are not logically the same unless `id` is unique.

    @Jack Douglas - True, I assumed that.

  • You can use keep to select an id from each group:

    select *
    from mytable
    where id in ( select min(id) keep (dense_rank first order by kind, id)
                  from mytable
                  group by val );
    
    ID |  VAL | KIND
    -: | ---: | ---:
     2 | 1337 |    1
     3 |    3 |    4
    

    dbfiddle here

  • SELECT MIN(MyTable01.Id) as Id,
           MyTable01.Val     as Val,
           MyTable01.Kind    as Kind 
      FROM MyTable MyTable01,                         
           (SELECT Val,MIN(Kind) as Kind
              FROM MyTable                   
          GROUP BY Val) MyTableGroup
    WHERE MyTable01.Val  = MyTableGroup.Val
      AND MyTable01.Kind = MyTableGroup.Kind
    GROUP BY MyTable01.Val,MyTable01.Kind
    ORDER BY Id;
    

    That will be a lot less efficient than the other answers due to the fact that two scans over MyTable are needed.

    That's only true if the optimizer takes the written query literally. More advanced optimizers can see the intent (row per group) and produce a plan with a single table access.

  • select * from (select t1.*,ROW_NUMBER() OVER (PARTITION BY Val ORDER BY Val desc) as seqnum from tablename t1) where seqnum=1;

License under CC-BY-SA with attribution


Content dated before 6/26/2020 9:53 AM