Select which has max date or latest date

  • Here are two tables.

    SCHOOL_STAFF

    SCHOOL_CODE + STAFF_TYPE_NAME + LAST_UPDATE_DATE_TIME + PERSON_ID
    =================================================================
    ABE           Principal         24-JAN-13               111222
    ABE           Principal         09-FEB-12               222111
    

    PERSONS

    PERSON_ID + NAME
    =================
    111222      ABC
    222111      XYZ
    

    Here is my oracle query.

    SELECT MAX(LAST_UPDATE_DATE_TIME) AS LAST_UPDATE, SCHOOL_CODE, PERSON_ID
    FROM SCHOOL_STAFF
    WHERE STAFF_TYPE_NAME='Principal'
    GROUP BY SCHOOL_CODE, PERSON_ID
    ORDER BY SCHOOL_CODE;
    

    which gives this results

    LAST_UPDATE SCHOOL_CODE PERSON_ID
    ===========+===========+=========
    24-JAN-13   ABE         111222
    09-FEB-12   ABE         222111
    

    I want to select the first one for the school which has latest date.

    Thanks.

  • Taryn

    Taryn Correct answer

    7 years ago

    Your current query is not giving the desired result because you are using a GROUP BY clause on the PERSON_ID column which has a unique value for both entries. As a result you will return both rows.

    There are a few ways that you can solve this. You can use a subquery to apply the aggregate function to return the max(LAST_UPDATE_DATE_TIME) for each SCHOOL_CODE:

    select s1.LAST_UPDATE_DATE_TIME,
      s1.SCHOOL_CODE,
      s1.PERSON_ID
    from SCHOOL_STAFF s1
    inner join
    (
      select max(LAST_UPDATE_DATE_TIME) LAST_UPDATE_DATE_TIME,
        SCHOOL_CODE
      from SCHOOL_STAFF
      group by SCHOOL_CODE
    ) s2
      on s1.SCHOOL_CODE = s2.SCHOOL_CODE
      and s1.LAST_UPDATE_DATE_TIME = s2.LAST_UPDATE_DATE_TIME;
    

    See SQL Fiddle with Demo

    Or you can use use a windowing function to return the rows of data for each school with the most recent LAST_UPDATE_DATE_TIME:

    select SCHOOL_CODE, PERSON_ID, LAST_UPDATE_DATE_TIME
    from
    (
      select SCHOOL_CODE, PERSON_ID, LAST_UPDATE_DATE_TIME,
        row_number() over(partition by SCHOOL_CODE 
                            order by LAST_UPDATE_DATE_TIME desc) seq
      from SCHOOL_STAFF
      where STAFF_TYPE_NAME='Principal'
    ) d
    where seq = 1;
    

    See SQL Fiddle with Demo

    This query implements row_number() which assigns a unique number to each row in the partition of SCHOOL_CODE and placed in a descending order based on the LAST_UPDATE_DATE_TIME.

    As a side note, the JOIN with aggregate function is not exactly the same as the row_number() version. If you have two rows with the same event time the JOIN will return both rows, while the row_number() will only return one. If you want to return both with a windowing function, then consider using the rank() windowing function instead as it will return ties:

    select SCHOOL_CODE, PERSON_ID, LAST_UPDATE_DATE_TIME
    from
    (
      select SCHOOL_CODE, PERSON_ID, LAST_UPDATE_DATE_TIME,
        rank() over(partition by SCHOOL_CODE 
                            order by LAST_UPDATE_DATE_TIME desc) seq
      from SCHOOL_STAFF
      where STAFF_TYPE_NAME='Principal'
    ) d
    where seq = 1;
    

    See Demo

    Thanks, I find the inner join to a subquery table (example 1 above) to be the most intuitive.. and doesn't require me to learn what _partition by_ is all about. Here is a look at a similar syntax to example 1: select oT.dateField, oT.siteID, oT.field1, oT.field2, oT.field3, from **originalTable** as **oT** inner join (select max(dateField) as newestDate, siteID from originalTable group by siteID ) as newTable on oT.siteID = newTable.site_ID and oT.dateField = newTable.newestDate order by oT.siteID asc To me, that better explains what's happening in the subquery.

  • I'm surprised nobody has taken advantage of window functions beyond row_number()

    Here's some data to play with:

    CREATE TABLE SCHOOL_STAFF
    (
    LAST_UPDATE_DATE_TIME VARCHAR(20),
    SCHOOL_CODE VARCHAR(20),
    PERSON_ID VARCHAR(20),
    STAFF_TYPE_NAME VARCHAR(20)
    );
    INSERT INTO SCHOOL_STAFF VALUES ('24-JAN-13', 'ABE', '111222', 'Principal');
    INSERT INTO SCHOOL_STAFF VALUES ('09-FEB-12', 'ABE', '222111', 'Principal');
    

    The OVER() clause creates a window for which you will define your aggregate groups. In this case, I am only partitioning on the SHOOL_CODE, so we will see the FIRST_VALUE, which will come from LAST_UPDATE_DATE_TIME, grouped by SCHOOL_CODE, and in the order of LAST_UPDATE_DATE_TIME by descending order. This value will be applied to the entire column for each SCHOOL_CODE.

    It is important to pay close attention to your partitioning and ordering in the over() clause.

    SELECT DISTINCT
     FIRST_VALUE(LAST_UPDATE_DATE_TIME) OVER (PARTITION BY SCHOOL_CODE ORDER BY LAST_UPDATE_DATE_TIME DESC) AS LAST_UPDATE
    ,FIRST_VALUE(SCHOOL_CODE)           OVER (PARTITION BY SCHOOL_CODE ORDER BY LAST_UPDATE_DATE_TIME DESC) AS SCHOOL_CODE
    ,FIRST_VALUE(PERSON_ID)             OVER (PARTITION BY SCHOOL_CODE ORDER BY LAST_UPDATE_DATE_TIME DESC) AS PERSON_ID
    FROM SCHOOL_STAFF
    WHERE STAFF_TYPE_NAME = 'Principal'
    ORDER BY SCHOOL_CODE
    

    Returns:

    24-JAN-13   ABE 111222
    

    This should eliminate your need for GROUP BY and Subqueries for the most part. You will want to make sure to include DISTINCT though.

    This is nice, but is there a way to avoid repeating the over clause for all columns?

  • select LAST_UPDATE_DATE_TIME as LAST_UPDATE,
      SCHOOL_CODE,
      PERSON_ID
    from SCHOOL_STAFF
    WHERE STAFF_TYPE_NAME='Principal'
    AND LAST_UPDATE_DATE_TIME = (SELECT MAX(LAST_UPDATE_DATE_TIME)
                                FROM SCHOOL_STAFF s2
                                WHERE PERSON_ID = s2.PERSON_ID)
    

    Instead of posting *just* code, you should attempt to explain how this answers the question; and potentially what the OP was doing incorrectly.

License under CC-BY-SA with attribution


Content dated before 6/26/2020 9:53 AM