How to select distinct for one column and any in another column?
I need to query an SQL database to find all distinct values of one column and I need an arbitrary value from another column. For example, consider the following table with two columns, key and value:
key value === ===== one test one another one value two goes two here two also three example
I wish to get back one sample row, chosen arbitrarily, from each distinct key, perhaps getting these three rows:
key value === ===== one test two goes three example
How can I formulate such a query in SQL?
It is a proprietary system.
The easiest query to write is for MySQL (with not strict ANSI settings). It uses the non-standard construction:
SELECT key, value FROM tableX GROUP BY key ;
In recent version (5.7 and 8.0+) where the strict settings and
ONLY_FULL_GROUP_BY
are the default, you can use theANY_VALUE()
function, added in 5.7:SELECT key, ANY_VALUE(value) AS value FROM tableX GROUP BY key ;
For other DBMSs, that have window functions (like Postgres, SQL-Server, Oracle, DB2), you can use them like this. The advantage is that you can select other columns in the result as well (besides the
key
andvalue
) :SELECT key, value FROM tableX ( SELECT key, value, ROW_NUMBER() OVER (PARTITION BY key ORDER BY whatever) --- ORDER BY NULL AS rn --- for example FROM tableX ) tmp WHERE rn = 1 ;
For older versions of the above and for any other DBMS, a general way that works almost everywhere. One disadvantage is that you cannot select other columns with this approach. Another is that aggregate functions like
MIN()
andMAX()
do not work with some datatypes in some DBMSs (like bit, text, blobs):SELECT key, MIN(value) AS value FROM tableX GROUP BY key ;
PostgreSQL has a special non-standard
DISTINCT ON
operator that can also be used. The optionalORDER BY
is for selecting which row from every group should be selected:SELECT DISTINCT ON (key) key, value FROM tableX -- ORDER BY key, <some_other_expressions> ;
@WilliamKF If by "chosen arbitrarily" you mean "chosen randomly" then simply replace the `ORDER BY whatever` in ypercube's query with a call to a function to randomize the results.
@LeighRiffel It need not be random, any choice, as simple as the first one encountered works fine.
For MS-SQl Server:
;with FinalDataset as ( select *, row_number() over(partition by key order by value) as rownum from YourOriginalTable ) select key, value from FinalDataset where rownum = 1
Likewise, you could have rownum = 2 for your second result set
Similar to accepted answer, but instead of min() or max() you can use array_agg()
SELECT key, (array_agg(value))[1] AS value FROM tableX GROUP BY key ;
You can optionally order values inside array to select biggest or smallest of them:
SELECT key, (array_agg(value) ORDER BY value DESC)[1] AS value FROM tableX GROUP BY key ;
(checked on PostgreSQL)
License under CC-BY-SA with attribution
Content dated before 6/26/2020 9:53 AM
ypercubeᵀᴹ 8 years ago
Which DBMS (Oracle, SQL-Server, DB2, MySQL, Postgres)?