How do I insert a row which contains a foreign key?
Using PostgreSQL v9.1. I have the following tables:
CREATE TABLE foo ( id BIGSERIAL NOT NULL UNIQUE PRIMARY KEY, type VARCHAR(60) NOT NULL UNIQUE ); CREATE TABLE bar ( id BIGSERIAL NOT NULL UNIQUE PRIMARY KEY, description VARCHAR(40) NOT NULL UNIQUE, foo_id BIGINT NOT NULL REFERENCES foo ON DELETE RESTRICT );
Say the first table
foois populated like this:
INSERT INTO foo (type) VALUES ( 'red' ), ( 'green' ), ( 'blue' );
Is there any way to insert rows into
bareasily by referencing the
footable? Or must I do it in two steps, first by looking up the
footype I want, and then inserting a new row into
Here is an example of pseudo-code showing what I was hoping could be done:
INSERT INTO bar (description, foo_id) VALUES ( 'testing', SELECT id from foo WHERE type='blue' ), ( 'another row', SELECT id from foo WHERE type='red' );
Your syntax is almost good, needs some parenthesis around the subqueries and it will work:
INSERT INTO bar (description, foo_id) VALUES ( 'testing', (SELECT id from foo WHERE type='blue') ), ( 'another row', (SELECT id from foo WHERE type='red' ) );
Tested at DB-Fiddle
Another way, with shorter syntax if you have a lot of values to insert:
WITH ins (description, type) AS ( VALUES ( 'more testing', 'blue') , ( 'yet another row', 'green' ) ) INSERT INTO bar (description, foo_id) SELECT ins.description, foo.id FROM foo JOIN ins ON ins.type = foo.type ;
INSERT INTO bar (description, foo_id) SELECT val.description, f.id FROM ( VALUES (text 'testing', text 'blue') -- explicit type declaration; see below , ('another row', 'red' ) , ('new row1' , 'purple') -- purple does not exist in foo, yet , ('new row2' , 'purple') ) val (description, type) LEFT JOIN foo f USING (type);
The use of a
LEFT [OUTER] JOINinstead of
[INNER] JOINmeans that rows from
valaren't dropped when no match is found in
NULLis entered for
VALUESexpression in the subquery does the same as @ypercube's CTE. Common Table Expressions offer additional features and are easier to read in big queries, but they also pose as optimization barriers. So subqueries are typically a bit faster when none of the above is needed.
idas column name is a wide-spread anti-pattern. Should be
bar_idor anything descriptive. When joining a bunch of tables, you end up with multiple columns all named
varchar(n). If you really need to impose a length restriction, add a
You may need to add explicit type casts. Since the
VALUESexpression is not directly attached to a table (like in
INSERT ... VALUES ...), types cannot be derived and default data types are used without explicit type declaration, which may not work in all cases. It's enough to do it in the first row, the rest will fall in line.
INSERT missing FK rows at the same time
If you want to create non-existent entries in
fooon the fly, in a single SQL statement, CTEs are instrumental:
WITH sel AS ( SELECT val.description, val.type, f.id AS foo_id FROM ( VALUES (text 'testing', text 'blue') , ('another row', 'red' ) , ('new row1' , 'purple') , ('new row2' , 'purple') ) val (description, type) LEFT JOIN foo f USING (type) ) , ins AS ( INSERT INTO foo (type) SELECT DISTINCT type FROM sel WHERE foo_id IS NULL RETURNING id AS foo_id, type ) INSERT INTO bar (description, foo_id) SELECT sel.description, COALESCE(sel.foo_id, ins.foo_id) FROM sel LEFT JOIN ins USING (type);
Note the two new dummy rows to insert. Both are purple, which does not exist in
foo, yet. Two rows to illustrate the need for
DISTINCTin the first
The 1st CTE
selprovides multiples rows of input data. The subquery
VALUESexpression can be replaced with a table or subquery as source. Immediately
footo append the
typerows. All other rows get
foo_id IS NULLthis way.
The 2nd CTE
insinserts distinct new types (
foo_id IS NULL) into
foo, and returns the newly generated
foo_id- together with the
typeto join back to insert rows.
The final outer
INSERTcan now insert a foo.id for every row: either the type pre-existed, or it was inserted in step 2.
Strictly speaking, both inserts happen "in parallel", but since this is a single statement, default
FOREIGN KEYconstraints will not complain. Referential integrity is enforced at the end of the statement by default.
SQL Fiddle for Postgres 9.3. (Works the same in 9.1.)
There is a tiny race condition if you run multiple of these queries concurrently. Read more under related questions here and here and here. Really only happens under heavy concurrent load, if ever. In comparison to caching solutions like advertised in another answer, the chance is super-tiny.
Function for repeated use
For repeated use I would create an SQL function that takes an array of records as parameter and use
unnest(param)in place of the
Or, if the syntax for arrays of records is too messy for you, use a comma-separated string as parameter
_param. For instance of the form:
Then use this to replace the
VALUESexpression in above statement:
SELECT split_part(x, ',', 1) AS description split_part(x, ',', 2) AS type FROM unnest(string_to_array(_param, ';')) x;
Function with UPSERT in Postgres 9.5
Create a custom row type for parameter passing. We could do without it, but it's simpler:
CREATE TYPE foobar AS (description text, type text);
CREATE OR REPLACE FUNCTION f_insert_foobar(VARIADIC _val foobar) RETURNS void AS $func$ WITH val AS (SELECT * FROM unnest(_val)) -- well-known row type , ins AS ( INSERT INTO foo AS f (type) SELECT DISTINCT v.type -- DISTINCT! FROM val v ON CONFLICT(type) DO UPDATE -- type already exists SET type = excluded.type WHERE FALSE -- never executed, but lock rows RETURNING f.type, f.id ) INSERT INTO bar AS b (description, foo_id) SELECT v.description, COALESCE(f.id, i.id) -- assuming most types pre-exist FROM val v LEFT JOIN foo f USING (type) -- already existed LEFT JOIN ins i USING (type) -- newly inserted ON CONFLICT (description) DO UPDATE -- description already exists SET foo_id = excluded.foo_id -- real UPSERT this time WHERE b.foo_id IS DISTINCT FROM excluded.foo_id -- only if actually changed $func$ LANGUAGE sql;
SELECT f_insert_foobar( '(testing,blue)' , '(another row,red)' , '(new row1,purple)' , '(new row2,purple)' , '("with,comma",green)' -- added to demonstrate row syntax );
Fast and rock-solid for environments with concurrent transactions.
In addition to the queries above, this ...
typethat doesn't exist in the FK table, yet, is inserted. Assuming most types pre-exist. To be absolutely sure and rule out race conditions, existing rows we need are locked (so that concurrent transactions cannot interfere). If that's too paranoid for your case you can replace:
ON CONFLICT(type) DO UPDATE -- type already exists SET type = excluded.type WHERE FALSE -- never executed, but lock rows
ON CONFLICT(type) DO NOTHING
UPDATE(true "UPSERT") on
bar: If the
descriptionalready exists its
ON CONFLICT (description) DO UPDATE -- description already exists SET foo_id = excluded.foo_id -- real UPSERT this time WHERE b.foo_id IS DISTINCT FROM excluded.foo_id -- only if actually changed
But only if
... passes values as well known row types with a
VARIADICparameter. Note the default maximum of 100 parameters! Compare:
There are many other ways to pass multiple rows ...
In your `INSERT missing FK rows at the same time` example, would putting this in a Transaction reduce the risk of race conditions in SQL Server?
@element11: The answer is for Postgres, but since we are talking about a *single* SQL command, it's a single transaction in any case. Executing it inside a bigger transaction would only *increase* the time window for possible race conditions. As for SQL Server: data-modifying CTEs are not supported at all (only `SELECT` inside a `WITH` clause) . Source: MS documentation.
You can also do this with `INSERT ... RETURNING \gset` in `psql` then use the returned values as psql `:'variables'`, but this only works for single row inserts.
@ErwinBrandstetter this is great, but I'm too new to sql to understand it all, could you add some comments to "INSERT missing FK rows at the same time" explaining how it works? also, thanks for the SQLFiddle working examples!
@glallen: I added a step-by-step explanation. There are also many links to related answers and the manual with more explanation. You *need* to understand what the query does or you may be in over your head.
Bravo on the single statement. I understand that MySQL won't support CTEs until 8.0, would this single statement be feasible in MySQL without CTEs?
@Jon: I don't think so. But please ask your new question as *question*. You can always link to this one for context.
Lookup. You basically need the foo id's to insert them into bar.
Not postgres specific, btw. (and you did not tag it like that) - this is generally how SQL works. No shortcuts here.
Application wise, though, you may have a cache of foo items in memory. My tables often have up to 3 unique fields:
- Id (integer or something) that is the table level primary key.
- Identifier, which is a GUID that is used as stable ID application level wise (and may be exposed to the customer in URL's etc.)
- Code - a string that may be there and has to be unique if it is there (sql server: filtered unique index on not null). That is a customer set identifier.
- Account (in a trading application) -> Id is a int used for foreign keys. -> Identifier is a Guid and used in the web portals etc. - always accepted. -> Code is manually set. Rule: once set it does not change.
Obviously when you want to link something to an account - you first must, technically, get the Id - but given both Identifier and Code never change once they are there, a positive cache in memory kan stop most lookups from hitting the database.
You are aware that you can let the RDBMS do the lookup for you, in a single SQL statement, avoiding error-prone cache?
You are aware that looking up non-changing elements is not error prone? Also, typically, the RDBMS is not scalable and the most expensive element in the game, due to licensing costs. Taking as much load from it as possible is not exactly bad. Also, not many ORM's do support that to start with.