How do I insert a row which contains a foreign key?
Using PostgreSQL v9.1. I have the following tables:
CREATE TABLE foo ( id BIGSERIAL NOT NULL UNIQUE PRIMARY KEY, type VARCHAR(60) NOT NULL UNIQUE ); CREATE TABLE bar ( id BIGSERIAL NOT NULL UNIQUE PRIMARY KEY, description VARCHAR(40) NOT NULL UNIQUE, foo_id BIGINT NOT NULL REFERENCES foo ON DELETE RESTRICT );
Say the first table
foo
is populated like this:INSERT INTO foo (type) VALUES ( 'red' ), ( 'green' ), ( 'blue' );
Is there any way to insert rows into
bar
easily by referencing thefoo
table? Or must I do it in two steps, first by looking up thefoo
type I want, and then inserting a new row intobar
?Here is an example of pseudo-code showing what I was hoping could be done:
INSERT INTO bar (description, foo_id) VALUES ( 'testing', SELECT id from foo WHERE type='blue' ), ( 'another row', SELECT id from foo WHERE type='red' );
Your syntax is almost good, needs some parenthesis around the subqueries and it will work:
INSERT INTO bar (description, foo_id) VALUES ( 'testing', (SELECT id from foo WHERE type='blue') ), ( 'another row', (SELECT id from foo WHERE type='red' ) );
Tested at DB-Fiddle
Another way, with shorter syntax if you have a lot of values to insert:
WITH ins (description, type) AS ( VALUES ( 'more testing', 'blue') , ( 'yet another row', 'green' ) ) INSERT INTO bar (description, foo_id) SELECT ins.description, foo.id FROM foo JOIN ins ON ins.type = foo.type ;
Took reading it a few times, but I now understand that 2nd solution you provided. I like it. Using it now to bootstrap my database with a handful of known values when the system first comes up.
Plain INSERT
INSERT INTO bar (description, foo_id) SELECT val.description, f.id FROM ( VALUES (text 'testing', text 'blue') -- explicit type declaration; see below , ('another row', 'red' ) , ('new row1' , 'purple') -- purple does not exist in foo, yet , ('new row2' , 'purple') ) val (description, type) LEFT JOIN foo f USING (type);
The use of a
LEFT [OUTER] JOIN
instead of[INNER] JOIN
means that rows fromval
aren't dropped when no match is found infoo
. Instead,NULL
is entered forfoo_id
.The
VALUES
expression in the subquery does the same as @ypercube's CTE. Common Table Expressions offer additional features and are easier to read in big queries, but they also pose as optimization barriers. So subqueries are typically a bit faster when none of the above is needed.id
as column name is a wide-spread anti-pattern. Should befoo_id
andbar_id
or anything descriptive. When joining a bunch of tables, you end up with multiple columns all namedid
...Consider plain
text
orvarchar
instead ofvarchar(n)
. If you really need to impose a length restriction, add aCHECK
constraint:You may need to add explicit type casts. Since the
VALUES
expression is not directly attached to a table (like inINSERT ... VALUES ...
), types cannot be derived and default data types are used without explicit type declaration, which may not work in all cases. It's enough to do it in the first row, the rest will fall in line.
INSERT missing FK rows at the same time
If you want to create non-existent entries in
foo
on the fly, in a single SQL statement, CTEs are instrumental:WITH sel AS ( SELECT val.description, val.type, f.id AS foo_id FROM ( VALUES (text 'testing', text 'blue') , ('another row', 'red' ) , ('new row1' , 'purple') , ('new row2' , 'purple') ) val (description, type) LEFT JOIN foo f USING (type) ) , ins AS ( INSERT INTO foo (type) SELECT DISTINCT type FROM sel WHERE foo_id IS NULL RETURNING id AS foo_id, type ) INSERT INTO bar (description, foo_id) SELECT sel.description, COALESCE(sel.foo_id, ins.foo_id) FROM sel LEFT JOIN ins USING (type);
Note the two new dummy rows to insert. Both are purple, which does not exist in
foo
, yet. Two rows to illustrate the need forDISTINCT
in the firstINSERT
statement.Step-by-step explanation
The 1st CTE
sel
provides multiples rows of input data. The subqueryval
with theVALUES
expression can be replaced with a table or subquery as source. ImmediatelyLEFT JOIN
tofoo
to append thefoo_id
for pre-existingtype
rows. All other rows getfoo_id IS NULL
this way.The 2nd CTE
ins
inserts distinct new types (foo_id IS NULL
) intofoo
, and returns the newly generatedfoo_id
- together with thetype
to join back to insert rows.The final outer
INSERT
can now insert a foo.id for every row: either the type pre-existed, or it was inserted in step 2.
Strictly speaking, both inserts happen "in parallel", but since this is a single statement, default
FOREIGN KEY
constraints will not complain. Referential integrity is enforced at the end of the statement by default.SQL Fiddle for Postgres 9.3. (Works the same in 9.1.)
There is a tiny race condition if you run multiple of these queries concurrently. Read more under related questions here and here and here. Really only happens under heavy concurrent load, if ever. In comparison to caching solutions like advertised in another answer, the chance is super-tiny.
Function for repeated use
For repeated use I would create an SQL function that takes an array of records as parameter and use
unnest(param)
in place of theVALUES
expression.Or, if the syntax for arrays of records is too messy for you, use a comma-separated string as parameter
_param
. For instance of the form:'description1,type1;description2,type2;description3,type3'
Then use this to replace the
VALUES
expression in above statement:SELECT split_part(x, ',', 1) AS description split_part(x, ',', 2) AS type FROM unnest(string_to_array(_param, ';')) x;
Function with UPSERT in Postgres 9.5
Create a custom row type for parameter passing. We could do without it, but it's simpler:
CREATE TYPE foobar AS (description text, type text);
Function:
CREATE OR REPLACE FUNCTION f_insert_foobar(VARIADIC _val foobar[]) RETURNS void AS $func$ WITH val AS (SELECT * FROM unnest(_val)) -- well-known row type , ins AS ( INSERT INTO foo AS f (type) SELECT DISTINCT v.type -- DISTINCT! FROM val v ON CONFLICT(type) DO UPDATE -- type already exists SET type = excluded.type WHERE FALSE -- never executed, but lock rows RETURNING f.type, f.id ) INSERT INTO bar AS b (description, foo_id) SELECT v.description, COALESCE(f.id, i.id) -- assuming most types pre-exist FROM val v LEFT JOIN foo f USING (type) -- already existed LEFT JOIN ins i USING (type) -- newly inserted ON CONFLICT (description) DO UPDATE -- description already exists SET foo_id = excluded.foo_id -- real UPSERT this time WHERE b.foo_id IS DISTINCT FROM excluded.foo_id -- only if actually changed $func$ LANGUAGE sql;
Call:
SELECT f_insert_foobar( '(testing,blue)' , '(another row,red)' , '(new row1,purple)' , '(new row2,purple)' , '("with,comma",green)' -- added to demonstrate row syntax );
Fast and rock-solid for environments with concurrent transactions.
In addition to the queries above, this ...
... applies
SELECT
orINSERT
onfoo
: Anytype
that doesn't exist in the FK table, yet, is inserted. Assuming most types pre-exist. To be absolutely sure and rule out race conditions, existing rows we need are locked (so that concurrent transactions cannot interfere). If that's too paranoid for your case you can replace:ON CONFLICT(type) DO UPDATE -- type already exists SET type = excluded.type WHERE FALSE -- never executed, but lock rows
with
ON CONFLICT(type) DO NOTHING
... applies
INSERT
orUPDATE
(true "UPSERT") onbar
: If thedescription
already exists itstype
is updated:ON CONFLICT (description) DO UPDATE -- description already exists SET foo_id = excluded.foo_id -- real UPSERT this time WHERE b.foo_id IS DISTINCT FROM excluded.foo_id -- only if actually changed
But only if
type
actually changes:... passes values as well known row types with a
VARIADIC
parameter. Note the default maximum of 100 parameters! Compare:There are many other ways to pass multiple rows ...
Related:
In your `INSERT missing FK rows at the same time` example, would putting this in a Transaction reduce the risk of race conditions in SQL Server?
@element11: The answer is for Postgres, but since we are talking about a *single* SQL command, it's a single transaction in any case. Executing it inside a bigger transaction would only *increase* the time window for possible race conditions. As for SQL Server: data-modifying CTEs are not supported at all (only `SELECT` inside a `WITH` clause) . Source: MS documentation.
You can also do this with `INSERT ... RETURNING \gset` in `psql` then use the returned values as psql `:'variables'`, but this only works for single row inserts.
@ErwinBrandstetter this is great, but I'm too new to sql to understand it all, could you add some comments to "INSERT missing FK rows at the same time" explaining how it works? also, thanks for the SQLFiddle working examples!
@glallen: I added a step-by-step explanation. There are also many links to related answers and the manual with more explanation. You *need* to understand what the query does or you may be in over your head.
Bravo on the single statement. I understand that MySQL won't support CTEs until 8.0, would this single statement be feasible in MySQL without CTEs?
@Jon: I don't think so. But please ask your new question as *question*. You can always link to this one for context.
Brilliant explanation and examples!
Lookup. You basically need the foo id's to insert them into bar.
Not postgres specific, btw. (and you did not tag it like that) - this is generally how SQL works. No shortcuts here.
Application wise, though, you may have a cache of foo items in memory. My tables often have up to 3 unique fields:
- Id (integer or something) that is the table level primary key.
- Identifier, which is a GUID that is used as stable ID application level wise (and may be exposed to the customer in URL's etc.)
- Code - a string that may be there and has to be unique if it is there (sql server: filtered unique index on not null). That is a customer set identifier.
Example:
- Account (in a trading application) -> Id is a int used for foreign keys. -> Identifier is a Guid and used in the web portals etc. - always accepted. -> Code is manually set. Rule: once set it does not change.
Obviously when you want to link something to an account - you first must, technically, get the Id - but given both Identifier and Code never change once they are there, a positive cache in memory kan stop most lookups from hitting the database.
You are aware that you can let the RDBMS do the lookup for you, in a single SQL statement, avoiding error-prone cache?
You are aware that looking up non-changing elements is not error prone? Also, typically, the RDBMS is not scalable and the most expensive element in the game, due to licensing costs. Taking as much load from it as possible is not exactly bad. Also, not many ORM's do support that to start with.
Non-changing elements? Most expensive element? Licensing costs (for PostgreSQL)? ORMs defining what's sane? No I wasn't aware of all of that.
License under CC-BY-SA with attribution
Content dated before 6/26/2020 9:53 AM
Stéphane 7 years ago
Took reading it a few times, but I now understand that 2nd solution you provided. I like it. Using it now to bootstrap my database with a handful of known values when the system first comes up.