SQL: empty string vs NULL value
I know this subject is a bit controversial and there are a lot of various articles/opinions floating around the internet. Unfortunatelly, most of them assume the person doesn't know what the difference between NULL and empty string is. So they tell stories about surprising results with joins/aggregates and generally do a bit more advanced SQL lessons. By doing this, they absolutely miss the whole point and are therefore useless for me. So hopefully this question and all answers will move subject a bit forward.
Let's suppose I have a table with personal information (name, birth, etc) where one of the columns is an email address with varchar type. We assume that for some reason some people might not want to provide an email address. When inserting such data (without email) into the table, there are two available choices: set cell to NULL or set it to empty string (''). Let's assume that I'm aware of all the technical implications of choosing one solution over another and I can create correct SQL queries for either scenario. The problem is even when both values differ on the technical level, they are exactly the same on logical level. After looking at NULL and '' I came to a single conclusion: I don't know email address of the guy. Also no matter how hard i tried, I was not able to sent an e-mail using either NULL or empty string, so apparently most SMTP servers out there agree with my logic. So i tend to use NULL where i don't know the value and consider empty string a bad thing.
After some intense discussions with colleagues i came with two questions:
am I right in assuming that using empty string for an unknown value is causing a database to "lie" about the facts? To be more precise: using SQL's idea of what is value and what is not, I might come to conclusion: we have e-mail address, just by finding out it is not null. But then later on, when trying to send e-mail I'll come to contradictory conclusion: no, we don't have e-mail address, that @!#$ Database must have been lying!
Is there any logical scenario in which an empty string '' could be such a good carrier of important information (besides value and no value), which would be troublesome/inefficient to store by any other way (like additional column). I've seen many posts claiming that sometimes it's good to use empty string along with real values and NULLs, but so far haven't seen a scenario that would be logical (in terms of SQL/DB design).
P.S. Some people will be tempted to answer, that it is just a matter of personal taste. I don't agree. To me it is a design decision with important consequences. So i'd like to see answers where opion about this is backed by some logical and/or technical reasons.
@ammoQ: Oracle's treatment of zero-length strings is non-standard. Besides, `''` even in Oracle, is not the same as `NULL`. For example, assigning a `CHAR(1)` column the value `''` will result in `' '` (i.e. a space), not `NULL`. Besides, if Jacek was using Oracle, this question would likely not even come up :-)
Dean: You are right about the char(1) example, but that's yet another WTF, since `'' IS NULL` evaluates to `true` in PL/SQL.
"am I right in assuming that using empty string for an unknown value is causing a database to "lie" about the facts?" if your business users don't care about unknown vs empty, does the lie even matter?
I would say that
NULLis the correct choice for "no email address". There are many "invalid" email addresses, and "" (empty string) is just one. For example "foo" is not a valid email address, "[email protected]@c" is not valid and so on. So just because "" is not a valid email address is no reason to use it as the "no email address" value.
I think you're right in saying that "" is not the correct way to say "I don't have a value for this column". "" is a value.
An example of where "" might be a valid value, separate to
NULLcould be a person's middle name. Not every one has a middle name, so you need to differentiate between "no middle name" ("" - empty string) and "I don't know if this person has a middle name or not" (
NULL). There's probably many other examples where an empty string is still a valid value for a column.
Totally agree. NULL is there for a reason. SELECT COUNT(*) FROM YOURTABLE WHERE EMAIL IS [NOT] NULL is the way to do it, not string comparison which will tend to be slower (even for empty strings I suppose but I'm not sure of this one :).
I think `NULL` does not mean that there is no email address, i think it means that the email address is currently not known, not known to exist, or is impossible to fill in for other reasons. Fortunately, the is probably no situation where one would want to keep in a database the information about people who truly do not have and do not plan to have any email address, otherwise a separate boolean field would probably be necessary.
@Alexey - NULL means there is no value. As others have pointed out an empty string is a value.
@Ramhound, i agree that the empty string is a value, and that NULL vaguely means "there is no value". I just explained my interpretation of "no value". In my opinion, it is not the same as "the person has not opened any email account". It is rather "no email address recorded for that person".