What is the difference between orthologs, paralogs and homologs?
These three terms are often misused in the literature. Many researchers seem to treat them as synonyms. So, what is the definition of each of these terms and how do they differ from one another?
Another pet peeve: homology is qualitative, not quantitative. So saying "significant homology" or "weak homology" is just wrong.
I hope you are getting the point here - the definitions are clear, but determining whether two genes are paralogs or orthologs is in the grey area. Adaptation can give a gene new functions and phenotypes even without duplication, so two paralogs can be as different as orthologs.
@shigeta, I'm not sure what you mean. Usually, orthologs are more similar than paralogs, exactly because functional divergence often follows duplication. Paralogous genes often have different functions and, therefore, less sequence similarity than orthologs. As for acquiring novel functions without duplication, believe me I know, I have spent the last couple of years working on moonlighting proteins :).
I'm saying that orthologs might resemble each other relatively little - their sequences can diverge to the point that they no longer align, or they may acquire secondary functions in addition to the original function. Think about E. coli - fly orthologs. In the bacterial genome space this is a common issue.
Regarding the function of paralogs, it should also be mentioned that in some cases they do not diverge in function. Instead, they retain the original function but work under different conditions or act as genetic backups.
In biology, definitions are only 'suggested guidelines'. Often the organisms don't cooperate though.
First, a note on spelling. Both "ortholog" and "orthologue" are correct; one is the American and the other the British spelling. The same is true for homolog and paralog.
On to the biology. Homology is the blanket term: both ortho- and paralogs are homologs. So, when in doubt, use "homologs". However:
Orthologs are homologous genes that are the result of a speciation event.
Paralogs are homologous genes that are the result of a duplication event.
The following image, adapted (slightly) from , illustrates the differences:
Part (a) of the diagram above shows a hypothetical evolutionary history of a gene. The ancestral genome had two copies of this gene (A and B) which were paralogs. At some point, the ancestral species split into two daughter species, each of whose genomes contains two copies of the ancestral duplicated gene (A1 and B1 in species 1, A2 and B2 in species 2).
These genes are all homologous to one another, but are they paralogs or orthologs? Since the duplication event that created genes A and B occurred before the speciation event that created species 1 and 2, any A gene is a paralog of any B gene, while each gene in species 1 is an ortholog of its counterpart in species 2:
- A1 and B1 are paralogs.
- A1 and B2 are paralogs.
- A2 and B1 are paralogs.
- A2 and B2 are paralogs.
- A1 and A2 are orthologs.
- B1 and B2 are orthologs.
This, however, is a very simple case. What happens when a duplication occurs after a speciation event? In part (b) of the above diagram, the ancestral gene was duplicated only in species 2's lineage. Therefore, in (b):
- A2 and B2 are orthologs of A1.
- A2 and B2 are paralogs of each other.
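The rule used in both cases can be sketched in a few lines of Python: two genes are orthologs if the node where their lineages last meet in the gene tree is a speciation, and paralogs if it is a duplication. This is only a minimal illustration, assuming the gene tree is already known and its internal nodes are already labelled (which is the hard part in practice). The nested-tuple tree encoding and the names `TREE_A`, `TREE_B`, `leaves` and `classify` are made up for this example.

```python
# A leaf is a gene name (str); an internal node is (event, left, right),
# where event is "speciation" or "duplication". Hypothetical encoding.

# Part (a): duplication in the ancestor, then speciation of each copy.
TREE_A = ("duplication",
          ("speciation", "A1", "A2"),
          ("speciation", "B1", "B2"))

# Part (b): speciation first, then duplication only in species 2's lineage.
TREE_B = ("speciation",
          "A1",
          ("duplication", "A2", "B2"))


def leaves(node):
    """Return the gene names found below a node."""
    if isinstance(node, str):
        return [node]
    _, left, right = node
    return leaves(left) + leaves(right)


def classify(node):
    """Map each gene pair to 'orthologs' or 'paralogs'.

    Genes on opposite sides of a node have that node as their last
    common ancestor, so the node's event decides the relationship.
    """
    if isinstance(node, str):
        return {}
    event, left, right = node
    label = "orthologs" if event == "speciation" else "paralogs"
    result = {}
    for g1 in leaves(left):
        for g2 in leaves(right):
            result[frozenset((g1, g2))] = label
    result.update(classify(left))
    result.update(classify(right))
    return result


if __name__ == "__main__":
    for name, tree in (("(a)", TREE_A), ("(b)", TREE_B)):
        print(name)
        for pair, label in sorted(classify(tree).items(), key=lambda kv: sorted(kv[0])):
            print("  ", " and ".join(sorted(pair)), "are", label)
```

Note that the function never looks at which genome a gene sits in; only the labelled history matters.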
A common misconception is that paralogous genes are those homologous genes that are in the same genome while orthologous genes are those that are in different genomes. As you can see in the example above, this is absolutely not true. While it can happen that way, ortho- vs paralogy depends exclusively on the evolutionary history of the genes involved. If you do not know whether a particular homology relationship is the result of a gene duplication or a speciation event, then you cannot know if it is a case of paralogy or orthology.
I highly recommend the Jensen article referenced above. I read it when I was first starting to work on comparative genomics and evolution and it is a wonderfully clear and succinct explanation of the terms. Some of the articles referenced therein are also worth a read:
- Koonin EV: An apology for orthologs - or brave new memes. Genome Biol 2001, 2:comment1005.1-1005.2.
- Petsko GA: Homologuephobia. Genome Biol 2001, 2:comment1002.1-1002.2.
- Fitch WM: Distinguishing homologous from analogous proteins. Syst Zool 1970, 19:99-113. (of historical interest, the terms were first used here)
- Fitch WM: Homology: a personal view on some of the problems. Trends Genet 2000, 16:227-31.
@January, isn't it? I've been sending it to everyone who asks me the ortho vs para question for years. Thought I'd share it with y'all here :).
So given a pair of genes that have diverged by both speciation and duplication, they are orthologs or paralogs depending on which divergence event occurred first. Is that right?
Both orthologs and paralogs are types of homologs, that is, they denote genes that derive from the same ancestral sequence.
Orthologs are corresponding genes in different lineages and are a result of speciation, whereas paralogs result from a gene duplication. This often has important implications: while orthologs often fulfill the same role, paralogs tend to diverge in their function, so paralogy is a worse indicator of functional analogy than orthology.
This, however, is only the tip of the iceberg, since the situation can be much more complex (see, for example, the hidden paralogy problem).
There is a great article by Fitch on that subject.
First, the definition: two genes are homologs if they derive from a common ancestor. Generally speaking, if two nucleotide sequences have at least 30% identity (or two amino acid sequences more than 10%), they are likely to share common ancestry; however, they may still not be homologous. Note that the reverse does not apply: two genes can be homologs even if there is no detectable similarity. This happens whenever the sequences have drifted apart for long enough (many genes are no longer similar, beyond random identity, after 1 billion years; only highly conserved genes keep some similarity).
In addition to the other answers, here is another diagram and some further terms:
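Identity is the quantity that is actually measured; homology is the inference drawn from it. As a rough sketch of the measurement, the function below computes percent identity over a pair of sequences that have already been aligned. The name `percent_identity`, the `-` gap character and the choice to ignore gapped columns are assumptions made for this example. Real tools differ in what they use as the denominator, so figures from different programs are not always comparable.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over the ungapped columns of a pairwise alignment.

    Both inputs must come from the same alignment and therefore have the
    same length. Columns with a gap in either sequence are skipped.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # gapped column: not counted at all
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Toy alignment: 8 ungapped columns, 7 of them identical.
    print(percent_identity("ACGT-ACGT", "ACGTTAGGT"))  # 87.5
```

A high value makes common ancestry more plausible but does not establish it, and a low value does not rule it out.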
(after W. M. Fitch, Trends in Genetics, 16, 5, May 2000, p.228)
- B1 and C1 are (1:1) orthologs
- B1, C2 and C3 are (1:n) orthologs
- A1 and AB1 are xenologs (horizontal gene transfer)
I think (1:n) ortholog is a synonym for outparalog.
Klaus D. Grasser: Annual Plant Reviews, Regulation of Transcription in Plants (Volume 29). Wiley-Blackwell, ISBN 1-4051-4528-5, p. 37.
I am sorry, but this is simply not true. >=30% nt and >=10% aa identity are not sufficient to demonstrate homology. For example, two long proteins can share a domain. They can easily have >10% aa identity and not be homologous at all. In any case, >=30% nt identity does not mean there is "no other possible explanation" but homology; you are forgetting convergent evolution. For a well-known example, AFGP proteins from _D. mawsoni_ and _B. saida_ show 69% sequence identity but are not homologous.
I'm aware of convergent evolution, as well as repeating sequence parts. There is also the conservation of domains with respect to non-functional parts. That's why I wrote "in general".
Given the quality of your many other answers, I imagine you must be aware of it :). I am just saying that there is no magic sequence similarity threshold that signifies homology. I was reacting to this phrase: "there can be no other explanation than common ancestry for this fact".
I modified your answer to be less forceful in its statement that terdon took issue with. If you don't agree with it, you can reject the change.