Lecture 1 details

Lecture 1: the evolution of protein-coding genes

Pauling and Zuckerkandl

Genes evolve at the level of DNA (or RNA for some viruses)

We routinely abstract the complex molecule DNA to strings of letters with little loss of relevant information:

atg gtg ctc agc gag gga gaa tgg cag ttg gtt ctg cac gtc ...

It has not escaped our notice that the specific pairing we have postulated immediately suggests a possible copying mechanism for the genetic material.

Watson and Crick (1953)

The translation from nucleic acid to protein proceeds in a sequential fashion according to a systematic code with relatively simple rules.

Nirenberg (1968)

But most selection is on the protein

The properties of proteins are not easily abstracted.

Proteins are linear polymers encoded by DNA:

atg gtg ctc agc gag gga gaa tgg cag ttg gtt ctg cac gtc ...
M   V   L   S   E   G   E   W   Q   L   V   L   H   V   ...
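The codon-by-codon mapping shown above can be reproduced with a few lines of Python. This is a minimal sketch that assumes the standard genetic code; the names `CODON_TABLE` and `translate` are illustrative and do not come from the lecture materials.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order
# (TTT, TTC, TTA, TTG, TCT, ...). '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a coding DNA string into one-letter amino acid codes."""
    seq = "".join(dna.split()).upper()  # drop the spaces between codons
    if len(seq) % 3:
        raise ValueError("sequence length is not a multiple of three")
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


print(translate("atg gtg ctc agc gag gga gaa tgg cag ttg gtt ctg cac gtc"))
# prints MVLSEGEWQLVLHV
```

The whole code fits in a 64-entry lookup table, which is why the DNA-to-protein step abstracts so cleanly while the protein's three-dimensional properties do not.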

Proteins derive their relevant properties from their three-dimensional structures.

Properties of proteins are not easily abstracted

_images/myoglobin_static.png

PyMol analysis of myoglobin

Perhaps the most remarkable features of the molecule are its complexity and its lack of symmetry. The arrangement seems to be almost totally lacking in the kind of regularities which one instinctively anticipates, and it is more complicated than has been predicated by any theory of protein structure.

Kendrew et al (1958)

Context for reading Zuckerkandl and Pauling (1965)

They knew the structure of sperm whale myoglobin

_images/myoglobin_static.png

PyMol analysis of myoglobin

Structure had been determined by Kendrew et al (1958)

They knew that single mutations could potentially have large biological consequences

MVHLTPEEKSAVT...
MVHLTPvEKSAVT...

“Sickle Cell Anemia, a Molecular Disease” (Pauling et al, 1949)

“A Specific Chemical Difference between Globins of Normal and Sickle-cell Anemia Hemoglobins” (Ingram, 1956)
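The single difference between the two sequence fragments above can be located programmatically. The sketch below assumes the two strings are already aligned and of equal length; `point_differences` is a hypothetical helper name.

```python
def point_differences(reference, variant):
    """Return (position, ref_residue, var_residue) for each mismatch, 1-indexed."""
    if len(reference) != len(variant):
        raise ValueError("sequences must be the same length")
    return [
        (i, a, b)
        # upper() so the lowercase 'v' used to highlight the mutation compares as V
        for i, (a, b) in enumerate(zip(reference.upper(), variant.upper()), start=1)
        if a != b
    ]


normal = "MVHLTPEEKSAVT"
sickle = "MVHLTPvEKSAVT"
print(point_differences(normal, sickle))
# prints [(7, 'E', 'V')]
```

The position counts the initiator methionine. The traditional numbering of the mature protein omits it, which is why this substitution is commonly described as glutamate to valine at position 6.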

They knew the sequences of a variety of homologous globins

Analysis of myoglobin homologs

We will perform an analysis similar to that of Zuckerkandl and Pauling, and look at some of the conclusions that they draw.

Myoglobin homologs

>carp
MA----DHELVLKCWGGVEADFEGTGGEVLTRLFKQHPETQKLFPKFVGIA-QSDLAGNAAVKAHGATVLKSWASCLKARGDHAAILKPLATTHANTHKIALNNFRLITEVLVKVMAEKAGLD--AGGQSALRRVMDVVIGDIDTYYKEIGFAG
>chicken
MGLSDQEWQQVLTIWGKVEADIAGHGHEVLMRLFHDHPETLDRFDKFKGLKTPDQMKGSEDLKKHGATVLTQLGKILKQKGNHESELKPLAQTHATKHKIPVKYLEFISEVIIKVIAEKHAADFGADSQAAMKKALELFRNDMASKYKEFGFQG
>horse
MGLSDGEWQQVLNVWGKVEADIAGHGQEVLIRLFTGHPETLEKFDKFKHLKTEAEMKASEDLKKHGTVVLTALGGILKKKGHHEAELKPLAQSHATKHKIPIKYLEFISDAIIHVLHSKHPGDFGADAQGAMTKALELFRNDIAAKYKELGFQG
>human
MGLSDGEWQLVLNVWGKVEADIPGHGQEVLIRLFKGHPETLEKFDKFKHLKSEDEMKASEDLKKHGATVLTALGGILKKKGHHEAEIKPLAQSHATKHKIPVKYLEFISECIIQVLQSKHPGDFGADAQGAMNKALELFRKDMASNYKELGFQG
>mouse
MGLSDGEWQLVLNVWGKVEADLAGHGQEVLIGLFKTHPETLDKFDKFKNLKSEEDMKGSEDLKKHGCTVLTALGTILKKKGQHAAEIQPLAQSHATKHKIPVKYLEFISEIIIEVLKKRHSGDFGADAQGAMSKALELFRNDIAAKYKELGFQG
>sperm-whale
MVLSEGEWQLVLHVWAKVEADVAGHGQDILIRLFKSHPETLEKFDRFKHLKTEAEMKASEDLKKHGVTVLTALGAILKKKGHHEAELKPLAQSHATKHKIPIKYLEFISEAIIHVLHSRHPGDFGADAQGAMNKALELFRKDIAAKYKELGYQG
>tuna
MA----DFDAVLKCWGPVEADYTTMGGLVLTRLFKEHPETQKLFPKFAGIA-QADIAGNAAISAHGATVLKKLGELLKAKGSHAAILKPLANSHATKHKIPINNFKLISEVLVKVMHEKAGLD--AGGQTALRNVMGIIIADLEANYKELGFSG
>turtle
MGLSDDEWNHVLGIWAKVEPDLTAHGQEVIIRLFQLHPETQERFAKFKNLTTIDALKSSEEVKKHGTTVLTALGRILKQKNNHEQELKPLAESHATKHKIPVKYLEFICEIIVKVIAEKHPSDFGADSQAAMKKALELFRNDMASKYKEFGFLG
>zebrafish
MA----DHDLVLKCWGAVEADYAANGGEVLNRLFKEYPDTLKLFPKFSGIS-QGDLAGSPAVAAHGATVLKKLGELLKAKGDHAALLKPLANTHANIHKVALNNFRLITEVLVKVMAEKAGLD--AAGQGALRRVMDAVIGDIDGYYKEIGFAG

Sequences are available at Analysis of myoglobin homologs
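One simple way to compare aligned homologs like those above is the fraction of identical residues. The following is a minimal sketch, not the code used for the lecture's analysis: `parse_fasta` and `fraction_identical` are illustrative names, the toy sequences are invented for the example, and counting only columns where neither sequence has a gap is one convention among several.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict mapping name -> sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def fraction_identical(a, b, gap="-"):
    """Fraction of identical residues over columns where neither sequence is gapped."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(x == y for x, y in pairs) / len(pairs)


# Toy alignment (invented for illustration, not real myoglobin sequences).
toy = """>seq1
MA--DHELV
>seq2
MGLSDQEWV
"""
aln = parse_fasta(toy)
print(fraction_identical(aln["seq1"], aln["seq2"]))  # 4 identical of 7 ungapped columns
```

Pasting the aligned myoglobin sequences into `parse_fasta` would allow the same comparison for any pair of species. The value depends on how gapped columns are treated, so it may differ slightly from figures computed with other conventions.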

Site variability among myoglobin homologs

_images/myoglobin_site_preferences_logoplot.jpg

Code and data to produce this figure are at Analysis of myoglobin homologs

Very little sequence is strictly conserved
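Site variability of this kind can be quantified column by column. The sketch below is not the code behind the logo plot: it assumes the alignment is a list of equal-length strings, and it uses two common summaries, strict conservation and Shannon entropy. The function names and the toy alignment are illustrative.

```python
import math
from collections import Counter


def column_entropy(column, gap="-"):
    """Shannon entropy in bits of the residues in one alignment column, ignoring gaps."""
    counts = Counter(res for res in column if res != gap)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


def strictly_conserved_columns(sequences, gap="-"):
    """1-indexed columns in which every sequence carries the same non-gap residue."""
    return [
        i
        for i, column in enumerate(zip(*sequences), start=1)
        if gap not in column and len(set(column)) == 1
    ]


# Toy alignment (invented for illustration).
toy_alignment = ["MGLSD", "MGLTD", "MA-SD"]
print(strictly_conserved_columns(toy_alignment))  # prints [1, 5]
print([round(column_entropy(col), 3) for col in zip(*toy_alignment)])
```

An entropy of zero marks a column with a single residue type; higher values mark sites that tolerate more amino acids.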

Relatively few insertions and deletions

Deletions or additions of one to several amino acid residues are expected to be eliminated by natural selection in a high proportion of cases. Those that are preserved should be mostly found at either end of a chain, at the end of helices, in short helices, or in nonhelical regions, notably in loops that may be shortened or lengthened without affecting the steric relationships in the rest of the molecule. A deletion or addition in the middle of a long helix would result in so many simultaneous alterations in side-chain interactions that it is highly unlikely that the tertiary structure and the function of the molecule could survive such an event. The deletions or additions found in hemoglobin and myoglobin chains are compatible with these generalities.

Zuckerkandl and Pauling (1965)
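To check where insertions and deletions fall, the gap runs in each aligned sequence can be listed and then compared against the positions of helices and loops. This is a minimal sketch; `gap_runs` is a hypothetical helper, and the example string is only the first 15 columns of the carp sequence in the alignment.

```python
import re


def gap_runs(aligned_seq, gap="-"):
    """Return (start, end) alignment columns, 1-indexed inclusive, of each gap run."""
    pattern = re.escape(gap) + "+"
    return [(m.start() + 1, m.end()) for m in re.finditer(pattern, aligned_seq)]


carp_prefix = "MA----DHELVLKCW"  # first 15 alignment columns of the carp sequence
print(gap_runs(carp_prefix))
# prints [(3, 6)]
```

This run sits near the N-terminal end of the chain, one of the locations where the quoted passage expects deletions to be tolerated.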

Structure is highly conserved

This observation has stood the test of time. In fact, sequences with as little as 30-35% identity generally have very similar structures. See Chothia and Lesk (1984) and Sander and Schneider (1991).

Structure is highly conserved

_images/sperm_whale_tuna_myoglobin_overlay_static.png

PyMol analysis of myoglobin

Structural alignment of sperm whale (gray) and yellowfin tuna (orange) myoglobins. These proteins are only 57% identical.

Different proteins evolve at different rates

Zuckerkandl and Pauling’s view of evolution

One key point (subject of next week)

Many such substitutions may lead to relatively little functional change, whereas at other times the replacement of one single amino acid residue by another may lead to a radical functional change… Of course, the two aspects are not unrelated, since the functional effect of a given single substitution will frequently depend on the presence or absence of a number of other substitutions.

What do we call that phenomenon?

Zuckerkandl and Pauling propose a molecular clock
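One standard way to turn the clock idea into arithmetic is sketched below. It is not necessarily the calculation used in the paper. It assumes substitutions accumulate as a Poisson process, so an observed fraction of differing sites p corresponds to d = -ln(1 - p) substitutions per site, and that two lineages separated for time T give a per-lineage rate of d / (2T). The function names are illustrative, and the divergence time in the example is a placeholder, not a dated estimate.

```python
import math


def poisson_distance(p_diff):
    """Expected substitutions per site given the observed fraction of differing sites."""
    if not 0 <= p_diff < 1:
        raise ValueError("p_diff must be in [0, 1)")
    return -math.log(1 - p_diff)


def clock_rate(p_diff, divergence_time):
    """Substitutions per site per unit time along one lineage.

    The factor of 2 accounts for changes accumulating on both lineages
    since their common ancestor.
    """
    return poisson_distance(p_diff) / (2 * divergence_time)


p = 1 - 0.57      # two proteins that are 57% identical differ at 43% of sites
T = 400.0         # PLACEHOLDER divergence time in millions of years (assumption)
print(poisson_distance(p))
print(clock_rate(p, T))
```

If the clock holds, `clock_rate` should come out roughly the same for different species pairs of the same protein, and differ between proteins that evolve at different rates.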

A striking example of the molecular clock

Figure 6 of dos Reis et al (2009)

Important implications

Hand-written notes

notes from the second half of class