
A bit of evolution

An estimated 10 to 100 million species live on Earth.
Evolution is the source of this diversity: all species are related, to varying degrees. Within each species, individuals evolve over generations through random changes in their DNA.
These changes (mutations) are then selected, depending on the environment and on the interactions between species and between individuals. Sometimes a new characteristic, or even a new species, appears that is better adapted to its environment.
In order to define these relationships, biologists look not only at what species have in common, but also what distinguishes them. The relationships between different species are most often represented in the form of a tree, called a ‘phylogenetic tree’ or ‘tree of life’.
In Darwin’s time, species were first compared on the basis of their morphology – for example, analyses of the size, shape and structure of bones, the presence of hair or scales or, for plants, the position of leaves on a stem. The more similar the morphological characteristics of two species are, the more recent their common ancestor is.
Today, it is possible to study the evolution of species by comparing their DNA and in particular, their genes or proteins. The more similar the DNA of two species is, the more recent their common ancestor is.

What is a tree of life? What is a mutation? What is natural selection? How do you build a tree of life with molecular data? What are 'orthologs'?

The tree of life

"All species have common ancestors and evolve by natural selection". This is the theory proposed by Charles Darwin in 1859 in his book "On the Origin of Species". He also popularized the concept of the tree of life to describe the relationship between species.

Here is a sketch of a tree of life proposed by Darwin in 1837 (First Notebook on Transmutation of Species).


Source: Wikimedia

The evolution of the tree of life...

New technologies (DNA sequencing), access to sequences (DNA & proteins) and the advancement of bioinformatics and statistical techniques have changed the way species are classified. We can now estimate the degree of relatedness between species and build a tree of life by comparing the sequences of their genes and/or proteins.

How does it work?

To understand how it is possible to build a tree of life by comparing genes or protein sequences, it is necessary to (re)dive into some basic notions of evolution.

Here, for example, is the tree of life published by scientists in 2006. They compared 31 proteins from 191 different species.

The molecular basis for evolution

The molecular basis for evolution is the random changes that occur in DNA.

Most often, these changes occur at the time of cell division, which is when the cell must make a copy of its DNA.

Copying is error-prone and can lead to 'typos'! The DNA replication system is not perfect! These 'typos' are referred to as mutations. Here are some examples:
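
The three most common kinds of 'typo' (substitution, insertion and deletion) can be sketched in a few lines of Python. This is only an illustration: the 9-base sequence and the function names are made up for the example and do not come from a real genome.

```python
def substitute(dna: str, pos: int, base: str) -> str:
    """Replace the base at index pos (0-based) with another base."""
    return dna[:pos] + base + dna[pos + 1:]


def insert(dna: str, pos: int, bases: str) -> str:
    """Insert one or more bases just before index pos."""
    return dna[:pos] + bases + dna[pos:]


def delete(dna: str, pos: int, length: int = 1) -> str:
    """Remove `length` bases starting at index pos."""
    return dna[:pos] + dna[pos + length:]


original = "ATGGTTCAT"  # hypothetical sequence, for illustration only

print(original)                      # ATGGTTCAT
print(substitute(original, 5, "C"))  # ATGGTCCAT
print(insert(original, 3, "A"))      # ATGAGTTCAT
print(delete(original, 3))           # ATGTTCAT
```

Note that the insertion and the deletion change the length of the sequence, which shifts every codon that follows.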

Mutations and natural selection

Depending on where these mutations are located in the genome, they can affect mechanisms that are fundamental to the biology, physiology and development of organisms. Other mutations have no effect at all.

Why?

– Certain mutations have no impact because they are found outside of genes or outside the regions which regulate gene expression.

– Mutations located within introns generally have less impact than those located in exons.

– Certain mutations have no impact on the sequence of the protein, because the genetic code is redundant: several codons code for the same amino acid (GTT -> V, GTA -> V, GTC -> V, GTG -> V).

– Certain changes in amino acids do not affect the 3D structure or the function of the protein.
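
The redundancy of the genetic code can be checked with a short Python sketch. The table below is only a small excerpt of the standard genetic code (the codons needed for the example), and the DNA sequences are invented for illustration.

```python
# Excerpt of the standard genetic code: only the codons used in this example.
CODON_TABLE = {
    "ATG": "M",
    "GTT": "V", "GTA": "V", "GTC": "V", "GTG": "V",
    "GAT": "D", "GAC": "D",
    "CAT": "H", "CAC": "H",
}


def translate(dna: str) -> str:
    """Translate a DNA sequence codon by codon (reading frame starts at 0)."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def classify(original: str, mutated: str) -> str:
    """Say whether a mutation changes the protein sequence or not."""
    if translate(original) == translate(mutated):
        return "silent"
    return "amino acid change"


original = "ATGGTTCAT"   # hypothetical sequence: M V H
silent = "ATGGTCCAT"     # GTT -> GTC, still V
changing = "ATGGATCAT"   # GTT -> GAT, V becomes D

print(translate(original), classify(original, silent))    # MVH silent
print(translate(changing), classify(original, changing))  # MDH amino acid change
```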

If a mutation leads to an amino acid change that modifies the function of a protein, and this change confers an advantage on an individual in a particular environment, that individual will be more likely to survive and/or reproduce: the mutation will then be passed on to the following generations.

And sometimes a new characteristic, or even a new species, appears. But if the environment changes again, this new species may not survive!

Note that in multicellular organisms, only mutations present in the DNA of cells involved in reproduction (ovules, sperm, pollen, spores, ...) can have an impact on the following generations. The effects are often visible only after several generations. And since the time between two generations can be very long (about 25 years for humans or turtles, for example), it is difficult to 'see' evolution happening!

Natural selection: an example

In the following example, every circle represents an individual. All these individuals belong to the same species.

The red mutation is deleterious: it may, for example, cause a rare genetic disease. The green mutation gives an advantage to the individuals who carry it in the environment in which they live. As a consequence, over many generations, the number of individuals with the green mutation will increase until it becomes the most frequent variant in the population in that environment. The red mutation will disappear. In another environment, the red mutation might have been the one selected!
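
The trend described above can be reproduced with a very simple model. The sketch below assumes that each type of individual leaves offspring in proportion to a 'fitness' value, and it ignores chance, mating and new mutations. The starting frequencies and fitness values are invented for illustration; they are not measurements.

```python
def next_generation(freqs: dict, fitness: dict) -> dict:
    """Reweight each type by its fitness, then rescale so frequencies sum to 1."""
    mean_fitness = sum(freqs[k] * fitness[k] for k in freqs)
    return {k: freqs[k] * fitness[k] / mean_fitness for k in freqs}


def simulate(freqs: dict, fitness: dict, generations: int) -> list:
    """Return the list of frequencies, from generation 0 to the last one."""
    history = [dict(freqs)]
    for _ in range(generations):
        freqs = next_generation(freqs, fitness)
        history.append(freqs)
    return history


# Assumed values: 90% of individuals carry neither mutation at the start.
start = {"none": 0.90, "green": 0.05, "red": 0.05}
fitness = {"none": 1.00, "green": 1.10, "red": 0.80}  # green favored, red penalized

history = simulate(start, fitness, 100)
for generation in (0, 10, 50, 100):
    snapshot = history[generation]
    print(generation, {k: round(v, 4) for k, v in snapshot.items()})
```

Swapping the two fitness values mimics 'another environment' in which the red mutation is the one that spreads.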

A story about bird feeders

The bird feeder

Source: Wikimedia


Beak length in birds is influenced by many genes. But one gene in particular attracted the attention of researchers in 2017: COL4A5.

In the populations of great tits studied, this gene had two alleles, T and C. The C allele is associated with a longer beak and is more frequently found in the populations of UK great tits than in the populations of Dutch great tits.

The selection for longer beaks may be specific to the UK: something in the environment in this country has favored the great tits with the C allele and a long beak.

Hypothesis: long beaks might confer an advantage in the UK because they could allow the birds to reach the food provided in bird feeders more easily. Bird feeders are particularly common in British gardens.

Thanks to trackers placed on the birds, the researchers discovered that the tits with the C allele used the bird feeders more often than those with the T allele.

This supports the researchers' hypothesis: the availability of food in bird feeders could give an advantage to birds with longer beaks, which can reach the food more easily.

The evolution of beak length in these birds has been observed for 25 years. Genetic analyses have been carried out on more than 2,300 birds, and 490,000 mutations have been studied. However, confirming these findings will require additional, meticulous studies of the genetics of these birds and of their environment.

Recent natural selection causes adaptive evolution of an avian polygenic trait
Understanding evolution (Berkeley): A New Story of Birds and Beaks

Building a tree of life with molecular data: important concepts

1. Reference genome

Each species is made up of a multitude of individuals. Each individual is unique, and the genome of each individual within a species is unique!

In order to compare species on the basis of their genome, biologists work with a reference genome that has been chosen for each species. For each species whose genome has been sequenced, there is a reference genome sequence and a set of 'reference' gene and protein sequences.

It is thus possible to compare either the sequences of the entire genomes (but this is not simple and does not always make sense), or the sequences of genes or proteins.

And that's not all! We must compare what is comparable!

2. Orthology: another important concept

"Orthology, the formalization of the intuitive notion of ‘corresponding genes in different species’, is a cornerstone of genomics"

To classify species, we must compare the 'same' characters! It is crucial to compare the sequence of the same gene or same protein present in different species.

The quest for these orthologs is a very important step in studying evolution at the molecular level.
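
One widely used heuristic for finding candidate orthologs is the 'reciprocal best hit': gene A1 in one species and gene B1 in another are paired if each is the other's most similar gene. The sketch below is a minimal illustration of that idea, not necessarily the method used by any particular database. The gene names and similarity scores are hypothetical.

```python
def reciprocal_best_hits(scores: dict) -> list:
    """scores maps (gene_in_species_A, gene_in_species_B) to a similarity score.

    Returns the pairs in which each gene is the other's best match.
    """
    best_for_a, best_for_b = {}, {}
    for (a, b), score in scores.items():
        if a not in best_for_a or score > scores[(a, best_for_a[a])]:
            best_for_a[a] = b
        if b not in best_for_b or score > scores[(best_for_b[b], b)]:
            best_for_b[b] = a
    return sorted((a, b) for a, b in best_for_a.items() if best_for_b[b] == a)


# Hypothetical similarity scores between genes of species A and species B.
scores = {
    ("A1", "B1"): 0.70,
    ("A1", "B2"): 0.40,
    ("A2", "B1"): 0.35,
    ("A2", "B2"): 0.80,
    ("A3", "B2"): 0.30,  # A3's best match is B2, but B2 prefers A2
}

print(reciprocal_best_hits(scores))  # [('A1', 'B1'), ('A2', 'B2')]
```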

Orthologs (example 1): the insulin gene sequences of human, chimpanzee, cow and fish

Here is the DNA sequence corresponding to the insulin gene in the reference genome of human, chimpanzee, cow and fish (Danio rerio). In red: the exons.
These 4 genes are ‘orthologous’: they derive from a common ancestral gene and code for similar proteins that have the same biological function.

Insulin is found in all vertebrates, from the hagfish (myxines, a very ancient taxon whose members probably lived some 100 million years ago) to humans (a more recent taxon, which appeared about 7 million years ago).

Orthologs (example 2): the insulin protein sequences of human, chimpanzee, cow and fish

Here are the amino acid sequences of the insulin protein in man, chimpanzee, fish and cow:

Have fun finding the differences!

Building a tree of life with molecular data: the basic concept

It is possible to construct a tree of life by comparing the amino acid sequences of these orthologous proteins.

We can, for instance, compare the amino acid sequences and ‘count’ the differences. The simplified procedure can be carried out manually in the following example:

Here is part of the amino acid sequence of insulin from humans, chimpanzees, cows and a fish.

The human and chimpanzee sequences are the most similar (1 difference). The fish sequence contains the most differences from the other sequences.
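
Counting differences between aligned sequences is easy to automate. In the sketch below, the four 12-letter sequences are toy examples invented to mimic the pattern described above (they are not real insulin sequences), and the function name is an arbitrary choice.

```python
from itertools import combinations


def count_differences(seq_a: str, seq_b: str) -> int:
    """Count the positions at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned (same length)")
    return sum(1 for x, y in zip(seq_a, seq_b) if x != y)


# Toy sequences, invented for illustration only.
sequences = {
    "human": "ACDEFGHIKLMN",
    "chimp": "ACDEFGHIKLMQ",
    "cow":   "ACDEYGHVKLMN",
    "fish":  "ACWEPRSTILMT",
}

for name_a, name_b in combinations(sequences, 2):
    n = count_differences(sequences[name_a], sequences[name_b])
    print(f"{name_a:>5} vs {name_b:<5}: {n} difference(s)")
```

With these toy sequences, human and chimp differ at 1 position, human and cow at 2, and the fish differs from each of the others at 7 positions.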

These observations can be represented by a tree.
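
One simple way to turn counts of differences into a tree is to repeatedly group the two closest sequences (or groups of sequences), averaging the distances as groups grow. This approach is known as UPGMA. It is used here only as an easy-to-follow illustration; real phylogenetic studies generally rely on more sophisticated statistical methods. The distances below are toy values: only the single human/chimp difference echoes the text.

```python
def build_tree(distances: dict) -> str:
    """Group the closest clusters step by step (UPGMA, average linkage).

    distances maps (name_a, name_b) to a number of differences.
    Returns the tree as nested parentheses, e.g. "((a,b),c)".
    """
    dist = {frozenset(pair): d for pair, d in distances.items()}
    sizes = {name: 1 for pair in distances for name in pair}  # leaves per cluster
    while len(sizes) > 1:
        # Pick the closest pair; ties are broken alphabetically.
        closest = min(dist, key=lambda pair: (dist[pair], sorted(pair)))
        a, b = sorted(closest)
        merged = f"({a},{b})"
        new_size = sizes[a] + sizes[b]
        for other in sizes:
            if other in (a, b):
                continue
            d_a = dist[frozenset((a, other))]
            d_b = dist[frozenset((b, other))]
            # Size-weighted average distance from the new group to the others.
            dist[frozenset((merged, other))] = (d_a * sizes[a] + d_b * sizes[b]) / new_size
        dist = {pair: d for pair, d in dist.items() if a not in pair and b not in pair}
        del sizes[a], sizes[b]
        sizes[merged] = new_size
    return next(iter(sizes))


# Toy distances (numbers of differences), invented for illustration.
distances = {
    ("human", "chimp"): 1,
    ("human", "cow"): 2,
    ("chimp", "cow"): 3,
    ("human", "fish"): 7,
    ("chimp", "fish"): 7,
    ("cow", "fish"): 7,
}

print(build_tree(distances))  # (((chimp,human),cow),fish)
```

The nested parentheses read from the inside out: human and chimp are grouped first, the cow joins them next, and the fish branches off last.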

Your turn to play: who is the cousin of the cucumber?

Build trees with the program Philophylo (in French)

  • Select a protein.
  • Philophylo searches for the sequences of this protein in different species in the UniProtKB/Swiss-Prot database.
  • Philophylo compares protein sequences… in bioinformatics language, it builds a ‘multiple sequence alignment’.
  • Bioinformatics programs evaluate the differences or similarities observed in the alignment. The result is ‘modeled’ as a tree. The branching points (nodes) correspond to hypothetical ancestral organisms.
  • Remark: Philophylo does not build a genuine phylogenetic tree (the calculations would be much more complicated and too long!).
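
To give an idea of what 'aligning' means, here is a minimal sketch of a classic algorithm for aligning two sequences (global alignment, often called Needleman-Wunsch). It is not Philophylo's code, and a multiple sequence alignment is more involved than this pairwise version. The scores for a match, a mismatch and a gap are arbitrary assumptions; real tools typically use substitution matrices such as BLOSUM for proteins.

```python
def global_align(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Return (aligned_a, aligned_b, score) for the best global alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    def pair_score(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the table: each cell holds the best score for the two prefixes.
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(i, j),  # align two letters
                score[i - 1][j] + gap,                   # gap in b
                score[i][j - 1] + gap,                   # gap in a
            )

    # Trace back from the bottom-right corner to recover the alignment.
    i, j = len(a), len(b)
    out_a, out_b = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[-1][-1]


# Toy sequences: the second one is missing the letter D.
aligned_a, aligned_b, total = global_align("ACDEFG", "ACEFG")
print(aligned_a)  # ACDEFG
print(aligned_b)  # AC-EFG
print(total)      # 3
```

The dash marks a gap: the program 'guesses' that one letter was inserted or deleted, so that the remaining letters line up.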

Who is the ‘cousin’ of the cucumber?
– You can find out who has a common ancestor with the cucumber by building a phylogenetic tree with the Ethylene Receptor sequences.

Who is the ‘cousin’ of the dodo or the mammoth?
– You can find out who has a common ancestor with the dodo or the mammoth by building a phylogenetic tree with the Cytochrome B sequences.

Experts compare tens of thousands of sequences, using complex bioinformatics and statistical programs.

To build this 'new' tree of life in 2016, the researchers compared 16 different proteins from 3,083 species.

Source: A new view of the tree of life (2016)

The hypothetical common ancestor of all species (located in the center of the tree) is called LUCA (Last Universal Common Ancestor).

It is thought to have lived 3.5 to 4 billion years ago and to have consisted of a single cell.

Source: The physiology and habitat of the last universal common ancestor (2016) - Physiology, phylogeny, and LUCA (2016)

These trees of life are continuously updated with new data!

But this history of life will always remain approximate, because we do not have and will never have access to all the sequences of all the organisms that live or have lived on Earth!

The challenges, an overview

The challenges of building a tree of life with molecular data are many:

(1) To have access to the genome sequences of the species of interest (sequencing),

(2) To have access to information about the location of genes within the genome sequences & to determine the corresponding protein sequences (annotation),

(3) To determine which gene(s) or protein(s) are ‘orthologs’ (‘quest for orthologs’).

And that's where bioinformatics comes in!
