
A bit of biology

Context

An estimated 10 to 100 million species live on Earth.

Evolution is the source of this diversity: all species are related, to varying degrees. Within each species, DNA changes randomly from one generation to the next.

These changes (mutations) are subject to selection, which depends on the environment and on the interactions between species and between individuals within a species. Sometimes a new characteristic, or even a new species, appears that is better adapted to its environment.

In order to define these relationships, biologists look not only at what species have in common, but also what distinguishes them. The relationships between different species are most often represented in the form of a tree, called a ‘phylogenetic tree’ or ‘tree of life’.

In Darwin’s time, species were compared on the basis of their morphology: for example, the size, shape and structure of their bones, the presence of hair or scales or, for plants, the arrangement of leaves on a stem. In general, the more similar the morphological characteristics of two species, the more recent their common ancestor is likely to be.

Today, it is possible to study the evolution of species by comparing their DNA, in particular their genes, or the proteins these genes encode. Here too, the more similar the DNA of two species, the more recent their common ancestor generally is.
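To make "more similar" concrete, one simple measure is percent identity: the share of positions at which two sequences carry the same letter. This is a minimal sketch that assumes the two sequences have already been aligned to the same length (in practice this requires dedicated alignment software); the sequences used here are invented for illustration.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of positions where two pre-aligned sequences share the same letter."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return 100.0 * matches / len(seq_a)


# Invented toy sequences, not real genes
print(percent_identity("ACGTACGTAC", "ACGTACGTTC"))  # 90.0
print(percent_identity("ACGTACGTAC", "TCGAACCTTC"))  # 60.0
```

A higher percentage means fewer differences have accumulated, which is why closely related species tend to score higher.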

How are proteins made? These incredible little machines are essential to the proper functioning of all organisms. And what is the link between DNA, genes, proteins and evolution?

What is DNA? What is a gene? What is a protein? What does this data look like today?

Cells, chromosomes and DNA

Every living organism is made of cells.

Each cell contains chromosomes.

Who has the largest number of chromosomes?

common name (species) | number of chromosomes (2n) in a cell
bacterium (E. coli) | 1
goldfish | 100
fern (Ophioglossum) | 1,440
chimpanzee | 48
human | 46
banana | 11
Australian ant | 1 or 2*

*Females, diploid, have 2 chromosomes; males, haploid, have only one.
Source: Wikipedia

DNA and genomes

A chromosome can be roughly compared to a tightly wound ball of thread, where the thread is the DNA.

Adapted from Wikimedia Commons

DNA usually has a characteristic ‘double helix’ structure composed of two strands. Each strand is a long molecule made of a succession of 4 kinds of nucleotides, called A (adenine), T (thymine), G (guanine) and C (cytosine). The 2 strands are complementary: an A on one strand faces a T on the other strand, and a G faces a C.
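Because of these pairing rules, one strand fully determines the other. A minimal sketch: the complement simply swaps A↔T and G↔C. Since the two strands run in opposite directions, the partner strand is conventionally written in reverse (the "reverse complement"). Keeping 'N' (an undetermined base) as 'N' is a choice made for this sketch.

```python
# Pairing rules: A pairs with T, G pairs with C ('N' = undetermined, kept as is)
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G", "N": "N"}


def complement(strand: str) -> str:
    """Base-by-base complement of a DNA strand."""
    return "".join(PAIR[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Complementary strand read in the opposite direction, as it is usually written."""
    return complement(strand)[::-1]


print(complement("ACCCTAAA"))          # TGGGATTT
print(reverse_complement("ACCCTAAA"))  # TTTAGGGT
```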

DNA is universal: it is found in all living organisms! It is also found in some viruses, sometimes in a slightly different form (single-stranded DNA).

What is the sequence of banana chromosome 3? What is the length of human chromosome 1 in centimeters (cm)?

Beginning of the sequence of banana chromosome 3 (total length: 30,470,407 bp; 1 cm):

>NC_025204.1
ACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAA
ACCCTAAACCCTAAACCCTAAAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCT
AAACCCTAAACCCTAAAACCAAAAAAAATGGAATAATTACTTTAAATCTTAATTATTCCTTTATTTTTGT
TTTTTTTTTTTTTAATCTTGATGCCCGATTACCCGATATGTCGGCTGGGCGGGCGCTTGGACATTGCGCT
CGTTGGGCCCAACCTGTGCTGGGCTTTTGCGTCGGCCTTTTCAATGTACTGGGTCAAACCTGAGTCATGA...

Beginning of the chromosome sequence of the E.coli bacterium (total length: 4,646,332 bp; 0.15 cm):

>AP009048.1
AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC
TTCTGAACTGGTTACCTGCCGTGAGTAAATTAAAATTTTATTGACTTAGGTCACTAAATACTTTAACCAA
TATAGGCATAGCGCACAGACAGATAAAAATTACAGAGTACACAACATCCATGAAACGCATTAGCACCACC
ATTACCACCACCATCACCATTACCACAGGTAACGGTGCGGGCTGACGCGTACAGGAAACACAGAAAAAAG
CCCGCACCTGACAGTGCGGGCTTTTTTTTTCGACCAAAGGTAACGAGGTAACAACCATGCGAGTGTTGAA...

A piece of sequence from human chromosome 1 (total length: 248,956,422 bp; 8.2 cm):

>CM000663.2
GGTAGAACCTCAGTAATCCGAAAAGCCGGGATCGACCGCCCCTTGCTTGCAGCCGGGCACTACAGGACCC
GCTTGCTCACGGTGCTGTGCCAGGGCGCCCCCTGCTGGCGACTAGGGCAACTGCAGGGCTCTCTTGCTTA
GAGTGGTGGCCAGCGCCCCCTGCTGGCGCCGGGGCACTGCAGGGCCCTCTTGCTTACTGTATAGTGGTGG
CACGCCGCCTGCTGGCAGCTAGGGACATTGCAGGGTCCTCTTGCTCAAGGTGTAGTGGCAGCACGCCCAC
CTGCTGGCAGCTGGGGACACTGCCGGGCCCTCTTGCTCCAACAGTACTGGCGGATTATAGGGAAACACCC...
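The lengths in centimetres correspond to the DNA molecule fully stretched out: the number of base pairs multiplied by the distance between two neighbouring base pairs along the helix. This sketch assumes about 0.33 nm per base pair (a commonly quoted value; 0.34 nm is also often used), which reproduces the figures given above.

```python
# Assumed distance between consecutive base pairs along the double helix, in nanometres
NM_PER_BP = 0.33


def stretched_length_cm(bp: int, nm_per_bp: float = NM_PER_BP) -> float:
    """Length of a fully stretched DNA molecule, in centimetres (1 nm = 1e-7 cm)."""
    return bp * nm_per_bp * 1e-7


for name, bp in [
    ("banana chromosome 3", 30_470_407),
    ("E. coli chromosome", 4_646_332),
    ("human chromosome 1", 248_956_422),
]:
    print(f"{name}: {stretched_length_cm(bp):.2f} cm")
```

With this assumption the banana, E. coli and human values come out at about 1.01, 0.15 and 8.22 cm; using 0.34 nm instead gives slightly longer molecules.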

Note: sequences may contain the letter ‘N’, which marks a position where the nucleotide could not be identified during sequencing.
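The sequences above are written in FASTA format: a header line starting with ‘>’ followed by lines of sequence. A minimal sketch that reads such text and counts the undetermined ‘N’ positions; as an extra illustration it also reports the GC content (the share of G and C among the identified bases). The record below is an invented toy example, not a real sequence.

```python
from collections import Counter


def read_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {identifier: sequence}; the identifier is the first word after '>'."""
    records: dict[str, list[str]] = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].strip()
            current = header.split()[0] if header else ""
            records[current] = []
        elif current is not None:
            records[current].append(line)  # sequence lines are joined below
    return {name: "".join(parts) for name, parts in records.items()}


def summarize(seq: str) -> tuple[int, int, float]:
    """Return (length, number of 'N' positions, GC fraction among A/C/G/T)."""
    counts = Counter(seq.upper())
    known = sum(counts[b] for b in "ACGT")
    gc = (counts["G"] + counts["C"]) / known if known else 0.0
    return len(seq), counts["N"], gc


# Invented toy record with two undetermined positions
example = """>toy_record
ACGTNNACGT
GGCC
"""
for name, seq in read_fasta(example).items():
    length, n_count, gc = summarize(seq)
    print(f"{name}: {length} bp, {n_count} N, GC = {gc:.2f}")  # toy_record: 14 bp, 2 N, GC = 0.67
```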

The size of genomes, usually expressed in base pairs (bp) or in millions of base pairs (Mb), varies enormously from one organism to another. And the organism with the largest genome is not always the one you might expect!

Who has the largest genome?

common name (species) | genome size (bp)
bacterium (E. coli) | 4,646,332
virus (SARS-CoV-2) | 29,903
fruit fly (Drosophila melanogaster) | 143 million
plant (Paris japonica) | 150 billion
human | 3 billion
banana | 472 million

Note: this is the size of the ‘haploid’ genome: for humans, for example, 3 billion bp correspond to the sequence of 23 chromosomes.
Sources: E. coli: NCBI Genome; SARS-CoV-2: NCBI Genome; Drosophila melanogaster: NCBI Genome; Paris japonica: Harvard BioNumbers; Homo sapiens: NCBI Genome; banana: NCBI Genome
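To answer the question, the organisms can simply be sorted by genome size. This sketch uses the rounded values from the table and expresses each genome relative to the human one; the dictionary and function names are illustrative only.

```python
# Approximate haploid genome sizes in base pairs, taken from the table
GENOME_SIZE_BP = {
    "SARS-CoV-2": 29_903,
    "E. coli": 4_646_332,
    "fruit fly": 143_000_000,
    "banana": 472_000_000,
    "human": 3_000_000_000,
    "Paris japonica": 150_000_000_000,
}


def rank_by_size(sizes: dict[str, int]) -> list[tuple[str, int]]:
    """Organisms sorted from largest to smallest genome."""
    return sorted(sizes.items(), key=lambda item: item[1], reverse=True)


human = GENOME_SIZE_BP["human"]
for name, bp in rank_by_size(GENOME_SIZE_BP):
    # size in Mb (millions of base pairs) and as a multiple of the human genome
    print(f"{name:15s} {bp / 1e6:>12,.2f} Mb  ({bp / human:.3g} x human)")
```

The plant Paris japonica comes out on top, with a genome about 50 times larger than the human genome.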

Why is it so important to know the sequence of genomes?

Because genomes contain the information needed to build organisms, and in particular to make proteins…

DNA and genes

The order of the nucleotides in DNA is very important and constitutes what is known as genetic information. A bit like a cookbook, DNA contains a number of recipes, called ‘genes’. Here we focus on the genes that code for proteins.

Who, among the following species, has the largest number of protein-coding genes?

common name (species) | number of genes coding for proteins
bacterium (E. coli) | 4,140
nematode (C. elegans) | 20,356
banana | 36,439
chimpanzee | 23,534
human | 20,430

The number of genes varies depending on the method used for finding genes and can change over time!
Sources: E. coli: OMA; C. elegans: OMA; chimpanzee: OMA; human: OMA; banana: OMA

Here is a piece of DNA located on human chromosome 11 that corresponds to the gene coding for the protein insulin.

In eukaryotes, genes are usually not ‘continuous’: they are composed of non-coding regions (introns, in black) and coding regions (exons, in red). The introns are removed from the RNA copy of the gene before translation (protein synthesis), and the exons are then translated into an amino acid sequence.
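Removing introns amounts to keeping only the exon segments and joining them in order. A minimal sketch, assuming the exon positions are already known (finding them is the hard part). The toy gene and its coordinates are hypothetical, not the insulin gene; exons are in upper case and introns in lower case, and the introns were chosen to start with GT and end with AG, as most introns do.

```python
def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Keep only the exon segments (0-based start, end-exclusive) and join them in order."""
    return "".join(pre_mrna[start:end] for start, end in sorted(exons))


# Hypothetical toy gene: exon, intron, exon, intron, exon
gene = "ATGGCC" + "gtaagtatag" + "CTGTGG" + "gtgagcag" + "ATGTAA"
exons = [(0, 6), (16, 22), (30, 36)]
print(splice(gene, exons))  # ATGGCCCTGTGGATGTAA
```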

Finding genes (and exons) in the genomes of different organisms is still a major challenge today!

Proteins

Proteins are essential to life.

Comic: “A protein? A what?”

When a cell or organism needs a protein, the corresponding gene is first copied.

The copy, called messenger RNA (mRNA), is then sent to the ribosomes, the cellular machines that produce proteins.
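One detail is useful when reading mRNA sequences: RNA uses the nucleotide U (uracil) instead of T (thymine). The mRNA therefore has the same sequence as the gene's coding strand, with every T replaced by a U. A minimal sketch (the input is a made-up fragment):

```python
def transcribe(coding_strand: str) -> str:
    """mRNA matching a gene's coding strand: same letters, with U in place of T."""
    return coding_strand.upper().replace("T", "U")


print(transcribe("ATGGCCCTGTGG"))  # AUGGCCCUGUGG
```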

A note from the experts: a gene can code for several proteins

RNA undergoes a maturation process that removes the introns. This process is called ‘splicing’.
Splicing can be alternative: the combination of exons kept in the final mRNA can differ, so a single gene can produce different mRNAs… and hence different proteins.
One of the most extreme examples is the Drosophila (fruit fly) Dscam gene: it contains 95 alternative exons and could produce up to about 38,000 different proteins. In other words, this single gene could in principle produce more distinct proteins than there are genes in the entire fly genome!
(Source: Role of RNA secondary structures in regulating Dscam alternative splicing (2019)).
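The figure of about 38,000 comes from multiplying choices. Dscam's 95 alternative exons are commonly described as four clusters (around exons 4, 6, 9 and 17) with 12, 48, 33 and 2 variants, and each mRNA keeps exactly one variant from each cluster. A sketch of the arithmetic under that assumed grouping:

```python
from math import prod

# Assumed grouping of Dscam alternative exons: number of variants in each cluster.
# One variant per cluster is kept in any given mRNA.
DSCAM_CLUSTERS = {"exon 4": 12, "exon 6": 48, "exon 9": 33, "exon 17": 2}

total_alternative_exons = sum(DSCAM_CLUSTERS.values())
possible_mrnas = prod(DSCAM_CLUSTERS.values())
print(total_alternative_exons, possible_mrnas)  # 95 38016
```

Adding the cluster sizes gives the 95 alternative exons, while multiplying them gives the number of possible combinations, which is why the number of variants grows so quickly.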

In evolutionary studies at the molecular level, biologists associate each gene with a single representative (‘canonical’ or consensus) protein. This convention, not biology, is why the words ‘gene’ and ‘protein’ are very often used interchangeably when talking about genes or proteins shared between different species!

The ribosome translates the nucleotide sequence of the mRNA into an amino acid sequence and thus gives rise to a protein.

Proteins consist of a chain of amino acids. There are 20 different amino acids, each also referred to by a letter (G, E, N, I, A, L, ...). Three nucleotide ‘letters’, forming a codon, correspond to one amino acid ‘letter’: for example, the codon GTG codes for the amino acid V (valine).
Biologists use the genetic code to translate a sequence of nucleotides into a sequence of amino acids.
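The genetic code can be written as a lookup table from the 64 possible codons to amino acid letters. A minimal sketch of translation with the standard genetic code: read the sequence three letters at a time and stop at the first stop codon (marked ‘*’). Using ‘X’ for codons containing ‘N’ is a convention chosen for this sketch, and the example sequence is a made-up fragment.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon, in the order TTT, TTC, TTA, TTG, TCT, ..., GGG
# ('*' marks a stop codon)
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(seq: str) -> str:
    """Translate a DNA or mRNA sequence codon by codon, stopping at the first stop codon."""
    seq = seq.upper().replace("U", "T")  # accept mRNA letters too
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")  # 'X' if the codon contains N or other symbols
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(CODON_TABLE["GTG"])               # V
print(translate("ATGGCCCTGTGGATGTAA"))  # MALWM
```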

In this representation of the genetic code, the amino acids are shown on the outer 2 rings of the circle, and the 1st, 2nd and 3rd nucleotides of each codon on the inner circles.

Once synthesized, the chain of amino acids folds upon itself to adopt a specific 3D structure, which is essential to the proper functioning of the protein.

Different representations of the BRAF protein's 3D structure, composed of 766 amino acids. The positions of the amino acids L, A, T, V, and K are shown.

Proteins and biological functions

Proteins come in different sizes and shapes and perform different biological functions.

Certain proteins are only found in certain species:
- The proteins involved in photosynthesis are only found in plants, algae and cyanobacteria;
- The proteins involved in vision are found neither in plants, nor in algae, nor in cyanobacteria...

Proteins that are found in a large number of species are involved in universal biological processes, such as protein synthesis or DNA replication.

These proteins (or the corresponding genes) are very useful for studying evolution!