Evolution And Diversification Of Biomolecules, From Cosmic Dust To Complex Biopolymers
From the atoms that make up the cosmic dust of the universe to the most complex organ we know, the brain, many molecules had to evolve to make life possible. The outcome is a living world built mainly from four classes of biomolecules: proteins (made of amino acids), nucleic acids, carbohydrates and lipids. Each of them has a long history to tell about how macromolecules built up an entire complex world, in which cells and organisms work together in a tightly coordinated way. Since the first building blocks of life appeared, they have undergone many changes and modifications. According to some theories, these first blocks were primitive amino acids, an idea supported by the presence of amino acids even in meteorites and asteroid samples. These molecules would have evolved in a Darwinian fashion, change after change, until some gained an advantage over others. However, for life to begin, a macromolecule with catalytic activity was needed, one able to generate other molecules. Another theory, the RNA world hypothesis, proposes that RNA was the first such molecule to appear; it is supported by the discovery of RNA molecules with catalytic activity (ribozymes) that can cleave other RNA molecules. Whatever the real origin was, it is accepted that every organism needs energy to survive and carry out any activity. The sun is the main source of that energy, and plants have developed the ability to convert it into carbohydrates through photosynthesis. The smallest carbohydrates are monosaccharides, with the general formula (CH2O)n. They are classified by their number of carbons (triose, tetrose, pentose and hexose) and by their carbonyl group, either an aldehyde (aldoses) or a ketone (ketoses). Beyond the many simple sugars present in nature, combinations of them diversify their activity, the amount of energy they hold and their storage capacity.
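The general formula and the two classification criteria can be made concrete with a short Python sketch. It is illustrative only: the names `molecular_formula` and `classify_monosaccharide` are hypothetical, the table covers only the three- to six-carbon sugars named above, and deoxy sugars such as deoxyribose, which do not fit (CH2O)n, are not handled.

```python
# Illustrative sketch: function and table names are hypothetical.
# Size names for the carbon counts mentioned in the text (3 to 6 carbons).
SIZE_NAMES = {3: "triose", 4: "tetrose", 5: "pentose", 6: "hexose"}

# Carbonyl group -> name prefix: an aldehyde gives an aldose, a ketone a ketose.
CARBONYL_PREFIX = {"aldehyde": "aldo", "ketone": "keto"}


def molecular_formula(n: int) -> str:
    """Expand the general formula (CH2O)n into CnH2nOn."""
    return f"C{n}H{2 * n}O{n}"


def classify_monosaccharide(n: int, carbonyl: str) -> str:
    """Combine the carbonyl prefix with the size name, e.g. 'aldohexose'."""
    if n not in SIZE_NAMES:
        raise ValueError(f"unsupported carbon count: {n}")
    if carbonyl not in CARBONYL_PREFIX:
        raise ValueError(f"carbonyl must be 'aldehyde' or 'ketone', got {carbonyl!r}")
    return CARBONYL_PREFIX[carbonyl] + SIZE_NAMES[n]


if __name__ == "__main__":
    examples = [
        ("glyceraldehyde", 3, "aldehyde"),
        ("dihydroxyacetone", 3, "ketone"),
        ("ribose", 5, "aldehyde"),
        ("glucose", 6, "aldehyde"),
        ("fructose", 6, "ketone"),
    ]
    for name, n, carbonyl in examples:
        print(f"{name}: {molecular_formula(n)}, {classify_monosaccharide(n, carbonyl)}")
```

Running it lists glucose and fructose with the same formula, C6H12O6, but different names (aldohexose and ketohexose), which shows that the carbonyl group, not the formula alone, tells them apart.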
Glucose, the main source of energy for cells, is a monosaccharide with six carbons (a hexose). Because glucose has several chiral carbons, a change of configuration at just one of them yields a different sugar: allose differs from glucose only at carbon 3, which makes it an epimer of glucose. It is a single change of configuration, yet the result is two different molecules, and glucose is far more abundant in nature. Furthermore, monosaccharides can be joined into polysaccharides, which store energy as starch in plants and as glycogen in animals. Polymerization of monosaccharides occurs through the formation of glycosidic bonds. Starch is made of amylose and amylopectin: amylose is a linear chain of glucose units joined by α(1→4) bonds, whereas amylopectin has the same α(1→4) chains with α(1→6) bonds at the branch points, which give it its branched structure. Sugars can also provide support when the way monosaccharides are linked changes. Cellulose, the structural polymer of plants, links glucose units through β(1→4) glycosidic bonds; hydrogen bonds between neighboring chains assemble them into microfibrils, giving the structure high tensile strength and making it tasteless and insoluble in water and most organic solvents.
As described, assembling monosaccharides in different ways produces complex structures from simple changes. Sugars can also undergo larger modifications. Chitin, which supports the exoskeletons of insects and the cell walls of fungi, is built from N-acetyl-D-glucosamine, a molecule that resembles glucose except at its second carbon, where the hydroxyl group is replaced by an acetamido (acetylamino) group. Its units are joined by the same β(1→4) glycosidic bonds as cellulose, but chitin provides greater support because of the increased hydrogen bonding between chains.
Polysaccharides have another role: they provide a mechanism of molecular recognition for cells. Structural changes to sugars, such as sulfation, O-acetylation, O-methylation, N-deacetylation and N-sulfation, occur after oligosaccharides are formed. These are known as carbohydrate post-glycosylational modifications (PGMs). Sialic acids (SA) are nine-carbon sugars found as components of glycans on internal and external membrane surfaces. O-acetylation of bacterial polysaccharides alters the host immune response to them, and acetylation of the hydroxyl group at C9 of SA enhances activation of the alternative pathway of complement. Influenza C viruses bind to and invade host cells by recognizing 9-O-acetylated SA on the membrane surface, whereas the same modification hinders invasion by influenza A and B viruses.
As we have seen, modifications of macromolecules diversify the function of a given structure, and carbohydrates undergo many of them. It is now time to turn to proteins, which have an even larger repertoire of modifications and structures. Proteins are polymers made from sequences of amino acids, organic compounds in which an amino group, a carboxyl group and, most importantly, a side chain are attached to a central alpha carbon. The primary structure of a protein is encoded in the genetic information of an organism and consists of a linear sequence of amino acids joined by peptide bonds. However, the function of a protein depends on its folding, which is governed directly by the side chains of the amino acids in the polypeptide; these differ in size, structure, solubility and ability to gain or lose protons under different conditions. Secondary structures, namely alpha-helices, beta-strands, beta-sheets and beta-turns, are stabilized by non-covalent interactions, chiefly hydrogen bonds between backbone groups. Tertiary structure is the three-dimensional arrangement of these secondary structures within one polypeptide, and quaternary structure results from the association of several polypeptide chains. It might seem that the genome alone accounts for the variety of proteins; nevertheless, the proteome can be far more complex than the genome. For example, humans have roughly 30,000 genes, whereas the human proteome is estimated to contain between 300,000 and 3 million different proteins. Much of this diversity is produced by post-translational modifications such as glycosylation, acetylation, methylation, phosphorylation, myristoylation and ubiquitination.
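Taking these figures at face value, a simple ratio shows what "more complex than the genome" means in practice: on average, each gene would correspond to somewhere between ten and a hundred distinct proteins.

```latex
\frac{3 \times 10^{5}\ \text{proteins}}{3 \times 10^{4}\ \text{genes}} = 10
\qquad\text{to}\qquad
\frac{3 \times 10^{6}\ \text{proteins}}{3 \times 10^{4}\ \text{genes}} = 100\ \text{proteins per gene}
```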
So many modifications exist, but how do they affect protein function? For example, removal of the initiator methionine and N-terminal acetylation, the most common modifications, influence protein stability and are related to protein turnover.
Protein glycosylation, which is common in eukaryotes but much rarer in prokaryotes, is the covalent attachment of oligosaccharides. It is classified into two groups, N- and O-glycosylation, in which the sugar is attached to an asparagine or to a serine/threonine residue, respectively. About half of all proteins are modified by the addition of a glycosidic group, and one third of these carry N-glycosylations. Removing these modifications has been shown to either reduce the secretion of active enzymes or inactivate them, and deleting individual N-glycosylation sites can change protein function and activity.
Studies of the role of these modifications have shown that N-glycosylation of the M envelope protein of hepatitis B virus is necessary for viral protein folding and secretion. In a family of human proteins comprising lipoprotein lipase, endothelial lipase and hepatic lipase, two N-glycosylation sites are conserved, and removing either of them decreases secretion. Finally, inactivation of an enzyme caused by removal of a glycosylation site was shown by Fan et al. (1997): dipeptidyl peptidase IV (DPP-IV), a glycoprotein, requires this modification for proper transport, folding and activity, and removing the glycosidic modification resulted in loss of catalytic function and a faster rate of degradation.
It is worth mentioning that some post-translational modifications, such as methylation, are reversible. Methylation is a simple modification in which methyl groups are added to certain amino acids (histidine, alanine, lysine, arginine and asparagine). Methylation of the N-terminal tails of histone proteins is especially important because histones interact with genetic information, and their methylation directly affects transcription and chromosome maintenance. Adding methyl groups increases hydrophobicity. Histone H3 is methylated at several residues, and the modification can be progressive: a residue can be mono-, di- or trimethylated, each state producing different changes in size and hydrophobicity that allow the protein to recruit different complexes involved in transcriptional control.
So far we have described molecules that are modified using other molecules, functional groups and enzymes, but where is all this information stored, and how? The most important molecules for all living organisms are nucleic acids, polymers of nucleotides, each formed by a phosphate group, a five-carbon sugar and a nitrogenous base.
Returning to the RNA world mentioned at the beginning, that hypothesis holds that RNA was the first molecule to appear and store information. However, a more stable molecule was needed to improve storage capacity: DNA, composed of two antiparallel chains instead of one. Both DNA and RNA are made of nucleotides, but specific chemical differences allow them to perform different tasks. First, the sugar of RNA is ribose, whereas in DNA the sugar carries a simple change: its second carbon (C2') lacks the oxygen of ribose's hydroxyl group, which is why it is called deoxyribose. Second, the nitrogenous bases of RNA are adenine, cytosine, guanine and uracil; DNA uses the same bases except that thymine replaces uracil. Thymine is simply uracil with a methyl group on its fifth carbon (5-methyluracil), so this second change from the primitive RNA world to a DNA world amounts to the methylation of uracil.
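The antiparallel arrangement and the uracil-to-thymine swap can be illustrated with a short Python sketch that treats sequences as strings. It is a simplification under stated assumptions: only the four canonical DNA bases are handled, the sugar difference is not represented, and the function names and example sequence are hypothetical.

```python
# Illustrative sketch: function names and the example sequence are hypothetical.
# Watson-Crick pairing in DNA; thymine (T) takes the place of RNA's uracil (U).
DNA_PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complementary_strand(strand: str) -> str:
    """Return the partner of a DNA strand, also written 5'->3'.

    The chains are antiparallel, so the partner is the base-by-base
    complement read in reverse order.
    """
    return "".join(DNA_PAIRS[base] for base in reversed(strand.upper()))


def rna_to_dna_letters(rna: str) -> str:
    """Replace uracil with thymine (5-methyluracil); the sugar change is not modeled."""
    return rna.upper().replace("U", "T")


if __name__ == "__main__":
    top = "ATGCGTTA"  # hypothetical strand, written 5'->3'
    bottom = "".join(DNA_PAIRS[base] for base in top)  # partner aligned beneath it, 3'->5'
    print("5'-" + top + "-3'")
    print("3'-" + bottom + "-5'")
    print("partner read 5'->3':", complementary_strand(top))
    print("RNA AUGCGUUA in DNA letters:", rna_to_dna_letters("AUGCGUUA"))
```

Printing both strands shows the pairing: the partner reads 3'-TACGCAAT-5' beneath the original, which becomes TAACGCAT when read in the conventional 5'->3' direction.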
Both modifications make the molecule more stable. Removing the 2' hydroxyl group from the sugar makes the phosphodiester bonds of DNA less prone to hydrolysis, because in RNA that hydroxyl can attack the neighboring phosphodiester bond. The methyl group of thymine, with its hydrophobic character, is thought to contribute to correct base pairing, and it lets repair enzymes distinguish genuine thymine from the uracil that appears when cytosine is spontaneously deaminated. These modifications did not arise on purpose; evolution of primitive macromolecules favored them because of the stability they give to the carrier of genetic information. Nucleic acids also carry a broad spectrum of chemical modifications that play important roles in gene expression. DNA methylation, for example, contributes to genomic imprinting, X-chromosome inactivation and suppression of repetitive elements. A study by Li et al. (1993) showed that knocking out the Dnmt1 gene (DNA methyltransferase 1) reactivated previously inactive genes, suggesting that methylation of DNA silences gene expression. One of the first nucleic acid modifications studied was 5-methylcytosine (5mC), which results from the covalent addition of a methyl group to the fifth carbon of the cytosine ring. It usually occurs at CpG sites, where a cytosine is followed directly by a guanine and the two are linked only by a phosphate group; regions enriched in these sites are called CpG islands. Another modification, N6-methyladenine (6mA), is present in prokaryotes and plays important roles in DNA repair and replication and as a marker that prevents digestion of the host genome: bacteria label their own genome with 6mA and break down foreign, unmethylated DNA. These examples show how modifications of nucleic acids can determine the expression, function or subsequent action of proteins. Cells maintain a balance between methylation and demethylation to function correctly, and disturbances of this balance are associated with diseases such as cancer.
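Because a CpG site is defined purely by sequence, a C directly followed by a G, candidate 5mC sites can be located with a few lines of Python. This is a sketch under a simplifying assumption: it treats every CpG cytosine as methylated and writes 5mC with the letter "M", a hypothetical notation. In real cells methylation varies from site to site, and CpG islands at active promoters often remain unmethylated.

```python
# Illustrative sketch: the "M" notation for 5-methylcytosine is hypothetical.

def cpg_sites(sequence: str) -> list[int]:
    """Return 0-based positions of every C that is directly followed by a G (5'->3')."""
    seq = sequence.upper()
    return [i for i in range(len(seq) - 1) if seq[i] == "C" and seq[i + 1] == "G"]


def mark_cpg_methylation(sequence: str) -> str:
    """Rewrite the cytosine of each CpG site as 'M' (5mC).

    Simplifying assumption: every CpG is treated as methylated.
    """
    bases = list(sequence.upper())
    for i in cpg_sites(sequence):
        bases[i] = "M"
    return "".join(bases)


if __name__ == "__main__":
    dna = "ACGTTCGACG"  # hypothetical sequence, written 5'->3'
    print(cpg_sites(dna))             # [1, 5, 8]
    print(mark_cpg_methylation(dna))  # AMGTTMGAMG
```

The dinucleotide CG, read 5' to 3' on the complementary strand, is again CG, so a CpG site can be methylated on both strands; this symmetry is what allows maintenance methyltransferases such as Dnmt1 to copy methylation patterns after replication.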
Finally, the interaction of modified macromolecules diversifies biological structures and functions. Further studies are required to gain a complete understanding of the many other modifications that are still waiting to be discovered.