From Peas to Proteome


“What more powerful form of study of mankind could there be than to read our own instruction book?” – Dr. Francis Collins, Director of the National Institutes of Health on the completion of the Human Genome Project.

The instruction book that Dr. Collins was referring to is the human genome. We are our genome. Each of us is unique because of our genome, that is, because of our DNA. The Human Genome Project, which published its draft sequence in 2001 and was declared complete in 2003, aimed to “map” human DNA, gathering a cornucopia of information about the very basic thing that makes each of us unique. But the completion of the human genome was just the beginning.

The Human Genome Project provided information about the ~20,500 genes encoded in the human genome. It also gave scientists the ability to identify disease-causing mutations, but not the capacity to develop cures.

A recent article published in the journal Nature describes a database called the Human Proteome, a comprehensive list of the proteins encoded by the human genome. Think of the human genome as an encrypted file containing valuable information: the file has now been decrypted, and the data stored in it is the Human Proteome. Information about these proteins may be the key to disease prevention and treatment.

All of this might be a little confusing, so let me give you a little bit of background on how genes and proteins are connected.

You know how every word in the English language is made up of a combination of one or more of the 26 letters in the alphabet? Well, in the same way, DNA (deoxyribonucleic acid) has its own language of four bases/nucleotides: Adenine (A), Guanine (G), Thymine (T) and Cytosine (C). Human DNA contains 3.1 billion base pairs that are condensed and packed into structures called chromosomes, of which all humans have 23 pairs.

A stretch of nucleotides arranged in a particular order can code for a protein, and this molecular coding unit is called a gene. There are two steps to retrieving the protein code stored in the genetic sequence.

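Step 1: Transcription. The DNA sequence is copied into a related molecule called RNA by pairing each DNA base with its complementary RNA base: A pairs with U (uracil, which RNA uses in place of thymine), T with A, G with C, and C with G. For example, TAC is converted to AUG and CGC is converted to GCG.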
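To make the pairing rule concrete, here is a minimal Python sketch of transcription. The function name `transcribe` and the lookup table are illustrative assumptions, not part of any real library, and the sketch ignores details such as strand direction.

```python
# Complement rules for reading a DNA template strand into RNA.
# (Illustrative table; RNA uses U where DNA would use T.)
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_dna):
    """Return the RNA made from a DNA template strand, base by base."""
    return "".join(DNA_TO_RNA[base] for base in template_dna.upper())


print(transcribe("TAC"))  # AUG
print(transcribe("CGC"))  # GCG
```

The two calls reproduce the examples given in Step 1.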

Step 2: Translation. Three nucleotides in an RNA sequence form a codon, and each codon is read and deciphered into an amino acid. For example, AUG codes for the amino acid methionine and CAC codes for the amino acid histidine. Strings of amino acids form proteins. Every protein has a unique amino acid sequence.
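Translation can be sketched the same way: split the RNA into groups of three and look each codon up in a table. The table below is deliberately partial (the full genetic code has 64 codons), and the names `CODON_TABLE` and `translate` are hypothetical, chosen only for this illustration.

```python
# A small, partial codon table; the real genetic code has 64 entries.
CODON_TABLE = {
    "AUG": "Methionine",
    "CAC": "Histidine",
    "GAG": "Glutamic acid",
    "GUG": "Valine",
}


def translate(rna):
    """Read an RNA string three bases at a time and name each amino acid."""
    usable = len(rna) - len(rna) % 3  # ignore any incomplete trailing codon
    codons = [rna[i:i + 3] for i in range(0, usable, 3)]
    # Codons missing from this partial table are reported as "?".
    return [CODON_TABLE.get(codon, "?") for codon in codons]


print(translate("AUGCAC"))  # ['Methionine', 'Histidine']
```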

By now, you might have realized that if something changes in the genetic sequence (a mutation), it can change the amino acid that the sequence codes for.

For example, in the image above, if the 20th base pair, ‘T’, is replaced by an ‘A’, the DNA sequence would be CAC, the RNA sequence would be GUG, and the amino acid added to the chain would be valine instead of glutamic acid. The partial sequence shown above is from the β-hemoglobin gene, and this replacement of T with A causes sickle cell anemia.
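The effect of this single-base change can be traced in a few lines of Python. This is a sketch under stated assumptions: it looks only at the one affected codon (CTC on the DNA template, which becomes CAC after the mutation), and the helper name `amino_acid_for` and both tables are made up for illustration.

```python
# Transcription pairing rules and the two codons relevant to this example.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}
CODONS = {"GAG": "Glutamic acid", "GUG": "Valine"}


def amino_acid_for(template_codon):
    """Transcribe one DNA template codon and look up its amino acid."""
    rna = "".join(DNA_TO_RNA[base] for base in template_codon)
    return rna, CODONS[rna]


normal = "CTC"                         # codon before the mutation
mutant = normal[0] + "A" + normal[2]   # middle T replaced by A, giving CAC

print(amino_acid_for(normal))  # ('GAG', 'Glutamic acid')
print(amino_acid_for(mutant))  # ('GUG', 'Valine')
```

One changed letter in the DNA is enough to swap the amino acid at that position in the protein.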

The Human Proteome Project is an important milestone in understanding these gene-protein pathways in humans. So, how are researchers planning to use all of this data?

The Human Proteome Project currently comprises a database of ~19,000 proteins that are encoded by our genes under normal “non-disease” conditions. This data set will be a useful resource when studying how proteins are expressed under disease conditions, which will further advance the development of disease treatments. It will also help identify biomarkers for disease conditions. For example, the blood level of alanine aminotransferase (ALT), a biomarker, is an indicator of liver health. Higher levels of this enzyme in the blood indicate liver damage.

Genetic mutations can cause medical disorders but, in some cases, mutations can also confer health benefits. Let me give you an example: let’s say Person X is born with a mutation in the SLC30A8 gene. Research has shown that people born with this loss-of-function mutation have a greatly decreased risk of type 2 diabetes. With new knowledge about proteins and their corresponding coding genes, therapeutics could be developed for patients with type 2 diabetes by blocking the protein encoded by the SLC30A8 gene.

Our understanding of genetics has come a long way since 1865, when Gregor Mendel, drawing on his experiments with pea plant breeding, developed the principles of inheritance that describe how genetic traits are transmitted through generations.

With all the advancements in the field of genomics and biomedical research, tailored therapies or a personalized cure for every genome seem highly plausible as the future of medicine.


