Understanding the Genome

By Carl Zimmer
The New York Times, November 11, 2008

Edited by Andy Ross

The gene is the fundamental unit of heredity. The word was coined by the Danish geneticist Wilhelm Johannsen in 1909. By the 1960s, scientists had a compelling definition of the gene.

A gene was a specific stretch of DNA containing the instructions to make a protein molecule. To make a protein from a gene, a cell had to read it and build a single-stranded copy known as a transcript out of RNA. A cluster of molecules called a ribosome used the RNA as a template to build a protein. Every time a cell divided, it replicated its genes.
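The classic picture above can be sketched in a few lines of Python. This is a toy illustration, not a model of real cellular machinery: the gene sequence is invented, the codon table holds only the four codons the example needs, and the function names `transcribe` and `translate` are hypothetical. It reads the coding strand of the DNA, so the transcript is simply the same letters with U in place of T.

```python
# Toy version of the 1960s definition: DNA -> RNA transcript -> protein.
# Only the four codons used below are listed; the full table has 64 entries.
CODON_TABLE = {"AUG": "Met", "GCC": "Ala", "GAA": "Glu", "UAA": "Stop"}


def transcribe(coding_strand):
    """Return the RNA transcript of a DNA coding strand (T becomes U)."""
    return coding_strand.replace("T", "U")


def translate(rna):
    """Read the transcript three letters at a time until a stop codon."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        amino_acid = CODON_TABLE[rna[i:i + 3]]
        if amino_acid == "Stop":
            break
        protein.append(amino_acid)
    return protein


gene = "ATGGCCGAATAA"          # invented sequence, for illustration only
transcript = transcribe(gene)
protein = translate(transcript)
print(transcript, protein)
```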

This definition worked well. Biologists knew that genes could be shut off and switched on when proteins clamped onto nearby bits of DNA. They also knew that a few genes encoded RNA molecules that had other jobs, like helping build proteins in the ribosome. But these exceptions did not seem too important.

More complications emerged in the 1980s and 1990s. Scientists discovered that when a cell produces an RNA transcript, it cuts out huge chunks (called introns) and saves only a few small remnants (called exons). Vast stretches of noncoding DNA also lie between these protein-coding regions. The 21,000 protein-coding genes in the human genome make up just 1.2 percent of its DNA.
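Splicing can be pictured as cutting a string and gluing the kept pieces together. The sketch below assumes an invented transcript and invented exon coordinates, and the helper name `splice` is hypothetical. In this toy case the intron is as long as the two exons combined; in real genes the introns are usually far longer than the exons.

```python
# Toy model of splicing: keep the exons, discard the introns.
def splice(transcript, exons):
    """Join the exon segments, given as (start, end) pairs with end exclusive."""
    return "".join(transcript[start:end] for start, end in exons)


# Invented example: exon, intron, exon.
raw_transcript = "AUGGCC" + "GUAAGUUUUCAG" + "GAAUAA"
exon_positions = [(0, 6), (18, 24)]

mature_transcript = splice(raw_transcript, exon_positions)
fraction_kept = len(mature_transcript) / len(raw_transcript)
print(mature_transcript, fraction_kept)
```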

In 2000, an international team of scientists finished the first rough draft of the human genome. They identified the location of many protein-coding genes but left the rest largely unexplored.

An effort called the Encyclopedia of DNA Elements, or Encode, aims to determine the function of every piece of DNA in the human genome. Last summer the Encode team published its results on 1 percent of the genome's 3 billion letters (G, A, T, C). The team expects to have full results next year.

In a process known as alternative splicing, a cell can select different combinations of exons to make different transcripts. Studies show that almost all genes are being spliced. The Encode team estimates that the average protein-coding region produces 5.7 different transcripts. Different kinds of cells appear to produce different transcripts from the same gene.
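To see how one gene can yield several transcripts, the sketch below lists the combinations available to a hypothetical gene with four invented exons. It assumes, as a simplification, that the first and last exons are always kept and that any subset of the internal exons may be included in their original order. Real cells do not necessarily produce every such combination.

```python
from itertools import combinations


def possible_transcripts(exons):
    """Yield transcripts that keep the first and last exon plus any
    in-order subset of the internal exons (a simplifying assumption)."""
    first, *internal, last = exons
    for count in range(len(internal) + 1):
        for chosen in combinations(internal, count):
            yield "".join((first, *chosen, last))


# Four invented exons, for illustration only.
exons = ["AUGGCC", "GAUUAC", "CCGGAA", "UUGUAA"]
variants = list(possible_transcripts(exons))
for variant in variants:
    print(variant)
```

With two optional internal exons, this toy gene has four possible transcripts, which is in the same range as the Encode estimate of 5.7 per protein-coding region.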

Cells often toss exons into transcripts from other genes. Those exons may come from distant locations, even from different chromosomes. So we can no longer think of genes as being single stretches of DNA at one physical location.

Toadflax is a common flower. Most toadflax plants have white petals arranged in a mirror-like symmetry, but some have yellow five-pointed stars. Each form passes its flower type down to its offspring. The difference between the flowers comes down to the pattern of caps attached to their DNA. These caps are known as methyl groups. The star-shaped toadflax have a distinct pattern of caps on one gene involved in the development of flowers.

DNA is also wrapped around spool-like proteins called histones that can wind up a stretch of DNA so that the cell cannot make transcripts from it. All of the molecules that hang onto DNA, collectively known as epigenetic marks, are essential for cells to take their final form in the body. As an embryo matures, epigenetic marks in different cells are altered, and as a result they develop into different tissues. Once the final pattern of epigenetic marks is laid down, it clings stubbornly to cells. When cells divide, their descendants carry the same set of marks.

In September, the National Institutes of Health began a $190 million program to start mapping epigenetic marks on DNA in different tissues. Studies suggest that when epigenetic marks are disturbed, cells may become more vulnerable to cancer, because essential genes are shut off and genes that should be shut off are turned on.

When an embryo begins to develop, the epigenetic marks that have accumulated on the parental DNA are stripped away. The embryo's cells then add a fresh set of epigenetic marks in the same pattern that its parents had when they were embryos. This process is very delicate. If an embryo experiences certain kinds of stress, it may fail to lay down the right epigenetic marks.

In at least some cases, these new epigenetic patterns may be passed down to future generations. In a paper to be published next year in The Quarterly Review of Biology, Eva Jablonka and Gal Raz of Tel Aviv University in Israel assemble a list of 101 cases in which a trait linked to an epigenetic change was passed down through three generations.

Epigenetic marks are intriguing not just for their effects, but also for how they are created. To place a cap of methyl groups on DNA, a cluster of proteins is guided to the right spot by an RNA molecule.

Over the last decade, scientists have uncovered a number of new kinds of noncoding RNA molecules. In 2006, Craig Mello of the University of Massachusetts and Andrew Fire of Stanford University won the Nobel Prize for establishing that small RNA molecules could silence genes by interfering with their transcripts.

Early Encode results suggest that 93 percent of the genome produces RNA transcripts. Encode scientists have identified the location of variations in DNA that have been linked to common diseases like cancer. A third of those variations were far from any protein-coding gene. But most of the transcripts discovered by the Encode project may not do anything, says David Haussler, an Encode team member at the University of California, Santa Cruz.

If a segment of DNA encodes some essential molecule, mutations will tend to produce catastrophic damage. Natural selection will weed out most mutants. If a segment of DNA does not do much, it can mutate without causing any harm. Over millions of years, an essential piece of DNA will gather few mutations compared with less important ones.
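The logic of this argument can be shown with a small simulation. It assumes, purely for illustration, that 95 percent of mutations in an essential segment are harmful enough to be weeded out, while none are weeded out of a segment that does not do much. Both figures, the mutation count, and the function name `surviving_mutations` are invented for this sketch and do not come from the Encode project.

```python
import random


def surviving_mutations(n_mutations, harmful_fraction, rng):
    """Count mutations that persist when a given fraction is harmful
    and is therefore removed by natural selection."""
    return sum(1 for _ in range(n_mutations) if rng.random() >= harmful_fraction)


rng = random.Random(0)  # fixed seed so the run is repeatable

# Same number of mutations hits each segment; selection differs (assumed values).
essential = surviving_mutations(10_000, 0.95, rng)
unimportant = surviving_mutations(10_000, 0.0, rng)
print(essential, unimportant)
```

The essential segment ends up with far fewer surviving mutations than the unimportant one, which is the signature that researchers look for when they scan the genome for signs of natural selection.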

Only about 4 percent of the noncoding DNA in the human genome shows signs of having experienced strong natural selection. Some of those segments may encode RNA molecules and some may contain stretches that control neighboring genes.

Mutations can make it impossible for a cell to make a protein from a gene. Scientists refer to such a disabled piece of DNA as a pseudogene. Yale bioinformatician Mark Gerstein estimates that there are 10,000 to 20,000 pseudogenes in the human genome. Most of them are effectively dead.

Much of the baggage in the genome comes from invading viruses. Viruses repeatedly infected our distant ancestors. Once these viruses invaded our genomes, they sometimes made new copies of themselves, and the copies were pasted in other spots in the genome. As these chunks of viral DNA hop around, they can cause a lot of harm. But some of them have evolved to make RNA genes that our cells use. Other stretches have evolved into sites where our proteins can attach and switch on nearby genes.

These new concepts are moving the gene away from a physical snippet of DNA and back to a more abstract definition.
 

AR  Progress here is the enabler for the most exciting scientific breakthroughs of the 21st century, when we learn how to build genomes for entirely new lifeforms that can do useful things for us.