
Understanding the Genome
By Carl Zimmer
The New York Times, November 11, 2008
Edited by Andy Ross
Genes are the fundamental unit of heredity. The word was coined
by the Danish geneticist Wilhelm Johannsen in 1909. By the 1960s, scientists had
a compelling definition of the gene.
A gene was a specific stretch of DNA containing the instructions to make a
protein molecule. To make a protein from a gene, a cell had to read it and build
a single-stranded copy known as a transcript out of RNA. A cluster of molecules
called a ribosome used the RNA as a template to build a protein. Every time a
cell divided, it replicated its genes.
This definition worked well. Biologists knew that genes could be shut off and
switched on when proteins clamped onto nearby bits of DNA. They also knew that a
few genes encoded RNA molecules that had other jobs, like helping build proteins
in the ribosome. But these exceptions did not seem too important.
More complications emerged in the 1980s and 1990s. Scientists discovered that
when a cell produces an RNA transcript, it cuts out huge chunks (called introns)
and saves only a few small remnants (called exons). Vast stretches of noncoding
DNA also lie between these protein-coding regions. The coding sequences of the
21,000 protein-coding genes in the human genome make up just 1.2 percent of the
total.
In 2000, an international team of scientists finished the first rough draft of
the human genome. They identified the location of many protein-coding genes but
left the rest of the genome largely unexplored.
An effort called the Encyclopedia of DNA Elements, or Encode, aims to determine
the function of every piece of DNA in the human genome. In June 2007 the team
published results from a pilot study covering 1 percent of the genome's 3 billion
letters (G, A, T, C). The Encode team expects to have full results next year.
In a process known as alternative splicing, a cell can select different
combinations of exons to make different transcripts. Studies show that almost
all human genes are alternatively spliced. The Encode team estimates that the average
protein-coding region produces 5.7 different transcripts. Different kinds of
cells appear to produce different transcripts from the same gene.
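Alternative splicing is easiest to picture as a choice of which exons to keep. The following minimal Python sketch models only one mode of splicing, exon skipping, under hypothetical simplifying assumptions: the first and last exons are always kept, exon order is preserved, and any subset of the internal exons may be dropped. The function name and exon labels are illustrative and do not come from Encode.

```python
from itertools import combinations


def exon_skipping_transcripts(exons):
    """List the transcripts produced by skipping any subset of internal exons.

    Simplifying assumptions (hypothetical model): the first and last exons are
    always kept, exon order along the gene is preserved, and exon skipping is
    the only form of alternative splicing considered.
    """
    if len(exons) < 2:
        return [list(exons)]  # nothing to skip
    first, *middle, last = exons
    transcripts = []
    # Keep k internal exons, for every k from 0 (skip all) to len(middle) (keep all).
    for k in range(len(middle) + 1):
        for kept in combinations(middle, k):  # combinations preserve input order
            transcripts.append([first, *kept, last])
    return transcripts


gene = ["E1", "E2", "E3", "E4"]  # a hypothetical gene with four exons
for transcript in exon_skipping_transcripts(gene):
    print("-".join(transcript))
```

With n internal exons, exon skipping alone allows 2^n transcripts (four for the hypothetical gene here). Real cells also use other splicing modes and produce only some of the possible combinations, so the count illustrates how quickly the options multiply rather than modeling the Encode estimate.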
Cells often toss exons into transcripts from other genes. Those exons may come
from distant locations, even from different chromosomes. So we can no longer
think of genes as being single stretches of DNA at one physical location.
Consider a common flower called toadflax. Most toadflax plants have white petals
arranged in a mirror-like symmetry, but some have yellow five-pointed stars. Each
form passes its flower shape down to its offspring. The difference between the
two comes down to the pattern of caps attached to their DNA. These caps are
known as methyl groups. The star-shaped toadflax have a distinct pattern of caps
on one gene involved in the development of flowers.
DNA is also wrapped around spool-like proteins called histones that can wind up
a stretch of DNA so that the cell cannot make transcripts from it. All of the
molecules that hang onto DNA, collectively known as epigenetic marks, are
essential for cells to take their final form in the body. As an embryo matures,
epigenetic marks in different cells are altered, and as a result they develop
into different tissues. Once the final pattern of epigenetic marks is laid down,
it clings stubbornly to cells. When cells divide, their descendants carry the
same set of marks.
In September, the National Institutes of Health began a $190 million program to
start mapping epigenetic marks on DNA in different tissues. Studies suggest that
disturbed epigenetic marks may also make cells more vulnerable to cancer, because
essential genes are shut off and genes that should be silent are turned on.
When an embryo begins to develop, the epigenetic marks that have accumulated on
the parental DNA are stripped away. The embryo's cells then add a fresh set of
epigenetic marks in the same pattern its parents had when they were embryos. This
process is very delicate. If an embryo experiences certain kinds of stress, it
may fail to lay down the right epigenetic marks.
In at least some cases, these new epigenetic patterns may be passed down to
future generations. In a paper to be published next year in The Quarterly Review
of Biology, Eva Jablonka and Gal Raz of Tel Aviv University in Israel assemble
a list of 101 cases in which a trait linked to an epigenetic change was passed
down through three generations.
Epigenetic marks are intriguing not just for their effects, but also for how
they are created. To place a cap of methyl groups on DNA, a cluster of proteins
is guided to the right spot by an RNA molecule.
Over the last decade, scientists have uncovered a number of new kinds of
noncoding RNA molecules. In 2006, Craig Mello of the University of Massachusetts
and Andrew Fire of Stanford University won the Nobel Prize in Physiology or
Medicine for establishing that small RNA molecules could silence genes by
destroying the RNA transcripts those genes produce.
Early Encode results suggest that 93 percent of the genome produces RNA
transcripts. Encode scientists have identified the location of variations in DNA
that have been linked to common diseases like cancer. A third of those
variations were far from any protein-coding gene. But most of the transcripts
discovered by the Encode project may not do anything, says David Haussler, an
Encode team member at the University of California, Santa Cruz.
If a segment of DNA encodes some essential molecule, mutations will tend to
produce catastrophic damage. Natural selection will weed out most mutants. If a
segment of DNA does not do much, it can mutate without causing any harm. Over
millions of years, an essential piece of DNA will gather fewer mutations than
less important ones.
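The logic of this conservation argument can be put in rough numbers. This Python sketch is a simplified toy model, not the method used by the researchers quoted here: it assumes every DNA letter mutates at a fixed, hypothetical rate and that natural selection removes a fixed fraction of the mutations in essential DNA. All rates, times, and segment lengths are made-up illustrative values, and the function names are hypothetical. The `divergence` helper shows how such a difference would be measured, by comparing two already-aligned sequences letter by letter.

```python
def expected_mutations(length_bp, years, rate_per_site_per_year, tolerated_fraction):
    """Expected number of mutations that persist in a DNA segment.

    Toy model (assumption): each letter mutates at a constant rate, and only
    `tolerated_fraction` of new mutations escape removal by natural selection.
    DNA that does not do much has tolerated_fraction = 1.0.
    """
    return length_bp * years * rate_per_site_per_year * tolerated_fraction


def divergence(ancestral, descendant):
    """Fraction of positions that differ between two pre-aligned, equal-length sequences."""
    if not ancestral or len(ancestral) != len(descendant):
        raise ValueError("sequences must be non-empty and aligned to the same length")
    differences = sum(a != b for a, b in zip(ancestral, descendant))
    return differences / len(ancestral)


RATE = 1e-9      # hypothetical: one mutation per letter per billion years
YEARS = 80e6     # hypothetical: 80 million years of separate evolution
LENGTH = 1_000   # hypothetical: a 1,000-letter segment

unimportant = expected_mutations(LENGTH, YEARS, RATE, tolerated_fraction=1.0)
essential = expected_mutations(LENGTH, YEARS, RATE, tolerated_fraction=0.05)  # 95% weeded out
print(f"unimportant segment: {unimportant:.1f} mutations")
print(f"essential segment:   {essential:.1f} mutations")
print(f"example divergence:  {divergence('GATTACA', 'GACTACT'):.2f}")
```

With these assumed numbers the essential segment keeps one-twentieth as many mutations as the unimportant one, the kind of deficit that marks a stretch of DNA as having experienced strong natural selection.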
Only about 4 percent of the noncoding DNA in the human genome shows signs of
having experienced strong natural selection. Some of those segments may encode
RNA molecules and some may contain stretches that control neighboring genes.
Mutations can make it impossible for a cell to make a protein from a gene.
Scientists refer to such a disabled piece of DNA as a pseudogene. Yale
bioinformatician Mark Gerstein estimates that there are 10,000 to 20,000
pseudogenes in the human genome. Most of them are effectively dead.
Much of the baggage in the genome comes from invading viruses. Viruses
repeatedly infected our distant ancestors. Once these viruses invaded our
genomes, they sometimes made new copies of themselves, and the copies were
pasted in other spots in the genome. As these chunks of viral DNA hop around,
they can cause a lot of harm. But some of them have evolved to make RNA genes
that our cells use. Other stretches have evolved into sites where our proteins
can attach and switch on nearby genes.
These new concepts are moving the gene away from a physical snippet of DNA and
back to a more abstract definition.
AR Progress here is the key to the most exciting scientific breakthroughs of the
21st century, when we learn how to build genomes for entirely new life forms that
can do useful things for us.

