Table of Contents
Foreword and Preface
History of the Human Genome Project
Scientific Five-Year Goals of the U.S. Human Genome Project
Highlights of Research Progress
Activities Addressing Ethical, Legal, and Social Issues Related to
Human Genome Project Data
Technology Transfer and Industrial Collaboration
Human Genome Center Research Narratives
Lawrence Berkeley Laboratory
Lawrence Livermore National Laboratory
Los Alamos National Laboratory
Program Management Infrastructure
DOE OHER Mission
Program Management Task Group
Human Genome Coordinating Committee
Human Genome Management Information System
Human Genome Distinguished Postdoctoral Fellowships
Joint DOE-NIH Activities
Joint Mapping Working Group
Joint Informatics Task Force
Joint Sequencing Working Group
Joint Working Group on Ethical, Legal, and Social Issues
Joint Working Group on the Mouse
Other U.S. Genome Research
U.S. Department of Agriculture
National Science Foundation
Howard Hughes Medical Institute
HUGO: Worldwide Genome Research Coordination
UNESCO: Promoting the Interests of Developing Countries
A. Primer on Molecular Genetics
B. Conferences, Meetings, and Workshops Sponsored by DOE
C. Members of the DOE Health and Environmental Research
D. Members of the DOE-NIH Joint Working Groups
Index to Principal and Coinvestigators Listed in Abstracts
Acquiring complete knowledge of the organization, structure, and
function of the human genome, the master blueprint of each of
us, is the broad aim of the Human Genome Project. It is a new kind
of program in biology, both in its size and focus on a limited
set of goals and in its dependence on the development and use of
technology. The coordinated U.S. Human Genome Project was
officially initiated by the Department of Energy (DOE) Office of
Health and Environmental Research and the National Institutes of
Health (NIH) National Center for Human Genome Research (NCHGR) in
FY 1991 with the publication in April 1990 of Understanding Our
Genetic Inheritance; The U.S. Human Genome Project: The First
Five Years FY 1991-1995. The DOE effort, which began very modestly
almost 4 years before, is now over 5 years old. Taking stock of
what has been done and what remains to be done is particularly
appropriate at this time.
That the ambitious scientific goal of the Human Genome Project
can now be imagined is the result of the revolution occurring in
biology during the last 20 years. Modern biological science has
achieved a profound but still quite incomplete level of
understanding of how the diversity of all living things is
determined. This insight, along with scientific and technical
advances in other fields, has brought unprecedented power both to
analyze and manipulate genetic structures and to use and store
large quantities of genetic information. DOE is
uniquely positioned to bring together expertise in physics,
chemistry, engineering, and computer science to help solve
fundamental biological problems and to exploit exciting
opportunities presented by the Human Genome Project. Genome
research will also contribute to the department's role in
providing the scientific foundation for understanding the health
effects of radiation and of chemical insults to the genome.
The DOE program stresses mapping, the development of sequencing
technologies and instrumentation, and informatics. Informatics
refers to computational approaches in acquiring, storing,
distributing, analyzing, and manipulating vast amounts of mapping
and sequence data that will result from the project. Another
important program component studies the ethical, legal, and
social issues arising from use of the generated data,
particularly in the privacy and confidentiality of genetic
information. Cutting across all DOE biological and environmental
research programs are several science education activities.
The Human Genome Project is a closely cooperative activity
between NIH and DOE. NCHGR is an essential
participant. Internationally, the formation of the Human Genome
Organization and the establishment of national genome projects by
an increasing number of countries indicate the fascination and
promise this effort holds for the collective imaginations of many
nations. In addition to the inherent excitement about increased
knowledge of human life, the project offers the promise of many
new opportunities for benefiting humanity through the development
of new diagnostics, pharmaceuticals, and therapies for a
multitude of human diseases; a wide range of improvements will
flow from other biotechnology advances. Further expected benefits
include improved risk assessment for individuals and populations
exposed to agents that impact genetic material, as well as
possible applications of the data to environmental and
To be successful, the program must continue to focus on clear
objectives for mapping and sequencing and to incorporate the flow
of technological developments into the efforts of all working
laboratories. Strategies must be planned carefully and in a
comprehensive fashion as the next phase begins, in which mapping
and sequencing results proliferate and technologies mature.
Planning must be project-wide and include interagency planning at
This report describes the status of the DOE Human Genome Program
and its accomplishments to date. Research highlights are noted
from the program as a whole and from the three principal DOE
human genome centers at Lawrence Berkeley Laboratory, Lawrence
Livermore National Laboratory, and Los Alamos National
Laboratory. These national laboratory facilities of DOE have been
especially successful because they are organized to focus
efforts, foster interdisciplinary projects, and use advanced
technologies, some developed for other purposes, toward program
goals. Essential work is also reported from 41 different research
universities. Remarkable progress has been made in advanced
instrumentation and informatics.
A further indication of the increasing development of the DOE
program is the simple statistic that the 1989-90 report had 157
pages and included 57 abstracts of work involving 211 scientists.
The current program report contains over 240 pages and includes
more than 150 abstracts of work involving over 400 investigators,
essentially a doubling of DOE program size.
The Human Genome Project ultimately will create scientific
resources for the next wave of advances in biology and medicine.
As the project is completed, accomplishments will dwarf those
that have occurred in the biological sciences since the advent of
recombinant DNA technologies. By the same token, the ethical and
social consequences of the uses of this new knowledge must be
considered as the knowledge is acquired; if this knowledge is
responsibly obtained and applied, the next decade of biological
research will be history's most fruitful and rewarding by any
David J. Galas, Associate Director
Office of Health and Environmental Research
Office of Energy Research
U.S. Department of Energy
This is the third report summarizing the Department of Energy
(DOE) Human Genome Program, its content, progress, and
accomplishments. Since the program's conception in 1986 and
initiation in 1987 by the DOE Office of Health and Environmental
Research (OHER), its broad objectives have rapidly gained both
national and international support. The program has made
important strides in the development and application of
technologies and tools that are required for the cost-effective
characterization of the molecular nature of the human genome.
This country's Human Genome Project is jointly administered by
OHER and the National Center for Human Genome Research of the
National Institutes of Health. A successful effort to
characterize the molecular nature of human inheritance will
require continuing international cooperation involving scientists
from many countries. A number of other nations have begun
substantial efforts to map and sequence the human genome and
those of key model organisms. Although intellectual property
issues threaten some aspects of international cooperation,
increasing exchange of information has led to more involvement of
the international community in discovery, acceleration of the
pace of the research, and increased cost-effectiveness.
International communication is facilitated by regular meetings to
update the maps of individual chromosomes and by contributions to
databases such as the Genome Data Base and nucleic acid sequence
databanks. Through such databases a worldwide data aggregation
and distribution system is being developed to exchange
information regarding the genome.
Aided by funding from the Human Genome Project, serious study is
under way on ethical, legal, and social issues that are becoming
more urgent because of the rapid growth in knowledge of human
genetics. It is important to develop and disseminate deeper and
more widespread understanding of these dynamic issues and of the
choices available for families, the law, and society. An educated
public is required to make intelligent choices in this area. The
national genome project is now the largest provider of funds for
study of such issues.
A key to the long-term success of the program is the initial
phase of intensive resource and technology development that
requires input and involvement from many scientific and
engineering disciplines. Exciting contributions have already been
made to biomedical knowledge and biotechnology, and such advances
are certain to continue at an ever-increasing rate. Announcements
of discovery of important disease genes have become commonplace.
Within 10 years nearly all the perhaps 100,000 genes that make up
the human genome are likely to be found. Within 15 years the
program is expected to culminate in a reference DNA sequence of
the entire genome.
Never has such a mass of data flowed into biology and medicine.
An understanding of how genetic variations account for much of
the richness and adventure of human diversity will be greatly
increased. More practically, there can be little doubt of
tremendous payoffs in terms of diagnoses and, ultimately,
specific therapies for many human diseases.
Moreover, new technologies and rapidly developing analytical
tools to characterize the human genome will have widespread
impact beyond human health. They will find application in
revealing the genetic inheritance of many organisms of potential
scientific and commercial interest and will provide an important
stimulus to broaden and deepen the impact of modern biology in
areas such as energy, environmental protection and waste
treatment, agriculture, and the materials sciences.
Of particular importance is the facile access to proteins that
rapidly follows discovery of their genes. As a result of genome
projects, we will soon be in a position to begin the systematic
large-scale characterization of proteins and their structure. The
interplay of molecular biology, structural studies,
high-performance computing, and advanced molecular graphics will
certainly lead to an understanding of macromolecular
structure-function relationships. The scientific and economic
implications of such a predictive understanding cannot be
overestimated. It is the key to full realization of the potential
of modern biology.
Intense X-ray light and neutrons produced by unique, large, and
expensive machines (synchrotrons and reactors) at DOE
laboratories are important national resources for the
determination of biological structure and, hence, for the
national effort in biotechnology. A central goal of OHER is to
provide access to these machines by making facilities and
technical support available to structural-biology users, a need
that has been projected to increase tenfold in the next several
Finally, as Robert Sinsheimer elegantly pointed out in The FASEB
Journal (November 1991), the Human Genome Project is an epic
venture of discovery that will in time clarify many endlessly and
fruitlessly debated mysteries of human nature. With this project
we are launched upon a new stage of the age-old quest to
illuminate the record of the human past: the prehistory of our
species as recorded in the genetic script or blueprint for our
being. When complete, the project will have provided us with an
unprecedented resource: the complete text of our genetic
endowment. It will be seen as a turning point in human history.
David A. Smith, Director
Health Effects and Life Sciences Research Division
Office of Health and Environmental Research
Office of Energy Research
U.S. Department of Energy
The DOE Office of Health and Environmental Research gratefully
acknowledges the contributions made by genome research grantees
and contractors in submitting abstracts, photographs, captions,
and narratives. The Human Genome Management Information System at
Oak Ridge National Laboratory (managed by Martin Marietta Energy
Systems, Inc., for the U.S. Department of Energy under contract
DE-AC05-84OR21400) collected and organized the information,
prepared the manuscript, and implemented the design and
production of this publication.
The U.S. Human Genome Project is the national coordinated 15-year
effort to characterize all the human genetic material (the
genome) by improving existing human genetic maps, constructing
physical maps of entire chromosomes, and ultimately determining
the complete sequence of the deoxyribonucleic acid (DNA) subunits
in the human genome. Parallel studies are being carried out on
selected model organisms to facilitate the interpretation of
human gene function. The ultimate goal of the U.S. project is to
discover all of the more than 100,000 human genes and render them
accessible for further biological study.
Current technology could probably be used to attain the
objectives of the Human Genome Project, but the cost and time
required would be unacceptable. For this reason, a major feature
of the first 10 years of the project is to optimize existing
methods and develop new technology to increase efficiency in DNA
mapping and sequencing by 1 or 2 orders of magnitude. The genome
will eventually be sequenced using continually evolving
technologies and revolutionary methods not in existence today.
Information obtained as part of the Human Genome Project will
dramatically change almost all biological and medical research
and dwarf the catalog of current genetic knowledge. In addition,
both the methods and the data developed as part of the project
are likely to benefit investigations of many other genomes,
including a large number of commercially important plants and
For more information on the science of genomics, see Appendix A,
"Primer on Molecular Genetics," p. 191. Terms are defined in the
Glossary, p. 229. An acronym list is on the inside back cover.
History of the DOE Human Genome Program
A brief history of the U.S. Department of Energy (DOE) Human
Genome Program will be useful in a discussion of the objectives
of the DOE program as well as those of the collaborative U.S.
Human Genome Project. The Office of Health and Environmental
Research (OHER) of DOE and its predecessor agencies, the Atomic
Energy Commission and the Energy Research and Development
Administration, have long sponsored research into genetics, both
in microbial systems and in mammals, including basic studies on
genome structure, replication, damage, and repair and the
consequences of genetic mutations.
In 1984, OHER and the International Commission on Protection
Against Environmental Mutagens and Carcinogens cosponsored a
conference in Alta, Utah, which highlighted the growing roles of
recombinant DNA technologies. Substantial portions of the
meeting's proceedings were incorporated into the Congressional
Office of Technology Assessment report, Technologies for
Detecting Heritable Mutations in Humans, in which the value of a
reference sequence of the human genome was recognized.
Acquisition of such a reference sequence was, however, far beyond
the capabilities of biomedical research resources and
infrastructure existing at that time. Although the small genomes
of several microbes had been mapped or partially sequenced, the
detailed mapping and eventual sequencing of 24 distinct human
chromosomes (22 autosomes and the sex chromosomes X and Y) that
together comprise an estimated 3 billion subunits was a task some
DOE OHER was already engaged in several multidisciplinary
projects contributing to the nation's biomedical capabilities,
including the GenBank® DNA sequence repository, which was
initiated and sustained by DOE computer and data-management
expertise. Several major user facilities supporting
microstructure research were developed and are maintained by DOE
(see box, p. 55). Unique chromosome-processing resources and
capabilities were in place at Los Alamos National Laboratory and
Lawrence Livermore National Laboratory. Among these were the
fluorescence-activated cell sorter (FACS) systems to purify human
chromosomes within the National Laboratory Gene Library Project
for the production of libraries of DNA clones. The availability
of these monochromosomal libraries opened an important path: a
practical means of subdividing the huge total genome into 24 much
more manageable components.
With these capabilities, OHER began in 1986 to consider the
feasibility of a dedicated human genome program. Leading
scientists were invited to the March 1986 international
conference at Santa Fe, New Mexico, to assess the desirability
and feasibility of implementing such a project. With virtual
unanimity, participants agreed that ordering and eventually
sequencing DNA clones representing the human genome were
desirable and feasible goals. With the receipt of this
enthusiastic response, OHER initiated several pilot projects.
Program guidance was further sought from the DOE Health Effects
Research Advisory Committee (HERAC, see Appendix C for a list of
The HERAC Recommendation. The April 1987 HERAC report recommended
that DOE and the nation commit to a large, multidisciplinary,
scientific, and technological undertaking to map and sequence the
human genome. DOE was particularly well suited to focus on
resource and technology development, the report noted; HERAC
further recommended a leadership role for DOE because of its
demonstrated expertise in managing complex and long-term
multidisciplinary projects involving both the development of new
technologies and the coordination of efforts in industries,
universities, and its own laboratories. Evolution of the nation's
Human Genome Project further benefited from a 1988 study by the
National Research Council (NRC) entitled Mapping and Sequencing
the Human Genome, which recommended that the United States
support this research effort and presented an outline for a
The National Institutes of Health (NIH) was a necessary
participant in the large-scale effort to map and sequence the
human genome because of its long history of support for
biomedical research and its vast community of scientists. This
was confirmed by the NRC report, which recommended a major role
for NIH. In 1987, under the leadership of Director James
Wyngaarden, NIH established the Office of Genome Research in the
Director's Office. In 1989 this office became the National Center
for Human Genome Research (NCHGR), directed by James D. Watson.
After Watson's resignation in April 1992, Michael Gottesman was
appointed NCHGR Acting Director.
In addition to extramural support for research projects in
physical mapping and the development of index linkage markers and
technology, NIH also provides support for genetic mapping based
on family studies and, following NRC recommendations, for studies
on several relevant model organisms. DOE-supported genome
research is focused almost exclusively on the human genome
through support of large-scale physical mapping, resource and
instrumentation technology development, and improvements in
computational and database capabilities and research
infrastructure. A significant portion of the DOE Human Genome
Program is allocated to the DOE national laboratories.
In several important areas, DOE and NIH cooperate to support
critical resources such as the Genome Data Base (GDB) at Johns
Hopkins University. Cofunded since 1991 as the central
international repository of human chromosome mapping data, GDB is
expected to receive supporting funds from other nations. DOE and
NIH also cooperate to support joint workshops; a number of
ethical, legal, and social issues projects; and the Human Genome
Joint task groups under the DOE-NIH Joint Subcommittee on the
Human Genome meet periodically to define program needs and
develop recommendations for their parent DOE and NIH committees.
OHER and NCHGR cosponsor workshops and meetings of the task
groups on mapping; sequencing; informatics; the use of the mouse
as a mammalian model; and, in a departure from most scientific
programs, ethical, legal, and social issues related to data
produced in the project.
Many other highlights of the DOE OHER program follow in the
succeeding sections of this report, including reports from the
human genome centers; further details of program infrastructure,
management, and coordination; resource allocation; and abstracts
of individual research projects.
Scientific Five-Year Goals of the U.S. Human Genome Project from
the NIH-DOE Five Year Plan* [Implemented October 1, 1990 (FY
1. Mapping and Sequencing the Human Genome
Complete a fully connected human genetic map with markers
spaced an average of 2 to 5 cM apart. Identify each marker
by a sequence tagged site (STS).
Assemble STS maps of all human chromosomes with the goal of
having markers spaced at approximately 100,000-bp intervals.
Generate overlapping sets of cloned DNA or closely spaced
unambiguously ordered markers with continuity over lengths
of 2 Mb for large parts of the human genome.
Improve current and develop new methods for DNA sequencing
that will allow large-scale sequencing of DNA at a cost of
$0.50 per base pair.
Determine the sequence of an aggregate of 10 Mb of human DNA
in large continuous stretches in the course of technology
development and validation.
2. Model Organisms
Prepare a mouse genome genetic map based on DNA markers.
Start physical mapping on one or two chromosomes.
Sequence an aggregate of about 20 Mb of DNA from a variety
of model organisms, focusing on stretches that are 1 Mb
long, in the course of developing and validating new and
improved DNA sequencing technology.
3. Informatics: Data Collection and Analysis
Develop effective software and database designs to support
large-scale mapping and sequencing projects.
Create database tools that provide easy access to up-to-date
physical mapping, genetic mapping, chromosome mapping, and
sequencing information and allow ready comparison of the
data in these several data sets.
Develop algorithms and analytical tools that can be used in
the interpretation of genomic information.
4. Ethical, Legal, and Social Considerations
Develop programs directed toward understanding the ethical,
legal, and social implications of Human Genome Project data.
Identify and define the major issues and develop initial
policy options to address them.
5. Research Training
Support research training of pre- and postdoctoral fellows
starting in FY 1990. Increase the number of trainees
supported until a steady state of about 600 per year is
reached by the fifth year.
Examine the need for other types of research training in the
next year (FY 1991).
6. Technology Development
Support automated instrumentation and innovative and
high-risk technological developments as well as improvements
in current technology to meet the needs of the genome
project as a whole.
7. Technology Transfer
Enhance the already close working relationships with
Encourage and facilitate the transfer of technologies and of
medically important information to the medical community.
*Understanding Our Genetic Inheritance; The U.S. Human Genome
Project: The First Five Years FY 1991-1995, DOE/ER-0452P, U.S.
Department of Health and Human Services and U.S. Department of
Energy, April 1990.
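The scale implied by the mapping and sequencing goals above can be worked through with simple arithmetic. The sketch below uses only figures quoted in this report (an estimated 3 billion subunits, 100,000-bp STS spacing, 2-Mb contig continuity, $0.50 per base pair, and the 10-Mb human sequencing goal). It assumes, for illustration only, that each base is sequenced once and that the whole genome is tiled evenly; the variable names are hypothetical.

```python
# Back-of-envelope scale of the five-year goals, using figures quoted in this report.
GENOME_BP = 3_000_000_000        # estimated 3 billion subunits (base pairs)
STS_SPACING_BP = 100_000         # target spacing between STS markers
CONTIG_TARGET_BP = 2_000_000     # target contig continuity (2 Mb)
COST_PER_BP = 0.50               # target sequencing cost, dollars per base pair
HUMAN_SEQ_GOAL_BP = 10_000_000   # 10 Mb of human DNA to be sequenced

# Number of STS markers needed if spaced evenly across the genome.
sts_markers = GENOME_BP // STS_SPACING_BP

# Number of 2-Mb stretches needed to tile the genome (assumes even tiling).
contigs_if_tiled = GENOME_BP // CONTIG_TARGET_BP

# Cost of one pass over every base at the target price (ignores redundancy).
single_pass_cost = GENOME_BP * COST_PER_BP

# Share of the genome covered by the 10-Mb sequencing goal.
goal_fraction = HUMAN_SEQ_GOAL_BP / GENOME_BP

print(f"STS markers at 100-kb spacing: {sts_markers:,}")
print(f"2-Mb stretches to tile the genome: {contigs_if_tiled:,}")
print(f"Single-pass cost at target price: ${single_pass_cost:,.0f}")
print(f"10-Mb goal as share of genome: {goal_fraction:.2%}")
```

On these assumptions the STS goal corresponds to roughly 30,000 markers, and a single pass over the genome at the target price would cost about $1.5 billion.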
Highlights of Research Progress
A major goal for DOE and NIH, as stated in the Five Year Plan (p.
5) for the Human Genome Project officially implemented in FY
1991, is to develop refined physical maps of chromosomes.
Increasingly detailed maps will provide biomedical scientists
with rapid access to important areas on chromosomes through their
specific markers and ordered sets of DNA clones.
Page numbers for research abstracts of investigators noted in
parentheses can be located in the "Index to Principal and
Coinvestigators Listed in Abstracts," p. 243.
Physical Map Construction
DOE sponsors both extensive physical mapping studies and
supportive resource and technology development. Physical mapping
of chromosomes 5, 11, 16, 17, 19, 21, 22, and X has been or is
being supported directly. Increasingly detailed maps facilitate
access to important chromosomal loci through their constituent
markers and ordered DNA clones.
The earliest concerted mapping efforts began on chromosome 16 at
the Los Alamos National Laboratory (LANL) Center for Human Genome
Studies and on chromosome 19 at the Lawrence Livermore National
Laboratory (LLNL) Human Genome Center. These efforts have
achieved excellent progress (see detailed narratives, pp. 46 and
36, respectively) through the development of effective
multidisciplinary teams and efficient methods for generating
clone "fingerprints." The fingerprints provide data for
recognizing clone pairs that overlap, facilitating the
construction of increasingly larger sets of overlapping clones,
called contigs. Approximately 90% of chromosomes 16 and 19 is now
represented by fingerprinted clones, and multiclone contigs span
at least 80% of their length. Initial contig assembly
methodologies are complemented by strategies designed to finish
the physical maps and align them with genetic maps. This
progress, together with the many contributions from other
research groups (presented in the Abstracts section of this
report), shows that resources and technologies required to
achieve the mapping goals stated in the Five Year Plan are
rapidly being realized.
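The way fingerprints lead to contigs can be illustrated with a minimal sketch. It assumes, purely for illustration, that a fingerprint is a list of restriction-fragment sizes, that two clones are called overlapping when they share at least a threshold number of fragments within a size tolerance, and that overlapping clones are grouped with a union-find structure. The clone names, fragment sizes, thresholds, and function names are hypothetical; this is not the method used at LANL or LLNL, and it does not order clones within a contig.

```python
from itertools import combinations

def shared_fragments(fp_a, fp_b, tolerance=2):
    """Greedily count fragments of fp_a that match an unused fragment of fp_b
    to within `tolerance` base pairs."""
    remaining = sorted(fp_b)
    shared = 0
    for size in sorted(fp_a):
        for i, other in enumerate(remaining):
            if abs(size - other) <= tolerance:
                shared += 1
                del remaining[i]   # each fragment may be matched only once
                break
    return shared

def build_contigs(fingerprints, min_shared=3, tolerance=2):
    """Group clones into contigs: clones joined by a chain of overlaps share a contig."""
    parent = {clone: clone for clone in fingerprints}

    def find(clone):
        # Follow parent links to the group representative, compressing the path.
        while parent[clone] != clone:
            parent[clone] = parent[parent[clone]]
            clone = parent[clone]
        return clone

    for a, b in combinations(fingerprints, 2):
        if shared_fragments(fingerprints[a], fingerprints[b], tolerance) >= min_shared:
            parent[find(a)] = find(b)   # merge the two groups

    groups = {}
    for clone in fingerprints:
        groups.setdefault(find(clone), []).append(clone)
    return sorted(sorted(members) for members in groups.values())

# Hypothetical fingerprints (fragment sizes in bp).
fingerprints = {
    "c1": [1200, 2300, 3100, 4500, 5200],
    "c2": [3101, 4499, 5200, 6100, 7000],
    "c3": [5201, 6100, 7001, 8200],
    "c4": [500, 950, 1800],
}
contigs = build_contigs(fingerprints)
print(contigs)
```

In this toy data set c1 overlaps c2 and c2 overlaps c3, so the three clones form one contig even though c1 and c3 share too few fragments to be linked directly; c4 remains a singleton.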
National Laboratory Gene Library Project (NLGLP)
Among the resources most crucial to mapping progress are the
libraries of clones representing each of the human chromosomes.
Their availability reduces the total genome mapping effort to 24
smaller, more-manageable mapping projects. This
chromosome-specific clone library production from physically
purified chromosomes depends on the unique LANL and LLNL
chromosome-sorting facilities maintained through the DOE NLGLP.
These library resources are distributed either from the
laboratories or through the American Type Culture Collection. As
of December 1991 over 620 chromosome-specific libraries were
distributed as resources for entire chromosome mapping efforts
and for more-selective gene hunts. Current library production is
focused on the needs of the major chromosome mapping projects (L.
Deaven, LANL; P. de Jong, LLNL).
Recombinant Clone Types
Other biological resources are also being developed to further
chromosome mapping progress. These resources include several
useful genetic elements or recombinant DNAs and their cellular
hosts. The largest elements are the intact, single human
chromosomes maintained in somatic cell hybrids, such as single
human chromosome/hamster-host cell hybrids. They are valuable for
sorting out the human chromosomes for construction of
single-chromosome libraries. Insert sizes of recombinants range
from millions to a few hundred bases. Recombinant cosmid clones
with 40- to 50-kb human DNA inserts predominated in the early
contig-building efforts and continue to be a basic resource
(refer to Abstracts: Resource Development, p. 82).
Monochromosomal Yeast Artificial Chromosomes (YACs)
YACs with inserts of 200 kb and larger, whose initial development
was pioneered with NIH support, are now widely used in physical
mapping projects. The recently developed capability to produce
YACs from flow-sorted chromosomes is making available
monochromosomal YAC libraries to speed mapping projects (M.
McCormick, L. Deaven, and R. Moyzis, LANL). These libraries are
made up of YACs containing human DNA inserts. This contrasts with
libraries made from somatic cell hybrids, which are made up of
YACs that contain mostly nonhuman DNA inserts.
Clone Library Array and Analysis
When user laboratories maintain clone libraries in the same
arrayed-format addressing system, the information obtained from
these libraries is maximized because the accumulated data from
different laboratories can be readily combined. The tedious task
of arraying thousands of DNA clones has been greatly alleviated
through the development and implementation of automated or
robotic processing systems (T. Beugelsdijk and P. Medvick, LANL;
J. Jaklevic, Lawrence Berkeley Laboratory (LBL); and A. Olsen,
LLNL). These systems are being increasingly utilized in clone
analyses and in comparisons needed for overlap detection.
Multiplexed Clone Overlap Detection
Detection of clone overlaps, recognized as sequence homologies by
DNA hybridization, is accelerated by multiplexing strategies in
which the processing of pools of clones or their derivative
probes replaces the more tedious analysis of individual clones.
Multiplexing was first
implemented by the chromosome 11 mapping group (G. Evans, Salk
Institute for Biological Studies). Several second-generation
multiplexing schemes are now being implemented to speed overlap
detection both within libraries and between members of different
types of libraries (J. F. Cheng, LBL; P. de Jong, LLNL).
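The saving that pooling offers can be shown with a short sketch. It assumes a generic row-and-column pooling design on an arrayed plate, chosen here only for illustration; the report does not describe the specific schemes used by the groups cited above. Hybridization is simulated as a set lookup, and all names are hypothetical.

```python
def make_pools(n_rows, n_cols):
    """Return one pool per row and one pool per column of an arrayed plate.
    Each pool is the set of (row, col) well addresses it contains."""
    row_pools = {r: {(r, c) for c in range(n_cols)} for r in range(n_rows)}
    col_pools = {c: {(r, c) for r in range(n_rows)} for c in range(n_cols)}
    return row_pools, col_pools

def pool_is_positive(pool, positive_clones):
    """Simulated hybridization: a pool is positive if it holds any positive clone."""
    return bool(pool & positive_clones)

def decode(n_rows, n_cols, positive_clones):
    """Screen row and column pools, then list wells at positive intersections."""
    row_pools, col_pools = make_pools(n_rows, n_cols)
    hit_rows = [r for r, pool in row_pools.items() if pool_is_positive(pool, positive_clones)]
    hit_cols = [c for c, pool in col_pools.items() if pool_is_positive(pool, positive_clones)]
    candidates = [(r, c) for r in hit_rows for c in hit_cols]
    tests_used = n_rows + n_cols   # one hybridization per pool
    return candidates, tests_used

# One positive clone on a 96-well (8 x 12) plate.
candidates, tests_used = decode(8, 12, {(2, 5)})
print(candidates, tests_used)

# Two positives give four candidate wells, so a follow-up test is needed.
ambiguous, _ = decode(8, 12, {(1, 1), (3, 4)})
print(ambiguous)
```

With a single positive clone, 20 pooled hybridizations replace 96 individual ones. When several clones are positive, the intersections become ambiguous and the candidate wells must be retested individually, which is one reason more elaborate pooling designs exist.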
Messenger RNA/cDNAs Used To Generate Sequence Tagged Sites (STSs)
STS marking of DNA clones provides a common language for uniting
the results obtained with different types of recombinant DNAs and
varied approaches to map generation. An STS is a short, unique
DNA sequence (generally 100 to 300 bp) that distinguishes a
chromosomal locus. The STS segment can be selectively amplified
within the entire genome by the polymerase chain reaction to
provide an identifying tag for any DNA clone containing the site.
DOE is emphasizing the use of STSs for expressed genes, as
represented by their derivative cDNAs. Mapping these STSs onto
contigs and to their chromosomal loci is thus rapidly placing
genes on the developing chromosome maps (refer to Abstracts:
Resource Development, p. 82).
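How an STS serves as an identifying tag can be sketched as a simple "in silico" check. The example assumes that an STS is defined by a forward primer, a reverse primer, and an expected product length, and that a clone carries the STS if both primer sites occur in the right orientation at the expected distance. The sequences are invented and much shorter than a real 100- to 300-bp STS, and exact string matching stands in for the polymerase chain reaction; all names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (uppercase A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]

def sts_product_length(clone_seq, fwd_primer, rev_primer):
    """Return the length of the simulated PCR product, or None if the STS is absent.
    Only the first forward-primer site on the given strand is considered."""
    start = clone_seq.find(fwd_primer)
    if start == -1:
        return None
    rev_site = reverse_complement(rev_primer)   # reverse primer binds the other strand
    end = clone_seq.find(rev_site, start + len(fwd_primer))
    if end == -1:
        return None
    return end + len(rev_site) - start

def clones_with_sts(clones, fwd_primer, rev_primer, expected_len, slack=0):
    """Return the names of clones whose simulated product has the expected length."""
    tagged = []
    for name, seq in clones.items():
        length = sts_product_length(seq, fwd_primer, rev_primer)
        if length is not None and abs(length - expected_len) <= slack:
            tagged.append(name)
    return sorted(tagged)

# Invented example: a 26-bp "STS" shared by a cosmid and a YAC.
FWD, REV = "ACGTTGCA", "GGATCCTA"
sts_site = FWD + "T" * 10 + reverse_complement(REV)
clones = {
    "cosmid_A": "GGGG" + sts_site + "CCCC",
    "yac_B": "AAAA" + sts_site + "AAAA",
    "cosmid_C": "GATTACAGATTACA",
}
tagged = clones_with_sts(clones, FWD, REV, expected_len=26)
print(tagged)
```

Because the same primer pair identifies the site in both the cosmid and the YAC, the two clones can be placed at the same locus regardless of the vector system in which each was made.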
Chromosome microdissection can facilitate region-specific mapping
efforts, such as the localized ordering of clones on the much
longer chromosomes, by identifying sets of clones derived from
the specific region. Region-specific probes can also serve in the
identification of locally expressed genes by selectively
displaying their counterparts within complex cDNA libraries
(F.-T. Kao, Eleanor Roosevelt Institute).
Libraries of Hybrid Somatic Cells with Partial Human Chromosomes
Aberrant chromosomes arising from rearrangement processes can be
moved into host rodent cells, providing for the maintenance of a
human subchromosomal segment. A large hybrid set has been
assembled for chromosome 16 (G. Sutherland, Adelaide Children's
Hospital, South Australia). These partial chromosomes together
define over 100 chromosomal segment "bins" to which clones,
contigs, and other DNA markers can be assigned by DNA
hybridization tests. This resource system is greatly speeding the
completion of the chromosome 16 map.
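The bin-assignment idea can be illustrated with a minimal sketch. It assumes that each hybrid retains one contiguous chromosome segment with known end points, that the segment breakpoints divide the chromosome into bins, and that a marker's bin is read from the set of hybrids in which it gives a positive hybridization signal. The three hybrids and their coordinates are hypothetical, whereas the actual chromosome 16 panel defines over 100 bins.

```python
def bin_signatures(hybrids, chrom_length):
    """Map each distinct set of hybrids to the chromosome interval(s) it identifies.
    `hybrids` maps a hybrid name to the (start, end) of its retained human segment."""
    breakpoints = sorted({0, chrom_length} | {p for seg in hybrids.values() for p in seg})
    signatures = {}
    for left, right in zip(breakpoints, breakpoints[1:]):
        midpoint = (left + right) / 2
        # A bin's signature is the set of hybrids whose segment covers it.
        present = frozenset(
            name for name, (start, end) in hybrids.items() if start <= midpoint < end
        )
        signatures.setdefault(present, []).append((left, right))
    return signatures

def assign_bin(positive_hybrids, signatures):
    """Return the interval(s) consistent with a marker's hybridization pattern."""
    return signatures.get(frozenset(positive_hybrids), [])

# Hypothetical panel on a chromosome of 100 arbitrary units.
hybrids = {"H1": (0, 40), "H2": (25, 100), "H3": (60, 100)}
signatures = bin_signatures(hybrids, chrom_length=100)

print(assign_bin({"H1", "H2"}, signatures))
print(assign_bin({"H2", "H3"}, signatures))
print(assign_bin({"H3"}, signatures))
```

Three hybrids here resolve four bins. A pattern that matches no bin, such as a signal in H3 alone, returns an empty list and would flag either an experimental error or an unrecognized rearrangement in the panel.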
Fluorescence In Situ Hybridization (FISH)
The previous mapping of DNA clones by FISH onto metaphase
chromosomes has now been extended to the much less condensed
interphase and pronuclear DNAs. Mapping onto less-condensed
chromosomes increases spatial resolution and the capacity to
order closely spaced markers. As a component of evolving mapping
strategies, FISH is serving to locate and orient cosmid contigs
on intact chromosomes and measure distances between the cosmids
as well as to mapped cDNAs (J. Gray, University of California;
J. Korenberg, Cedars-Sinai Medical Center; B. Trask, LLNL).
Fragile X Locus Cloned
The fragile X locus has been cloned and its mode of action is
being characterized (C. T. Caskey and D. L. Nelson, Baylor
College of Medicine; and collaborators). Fragile X syndrome may
be the most common form of inherited mental retardation. About 1
in 1500 males and 1 in 2500 females are affected by the syndrome,
which is caused by a high mutation frequency at the fragile X
Myotonic Dystrophy Locus Cloned
The gene responsible for myotonic dystrophy, an autosomal
dominant disease, has been identified and cloned. The structural
defect is characterized by a tandemly repeated segment of DNA
within or close to the coding region on 19q13.3. The extent of
the amplified region appears to be associated with the severity
of the disease (C. T. Caskey, Baylor College of Medicine; P. de
Jong and A. Carrano, LLNL; and collaborators).
Multiple informatics capabilities will be crucial to the
successful application of data derived from the genome project.
Informatics expertise, software, and hardware are being developed
in the following areas: chromosome map assembly, databases, DNA
sequence analysis, and laboratory automation.
Algorithms for automatically assembling physical maps from clone
fingerprint data have been further improved (E. Branscomb, LLNL;
M. Cinkosky, V. Faber, J. Fickett, and D. Torney, LANL).
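As a rough illustration of what assembling a map from fingerprint data involves, the sketch below groups clones into contigs whenever two clones share enough restriction-fragment sizes. It is not the LLNL or LANL algorithm: the exact-match comparison, the min_shared threshold, and the clone names are assumptions made for the example. Practical methods would have to allow for measurement error in fragment sizes and for chance matches.

```python
from itertools import combinations


def assemble_contigs(fingerprints, min_shared=3):
    """Group clones into contigs by shared restriction-fragment sizes.

    fingerprints: dict mapping clone name -> set of fragment sizes (bp)
    min_shared:   assumed number of shared fragments needed to call an overlap
    """
    parent = {clone: clone for clone in fingerprints}

    def find(c):
        # Follow parent links to the representative, compressing the path.
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    # Merge the groups of every pair of clones that appear to overlap.
    for a, b in combinations(fingerprints, 2):
        if len(fingerprints[a] & fingerprints[b]) >= min_shared:
            parent[find(a)] = find(b)

    contigs = {}
    for clone in fingerprints:
        contigs.setdefault(find(clone), []).append(clone)
    return sorted(sorted(members) for members in contigs.values())


# Hypothetical fingerprints (fragment sizes in bp).
clones = {
    "c1": {1200, 2300, 3100, 4500},
    "c2": {2300, 3100, 4500, 5200},
    "c3": {4500, 5200, 6100, 7000},
    "c4": {800, 950, 1500, 9900},
}
contigs = assemble_contigs(clones, min_shared=2)
```

With a threshold of two shared fragments, c1, c2, and c3 chain into one contig and c4 remains a singleton. Raising the threshold to three splits c3 away, which shows how sensitive the result is to the overlap criterion.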
Software permitting fast parallel computations on multiple
computers was developed to speed computation-intensive mapping
(E. Branscomb, LLNL).
A computer communication and interrogation system is being
assembled to minimize redundancy during the production of STS
chromosomal markers from cDNAs. Participating laboratories will
rapidly query distant databases to determine the novelty of a
candidate mRNA/cDNA before further pursuing the STS-generation
Graphical interfaces for mapping databases were constructed to
display several different types of aligned chromosomal data and
provide expandable views [R. Douthart, Pacific Northwest
Laboratory (PNL); J. Fickett, LANL; S. Lewis, Lawrence Berkeley
Laboratory (LBL); R. Overbeek, Argonne National Laboratory
The electronic Laboratory Notebook database and similar databases
are being continuously expanded to include new data types as
mapping strategies evolve (J. Fickett, LANL).
The internationally available Genome Data Base (GDB), housed at
Johns Hopkins University and cofunded since September 1991 by DOE
and NIH, is the primary reference database for human chromosome
mapping data produced in the United States and abroad. The
organizational structure of GDB is shown on the opposite page (P.
In a collaboration between LLNL and GDB, computer system
interfaces have been devised for automatically transferring large
amounts of data from mapping centers to GDB for integration into
and updating of chromosome maps.
Enhancements of the GenBank® DNA sequence database located at
LANL continue. Primarily supported by NIH with contributions from
DOE, GenBank exchanges data daily with European and Japanese
databases. GenBank has expanded its electronic data-publishing
facilities and has reached agreements with a number of journals
to facilitate electronic publication of large volumes of DNA
sequence data (J. Cassatt, NIH).
gm, developed at New Mexico State University, is the first DNA
sequence analysis algorithm capable of recognizing and ordering
the set of protein-coding regions (exons) from among the
noncoding regions (introns) comprising a gene, rather than
predicting isolated protein-coding sequences. gm has been
distributed to laboratories worldwide (C. Fields, now at NIH, and
C. Soderlund, now at LANL).
Gene Recognition and Analysis Internet Link (GRAIL), a novel
neural network-based algorithm for identifying exons within DNA
sequences, is online at Oak Ridge National Laboratory (ORNL) to
serve the biological community by automatically analyzing
sequences. From a number of examples, this artificial
intelligence system learns several distinct sequence
characteristics through which exons can be recognized. GRAIL
automatically accepts input sequences sent to ORNL over Internet
and returns the output analysis to the sender (R. Mural and E.
Advances continue in the linking of laboratory instruments
directly to data-acquisition computers and analysis software at
the LANL, LLNL, and LBL human genome centers.
The DOE Human Genome Program has supported both evolutionary
(incremental, gel-based) improvements to classical sequencing
methods and several revolutionary (completely novel, gel-less)
technologies. Steady advances have occurred in the evolutionary
area with the implementation of automated sample preparation,
multiplex sequencing, and strategies that minimize the need for
Gel Sequencing Approaches
Multiplex sequencing systems have matured enough for transfer to
the commercial sector (G. Church, Harvard Medical School; R.
Gesteland, University of Utah).
The readout of multiplexed gels and blots using stable isotopes
as nucleic acid labels has the potential to increase sequencing
speeds by at least a factor of 10 because resonance ionization
mass spectroscopy is capable of differentiating many isotopes (H.
Arlinghaus, Atom Sciences, Inc.; K. B. Jacobson, ORNL).
Chemiluminescent label systems are now substituting for the
less-desirable radioactive labels in many applications (I.
Bronstein, Tropix, Inc.).
Systems have been developed to retain chromosome continuity
information by bypassing the customary subcloning step in the
sequencing of recombinant DNAs (D. Berg, Washington University;
C. Berg and L. Strausbaugh, University of Connecticut; J. Dunn
and F. Studier, Brookhaven National Laboratory; R. Gesteland and
R. Weiss, University of Utah).
Fractionation speeds on capillary and very thin slab gels are
10-fold faster than on traditional thick gels (N. Dovichi,
University of Alberta, Canada; B. Karger, Northeastern
University; L. Smith, University of Wisconsin).
The fluorescence/luminescence detection of fractionated nucleic
acids has been significantly improved to allow detection of the
smaller amounts of DNA loaded on capillary and thin slab gels (N.
Dovichi, University of Alberta; R. Mathies, University of
California; E. Yeung, Ames Laboratory).
Over 300 kb have been sequenced from human and mouse T-cell
receptors, providing fundamental new insights into the molecular
biology of the immune response (L. Hood and T. Hunkapiller,
California Institute of Technology).
Gel-less Sequencing Technologies
The technology for interrogating or sequencing clones by
hybridization with short oligomers has passed a second
proof-of-concept test. Three unknown DNA fragments were fully and
accurately sequenced (R. Crkvenjakov and R. Drmanac, ANL).
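The principle behind sequencing by hybridization can be sketched as follows: if the set of short oligomers that hybridize to a fragment is known, the fragment can be rebuilt by chaining oligomers that overlap by all but one base. The code below is an idealized illustration, not the ANL procedure. It assumes error-free hybridization data, every oligomer present exactly once, and unique overlaps; the function name and test sequence are hypothetical.

```python
def reconstruct_from_kmers(kmers):
    """Rebuild a sequence from its set of overlapping k-mers.

    Assumes every k-mer of the target occurs exactly once, without
    errors, and that every (k-1)-base overlap is unique, so the chain
    of overlaps is unambiguous.
    """
    kmers = set(kmers)
    k = len(next(iter(kmers)))
    by_prefix = {kmer[:-1]: kmer for kmer in kmers}
    suffixes = {kmer[1:] for kmer in kmers}
    if len(by_prefix) != len(kmers) or len(suffixes) != len(kmers):
        raise ValueError("repeated (k-1)-mer: reconstruction is ambiguous")
    # The first k-mer is the only one whose leading (k-1) bases are not
    # the trailing (k-1) bases of another k-mer.
    starts = [kmer for kmer in kmers if kmer[:-1] not in suffixes]
    if len(starts) != 1:
        raise ValueError("no unique starting k-mer")
    sequence = starts[0]
    for _ in range(len(kmers) - 1):
        nxt = by_prefix.get(sequence[-(k - 1):])
        if nxt is None:
            raise ValueError("k-mer chain is broken")
        sequence += nxt[-1]  # each step extends the sequence by one base
    return sequence


# Hypothetical target and its 4-mer spectrum.
target = "ATGCGTAC"
k = 4
spectrum = {target[i:i + k] for i in range(len(target) - k + 1)}
rebuilt = reconstruct_from_kmers(spectrum)
```

Repeated subsequences longer than the overlap length break the uniqueness assumption, which is one reason oligomer length matters in this approach.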
In research and development for single-molecule sequencing by
processive nucleotide release, the capacity to detect single
nucleotides by laser-induced fluorescence has been demonstrated
(R. Keller and J. Jett, LANL).
Progress is being made in developing methods to sequence DNA
using lasers coupled to a mass spectrometer. The great advantage
of these approaches is that the mass spectrum can be acquired in
milliseconds (C. Chen, ORNL; J. Jaklevic, W. Benner, and J. Katz,
LBL; L. Smith and B. Chait, University of Wisconsin; R. Smith,
PNL; P. Williams and N. Woodbury, Arizona State University).
Activities Addressing Ethical, Legal, and Social Issues Related
to Human Genome Project Data
In FY 1991, DOE activities on ethical, legal, and social issues
(ELSI) included two conferences, three education projects, and
three research projects. The first conference, Justice and the
Human Genome, held in November 1991 at the University of Illinois
College of Medicine, considered discrimination that could result
from the use of genetic information about ethnic and other
groups. The second conference, held in March 1992 at the Texas
Medical Center Institute of Religion, focused on Genetics,
Religion, and Ethics.
The three education projects on the science and the societal
implications of data produced in the Human Genome Project, listed
with their preparers, include (1) a module to be developed and
distributed to all U.S. high school biology teachers (Biological
Sciences Curriculum Study); (2) an educational television series,
"Medicine at the Crossroads," which will address the role of
genetics in understanding and treating disease (WNET, New York,
cofunded with NIH and the National Science Foundation); and (3) a
program of hands-on workshops for public officials and other
nonscientists (Cold Spring Harbor Laboratory).
The three ongoing research projects, listed with the institutions
developing them, are (1) a study of ethical issues arising from
the rapid proliferation of genetic tests that can predict future
disease in otherwise healthy individuals [National Academy of
Sciences (NAS) Institute of Medicine, cofunded with NIH]; (2) a
legal study of confidentiality protection for genetic data
(Shriver Center); and (3) a study to consider problems in funding
young investigators in biological and biomedical sciences (NAS).
In its first 2 years, the DOE Human Genome Program funded a
variety of ELSI activities, noted above. To avoid being spread
too thinly, the ELSI component of the DOE Program now focuses on
confidentiality and privacy concerns raised by increased genetic
data about individuals. This sensitive, personal information,
which may predict disorders before symptoms occur or treatments
are available, can affect a person's self-image, employability,
status in the eyes of others, and ability to obtain health
insurance. Since genetic knowledge can also lead to better
understanding of disease causation and to more-accurate
assessments of environmental affronts, a balance must be achieved
between the health of the public and the privacy interests of the
The DOE Human Genome Program is funding six new projects covering
ELSI activities in research and education. One of the three
projects investigating genetic discrimination will compare two
states (Florida and Georgia), contrasting their genetic testing,
screening, and counseling programs and the impact on different
ethnic and socioeconomic communities. Another will examine the
impact of two genetic conditions (cystic fibrosis and sickle cell
disease) on African-Americans and Caucasians. A third will
identify particular social institutions that may engage in
discrimination and will consider whether the discrimination, if
present, is the result of ignorance or systematic policy. A
fourth project will explore in detail (a) the effect of genetic
knowledge on the right of privacy and (b) the uses of genetic
information in public health planning. A fifth project will
develop a program of educational workshops for secondary and high
school science teachers, focused on both the science and the
ethical, legal, and social issues arising from data generated by
human genome research. A sixth project will involve a second
educational television series, "The Secret of Life" (WGBH,
Boston), which will address the current revolution in molecular
biology and genetics.
Other activities include conferences on Genes and Human Behavior:
A New Era? (October 1991); Computers, Freedom, and Privacy (March
1992); and Science, Technology, and Ethical Responsibility
(scheduled for June 1992).
Genome research raises very challenging issues, and the
solutions are not simple; defensible rights often exist on both
sides of any issue. Further research is needed, as well as
activities to promote public awareness and assist in policy
development. Also, with the increasing use of computers to
assemble, store, and organize data (including genetic data) into
large databases, the issues of security and access control become
more acute. To begin reorienting and better defining the scope of
ELSI activities in the DOE program, the DOE-NIH Joint ELSI
Working Group has established a collaborative effort on privacy
to identify an ELSI research agenda and develop a more detailed
approach to some of these concerns.
Technology Transfer and Industrial Collaboration
Technology transfer, considered one of the three most important
facets of the DOE mission (along with meeting the nation's
defense and energy needs), is enhancing U.S. investment in
research and technological competitiveness. By creating new
products, markets, and jobs, the rapid deployment of technology
from the research laboratory to the marketplace can play an
important role in vitalizing the U.S. economy. A vast potential
exists for commercial development of genome resources and
technology; applications to clinical medicine have already begun.
All participants in the Human Genome Program are encouraged to
engage in active collaborations with the private sector and
transfer their resources and technologies for commercial
Each national laboratory has a technology transfer office. The
LLNL, LBL, and LANL human genome centers provide a variety of
opportunities for collaborations on joint projects or for
obtaining direct access to technology. They are also exploring
additional ways to increase cooperation with the private sector;
a number of interactive projects are now under way, and
additional interactions are in the preliminary stages. In some
instances, private industries are marketing technologies
developed at DOE-sponsored research laboratories and are
providing research funds or other resources to the centers; other
collaborative programs involve joint development of technologies
and their applications to achieve project goals.
One mechanism being used by the DOE national laboratories is the
Cooperative Research and Development Agreement (CRADA). The first
CRADA in the genome project, established by DOE in the spring of
1991, was between Life Technologies, Inc. (LTI) and the LANL
Center for Human Genome Studies for technologies developed in the
single-molecule sequencing project. In this project an
LTI-modified DNA polymerase will be used to label a single DNA
strand with four different fluorescent, base-specific tags. After
an exonuclease cuts the labeled nucleic acid base pairs from the
DNA, the labeled bases will be induced to fluoresce as they pass
sequentially through a focused laser beam. The bases can be
identified and counted by a sensitive photodetector (see figure
on p. 25 for more information). If successful, the technology
will allow sequencing of 50,000-bp DNA fragments at 1000 bp/s.
LTI will have the first opportunity to license products resulting
from the joint effort and would pay royalties to LANL under such
Potential commercial advancements in the Human Genome Program
have also been recognized outside the genome community. Research
and Development magazine selected an achievement by Edward Yeung
and other Ames Laboratory scientists as one of the 100 most
significant developments of 1991. This R&D 100 Award was given
for the development of a user-friendly instrument that uses
laser-excited fluorescence to measure fluorescent molecule
concentration with extremely high sensitivity, an
improvement that may lead to routine high-speed DNA sequencing by
capillary gel electrophoresis. A U.S. patent for portions of this
technology has been issued, and several commercial manufacturers
are considering the possibilities of marketing the instrument.
A technology pioneered by LLNL to identify chromosomal
abnormalities (e.g., aneuploidy, translocations, and deletions)
has been licensed to Imagenetics, Inc., a medical diagnostics
firm that will manufacture the technology and provide funding for
future research and development. This technology involves the use
of specially developed fluorescent dyes called Whole Chromosome
Paints™ to detect diseases such as cancers and leukemia. Whole
Chromosome Paints are being marketed by LTI.
Some other technology transfers from DOE-sponsored genome
research, both at the national laboratories and extramurally, are
highlighted below. In progress or awaiting finalization are many
more developments and agreements, some of which cannot be
disclosed at this time because of their proprietary nature.
Resources. Collaborative agreements have aided in the further
development of several new technologies used in genome research,
as well as in their commercial applications. New methods are
being evaluated for use in isolating mRNA, chromosomes, and
restriction fragments; in amplifying hybridization signals; and
in extending DNA molecules. In addition, bacterial host strains
have been developed that give greater stability to cosmid
constructs containing human DNAs. Improvements are being made in
DNA detection methods by the development of new probes, stains,
and fluorescent dyes.
As a result of the recent cloning of the fragile X gene, several
companies are negotiating for licenses to develop assays for
diagnosing fragile X syndrome, probably the most frequently
inherited form of mental retardation.
Hardware. Automation and enhancement of data collection and
analysis have been the goals of many collaborations with the
commercial sector. Equipment is being designed to automate (1)
the production of high-density arrays on agarose or filters and
(2) clone fingerprinting by gel electrophoresis (as well as the
data collection and analysis software). Advanced applications for
robotic systems are also being developed.
The resolution of DNA fragments is being enhanced by improvements
in pulsed-field gel electrophoresis. Resonance ionization
spectroscopy is being modified to enable rapid detection of
stable isotope labels on DNA following gel electrophoresis. A
commercial gel scanner is being developed for reading DNA gels.
Software. To aid physical map construction, programs are being
designed for efficient clone analysis. Several other
image-analysis programs are being developed, including
data-capture software for images from video screens in
combination with a DNA molecule imaging system.
Sequencing. Multiplex sequencing technologies are being used to
sequence pathogenic microbes.
Human Genome Center Research Narratives
Lawrence Berkeley Laboratory
Since its inception in 1987, the Lawrence Berkeley Laboratory
(LBL) Human Genome Center has focused on developing the necessary
research and analytical technology to speed genome mapping and
decrease the cost of sequencing. Over the last year, LBL has
strengthened its ties with the University of California,
Berkeley, particularly in the biological sciences. This
collaboration fosters interdisciplinary activities in biology,
instrumentation, and informatics.
The biology component at LBL is concentrating on developing and
improving mapping and sequencing strategies for human chromosome
21. To achieve these goals, investigators in each biology project
draw on the expertise of the center's instrumentation and
Two major biology projects are under way, and a third is in
development. Physical mapping at LBL is focused on a 10-Mb region
of human chromosome 21, and over 90 unique chromosome 21-specific
yeast artificial chromosomes (YACs) have been located by
fluorescence in situ hybridization (FISH). A new method has been
developed that permits rapid isolation of chromosome-specific
YACs, using probes isolated from flow-sorted chromosome libraries
from Lawrence Livermore National Laboratory. In addition, cDNAs
specific to a given YAC are being isolated by an automatable
procedure based on magnetic beads.
The second major biology effort involves testing new approaches
to physical mapping and genomic sequencing. These projects
exploit current methods, such as FISH and appropriate pooling
strategies, for efficient isolation of overlapping clones. In
addition, new work has begun on subcloning and ordering libraries
of clones for mapping and on the use of gamma delta transposons
as the primer site for sequencing studies. Increased efficiency
in constructing physical maps results from a clone-limited
strategy for generating maps based on sequence tagged sites
(STSs). This nonrandom selection strategy reduces the number of
STS assays required and produces contigs that cover a larger
fraction of the genome.
The third biology project is aimed at developing automated
methods for generating genetic maps. A simple filter assay will
be used to detect heterozygosity at mapped loci in yeast, mice,
and human DNA samples.
The instrumentation program within the LBL Human Genome Center
has two major areas of effort: (1) biology and instrumentation
development and support and (2) new instrumentation development
based on emerging technologies. Supporting activities include the
design and fabrication of gel boxes, automation of protocols on
existing robotic frameworks, and the installation and networking
of a variety of image-acquisition systems. In addition, advanced
robotics (high-speed colony picking, robot-based polymerase
chain reaction, and DNA synthesis) and laboratory systems
integration are under development.
Efforts to produce new, adaptable technologies for the genome
program include optimizing large-molecule detection systems;
designing versatile optical fluorescence systems for multiplex
labeling; and developing microfabricated arrays for application
to large-scale clone libraries, sequencing by hybridization, and
other procedures. The use of computer-controlled robotic systems
provides a mechanism for automatically capturing the vast amount
of data generated by laboratory operations. This requires a close
coordination between hardware and software development in
laboratory system design that goes far beyond automation of a few
A major part of the computing and instrumentation effort is
driven by biology projects. The center's computing group focuses
on specific applications in four major areas: raw data
acquisition and analysis, information tracking and management,
data interpretation and comparison analysis, and development of
software tools. Visual data for mapping (including in situ
pictures, autoradiograms, ethidium gels, and chemiluminescent
staining) are handled by BioPix, a set of programs that assemble
and integrate data from image capture to analysis. A similar
system is being developed for sequence data. The Chromosome
Information System (CIS) allows biologists to search, edit, and
compare various maps, markers, and related reference information
and to interact with other programs to exchange data. The
laboratory data analysis system uses existing software packages
and provides system management and support throughout the center.
New, in-house analysis packages are being devised for sequence
alignment and assembly. Software development tools permit rapid
design and modification of database management systems, thus
facilitating increased productivity, vendor independence, and
* Over 90 independent YACs averaging 100 kb were regionally
assigned to human chromosome 21 by FISH. These YACs include
genetic markers to help integrate maps.
* Two hundred unique probes were isolated for chromosome 21
and are being used to identify YACs from genomic libraries.
* A rapid cDNA clone-screening method uses immobilized YAC
clones to screen cDNA libraries, which are then localized on
specific chromosomes. An alternative screening method uses
individual YACs or cosmids attached to magnetic beads to
isolate specific cDNAs, a method that can be readily
automated to speed identification of coding sequences for
* Marker-selected libraries, highly enriched for clones
containing (CA)n repeats, were constructed from primary
genomic libraries. These enriched libraries increase the
efficiency of screening almost 50-fold.
* A probe-mapping procedure determines the distance between
the probe and the chromosome or YAC end. This method, which
uses X rays to break large DNA pieces randomly, can be used
to map cDNAs and to estimate the length of entire genes.
* A double-ended, clone-limited strategy for physical mapping
of chromosomes was devised. This strategy maps chromosomes
on the order of 100 Mb and should result in larger contigs
with a minimum of assays.
* CIS, developed by the genome center computing group, was
used to produce consensus maps at workshops on human
chromosomes 3 and 21 and is being expanded for use with a
number of plant species in the Plant Genome Program of the
U.S. Department of Agriculture.
* High-level database design tools have been developed to
permit molecular biologists to define data objects in a way
that captures biological concepts. The software
automatically generates low-level commands for a commercial
database management system, facilitating the evolutionary
development of modular system components. These tools are
also being used by researchers to design the Superconducting
Super Collider database and the Integrated Genome Database.
* A variety of mechanical, electrical, and chemical means have
been used to manipulate DNA molecules; these methods include
stretching molecules physically by externally applied
electrical fields and guiding the molecules through grooves
in a glass surface; digesting and separating single
molecules; and picking up, transporting, and releasing DNA
with scanning tunneling microscope (STM) tips.
* Investigation of the feasibility of using STM for
visualizing the individual bases of single-stranded DNA has
shown that while purines and pyrimidines can be
distinguished from each other, two bases in the same class
cannot be differentiated by this method.
* A fast, filter-based assay was developed to identify single
base-pair polymorphisms, eliminating the need for gel
* Higher throughput was achieved through the construction of a
dedicated high-speed colony-picking workstation. The pick
rate is 10 to 20 times faster than the initial picking
system and both faster and more accurate than a highly
qualified human. The new picker arrayed an entire library of
over 10,000 clones in 1 day.
* Robots have been modified for use with a number of chemistry
protocols, including cosmid and YAC library replication,
various pooling schemes, and high-density filter array
production. Using the robot to replicate libraries has made
copies available to researchers in the private sector and in
other national laboratories.
* Construction of a 10-Mb contig of human chromosome 21 based
on overlapping YACs. The sequence will be determined by the
most efficient strategy available.
* Sequencing of a P1 clone. Subclone assembly will use a
nonrandom strategy, and primer sequences will originate in
the transposon gamma delta.
* Construction of chromosome genetic maps of human chromosomes
16 and 19 in collaboration with other DOE genome centers. A
simple gel-based heterozygosity assay is being developed to
support this research.
* Development of a computational biology program within the
computing group to design and implement new algorithms for
sequence assembly. Preliminary data will come from
collaborations with other genome centers.
* Design and implementation of a software tool suite for
managing information and for optimizing the unique strategy
of particular research groups. As large-scale sequencing
projects develop, new acquisition and analysis software will
be integrated into CIS.
* Implementation of QUEST, a database tool that will provide a
single entry point to the conceptual data model. QUEST will
then implement automatically any changes in the user
interface, the database query procedures, and the database
* Optimization of improved detectors and the associated mass
spectrometry system for large biological molecules.
* Automation of handling and analysis of dot-blot
hybridization experiments and the implementation of a
high-speed colony-picking apparatus.
For more information on the LBL Human Genome Center, contact
Jasper Rine, Director, or Sylvia Spengler, Deputy Director, at
Lawrence Livermore National Laboratory
The Human Genome Center at Lawrence Livermore National Laboratory
(LLNL) is a multidisciplinary team effort that brings together
chemists, biologists, molecular biologists, physicists,
mathematicians, computer scientists, and engineers in an
interactive research environment. Many of these individuals have
previously collaborated on research projects in molecular
biology, cytogenetics, mutagenesis, and instrumentation, as well
as in the National Laboratory Gene Library Project (NLGLP). These
projects have contributed substantially to the identification and
characterization of human DNA repair genes, specifically the
three on chromosome 19 that are a focus of interest at LLNL.
The short- and long-term goals of the LLNL effort are to (1)
develop biological and physical resources useful for genome
research, (2) model and evaluate DNA mapping and sequencing
strategies, (3) couple these resources and strategies in an
optimal way to construct ordered clone maps and DNA sequences of
human chromosomes, and (4) use the map and sequence information
to study genome organization and variation. To achieve these
goals, the Human Genome Center is organized into three broad
research and support areas, each consisting of multiple projects
led by a principal investigator. Extensive interaction occurs
within and among all projects that have as their common goal the
construction of ordered clone maps of the human genome. The
program structure of the center includes a core facility and
projects that focus on physical mapping and enabling
Research and Support Areas
Coordination and collaboration take place with other research
groups throughout the world that are involved in the genome
initiative or other mutual scientific interests. The role of LLNL
in the Human Genome Project is seen as encompassing several
areas, including technology development, map construction, map
interpretation, and integration with ongoing and new programs in
structural biology and mutagenesis. The following three
components are highly interactive; individual staff members often
have responsibilities in more than one component.
Core facilities. The administrative group is concerned with
budget oversight, external and internal meeting coordination,
preparation of center reports, training coordination, property
and space management, safety oversight, and secretarial support.
The scientific core provides general support to the physical
mapping effort, including cell culture and DNA extraction;
library, probe, and clone management; oligonucleotide synthesis;
fluorescence-based restriction mapping; and DNA sequencing. The
core also facilitates material distribution to collaborators in
the external community.
Mapping activities. Five projects represent the coordinated
effort to obtain an overlapping set of clones for human
chromosome 19 and to further characterize genomic organization:
* Assembly, closure, and characterization of a chromosome 19
contig map. The goal of this project is to construct an
overlapping set of cosmid clones using a variety of
techniques. An automated fluorescence-based
restriction-fragment fingerprinting strategy is used to
establish a foundation map of cosmid contigs. The contig
closure effort will focus on using yeast artificial
chromosomes (YACs) and cosmids with two hybridization-based
techniques; one is based on fragments generated from Alu
sequence primers or sequence tagged sites (STSs) by the
polymerase chain reaction (PCR) and the second on RNA
transcripts generated from the ends of cloned inserts.
* Interdigitation of the physical and genetic maps of human
chromosome 19. The goals of this effort are to locate known
genetic markers on the expanding contig map, to coordinate
the isolation of chromosome 19-specific STSs, and to
localize them on the cosmid map.
* DNA sequence mapping by fluorescence in situ hybridization
(FISH). This project exploits the power of FISH on metaphase
chromosomes, interphase cells, and pronuclear DNA. FISH will
be used to determine the location of genes of interest and
the relative order and orientation of the cosmid contigs.
* cDNA mapping. The goal of this project is to isolate,
sequence, and map cDNAs (expressed in a variety of human
tissues) that will become the STSs on which future studies of
genetic organization and gene function will be based.
* New mapping strategies. New methods useful for library
construction, contig closure, and overlap detection will be
developed and validated. Focus is on improving Alu-PCR-based
technology and pooling schemes to achieve closure of the
chromosome 19 map with cosmids and YACs.
Enabling technologies. The following groups provide
computational, resource, and instrumentation support for research
* Computational support for the Human Genome Center. This
group is responsible for mathematical modeling of mapping
and sequencing strategies and the development and
application of data analysis algorithms and software. They
are also responsible for the construction and maintenance of
interactive relational databases that enable internal and
external data access, including development of graphical
* NLGLP. This project, a joint effort with Los Alamos National
Laboratory, draws upon LLNL experience in flow
instrumentation and chromosome sorting to construct human
chromosome-specific libraries in lambda and cosmid vectors
for use in physical mapping and other studies.
* Instrumentation for cytogenetics and gene mapping. This
group is responsible for developing instrumentation to
facilitate flow systems analysis and chromosome sorting and
to support FISH.
The LLNL Human Genome Center has made excellent progress in the
construction of an ordered set of cosmids for chromosome 19, the
development and application of new biochemical and mathematical
approaches for constructing ordered clone maps, the automation of
fingerprinting chemistries, and high-resolution imaging of DNA.
Major accomplishments are highlighted below.
* Considerable progress has been made toward the closure of
the chromosome 19 physical map. More than 10,000 cosmids
have been analyzed by an automated fluorescence-based
fingerprinting approach and assembled into over 870 contigs
that span about 80% of the chromosome. FISH has been used to
locate over 400 cosmids and 117 contigs on the cytological
map, and more than 70 known genetic markers have been
located on cosmid contigs. Closure of the gaps between
contigs is under way using YACs and cosmids.
* Cosmid contigs analyzed in the carcinoembryonic antigen
(CEA) gene family region of chromosome 19 were found to be
tightly linked over relatively short stretches of DNA. This
gene family of about 22 members appears to span a contiguous
region of about 1 Mb. With probes made from the ends of
these contigs, hybridization techniques were applied to join
contigs established by fingerprinting into larger contigs.
In addition, almost 2 Mb surrounding the myotonic dystrophy
locus were linked with cosmids and YACs.
* More than 20 clones containing DNA sequences corresponding
to a number of important genes and regions that map to chromosome
19 were isolated from two separate YAC libraries. Among these
clones were the region encoding the LDL receptor and ApoE gene,
two important components of the regulation of cholesterol and
triglyceride metabolism in humans. Similarly, a region was
isolated that encodes a family of serine proteases called
Kallikreins, whose role is the specific proteolytic activation of
peptide hormones and growth factors. Clones of these regions are
being used for the structural analysis and mapping of these
* A structural defect found in the cloned gene linked to the
autosomal dominant disease myotonic dystrophy has been
identified through an international collaboration. This
chromosome 19 defect, which is characterized by a tandemly
repeated segment of DNA within or close to the coding region
on q13.3, is similar to that seen in the fragile X syndrome.
The extent of the amplified region appears to be associated
with the severity of the disease.
* The gene for DNA ligase 1 was mapped to the long arm of
chromosome 19. A defect in this gene may be associated with
increased cancer risk. This is the fourth gene involved in
DNA metabolism that has been mapped to this region of
* Significant progress was accomplished in defining the
organization of the cytochrome P450 genes mapping to
chromosome 19. Multiple members of each of the three
subfamilies were identified. The cosmids containing these
genes will be useful resources for studies of the function
and physiological importance of the genes.
* Three levels of resolution of FISH have been developed and
applied to localize and orient cosmids. Localizing cosmids
to metaphase chromosomes provides a resolution of about 1 to
3 Mb. Localization to somatic interphase cells gives a
resolution of 50 kb to 1 Mb, and hybridization to sperm
pronuclei gives a resolution of 20 kb to 1 Mb. With FISH, a linear
relationship was demonstrated between physical distance and
genomic distance of 20 kb up to at least 800 kb in pronuclei
derived from human spermatozoa. With a single probe, the
presence of multiple copies of the closely related genes of
the CEA family has been detected in human sperm pronuclei.
Single and multicolor hybridizations are routinely
* A reproducible method of mapping YACs by FISH has been
developed. This procedure involves isolating YACs with
pulsed-field gels, digesting with the restriction enzyme Mbo
I, ligating to oligonucleotide linker adapters, and
amplifying with PCR. The products are then mapped onto human
metaphase chromosomes by standard FISH methods.
* The technique of Alu-PCR has been further exploited. To
isolate region-specific DNA probes from human-rodent hybrid
cell lines, previously developed PCR procedures were
expanded. Human sequences are preferentially amplified using
PCR primers specific for repeats of the human Alu repeat
family. Several new primers have been developed that amplify
human DNA sequences very efficiently, further facilitating
probe isolation from human genome regions present in the
available hybrids. Many different human sequences amplify
from the hybrids; individual probe sequences are obtained by
subsequent cloning in plasmid vectors in Escherichia coli.
To expedite this, ligation-independent cloning has been
developed to increase efficiency of cloning and eliminate
the common background of clones that do not contain
recombinant DNA molecules. In addition, an efficient
procedure has been developed to clone the PCR products
common to two cell lines. This method, coincidence
cloning, permits a further enrichment for sequences derived
from defined regions of the genome.
* Clone-pooling schemes have been developed to facilitate
screening of both cosmid and YAC libraries. Each clone is
present in a number of different pools, reducing the number
of DNA samples that must be deposited on a high-density
filter for hybridization-based screening and the number of
tubes needed for PCR-based screening. Since each clone is
defined by a unique combination of pools, the screening of
pools by probe hybridization permits identification of the
recombinants shared by a number of pools. This approach was
used very successfully to screen a 10,000-clone cosmid
library. The idea also was used to consolidate a
60,000-clone YAC library into about 1800 sample pools.
Results demonstrated that hybridization-positive YAC pools
can, indeed, be distinguished from hybridization-negative
YAC pools, thus allowing the efficient identification of YAC
* Human YACs were isolated from a library constructed using a
monochromosomal 19 hybrid cell line. The YACs vary in size
between 120 and 350 kb. One of the analyzed YACs carries
sequences from the telomere region of chromosome 19, and
another maps to the centromere region of chromosome 19 by
* A second-generation suite of robust, reliable computer
programs was completed for signal preparation and analysis
of chromosome 19 restriction fragment fingerprints. These
programs implement methods for random noise suppression,
background subtraction, and color decorrelation. A new
program (TIMEWARP) was also completed to map peak locations
in a gel to a common coordinate system by dynamic
programming and shape-preserving spline interpolation.
* The Sybase database has been enhanced to contain all the
laboratory notebook and experimental data important to
physical map construction. This includes clone repository
information, restriction fragment fingerprinting, and data
on probe hybridization and FISH. The database is coupled to
the graphical browser so the end user can retrieve many of
the experimental results in graphical form.
* The graphical database browser was enhanced to access Human
Genome Project data remotely over the Internet. The browser's
ability to link to multiple databases at external
collaborator sites has been demonstrated.
* In a collaborative effort, automatic transnetwork methods
for transferring physical mapping results to the central
Genome Data Base (GDB) at Johns Hopkins were built, tested,
and implemented by GDB and LLNL. This work was in support of
DOE concerns that all laboratories should effect mechanisms
to ensure that data are made available to the appropriate
public databases after a suitable time period. Prototype
methods were implemented, tested, and publicly demonstrated
for logically linking our database with the major sequence
and mapping databases (GenBank and GDB). Direct
transnetwork queries that logically integrate these data
sets are now feasible.
* As part of NLGLP, high-speed flow sorting was used to purify
individual human chromosomes for cloning. Large-insert phage
and cosmid libraries have been made for chromosomes 9, 12,
18, 19, 21, 22, and Y. Several libraries have been
distributed to users and evaluation sites. In addition, the
high-speed sorter has been rebuilt with new fluidics to
optimize sterility and with new electronics to increase the
purity of the sorted material.
* Construction of a new high-speed chromosome sorter was
completed. This instrument has new digital acquisition
electronics, a new fluidic system, and a more stable sample
stream. The instrument analyzes chromosomes at the rate of
up to 20,000/s and can reliably produce 250 to 1000 ng of
sorted chromosome DNA equivalents per day.
* Using scanning tunneling microscopy (STM), individual images
of the bases adenine and thymine were obtained at atomic
resolution, indicating that a scanning-probe microscopy
technique can discriminate between purines and pyrimidines.
* Several technologies have been transferred to industry. They
include software for analysis and graphical display of
physical map data, sequence information for the
commercialization of Alu-PCR primers, and vectors for the
construction of cosmid libraries. In addition, collaborative
research programs with industry have continued in the areas
of fluorescence-based restriction fragment analysis,
development of pulsed-field gel systems, development and
testing of automated and high-throughput plasmid/cosmid DNA
extraction, and development and testing of a robot for
high-density colony replication on filters.
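The clone-pooling accomplishment described above (each clone placed in
several pools and identified by its unique combination of pools) can be
illustrated with a short sketch. The report does not specify the pooling
design, so the three-way plate/row/column layout and the names
build_pools, screen, and decode below are assumptions made for
illustration only.

```python
from itertools import product


def build_pools(n_plates, n_rows, n_cols):
    """Place every clone (plate, row, col) in one plate pool, one row
    pool, and one column pool. This 3-way layout is an assumed design."""
    pools = {}
    for p, r, c in product(range(n_plates), range(n_rows), range(n_cols)):
        clone = (p, r, c)
        for name in (("plate", p), ("row", r), ("col", c)):
            pools.setdefault(name, set()).add(clone)
    return pools


def screen(pools, positive_clones):
    """Simulate a probe hybridization: a pool is positive if it holds
    at least one probe-positive clone."""
    return {name for name, members in pools.items()
            if members & positive_clones}


def decode(pools, positive_pools):
    """Return the clones that sit in a positive pool on every axis."""
    candidates = None
    for axis in ("plate", "row", "col"):
        on_axis = set()
        for name in positive_pools:
            if name[0] == axis:
                on_axis |= pools[name]
        candidates = on_axis if candidates is None else candidates & on_axis
    return candidates


# 10 plates of 8 x 12 wells: 960 clones screened with only 30 pools.
pools = build_pools(10, 8, 12)
hit = {(3, 5, 7)}
print(len(pools), decode(pools, screen(pools, hit)))
```

In this sketch a single positive clone is decoded unambiguously. When
several clones are positive at once, the decode step can return extra
candidates, which would then need individual confirmation.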
The LLNL genome center's first priority is to complete, to the
extent possible, an ordered clone map of chromosome 19; this
physical map will likely be a composite linear array of cosmid,
lambda, and YAC clones. It will be correlated with the genetic
map to assist the scientific community in localizing and
isolating all genes from chromosome 19. State-of-the-art
technology will be used to sequence selected high-interest
regions of the chromosome. Once the technology has been validated
for map construction of a large portion of chromosome 19, efforts
will be directed to chromosome 2.
When Human Genome Project emphasis shifts from mapping to
sequencing, exploration will turn to rapid automated DNA
sequencing methods that can use large fragments such as cosmids
or YACs as templates. STM and X-ray imaging technologies under
development at LLNL are expected to contribute to advancements in
Automation is an essential element of physical mapping. New
processes and instruments will be explored to reduce the need for
human intervention in highly repetitive tasks. A number of
instruments for clone manipulation and biochemical processes will
be considered for automation.
An effort to map and sequence the cDNAs expressed in a variety of
human tissues has recently been initiated. These cDNAs will be
used to generate STSs and will serve as the foundation for future
studies of gene organization and gene function.
Assisting the scientific community in completing ordered clone
maps is critical and will remain a high priority. LLNL intends to
serve as a resource laboratory for clones and for map information
on chromosomes of interest. Ultimately, map and sequence
information will be used to study the global architecture of the
chromosome and also to evaluate human somatic and genetic
variation, both spontaneous and induced.
For more information on the LLNL Human Genome Center, contact
Anthony Carrano, Director, at 510/422-5698 or Leilani Corell,
Administrator, at 510/423-3841.
Los Alamos National Laboratory
The Center for Human Genome Studies at Los Alamos National
Laboratory (LANL) provides direction, coordination, and technical
oversight for the LANL portion of the DOE Human Genome Program.
The center draws scientific talent from six technical divisions
at LANL. Molecular biologists, chemists, physicists,
mathematicians, computer scientists, and engineers are
contributing to progress in physical mapping, technology
development, and informatics. Although a specific goal is the
assembly of a complete physical map for human chromosome 16, much
of the work is broadly supportive of the worldwide Human Genome
Project. Collaborative research and development programs have
also been initiated with private-sector and other institutions
involved in human genome research. The major technical
subdivisions of the center are physical mapping, technology
development, and informatics. Activities are also under way at
the center to explore ethical, legal, and social issues arising
from genome research data and to transfer technology developed
within the center's projects.
Physical mapping includes the development of conceptual advances
in mapping strategy and the construction of a physical map of
chromosome 16. The physical map will be composed of phage,
cosmid, and YAC contigs ordered by repetitive sequence
fingerprinting. These ordered contigs will be integrated with the
genetic linkage map, the cytogenetic map, and known gene
sequences on chromosome 16. The final map, along with its
eventual translation into a sequence tagged site (STS) map, will
provide the means for rapid access to any region of the
chromosome for further analysis. In addition, the ordered clone
sets will be available for eventual sequencing.
Technology development efforts include the application of
robotics to the handling and storage of DNA fragments, the
development and application of methods for the construction of
DNA libraries from flow-sorted chromosomes, and the development
of new methods for rapid, inexpensive, large-scale sequencing.
All these projects are or will be supportive of the physical
mapping of chromosome 16, and they also contribute to the larger
genome program. For example, the construction and distribution of
various kinds of libraries from sorted chromosomes is playing a
significant role at many of the genome research centers.
Informatics efforts involving the collection and analysis of
genome-related data will play an increasingly important role in
the genome project. LANL has a long history of expertise in this
research area and will continue to lead in providing these
Ethical, Legal, and Social Issues (ELSI) Activities
The center also sponsors active participation in ELSI studies
related to data produced by human genome research and is
compiling a comprehensive literature bibliography in
collaboration with Georgetown University. LANL scientists
participated in a series of discussions on ELSI issues sponsored
by the University of California Humanities Research Institute.
LANL will continue to put a high priority on collaborations with
private industry to use the skills and resources of the private
sector and to ensure effective technology transfer to the U.S.
commercial sector. The first Cooperative Research and Development
Agreement (CRADA) involving human genome research activity was
signed in 1991 by LANL and Life Technologies, Inc. (LTI).
Recent Progress and Future Directions
Construction of a physical map of chromosome 16. The
chromosome-mapping strategy at LANL involves the rapid generation
of cosmid contigs representing around 60% of the target
chromosome, followed by directed gap closure with yeast
artificial chromosomes (YACs). The first phase of this goal, the
rapid generation of nucleation contigs on chromosome 16, has been
completed [Stallings et al., Proc. Natl. Acad. Sci. USA 87:
6218-22 (1990)]. An approach for identifying overlapping cosmid
clones by exploiting the high density of repetitive sequences in
human DNA was used to generate 553 contigs following the
fingerprinting of over 4500 individual cosmid clones. These
contigs represent more than 80% of the euchromatic arms of
chromosome 16 and were constructed with about one-fourth as many
cosmid fingerprints as random strategies requiring 50% minimum
Nucleating at specific regions allows (a) the rapid generation of
large (>100 kb) contigs in the early stages of contig mapping
and (b) the production of a contig map with useful landmarks
[i.e., (GT)n repeats] for rapid integration of the genetic and
physical maps. All 4500 fingerprinted cosmids in contigs and
singlets have been rearrayed on high-density filters. Such
filters already provide investigators with access to more than
90% of chromosome 16, with a 60% probability that any region is
already present in a contig. These high-density
chromosome-specific cosmid filter arrays have also proved useful
for YAC fingerprinting with repetitive sequence polymerase chain
reaction (PCR) techniques. In collaboration with the laboratories
of David Ward (Yale University) and David Callen (Adelaide
Children's Hospital, Australia), 130 of these arrayed cosmids
have been regionally localized via in situ hybridization or
somatic cell hybrid panels. The average gap (containing only
singlets), approximately 65 kb in length, can be easily closed
with YACs. A single walk from each end of current contigs should,
statistically, reduce the number of contigs to approximately 50,
one of the 5-year goals of the Human Genome Project (i.e., 1- to
2-Mb contigs; >95% coverage). To facilitate closure, LANL
investigators are constructing from monochromosomal hybrids and
flow-sorted material both a total genomic YAC library (from cell
line GM130, using the vectors pJS97 and pJS98; currently onefold
representation) and chromosome 16 YAC clones. One hundred STS
markers are being generated to key contigs. Extensive analyses of
the DNA sequences obtained from contig ends are in progress using
multiple approaches to identify potential coding regions. These
approaches include nucleotide and translated amino acid sequence
homology searches against GenBank, using BLAST and FASTA, and the
new adaptive network program, GRAIL, developed and made available
by the Oak Ridge National Laboratory. Current progress with YAC
closure indicates that the complete physical map of chromosome 16
will be achieved in the next few years.
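The grouping of fingerprinted cosmids into contigs and singlets can be
pictured as finding connected sets of clones linked by detected
overlaps. The sketch below is not the LANL software; it is a minimal
union-find illustration that assumes the pairwise overlaps have already
been called from the fingerprints, and the function name
assemble_contigs and the clone labels are hypothetical.

```python
def assemble_contigs(clones, overlaps):
    """Group clones into contigs given pairwise overlaps.

    clones   -- iterable of clone names
    overlaps -- iterable of (clone_a, clone_b) pairs judged to overlap
    Returns (contigs, singlets); a singlet is a clone with no overlap.
    """
    clones = list(clones)
    parent = {c: c for c in clones}

    def find(c):
        # Walk to the set representative, shortening the path on the way.
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    for a, b in overlaps:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = {}
    for c in clones:
        groups.setdefault(find(c), []).append(c)

    contigs = sorted(sorted(g) for g in groups.values() if len(g) > 1)
    singlets = sorted(g[0] for g in groups.values() if len(g) == 1)
    return contigs, singlets


contigs, singlets = assemble_contigs(
    ["c1", "c2", "c3", "c4", "c5", "c6"],
    [("c1", "c2"), ("c2", "c3"), ("c4", "c5")],
)
print(contigs, singlets)
```

Because a single spurious overlap is enough to merge two groups, a
grouping step of this kind is sensitive to false overlaps, whatever
their source.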
Low-abundance repetitive DNA sequences identified on chromosome
16. Chromosome 16-specific, low-abundance repetitive DNA
sequences (designated CH16LARs) have been identified during
construction of the cosmid contig map of this chromosome.
CH16LARs were initially identified by in situ hybridization of
cosmid and YAC clones to normal human chromosomes (in
collaboration with David Ward). The cosmid clones all came from
contig 55. The hybridization signals were unusually intense and
occurred on four regions of human chromosome 16: bands p13, p12,
p11, and q22. Contig 55 contains more clones than any other
contig (78 clones or 2% of all clones fingerprinted thus far).
Ordering clones within contig 55 is not possible because the
presence of these low-abundance repetitive DNA sequences has
generated false overlaps. The regions containing CH16LARs may
cover as much as 5% of the euchromatic arms of chromosome 16 (~5
Mb of DNA). One CH16LAR sequence (CH16LAR1) was cloned and
sequenced, and a minisatellite type of repetitive sequence was
identified. The region containing CH16LARs is of biological
interest since the pericentric inversion breakpoints commonly
found in myelomonocytic leukemia fall within these regions
[Mitelman, Hereditas 104: 113 (1986)]. Alternative strategies for
mapping and ordering clones from this region are being
Construction and distribution of DNA libraries from flow-sorted
chromosomes: National Laboratory Gene Library Project (NLGLP).
NLGLP is a cooperative project between LANL and Lawrence
Livermore National Laboratory. Investigators at LANL have cloned
a set of complete digest libraries into the EcoR I insertion site
of Charon 21A; they are available from the American Type Culture
Collection, Rockville, Maryland. Sets of partial digest libraries
in the cosmid vector sCos1 and in the phage vector Charon 40 are
being constructed for human chromosomes 4, 5, 6, 8, 10, 11, 13,
14, 15, 16, 17, 20, and X. Individual human chromosomes are first
sorted from rodent-human hybrid cell lines until about 1 µg of
DNA has been accumulated. The sorted chromosomes are then
examined for purity by in situ hybridization, and the DNA is
extracted and partially digested with the restriction enzyme Sau
3AI, dephosphorylated, and cloned into vectors. Partial digest
libraries have been constructed for chromosomes 4, 5, 6, 8, 11,
13, 16, 17, and X. Purity estimates from sorted chromosomes,
flow-karyotype analysis, and plaque or colony hybridization
indicate that most of these libraries are 90 to 95% pure.
Additional cosmid library constructions and arrays of libraries
having five- to tenfold genomic coverage into microtiter plates
are in progress. Libraries have been constructed in M13 or
bluescript vectors to generate STS markers for selecting
chromosome-specific inserts from a genomic YAC library. LANL has
also cloned sorted DNA into YAC vectors and expects to construct
a series of YAC libraries representing individual chromosomes
A YAC library for human chromosome 21. YACs have been constructed
using DNA isolated from aliquots of flow-sorted human chromosome
21. Chromosomes were prepared from the somatic cell hybrid
WAV-17, which contains chromosome 21 as the only human
chromosome. DNA isolated from sorted chromosomes was restricted
with either Cla I or Eag I or both Not I and Nhe I, ligated to
YAC vectors pJS97 and pJS98, and transformed into Saccharomyces
cerevisiae strain YPH 250. The transformation efficiency of YACs
ranged from 600 to 2500 cfu/µg of sorted DNA. About 1200 human
YACs with an average size of 200 kb have been identified. The
locations of 20 random YACs on chromosome 21 were confirmed by
hybridization to somatic cell hybrid mapping panels. Three YACs
that hybridize to D21S55 have been identified and are being used
to initiate construction of a physical map of the Down's syndrome
region of chromosome 21. Sixty YAC clones from the chromosome 21
library were localized on chromosome 21 by in situ hybridization.
The results indicate that the library contains inserts that are
well distributed along the length of the chromosome and that the
frequency of chimeric inserts is low (below 3%). A collaboration
between the genome centers at LANL and Lawrence Berkeley
Laboratory (LBL) will use the library for comprehensive physical
mapping of chromosome 21. The ability to construct
chromosome-specific YAC libraries from sorted chromosomes will
facilitate isolation of disease genes and construction of
long-range physical maps of complex genomes. LBL is working on
chromosome 21 in cooperation with LANL.
Chromosome-specific STS libraries. Specific STSs have been
systematically generated using flow-sorted chromosomes. DNA from
about 200,000 chromosomes was digested with either one or two
restriction enzymes (usually BamH I and Hind III) and cloned
directly into bacteriophage M13mp18. One-pass sequencing was
conducted, either manually or with a Dupont Genesis 2000
automated sequencer. DNA sequences were analyzed for the presence
of sequence similarity to common human repetitive sequences, and
appropriate PCR oligomers were synthesized. An acceptable STS-PCR
assay yielded the appropriately sized product from both the
hybrid cell line DNA containing only the human chromosome of
interest and the pools of 384 anonymous YAC clones, spiked with 5
ng/ml total human DNA. To date, over 340 kb of anonymous DNA
sequence from human chromosomes 5 and 7 have been analyzed. Two
hundred STS markers for chromosome 7 have been generated in
collaboration with Maynard Olson's laboratory at Washington
University [Green et al., Genomics (in press)], and the first 100
STS markers for chromosome 5 are currently being generated in
collaboration with John Wasmuth's laboratory at the University of
California, Irvine; 50 STSs for chromosome 5 have been regionally
localized. The overall efficiency of PCR reactions yielding
appropriate products, with the anonymous genomic sequences from
flow-sorted chromosomes, has been approximately 75%. GRAIL
analyses indicate that approximately 15% of both the chromosome
16 STSs and the randomly selected STSs for chromosomes 5 and 7
contain putative coding regions.
Informatics. The Laboratory Notebook database, designed to manage
all information necessary for map assembly, has been expanded to
include sequences, STS mapping information, and grid
hybridization data, as well as clone fingerprints and completed
maps. The forms-based interface is being expanded to provide easy
access to the new tables. Graphical interfaces and innovative
algorithms to aid map assembly have been prototyped and are being
refined. Integrated, multilevel maps are increasing in
importance. A strong emphasis for the coming year will be to
implement the Software for Integrated Genome Map Assembly (SIGMA)
system, which was designed to aid in display, assembly,
evaluation, and editing of integrated maps.
DNA sequencing based upon single-molecule detection in flow
cytometry. This project addresses the problem of rapidly
sequencing bases in large fragments of DNA. A DNA fragment of
about 40 kb will be labeled with base-identifying tags and
suspended in the flow stream of a flow cytometer capable of
single-molecule detection. The tagged bases will be sequentially
cleaved from the single fragment and identified as the liberated
tag passes through the laser beam. A sequencing rate of 100 to
1000 bases/s on DNA strands of around 40 kb is projected [Genet.
Anal. 8: 1 (1991)]. Accomplishments of this project are as follows:
* Signed CRADA with LTI for joint research on DNA sequencing.
LTI will offer expertise in nucleic acid chemistry and
enzymology, and LANL will specialize in detection technology
and DNA handling. LTI will commercialize the technique [for
more information, refer to the figure on p. 25 and to Human
Genome News, 3(1): 5 (May 1991)].
* Detected several different kinds of single fluorescing
molecules with ~85% efficiency and low error rates [Chem.
Phys. Lett. 174: 553 (1990)].
* Observed photon bursts simultaneously from rhodamine-6G and
Texas Red, using both a doubled Nd/YAG and a synchronously
pumped dye laser for excitation and dual-wavelength
* Synthesized DNA fragments up to 500 nucleotides long that
contain one fluorescent nucleotide and three normal
nucleotides. DNA synthesis was observed with rhodamine-dCTP,
rhodamine-dATP, rhodamine-dUTP, fluorescein-dATP, and
fluorescein-dUTP. This work was a collaboration with LTI.
* Digested the fluoresceinated DNAs described above by six
different exonucleases: native T4 polymerase, native T7
polymerase, Klenow fragment of Escherichia coli pol I, exo
III, E. coli pol III holoenzyme, and snake venom
phosphodiesterase. LTI also participated in these
Robotic workcell for DNA filter array construction. A gantry
robot-based workcell has been assembled to array small spots of
DNA in an interleaved format. Grid densities on these membrane
filters can be varied from 576 to 9216 spots per 22 cm². The
robot picks a microtiter plate from a dispenser, scans a barcode
label, removes the plate cover, and inserts a 96-pin gridding
tool into the plate wells. The tool is then positioned at the
appropriate place on the membrane, and the solutions on the pins
are transferred as spots. The gridding tool is washed and
sterilized, the lid replaced on the microtiter plate, and the
plate placed into a receiving stacker. The entire sequence is
repeated with new plates until the desired array has been completed.
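One way to picture the interleaved format is as an index calculation
that gives each successive plate a small offset within the spacing of
the 96-pin tool. The report does not describe the actual geometry, so
the 8 x 12 pin layout, the a x b interleave factors, and the function
names below are assumptions. The arithmetic is at least consistent with
the stated densities, since 576 = 96 x 6 and 9216 = 96 x 96.

```python
ROWS, COLS = 8, 12  # assumed pin layout of the 96-pin gridding tool


def spot_index(plate, well_row, well_col, a, b):
    """Return the (row, col) of a spot in the composite array.

    plate              -- 0-based order in which the plate is stamped
    well_row, well_col -- well position on the plate (0-based)
    a, b               -- assumed interleave factors: each pin position
                          holds an a x b block of spots, one per plate
    """
    if not 0 <= plate < a * b:
        raise ValueError("plate index exceeds interleave capacity")
    sub_row, sub_col = divmod(plate, b)
    return well_row * a + sub_row, well_col * b + sub_col


def array_size(a, b):
    """Total number of spots when all a * b plates have been stamped."""
    return ROWS * COLS * a * b


print(array_size(2, 3), array_size(8, 12))
print(spot_index(4, 1, 2, 2, 3))
```

Under these assumptions, six plates stamped with a 2 x 3 interleave fill
a 576-spot array, and every (plate, well) pair lands on its own spot.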
For more information on the LANL Center for Human Genome Studies,
contact Robert K. Moyzis, Director, or Larry Deaven, Deputy
Director, at 505/667-3912.
Program Management Infrastructure
DOE OHER Mission
Genetics and radiation biology have been a long-term concern of
the DOE Office of Health and Environmental Research (OHER) and
DOE forerunners: the Atomic Energy Commission (AEC) and the Energy
Research and Development Administration (ERDA). In the United
States, the first federal support for genetics research was
through AEC. In the early days of nuclear energy development, the
focus was on radiation effects and later broadened under ERDA and
DOE to include the health implications of all energy technologies
and their by-products (see "Enabling Legislation" in box below).
Today, an extensive program of OHER-sponsored research on genomic
structure, maintenance, damage, and repair continues at the
national laboratories and universities. Some major components of
OHER genetics research are (1) molecular cloning and
characterization of DNA repair genes, (2) improvement of
methodologies and resources for quantitating and characterizing
mutations, and (3) the focused resource and technology
development needed to map and sequence the human genome, the Human
The Atomic Energy Act of 1946 (P.L. 79-585) provided the
initial charter for a comprehensive program of research and
development related to the utilization of fissionable and
radioactive materials for medical, biological, and health
The Atomic Energy Act of 1954 (P.L. 83-703) further
authorized AEC "to conduct research on the biologic effects
of ionizing radiation."
The Energy Reorganization Act of 1974 (P.L. 93-438) provided
that responsibilities of ERDA shall include "engaging in and
supporting environmental, biomedical, physical and safety
research related to the development of energy resources and
The Federal Nonnuclear Energy Research and Development Act
of 1974 (P.L. 93-577) authorized ERDA to conduct a
comprehensive nonnuclear energy research, development, and
demonstration program to include the environmental and
social consequences of the various technologies.
The DOE Organization Act of 1977 (P.L. 95-91) instructed the
department "to assure incorporation of national
environmental protection goals in the formulation and
implementation of energy programs; and to advance the goal
of restoring, protecting, and enhancing environmental
quality, and assuring public health and safety," and to
conduct "a comprehensive program of research and development
on the environmental effects of energy technology and
Human exposure to environmental factors and the body's response
to such factors are a major concern. Unavoidable genome-damaging
agents in the environment include natural radiation sources, such
as the components of sunlight, cosmic rays from space, and radon
from the earth. Both inorganic and organic chemicals, some
natural to the environment and others generated by human commerce
and energy-related processes, put people at risk. Normal
biological functions also contribute to the risk of genetic
damage when the body's own cells produce potentially damaging
molecules in the course of metabolic processes such as defensive
actions against microbes, detoxification of harmful environmental
substances, and cell proliferation. Even DNA is not completely
stable chemically; its normal methylcytosine constituent has a
low but measurable rate of spontaneous mutagenic change.
Systems that reverse many types of DNA damage have evolved to
include a wide range of repair mechanisms within cells of all
species. Humans show great diversity in this capacity, with
repair-gene deficiencies showing up as sensitivity to DNA damage
from low-level radiation and in diseases such as cancer. Some
human genes that contribute to DNA repair processes have been
characterized, and others await detection and molecular cloning.
A goal of the OHER program is to improve the capabilities for
diagnosing individual susceptibility to genome damage.
The genome program is providing fundamental information about the
linear structure of chromosomes and genes, but understanding gene
function requires other types of knowledge. Elucidating the
three-dimensional (3-D) structure of proteins is crucial in
explicating their functions. To advance these studies, several
unique facilities for 3-D microstructure research, developed and
maintained at DOE laboratories (see box on DOE facilities), are
increasingly in demand by molecular biologists.
To carry out its national research and development obligations,
OHER conducts the following activities:
* Sponsors research and development projects at universities,
in the private sector, and at DOE national laboratories;
* Uses the unique capabilities of multidisciplinary DOE
national laboratories for the nation's benefit;
* With advice from the scientific community and other sectors
of government, considers novel, beneficial initiatives; and
* Provides expertise on various governmental working groups.
David J. Galas has directed OHER, an office of the DOE Office of
Energy Research, since April 1990. He also serves under the White
House Office of Science and Technology Policy as Cochair of the
Committee on Life Sciences and Health and as Chairman of its
Subcommittee on Biotechnology Research. John C. Wooley became
OHER Deputy Associate Director in June 1992.
The Human Genome Program, conceived as an initiative within OHER,
is administered primarily through the Health Effects and Life
Science Research Division, directed by David A. Smith. The
Medical Applications and Biophysical Research Division, directed
by Robert W. Wood, monitors the instrumentation sector of the
Human Genome Program and, more broadly, sponsors research and
development of resources and instrumentation having biomedical
and biotechnological applications.
Major DOE Facilities and Resources Relevant to Molecular Biology
Center for X-Ray Optics                             LBL
GenBank® Data Sequence Repository                   LANL
High Flux Beam Reactor                              BNL
Los Alamos Neutron Scattering Center                LANL
National Flow Cytometry Resource                    LANL
National Laboratory Gene Library Project            LANL, LLNL
Protein Structure Data Bank                         BNL
National Synchrotron Light Source                   BNL
Scanning Transmission Electron Microscope Resource  BNL
Stanford Synchrotron Radiation Laboratory           Stanford
GRAIL, Online Sequence Interpretation Service       ORNL
Program Management Task Group
The Human Genome Program Management Task Group (see box for list
of members) reports to the OHER Director and works to coordinate
the following within OHER:
* peer review of research proposals, using both prospective
and retrospective evaluations and
* administration of awards, collaboration with all concerned
agencies and organizations, organization of periodic
workshops, and responses to the needs of the developing
DOE Human Genome Program Management Task Group in 1992
David A. Smith, Chair   Molecular biologist
Ann M. Barber           Computational biologist
Benjamin J. Barnhart    Geneticist
Daniel W. Drell         Biologist
Gerald Goldstein        Physical scientist
Murray Schulman         Radiation biologist
Jay Snoddy*             Molecular biologist
Marvin Stodolsky        Molecular biologist
John C. Wooley          Biophysicist
*On detail from Argonne National Laboratory.
Human Genome Coordinating Committee (HGCC)
Another component of the OHER management structure, HGCC was
formed in October 1988 to represent DOE genome program
researchers along with observers from other government and
private agencies (see box for list of HGCC members). Members of
the Human Genome Program Management Task Group are ex-officio
members of HGCC, and they participate in the regularly scheduled
HGCC meetings. HGCC responsibilities include the following:
* assisting OHER with overall coordination of DOE-funded
* facilitating the development and dissemination of novel
* ensuring proper management and sharing of data and samples;
* participating with other national and international efforts;
* recommending establishment of ad hoc task groups to analyze
specific areas, such as ethical, legal, and social issues;
informatics requirements; mapping and sequencing
technologies; use of the mouse as a model organism; cost of
resource distribution; and use of chromosome flow-sorting
Human Genome Coordinating Committee Members in 1992
Elbert W. Branscomb, Computational Biologist, Human Genome
Center, Lawrence Livermore National Laboratory
Charles R. Cantor, Principal Scientist, DOE Human Genome Program,
Lawrence Berkeley Laboratory
Anthony V. Carrano, Director, Human Genome Center and Leader,
Biomedical Sciences Division, Lawrence Livermore National
Laboratory
C. Thomas Caskey, Director, Institute for Molecular Genetics,
Baylor College of Medicine
David J. Galas, Office of Health and Environmental Research, DOE
Raymond F. Gesteland, Professor and Cochair, Department of Human
Genetics, University of Utah; Investigator, Howard Hughes Medical
Institute Laboratory for Genetic Studies at the Eccles Institute,
University of Utah
Leroy E. Hood, Director, Center for Integrated Protein and
Nucleic Acid Chemistry and Biological Computation; Director,
Cancer Center, California Institute of Technology
Robert K. Moyzis, Director, Center for Human Genome Studies, Los
Alamos National Laboratory
Jasper Rine, Director, Human Genome Center, Lawrence Berkeley
Laboratory
Robert J. Robbins, Director, Welch Medical Library for Applied
Research in Academic Information, Johns Hopkins University
David A. Smith, Office of Health and Environmental Research, DOE
Lloyd M. Smith, Assistant Professor, Analytical Division,
Department of Chemistry, University of Wisconsin, Madison
John C. Wooley, Office of Health and Environmental Research, DOE
HGCC Executive Officer: Sylvia J. Spengler, Deputy Director, Human
Genome Center, Lawrence Berkeley Laboratory
A Principal Scientist, who is a member of HGCC, reports to the
Human Genome Program Management Task Group on keeping the program
at the leading edge of genome research and conveys recommendations
on broad scientific policies to HGCC. Charles R. Cantor of
Lawrence Berkeley Laboratory currently serves as Principal
Scientist.
Human Genome Management Information System (HGMIS)
HGMIS at Oak Ridge National Laboratory provides communication and
information services in support of the DOE Human Genome Program
Management Task Group. In this role HGMIS facilitates
international communication among management and research
personnel and informs other interested persons about genome
research. HGMIS publications, such as the bimonthly newsletter
Human Genome News and technical and program reports, are
available to anyone interested in the genome project. Human
Genome News is jointly supported by OHER and the NIH National
Center for Human Genome Research (NCHGR).
Subscribers to the newsletter number over 13,000 and include
genome and basic researchers at national laboratories,
universities, and other research institutions; professors and
teachers; industry representatives; legal personnel; ethicists;
students; genetic counselors; physicians; the press; and other
interested individuals. In the first quarter of 1992, over 5000
Genome Data Base users were added to the mailing list.
Subscribers outside the United States include more than 3000
individuals and institutions in 48 countries.
Human Genome Distinguished Postdoctoral Fellowships
In 1990 OHER established the Human Genome Distinguished
Postdoctoral Research Program to support research on projects
related to the DOE Human Genome Program. The postdoctoral program
developed from a 1988 recommendation of the DOE Energy Research
Advisory Board to "increase support through expansion of the
targeted (science and engineering) graduate and postgraduate
research fellowship programs with emphasis given to
energy-related areas of greatest projected human resource
shortages." Recipients of the first fellowships, awarded in FY
1991, are listed below.
1991 DOE Human Genome Distinguished Postdoctoral Fellows*
Xiaohua Huang (Stanford University, Biophysical Chemistry)
Host: University of California, Berkeley
Ben Koop (Wayne State University, Molecular Biology and Genetics)
Host: California Institute of Technology
Carol Soderlund (New Mexico State University, Computer Science)
Host: Los Alamos National Laboratory
Harold Swerdlow (University of Utah, Bioengineering)
Host: University of Utah
*Contact: Linda Holmes: 615/576-3192, Fax: 615/576-0202.
Fellowship appointments are tenable at DOE and university
laboratories having substantial DOE-sponsored research projects
supportive of the Human Genome Program. Fellows will participate
in advanced genetics-related research, interact with outstanding
professionals, and become familiar with major issues while making
personal contributions to the program's goal of mapping and
sequencing the human genome. This interaction, involving the
exchange of ideas, skills, and technologies, will benefit the
fellow, the host laboratory, and the DOE program.
These fellowships complement the Alexander Hollaender
Distinguished Postdoctoral Fellowships initiated by OHER. The
Hollaender Fellowships, established in memory of the 1983
recipient of the prestigious DOE Enrico Fermi Award, provide
support in all areas of OHER-sponsored research. Both
postdoctoral programs are administered by Oak Ridge Associated
Universities, which is a university consortium and DOE
Reports by the Health and Environmental Research Advisory
Committee (HERAC) and the National Research Council (NRC)
recommended that national funding for the Human Genome Project
increase to a sustaining yearly level of $200 million. DOE
program expenditures were $5.5 million in FY 1987, $10.7 million
in FY 1988, $17.5 million in FY 1989, $25.9 million in FY 1990,
$46 million in FY 1991, and $59 million in FY 1992. The proposed
presidential budget for the DOE Human Genome Program in FY 1993
is $64.7 million (graph). DOE-sponsored research is conducted in
a variety of institutions (upper table). The lower table
categorizes research expenditures for FY 1992.
Types of Institutions Conducting DOE-Sponsored Genome Research
8 National laboratories
3 Other federal organizations
41 Academic institutions
10 Private-sector institutions
12 Nonacademic, commercial organizations
Human Genome Program Funds Distribution in FY 1992 (in $K)
(Commitments as of May 1, 1992)
Organization Type          Mapping &  Instrumentation  Informatics   ELSI  Totals  Percent of
                          Sequencing      Development                                 56800^1
DOE Labs                       23671             7559         5122    236   36588        64.4
Academic                        5462             3341         4528    736   14067        24.8
Institutions (nonprofit)        2173                0          602    847    3622         6.4
NIH Labs                         680                0            0      0     680         1.2
Companies and SBIR^2            1550                0          314    392    2256         3.9
All Organizations              33536            10900        10566   2211   57213
[Percent of 56800]            [59.0]           [19.2]       [18.6]  [3.9]           [100.7]^3
1 Total allocation of $59 million less capital equipment funds of $2.2
million.
2 Small Business Innovation Research grants.
3 Excess occurs because funding for genome SBIR projects is received from
the DOE-wide SBIR program, to which OHER contributes.
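The row totals, column totals, and percentages in the funds-distribution table can be rechecked with a short calculation. The sketch below is illustrative only: the figures are transcribed from the table, while the names (BASE, ROWS, percent_of_base) are hypothetical and not part of the report. Plain rounding reproduces every printed percentage except the Companies and SBIR row, which comes out as 4.0 rather than the printed 3.9; the printed value may have been adjusted so that the column adds to 100.7, but that is only a guess.

```python
# Figures (in $K) transcribed from the FY 1992 funds-distribution table.
# BASE is the $59 million allocation less $2.2 million in capital equipment funds.
BASE = 56800

# Columns: Mapping & Sequencing, Instrumentation Development, Informatics, ELSI
ROWS = {
    "DOE Labs": [23671, 7559, 5122, 236],
    "Academic": [5462, 3341, 4528, 736],
    "Institutions (nonprofit)": [2173, 0, 602, 847],
    "NIH Labs": [680, 0, 0, 0],
    "Companies and SBIR": [1550, 0, 314, 392],
}


def percent_of_base(amount, base=BASE):
    """Express an amount as a percentage of the base, to one decimal place."""
    return round(100 * amount / base, 1)


# Sum across each row, down each column, and over the whole table.
row_totals = {name: sum(values) for name, values in ROWS.items()}
column_totals = [sum(column) for column in zip(*ROWS.values())]
grand_total = sum(row_totals.values())

for name, total in row_totals.items():
    print(f"{name:26s}{total:8d}{percent_of_base(total):8.1f}")
print(f"{'All Organizations':26s}{grand_total:8d}{percent_of_base(grand_total):8.1f}")
```

The grand total exceeds the base because, as footnote 3 explains, SBIR projects draw on the DOE-wide SBIR program.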
Joint DOE-NIH Activities
The NIH Human Genome Program, led by NIH NCHGR, has emphasized
the study of disease genes in the construction of complete
genetic and physical maps of the genomes of humans and selected
model organisms. NIH is also developing new technologies and
information systems to manage mapping and sequencing data.
In the fall of 1988 DOE and NIH began coordinating their human
genome research programs under the Memorandum of Understanding,
an outgrowth of the HERAC and NRC reports, "to foster interagency
cooperation that will enhance the human genome research
capabilities of both agencies." More information on
NCHGR-sponsored projects and infrastructure may be obtained by
contacting the NCHGR Office of Communications at 301/402-0911.
Joint DOE-NIH Subcommittee on the Human Genome in 1992
Paul Berg (PACHG) Stanford University School of Medicine
Sheldon Wolff (HERAC) University of California, San Francisco
Charles R. Cantor Lawrence Berkeley Laboratory (HGCC)
Anthony V. Carrano Lawrence Livermore National Laboratory (HGCC)
Joseph L. Goldstein University of Texas Southwestern Medical Center
Leroy E. Hood California Institute of Technology
Leonard S. Lerman Massachusetts Institute of Technology (HERAC)
Victor A. McKusick Johns Hopkins Hospital
Robert K. Moyzis Los Alamos National Laboratory (HGCC)
Maynard V. Olson Washington University School of Medicine (PACHG)
MaryLou Pardue Massachusetts Institute of Technology (HERAC)
Mark L. Pearson E. I. du Pont de Nemours & Company (PACHG)
Diane C. Smith Xerox Corporation (PACHG)
Robert T. Tjian University of California, Berkeley
Nancy S. Wexler Columbia University (PACHG)
John C. Wooley Office of Health and Environmental Research, DOE
Ex Officio Members:
David J. Galas Office of Health and Environmental Research, DOE
Mark S. Guyer National Center for Human Genome Research, NIH
Elke Jordan National Center for Human Genome Research, NIH
David A. Smith Office of Health and Environmental Research, DOE
Michael Gottesman National Center for Human Genome Research, NIH
A national plan, primarily authored by NIH and DOE, for a
coordinated multiyear research project was presented to Congress
in early 1990. Understanding Our Genetic Inheritance, The U.S.
Human Genome Project: The First Five Years (1991-1995) detailed a
comprehensive spending plan and optimal strategies for mapping
and sequencing the human genome. Referred to as the Five Year
Plan, it calls for open biannual meetings of the DOE-NIH Joint
Subcommittee on the Human Genome. The joint subcommittee invites
reports from experts, including those on national and
international genome efforts; medical genetics; and related
ethical, legal, and social issues as they pertain to data
produced in the project. The subcommittee is made up of members
from the NIH Program Advisory Committee on the Human Genome
(PACHG) and from the DOE HERAC or the HGCC members appointed by
HERAC. The subcommittee reports to its parent committees, PACHG
and HERAC.
Many workshops and meetings have since been cosponsored by the
two agencies (see Appendix B). In addition, the Joint
Subcommittee on the Human Genome has established five joint
working groups that meet regularly to address specific areas of
genome research and make recommendations to the joint
subcommittee. The objectives of these five joint working groups,
listed below, include establishing research priorities;
identifying research, training, and technical needs; and
coordinating U.S. research activities with those of other
countries. Members of the working groups represent various
disciplines. (Membership lists of the working groups are included
in Appendix D.)
Joint Mapping Working Group. The mapping working group encourages
development and use of methodologies to integrate genetic linkage
and physical maps, meet project mapping goals, and identify
informatics needs associated with map generation and completion.
Joint Informatics Task Force (JITF). An ad hoc committee, JITF
prepared a comprehensive report on genome information needs and
data analysis tools. The report was presented to the DOE-NIH
Joint Subcommittee on the Human Genome in January 1992.
Joint Sequencing Working Group. The sequencing working group
investigates and makes recommendations on research and technology
development priorities to enable the sequencing of 3 billion
nucleotides of human DNA within 15 years.
Joint Working Group on Ethical, Legal, and Social Issues (ELSI).
ELSI identifies and addresses the social concerns that may arise
as genome technology is developed and genetic data becomes
available; stimulates bioethics research; promotes education of
professional and lay groups; and collaborates with international
groups such as the Human Genome Organization (HUGO), United
Nations Educational, Scientific, and Cultural Organization
(UNESCO), and the European Community (see next section).
Joint Working Group on the Mouse. The mouse working group was
established to develop a strategy for efficiently using the mouse
to accomplish mapping project goals as outlined in the Five Year
Plan. This strategy will take advantage of the extensive genetic
map data amassed on the mouse. Because of numerous similarities
between mouse and human genomes, these studies are considered
essential to understanding human biology and to interpreting more
complex data obtained in studies of humans.
Other U.S. Genome Research
U.S. Department of Agriculture (USDA). USDA has implemented a
Plant Genome Research Program to foster and coordinate research
on single and multigenic traits related to agricultural,
forestry, and environmental concerns. The goal of this 5-year
program is to improve plant varieties by locating important genes
and markers on chromosomes, determining gene structure, and
transferring genes to improve the performance of economically
important crops such as corn, wheat, soybeans, and pine. Use of
these "molecular breeding" techniques will increase U.S.
competitiveness in the world marketplace.
National Science Foundation (NSF). NSF coordinates an interagency
research effort to map and sequence the small genome of
Arabidopsis thaliana, a simple weed that provides an ideal model
for studying plant biochemistry, genetics, and physiology.
Knowledge of the function of every Arabidopsis gene will be
applicable to the understanding and manipulation of higher plants
and to genome research in general. These studies are also
supported by DOE, NIH, and USDA as part of their own genome
initiatives, and the four agencies coordinate their Arabidopsis
activities. NSF also has instrumentation, computational, and
informatics programs that support genomics research, in addition
to individual awards in genetics and molecular biology.
Howard Hughes Medical Institute (HHMI). HHMI, a private medical
research organization, contributes to the genome effort through
its support of biomedical research primarily at university
molecular biology and genetics laboratories. In addition, HHMI
has cosponsored several genomics conferences and, between 1985
and September 1991, supported the collection and dissemination of
genome mapping data through a network of databases.
Genomic research is being carried out in countries throughout the
world. The two international organizations described below
are working to coordinate and facilitate national
efforts. HUGO includes a number of DOE and NIH genome
investigators and administrators. HUGO and UNESCO have been
informed of dedicated genome programs in the following nations
and international agencies: Commonwealth of Independent States
(formerly U.S.S.R.), Denmark, European Community, France,
Germany, Hungary, Italy, Japan, Netherlands, United Kingdom, and
HUGO: Worldwide Genome Research Coordination
HUGO, formed by scientists to coordinate worldwide genome mapping
and sequencing, now has regional offices in the United States
(Bethesda, Maryland) and Europe (London) and a satellite office
in Moscow. A Pacific office is under development in Osaka, Japan.
HUGO offices were funded initially by several charitable
organizations. In 1990 HHMI awarded HUGO a 4-year, $1 million
grant to support the HUGO Americas office; in that same year The
Wellcome Trust provided a 3-year grant, with the first year's
funds amounting to over $400,000, to assist with activities in
the European office. The Imperial Cancer Research Fund (U.K.)
provides support for the HUGO president's office, and the Osaka
office has received private support as well. To support future
activities, HUGO directors intend to raise funds from various
countries that have active genome research programs.
HUGO members are elected; there are over 400 members from 32
countries. The international officers in 1992 were Sir Walter Bodmer
(United Kingdom), President; Charles R. Cantor (United States),
Vice-President; Andrei Mirzabekov (Russia), Vice-President;
Kenichi Matsubara (Japan), Vice-President; Bronwen Loder (United
Kingdom), Secretary; and Robert Sparkes (United States),
Treasurer. Each office operates with its own trustees.
The objectives of HUGO include
* fostering collaboration to avoid unnecessary competition or
duplication of effort and to coordinate human genome
research with model organism studies;
* coordinating exchanges of relevant data and materials;
* educating researchers and the public on the scientific,
ethical, social, legal, and commercial implications of the
* acting as a clearinghouse for genome-related information,
such as relevant conferences, worldwide genome programs and
researchers, and database and material availability. A
training program may be initiated to encourage the spread of
new and promising technologies.
HUGO has established expert international ad hoc advisory
committees on mapping workshops and databases, informatics,
ethics, mouse mapping, and intellectual property and ownership.
Single-chromosome workshops are crucial to the success of the
Human Genome Project. Working with the funding agencies, HUGO is
playing a central role in the coordinated development of such
meetings and has assisted in planning workshops for chromosomes
2, 3, 13, 16, 19, and X in 1992. HUGO expects to work with the
scientific community to select workshop chairs and to assist in
fundraising and organizing and running these and future meetings.
Chromosome workshops and other meetings are listed in Appendix B.
UNESCO: Promoting the Interests of Developing Countries
A UNESCO Human Genome Program was approved for 1990-91 at the
25th session of the UNESCO General Conference. Attendees
concluded that full knowledge of the human genome is vitally
important and that UNESCO could be influential in stimulating
governments and agencies to support coordinated programs. UNESCO
expects to play a key role in promoting the interests of
developing countries. The Scientific Coordinating Committee
(SCC), composed of 13 scientists, plans and implements the
program, which was budgeted at $350,000 for the first year; SCC
members include representatives selected from geographic regions
and from international genome organizations such as HUGO. Members
of SCC and of the UNESCO Secretariat agreed that UNESCO will
concentrate its activities on access to and use of data obtained
from human genome mapping and sequencing research, as well as on
related ethical and social issues.
UNESCO emphasizes the use of training programs as one of the best
means of obtaining cooperation and diminishing the gap between
developed and developing countries. The Third World Academy of
Sciences (TWAS) joined UNESCO in sponsoring a training program
that provided 19 fellowships in 1991 to awardees from Algeria,
Argentina, Cameroon, Chile, China, Costa Rica, Cyprus,
Czechoslovakia, Egypt, Guinea, India, Indonesia, Myanmar, the
Republic of Korea, Peru, Spain, Ukraine, Russia, and Yugoslavia.
The 1- to 3-month fellowships enable scientists from developing
countries to carry out research in well-established scientific
centers and to learn new research techniques. UNESCO and TWAS are
also jointly compiling a directory to identify third-world genome
researchers and their needs.
To avoid overlap with other genome projects, UNESCO focuses on
communication among countries about major trends and regional
efforts, one of which, the Latin American Human Genome Program,
was established during a UNESCO-supported symposium in Chile in
1990. The first annual UNESCO South-North Human Genome Conference
was held in 1992 in Caxambu, Brazil, to increase interaction
between scientists from developed countries and those of the
third world. The second conference is planned for Thailand in
1993, and the third will probably take place in China in 1994.
Appendix A: Primer on Molecular Genetics
Appendix B: Conferences, Meetings, and Workshops Sponsored by DOE
Appendix C: Members of the DOE Health and Environmental
Research Advisory Committee
Appendix D: Members of DOE-NIH Joint Working Groups
Appendix E: Glossary
Appendix A: Primer on Molecular Genetics
Revised and expanded by Denise Casey (HGMIS) from the primer
contributed by Charles Cantor and Sylvia Spengler (Lawrence
Berkeley Laboratory) and published in the Human Genome 1989-90
Mapping and Sequencing the Human Genome
Genetic Linkage Maps
Low-Resolution Physical Mapping
High-Resolution Physical Mapping
Macrorestriction maps: Top-down mapping
Contig maps: Bottom-up mapping
Current Sequencing Technologies
Sequencing Technologies Under Development
Partial Sequencing to Facilitate Mapping, Gene Identification
End Games: Completing Maps and Sequences; Finding Specific Genes
Model Organism Research
Informatics: Data Collection and Interpretation
Collecting and Storing Data
Nucleic Acids (DNA and RNA)
Impact of the Human Genome Project
The complete set of instructions for making an organism is called
its genome. It contains the master blueprint for all cellular
structures and activities for the lifetime of the cell or
organism. Found in every nucleus of a person's many trillions of
cells, the human genome consists of tightly coiled threads of
deoxyribonucleic acid (DNA) and associated protein molecules,
organized into structures called chromosomes (Fig. 1).
If unwound and tied together, the strands of DNA would stretch
more than 5 feet but would be only 50 trillionths of an inch
wide. For each organism, the components of these slender threads
encode all the information necessary for building and maintaining
life, from simple bacteria to remarkably complex human beings.
Understanding how DNA performs this function requires some
knowledge of its structure and organization.
In humans, as in other higher organisms, a DNA molecule consists
of two strands that wrap around each other to resemble a twisted
ladder whose sides, made of sugar and phosphate molecules, are
connected by "rungs" of nitrogen-containing chemicals called
bases. Each strand is a linear arrangement of repeating similar
units called nucleotides, which are each composed of one sugar,
one phosphate, and a nitrogenous base (Fig. 2). Four different
bases are present in DNA: adenine (A), thymine (T), cytosine (C),
and guanine (G). The particular order of the bases arranged along
the sugar-phosphate backbone is called the DNA sequence; the
sequence specifies the exact genetic instructions required to
create a particular organism with its own unique traits.
The two DNA strands are held together by weak bonds between the
bases on each strand, forming base pairs (bp). Genome size is
usually stated as the total number of base pairs; the human
genome contains roughly 3 billion bp (Fig. 3).
Each time a cell divides into two daughter cells, its full genome
is duplicated; for humans and other complex organisms, this
duplication occurs in the nucleus. During cell division the DNA
molecule unwinds and the weak bonds between the base pairs break,
allowing the strands to separate. Each strand directs the
synthesis of a complementary new strand, with free nucleotides
matching up with their complementary bases on each of the
separated strands. Strict base-pairing rules are followed:
adenine will pair only with thymine (an A-T pair) and cytosine
with guanine (a C-G pair). Each daughter cell receives one old
and one new DNA strand (Figs. 1 and 4). The cell's adherence to
these base-pairing rules ensures that the new strand is an exact
copy of the old one. This minimizes the incidence of errors
(mutations) that may greatly affect the resulting organism or its
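The base-pairing rules described above can be illustrated with a small sketch. It is a simplification under stated assumptions: strands are plain strings of A, T, C, and G, strand orientation is ignored, and the function names (complementary_strand, replicate) are hypothetical, chosen only for this illustration.

```python
# Pairing rules stated in the text: A pairs only with T, and C only with G.
PAIRING = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complementary_strand(strand):
    """Build the partner strand base by base using the pairing rules."""
    return "".join(PAIRING[base] for base in strand.upper())


def replicate(duplex):
    """Separate a two-strand molecule and rebuild a partner for each strand.

    Each daughter molecule keeps one old strand and gains one new strand.
    """
    old_first, old_second = duplex
    daughter_one = (old_first, complementary_strand(old_first))
    daughter_two = (complementary_strand(old_second), old_second)
    return daughter_one, daughter_two


parent = ("ATGCCGTA", complementary_strand("ATGCCGTA"))
daughter_one, daughter_two = replicate(parent)
print(parent)
print(daughter_one)
print(daughter_two)
```

Because each new strand is dictated entirely by the old strand it pairs with, both daughter molecules in this sketch are identical to the parent.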
Each DNA molecule contains many genes, the basic physical and
functional units of heredity. A gene is a specific sequence of
nucleotide bases, whose sequences carry the information required
for constructing proteins, which provide the structural
components of cells and tissues as well as enzymes for essential
biochemical reactions. The human genome is estimated to comprise
at least 100,000 genes.
Human genes vary widely in length, often extending over thousands
of bases, but only about 10% of the genome is known to include
the protein-coding sequences (exons) of genes. Interspersed
within many genes are intron sequences, which have no coding
function. The balance of the genome is thought to consist of
other noncoding regions (such as control sequences and intergenic
regions), whose functions are obscure. All living organisms are
composed largely of proteins; humans can synthesize at least
100,000 different kinds. Proteins are large, complex molecules
made up of long chains of subunits called amino acids. Twenty
different kinds of amino acids are usually found in proteins.
Within the gene, each specific sequence of three DNA bases
(a codon) directs the cell's protein-synthesizing machinery to add
a specific amino acid. For example, the base sequence ATG codes
for the amino acid methionine. Since 3 bases code for 1 amino
acid, the protein coded by an average-sized gene (3000 bp) will
contain 1000 amino acids. The genetic code is thus a series of
codons that specify which amino acids are required to make up
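Reading a gene three bases at a time can be sketched as follows. This is a minimal illustration, not a complete genetic code: CODON_TABLE holds only a handful of well-known codons (ATG for methionine is the example given in the text; the others are standard assignments added here), and the function names are hypothetical.

```python
# A deliberately small subset of the genetic code, written as DNA codons.
# A full table has 64 entries; codons missing from this subset are reported as "???".
CODON_TABLE = {
    "ATG": "Met", "TTT": "Phe", "GGC": "Gly", "AAA": "Lys",
    "TGG": "Trp", "GCT": "Ala",
    "TAA": "Stop", "TAG": "Stop", "TGA": "Stop",
}


def codons(dna):
    """Split a DNA sequence into successive three-base codons."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore any incomplete trailing codon
    return [dna[i:i + 3] for i in range(0, usable, 3)]


def translate(dna):
    """Map each codon to its amino acid, stopping at the first stop codon."""
    protein = []
    for codon in codons(dna):
        residue = CODON_TABLE.get(codon, "???")
        if residue == "Stop":
            break
        protein.append(residue)
    return protein


print(translate("ATGTTTGGCAAATAA"))
print(len(codons("A" * 3000)))
```

The second call mirrors the arithmetic in the text: a 3000-bp coding sequence divides into 1000 codons, hence about 1000 amino acids.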
The protein-coding instructions from the genes are transmitted
indirectly through messenger ribonucleic acid (mRNA), a transient
intermediary molecule similar to a single strand of DNA. For the
information within a gene to be expressed, a complementary RNA
strand is produced (a process called transcription) from the DNA
template in the nucleus. This mRNA is moved from the nucleus to
the cellular cytoplasm, where it serves as the template for
protein synthesis. The cell's protein-synthesizing machinery then
translates the codons into a string of amino acids that will
constitute the protein molecule for which it codes (Fig. 5). In
the laboratory, the mRNA molecule can be isolated and used as a
template to synthesize a complementary DNA (cDNA) strand, which
can then be used to locate the corresponding genes on a
chromosome map. The utility of this strategy is described in the
section on physical mapping, p. 201.
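Transcription and laboratory cDNA synthesis both rely on complementary base pairing, which the sketch below illustrates. It is a simplification under stated assumptions: sequences are plain strings, strand orientation is ignored, and the function names are hypothetical. One detail not mentioned in the text is assumed here: RNA carries uracil (U) in place of thymine (T).

```python
# Pairing used when RNA is copied from a DNA template (RNA uses U instead of T).
DNA_TO_RNA = {"A": "U", "T": "A", "C": "G", "G": "C"}
# Pairing used when a DNA copy (cDNA) is synthesized from an mRNA template.
RNA_TO_DNA = {"A": "T", "U": "A", "C": "G", "G": "C"}


def transcribe(dna_template):
    """Produce the mRNA strand complementary to a DNA template strand."""
    return "".join(DNA_TO_RNA[base] for base in dna_template.upper())


def reverse_transcribe(mrna):
    """Produce the cDNA strand complementary to an mRNA strand."""
    return "".join(RNA_TO_DNA[base] for base in mrna.upper())


template = "TACAAACCG"
mrna = transcribe(template)
cdna = reverse_transcribe(mrna)
print(mrna)
print(cdna)
```

Because the cDNA is complementary to the mRNA, it matches the DNA template strand from which the mRNA was copied, which is why cDNA can be used to locate the corresponding gene.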
The 3 billion bp in the human genome are organized into 24
distinct, physically separate microscopic units called
chromosomes. All genes are arranged linearly along the
chromosomes. The nucleus of most human cells contains 2 sets of
chromosomes, 1 set given by each parent. Each set has 23 single
chromosomes: 22 autosomes and an X or Y sex chromosome. (A normal
female will have a pair of X chromosomes; a male will have an X
and Y pair.) Chromosomes contain roughly equal parts of protein
and DNA; chromosomal DNA contains an average of 150 million
bases. DNA molecules are among the largest molecules now known.
Chromosomes can be seen under a light microscope and, when
stained with certain dyes, reveal a pattern of light and dark
bands reflecting regional variations in the amounts of A and T versus
G and C. Differences in size and banding pattern allow the 24
chromosomes to be distinguished from each other, an analysis
called a karyotype. A few types of major chromosomal
abnormalities, including missing or extra copies of a chromosome
or gross breaks and rejoinings (translocations), can be detected
by microscopic examination; Down's syndrome, in which an
individual's cells contain a third copy of chromosome 21, is
diagnosed by karyotype analysis (Fig. 6). Most changes in DNA,
however, are too subtle to be detected by this technique and
require molecular analysis. These subtle DNA abnormalities
(mutations) are responsible for many inherited diseases such as
cystic fibrosis and sickle cell anemia or may predispose an
individual to cancer, major psychiatric illnesses, and other
Mapping and Sequencing the Human Genome
A primary goal of the Human Genome Project is to make a series of
descriptive diagrams (maps) of each human chromosome at
increasingly finer resolutions. Mapping involves (1) dividing the
chromosomes into smaller fragments that can be propagated and
characterized and (2) ordering (mapping) them to correspond to
their respective locations on the chromosomes. After mapping is
completed, the next step is to determine the base sequence of
each of the ordered DNA fragments. The ultimate goal of genome
research is to find all the genes in the DNA sequence and to
develop tools for using this information in the study of human
biology and medicine. Improving the instrumentation and
techniques required for mapping and sequencing, a major focus of
the genome project, will increase efficiency and
cost-effectiveness. Goals include automating methods and
optimizing techniques to extract the maximum useful information
from maps and sequences.
A genome map describes the order of genes or other markers and
the spacing between them on each chromosome. Human genome maps
are constructed on several different scales or levels of
resolution. At the coarsest resolution are genetic linkage maps,
which depict the relative chromosomal locations of DNA markers
(genes and other identifiable DNA sequences) by their patterns of
inheritance. Physical maps describe the chemical characteristics
of the DNA molecule itself.
Geneticists have already charted the approximate positions of
over 2300 genes, and a start has been made in establishing
high-resolution maps of the genome (Fig. 7). More precise maps
are needed to organize systematic sequencing efforts and plan new
Genetic Linkage Maps
A genetic linkage map shows the relative locations of specific
DNA markers along the chromosome. Any inherited physical or
molecular characteristic that differs among individuals and is
easily detectable in the laboratory is a potential genetic
marker. Markers can be expressed DNA regions (genes) or DNA
segments that have no known coding function but whose inheritance
pattern can be followed. DNA sequence differences are especially
useful markers because they are plentiful and easy to
Markers must be polymorphic to be useful in mapping; that is,
alternative forms must exist among individuals so that they are
detectable among different members in family studies.
Polymorphisms are variations in DNA sequence that occur on
average once every 300 to 500 bp. Variations within exon
sequences can lead to observable changes, such as differences in
eye color, blood type, and disease susceptibility. Most
variations occur within introns and have little or no effect on
an organism's appearance or function, yet they are detectable at
the DNA level and can be used as markers. Examples of these types
of markers include (1) restriction fragment length polymorphisms
(RFLPs), which reflect sequence variations in DNA sites that can
be cleaved by DNA restriction enzymes (see box, p. 203), and
(2) variable number of tandem repeat sequences, which are short
repeated sequences that vary in the number of repeated units and,
therefore, in length (a characteristic easily measured). The
human genetic linkage map is constructed by observing how
frequently two markers are inherited together.
Two markers located near each other on the same chromosome will
tend to be passed together from parent to child. During the
normal production of sperm and egg cells, DNA strands
occasionally break and rejoin in different places on the same
chromosome or on the other copy of the same chromosome (i.e., the
homologous chromosome). This process (called meiotic
recombination) can result in the separation of two markers
originally on the same chromosome (Fig. 8). The closer the
markers are to each other (the more "tightly linked"), the less
likely a recombination event will fall between and separate them.
Recombination frequency thus provides an estimate of the distance
between two markers.
On the genetic map, distances between markers are measured in
terms of centimorgans (cM), named after the American geneticist
Thomas Hunt Morgan. Two markers are said to be 1 cM apart if they
are separated by recombination 1% of the time. A genetic distance
of 1 cM is roughly equal to a physical distance of 1 million bp
(1 Mb). The current resolution of most human genetic map regions
is about 10 Mb.
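As a minimal sketch of the arithmetic above, the following Python fragment estimates a genetic distance from offspring counts and converts it to an approximate physical distance. The function names and the counts are hypothetical. Treating the recombinant percentage directly as centimorgans is reasonable only for closely linked markers, and the 1 cM to 1 Mb conversion is the rough rule of thumb quoted in the text, not a constant.

```python
def genetic_distance_cm(recombinants: int, total_offspring: int) -> float:
    """Estimate map distance in cM as the percentage of recombinant offspring."""
    if total_offspring <= 0:
        raise ValueError("total_offspring must be positive")
    return 100.0 * recombinants / total_offspring

def approx_physical_distance_bp(cm: float, bp_per_cm: float = 1_000_000) -> float:
    """Convert cM to bp using the rough 1 cM ~ 1 Mb rule of thumb."""
    return cm * bp_per_cm

# Hypothetical family data: 7 recombinants among 200 informative offspring.
cm = genetic_distance_cm(recombinants=7, total_offspring=200)
print(cm)                               # 3.5
print(approx_physical_distance_bp(cm))  # 3500000.0
```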
The value of the genetic map is that an inherited disease can be
located on the map by following the inheritance of a DNA marker
present in affected individuals (but absent in unaffected
individuals), even though the molecular basis of the disease may
not yet be understood nor the responsible gene identified.
Genetic maps have been used to find the exact chromosomal
location of several important disease genes, including those for
cystic fibrosis, sickle cell disease, Tay-Sachs disease, fragile X
syndrome, and myotonic dystrophy.
One short-term goal of the genome project is to develop a
high-resolution genetic map (2 to 5 cM); recent consensus maps of
some chromosomes have averaged 7 to 10 cM between genetic
markers. Genetic mapping resolution has been increased through
the application of recombinant DNA technology, including in vitro
radiation-induced chromosome fragmentation and cell fusions
(joining human cells with those of other species to form hybrid
cells) to create panels of cells with specific and varied human
chromosomal components. Assessing the frequency of marker sites
remaining together after radiation-induced DNA fragmentation can
establish the order and distance between the markers. Because
only a single copy of a chromosome is required for analysis, even
nonpolymorphic markers are useful in radiation hybrid mapping.
[In meiotic mapping (described above), two copies of a chromosome
must be distinguished from each other by polymorphic markers.]
Restriction Enzymes: Microscopic Scalpels
Isolated from various bacteria, restriction enzymes
recognize short DNA sequences and cut the DNA molecules at
those specific sites. (A natural biological function of
these enzymes is to protect bacteria by attacking viral and
other foreign DNA.) Some restriction enzymes (rare-cutters)
cut the DNA very infrequently, generating a small number of
very large fragments (several thousand to a million bp).
Most enzymes cut DNA more frequently, thus generating a
large number of small fragments (less than a hundred to more
than a thousand bp).
On average, restriction enzymes with
* 4-base recognition sites will yield pieces 256 bases long,
* 6-base recognition sites will yield pieces 4000 bases long, and
* 8-base recognition sites will yield pieces 64,000 bases long.
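The averages in the list above follow from a simple probability argument, sketched here in Python. The sketch assumes a random DNA sequence in which the four bases are equally likely, so a specific n-base site occurs about once every 4^n bases; real genomes deviate from this. The function name is illustrative.

```python
def expected_fragment_length(site_length: int, alphabet_size: int = 4) -> int:
    """Expected spacing between occurrences of one specific recognition site."""
    return alphabet_size ** site_length

for n in (4, 6, 8):
    print(n, expected_fragment_length(n))
# 4 256
# 6 4096
# 8 65536
```

The figures of 4000 and 64,000 quoted above are rounded forms of 4096 and 65,536.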
Since hundreds of different restriction enzymes have been
characterized, DNA can be cut into many different small
Different types of physical maps vary in their degree of
resolution. The lowest-resolution physical map is the chromosomal
(sometimes called cytogenetic) map, which is based on the
distinctive banding patterns observed by light microscopy of
stained chromosomes. A cDNA map shows the locations of expressed
DNA regions (exons) on the chromosomal map. The more detailed
cosmid contig map depicts the order of overlapping DNA fragments
spanning the genome. A macrorestriction map describes the order
and distance between enzyme cutting (cleavage) sites. The
highest-resolution physical map is the complete elucidation of
the DNA base-pair sequence of each chromosome in the human
genome. Physical maps are described in greater detail below.
Low-Resolution Physical Mapping
Chromosomal map. In a chromosomal map, genes or other
identifiable DNA fragments are assigned to their respective
chromosomes, with distances measured in base pairs. These markers
can be physically associated with particular bands (identified by
cytogenetic staining) primarily by in situ hybridization, a
technique that involves tagging the DNA marker with an observable
label (e.g., one that fluoresces or is radioactive). The location
of the labeled probe can be detected after it binds to its
complementary DNA strand in an intact chromosome.
As with genetic linkage mapping, chromosomal mapping can be used
to locate genetic markers defined by traits observable only in
whole organisms. Because chromosomal maps are based on estimates
of physical distance, they are considered to be physical maps.
The number of base pairs within a band can only be estimated.
Until recently, even the best chromosomal maps could be used to
locate a DNA fragment only to a region of about 10 Mb, the size
of a typical band seen on a chromosome. Improvements in
fluorescence in situ hybridization (FISH) methods allow
orientation of DNA sequences that lie as close as 2 to 5 Mb.
Modifications to in situ hybridization methods, using chromosomes
at a stage of the cell cycle (interphase) when they are less
compact, increase map resolution to around 100,000 bp. Further
banding refinement might allow chromosomal bands to be associated
with specific amplified DNA fragments, an improvement that could
be useful in analyzing observable physical traits associated with
cDNA map. A cDNA map shows the positions of expressed DNA regions
(exons) relative to particular chromosomal regions or bands.
(Expressed DNA regions are those transcribed into mRNA.) cDNA is
synthesized in the laboratory using the mRNA molecule as a
template; base-pairing rules are followed (i.e., an A on the mRNA
molecule will pair with a T on the new DNA strand). This cDNA can
then be mapped to genomic regions.
Because they represent expressed genomic regions, cDNAs are
thought to identify the parts of the genome with the most
biological and medical significance. A cDNA map can provide the
chromosomal location for genes whose functions are currently
unknown. For disease-gene hunters, the map can also suggest a set
of candidate genes to test when the approximate location of a
disease gene has been mapped by genetic linkage techniques.
High-Resolution Physical Mapping
The two current approaches to high-resolution physical mapping
are termed "top-down" (producing a macrorestriction map) and
"bottom-up" (resulting in a contig map). With either strategy
(described below) the maps represent ordered sets of DNA
fragments that are generated by cutting genomic DNA with
restriction enzymes (see previously discussed Restriction
Enzymes). The fragments are then amplified by cloning or by
polymerase chain reaction (PCR) methods (see DNA Amplification
below). Electrophoretic techniques are used to separate the
fragments according to size into different bands, which can be
visualized by direct DNA staining or by hybridization with DNA
probes of interest. The use of purified chromosomes separated
either by flow sorting from human cell lines or in hybrid cell
lines allows a single chromosome to be mapped (see Separating
A number of strategies can be used to reconstruct the original
order of the DNA fragments in the genome. Many approaches make
use of the ability of single strands of DNA and/or RNA to
hybridize, that is, to form double-stranded segments by hydrogen bonding
between complementary bases. The extent of sequence homology
between the two strands can be inferred from the length of the
double-stranded segment. Fingerprinting uses restriction map data
to determine which fragments have a specific sequence
(fingerprint) in common and therefore overlap. Another approach
uses linking clones as probes for hybridization to chromosomal
DNA cut with the same restriction enzyme.
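The fingerprinting idea can be illustrated with a small Python sketch that flags pairs of clones sharing restriction fragments. The clone names, fragment sizes, and the threshold of two shared fragments are all hypothetical. Exact size matching is a simplification, since measured fragment sizes carry experimental error.

```python
from itertools import combinations

def overlapping_pairs(fingerprints: dict, min_shared: int = 2) -> list:
    """Return clone pairs whose fingerprints share at least min_shared fragments."""
    pairs = []
    for a, b in combinations(sorted(fingerprints), 2):
        shared = fingerprints[a] & fingerprints[b]  # fragment sizes in common
        if len(shared) >= min_shared:
            pairs.append((a, b))
    return pairs

# Each fingerprint is the set of restriction fragment sizes (bp) for one clone.
clones = {
    "cloneA": {1200, 3400, 560, 7800},
    "cloneB": {3400, 7800, 910, 2250},
    "cloneC": {910, 2250, 4100},
}
print(overlapping_pairs(clones))
# [('cloneA', 'cloneB'), ('cloneB', 'cloneC')]
```

Here the inferred overlaps suggest the order cloneA, cloneB, cloneC along the chromosome.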
Macrorestriction maps: Top-down mapping. In top-down mapping, a
single chromosome is cut (with rare-cutter restriction enzymes)
into large pieces, which are ordered and subdivided; the smaller
pieces are then mapped further. The resulting macrorestriction
maps depict the order of and distance between sites at which
rare-cutter enzymes cleave (Fig. 9a). This approach yields maps
with more continuity and fewer gaps between fragments than contig
maps (see below), but map resolution is lower and may not be
useful in finding particular genes; in addition, this strategy
generally does not produce long stretches of mapped sites.
Currently, this approach allows DNA pieces to be located in
regions measuring about 100,000 bp to 1 Mb.
The development of pulsed-field gel (PFG) electrophoretic methods
has improved the mapping and cloning of large DNA molecules.
While conventional gel electrophoretic methods separate pieces
less than 40 kb (1 kb = 1000 bases) in size, PFG separates
molecules up to 10 Mb, allowing the application of both
conventional and new mapping methods to larger genomic regions.
Contig maps: Bottom-up mapping. The bottom-up approach involves
cutting the chromosome into small pieces, each of which is cloned
and ordered. The ordered fragments form contiguous DNA blocks
(contigs). Currently, the resulting "library" of clones varies in
size from 10,000 bp to 1 Mb (Fig. 9b). An advantage of this
approach is the accessibility of these stable clones to other
researchers. Contig construction can be verified by FISH, which
localizes cosmids to specific regions within chromosomal bands.
Contig maps thus consist of a linked library of small overlapping
clones representing a complete chromosomal segment. While useful
for finding genes localized to a small area (under 2 Mb), contig
maps are difficult to extend over large stretches of a chromosome
because not all regions are clonable. DNA probe techniques can be
used to fill in the gaps, but they are time consuming. Figure 10
is a diagram relating the different types of maps.
Technological improvements now make possible the cloning of large
DNA pieces, using artificially constructed chromosome vectors
that carry human DNA fragments as large as 1 Mb. These vectors
are maintained in yeast cells as artificial chromosomes (YACs).
(For more explanation, see DNA Amplification below.) Before YACs
were developed, the largest cloning vectors (cosmids) carried
inserts of only 20 to 40 kb. YAC methodology drastically reduces
the number of clones to be ordered; many YACs span entire human
genes. A more detailed map of a large YAC insert can be produced
by subcloning, a process in which fragments of the original
insert are cloned into smaller-insert vectors. Because some YAC
regions are unstable, large-capacity bacterial vectors (i.e.,
those that can accommodate large inserts) are also being
Pioneered at Los Alamos National Laboratory (LANL), flow
sorting employs flow cytometry to separate, according to
size, chromosomes isolated from cells during cell division
when they are condensed and stable. As the chromosomes flow
singly past a laser beam, they are differentiated by
analyzing the amount of DNA present, and individual
chromosomes are directed to specific collection tubes.
Somatic cell hybridization
In somatic cell hybridization, human cells and rodent tumor
cells are fused (hybridized); over time, after the
chromosomes mix, human chromosomes are preferentially lost
from the hybrid cell until only one or a few remain. Those
individual hybrid cells are then propagated and maintained
as cell lines containing specific human chromosomes.
Improvements to this technique have generated a number of
hybrid cell lines, each with a specific single human
The ultimate physical map of the human genome is the complete DNA
sequence: the determination of all base pairs on each chromosome.
The completed map will provide biologists with a Rosetta stone
for studying human biology and enable medical researchers to
begin to unravel the mechanisms of inherited diseases. Much
effort continues to be spent locating genes; if the full sequence
were known, emphasis could shift to determining gene function.
The Human Genome Project is creating research tools for
21st-century biology, when the goal will be to understand the
sequence and the functions of the genes residing in it.
Achieving the goals of the Human Genome Project will require
substantial improvements in the rate, efficiency, and reliability
of standard sequencing procedures. While technological advances
are leading to the automation of standard DNA purification,
separation, and detection steps, efforts are also focusing on the
development of entirely new sequencing methods that may eliminate
some of these steps. Sequencing procedures currently involve
first subcloning DNA fragments from a cosmid or bacteriophage
library into special sequencing vectors that carry shorter pieces
of the original cosmid fragments (Fig. 11). The next step is to
make the subcloned fragments into sets of nested fragments
differing in length by one nucleotide, so that the specific base
at the end of each successive fragment is detectable after the
fragments have been separated by gel electrophoresis. Current
sequencing technologies are discussed later.
DNA Amplification: Cloning and Polymerase Chain Reaction
Cloning (in vivo DNA amplification)
Cloning involves the use of recombinant DNA technology to
propagate DNA fragments inside a foreign host. The fragments are
usually isolated from chromosomes using restriction enzymes and
then united with a carrier (a vector). Following introduction
into suitable host cells, the DNA fragments can then be
reproduced along with the host cell DNA. Vectors are DNA
molecules originating from viruses, bacteria, and yeast cells.
They accommodate various sizes of foreign DNA fragments ranging
from 12,000 bp for bacterial vectors (plasmids and cosmids) to 1
Mb for yeast vectors (yeast artificial chromosomes). Bacteria are
most often the hosts for these inserts, but yeast and mammalian
cells are also used.
Cloning procedures provide unlimited material for experimental
study. A random (unordered) set of cloned DNA fragments is called
a library. Genomic libraries are sets of overlapping fragments
encompassing an entire genome. Also available are
chromosome-specific libraries, which consist of fragments derived
from source DNA enriched for a particular chromosome. (See
Separating Chromosomes, above.)
PCR (in vitro DNA amplification)
Described as being to genes what Gutenberg's printing press was
to the written word, PCR can amplify a desired DNA sequence of
any origin (virus, bacteria, plant, or human) hundreds of
millions of times in a matter of hours, a task that would have
required several days with recombinant technology. PCR is
especially valuable because the reaction is highly specific,
easily automated, and capable of amplifying minute amounts of
sample. For these reasons, PCR has also had a major impact on
clinical medicine, genetic disease diagnostics, forensic science,
and evolutionary biology.
PCR is a process based on a specialized polymerase enzyme, which
can synthesize a complementary strand to a given DNA strand in a
mixture containing the 4 DNA bases and 2 DNA fragments (primers,
each about 20 bases long) flanking the target sequence. The
mixture is heated to separate the strands of double-stranded DNA
containing the target sequence and then cooled to allow (1) the
primers to find and bind to their complementary sequences on the
separated strands and (2) the polymerase to extend the primers
into new complementary strands. Repeated heating and cooling
cycles multiply the target DNA exponentially, since each new
double strand separates to become two templates for further
synthesis. In about 1 hour, 20 PCR cycles can amplify the target
by a millionfold.
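The exponential growth described above can be checked with a short Python sketch. It assumes an idealized reaction in which each cycle multiplies the target by (1 + efficiency), with efficiency = 1.0 meaning perfect doubling. The function names and the efficiency parameter are illustrative assumptions.

```python
def amplification_factor(cycles: int, efficiency: float = 1.0) -> float:
    """Copies per starting molecule after a number of cycles."""
    return (1.0 + efficiency) ** cycles

def cycles_needed(target_factor: float, efficiency: float = 1.0) -> int:
    """Smallest number of cycles that reaches at least target_factor."""
    if efficiency <= 0:
        raise ValueError("efficiency must be positive")
    cycles, factor = 0, 1.0
    while factor < target_factor:
        factor *= 1.0 + efficiency
        cycles += 1
    return cycles

print(amplification_factor(20))   # 1048576.0
print(cycles_needed(1_000_000))   # 20
```

With perfect doubling, 20 cycles give 2^20, or about a millionfold, matching the figure in the text; lower efficiencies require more cycles.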
Current Sequencing Technologies
The two basic sequencing approaches, Maxam-Gilbert and Sanger,
differ primarily in the way the nested DNA fragments are
produced. Both methods work because gel electrophoresis produces
very high resolution separations of DNA molecules; even fragments
that differ in size by only a single nucleotide can be resolved.
Almost all steps in these sequencing methods are now automated.
Maxam-Gilbert sequencing (also called the chemical degradation
method) uses chemicals to cleave DNA at specific bases, resulting
in fragments of different lengths. A refinement to the
Maxam-Gilbert method known as multiplex sequencing enables
investigators to analyze about 40 clones on a single DNA
sequencing gel. Sanger sequencing (also called the chain
termination or dideoxy method) involves using an enzymatic
procedure to synthesize DNA chains of varying length in four
different reactions, stopping the DNA replication at positions
occupied by one of the four bases, and then determining the
resulting fragment lengths (Fig. 12).
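The logic of reading a sequence from chain-terminated fragments can be sketched as follows. This is an idealized simulation, not a description of any instrument: it assumes that every possible termination product is present and that the gel resolves fragments differing by a single nucleotide. Function names and the example sequence are hypothetical.

```python
BASES = "ACGT"

def chain_termination_fragments(new_strand: str) -> dict:
    """For each of the four reactions, list the fragment lengths it produces."""
    lanes = {base: [] for base in BASES}
    for length, base in enumerate(new_strand, start=1):
        # A fragment of this length ends at `base`, so it appears in that lane.
        lanes[base].append(length)
    return lanes

def read_gel(lanes: dict) -> str:
    """Read the sequence by ordering all fragments from shortest to longest."""
    base_at_length = {
        length: base for base, lengths in lanes.items() for length in lengths
    }
    return "".join(base_at_length[length] for length in sorted(base_at_length))

lanes = chain_termination_fragments("GATTACA")
print(lanes)            # {'A': [2, 5, 7], 'C': [6], 'G': [1], 'T': [3, 4]}
print(read_gel(lanes))  # GATTACA
```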
These first-generation gel-based sequencing technologies are now
being used to sequence small regions of interest in the human
genome. Although investigators could use existing technology to
sequence whole chromosomes, time and cost considerations make
large-scale sequencing projects of this nature impractical. The
smallest human chromosome (Y) contains 50 Mb; the largest
(chromosome 1) has 250 Mb. The largest continuous DNA sequence
obtained thus far, however, is approximately 350,000 bp, and the
best available equipment can sequence only 50,000 to 100,000
bases per year at an approximate cost of $1 to $2 per base. At
that rate, an unacceptable 30,000 work-years and at least
$3 billion would be required for sequencing alone.
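The estimate above is straightforward arithmetic, reproduced here as a Python sketch. It uses the 3 billion bp genome size and the throughput and cost ranges quoted in the text; the function names are illustrative.

```python
GENOME_BP = 3_000_000_000  # approximate size of the human genome, from the text

def work_years(bases_per_year: int, genome_bp: int = GENOME_BP) -> float:
    """Years of effort needed at a given sequencing rate."""
    return genome_bp / bases_per_year

def total_cost(dollars_per_base: float, genome_bp: int = GENOME_BP) -> float:
    """Total sequencing cost at a given price per base."""
    return genome_bp * dollars_per_base

print(work_years(100_000))  # 30000.0
print(total_cost(1.0))      # 3000000000.0
```

The quoted 30,000 work-years and $3 billion correspond to the favorable ends of the ranges (100,000 bases per year and $1 per base); at 50,000 bases per year and $2 per base the figures double.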
Sequencing Technologies Under Development
A major focus of the Human Genome Project is the development of
automated sequencing technology that can accurately sequence
100,000 or more bases per day at a cost of less than $0.50 per
base. Specific goals include the development of sequencing and
detection schemes that are faster and more sensitive, accurate,
and economical. Many novel sequencing technologies are now being
explored, and the most promising ones will eventually be
optimized for widespread use.
Second-generation (interim) sequencing technologies will enable
speed and accuracy to increase by an order of magnitude (i.e., 10
times greater) while lowering the cost per base. Some important
disease genes will be sequenced with such technologies as (1)
high-voltage capillary and ultrathin electrophoresis to increase
fragment separation rate and (2) use of resonance ionization
spectroscopy to detect stable isotope labels.
Third-generation gel-less sequencing technologies, which aim to
increase efficiency by several orders of magnitude, are expected
to be used for sequencing most of the human genome. These
developing technologies include (1) enhanced fluorescence
detection of individual labeled bases in flow cytometry, (2)
direct reading of the base sequence on a DNA strand with the use
of scanning tunneling or atomic force microscopies, (3) enhanced
mass spectrometric analysis of DNA sequence, and (4) sequencing
by hybridization to short panels of nucleotides of known
sequence. Pilot large-scale sequencing projects will provide
opportunities to improve current technologies and will reveal
challenges investigators may encounter in larger-scale efforts.
Partial Sequencing To Facilitate Mapping, Gene Identification
Correlating mapping data from different laboratories has been a
problem because of differences in generating, isolating, and
mapping DNA fragments. A common reference system designed to meet
these challenges uses partially sequenced unique regions (200 to
500 bp) to identify clones, contigs, and long stretches of
sequence. Called sequence tagged sites (STSs), these short
sequences have become standard markers for physical mapping.
Because coding sequences of genes represent most of the
potentially useful information content of the genome (but are
only a fraction of the total DNA), some investigators have begun
partial sequencing of cDNAs instead of random genomic DNA. (cDNAs
are derived from mRNA sequences, which are the transcription
products of expressed genes.) In addition to providing unique
markers, these partial sequences [termed expressed sequence tags
(ESTs)] also identify expressed genes. This strategy can thus
provide a means of rapidly identifying most human genes. Other
applications of the EST approach include determining locations of
genes along chromosomes and identifying coding regions in genomic
End Games: Completing Maps and Sequences; Finding Specific Genes
Starting maps and sequences is relatively simple; finishing them
will require new strategies or a combination of existing methods.
After a sequence is determined using the methods described above,
the task remains to fill in the many large gaps left by current
mapping methods. One approach is single-chromosome
microdissection, in which a piece is physically cut from a
chromosomal region of particular interest, broken up into smaller
pieces, and amplified by PCR or cloning (see DNA Amplification
above). These fragments can then be mapped and sequenced by the
methods previously described.
Chromosome walking, one strategy for filling in gaps, involves
hybridizing a primer of known sequence to a clone from an
unordered genomic library and synthesizing a short complementary
strand (called "walking" along a chromosome). The complementary
strand is then sequenced and its end used as the next primer for
further walking; in this way the adjacent, previously unknown,
region is identified and sequenced. The chromosome is thus
systematically sequenced from one end to the other. Because
primers must be synthesized chemically, a disadvantage of this
technique is the large number of different primers needed to walk
a long distance. Chromosome walking is also used to locate
specific genes by sequencing the chromosomal segments between
markers that flank the gene of interest (Fig. 13).
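The walking procedure can be mimicked with a toy Python simulation. The chromosome string here stands in for DNA that would actually be revealed by each sequencing reaction, and the read length, primer length, and sequence are hypothetical. The sketch tracks position directly and so ignores the practical requirement that each primer be unique in the genome.

```python
def walk(chromosome: str, start_primer: str, read_length: int, primer_length: int):
    """Simulate primer walking; return the sequence read and the primers used."""
    if read_length <= 0 or primer_length <= 0:
        raise ValueError("read_length and primer_length must be positive")
    position = chromosome.find(start_primer)
    if position < 0:
        raise ValueError("starting primer not found")
    known = start_primer
    primers = [start_primer]
    end = position + len(start_primer)
    while end < len(chromosome):
        # "Sequence" the next stretch downstream of the current primer.
        new_read = chromosome[end:end + read_length]
        known += new_read
        end += len(new_read)
        if end < len(chromosome):
            # The end of the newly read region becomes the next primer.
            primers.append(known[-primer_length:])
    return known, primers

chromosome = "ACGTTGCAAGGCTTAACCGGTA"
sequence, primers = walk(chromosome, "ACGT", read_length=6, primer_length=4)
print(sequence == chromosome)  # True
print(primers)                 # ['ACGT', 'CAAG', 'TTAA']
```

One new primer is needed for every read, which illustrates why a long walk requires many chemically synthesized primers.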
The current human genetic map has about 1000 markers, or 1 marker
spaced every 3 million bp; an estimated 100 genes lie between
each pair of markers. Higher-resolution genetic maps have been
made in regions of particular interest. New genes can be located
by combining genetic and physical map information for a region.
The genetic map basically describes gene order. Rough information
about gene location is sometimes available also, but these data
must be used with caution because recombination is not equally
likely at all places on the chromosome. Thus the genetic map,
compared to the physical map, stretches in some places and
compresses in others, as though it were drawn on a rubber band.
The degree of difficulty in finding a disease gene of interest
depends largely on what information is already known about the
gene and, especially, on what kind of DNA alterations cause the
disease. Spotting the disease gene is very difficult when disease
results from a single altered DNA base; sickle cell anemia is an
example of such a case, as are probably most major human
inherited diseases. When disease results from a large DNA
rearrangement, this anomaly can usually be detected as
alterations in the physical map of the region or even by direct
microscopic examination of the chromosome. The location of these
alterations pinpoints the site of the gene.
Identifying the gene responsible for a specific disease without a
map is analogous to finding a needle in a haystack. Actually,
finding the gene is even more difficult, because even close up,
the gene still looks like just another piece of hay. However,
maps give clues on where to look; the finer the map's resolution,
the fewer pieces of hay to be tested.
Once the neighborhood of a gene of interest has been identified,
several strategies can be used to find the gene itself. An
ordered library of the gene neighborhood can be constructed if
one is not already available. This library provides DNA fragments
that can be screened for additional polymorphisms, improving the
genetic map of the region and further restricting the possible
gene location. In addition, DNA fragments from the region can be
used as probes to search for DNA sequences that are expressed
(transcribed to RNA) or conserved among individuals. Most genes
will have such sequences. Then individual gene candidates must be
examined. For example, a gene responsible for liver disease is
likely to be expressed in the liver and less likely in other
tissues or organs. This type of evidence can further limit the
search. Finally, a suspected gene may need to be sequenced in
both healthy and affected individuals. A consistent pattern of
DNA variation when these two samples are compared will show that
the gene of interest has very likely been found. The ultimate
proof is to correct the suspected DNA alteration in a cell and
show that the cell's behavior reverts to normal.
Model Organism Research
Most mapping and sequencing technologies were developed from
studies of nonhuman genomes, notably those of the bacterium
Escherichia coli, the yeast Saccharomyces cerevisiae, the fruit
fly Drosophila melanogaster, the roundworm Caenorhabditis
elegans, and the laboratory mouse Mus musculus. These simpler
systems provide excellent models for developing and testing the
procedures needed for studying the much more complex human
A large amount of genetic information has already been derived
from these organisms, providing valuable data for the analysis of
normal gene regulation, genetic diseases, and evolutionary
processes. Physical maps have been completed for E. coli, and
extensive overlapping clone sets are available for S. cerevisiae
and C. elegans. In addition, sequencing projects have been
initiated by the NIH genome program for E. coli, S. cerevisiae,
and C. elegans.
Mouse genome research will provide much significant comparative
information because of the many biological and genetic
similarities between mouse and man. Comparisons of human and
mouse DNA sequences will reveal areas that have been conserved
during evolution and are therefore important. An extensive
database of mouse DNA sequences will allow counterparts of
particular human genes to be identified in the mouse and
extensively studied. Conversely, information on genes first found
to be important in the mouse will lead to associated human
studies. The mouse genetic map, based on morphological markers,
has already led to many insights into human biology. Mouse models
are being developed to explore the effects of mutations causing
human diseases, including diabetes, muscular dystrophy, and
several cancers. A genetic map based on DNA markers is presently
being constructed, and a physical map is planned to allow direct
comparison with the human physical map.
Informatics: Data Collection and Interpretation
Collecting and Storing Data
The reference map and sequence generated by genome research will
be used as a primary information source for human biology and
medicine far into the future. The vast amount of data produced
will first need to be collected, stored, and distributed. If
compiled in books, the data would fill an estimated 200 volumes
the size of a Manhattan telephone book (at 1000 pages each), and
reading it would require 26 years working around the clock (Fig.
Because handling this amount of data will require extensive use
of computers, database development will be a major focus of the
Human Genome Project. The present challenge is to improve
database design, software for database access and manipulation,
and data-entry procedures to compensate for the varied computer
procedures and systems used in different laboratories. Databases
need to be designed that will accurately represent map
information (linkage, STSs, physical location, disease loci) and
sequences (genomic, cDNAs, proteins) and link them to each other
and to bibliographic text databases of the scientific and medical
New tools will also be needed for analyzing the data from genome
maps and sequences. Recognizing where genes begin and end and
identifying their exons, introns, and regulatory sequences may
require extensive comparisons with sequences from related species
such as the mouse to search for conserved similarities
(homologies). Searching a database for a particular DNA sequence
may uncover these homologous sequences in a known gene from a
model organism, revealing insights into the function of the
corresponding human gene.
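A database search of this kind can be sketched as a local alignment scan. The minimal example below scores a query against each stored sequence with the Smith-Waterman local alignment recurrence and ranks the results. The scoring values, function names, and sequences are illustrative assumptions, and practical search tools rely on faster heuristics.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best Smith-Waterman local alignment score between a and b."""
    prev = [0] * (len(b) + 1)   # scores for the previous row of the matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0]              # first column is always zero
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, so scores never drop below 0.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def search(query, database):
    """Rank database entries (name -> sequence) by similarity to the query."""
    hits = [(local_alignment_score(query, seq), name) for name, seq in database.items()]
    return sorted(hits, reverse=True)


# Hypothetical entries: one contains the query, the other does not.
demo_db = {"mouse_gene": "TTACGTACGGA", "unrelated": "GGGGGGGG"}
ranked = search("ACGTACG", demo_db)
```

Here the highest-ranked entry is the one sharing a conserved stretch with the query, which is the kind of hit that would prompt a closer look at the model-organism gene.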
Correlating sequence information with genetic linkage data and
disease gene research will reveal the molecular basis for human
variation. If a newly identified gene is found to code for a
flawed protein, the altered protein must be compared with the
normal version to identify the specific abnormality that causes
disease. Once the error is pinpointed, researchers must try to
determine how to correct it in the human body, a task that will
require knowledge about how the protein functions and in which
cells it is active.
Correct protein function depends on the three-dimensional (3-D),
or folded, structure the proteins assume in biological
environments; thus, understanding protein structure will be
essential in determining gene function. DNA sequences will be
translated into amino acid sequences, and researchers will try to
make inferences about functions either by comparing protein
sequences with each other or by comparing their specific 3-D
structures (Fig. 15).
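The translation step can be illustrated with the standard genetic code. The sketch below translates a DNA coding strand codon by codon and computes a simple fraction of identical positions between two equal-length protein sequences. The function names and example sequences are assumptions made for illustration, and real comparisons would align the sequences rather than match them position by position.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, listed in TCAG order for first, second, and third base.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a coding-strand DNA sequence until the first stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")  # 'X' marks an unrecognized codon
        if aa == "*":                            # stop codon ends translation
            break
        protein.append(aa)
    return "".join(protein)


def fraction_identical(p, q):
    """Fraction of positions at which two equal-length sequences agree."""
    if not p or len(p) != len(q):
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(p, q)) / len(p)
```

For example, translate("ATGGCCTAA") reads ATG (methionine) and GCC (alanine) and then stops at TAA.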
Because the 3-D structure patterns (motifs) that protein
molecules assume are much more evolutionarily conserved than
amino acid sequences, this type of homology search could prove
more fruitful. Particular motifs may serve similar functions in
several different proteins, information that would be valuable in
genome analyses. Currently, however, only a few protein motifs
can be recognized at the sequence level. Continued development of
analytic capabilities to facilitate grouping protein sequences
into motif families will make homology searches more successful.
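Sequence-level motif recognition is often expressed as a pattern match. The sketch below groups protein sequences by whether they contain the N-glycosylation consensus pattern N-{P}-[ST]-{P}, written here as a regular expression. The protein names and sequences are made up for illustration, and a sequence pattern of this kind is only a crude stand-in for the 3-D structural motifs discussed above.

```python
import re

# N, then any residue except P, then S or T, then any residue except P.
# The lookahead lets overlapping occurrences be reported as well.
N_GLYCOSYLATION = re.compile(r"(?=(N[^P][ST][^P]))")


def motif_positions(protein, pattern=N_GLYCOSYLATION):
    """Return the 0-based start positions of every motif occurrence."""
    return [m.start() for m in pattern.finditer(protein.upper())]


def group_by_motif(proteins, pattern=N_GLYCOSYLATION):
    """Split proteins (name -> sequence) into those with and without the motif."""
    family, others = [], []
    for name, seq in proteins.items():
        (family if motif_positions(seq, pattern) else others).append(name)
    return sorted(family), sorted(others)


# Hypothetical sequences used only to exercise the functions.
demo = {"protA": "MKNGSAPLNPTQ", "protB": "MKLVPAG"}
with_motif, without_motif = group_by_motif(demo)
```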
The Genome Data Base (GDB), located at Johns Hopkins University
(Baltimore, Maryland), provides location, ordering, and distance
information for human genetic markers, probes, and contigs linked
to known human genetic disease. GDB is presently working on
incorporating physical mapping data. Also at Hopkins is the
Online Mendelian Inheritance in Man database, a catalog of
inherited human traits and diseases.
The Human and Mouse Probes and Libraries Database (located at the
American Type Culture Collection in Rockville, Maryland) and the
GBASE mouse database (located at Jackson Laboratory, Bar Harbor,
Maine) include data on RFLPs, chromosomal assignments, and probes
from the laboratory mouse.
Nucleic Acids (DNA and RNA)
GenBank, the European Molecular Biology Laboratory (EMBL)
sequence database, and the DNA Database of Japan (DDBJ) house
over 70 Mb of sequence from more than 2500 different organisms.
Compiled from both direct submissions and journal scans, GenBank
is supported at IntelliGenetics (Mountain View, California) and
LANL through a contract from the NIH National Institute of
General Medical Sciences. Although responsibility for GenBank
will move to the National Center for Biotechnology Information
(NCBI) of the National Library of Medicine in September 1992,
LANL will continue to handle direct data submissions from
authors. International collaborations with EMBL and DDBJ will
also continue. NCBI is also developing GenInfo, a data archive
that will eventually offer integrated access to other databases.
The major protein sequence databases are the Protein
Identification Resource (National Biomedical Research
Foundation), Swissprot, and GenPept (both distributed with
GenBank). In addition to sequence information, they contain
information on protein motifs and other features of protein
Impact of the Human Genome Project
The atlas of the human genome will revolutionize medical practice
and biological research into the 21st century and beyond. All
human genes will eventually be found, and accurate diagnostics
will be developed for most inherited diseases. In addition,
animal models for human disease research will be more easily
developed, facilitating the understanding of gene function in
health and disease.
Researchers have already identified single genes associated with
a number of diseases, such as cystic fibrosis, Duchenne muscular
dystrophy, myotonic dystrophy, neurofibromatosis, and
retinoblastoma. As research progresses, investigators will also
uncover the mechanisms for diseases caused by several genes or by
a gene interacting with environmental factors. Genetic
susceptibilities have been implicated in many major disabling and
fatal diseases including heart disease, stroke, diabetes, and
several kinds of cancer. The identification of these genes and
their proteins will pave the way to more effective therapies and
preventive measures. Investigators determining the underlying
biology of genome organization and gene regulation will also
begin to understand how humans develop from single cells to
adults, why this process sometimes goes awry, and what changes
take place as people age.
New technologies developed for genome research will also find
myriad applications in industry, as well as in projects to map
(and ultimately improve) the genomes of economically important
farm animals and crops.
While human genome research itself does not pose any new ethical
dilemmas, the use of data arising from these studies presents
challenges that need to be addressed before the data accumulate
significantly. To assist in policy development, the ethics
component of the Human Genome Project is funding conferences and
research projects to identify and consider relevant issues, as
well as activities to promote public awareness of these topics.
Appendix B: Conferences, Meetings, and Workshops Sponsored by DOE
4/89 Second Cold Spring Harbor Meeting on Genome Mapping and
Sequencing: Cold Spring Harbor, NY
6/89 Chromosome 16 Workshop: New Haven, CT
10/89 First Annual Genome Sequencing Conference: Wolf Trap,
12/89 Large Insert Cloning Workshop: Houston, TX
12/89 Human X Chromosome Workshop: Houston, TX
2/90 Chromosome 3 Workshop: San Antonio, TX
3/90 First Conference on Genetics, Religion, and Ethics: Houston,
4/90 Application of Mass Spectrometry to DNA Sequencing Workshop:
4/90 Chromosome 21 Workshop: Bethesda, MD
4/90 Workshop on Mapping Human Chromosome 22: Paris
8/90 DOE-NIH Annual Planning and Evaluation Retreat: Hunt Valley,
8/90 Chromosome 19 Workshop: Charleston, SC
9/90 Genome Sequencing Conference II: Hilton Head, SC
9/90 First International Workshop on Human Chromosome 5: London
11/90 Fourth International Workshop on Mouse Genome Mapping:
1/91 Second X Chromosome Workshop: Oxford, England
2/91 Second DOE Contractor-Grantee Workshop: Santa Fe, NM
3/91 Chromosome 17 Workshop: Salt Lake City, UT
4/91 Workshop on Computational Molecular Biology: Seattle, WA
4/91 Chromosome 3 Workshop: Denver, CO
4/91 Chromosome 21 Workshop: Denver, CO
5/91 Sequencing by Hybridization Workshop: Gaithersburg, MD
5/91 Chromosome 11 Workshop: Paris
6/91 Workshop on Open Problems of Computational Molecular
Biology: Telluride, CO
6/91 Chromosome 4 Workshop: Philadelphia, PA
9/91 DOE-NIH Annual Planning and Evaluation Retreat: Lafayette,
9/91 ELSI Working Group Meeting on Privacy: Bethesda, MD
9/91 First Panel Meeting "Predicting Future Diseases" at the
National Academy of Sciences Institute of Medicine:
9/91 Genome Sequencing III: Hilton Head, SC
9/91 Workshop on Informatics Needs of Large-Scale Sequencing
Projects: Hilton Head, SC
10/91 Conference on Identification of Transcribed Sequences
in the Human Genome: Bethesda, MD
10/91 Workshop on DNA Sequence Acquisition and
Interpretation: Cold Spring Harbor, NY
11/91 Conference on Justice and the Human Genome: Chicago, IL
11/91 Sequencing By Hybridization Workshop: Moscow
12/91 Human Genetics and Genome Analysis: A Practical
Workshop for the Nonscientist: Cold Spring Harbor, NY
1/92 Chromosome 19 Workshop: Nijmegen, Netherlands
2/92 Chromosome 16 Workshop: Adelaide, Australia
3/92 Second Conference on Genetics, Religion, and Ethics:
3/92 Chromosome 17 Workshop: Salt Lake City, UT
3/92 Chromosome 3 Workshop: Tokyo, Japan
3/92 Chromosome 9 Workshop: Cambridge, England
5/92 Chromosome 5 Workshop: Chicago, IL
6/92 Chromosome 4 Workshop: Leiden, Netherlands
6/92 Chromosome 6 Workshop: Ann Arbor, MI
6/92 Chromosome 15 Workshop: Tucson, AZ
6/92 Chromosome 18 Workshop: Chicago, IL
6/92 DOE-NIH Annual Planning and Evaluation Retreat: Bethesda, MD
Partial Listing of Future DOE-Sponsored Workshops
9/92 Chromosome 11 Workshop: San Diego, CA
9/92 Chromosome 12 Workshop: Oxford, England
9/92 Chromosome 13 Workshop: New York, NY
11/92 Chromosome 2 Workshop: Lake Tahoe, CA
2/93 Third DOE Contractor-Grantee Workshop: Santa Fe, NM
Appendix C: Members of the DOE Health and Environmental Research
Sheldon Wolff (Chair) University of California, San Francisco
E. Morton Bradbury Los Alamos National Laboratory
Eville Gorham University of Minnesota
Jonathan Greer Abbott Laboratories
Barbara Ann Hamkalo University of California, Irvine
Sam Hurst Atom Sciences, Inc.
Kenneth K. Kidd Yale University
Leonard S. Lerman Massachusetts Institute of Technology
Gordon J. MacDonald University of California, San Diego
J. Justin McCormick Michigan State University
Mortimer L. Mendelsohn Lawrence Livermore National Laboratory
Mary Lou Pardue Massachusetts Institute of Technology
Theodore L. Phillips University of California, San Francisco
Richard C. Reba University of Chicago
Melvin I. Simon California Institute of Technology
Warren M. Washington National Center for Atmospheric Research
Audrey Wegst Diagnostic Technology Consultants, Inc.
Harel Weinstein Mt. Sinai School of Medicine
Appendix D: Members of NIH-DOE Joint Working Groups
Joint Working Group on Ethical, Legal, and Social Issues
(First met September 1989; first workshop held February 5-6,
Nancy Wexler (Chair) Columbia University
Jonathan R. Beckwith Harvard Medical School
Robert Cook-Deegan National Academy of Sciences Institute
Patricia King Georgetown University Law Center
Victor A. McKusick Johns Hopkins University Hospital
Robert F. Murray Howard University
Thomas H. Murray Case Western Reserve University
Joint Mapping Working Group
(First met December 1989)
David Botstein Stanford University
Anthony V. Carrano Lawrence Livermore National Laboratory
C. Thomas Caskey Baylor College of Medicine
David R. Cox University of California, San Francisco
Robert K. Moyzis Los Alamos National Laboratory
Maynard V. Olson Washington University
Joint Informatics Task Force (ad hoc)
(First met March 7-9, 1990; final meeting January 3, 1992)
Dieter Soll (Chair) Yale University
George I. Bell Los Alamos National Laboratory
David Botstein Stanford University
Elbert Branscomb Lawrence Livermore National Laboratory
John Devereux Genetics Computer Group
Nathan Goodman Whitehead Institute
Gregory Hamm Rutgers University Waksman Institute
Eric Lander Massachusetts Institute of Technology
Frank Olken Lawrence Berkeley Laboratory
Mark L. Pearson E. I. du Pont de Nemours & Company
Sylvia J. Spengler Lawrence Berkeley Laboratory
Michael Waterman University of Southern California
Joint Sequencing Working Group
(First met May 10, 1990)
Ellson Chen Genentech, Inc.
Ronald Davis Stanford University
John Devereux Genetics Computer Group
Walter Gilbert Harvard University
Leroy E. Hood California Institute of Technology
Mark L. Pearson E. I. du Pont de Nemours & Company
Joseph Sambrook University of Texas
Phillip A. Sharp Massachusetts Institute of Technology
William Studier Brookhaven National Laboratory
Joint Working Group on the Mouse
(First met May 6, 1991)
Verne Chapman (Chair) Roswell Park Memorial Institute
Frank Constantini Columbia University
Neal Copeland National Cancer Institute-Frederick Cancer
Research and Development Center
William Dove University of Wisconsin, Madison
Joseph Nadeau Jackson Laboratory
Roger Reeves Johns Hopkins University
Janet Rossant Mt. Sinai Hospital
Oliver Smithies University of North Carolina, Chapel Hill
Richard Woychik Oak Ridge National Laboratory
Appendix E: Glossary
Portions of the glossary text were taken directly or modified
from definitions in the U.S. Congress Office of Technology
Assessment document: Mapping Our Genes--The Genome Projects: How
Big, How Fast? OTA-BA-373, Washington, D.C.: U.S. Government
Printing Office, April 1988.
Adenine (A): A nitrogenous base, one member of the base pair A-T
(adenine and thymine).
Alleles: Alternative forms of a genetic locus; a single allele
for each locus is inherited separately from each parent (e.g., at
a locus for eye color the allele might result in blue or brown eyes).
Amino acid: Any of a class of 20 molecules that are combined to
form proteins in living things. The sequence of amino acids in a
protein and hence protein function are determined by the genetic code.
Amplification: An increase in the number of copies of a specific
DNA fragment; can be in vivo or in vitro. See cloning, polymerase
chain reaction.
Arrayed library: Individual primary recombinant clones (hosted in
phage, cosmid, YAC, or other vector) that are placed in
two-dimensional arrays in microtiter dishes. Each primary clone
can be identified by the identity of the plate and the clone
location (row and column) on that plate. Arrayed libraries of
clones can be used for many applications, including screening for
a specific gene or genomic region of interest as well as for
physical mapping. Information gathered on individual clones from
various genetic linkage and physical map analyses is entered into
a relational database and used to construct physical and genetic
linkage maps simultaneously; clone identifiers serve to
interrelate the multilevel maps. Compare library, genomic library.
Autoradiography: A technique that uses X-ray film to visualize
radioactively labeled molecules or fragments of molecules; used
in analyzing length and number of DNA fragments after they are
separated by gel electrophoresis.
Autosome: A chromosome not involved in sex determination. The
diploid human genome consists of 46 chromosomes: 22 pairs of
autosomes and 1 pair of sex chromosomes (the X and Y chromosomes).
Bacteriophage: See phage.
Base pair (bp): Two nitrogenous bases (adenine and thymine or
guanine and cytosine) held together by weak bonds. Two strands of
DNA are held together in the shape of a double helix by the bonds
between base pairs.
Base sequence: The order of nucleotide bases in a DNA molecule.
Base sequence analysis: A method, sometimes automated, for
determining the base sequence.
Biotechnology: A set of biological techniques developed through
basic research and now applied to research and product
development. In particular, the use by industry of recombinant
DNA, cell fusion, and new bioprocessing techniques.
bp: See base pair.
cDNA: See complementary DNA.
Centimorgan (cM): A unit of measure of recombination frequency.
One centimorgan is equal to a 1% chance that a marker at one
genetic locus will be separated from a marker at a second locus
due to crossing over in a single generation. In human beings, 1
centimorgan is equivalent, on average, to 1 million base pairs.
Centromere: A specialized chromosome region to which spindle
fibers attach during cell division.
Chromosomes: The self-replicating genetic structures of cells
containing the cellular DNA that bears in its nucleotide sequence
the linear array of genes. In prokaryotes, chromosomal DNA is
circular, and the entire genome is carried on one chromosome.
Eukaryotic genomes consist of a number of chromosomes whose DNA
is associated with different kinds of proteins.
Clone bank: See genomic library.
Clones: A group of cells derived from a single ancestor.
Cloning: The process of asexually producing a group of cells
(clones), all genetically identical, from a single ancestor. In
recombinant DNA technology, the use of DNA manipulation
procedures to produce multiple copies of a single gene or segment
of DNA is referred to as cloning DNA.
Cloning vector: DNA molecule originating from a virus, a plasmid,
or the cell of a higher organism into which another DNA fragment
of appropriate size can be integrated without loss of the
vector's capacity for self-replication; vectors introduce foreign
DNA into host cells, where it can be reproduced in large
quantities. Examples are plasmids, cosmids, and yeast artificial
chromosomes; vectors are often recombinant molecules containing
DNA sequences from several sources.
cM: See centimorgan.
Code: See genetic code.
Codon: See genetic code.
Complementary DNA (cDNA): DNA that is synthesized from a
messenger RNA template; the single-stranded form is often used as
a probe in physical mapping.
Complementary sequences: Nucleic acid base sequences that can
form a double-stranded structure by matching base pairs; the
complementary sequence to G-T-A-C is C-A-T-G.
Conserved sequence: A base sequence in a DNA molecule (or an
amino acid sequence in a protein) that has remained essentially
unchanged throughout evolution.
Contig map: A map depicting the relative order of a linked
library of small overlapping clones representing a complete
Contigs: Groups of clones representing overlapping regions of a
Cosmid: Artificially constructed cloning vector containing the
cos gene of phage lambda. Cosmids can be packaged in lambda phage
particles for infection into E. coli; this permits cloning of
larger DNA fragments (up to 45 kb) than can be introduced into
bacterial hosts in plasmid vectors.
Crossing over: The breaking during meiosis of one maternal and
one paternal chromosome, the exchange of corresponding sections
of DNA, and the rejoining of the chromosomes. This process can
result in an exchange of alleles between chromosomes. Compare
Cytosine (C): A nitrogenous base, one member of the base pair G-C
(guanine and cytosine).
Deoxyribonucleotide: See nucleotide.
Diploid: A full set of genetic material, consisting of paired
chromosomes, one chromosome from each parental set. Most animal
cells except the gametes have a diploid set of chromosomes. The
diploid human genome has 46 chromosomes. Compare haploid.
DNA (deoxyribonucleic acid): The molecule that encodes genetic
information. DNA is a double-stranded molecule held together by
weak bonds between base pairs of nucleotides. The four
nucleotides in DNA contain the bases: adenine (A), guanine (G),
cytosine (C), and thymine (T). In nature, base pairs form only
between A and T and between G and C; thus the base sequence of
each single strand can be deduced from that of its partner.
DNA probes: See probe.
DNA replication: The use of existing DNA as a template for the
synthesis of new DNA strands. In humans and other eukaryotes,
replication occurs in the cell nucleus.
DNA sequence: The relative order of base pairs, whether in a
fragment of DNA, a gene, a chromosome, or an entire genome. See
base sequence analysis.
Domain: A discrete portion of a protein with its own function.
The combination of domains in a single protein determines its
Double helix: The shape that two linear strands of DNA assume
when bonded together.
E. coli: Common bacterium that has been studied intensively by
geneticists because of its small genome size, normal lack of
pathogenicity, and ease of growth in the laboratory.
Electrophoresis: A method of separating large molecules (such as
DNA fragments or proteins) from a mixture of similar molecules.
An electric current is passed through a medium containing the
mixture, and each kind of molecule travels through the medium at
a different rate, depending on its electrical charge and size.
Separation is based on these differences. Agarose and acrylamide
gels are the media commonly used for electrophoresis of proteins
and nucleic acids.
Endonuclease: An enzyme that cleaves its nucleic acid substrate
at internal sites in the nucleotide sequence.
Enzyme: A protein that acts as a catalyst, speeding the rate at
which a biochemical reaction proceeds but not altering the
direction or nature of the reaction.
EST: Expressed sequence tag. See sequence tagged site.
Eukaryote: Cell or organism with membrane-bound, structurally
discrete nucleus and other well-developed subcellular
compartments. Eukaryotes include all organisms except viruses,
bacteria, and blue-green algae. Compare prokaryote. See chromosomes.
Evolutionarily conserved: See conserved sequence.
Exogenous DNA: DNA originating outside an organism.
Exons: The protein-coding DNA sequences of a gene. Compare introns.
Exonuclease: An enzyme that cleaves nucleotides sequentially from
free ends of a linear nucleic acid substrate.
Expressed gene: See gene expression.
FISH (fluorescence in situ hybridization): A physical mapping
approach that uses fluorescein tags to detect hybridization of
probes with metaphase chromosomes and with the less-condensed
somatic interphase chromatin.
Flow cytometry: Analysis of biological material by detection of
the light-absorbing or fluorescing properties of cells or
subcellular fractions (e.g., chromosomes) passing in a narrow
stream through a laser beam. An absorbance or fluorescence
profile of the sample is produced. Automated sorting devices,
used to fractionate samples, sort successive droplets of the
analyzed stream into different fractions depending on the
fluorescence emitted by each droplet.
Flow karyotyping: Use of flow cytometry to analyze and/or
separate chromosomes on the basis of their DNA content.
Gamete: Mature male or female reproductive cell (sperm or ovum)
with a haploid set of chromosomes (23 for humans).
Gene: The fundamental physical and functional unit of heredity. A
gene is an ordered sequence of nucleotides located in a
particular position on a particular chromosome that encodes a
specific functional product (i.e., a protein or RNA molecule).
See gene expression.
Gene expression: The process by which a gene's coded information
is converted into the structures present and operating in the
cell. Expressed genes include those that are transcribed into
mRNA and then translated into protein and those that are
transcribed into RNA but not translated into protein (e.g.,
transfer and ribosomal RNAs).
Gene families: Groups of closely related genes that make similar
Gene library: See genomic library.
Gene mapping: Determination of the relative positions of genes on
a DNA molecule (chromosome or plasmid) and of the distance, in
linkage units or physical units, between them.
Gene product: The biochemical material, either RNA or protein,
resulting from expression of a gene. The amount of gene product
is used to measure how active a gene is; abnormal amounts can be
correlated with disease-causing alleles.
Genetic code: The sequence of nucleotides, coded in triplets
(codons) along the mRNA, that determines the sequence of amino
acids in protein synthesis. The DNA sequence of a gene can be
used to predict the mRNA sequence, and the genetic code can in
turn be used to predict the amino acid sequence.
Genetic engineering technologies: See recombinant DNA technologies.
Genetic map: See linkage map.
Genetic material: See genome.
Genetics: The study of the patterns of inheritance of specific
Genome: All the genetic material in the chromosomes of a
particular organism; its size is generally given as its total
number of base pairs.
Genome projects: Research and technology development efforts
aimed at mapping and sequencing some or all of the genome of
human beings and other organisms.
Genomic library: A collection of clones made from a set of
randomly generated overlapping DNA fragments representing the
entire genome of an organism. Compare library, arrayed library.
Guanine (G): A nitrogenous base, one member of the base pair G-C
(guanine and cytosine).
Haploid: A single set of chromosomes (half the full set of
genetic material), present in the egg and sperm cells of animals
and in the egg and pollen cells of plants. Human beings have 23
chromosomes in their reproductive cells. Compare diploid.
Heterozygosity: The presence of different alleles at one or more
loci on homologous chromosomes.
Homeobox: A short stretch of nucleotides whose base sequence is
virtually identical in all the genes that contain it. It has been
found in many organisms from fruit flies to human beings. In the
fruit fly, a homeobox appears to determine when particular groups
of genes are expressed during development.
Homologies: Similarities in DNA or protein sequences between
individuals of the same species or among different species.
Homologous chromosomes: A pair of chromosomes containing the same
linear gene sequences, each derived from one parent.
Human gene therapy: Insertion of normal DNA directly into cells
to correct a genetic defect.
Human Genome Initiative: Collective name for several projects
begun in 1986 by DOE to (1) create an ordered set of DNA segments
from known chromosomal locations, (2) develop new computational
methods for analyzing genetic map and DNA sequence data, and (3)
develop new techniques and instruments for detecting and
analyzing DNA. This DOE initiative is now known as the Human
Genome Program. The national effort, led by DOE and NIH, is known
as the Human Genome Project.
Hybridization: The process of joining two complementary strands
of DNA or one each of DNA and RNA to form a double-stranded
Informatics: The study of the application of computer and
statistical techniques to the management of information. In
genome projects, informatics includes the development of methods
to search databases quickly, to analyze DNA sequence information,
and to predict protein sequence and structure from DNA sequence
In situ hybridization: Use of a DNA or RNA probe to detect the
presence of the complementary DNA sequence in cloned bacterial or
cultured eukaryotic cells.
Interphase: The period in the cell cycle when DNA is replicated
in the nucleus; followed by mitosis.
Introns: The DNA base sequences interrupting the protein-coding
sequences of a gene; these sequences are transcribed into RNA but
are cut out of the message before it is translated into protein.
In vitro: Outside a living organism.
Karyotype: A photomicrograph of an individual's chromosomes
arranged in a standard format showing the number, size, and shape
of each chromosome type; used in low-resolution physical mapping
to correlate gross chromosomal abnormalities with the
characteristics of specific diseases.
kb: See kilobase.
Kilobase (kb): Unit of length for DNA fragments equal to 1000
nucleotides.
Library: An unordered collection of clones (i.e., cloned DNA from
a particular organism), whose relationship to each other can be
established by physical mapping. Compare genomic library, arrayed
library.
Linkage: The proximity of two or more markers (e.g., genes, RFLP
markers) on a chromosome; the closer together the markers are,
the lower the probability that they will be separated during DNA
repair or replication processes (binary fission in prokaryotes,
mitosis or meiosis in eukaryotes), and hence the greater the
probability that they will be inherited together.
Linkage map: A map of the relative positions of genetic loci on a
chromosome, determined on the basis of how often the loci are
inherited together. Distance is measured in centimorgans (cM).
Localize: Determination of the original position (locus) of a
gene or other marker on a chromosome.
Locus (pl. loci): The position on a chromosome of a gene or other
chromosome marker; also, the DNA at that position. The use of
locus is sometimes restricted to mean regions of DNA that are
expressed. See gene expression.
Macrorestriction map: Map depicting the order of and distance
between sites at which restriction enzymes cleave chromosomes.
Mapping: See gene mapping, linkage map, physical map.
Marker: An identifiable physical location on a chromosome (e.g.,
restriction enzyme cutting site, gene) whose inheritance can be
monitored. Markers can be expressed regions of DNA (genes) or
some segment of DNA with no known coding function but whose
pattern of inheritance can be determined. See RFLP, restriction
fragment length polymorphism.
Mb: See megabase.
Megabase (Mb): Unit of length for DNA fragments equal to 1
million nucleotides and roughly equal to 1 cM.
Meiosis: The process of two consecutive cell divisions in the
diploid progenitors of sex cells. Meiosis results in four rather
than two daughter cells, each with a haploid set of chromosomes.
Messenger RNA (mRNA): RNA that serves as a template for protein
synthesis. See genetic code.
Metaphase: A stage in mitosis or meiosis during which the
chromosomes are aligned along the equatorial plane of the cell.
Mitosis: The process of nuclear division in cells that produces
daughter cells that are genetically identical to each other and
to the parent cell.
mRNA: See messenger RNA.
Multifactorial or multigenic disorders: See polygenic disorders.
Multiplexing: A sequencing approach that uses several pooled
samples simultaneously, greatly increasing sequencing speed.
Mutation: Any heritable change in DNA sequence. Compare polymorphism.
Nitrogenous base: A nitrogen-containing molecule having the
chemical properties of a base.
Nucleic acid: A large molecule composed of nucleotide subunits.
Nucleotide: A subunit of DNA or RNA consisting of a nitrogenous
base (adenine, guanine, thymine, or cytosine in DNA; adenine,
guanine, uracil, or cytosine in RNA), a phosphate molecule, and a
sugar molecule (deoxyribose in DNA and ribose in RNA). Thousands
of nucleotides are linked to form a DNA or RNA molecule. See DNA,
base pair, RNA.
Nucleus: The cellular organelle in eukaryotes that contains the
Oncogene: A gene, one or more forms of which is associated with
cancer. Many oncogenes are involved, directly or indirectly, in
controlling the rate of cell growth.
Overlapping clones: See genomic library.
PCR: See polymerase chain reaction.
Phage: A virus for which the natural host is a bacterial cell.
Physical map: A map of the locations of identifiable landmarks on
DNA (e.g., restriction enzyme cutting sites, genes), regardless
of inheritance. Distance is measured in base pairs. For the human
genome, the lowest-resolution physical map is the banding
patterns on the 24 different chromosomes; the highest-resolution
map would be the complete nucleotide sequence of the chromosomes.
Plasmid: Autonomously replicating, extrachromosomal circular DNA
molecules, distinct from the normal bacterial genome and
nonessential for cell survival under nonselective conditions.
Some plasmids are capable of integrating into the host genome. A
number of artificially constructed plasmids are used as cloning vectors.
Polygenic disorders: Genetic disorders resulting from the
combined action of alleles of more than one gene (e.g., heart
disease, diabetes, and some cancers). Although such disorders are
inherited, they depend on the simultaneous presence of several
alleles; thus the hereditary patterns are usually more complex
than those of single-gene disorders. Compare single-gene disorders.
Polymerase chain reaction (PCR): A method for amplifying a DNA
base sequence using a heat-stable polymerase and two 20-base
primers, one complementary to the (+)-strand at one end of the
sequence to be amplified and the other complementary to the
(-)-strand at the other end. Because the newly synthesized DNA
strands can subsequently serve as additional templates for the
same primer sequences, successive rounds of primer annealing,
strand elongation, and dissociation produce rapid and highly
specific amplification of the desired sequence. PCR also can be
used to detect the existence of the defined sequence in a DNA
Polymerase, DNA or RNA: Enzymes that catalyze the synthesis of
nucleic acids on preexisting nucleic acid templates, assembling
RNA from ribonucleotides or DNA from deoxyribonucleotides.
Polymorphism: Difference in DNA sequence among individuals.
Genetic variations occurring in more than 1% of a population
would be considered useful polymorphisms for genetic linkage
analysis. Compare mutation.
Primer: Short preexisting polynucleotide chain to which new
deoxyribonucleotides can be added by DNA polymerase.
Probe: Single-stranded DNA or RNA molecules of specific base
sequence, labeled either radioactively or immunologically, that
are used to detect the complementary base sequence by hybridization.
Prokaryote: Cell or organism lacking a membrane-bound,
structurally discrete nucleus and other subcellular compartments.
Bacteria are prokaryotes. Compare eukaryote. See chromosomes.
Promoter: A site on DNA to which RNA polymerase will bind and
initiate transcription.
Protein: A large molecule composed of one or more chains of amino
acids in a specific order; the order is determined by the base
sequence of nucleotides in the gene coding for the protein.
Proteins are required for the structure, function, and regulation
of the body's cells, tissues, and organs, and each protein has
unique functions. Examples are hormones, enzymes, and antibodies.
Purine: A nitrogen-containing, double-ring, basic compound that
occurs in nucleic acids. The purines in DNA and RNA are adenine
and guanine.
Pyrimidine: A nitrogen-containing, single-ring, basic compound
that occurs in nucleic acids. The pyrimidines in DNA are cytosine
and thymine; in RNA, cytosine and uracil.
Rare-cutter enzyme: See restriction enzyme cutting site.
Recombinant clones: Clones containing recombinant DNA molecules.
See recombinant DNA technologies.
Recombinant DNA molecules: A combination of DNA molecules of
different origin that are joined using recombinant DNA
technologies.
Recombinant DNA technologies: Procedures used to join together
DNA segments in a cell-free system (an environment outside a cell
or organism). Under appropriate conditions, a recombinant DNA
molecule can enter a cell and replicate there, either
autonomously or after it has become integrated into a cellular
chromosome.
Recombination: The process by which progeny derive a combination
of genes different from that of either parent. In higher
organisms, this can occur by crossing over.
Regulatory regions or sequences: A DNA base sequence that
controls gene expression.
Resolution: Degree of molecular detail on a physical map of DNA,
ranging from low to high.
Restriction enzyme, endonuclease: A protein that recognizes
specific, short nucleotide sequences and cuts DNA at those sites.
Bacteria contain over 400 such enzymes that recognize and cut
over 100 different DNA sequences. See restriction enzyme cutting
site.
Restriction enzyme cutting site: A specific nucleotide sequence
of DNA at which a particular restriction enzyme cuts the DNA.
Some sites occur frequently in DNA (e.g., every several hundred
base pairs), others much less frequently (rare-cutter; e.g.,
every 10,000 base pairs).
Restriction fragment length polymorphism (RFLP): Variation
between individuals in DNA fragment sizes cut by specific
restriction enzymes; polymorphic sequences that result in RFLPs
are used as markers on both physical maps and genetic linkage
maps. RFLPs are usually caused by mutation at a cutting site. See
marker.
RFLP: See restriction fragment length polymorphism.
Ribonucleic acid (RNA): A chemical found in the nucleus and
cytoplasm of cells; it plays an important role in protein
synthesis and other chemical activities of the cell. The
structure of RNA is similar to that of DNA. There are several
classes of RNA molecules, including messenger RNA, transfer RNA,
ribosomal RNA, and other small RNAs, each serving a different
purpose.
Ribonucleotides: See nucleotide.
Ribosomal RNA (rRNA): A class of RNA found in the ribosomes of
cells.
Ribosomes: Small cellular components composed of specialized
ribosomal RNA and protein; site of protein synthesis. See
ribonucleic acid (RNA).
RNA: See ribonucleic acid.
Sequence: See base sequence.
Sequence tagged site (STS): Short (200 to 500 base pairs) DNA
sequence that has a single occurrence in the human genome and
whose location and base sequence are known. Detectable by
polymerase chain reaction, STSs are useful for localizing and
orienting the mapping and sequence data reported from many
different laboratories and serve as landmarks on the developing
physical map of the human genome. Expressed sequence tags (ESTs)
are STSs derived from cDNAs.
Sequencing: Determination of the order of nucleotides (base
sequences) in a DNA or RNA molecule or the order of amino acids
in a protein.
Sex chromosomes: The X and Y chromosomes in human beings that
determine the sex of an individual. Females have two X
chromosomes in diploid cells; males have an X and a Y chromosome.
The sex chromosomes comprise the 23rd chromosome pair in a
karyotype. Compare autosome.
Shotgun method: Cloning of DNA fragments randomly generated from
a genome. See library, genomic library.
Single-gene disorder: Hereditary disorder caused by a mutant
allele of a single gene (e.g., Duchenne muscular dystrophy,
retinoblastoma, sickle cell disease). Compare polygenic
disorders.
Somatic cells: Any cell in the body except gametes and their
precursors.
Southern blotting: Transfer by absorption of DNA fragments
separated in electrophoretic gels to membrane filters for
detection of specific base sequences by radiolabeled
complementary probes.
STS: See sequence tagged site.
Tandem repeat sequences: Multiple copies of the same base
sequence on a chromosome; used as a marker in physical mapping.
Technology transfer: The process of converting scientific
findings from research laboratories into useful products by the
commercial sector.
Telomere: The ends of chromosomes. These specialized structures
are involved in the replication and stability of linear DNA
molecules. See DNA replication.
Thymine (T): A nitrogenous base, one member of the base pair A-T
(adenine-thymine).
Transcription: The synthesis of an RNA copy from a sequence of
DNA (a gene); the first step in gene expression. Compare
translation.
Transfer RNA (tRNA): A class of RNA having structures with
triplet nucleotide sequences that are complementary to the
triplet nucleotide coding sequences of mRNA. The role of tRNAs in
protein synthesis is to bond with amino acids and transfer them
to the ribosomes, where proteins are assembled according to the
genetic code carried by mRNA.
Transformation: A process by which the genetic material carried
by an individual cell is altered by incorporation of exogenous
DNA into its genome.
Translation: The process in which the genetic code carried by
mRNA directs the synthesis of proteins from amino acids. Compare
transcription.
tRNA: See transfer RNA.
Uracil: A nitrogenous base normally found in RNA but not DNA;
uracil is capable of forming a base pair with adenine.
Vector: See cloning vector.
Virus: A noncellular biological entity that can reproduce only
within a host cell. Viruses consist of nucleic acid covered by
protein; some animal viruses are also surrounded by membrane.
Inside the infected cell, the virus uses the synthetic capability
of the host to produce progeny virus.
VLSI: Very large-scale integration allowing over 100,000
transistors on a chip.
YAC: See yeast artificial chromosome.
Yeast artificial chromosome (YAC): A vector used to clone DNA
fragments (up to 400 kb); it is constructed from the telomeric,
centromeric, and replication origin sequences needed for
replication in yeast cells. Compare cloning vector, cosmid.
Index to Principal and Coinvestigators Listed in Abstracts
To retrieve these abstracts, use the following:
--> 8. Search Abstracts of DOE-Funded Genome Research >
You may search by author name, address, or any word that appears in the
abstract. You may narrow your search by using the Boolean operators (and, or,
not) or by phrase searches ("....."). For example, to see all the mouse work
funded by the DOE Genome project, simply search for:
mouse
To see only the mouse projects that have proposed to use fluorescence in situ
hybridization (FISH), search for:
mouse and fish
This will narrow the results dramatically.
Adams, Mark 97 Bulger, Ruth 156
Adamson, Anne 164 Burks, Christian 141
Alexander, Peter 182 Callen, David 106, 108
Allen, Michael 177 Campbell, Evelyn 83, 89
Allison, David 125 Campbell, Mary 83, 89
Amemiya, Chris 100, 104 Cantor, Charles 111, 163
Anderson, N. Leigh 168 Carrano, Anthony 84, 88, 94,
Anderson, Norman 168 100, 103,
Anderson, W. Holt 167 104, 109, 139
Andreason, Grai 174 Casey, Denise 164
Antonarakis, Stylianos 172 Caskey, C. Thomas 99, 157
Apostolou, Sinoula 108 Cassatt, James 140
Apsell, Paula 156 Chait, Brian 136
Arenstorf, Hartwig 171 Chedd, Graham 156
Arlinghaus, Heinrich 127, Chen, Chira 104
165, 168, 177 Chen, C. H. Winston 120
Ashworth, Linda 104 Chen, Ed 143
Aslandidis, Charalampos 100 Chen, Jiun 132
Athwal, Raghbir 82 Chen, Liang 108
Bacha, Hamid 185 Chen, Shizhong 174
Baker, Elizabeth 108 Cheng, Jan-Fang 83, 100
Baker, Mark 123 Cherkauer, Kevin 151
Balding, David 152 Church, George 121
Balhorn, Rodney 177, 181 Cinkosky, Michael 141, 141
Balooch, Mehdi 177 Clancy, Suzanne 101
Barker, David 172 Clark, Steven 101
Beckwith, Jonathon 186 Collins, Debra 157
Beeson, Diane 158 Combs, Jesse 104
Benner, W. Henry 128 Copeland, Alex 104
Berg, Claire 137, 178 Corona, Angela 184
Berg, Douglas 178 Crandall, Lee 161
Beugelsdijk, Tony 111, 116 Craven, Mark 151
Birdsall, David 181 Crkvenjakov, Radomir 121
Birren, Bruce 105 Davidson, Jack 122
Black, Lindsay 93 Davidson, K. Alicia 164
Blackwell, Tom 152 Deaven, Larry 83, 84, 89, 106
Bonaldo, Maria 96 de Jong, Pieter 84, 100, 103,
Bouma, Hessel III 157 104
Boyartchuk, Victor 83 Denton, M. Bonner 123
Bradbury, E. Morton 82 Djbali, Malek 174
Brandriff, Brigitte 109 Doggett, Norman 106, 108
Branscomb, Elbert 103, 139 Dougherty, Randall 141
Brase, James 181 Douthart, Richard 140
Bremer, Meire 105 Dovichi, Norman 124
Brennan, Thomas 119 Drmanac, Radoje 121
Bridgers, Michael 141 Dubnick, Mark 97
Brody, Linnea 98 Dunn, John 138
Bronstein, Irena 167, 170 Durkin, Scott 92
Brown, Gilbert 119, 125, 127 Duster, Troy 158
Brown, Henry 141 Earle, Colin 123
Brown, Stephen 96 Edmonds, Charles 136
Brule, James 185 Edwards, Brooks 170
Efstratiadis, Agiris 96 Hofmann, Gunter 174
Einstein, Ralph 154 Hollen, Robert 111, 116
Entine, Gerald 173 Holmes, Linda 163
Epling, Gary 96 Holtzman, Neil 159
Eubanks, James 174 Honda, Sandra 144
Evans, Glen 101, 174 Hood, Leroy 126, 143
Faber, Vance 141 Hopkins, Janet 95
Fader, Betsy 186 Hozier, John 88
Fain, Pamela 172 Huang, Henry 178
Fairfield, Frederic 152 Huang, Xiaohua 132
Fawcett, John 89 Huber, Hans 180
Feitshans, Ilise 159 Huhn, Greg 174
Ferrell, Thomas 125 Hunkapiller, Tim 143, 147,
Fickett, James 141, 152 148
Fields, Christopher 97, 142 Hurst, Gerald 166
Fischer, Peggy 186 Hutchinson, Marge 145, 155
Flatley, Jay 167 Imara, Mwalimu 161
Fockler, Carita 105 Jackson, Cynthia 86
Foote, Robert 119, 122, 125, Jacobson, K. Bruce 119, 120,
127 125, 127
Francomano, Clare 148
Fullarton, Jane 156 Jaklevic, Joseph 113, 115,
Furuya, Frederic 166 117, 128, 128
Gabra, Nashua 175 Jelenc, Pierre 96
Gatewood, Joe 82 Jett, James 114, 129
Generoso, Estela 107 Johnson, Lori 104
Gesteland, Raymond 126 Juo, Rouh-Rong 170
Gibson, William 165 Jurka, Jerzy 144
Giddings, J. Calvin 112 Kandpal, Rajendra 171
Gingrich, Jeffrey 102 Kang, Hee-Chol 179
Giovannini, Marco 174 Kao, Fa-Ten 86
Glazer, Alexander 132 Karger, Barry 114
Goldberg, Mark 141 Katz, Joseph 115, 128
Gong, Kevin 145 Kaufman, Daniel 174
Grad, Frank 159 Keller, Richard 123, 129
Grady, Deborah 84 Kelley, Jenny 97
Gray, Joe 85, 181 Kerlavage, Anthony 97
Greener, Phillip 98 Khan, Akbar 95
Gusfield, Daniel 144, 148 Knoche, Kimberly 166
Hahn, Peter 88 Kolbe, William 113, 115, 128
Hainfeld, James 112, 166 Kopelman, Raoul 130
Hansen, Tony 113 Korenberg, Julie 87
Hart, Reece 174 Kozman, Helen 108
Hartman, John 167, 169 Kozubel, Mark 116
Haugland, Richard 179 Kuo, Wen-Lin 85
Hempfner, Philip 141 Lai, Tran 141
Henderson, Margaret 161 Lane, Michael 88
Hermanson, Gary 101, 174 Lane, Sharon 108
Hewitt, Peter 179 Langmore, John 130
Hieter, Philip 172 Lappé, Marc 186
Hildebrand, C. Edgar 106, 108 Larimer, Frank 119, 127
Himawan, Jeff 133 Lawler, Eugene 144, 148
Hoekstra, Merl 101 Lawrence, Charles 144
Hoffman, Lance 186 Lee, Bill 94
Lennon, Gregory 88 Murphy, Timothy 186
Leonard, Lisa 174 Myers, Eugene 144
Lerman, Leonard 115, 175 Nagle, James 97
Lewis, Suzanna 145, 145, 147 Nancarrow, Julie 108
Longmire, Jon 84 Natowicz, Marvin 160
Loo, Joseph 136 Nelson, David 139
Lowery, Robert 166 Nelson, David L. 91, 99
Lowry, Steven 102 Nelson, Debra 141
Lumley, Amanda 163 Nelson, J. Robert 157
Macken, Catherine 152 Nierman, William 92, 171
Maglott, Donna 92, 171 Nikolic, Julia 100
Makarov, Vladimir 130 Noordewier, Michiel 151
Mann, Reinhold 154 Oehler, Chuck 166
Mansfield, Betty 164 Okumura, K. 106
Mark, Hon Fong 86 Olken, Frank 145, 145, 148
Markowitz, Victor 145, 146, Olsen, Anne 104
147 Orpana, Arto 95
Marr, Thomas 148 Orr, Bradford 130
Martin, Christopher 170 Overbeek, Ross 182
Martin, Christopher H. 89, Page, George 160
103, 131 Palazzolo, Michael 89, 103,
Martin, John 114, 129 131
Martin, Sheryl 164 Parimoo, Satish 171
Matheson, Nina 148 Patanjali, Sankhavaram 171
Mathies, Richard 132 Payne, Marvin 120
Maurer, Susanne 174 Pearson, Peter 148
Mayeda, Carol 89, 103, 131 Pecherer, Robert 141, 141
McAllister, Douglass 167 Pelkey, Joanne 140
McCarthy, John 145 Peters, Don 85
McCormick, MaryKay 84, 89, Pfeifer, Gerd 135
106 Phillips, Hilary 108
McElligott, David 101, 174 Phoenix, David 161
McInerney, Joseph 159 Pinkel, Dan 85
McKean, Ronald 175 Pirrung, Michael 133
McKusick, Victor 148 Polymeropoulos, Mihael 92
Mead, David 132 Powell, Richard 166
Medvick, Patricia 111, 116 Pratt, Lorien 151
Meincke, Linda 89 Quesada, Mark 132
Merrill, Carl 92 Radspinner, David 123
Meyne, Julie 91 Ramsey, Roswitha 119
Michael, Sharon 97 Rao, Venigalla 93
Micklos, David 161 Ratliff, Robert 91
Milosavljevic, Aleksandar 144 Reilly, Phillip 161
Mohrenweiser, Harvey 88, 103 Reiner, Andrew 148
Moir, Donald 90 Richards, Robert 108
Moore, Stefan 160 Richardson, Charles 133, 180
Moreno, Ruben 97 Riggs, Arthur 135
Mosley, Ray 161 Rinchik, Eugene 93, 94, 107,
Moyzis, Robert 84, 89, 91, 153
106 Ringold, Gordon 167
Mucenski, Mike 94 Robbins, Robert 148
Mulley, John 108 Roberts, Randy 111, 116
Mundt, Mark 141 Roman, Maria 174
Mural, Richard 153, 154 Romo, Anthony 174
Roszak-MacDonell, Darlene 167 Sutherland, Grant 106, 108
Rush, John 180 Sutherland, Robert 141
Rye, Hays 132 Swaroop, Anand 171
Sachleben, Richard 119, 125, Szeto, Ernest 146
127 Tabor, Stanley 133, 180
Sainz, Jesus 105 Tan, Weihong 130
Saleh, Mary 174 Tang, Jane 100
Schenk, Karen 152 Thakhar, Vishakha 93
Schimke, R. Neil 157 Theil, Edward 113, 117
Schmitt, Eric 175 Thompson, Andrew 108
Schwartz, Stanley 183 Thonnard, Norbert 127, 165
Searls, David 150 Thundat, Thomas 125
Segebrecht, Linda 157 Thurman, David 140
Selleri, Licia 101, 174 Toliver, Greg 174
Sgro, Peichen 141 Torney, David 141, 152
Shavlik, Jude 151 Towell, Geoffrey 151
Shen, Yang 108 Trask, Barbara 84, 103, 109
Shera, E. Brooks 129 Trebes, James 181
Shi, Zhong You 130 Trimmer, David 116
Shizuya, Hiroaki 105 Trottier, Ralph 161
Shoshani, Arie 146 Troup, Charles 141
Siciliano, Michael 94 Tynan, Katherine 103, 109
Siekhaus, Wigbert 177 Uber, Donald 113, 117
Sikela, James 95 Uberbacher, Edward 153, 154
Simon, Melvin 105 van den Engh, Ger 84, 88, 109
Sindelar, Linda 113 Varghese, Alison 170
Slezak, Tom 139 Vaux, Kenneth 186
Smith, Cassandra 105 Venter, J. Craig 97
Smith, Lloyd 132, 135, 136 Vos, Jean-Michel 97
Smith, Michael 101 Voyta, John 170
Smith, Richard 136 Wagner, Caryn 174
Smith, Steven 130 Wahl, Geoffrey 98
Snider, Ken 174 Walichiewicz, Jolanta 144
Soares, Marcelo 96 Wang, Denan 105
Soderlund, Carol 142 Warburton, Dorothy 159
Solomon, David 169 Ward, David 106, 171
Sorenson, Doug 141, 141 Warmack, Robert 125
Speed, Terence 148 Wassom, John 164
Spengler, Sylvia 164 Waterman, Michael 143
Stallings, Raymond 106, 108 Weiss, Robert 126
Stevens, Tamara 95 Weissman, Sherman 171
Stiegman, Jeffrey 183, 184 Wendroff, Burton 152
Stinnett, Donna 164 West, John 185
Stormon, Charles 185 Whitaker, James 179
Storti, George 176 Whitmore, Scott 108
Stovall, Leonard 116 Whitsitt, Andrew 151
Strathmann, Michael 131 Whittaker, Clive 152
Strausbaugh, Linda 137 Wilcox, Andrea 95
Stricker, Jenny 159 Wilder, Mark 114
Stubbs, Lisa 107 Williams, Peter 138
Studier, F. William 138 Winternitz, Katherine 159
Sudar, Damir 85 Witkowski, Jan 161
Sun, Tian-Qiang 97 Wohlpart, Alfred 163
Sutherland, Betsy 96 Woodbury, Neal 138
Woychik, Richard 93, 94, 119,
Wright, James 163
Wyrick, Judy 164
Xiao, Hong 92
Yang, Sherman 144
Yantis, Bonnie 141
Yesley, Michael 164
Yeung, Edward 117
Yokobata, Kathy 100
Yorkey, Thomas 181
Yoshida, Kaoru 105
Youderian, Philip 98
Yu, Jing-Wei 86
Yust, Laura 164
Zhao, Jun 174
Zorn, Manfred 145, 155
AEC Atomic Energy Commission
ANL* Argonne National Laboratory, Argonne, IL
ATCC American Type Culture Collection, Rockville, MD
BNL* Brookhaven National Laboratory, Upton, NY
CEPH Centre d'Etude du Polymorphisme Humain
CRADA Cooperative Research and Development Agreement
DKFZ German Cancer Research Center
DOE Department of Energy
ERDA Energy Research and Development Administration
FCCSET Federal Coordinating Council for Science, Engineering,
and Technology
GDB* Genome Data Base
HERAC* Health and Environmental Research Advisory Committee
HGCC* Human Genome Coordinating Committee
HGMIS* Human Genome Management Information System (ORNL)
HUGO Human Genome Organization (international)
JHU Johns Hopkins University
JITF* Joint Informatics Task Force
LANL* Los Alamos National Laboratory, Los Alamos, NM
LBL* Lawrence Berkeley Laboratory, Berkeley, CA
LLNL* Lawrence Livermore National Laboratory, Livermore, CA
MRC Medical Research Council (U.K.)
NAS National Academy of Sciences (U.S.)
NCHGR National Center for Human Genome Research
NIH National Institutes of Health, Bethesda, MD
NLGLP* National Laboratory Gene Library Project (LANL, LLNL)
NRC National Research Council (NAS)
NSF National Science Foundation
OHER* Office of Health and Environmental Research
ORNL* Oak Ridge National Laboratory, Oak Ridge, TN
OSTP Office of Science and Technology Policy (White House)
OTA Office of Technology Assessment (U.S. Congress)
PACHG Program Advisory Committee on the Human Genome
PNL* Pacific Northwest Laboratory, Richland, WA
SBIR Small Business Innovation Research
SCC Scientific Coordinating Committee
TWAS Third World Academy of Sciences
UNESCO United Nations Educational, Scientific, and Cultural
Organization
USDA U.S. Department of Agriculture
*Denotes U.S. Department of Energy organizations.
Figure and Photograph Captions
This drawing by Leonardo da Vinci symbolizes the quest for
knowledge through exploration of the unknown. In his art,
Leonardo concentrated on illustrating fundamental rules governing
the physical world to reveal the unity underlying the diversity
of nature. Just as the Renaissance brought broadened
intellectual horizons and rapid advances in the natural sciences
and technology, so will the 21st century, 500 years later,
witness a revolution in many sciences as research unlocks the
secrets of the molecular structure governing the human body, one
of nature's masterpieces.
Fig. 1. The Human Genome at Four Levels of Detail. Apart from
reproductive cells (gametes) and mature red blood cells, every
cell in the human body contains 23 pairs of chromosomes, each a
packet of compressed and entwined DNA (1, 2). Each strand of DNA
consists of repeating nucleotide units composed of a phosphate
group, a sugar (deoxyribose), and a base (guanine, cytosine,
thymine, or adenine) (3). Ordinarily, DNA takes the form of a
highly regular double-stranded helix, the strands of which are
linked by hydrogen bonds between guanine and cytosine and between
thymine and adenine. Each such linkage is a base pair (bp); some
3 billion bp constitute the human genome. The specificity of
these base-pair linkages underlies the mechanism of DNA
replication illustrated here. Each strand of the double helix
serves as a template for the synthesis of a new strand; the
nucleotide sequence (i.e., linear order of bases) of each strand
is strictly determined. Each new double helix is a twin, an exact
replica, of its parent. (Figure and caption text provided by the
LBL Human Genome Center.)
Fig. 2. DNA Structure. The four nitrogenous bases of DNA are
arranged along the sugar-phosphate backbone in a particular order
(the DNA sequence), encoding all genetic instructions for an
organism. Adenine (A) pairs with thymine (T), while cytosine (C)
pairs with guanine (G). The two DNA strands are held together by
weak bonds between the bases.
A gene is a segment of a DNA molecule (ranging from fewer than 1
thousand bases to several million), located in a particular
position on a specific chromosome, whose base sequence contains
the information necessary for protein synthesis.
Fig. 3. Comparison of Largest Known DNA Sequence with Approximate
Chromosome and Genome Sizes of Model Organisms and Humans. A
major focus of the Human Genome Project is the development of
sequencing schemes that are faster and more economical.
Comparative Sequence Sizes                                   Bases
Largest known continuous DNA sequence (yeast chromosome 3)   350 thousand
Escherichia coli (bacterium) genome                          4.6 million
Largest yeast chromosome now mapped                          5.8 million
Entire yeast genome                                          15 million
Smallest human chromosome (Y)                                50 million
Largest human chromosome (1)                                 250 million
Entire human genome                                          3 billion
Fig. 4. DNA Replication. During replication the DNA molecule
unwinds, with each single strand becoming a template for
synthesis of a new, complementary strand. Each daughter molecule,
consisting of one old and one new DNA strand, is an exact copy of
the parent molecule. [Source: adapted from Mapping Our Genes - The
Genome Projects: How Big, How Fast? U.S. Congress, Office of
Technology Assessment, OTA-BA-373 (Washington, D.C.: U.S.
Government Printing Office, 1988).]
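The pairing rule behind replication can be sketched in a few lines of Python. This is an illustration only: the function names and the eight-base example strand are hypothetical, and the two strands are written position by position rather than in their true antiparallel 5'-to-3' orientation.

```python
# Base-pairing rules from the captions: A pairs with T, C pairs with G.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement(strand: str) -> str:
    """Return the base-by-base complement of a template strand."""
    return "".join(PAIR[base] for base in strand)

def replicate(duplex):
    """Semiconservative replication: each old strand templates a new one."""
    old_a, old_b = duplex
    # Each daughter keeps one old strand and gains one newly made strand.
    return [(old_a, complement(old_a)), (complement(old_b), old_b)]

# Hypothetical eight-base parent molecule.
parent = ("ATGCCGTA", complement("ATGCCGTA"))
daughters = replicate(parent)
print(daughters)
```

Because the complement of a complement is the original strand, both daughter molecules come out identical to the parent, which is the point the caption makes.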
Fig. 5. Gene Expression. When genes are expressed, the genetic
information (base sequence) on DNA is first transcribed (copied)
to a molecule of messenger RNA in a process similar to DNA
replication. The mRNA molecules then leave the cell nucleus and
enter the cytoplasm, where triplets of bases (codons) forming the
genetic code specify the particular amino acids that make up an
individual protein. This process, called translation, is
accomplished by ribosomes (cellular components composed of
proteins and another class of RNA) that read the genetic code
from the mRNA, and transfer RNAs (tRNAs) that transport amino
acids to the ribosomes for attachment to the growing protein.
(Source: see Fig. 4.)
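The two steps of gene expression can be sketched as follows. This is a simplified illustration, not a model of the cell: the codon table holds only a handful of entries from the standard genetic code, the example gene is made up, and transcription is approximated by rewriting the coding strand with uracil in place of thymine.

```python
# Small excerpt of the standard genetic code (mRNA codon -> amino acid).
# None marks a stop codon. A full table has 64 entries.
CODONS = {
    "AUG": "Met", "UUU": "Phe", "GGC": "Gly", "GCU": "Ala", "AAA": "Lys",
    "UAA": None, "UAG": None, "UGA": None,
}

def transcribe(coding_strand: str) -> str:
    """Write the mRNA that matches the DNA coding strand (T becomes U)."""
    return coding_strand.replace("T", "U")

def translate(mrna: str) -> list:
    """Read the mRNA three bases at a time until a stop codon appears."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODONS[mrna[i:i + 3]]
        if amino_acid is None:  # stop codon: release the finished chain
            break
        protein.append(amino_acid)
    return protein

# Hypothetical six-codon gene.
mrna = transcribe("ATGTTTGGCGCTAAATAA")
peptide = translate(mrna)
print(mrna, peptide)
```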
Fig. 6. Karyotype. Microscopic examination of chromosome size and
banding patterns allows medical laboratories to identify and
arrange each of the 24 different chromosomes (22 pairs of
autosomes and one pair of sex chromosomes) into a karyotype,
which then serves as a tool in the diagnosis of genetic diseases.
The extra copy of chromosome 21 in this karyotype identifies this
individual as having Down's syndrome.
Fig. 7. Assignment of Genes to Specific Chromosomes. The number
of genes assigned (mapped) to specific chromosomes has greatly
increased since the first autosomal (i.e., not on the X or Y
chromosome) marker was mapped in 1968. Most of these genes have
been mapped to specific bands on chromosomes. The acceleration of
chromosome assignments is due to (1) a combination of improved
and new techniques in chromosome sorting and band analysis, (2)
data from family studies, and (3) the introduction of recombinant
DNA technology. [Source: adapted from Victor A. McKusick,
"Current Trends in Mapping Human Genes," The FASEB Journal 5(1),
HUMAN GENOME PROJECT GOALS                  Resolution
* Complete a detailed human genetic map     2 Mb
* Complete a physical map                   0.1 Mb
* Acquire the genome as clones              5 kb
* Determine the complete sequence           1 bp
* Find all the genes
With the data generated by the project, investigators will
determine the functions of the genes and develop tools for
biological and medical applications.
Fig. 8. Constructing a Genetic Linkage Map. Genetic linkage maps
of each chromosome are made by determining how frequently two
markers are passed together from parent to child. Because genetic
material is sometimes exchanged during the production of sperm
and egg cells, groups of traits (or markers) originally together
on one chromosome may not be inherited together. Closely linked
markers are less likely to be separated by spontaneous chromosome
rearrangements. In this diagram, the vertical lines represent
chromosome 4 pairs for each individual in a family. The father
has two traits that can be detected in any child who inherits
them: a short known DNA sequence used as a genetic marker (M) and
Huntington's disease (HD). The fact that one child received only
a single trait (M) from that particular chromosome indicates that
the father's genetic material recombined during the process of
sperm production. The frequency of this event helps determine the
distance between the two DNA sequences on a genetic map.
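The counting behind a linkage estimate can be sketched as below. The family data are invented for illustration, and the conversion of 1% recombination to roughly 1 centimorgan is the usual approximation for closely linked markers; it is assumed here rather than taken from the figure.

```python
def recombination_fraction(children):
    """Fraction of children in whom the two paternal traits were separated.

    Each child is a pair (has_marker, has_disease_allele). A child who
    inherits both traits or neither is nonrecombinant; a child who inherits
    exactly one of them received a recombined chromosome.
    """
    recombinants = sum(1 for has_m, has_hd in children if has_m != has_hd)
    return recombinants / len(children)

# Hypothetical pooled data from 20 children: 9 inherit M and HD together,
# 10 inherit neither, and 1 inherits M without HD.
children = [(True, True)] * 9 + [(False, False)] * 10 + [(True, False)]

theta = recombination_fraction(children)
approx_distance_cm = theta * 100  # about 1 cM per 1% recombination
print(theta, approx_distance_cm)
```

In practice many families are pooled before a distance is reported, because a single recombinant child in a small family gives a very uncertain estimate.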
Fig. 9. Physical Mapping Strategies. Top-down physical mapping
(a) produces maps with few gaps, but map resolution may not allow
location of specific genes. Bottom-up strategies (b) generate
extremely detailed maps of small areas but leave many gaps. A
combination of both approaches is being used. [Source: Adapted
from P. R. Billings et al., "New Techniques for Physical Mapping
of the Human Genome," The FASEB Journal 5(1), 29 (1991).]
Fig. 10. Types of Genome Maps. At the coarsest resolution, the
genetic map measures recombination frequency between linked
markers (genes or polymorphisms). At the next resolution level,
restriction fragments of 1 to 2 Mb can be separated and mapped.
Ordered libraries of cosmids and YACs have insert sizes from 40
to 400 kb. The base sequence is the ultimate physical map.
Chromosomal mapping (not shown) locates genetic sites in relation
to bands on chromosomes (estimated resolution of 5 Mb); new in
situ hybridization techniques can place loci 100 kb apart. This
direct strategy links the other four mapping approaches. [Source:
see Fig. 9.]
Fig. 11. Constructing Clones for Sequencing. Cloned DNA molecules
must be made progressively smaller and the fragments subcloned
into new vectors to obtain fragments small enough for use with
current sequencing technology. Sequencing results are compiled to
provide longer stretches of sequence across a chromosome.
(Source: adapted from David A. Micklos and Greg A. Freyer, DNA
Science, A First Course in Recombinant DNA Technology,
Burlington, N.C.: Carolina Biological Supply Company, 1990.)
DNA Amplification: Cloning
(a) Cloning DNA in Plasmids. By fragmenting DNA of any origin
(human, animal, or plant) and inserting it in the DNA of rapidly
reproducing foreign cells, billions of copies of a single gene or
DNA segment can be produced in a very short time. DNA to be
cloned is inserted into a plasmid (a small, self-replicating
circular molecule of DNA) that is separate from chromosomal DNA.
When the recombinant plasmid is introduced into bacteria, the
newly inserted segment will be replicated along with the rest of
the plasmid.
(b) Constructing an Overlapping Clone Library. A collection of
clones of chromosomal DNA, called a library, has no obvious order
indicating the original positions of the cloned pieces on the
uncut chromosome. To establish that two particular clones are
adjacent to each other in the genome, libraries of clones
containing partly overlapping regions must be constructed. These
clone libraries are ordered by dividing the inserts into smaller
fragments and determining which clones share common DNA
sequences.
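The ordering step in (b) can be sketched with a toy example. Everything here is hypothetical: the clone names, the fragment sizes, and the greedy chaining rule. Real fingerprinting must also allow for measurement error and for unrelated fragments that happen to have the same size.

```python
from itertools import combinations

# Hypothetical fingerprints: each clone is described by the set of
# restriction-fragment sizes (in base pairs) obtained from its insert.
clones = {
    "c1": {1200, 800, 450},
    "c2": {450, 2300, 610},
    "c3": {610, 975, 3100},
    "c4": {3100, 150},
}

def find_overlaps(clones):
    """Map each pair of clones to the fragment sizes they have in common."""
    overlaps = {}
    for a, b in combinations(sorted(clones), 2):
        common = clones[a] & clones[b]
        if common:
            overlaps[(a, b)] = common
    return overlaps

def order_contig(clones, start):
    """Greedily chain clones, stepping to an unused clone that shares a fragment."""
    order = [start]
    while True:
        candidates = [c for c in sorted(clones)
                      if c not in order and clones[order[-1]] & clones[c]]
        if not candidates:
            return order
        order.append(candidates[0])

overlaps = find_overlaps(clones)
contig = order_contig(clones, "c1")
print(overlaps)
print(contig)
```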
Fig. 12. DNA Sequencing. Dideoxy sequencing (also called
chain-termination or Sanger method) uses an enzymatic procedure
to synthesize DNA chains of varying lengths, stopping DNA
replication at one of the four bases and then determining the
resulting fragment lengths. Each sequencing reaction tube (T, C,
G, and A) in the diagram contains
* a DNA template, a primer sequence, and a DNA polymerase to
initiate synthesis of a new strand of DNA at the point where
the primer is hybridized to the template;
* the four deoxynucleotide triphosphates (dATP, dTTP, dCTP, and
dGTP) to extend the DNA strand;
* one labeled deoxynucleotide triphosphate (using a radioactive
element or dye); and
* one dideoxynucleotide triphosphate, which terminates the growing
chain wherever it is incorporated. Tube A has didATP, tube C
has didCTP, etc.
For example, in the A reaction tube the ratio of dATP to
didATP is adjusted so that the tube will contain a collection of
DNA fragments with a didATP incorporated for each adenine
position on the template DNA fragments. The fragments of varying
length are then separated by electrophoresis (1) and the
positions of the nucleotides analyzed to determine sequence. The
fragments are separated on the basis of size, with the shorter
fragments moving faster and appearing at the bottom of the gel.
Sequence is read from bottom to top (2). (Source: see Fig. 11.)
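The logic of reading a sequencing gel can be sketched as follows. This is a simplified model under stated assumptions: fragment lengths are counted from the first base added after the primer, every possible termination product is present, and the seven-base strand is invented. The sequence recovered is that of the newly synthesized strand, which is complementary to the template.

```python
def run_reactions(new_strand: str) -> dict:
    """Return, for each tube, the lengths of its chain-terminated fragments.

    The tube containing the dideoxy form of a base collects one fragment
    for every position at which that base occurs in the new strand.
    """
    tubes = {base: [] for base in "TCGA"}
    for length, base in enumerate(new_strand, start=1):
        tubes[base].append(length)
    return tubes

def read_gel(tubes: dict) -> str:
    """Read bands from the bottom of the gel (shortest fragment) to the top."""
    bands = sorted((length, base)
                   for base, lengths in tubes.items()
                   for length in lengths)
    return "".join(base for _, base in bands)

# Hypothetical seven-base stretch of newly synthesized DNA.
tubes = run_reactions("GATTACA")
sequence = read_gel(tubes)
print(tubes)
print(sequence)
```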
Fig. 13. Cloning a Disease Gene by Chromosome Walking. After a
marker is linked to within 1 cM of a disease gene, chromosome
walking can be used to clone the disease gene itself. A probe is
first constructed from a genomic fragment identified from a
library as being the closest linked marker to the gene. A
restriction fragment isolated from the end of the clone near the
disease locus is used to reprobe the genomic library for an
overlapping clone. This process is repeated several times to walk
across the chromosome and reach the flanking marker on the other
side of the disease-gene locus. (Source: see Fig. 11.)
HUMAN GENETIC DIVERSITY:
The Ultimate Human Genetic Database
* Any two individuals differ in about 3 x 10^6 bases (0.1%).
* The population is now about 5 x 10^9.
* A catalog of all sequence differences would require 15 x 10^15
  entries.
* This catalog may be needed to find the rarest or most complex
  disease genes.
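The arithmetic behind these figures can be checked with a short calculation. The variable names are illustrative, and the multiplication simply counts every person's differences separately, so it should be read as an upper bound on catalog size.

```python
GENOME_BASES = 3 * 10**9   # about 3 billion base pairs
DIFFERENCE_RATE = (1, 1000)  # 0.1%, kept as a fraction for exact integer math
POPULATION = 5 * 10**9     # world population figure used in the text

# Bases at which two individuals differ: 0.1% of the genome.
differences_per_pair = GENOME_BASES * DIFFERENCE_RATE[0] // DIFFERENCE_RATE[1]

# One entry per difference per person.
catalog_entries = differences_per_pair * POPULATION

print(differences_per_pair, catalog_entries)
```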
Fig. 14. Magnitude of Genome Data. If the DNA sequence of the
human genome were compiled in books, the equivalent of 200
volumes the size of a Manhattan telephone book (at 1000 pages
each) would be needed to hold it all. New data-analysis tools
will be needed for understanding the information from genome maps
and sequences.
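The telephone-book comparison implies a density of bases per printed page, which a short calculation makes explicit. The storage estimates at the end are assumptions added for scale (one text character per base, or a compact code of two bits per base); they are not figures from the report.

```python
GENOME_BASES = 3 * 10**9
VOLUMES = 200
PAGES_PER_VOLUME = 1000

total_pages = VOLUMES * PAGES_PER_VOLUME
bases_per_page = GENOME_BASES // total_pages

# Assumed encodings, for scale only.
bytes_as_text = GENOME_BASES            # one character (one byte) per base
bytes_two_bit = GENOME_BASES * 2 // 8   # four possible bases fit in two bits

print(total_pages, bases_per_page, bytes_as_text, bytes_two_bit)
```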
Fig. 15. Understanding Gene Function. Understanding how genes
function will require analyses of the 3-D structures of the
proteins for which the genes code.