The “ENCyclopedia Of DNA Elements” (ENCODE), founded in 2003 with grants from the National Human Genome Research Institute at the NIH, seeks to identify
all the functional parts of the human genome, as assessed by DNA and histone
modifications, chromatin looping, transcription factor binding, chromatin
compaction (DNase accessibility), and transcripts. The collaboration of ~37 groups first developed the technology. Recently they published their
first salvo of 30 research papers, several of them in Nature along with a News & Views.
Sunday, October 7, 2012
ENCODE salvages “junk” DNA
The paper by Djebali and scores of colleagues offers “a
genome-wide catalogue of human transcripts”, together with their location (nucleus
or cytoplasm), and whether they have a 5’ 7mG cap or a 3’ poly-A tail. They prepared RNA from 15 human cell lines
after fractionation (whole cell, nucleus and cytosol) and separation of the RNA
into short and long (>200 nucleotides) fractions.
Long RNAs were further separated into fractions with and without poly-A tails. They sequenced these RNAs and determined their initiation sites and their 5’ and 3’ termini (using
technologies felicitously named CAGE and PET). Then they did the bioinformatics: comparison to the annotated
genome (GENCODE), statistics, etc. All these data are
available for your perusal using the RNA Dashboard.
They made many interesting observations; e.g., they conclude
there is very little “junk” DNA. Nearly
75% of the genome is transcribed in at least one of the cell lines, though only
a little over 50% in any given line.
(This is similar to previous findings, albeit not as
“encyclopedic”.) Only 28% of the 7,053
small RNAs (including snRNAs, snoRNAs, miRNAs, and tRNAs) annotated by GENCODE
are found in any of these cell lines, suggesting that the expression of many
annotated small RNAs is cell-type specific.
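To see how "nearly 75% in at least one line" and "a little over 50% in any given line" can both hold, here is a toy sketch in Python. The genome length, the cell line names, and the intervals are invented for illustration and are not data from the paper; the point is only that the union across lines can exceed the coverage of each line.

```python
def covered_fraction(intervals, genome_length):
    """Fraction of positions covered by half-open [start, end) intervals."""
    covered = set()
    for start, end in intervals:
        covered.update(range(start, end))
    return len(covered) / genome_length


# Hypothetical toy data: a 100-position "genome" and two made-up cell lines.
toy_genome_length = 100
toy_lines = {
    "line_A": [(0, 50)],
    "line_B": [(25, 75)],
}

# Coverage of each line on its own.
per_line = {
    name: covered_fraction(intervals, toy_genome_length)
    for name, intervals in toy_lines.items()
}

# Coverage of the union of all lines ("transcribed in at least one line").
all_intervals = [iv for intervals in toy_lines.values() for iv in intervals]
union = covered_fraction(all_intervals, toy_genome_length)

print(per_line)
print(union)
```

In this toy case each line covers half of the positions, but because the two lines only partly overlap, the union covers three quarters.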
They also find that protein-coding transcripts are more
abundant than long non-coding RNAs (lncRNAs) and that the same genes are transcribed
in different cells. Figure 3, shown
here, plots transcript abundance (RPKM, reads per kilobase of transcript per million
mapped reads) on the x axis vs. the nuclear/cytoplasmic ratio on the y axis for protein-coding
transcripts (orange), which are abundant (right) and cytoplasmic (down), and for non-coding (blue) and novel intergenic (green) transcripts,
which tend to be expressed at lower levels (left) and to be mostly nuclear (up). A few individual transcripts are also
identified, giving an appreciation for the range of expression.
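For readers unfamiliar with the unit on the x axis, RPKM normalizes a raw read count by both transcript length and sequencing depth. A minimal sketch of the standard formula follows; the function name and the example numbers are made up for illustration and are not taken from the paper.

```python
def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads.

    RPKM = reads / (length in kb * mapped reads in millions)
         = reads * 1e9 / (length in bp * total mapped reads)
    """
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total mapped reads must be positive")
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


# Hypothetical example: 500 reads on a 2,000 bp transcript
# in a library of 10 million mapped reads.
example = rpkm(500, 2000, 10_000_000)
print(example)
```

Because the count is divided by length, a long transcript and a short one with the same number of molecules per cell get similar values, which is what makes the abundances in the figure comparable across transcripts.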
Also not for the first time, they suggest that shrinking “intergenic”
regions “prompts the
reconsideration of the definition of a gene”. They “propose that the transcript
be considered as the basic atomic unit of inheritance”
and that “gene … denote … all those transcripts … that contribute to a
given phenotypic trait”. Mendel would
approve.
PubMed. Djebali et al. "Landscape of transcription in human cells." Nature 2012 Sep 6;489(7414):101-8.