Many more than 23,000 Genes?

A gene, the unit of heredity that encodes a trait, is hard to define in molecular terms. The best-known examples of genes encode proteins that contribute to observable phenotypes, such as purple flowers or blue eyes. The genome sequencers have identified about 23,000 genes in the human genome, mostly protein coding. The FANTOM (Functional Annotation of Mouse) Consortium took a broader view, using a number of technologies to identify full-length RNA transcripts with unique starts (5’) and stops (3’). They identified 181,047 independent transcripts! Of these, 5,154 encode previously unidentified proteins. Over one-half of the genome is transcribed on one or both strands without gaps, forming 18,461 transcription “forests” separated by deserts devoid of transcripts. The lengths of these transcripts show 2 major peaks around 2 kb and 20 kb and a minor peak at ~100 Mb. Unbiased analysis of transcription using “tiling arrays”, which probe continuous lengths of the genome exhaustively, has also suggested that there are many more genes than earlier estimates indicated. The authors point out that these results raise concerns about the interpretation of microarray expression studies and of gene-manipulated mice (knock-ins and knockouts). They conclude by estimating the enormous scale of the reevaluated genome code.
Reuel said...

About Average?

Dihydrofolate reductase, DHFR, is an ancient "housekeeping" enzyme that regenerates tetrahydrofolate, the carrier of the one-carbon groups required for the synthesis of purines and some amino acids. The DHFR gene is encoded in only about 3 kbp on 6 exons spread over about 30 kb (1 gene/30 kb). This seems to be a sparse code.

Even though this gene is huge, at this rate, a 3 Gb genome would encode 100,000 genes!

The numbers: 3x10^9 bp/genome x 1 gene/3x10^4 bp = 10^5 genes/genome.
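
The back-of-the-envelope estimate above can be checked with a few lines of Python. This is only a sketch of the arithmetic: the constant and function names are made up for illustration, and it assumes every gene occupies the same ~30 kb span as DHFR, which real genomes certainly do not.

```python
# Rough gene-count estimate, assuming uniform gene spacing (an illustrative assumption).
GENOME_BP = 3_000_000_000   # ~3 Gb genome, as in the comment above
DHFR_SPAN_BP = 30_000       # ~30 kb genomic span of the DHFR gene
ANNOTATED_GENES = 23_000    # gene count cited in the post

def genes_at_density(genome_bp: int, bp_per_gene: int) -> int:
    """Number of genes that fit if each one occupies bp_per_gene base pairs."""
    return genome_bp // bp_per_gene

estimate = genes_at_density(GENOME_BP, DHFR_SPAN_BP)
print(f"Genes at DHFR-like density: {estimate:,}")

# Turned around: the average spacing implied by the annotated gene count.
implied_spacing = GENOME_BP / ANNOTATED_GENES
print(f"Average bp per gene at {ANNOTATED_GENES:,} genes: {implied_spacing:,.0f}")
```

Run the other way, the same arithmetic says that 23,000 genes in 3 Gb works out to roughly 130 kb per gene, several times the span of DHFR.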