Tuesday, October 7, 2008

Genome Modules Track Disease

The idea is alluring: take a little blood, measure some gene transcripts, and diagnose disease. The problems have been (1) identifying the genes that signal disease and (2) overcoming natural and laboratory variation. Chaussabel and colleagues approached these problems by taking reliable microarray measurements of samples from 239 people and identifying small sets of genes, “modules”, that are coordinately expressed across a “wide range” of conditions. These modules are likely to be more reproducible than measurements of individual genes. The (seemingly arbitrary) conditions were juvenile idiopathic arthritis (47 patients), lupus (40), type I diabetes (20), melanoma (39), immune suppression after liver transplantation (37), and infections with E. coli (22), Staphylococcus aureus (18), and influenza (16).

Genes from all 239 samples were clustered using a “K-means algorithm” with K=30 (which yields up to 30 groups) without regard to the magnitude of change in expression. Modules were then selected in three rounds: the first kept genes that grouped together in all 8 conditions, the second in 7, and the third in 6. In total, nearly 5,000 transcripts in 28 modules were identified from the sample data. When the data were randomized, no modules were identified in 200 trial clusterings, suggesting that the modules reflect states of health and are not statistical artifacts. Modules range in size from 22 to 325 transcripts. About half the modules consist of genes with known relationships, e.g. shared cell types or pathways, underscoring their functional coherence.
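To make "clustering without regard to magnitude" concrete, here is a minimal sketch of one way it could be done: scale each gene's profile to mean 0 and standard deviation 1, then run plain K-means on the scaled profiles. This is not the authors' code. The z-scoring step, the deterministic farthest-first seeding, the function names, and the toy data are all my assumptions for illustration, and the toy example uses K=2 rather than the paper's K=30.

```python
from statistics import mean, pstdev

def zscore(profile):
    """Scale one gene's profile so only its shape matters, not its magnitude."""
    m, s = mean(profile), pstdev(profile)
    return [(x - m) / s if s > 0 else 0.0 for x in profile]

def sq_dist(a, b):
    """Squared Euclidean distance between two profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=100):
    """Plain Lloyd's K-means; returns one cluster label per point."""
    # Assumption: deterministic farthest-first seeding, chosen to keep the
    # sketch reproducible. Real tools usually use random restarts.
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    labels = None
    for _ in range(iters):
        # Assign each gene to its nearest center.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Move each center to the mean of its members; empty clusters stay put.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [mean(col) for col in zip(*members)]
    return labels

# Hypothetical toy data: 6 genes measured in 4 samples.
# g1-g3 rise across samples, g4-g6 fall; magnitudes differ widely.
genes = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [10.0, 20.0, 30.0, 40.0],
    "g3": [5.0, 5.5, 6.0, 7.0],
    "g4": [4.0, 3.0, 2.0, 1.0],
    "g5": [40.0, 30.0, 20.0, 10.0],
    "g6": [7.0, 6.0, 5.5, 5.0],
}
scaled = [zscore(p) for p in genes.values()]
labels = kmeans(scaled, k=2)
```

In this toy case the rising genes end up in one cluster and the falling genes in the other, even though g2 is ten times larger than g1, because the scaling removes magnitude before distances are computed.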

Two examples of how health conditions change these 28 modules are shown: healthy vs. melanoma (top) and healthy vs. lupus (bottom) (from fig. 1B; red = overexpressed, blue = underexpressed). All 8 conditions are clearly distinct from healthy and distinguishable from each other. The authors also identified 'biomarker' modules, e.g. M1.2 and M1.8 in melanoma or M1.7 and M3.1 in lupus in the examples shown, and made circular ('spider') graphs that can display several patients at once or one patient over a course of treatment.
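One simple way to turn a module into a single over/under-expression number is to average the log2 ratio of patient to healthy expression over the module's transcripts. The sketch below does that; it is my own illustration, not the scoring used for the paper's figure, and the module names, transcript names, and expression values are hypothetical.

```python
from math import log2
from statistics import mean

def module_scores(modules, patient, healthy):
    """Mean log2(patient / healthy) over each module's transcripts.

    Positive means the module is overexpressed relative to healthy,
    negative means underexpressed.
    """
    return {
        name: mean(log2(patient[t] / healthy[t]) for t in transcripts)
        for name, transcripts in modules.items()
    }

# Hypothetical modules and expression values, for illustration only.
modules = {"M_a": ["t1", "t2"], "M_b": ["t3", "t4"]}
healthy = {"t1": 100.0, "t2": 200.0, "t3": 50.0, "t4": 80.0}
patient = {"t1": 200.0, "t2": 400.0, "t3": 25.0, "t4": 40.0}

scores = module_scores(modules, patient, healthy)
```

Here M_a comes out positive (every transcript doubled) and M_b negative (every transcript halved). Averaging over a whole module is also why a module score should be steadier than any single gene: one noisy transcript moves the mean only a little.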

In contrast to typical repository-biomarker analyses, very few patient samples were used to generate these modules. It will be crucial to see how well the modules hold up with more, and more varied, samples. I'm also curious whether rational groupings of 'conditions', e.g. cancers, infections, or autoimmune diseases, might further improve module definition. Finally, in addition to improving patient care, this information should provide invaluable insights into disease origin and progression.
Chaussabel et al. A modular analysis framework for blood genomics studies: application to systemic lupus erythematosus. Immunity. 2008 Jul;29(1):150-64.