Our manuscript on a combined data- and knowledge-driven approach to sentinel gene selection is now available in PLOS ONE! Sentinel genes are a carefully selected subset of genes whose measured expression can be used to compute extrapolated expression changes for the rest of the transcriptome. Judicious selection of this subset facilitates targeted gene expression profiling for high-throughput transcriptomics (HTT). This can help bring down the cost of large-scale transcriptomics experiments by a factor of 10 and allow larger numbers of samples to be analyzed. Our scientists, in collaboration with others, have developed a computational model that first uses a data-driven approach to choose 1500 sentinel genes (S1500) and then supplements this set with additional genes selected using a knowledge-driven approach, resulting in the S1500+ gene set. The S1500+ gene set has since been revised with input from the research community, and the final version can be downloaded here.
We analyzed the performance of this method on a dataset consisting of all publicly available human microarray data for the Affymetrix platform (HG-U133plus2) and ran experiments to assess extrapolation and pathway coverage. The results show that the S1500+ gene set covers all major pathways in MSigDB 4.0 and outperforms a set of 1500 randomly picked genes with regard to extrapolation.
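To give a feel for what a pathway coverage check against a random baseline can look like, here is a minimal Python sketch. It is not the method used in the manuscript: the coverage definition (a pathway counts as covered if at least `min_hits` of its genes are in the set), the function names, and the toy gene and pathway identifiers are all assumptions made for illustration.

```python
import random


def pathway_coverage(gene_set, pathways, min_hits=1):
    """Fraction of pathways with at least `min_hits` member genes in gene_set.

    This coverage definition is an assumption for illustration only.
    """
    selected = set(gene_set)
    covered = sum(
        1 for members in pathways.values()
        if len(selected & set(members)) >= min_hits
    )
    return covered / len(pathways)


def random_baseline(all_genes, pathways, size, trials=200, seed=0):
    """Mean coverage over `trials` randomly drawn gene sets of the same size."""
    rng = random.Random(seed)
    pool = sorted(all_genes)  # sort so the draw is reproducible for a given seed
    scores = [
        pathway_coverage(rng.sample(pool, size), pathways)
        for _ in range(trials)
    ]
    return sum(scores) / trials


# Toy data: hypothetical gene and pathway names, not taken from MSigDB.
pathways = {
    "PATHWAY_A": {"G1", "G2", "G3"},
    "PATHWAY_B": {"G4", "G5"},
    "PATHWAY_C": {"G6", "G7", "G8"},
    "PATHWAY_D": {"G9", "G10"},
}
all_genes = {f"G{i}" for i in range(1, 41)}
sentinels = ["G1", "G4", "G6", "G9"]  # one hand-picked gene per pathway

coverage = pathway_coverage(sentinels, pathways)
baseline = random_baseline(all_genes, pathways, size=len(sentinels))
print(f"sentinel coverage: {coverage:.2f}, random baseline: {baseline:.2f}")
```

With real data, `pathways` would be loaded from a gene set collection and `sentinels` from the downloaded gene list. Averaging over many random draws, rather than using a single random set, keeps the baseline from depending on one lucky or unlucky sample.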
The next big challenge after selection of sentinel genes is to develop accurate extrapolation methods for gene inference, and our scientists are already hard at work on this problem, with promising early results! Watch this space to learn more about how Sciome is tackling this challenge with cutting-edge informatics.