This page contains resources relating to the genomic data integration work done in the Inference Research Group at the Department of Computing Science, University of Glasgow.
The HMEC dataset was produced by H. Steven Wiley and collaborators at the Pacific Northwest National Laboratory, USA. It consists of transcriptomic and proteomic profiles for approximately 500 human genes from the HMEC cell line, measured between 0 and 24h after the cells were stimulated with EGF. Below we provide the dataset in MATLAB form (other formats may be available on request), as used in Investigating the correspondence between transcriptomic and proteomic profiles using coupled cluster models, Rogers et al., Bioinformatics, 2008. If you use the data in your research, please remember to cite the original paper (and the aforementioned Bioinformatics paper if necessary).
Download: hmec.mat
On loading hmec.mat, the following variables become available:
- data : a cell array of floating-point matrices. data{1} is the G x T matrix of mRNA values and data{2} is the G x T matrix of protein expression values, where G is the number of genes and T the number of time points. Note that the mRNA data has one fewer time point because one array (15 minutes) failed quality control. The time points are therefore [1hr 4hr 8hr 13hr 18hr 24hr] for mRNA (T = 6) and [15min 1hr 4hr 8hr 13hr 18hr 24hr] for proteins (T = 7).
- names : a structure containing the various IDs of the genes.
Note that the row order is preserved across the various variables. The data have been normalised to have zero mean and unit standard deviation for each gene in each representation. If you require anything else, please feel free to contact us (srogers@dcs.gla.ac.uk) as we may already have it!
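As a rough illustration of the layout described above, the following MATLAB sketch loads the file, drops the 15-minute protein column so that both matrices cover the same time points, and prints a quick check of the per-gene normalisation. It assumes that hmec.mat is on the MATLAB path and that the columns appear in the time order listed above. The names mrna, protein and proteinShared are ours, chosen for illustration, and are not part of the file.

```matlab
% Minimal sketch, not the authors' own script.
% Assumes hmec.mat is on the path and that columns follow the
% time-point order given in the variable description.
load('hmec.mat');                  % creates the variables data and names

mrna    = data{1};                 % G x 6: 1, 4, 8, 13, 18, 24 h
protein = data{2};                 % G x 7: 15 min, 1, 4, 8, 13, 18, 24 h

% Drop the 15-minute column so both representations share time points.
proteinShared = protein(:, 2:end);
assert(isequal(size(mrna), size(proteinShared)), ...
    'Sizes differ after removing the 15-minute protein column.');

% Per-gene normalisation check (each row is one gene).
% std(..., 0, 2) normalises by N-1; the dataset may have used N instead,
% so small deviations from 1 are not necessarily an error.
fprintf('mRNA:    max |mean| = %.3g, max |std - 1| = %.3g\n', ...
    max(abs(mean(mrna, 2))), max(abs(std(mrna, 0, 2) - 1)));
fprintf('protein: max |mean| = %.3g, max |std - 1| = %.3g\n', ...
    max(abs(mean(protein, 2))), max(abs(std(protein, 0, 2) - 1)));
```

Because the row order is preserved across variables, row g of mrna and row g of proteinShared refer to the same gene. Note that proteinShared is no longer exactly zero mean and unit standard deviation per gene once a column has been removed.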
Download: jointmix.m testjointmix.m
MATLAB code for the joint mixture model (jointmix.m) described in Rogers et al., Bioinformatics, 2008, and a script (testjointmix.m) that generates and analyses some example data. Note that this code is supplied with the usual disclaimers! If you have any questions, please email us.