Data description
The datasets used in the course are described here. Load them in R by:
mydata <- read.table("datafile.csv",header=TRUE,sep=";")
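As a minimal sketch of what that call does, the example below writes a tiny semicolon-separated file and reads it back. The file contents are invented for illustration and only mimic the column layout of bosson; they are not course data.

```r
# Hypothetical two-row file standing in for a course dataset (values are made up)
tmp <- tempfile(fileext = ".csv")
writeLines(c("country;gender;aneurysm;bmi;risk",
             "France;M;42;24.5;2",
             "Vietnam;F;35;21.0;0"), tmp)

# Same call as above: first line holds the column names, fields are separated by ";"
mydata <- read.table(tmp, header = TRUE, sep = ";")

# Check the dimensions and the type given to each column
dim(mydata)
str(mydata)
```

Checking with str right after loading is a quick way to confirm that the separator was right: a wrong sep typically yields a single column.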
bosson
The spreadsheet (courtesy of Professor Jean-Luc Bosson) contains aneurysm data from 209 patients coming from France or Vietnam. The variables are:
country: either Vietnam or France
gender: either M or F
aneurysm: size of aneurysm in mm
bmi: body mass index
risk: number of risk factors, in 0, …, 5
dimer
The spreadsheet (courtesy of Professor Jean-Luc Bosson) contains log-dimer data on 530 patients, grouped into 3 age classes. The variables are:
log.dimer: logarithm of d-dimer indices (an indicator for the risk of thrombosis)
age: the age class: 1 (younger than 50), 2 (between 50 and 70), 3 (older than 70)
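Because age is a class code rather than a measurement, it is usually safer to treat it as a factor. The sketch below shows one way to do that; the data frame and the level labels are invented for illustration.

```r
# Hypothetical rows with the same two columns as dimer (values are made up)
dimer <- data.frame(log.dimer = c(5.8, 6.4, 7.1, 6.0),
                    age = c(1, 2, 3, 2))

# Recode the class numbers 1, 2, 3 as a labelled factor (labels are our own choice)
dimer$age <- factor(dimer$age, levels = 1:3,
                    labels = c("under 50", "50 to 70", "over 70"))

# Mean log-dimer within each age class
mean_by_age <- tapply(dimer$log.dimer, dimer$age, mean)
mean_by_age
```

With age stored as a factor, functions such as boxplot or aov compare the three classes instead of fitting a numeric trend.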
feretti
The spreadsheet (courtesy of Professor Gilbert Ferretti) contains data from 43 patients with a lung tumor. The variables are:
height: the height of the tumor in mm
diameter: the largest diameter of the tumor in mm
density: the apparent density in negative (less dense than water), null (as dense as water), positive (denser than water)
invasive: whether the tumor was invasive (yes or no)
fires
Data on 270 forest fires in Portugal, from P. Cortez and A. Morais. The variables are:
month
day
temp: the temperature in degrees Celsius
relhum: the relative humidity in percent
wind: the wind speed in km/h
area: the burned area in hectares
LenzT, LenzI
Lenz et al.'s transcriptome and clinical data on 414 patients with diffuse large B cell lymphoma (GSE10846). LenzT is a transcriptome data matrix over 17290 protein coding genes. LenzI contains 10 variables of clinical information for the same patients.
LenzT is a numeric matrix of size 414 x 17290. The columns are named by gene symbols, the rows by GSM numbers.
LenzI is a matrix of size 414 x 10 whose rows are named by the same GSM numbers as LenzT. The columns are:
gender: character in male, female
age: numeric
diagno: diagnosis described by ABC (activated B-cell Double-Hit Diffuse Large B-Cell Lymphoma), GCB (germinal center B-cell Double-Hit Diffuse Large B-Cell Lymphoma), Unclassified
status: character in alive, dead
follup: follow-up time
regim: type of chemotherapy: CHOP (CHOP-Like Regimen), R-CHOP (R-CHOP-Like Regimen)
ecog: ECOG performance status (0 to 4)
stage: stage of lymphoma (1 to 4)
ldhrat: effusion lactate dehydrogenase (LDH)/serum ratio
extnod: number of extranodal sites (0 to 5)
The transcriptome matrix LenzT has been transformed using the function rename, so its gene symbols match those of dicoH.
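Since LenzT and LenzI share their row names, expression values can be compared across clinical groups by matching rows. The following sketch uses two toy objects with the same layout; the sample identifiers, gene names and values are hypothetical, and LenzI is assumed here to be a character matrix.

```r
# Toy stand-ins for LenzT and LenzI (3 patients, 2 genes; all values are made up)
ids <- c("GSM_a", "GSM_b", "GSM_c")
LenzT <- matrix(c(7.1, 8.3, 6.9, 5.2, 4.8, 5.5), nrow = 3,
                dimnames = list(ids, c("GENE_A", "GENE_B")))
LenzI <- cbind(gender = c("male", "female", "male"),
               status = c("alive", "dead", "alive"))
rownames(LenzI) <- ids

# Confirm that both objects list the patients in the same order
same_rows <- identical(rownames(LenzT), rownames(LenzI))

# Mean expression of one gene within each survival status
expr_by_status <- tapply(LenzT[, "GENE_A"], LenzI[, "status"], mean)
expr_by_status
```

If the row names ever differ in order, indexing one object by the row names of the other, as in LenzI[rownames(LenzT), ], realigns them before any comparison.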
http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE10846
References
Lenz G, Wright G, Dave SS, Xiao W et al. Stromal gene signatures in large-B-cell lymphomas. N Engl J Med 2008 Nov 27;359(22):2313-23.
Cardesa-Salzmann TM, Colomo L, Gutierrez G, Chan WC et al. High microvessel density determines a poor outcome in patients with diffuse large B-cell lymphoma treated with rituximab plus chemotherapy. Haematologica 2011 Jul;96(7):996-1001.
tauber
The spreadsheet (courtesy of Professor Maïté Tauber) contains heights and weights of 2891 children from 4 to 7 years old. The variables are:
gender: male (M) or female (F)
age: in months
height: in centimeters
weight: in kilograms
To convert centimeters to inches, divide by 2.54. To convert kilograms to pounds, multiply by 2.2046.
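The unit conversions can be sketched as follows, using the standard factors of 2.54 centimeters per inch and about 2.2046 pounds per kilogram. The data frame and the new column names height_in and weight_lb are hypothetical.

```r
# Hypothetical rows with the same height and weight columns as tauber
children <- data.frame(height = c(105, 118),    # centimeters
                       weight = c(17.5, 22.0))  # kilograms

# Centimeters to inches: divide by 2.54
children$height_in <- children$height / 2.54

# Kilograms to pounds: multiply by 2.2046
children$weight_lb <- children$weight * 2.2046

children
```

Keeping the converted values in new columns leaves the original metric measurements untouched.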