Data description
The datasets used in the course are described here. Load them in R by:
mydata <- read.table("datafile.csv",header=TRUE,sep=";")
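As a minimal sketch of the call above, the following writes a small semicolon-separated file to a temporary location and reads it back. The file contents and the name tmp are invented for illustration; the course files are assumed to share this layout (a header line, fields separated by semicolons).

```r
# Hypothetical two-row file mimicking the layout expected by read.table
tmp <- tempfile(fileext = ".csv")
writeLines(c("country;gender;bmi",
             "France;M;24.5",
             "Vietnam;F;21.0"), tmp)

# header=TRUE takes the column names from the first line;
# sep=";" splits the fields on semicolons
mydata <- read.table(tmp, header = TRUE, sep = ";")

# Inspect the result: one row per patient, one column per variable
str(mydata)
```

Calling str (or summary) right after loading is a quick way to confirm that numeric columns were read as numbers rather than as text.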
bosson
The spreadsheet (courtesy of Professor Jean-Luc Bosson) contains aneurysm data from 209 patients coming from France or Vietnam. The variables are:
country: either Vietnam or France
gender: either M or F
aneurysm: size of aneurysm in mm
bmi: body mass index
risk: a number of risk factors in 0, …, 5
dimer
The spreadsheet (courtesy of Professor Jean-Luc Bosson) contains log-dimer data on 530 patients, grouped into 3 age classes. The variables are:
log.dimer: logarithm of d-dimer indices (an indicator for the risk of thrombosis)
age: the age class: 1 (younger than 50), 2 (between 50 and 70), 3 (older than 70)
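Because the age class is coded by the numbers 1, 2, 3, R will treat it as numeric unless told otherwise. The sketch below shows one possible recoding as a factor; the data values and the labels "<50", "50-70", ">70" are invented for illustration and are not part of the dataset.

```r
# Toy values standing in for the real dimer data (invented for illustration)
dimer <- data.frame(log.dimer = c(5.8, 6.4, 7.1, 6.9),
                    age       = c(1, 2, 3, 2))

# Recode the numeric class as a labelled factor so R treats it as categorical
dimer$age <- factor(dimer$age, levels = 1:3,
                    labels = c("<50", "50-70", ">70"))

# Mean log-dimer per age class
means <- tapply(dimer$log.dimer, dimer$age, mean)
means
```

With the factor in place, functions such as boxplot or aov group by age class instead of fitting a numeric trend.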
feretti
The spreadsheet (courtesy of Professor Gilbert Ferretti) contains data from 43 patients with a lung tumor. The variables are:
height: the height of the tumor in mm
diameter: the largest diameter of the tumor in mm
density: the apparent density in negative (less dense than water), null (as dense as water), positive (denser than water)
invasive: whether the tumor was invasive (yes or no)
fires
Data on 270 forest fires in Portugal, from P. Cortez and A. Morais. The variables are:
month
day
temp: the temperature in degrees Celsius
relhum: the relative humidity in percent
wind: the wind speed in km/h
area: the burned area in hectares
LenzT, LenzI
Lenz et al.'s transcriptome and clinical data on 414 patients with diffuse large B cell lymphoma (GSE10846). LenzT is a transcriptome data matrix over 17290 protein coding genes. LenzI contains 10 variables of clinical information for the same patients.
LenzT is a numeric matrix of size 414 x 17290. The columns are named by gene symbols, the rows by GSM numbers.
LenzI is a 414 x 10 matrix whose rows are named by the same GSM numbers as LenzT. The columns are:
gender: character in male, female
age: numeric
diagno: diagnosis described by ABC (activated B-cell Double-Hit Diffuse Large B-Cell Lymphoma), GCB (germinal center B-cell Double-Hit Diffuse Large B-Cell Lymphoma), Unclassified
status: character in alive, dead
follup: follow-up time
regim: type of chemotherapy: CHOP (CHOP-Like Regimen), R-CHOP (R-CHOP-Like Regimen)
ecog: ECOG performance status (0 to 4)
stage: stage of lymphoma (1 to 4)
ldhrat: effusion lactate dehydrogenase (LDH)/serum ratio
extnod: number of extranodal sites (0 to 5)
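Since LenzT and LenzI share their row names, clinical variables can be used directly to group expression values. The sketch below illustrates this on toy stand-ins: the patient labels, gene names and values are invented, and LenzI is built here as a data frame, which is an assumption about how it is stored.

```r
# Toy stand-ins for LenzT and LenzI (3 patients, 2 genes); all values invented
ids   <- c("GSM_a", "GSM_b", "GSM_c")
LenzT <- matrix(c(7.1, 8.3, 6.9, 5.2, 5.0, 6.1), nrow = 3,
                dimnames = list(ids, c("GENE1", "GENE2")))
LenzI <- data.frame(gender = c("male", "female", "male"),
                    diagno = c("ABC", "GCB", "ABC"),
                    row.names = ids)

# Check that the rows refer to the same patients, in the same order
stopifnot(identical(rownames(LenzT), rownames(LenzI)))

# Mean expression of one gene per diagnosis group;
# the [, "diagno"] form works for a data frame and for a matrix alike
res <- tapply(LenzT[, "GENE1"], LenzI[, "diagno"], mean)
res
```

Running the row-name check before any joint analysis guards against silently pairing a patient's expression profile with another patient's clinical record.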
The transcriptome matrix LenzT has been transformed using function rename; the gene symbols match those of dicoH.
http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE10846
References
Lenz G, Wright G, Dave SS, Xiao W et al. Stromal gene signatures in large-B-cell lymphomas. N Engl J Med 2008 Nov 27;359(22):2313-23.
Cardesa-Salzmann TM, Colomo L, Gutierrez G, Chan WC et al. High microvessel density determines a poor outcome in patients with diffuse large B-cell lymphoma treated with rituximab plus chemotherapy. Haematologica 2011 Jul;96(7):996-1001.
tauber
The spreadsheet (courtesy of Professor Maïté Tauber) contains heights and weights of 2891 children from 4 to 7 years old. The variables are:
gender: male (M) or female (F)
age: in months
height: in centimeters
weight: in kilograms
Convert centimeters to inches by dividing by 2.54. Convert kilograms to pounds by multiplying by 2.2046.
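The unit conversions can be sketched as follows, taking 1 inch = 2.54 cm and 1 kg ≈ 2.2046 lb. The rows below are invented for illustration, and the column names height_in and weight_lb are hypothetical additions, not variables of the dataset.

```r
# Toy rows standing in for the tauber data (values invented for illustration)
tauber <- data.frame(gender = c("M", "F"),
                     age    = c(50, 72),        # months
                     height = c(101.6, 119.4),  # centimeters
                     weight = c(16.0, 22.0))    # kilograms

# 1 inch = 2.54 cm exactly; 1 kg is approximately 2.2046 lb
tauber$height_in <- tauber$height / 2.54
tauber$weight_lb <- tauber$weight * 2.2046

tauber
```

Storing the converted values in new columns keeps the original metric measurements available for later analyses.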