Gene expression cancer RNA sequence

This classification data set comes with the NeurEco installation. It is part of the RNA-Seq (HiSeq) PANCAN data set: a random extraction of gene expressions (giving \(20531\) input features) from patients with different types of tumors. Each input is given a dummy name (gene_xx), while the targets are the cancer classes (\(5\) output features): BRCA, KIRC, COAD, LUAD and PRAD.

The test case is provided with the following files:

Training data set:
- x_train_0.csv: the training inputs file - part 1, containing \(320\) samples
- y_train_0.csv: the training targets file - part 1
- x_train_1.csv: the training inputs file - part 2, containing \(320\) samples
- y_train_1.csv: the training targets file - part 2
Testing data set:
- x_test.csv: the testing inputs file, containing \(161\) samples
- y_test.csv: the testing targets file
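
Because the training set is split across two parts, the parts have to be stacked before use. The following is a minimal sketch of one way to load the files with NumPy. It is not taken from the NeurEco documentation. It assumes that the CSV files are purely numeric, comma-delimited and have no header row, which is not stated above and should be checked against the files themselves. The helper names `load_parts` and `load_gene_expression` are hypothetical.

```python
from pathlib import Path

import numpy as np


def load_parts(data_dir, stem, n_parts, delimiter=","):
    """Stack <stem>_0.csv ... <stem>_{n_parts-1}.csv row-wise into one 2-D array.

    Assumption: numeric values only, no header row.
    """
    parts = []
    for i in range(n_parts):
        path = Path(data_dir) / f"{stem}_{i}.csv"
        # ndmin=2 keeps a single-row or single-column file two-dimensional.
        parts.append(np.loadtxt(path, delimiter=delimiter, ndmin=2))
    return np.vstack(parts)


def load_gene_expression(data_dir, delimiter=","):
    """Return (x_train, y_train, x_test, y_test) for the files listed above."""
    data_dir = Path(data_dir)
    x_train = load_parts(data_dir, "x_train", 2, delimiter)
    y_train = load_parts(data_dir, "y_train", 2, delimiter)
    x_test = np.loadtxt(data_dir / "x_test.csv", delimiter=delimiter, ndmin=2)
    y_test = np.loadtxt(data_dir / "y_test.csv", delimiter=delimiter, ndmin=2)

    # Inputs and targets must describe the same samples, row for row.
    if x_train.shape[0] != y_train.shape[0]:
        raise ValueError("training inputs and targets differ in sample count")
    if x_test.shape[0] != y_test.shape[0]:
        raise ValueError("testing inputs and targets differ in sample count")
    return x_train, y_train, x_test, y_test
```

If the format assumption holds, calling `load_gene_expression` on the directory that contains the files should give training inputs with \(640\) rows (\(320 + 320\)) and testing inputs with \(161\) rows, each with \(20531\) columns. If the files turn out to have a header row, passing `skiprows=1` to `np.loadtxt` is one way to adapt the sketch.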
