Gene expression cancer RNA sequence

This classification data set ships with the NeurEco installation. It is a random extraction from the RNA-Seq (HiSeq) PANCAN data set and contains gene expression levels (\(20531\) input features) of patients with different types of tumors (\(5\) output features). Each input is given a dummy name (gene_xx), and the targets are the cancer classes: BRCA, KIRC, COAD, LUAD and PRAD.
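One way to picture the \(5\) output features is as a one-hot encoding, with one column per cancer class. The sketch below assumes that encoding and the column order BRCA, KIRC, COAD, LUAD, PRAD. Neither is confirmed by the description above, and the helper names `one_hot` and `decode` are hypothetical, so check the provided target files before relying on it.

```python
# Assumed column order; verify it against the provided target files.
CLASSES = ["BRCA", "KIRC", "COAD", "LUAD", "PRAD"]


def one_hot(label, classes=CLASSES):
    """Encode a class name as a vector with a single 1.0 entry."""
    if label not in classes:
        raise ValueError(f"unknown class: {label!r}")
    return [1.0 if name == label else 0.0 for name in classes]


def decode(scores, classes=CLASSES):
    """Return the class whose output score is the largest."""
    if len(scores) != len(classes):
        raise ValueError("expected one score per class")
    best = max(range(len(classes)), key=lambda i: scores[i])
    return classes[best]
```

With this convention, `decode` turns the five outputs of a trained model back into a tumor type by picking the column with the highest score.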

The test case is provided with the following files:

  • Training data set:

    • x_train_0.csv: the training inputs file - part 1, containing \(320\) samples

    • y_train_0.csv: the training targets file - part 1

    • x_train_1.csv: the training inputs file - part 2, containing \(320\) samples

    • y_train_1.csv: the training targets file - part 2

  • Testing data set:

    • x_test.csv: the testing inputs file, containing \(161\) samples

    • y_test.csv: the testing targets file
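
Because the training set is split into two parts, it can be convenient to join them before use. The following is a minimal sketch using only the Python standard library, not the NeurEco API. It assumes that the files are comma-separated, contain only numeric values, and have no header row; adjust `delimiter` and `has_header` if the actual files differ. The function names `read_numeric_csv` and `load_training_set` are hypothetical.

```python
import csv
from pathlib import Path


def read_numeric_csv(path, delimiter=",", has_header=False):
    """Read a CSV file into a list of rows of floats."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter=delimiter)
        if has_header:
            next(reader, None)  # skip the column-name row
        return [[float(value) for value in row] for row in reader if row]


def load_training_set(data_dir, **csv_options):
    """Concatenate both training parts, keeping inputs and targets aligned."""
    data_dir = Path(data_dir)
    x_rows, y_rows = [], []
    for part in (0, 1):
        x_part = read_numeric_csv(data_dir / f"x_train_{part}.csv", **csv_options)
        y_part = read_numeric_csv(data_dir / f"y_train_{part}.csv", **csv_options)
        if len(x_part) != len(y_part):
            raise ValueError(f"part {part}: inputs and targets differ in length")
        x_rows.extend(x_part)
        y_rows.extend(y_part)
    return x_rows, y_rows
```

Calling `load_training_set(data_dir)` on the folder containing the files listed above should return the two parts joined in order; `read_numeric_csv` can be used on its own for `x_test.csv` and `y_test.csv`.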