Data preparation for NeurEco Classification with the command line interface
The command line interface takes the data for model construction or evaluation as paths to the files that contain it.
The supported formats are:
CSV with “;” or “,” as the separator
NumPy .npy
MATLAB MAT-files .mat
The files contain numerical data; the allowed types are int, float and double.
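As an illustration of the formats listed above, the following minimal sketch writes the same table to a CSV file with a “;” separator and to a NumPy .npy file. The array contents, the file names `inputs.csv` and `inputs.npy`, and the temporary directory are assumptions made for this example; they are not names required by NeurEco.

```python
import os
import tempfile

import numpy as np

# Hypothetical data: 5 samples, 3 input features, stored as double precision.
inputs = np.arange(15, dtype=np.float64).reshape(5, 3)

# Hypothetical file locations chosen for this example only.
out_dir = tempfile.mkdtemp()
csv_path = os.path.join(out_dir, "inputs.csv")
npy_path = os.path.join(out_dir, "inputs.npy")

# CSV with ";" as the separator and no header line.
np.savetxt(csv_path, inputs, delimiter=";")

# NumPy binary format.
np.save(npy_path, inputs)

# Read both files back to confirm that they hold the same table.
from_csv = np.loadtxt(csv_path, delimiter=";")
from_npy = np.load(npy_path)
```

The resulting paths are what would be passed to the command line interface. MAT-files could be written in a similar way with `scipy.io.savemat`, which is not shown here.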
Every input file contains a table with:
a number of lines equal to the number of samples
a number of columns equal to the number of input features
CSV files may have one additional line for a header
Every output file contains a table with:
a number of lines equal to the number of samples
a number of columns equal to the number of output features; for Classification, these features are the classes
the outputs are one-hot encoded: each line contains ‘0’ in every position except one, which contains ‘1’. That position corresponds to the class to which the sample on that line belongs.
CSV files may have one additional line for a header
the input file and the corresponding output file have the same number of samples
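The output layout described above can be produced from integer class labels with a few lines of NumPy. This is a minimal sketch: the label values, the number of classes, the random inputs and the helper `check_pair` are all hypothetical and serve only to illustrate the one-hot layout and the sample-count requirement.

```python
import numpy as np

# Hypothetical integer class labels for 6 samples and 3 classes.
labels = np.array([0, 2, 1, 1, 0, 2])
n_classes = 3

# Row i of the identity matrix is the one-hot vector for class i,
# so indexing with the labels yields one one-hot line per sample.
outputs = np.eye(n_classes, dtype=int)[labels]

# Hypothetical inputs for the same 6 samples, 4 features each.
inputs = np.random.default_rng(0).random((6, 4))


def check_pair(inputs, outputs):
    """Return True if both tables have the same number of samples
    and every output line is a valid one-hot vector."""
    same_samples = inputs.shape[0] == outputs.shape[0]
    only_zeros_and_ones = np.isin(outputs, (0, 1)).all()
    one_per_line = (outputs.sum(axis=1) == 1).all()
    return bool(same_samples and only_zeros_and_ones and one_per_line)
```

Running `check_pair(inputs, outputs)` before writing the files is a cheap way to catch a mismatched sample count or a line with no class set.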
The data can be provided in chunks, split across multiple input and output files. In this case, take care to preserve the correspondence between input and output files.
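One way to keep chunked files aligned is to split both tables at the same line boundaries and give each pair a shared index. The sketch below assumes hypothetical data, a hypothetical naming scheme (`inputs_0.npy`, `outputs_0.npy`, and so on) and a temporary directory; none of these are prescribed by NeurEco.

```python
import os
import tempfile

import numpy as np

# Hypothetical data set: 10 samples, 4 input features, 3 classes.
rng = np.random.default_rng(0)
inputs = rng.random((10, 4))
outputs = np.eye(3, dtype=int)[rng.integers(0, 3, size=10)]

out_dir = tempfile.mkdtemp()
n_chunks = 3
input_paths, output_paths = [], []

# array_split cuts both tables at the same line boundaries,
# so chunk i of the inputs matches chunk i of the outputs.
input_chunks = np.array_split(inputs, n_chunks)
output_chunks = np.array_split(outputs, n_chunks)

for i, (x, y) in enumerate(zip(input_chunks, output_chunks)):
    in_path = os.path.join(out_dir, f"inputs_{i}.npy")
    out_path = os.path.join(out_dir, f"outputs_{i}.npy")
    np.save(in_path, x)
    np.save(out_path, y)
    input_paths.append(in_path)
    output_paths.append(out_path)
```

Keeping `input_paths` and `output_paths` in the same order makes it straightforward to pass the files in matching order.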
There is no need to normalize the data, as normalization is handled by NeurEco; see Data normalization for Tabular Regression.