Build NeurEco Classification model with the command line interface#
To build a NeurEco Classification model, run the following command in the terminal:
neurecoDNN build path/to/build/configuration/file/build.conf
The skeleton of the configuration file required to build a NeurEco Classification model, here called build.conf, looks as follows. Its fields should be filled according to the problem at hand.
{
    "neurecoDNN_build": {
        "DevSettings": {
            "valid_percentage": 33.33,
            "initial_beta_reg": 0.1,
            "validation_indices": "",
            "final_learning": true,
            "disconnect_inputs_if_possible": true
        },
        "input_normalization": {
            "shift_type": "auto",
            "scale_type": "auto",
            "normalize_per_feature": true
        },
        "output_normalization": {
            "shift_type": "none",
            "scale_type": "none",
            "normalize_per_feature": false
        },
        "UserSettings": {
            "gpu_id": 0,
            "use_gpu": false
        },
        "classification": true,
        "exc_filenames": [],
        "output_filenames": [],
        "validation_exc_filenames": [],
        "validation_output_filenames": [],
        "write_model_to": "model.ednn",
        "write_compression_model_to": "CompModel.ednn",
        "write_decompression_model_to": "DecompModel.ednn",
        "minimum_compression_coefficient": 1,
        "compress_tolerance": 0.02,
        "build_compress": false,
        "starting_from_checkpoint_address": "",
        "checkpoint_address": "ckpt.checkpoint",
        "resume": false
    }
}
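The configuration file can also be generated programmatically. The following sketch, in Python, writes a build.conf that fills only the problem-specific fields; the data file names (x_train.csv, y_train.csv) are placeholders, and the assumption that omitted fields fall back to the defaults listed in the table below should be checked against the full skeleton above.

import json

# Hypothetical example: fill only the problem-specific fields and write build.conf.
# The file names below are placeholders for the user's own data files.
config = {
    "neurecoDNN_build": {
        "classification": True,
        "exc_filenames": ["x_train.csv"],        # training inputs (csv, npy or mat)
        "output_filenames": ["y_train.csv"],     # training outputs (csv, npy or mat)
        "output_normalization": {                # must stay disabled for Classification
            "shift_type": "none",
            "scale_type": "none",
            "normalize_per_feature": False
        },
        "write_model_to": "model.ednn",
        "checkpoint_address": "ckpt.checkpoint"
    }
}

with open("build.conf", "w") as f:
    json.dump(config, f, indent=4)

# The build is then launched from the terminal:
#   neurecoDNN build build.conf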
Building parameters#
| Name | Type | Description |
|---|---|---|
| valid_percentage | float, min=1.0, max=50.0, default=33.33 | Percentage of the data used as validation data. NeurEco automatically chooses the best samples for validation, so that the created model fits unseen data as well as possible. Changing this parameter is mainly of interest when the data set is small and a good trade-off between the learning and validation sets must be found. This parameter is ignored if validation_indices is specified or if validation_exc_filenames and validation_output_filenames are passed. |
| validation_indices | string, default = "" | Path to a csv/npy file on disk containing the indices of the samples to use for validation. |
| initial_beta_reg | float, default=0.1 | Initial regularization coefficient. In NeurEco, the main source of regularization is parsimony: the beta_reg coefficient ensures that, at the beginning of the learning process, if many weight configurations give the same error, the smallest one is chosen. At the end of the learning process the model is parsimonious, so this coefficient is no longer needed and goes to zero. |
| final_learning | boolean, default=True | True if this training is final, False otherwise. Every data sample matters: if True, NeurEco learns the validation data very carefully at the end of the learning process. |
| disconnect_inputs_if_possible | boolean, default=True | NeurEco always tries to keep its model as small as possible without losing performance, so if it finds inputs that do not contribute to the overall performance, it tries to remove all links to them. Setting this field to False prevents it from disconnecting inputs. |
| use_gpu | boolean, default=False | Indicates whether an NVIDIA GPU card is used for building the model. |
| gpu_id | integer, default=0 | Id of the GPU card on which to run the building process (in case several GPU cards are available). |
| input_normalization: shift_type | string, default "auto" | Method used to shift the input data. For more details, see Data normalization for Tabular Regression. |
| input_normalization: scale_type | string, default "auto" | Method used to scale the input data. For more details, see Data normalization for Tabular Regression. |
| input_normalization: normalize_per_feature | boolean, default True | If True, shifting and scaling are performed on each input feature separately; if False, all the features are normalized together. For example, if the data is the output of an SVD operation, the scale between the coefficients must be maintained, so this field should be False. On the other hand, inputs representing different fields with different scales (for example a temperature varying from 260 to 300 degrees and a pressure varying from 1e5 to 1.1e5 Pascal) should not be scaled together, so in this case this field should be True. For more details, see Data normalization for Tabular Regression. |
| output_normalization: shift_type | string, has to be set to "none" for Classification | Method used to shift the output data. For more details, see Data normalization for Tabular Regression. |
| output_normalization: scale_type | string, has to be set to "none" for Classification | Method used to scale the output data. For more details, see Data normalization for Tabular Regression. |
| output_normalization: normalize_per_feature | boolean, has to be set to False for Classification | If True, shifting and scaling are performed on each output feature separately; if False, all the features are normalized together (same considerations as for the inputs). For more details, see Data normalization for Tabular Regression. |
| exc_filenames | list of strings, mandatory, default = [] | Training data: the paths of all the input data files forming the input data table. The file format can be csv, npy or mat (MATLAB files). |
| output_filenames | list of strings, mandatory, default = [] | Training data: the paths of all the output data files. The file format can be csv, npy or mat (MATLAB files). |
| validation_exc_filenames | list of strings, default = [] (GUI, .conf) | Validation data: the paths of all the validation input data files. The file format can be csv, npy or mat (MATLAB files). |
| validation_output_filenames | list of strings, default = [] | Validation data: the paths of all the validation output data files. The file format can be csv, npy or mat (MATLAB files). |
| write_model_to | string, default = "" | Path where the model is saved. |
| checkpoint_address | string, default = "" | Path where the checkpoint model is saved. The checkpoint model is used to resume the build of a model, or to choose an intermediate network with fewer topological optimization steps. |
| resume | boolean, default=False | If True, resume the build from its own checkpoint in checkpoint_address. |
| starting_from_checkpoint_address | string, default = "" | Path the checkpoint model is loaded from. This option is checked when the user wants to continue the build of a model from an existing checkpoint after changing a few settings (additional data, for example). To use this option in a .conf file, make sure that the resume option keeps its default value False. |
| start_build_from_model_number | int, default=-1 | When resuming a build, specifies which intermediate model in the checkpoint is used as the starting point. When set to -1, NeurEco chooses the last created model as the starting point. |
| freeze_structure | boolean, default=False | When resuming a build, NeurEco only changes the weights (not the network architecture) if this variable is set to True. |
| links_maximum_number | int, default=0 | Maximum number of links (trainable parameters) that NeurEco can create. If set to zero, NeurEco ignores this parameter. Note that this number is respected within the limits of what NeurEco finds possible. |
| build_compress | boolean, default=False for Classification | If True, the model performs a nonlinear compression. |
| minimum_compression_coefficients | int, default=1 | Checked only if build_compress = True; specifies the minimum number of nonlinear coefficients. |
| compress_tolerance | float (e.g. 0.01, 0.001, ...), default=0.02 | Checked only if build_compress = True; specifies the tolerance of the compressor: the maximum error accepted when performing a compression and a decompression on the validation data. |
| write_compression_model_to | string, default = "" | Checked only if build_compress = True; path where the compression model is saved. |
| write_decompression_model_to | string, default = "" | Checked only if build_compress = True; path where the decompression model is saved. |
| compress_decompress_size_ratio | float, default=1.0 | Checked only if build_compress = True; specifies the ratio between the sizes of the compression block and the decompression block. This number is always greater than 0 and smaller than or equal to 1. Note that this ratio is respected within the limits of what NeurEco finds possible. |
| classification | boolean, has to be set to True for Classification | Specifies whether the problem is a classification problem. |
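Since the training and validation files may be provided as csv, npy or mat, and validation_indices expects a csv/npy file of sample indices, the data can be prepared with numpy as in the sketch below. The array shapes, the integer label encoding and the choice of every fifth sample for validation are illustrative assumptions, not NeurEco requirements.

import numpy as np

# Illustrative data set: 1000 samples, 8 input features, one class label per
# sample (shapes and the integer label encoding are assumptions made for this
# example, not NeurEco requirements).
x_train = np.random.rand(1000, 8)
y_train = np.random.randint(0, 3, size=(1000, 1))

# npy is one of the accepted file formats (csv, npy, mat).
np.save("x_train.npy", x_train)
np.save("y_train.npy", y_train)

# Optional: fix the validation split explicitly instead of relying on
# valid_percentage -- here every fifth sample, purely as an example.
np.save("validation_indices.npy", np.arange(0, 1000, 5))

# The corresponding entries in build.conf would then be:
#   "exc_filenames": ["x_train.npy"],
#   "output_filenames": ["y_train.npy"],
#   "validation_indices": "validation_indices.npy"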
Data normalization for Tabular Classification#
A normalization operation for NeurEco is a combination of a \(shift\) and a \(scale\), so that:

\[x_{normalized} = \frac{x - shift}{scale}\]
Allowed shift methods for NeurEco and their corresponding shifted values are listed in the table below:
| Name | Shift value |
|---|---|
| none | \(0\) |
| min | \(min(x)\) |
| min_centered | \(-0.5 * (min(x) + max(x))\) |
| mean | \(mean(x)\) |
Allowed scale methods for NeurEco Tabular and their corresponding scaled values are listed in the table below:
| Name | Scale value |
|---|---|
| none | \(1\) |
| max | \(max(x)\) |
| max_centered | \(0.5 * (max(x) - min(x))\) |
| std | \(std(x)\) |
Normalization with the auto option:

- shift is mean and scale is max if the value of the mean is far from 0,
- shift is none and scale is max if the computed value of the mean is close to 0.

If the normalization is performed per feature and the auto option is chosen, the normalization is performed by groups of features. These groups are created based on the values of the mean and the standard deviation.
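As an illustration of these rules, the sketch below reproduces the auto behaviour for a single feature in Python, using the normalization formula given at the beginning of this section. The threshold deciding whether the mean is "close to 0" is an arbitrary assumption: the exact criterion (and the grouping of features) used internally by NeurEco is not specified here.

import numpy as np

def auto_normalize(x, tol=1e-2):
    # Sketch of the "auto" rule for one feature: scale is always max; shift is
    # mean when the mean is far from 0, none (0) otherwise. The tolerance tol
    # is an assumption, not NeurEco's internal criterion.
    scale = np.max(x)
    shift = np.mean(x) if abs(np.mean(x)) > tol * abs(scale) else 0.0
    return (x - shift) / scale, shift, scale

# Example: a pressure-like feature far from 0 -> shift = mean(x), scale = max(x).
pressure = np.random.uniform(1e5, 1.1e5, size=200)
p_norm, shift, scale = auto_normalize(pressure)
print(shift, scale, p_norm.min(), p_norm.max())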