Build NeurEco Regression model with the command line interface#
To build a NeurEco Regression model, run the following command in the terminal:
neurecoDNN build path/to/build/configuration/file/build.conf
The skeleton of a configuration file required to build a NeurEco Regression model (here build.conf) looks as follows. Its fields should be filled according to the problem at hand.
{
    "neurecoDNN_build": {
        "DevSettings": {
            "valid_percentage": 33.33,
            "initial_beta_reg": 0.1,
            "validation_indices": "",
            "final_learning": true,
            "disconnect_inputs_if_possible": true
        },
        "input_normalization": {
            "shift_type": "auto",
            "scale_type": "auto",
            "normalize_per_feature": true
        },
        "output_normalization": {
            "shift_type": "auto",
            "scale_type": "auto",
            "normalize_per_feature": true
        },
        "UserSettings": {
            "gpu_id": 0,
            "use_gpu": false
        },
        "classification": false,
        "exc_filenames": [],
        "output_filenames": [],
        "validation_exc_filenames": [],
        "validation_output_filenames": [],
        "write_model_to": "model.ednn",
        "write_compression_model_to": "CompModel.ednn",
        "write_decompression_model_to": "DecompModel.ednn",
        "minimum_compression_coefficient": 1,
        "compress_tolerance": 0.02,
        "build_compress": false,
        "starting_from_checkpoint_address": "",
        "checkpoint_address": "ckpt.checkpoint",
        "resume": false
    }
}
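For orientation, a minimal configuration keeping only the mandatory fields might look as follows; the file names are hypothetical placeholders, and every omitted field falls back to the default listed in the table below.

```json
{
    "neurecoDNN_build": {
        "exc_filenames": ["x_train.csv"],
        "output_filenames": ["y_train.csv"],
        "write_model_to": "model.ednn"
    }
}
```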
Building parameters#
The available building parameters in the configuration file are described in the following table.
Name 
type 
description 

valid_percentage 
float, min=1.0, max=50.0, default=33.33 
defines the percentage of the data used as validation data. NeurEco automatically chooses the most suitable samples for validation, to ensure that the created model fits unseen data as well as possible; modifying this parameter is mainly of interest when the data set is small and a good tradeoff between the learning and validation sets must be found. This parameter is ignored if validation_indices is specified or if validation_exc_filenames and validation_output_filenames are passed. 
validation_indices 
string, default = “” 
address to a csv/npy file on the disk containing the indices of the samples to be used as validation 
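As an illustration, a fixed validation split can be written to such an .npy file with NumPy; the file name, data set size, and split size below are hypothetical.

```python
import numpy as np

n_samples = 1000  # illustrative size of the training data set
rng = np.random.default_rng(0)

# hold out 200 randomly chosen, non-repeating sample indices for validation
validation_indices = rng.choice(n_samples, size=200, replace=False)
np.save("validation_indices.npy", validation_indices)
```

The resulting path would then be passed in the validation_indices field of the configuration file.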
initial_beta_reg 
float, default=0.1 
the initial regularization coefficient. In NeurEco, the main source of regularization is parsimony; the beta_reg coefficient ensures that, at the beginning of the learning process, if many weight configurations give the same error, the smallest one is chosen. By the end of the learning process the model is parsimonious, so this coefficient is no longer needed and it goes to zero. 
final_learning 
boolean, default=True 
True if this training is final, False otherwise. Every data sample matters: if True, NeurEco will also learn the validation data very carefully at the end of the learning process. 
disconnect_inputs_if_possible 
boolean default=True 
NeurEco always tries to keep its model as small as possible without losing performance, so if it finds inputs that do not contribute to the overall performance, it will try to remove all links to them. Setting this field to False prevents it from disconnecting inputs. 
use_gpu 
boolean, default=False 
indicates whether or not an NVIDIA GPU card will be used for building the model. 
gpu_id 
integer, default=0 
the id of the GPU card on which the user wants to run the building process (in case many GPU cards are available). 
input_normalization: shift_type 
string, default “auto” 
This is the method used to shift the input data. For more details, see Data normalization for Tabular Regression. 
input_normalization: scale_type 
string, default “auto” 
This is the method used to scale the input data. For more details, see Data normalization for Tabular Regression. 
input_normalization: normalize_per_feature 
boolean, default True 
if True, shifting and scaling are performed on each feature of the inputs separately; if False, all the features are normalized together. For example, if the data is the output of an SVD operation, the scale between the coefficients needs to be maintained, so this field should be False. On the other hand, if the inputs represent different fields with different scales (for example a temperature that varies from 260 to 300 degrees and a pressure that varies from 1e5 to 1.1e5 Pascal), they should not be scaled together, and this field should be True. For more details, see Data normalization for Tabular Regression. 
output_normalization: shift_type 
string, default “auto” 
This is the method used to shift the output data. For more details, see Data normalization for Tabular Regression. 
output_normalization: scale_type 
string, default “auto” 
This is the method used to scale the output data. For more details, see Data normalization for Tabular Regression. 
output_normalization: normalize_per_feature 
boolean, default True 
if True, shifting and scaling are performed on each feature of the outputs separately; if False, all the features are normalized together. For example, if the data is the output of an SVD operation, the scale between the coefficients needs to be maintained, so this field should be False. On the other hand, if the outputs represent different fields with different scales (for example a temperature that varies from 260 to 300 degrees and a pressure that varies from 1e5 to 1.1e5 Pascal), they should not be scaled together, and this field should be True. For more details, see Data normalization for Tabular Regression. 
exc_filenames 
list of strings, mandatory, default = [] 
training data: contains the input data in the form of the paths of all the input data files. The format of the files can be csv, npy or mat (MATLAB files). 
output_filenames 
list of strings, mandatory, default = [] 
training data: contains the output data in the form of the paths of all the output data files. The format of the files can be csv, npy or mat (MATLAB files). 
validation_exc_filenames 
list of strings, default = [] 
validation data: contains the validation input data in the form of the paths of all the validation input data files. The format of the files can be csv, npy or mat (MATLAB files). 
validation_output_filenames 
list of strings, default = [] 
validation data: contains the validation output data in the form of the paths of all the validation output data files. The format of the files can be csv, npy or mat (MATLAB files). 
write_model_to 
string, default = “” 
the path where the model will be saved. 
checkpoint_address 
string, default = “” 
the path where the checkpoint model will be saved. The checkpoint model is used for resuming the build of a model, or for choosing an intermediate network with less topological optimization steps. 
resume 
boolean, default=False 
if True, resume the build from its own checkpoint in checkpoint_address 
starting_from_checkpoint_address 
string, default = “” 
the path from which the checkpoint model is loaded. This option is used when the user wants to continue the build of a model from an existing checkpoint after changing a few settings (additional data, for example). To use this option in a .conf file, make sure that the option resume keeps its default value False. 
start_build_from_model_number 
int, default=1 
When resuming a build, specifies which intermediate model in the checkpoint is used as the starting point. When set to 1, NeurEco chooses the last model created as the starting point. 
freeze_structure 
boolean default=False 
When resuming a build, NeurEco will only change the weights (not the network architecture) if this variable is set to True. 
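As an illustration, a configuration that continues a previous build from an existing checkpoint with additional data, while keeping the network structure fixed, could combine these fields as follows; all paths are hypothetical placeholders.

```json
{
    "neurecoDNN_build": {
        "exc_filenames": ["x_train.csv", "x_extra.csv"],
        "output_filenames": ["y_train.csv", "y_extra.csv"],
        "starting_from_checkpoint_address": "ckpt.checkpoint",
        "resume": false,
        "freeze_structure": true,
        "write_model_to": "model_continued.ednn"
    }
}
```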
links_maximum_number 
int, default=0 
specifies the maximum number of links (trainable parameters) that NeurEco can create. If set to zero, NeurEco ignores this parameter. Note that this number will be respected within the limits of what NeurEco finds possible. 
build_compress 
boolean default=False for Regression 
if True, the model will perform a nonlinear compression. 
minimum_compression_coefficient 
int default=1 
checked only if build_compress = True, specifies the minimum number of nonlinear coefficients. 
compress_tolerance 
float, e.g. 0.01, 0.001…, default=0.02 
checked only if build_compress = True, specifies the tolerance of the compressor: the maximum error accepted when performing a compression and a decompression on the validation data. 
write_compression_model_to 
string, default = “” 
checked only if build_compress = True, this is the path where the compression model will be saved. 
write_decompression_model_to 
string, default = “” 
checked only if build_compress = True, this is the path where the decompression model will be saved. 
compress_decompress_size_ratio 
float default=1.0 
checked only if build_compress = True, specifies the ratio between the sizes of the compression block and the decompression block. This number is always greater than 0 and less than or equal to 1. Note that this ratio will be respected within the limits of what NeurEco finds possible. 
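Putting these together, a hedged example of a compression build configuration could look as follows; the data paths and model names are placeholders.

```json
{
    "neurecoDNN_build": {
        "exc_filenames": ["x_train.csv"],
        "output_filenames": ["y_train.csv"],
        "build_compress": true,
        "compress_tolerance": 0.02,
        "minimum_compression_coefficient": 1,
        "write_compression_model_to": "CompModel.ednn",
        "write_decompression_model_to": "DecompModel.ednn"
    }
}
```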
classification 
boolean default=False for Regression 
specifies if the problem is a classification problem. 
Data normalization for Tabular Regression#
A normalization operation for NeurEco is a combination of a \(shift\) and a \(scale\), so that:
\[x_{normalized} = \frac{x - shift}{scale}\]
Allowed shift methods for NeurEco and their corresponding shifted values are listed in the table below:
Name 
shift value 

none 
\[0\]

min 
\[min(x)\]

min_centered 
\[0.5 * (min(x) + max(x))\]

mean 
\[mean(x)\]

Allowed scale methods for NeurEco Tabular and their corresponding scaled values are listed in the table below:
Name 
scale value 

none 
\[1\]

max 
\[max(x)\]

max_centered 
\[0.5 * (max(x) - min(x))\]

std 
\[std(x)\]

Normalization with the auto options:
shift is mean and scale is max if the value of the mean is far from 0,
shift is none and scale is max if the calculated value of the mean is close to 0.
If the normalization is performed per feature and the auto options are chosen, the normalization is performed by group of features. These groups are created based on the values of mean and std.
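The shift and scale methods listed in the tables above can be sketched in NumPy as follows. This is an illustrative reimplementation of the formulas, not NeurEco's internal code, and it does not include the auto grouping logic.

```python
import numpy as np

def normalize(x, shift_type="mean", scale_type="std", per_feature=True):
    """Apply x_normalized = (x - shift) / scale with the tables' methods."""
    # per-feature normalization works column-wise; otherwise globally
    axis = 0 if per_feature else None
    shifts = {
        "none": 0.0,
        "min": x.min(axis=axis),
        "min_centered": 0.5 * (x.min(axis=axis) + x.max(axis=axis)),
        "mean": x.mean(axis=axis),
    }
    scales = {
        "none": 1.0,
        "max": x.max(axis=axis),
        "max_centered": 0.5 * (x.max(axis=axis) - x.min(axis=axis)),
        "std": x.std(axis=axis),
    }
    return (x - shifts[shift_type]) / scales[scale_type]
```

For example, `normalize(x, "mean", "std")` yields features with zero mean and unit standard deviation, while `normalize(x, "none", "none")` leaves the data unchanged.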