💪 Models training

💪 Models training #

All training of neural networks is carried out from the Python’s project, once the cached training datasets have been created. The configuration of the training session is to be set in src/main/python/highlighter/main.py. By default, the process tests all possible combinations of testing parameters. Hence, training can be launched by running the following from the Python’s project root folder

python main.py

During training, per fold, the same neural network is trained, validated and tested on each dataset fold (3). Hence, three PyTorch neural networks are save to disk in src/main/python/saved_models, and logs for the whole session in src/main/python/save_model_losses. Such generated files do also carry details regarding the configuration of the training session and are necessary for the execution of not only accuracy and speed tests, but also rendering of syntax highlighted files using trained neural networks. Files in saved_models and save_model_losses are named after the configuration they reference in particular, the following substructure may be found:

<lang-name>_<execution-number>_\
<task-id>_<nn-model>_<embedding-layer-dim>embs_\
<input-dim>id_<width-hidden-layers>hd_\
<num-hidden-layers>hl_\
<is-bidirectional-network>bid

hence, for example:

java_1_28_RNNClassifier1_128embs_109id_32hd_1hl_Falsebid

Please note that according to the syntax highlighting coverage tasks described in the paper this replication package is linked to, task 1 through to 4 are <task-id>: 28, 37, 55 and 66 respectively.