💾 Evaluation datasets
Folds
After completing the preprocessing steps that generate an oracle, the next step is to generate the 66-33
training cross-validation folds and snippets. As before, these are already provided with this distribution in:
src/main/resources/<lang-name>/folds
The Python subproject carries out this task, so the commands below must be run from that subproject's
root directory.
To generate the folds, run:
python oracle_cache_generator.py generate_folds <lang-name>
where <lang-name> is one of the three analyzed languages listed above.
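To make the 66-33 terminology concrete, the sketch below shows one way such a split could be produced: each fold shuffles the sample indices with its own seed, keeps roughly two thirds for training, and leaves the rest for testing. The function name `make_folds`, the seeding scheme, and the number of folds are illustrative assumptions, not the actual logic of oracle_cache_generator.py.

```python
import random


def make_folds(n_samples, n_folds=3, train_fraction=2 / 3, seed=0):
    """Return a list of (train_indices, test_indices) pairs.

    Hypothetical helper: illustrates a repeated 66-33 random split,
    not the real behaviour of oracle_cache_generator.py.
    """
    folds = []
    n_train = round(n_samples * train_fraction)
    for fold in range(n_folds):
        indices = list(range(n_samples))
        # A distinct, deterministic shuffle per fold keeps runs reproducible.
        random.Random(seed + fold).shuffle(indices)
        folds.append((indices[:n_train], indices[n_train:]))
    return folds


folds = make_folds(n_samples=9)
for train, test in folds:
    # Each fold uses every sample exactly once, split about 66% / 33%.
    print(len(train), len(test))
```

Seeding each fold separately is only one possible design; it makes every fold reproducible on its own, which is convenient when a single fold has to be regenerated.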
Snippets
Snippets for each fold are generated with the command below, which must be run only after the folds have been produced.
python oracle_cache_generator.py generate_snippets <lang-name>
Cache
The cache is a more efficient representation of the training datasets, stored as collections of PyTorch tensors
and intended specifically for training neural networks.
Although these are already provided in
src/main/resources/<lang-name>/foldscach
one can generate them by running:
python oracle_cache_generator.py generate_cache <lang-name>
Note that all folds must be generated first for this to work.
Run All
All of the pipeline steps above can be run at once with the command:
python oracle_cache_generator.py all <lang-name>
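The ordering constraints described above (folds first, then snippets and cache) can be sketched as a small wrapper that invokes the three documented subcommands in sequence. The subcommand names come from this page; the helper names `build_commands` and `run_pipeline` are assumptions made for illustration, and whether `all` runs exactly these steps in this order is not confirmed here.

```python
import subprocess
import sys

# Subcommands documented above, in dependency order: folds must exist
# before snippets or the cache can be generated.
STEPS = ("generate_folds", "generate_snippets", "generate_cache")


def build_commands(lang_name, script="oracle_cache_generator.py"):
    """Return one argument list per pipeline step (hypothetical helper)."""
    return [[sys.executable, script, step, lang_name] for step in STEPS]


def run_pipeline(lang_name):
    """Run each step in order, stopping at the first failure."""
    for command in build_commands(lang_name):
        # check=True raises CalledProcessError on a non-zero exit code,
        # so later steps never run against missing folds.
        subprocess.run(command, check=True)
```

Calling `run_pipeline` with a language name from the Python subproject's root directory would run the three steps one after another. Running the steps separately like this mainly helps when one step fails and only the remaining ones need to be repeated.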