Available templates

mlf-core currently has the following templates available:

  1. mlflow-pytorch

  2. mlflow-tensorflow

  3. mlflow-xgboost

  4. package-prediction

In the following, every template is given its own section explaining its purpose, design, included frameworks/libraries, usage and frequently asked questions. For questions that all templates share, see the Shared FAQ section. It is recommended to use the sidebar to navigate this documentation, since it is long and cumbersome to scroll through.

mlflow-pytorch

Purpose

mlflow-pytorch is an MLflow-based template designed for Pytorch machine learning models. Pytorch Lightning abstracts away the boilerplate code. The project is fully CPU and GPU deterministic, with system-intelligence integration. Additionally, Conda and Docker are supported out of the box.

Design

The package follows the mlf-core convention of a single environment.yml file in conjunction with an mlf-core based Dockerfile. As required by MLflow, an MLproject file serves as the entry point and defines the parameters.
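
For illustration, a minimal MLproject file might look like the following sketch; the image name, entry point and parameters are placeholders rather than the template's exact contents:

name: project_slug

docker_env:
  image: ghcr.io/github_username/project_slug:0.1.0

entry_points:
  main:
    parameters:
      max_epochs: {type: float, default: 5}
      learning_rate: {type: float, default: 0.01}
    command: "python project_slug/project_slug.py --max_epochs {max_epochs} --learning_rate {learning_rate}"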

├── AUTHORS.rst
├── .bandit.yml
├── CHANGELOG.rst
├── CODE_OF_CONDUCT.rst
├── Dockerfile
├── docs
│   ├── authors.rst
│   ├── changelog.rst
│   ├── code_of_conduct.rst
│   ├── conf.py
│   ├── index.rst
│   ├── make.bat
│   ├── Makefile
│   ├── model.rst
│   ├── readme.rst
│   ├── requirements.txt
│   ├── _static
│   │   └── custom_cookietemple.css
│   └── usage.rst
├── .editorconfig
├── environment.yml
├── project_slug
│   ├── data_loading
│   │   └── data_loader.py
│   ├── project_slug.py
│   ├── mlf_core
│   │   └── mlf_core.py
│   └── model
│       └── model.py
├── .flake8
├── .github
│   ├── ISSUE_TEMPLATE
│   │   ├── bug_report.md
│   │   ├── feature_request.md
│   │   ├── general_question.md
│   │   └── sync_notify.md
│   ├── pull_request_template.md
│   └── workflows
│       ├── build_docs.yml
│       ├── run_mlf_core_lint.yml
│       ├── pr_to_master_from_patch_release_only.yml
│       ├── run_bandit.yml
│       ├── run_flake8_linting.yml
│       ├── sync.yml
│       ├── train_cpu.yml
│       └── publish_docker.yml
├── .gitignore
├── LICENSE
├── mlf_core.cfg
├── .mlf_core.yml
├── MLproject
├── README.rst
└── .readthedocs.yml

Included frameworks/libraries

  1. MLflow as the primary framework for parameter and artifact logging.

  2. Pytorch as the primary machine learning library.

  3. system-intelligence to fetch all hardware related information.

  4. Preconfigured readthedocs

  5. Eight Github workflows:

  1. build_docs.yml, which builds the readthedocs documentation.

  2. run_flake8_linting.yml, which runs flake8 linting.

  3. pr_to_master_from_patch_release_only.yml; please read the pr_to_master_from_patch_release_only workflow documentation.

  4. train_cpu.yml, which trains the model on the CPU for a small number of epochs. Requires the data to be accessible.

  5. sync.yml, which checks whether a new version of mlflow-pytorch is available and submits a pull request if so.

  6. run_mlf_core_lint.yml, which runs mlf-core lint to verify that the project adheres to all mlf-core standards.

  7. run_bandit.yml, which runs Bandit to find any security issues.

  8. publish_docker.yml, which builds and pushes a Docker container to Github Packages.

Usage

It is strongly advised to use Docker to run mlf-core models, since support for operating systems other than Linux is limited and Docker greatly simplifies dependency management.

Building the Docker container

The name (=tag) of the Docker container is specified under image: in the MLproject file. If you created your project with mlf-core’s Github support, your Docker container should automatically be built and pushed to your Github Container Registry, and its name should match the one specified in the MLproject file. If the Docker container is not available in your Github Container Registry for any reason, you must build it locally. Run docker build -t ghcr.io/GITHUB_USERNAME/PROJECT_SLUG:version ., where PROJECT_SLUG is your project’s name and version is the current project version. The MLproject file can always be examined to determine the required Docker container tag.
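
As a sketch (the user name, project slug and version below are placeholders):

# Look up the required tag in the MLproject file
grep 'image:' MLproject
# Build the container locally under exactly that tag
docker build -t ghcr.io/github_username/project_slug:0.1.0 .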

Running the project with Docker

After having built the Docker container, you can launch your project with mlflow run .. The Docker container will automatically spin up.

Note that if you want to run your project with GPU support, you must have the NVIDIA Container Toolkit installed. Moreover, you need to pass additional Docker runtime arguments, e.g. mlflow run . -A gpus=all, which makes all available GPUs accessible to the Docker container.
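
For example, a minimal sketch of both launch modes:

# Run training on the CPU
mlflow run .
# Forward all available GPUs to the Docker container
mlflow run . -A gpus=all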

Running the project with Conda

Running the project using Conda is possible but discouraged, since system-intelligence currently only fully supports Linux. Comment out docker_env and comment in conda_env in the MLproject file. Then run the project using e.g. mlflow run .. GPUs will automatically be detected and used.
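
After the switch, the relevant MLproject excerpt might look like this sketch (the image name is a placeholder):

# docker_env:
#   image: ghcr.io/github_username/project_slug:0.1.0
conda_env: environment.yml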

FAQ

I am using Docker but no GPUs are used for training!

Please ensure that CUDA is configured, that the NVIDIA Container Toolkit is installed and that you pass -A gpus=all when running the project.

mlflow-tensorflow

Purpose

mlflow-tensorflow is an MLflow-based template designed for Tensorflow (version 2+) machine learning models. The project is fully CPU and GPU deterministic, with system-intelligence integration. Additionally, Conda and Docker are supported out of the box.

Design

The package follows the mlf-core convention of a single environment.yml file in conjunction with an mlf-core based Dockerfile. As required by MLflow, an MLproject file serves as the entry point and defines the parameters.

├── AUTHORS.rst
├── .bandit.yml
├── CHANGELOG.rst
├── CODE_OF_CONDUCT.rst
├── Dockerfile
├── docs
│   ├── authors.rst
│   ├── changelog.rst
│   ├── code_of_conduct.rst
│   ├── conf.py
│   ├── index.rst
│   ├── make.bat
│   ├── Makefile
│   ├── model.rst
│   ├── readme.rst
│   ├── requirements.txt
│   ├── _static
│   │   └── custom_cookietemple.css
│   └── usage.rst
├── .editorconfig
├── environment.yml
├── exploding_springfield
│   ├── data_loading
│   │   └── data_loader.py
│   ├── exploding_springfield.py
│   ├── mlf_core
│   │   └── mlf_core.py
│   ├── model
│   │   └── model.py
│   └── training
│       └── train.py
├── .flake8
├── .github
│   ├── ISSUE_TEMPLATE
│   │   ├── bug_report.md
│   │   ├── feature_request.md
│   │   ├── general_question.md
│   │   └── sync_notify.md
│   ├── pull_request_template.md
│   └── workflows
│       ├── build_docs.yml
│       ├── mlf_core_lint.yml
│       ├── pr_to_master_from_patch_release_only.yml
│       ├── run_bandit.yml
│       ├── run_flake8_linting.yml
│       ├── sync.yml
│       ├── train_cpu.yml
│       └── publish_docker.yml
├── .gitignore
├── LICENSE
├── mlf_core.cfg
├── .mlf_core.yml
├── MLproject
├── README.rst
└── .readthedocs.yml

Included frameworks/libraries

  1. MLflow as the primary framework for parameter and artifact logging.

  2. Tensorflow (version 2+) as the primary machine learning library.

  3. system-intelligence to fetch all hardware related information.

  4. Preconfigured readthedocs

  5. Eight Github workflows:

  1. build_docs.yml, which builds the readthedocs documentation.

  2. run_flake8_linting.yml, which runs flake8 linting.

  3. pr_to_master_from_patch_release_only.yml; please read the pr_to_master_from_patch_release_only workflow documentation.

  4. train_cpu.yml, which trains the model on the CPU for a small number of epochs. Requires the data to be accessible.

  5. sync.yml, which checks whether a new version of mlflow-tensorflow is available and submits a pull request if so.

  6. run_mlf_core_lint.yml, which runs mlf-core lint to verify that the project adheres to all mlf-core standards.

  7. run_bandit.yml, which runs Bandit to find any security issues.

  8. publish_docker.yml, which builds and pushes a Docker container to Github Packages.

Usage

It is strongly advised to use Docker to run mlf-core models, since support for operating systems other than Linux is limited and Docker greatly simplifies dependency management.

Building the Docker container

The name (=tag) of the Docker container is specified under image: in the MLproject file. If you created your project with mlf-core’s Github support, your Docker container should automatically be built and pushed to your Github Container Registry, and its name should match the one specified in the MLproject file. If the Docker container is not available in your Github Container Registry for any reason, you must build it locally. Run docker build -t ghcr.io/GITHUB_USERNAME/PROJECT_SLUG:version ., where PROJECT_SLUG is your project’s name and version is the current project version. The MLproject file can always be examined to determine the required Docker container tag.

Running the project with Docker

After having built the Docker container, you can launch your project with mlflow run .. The Docker container will automatically spin up.

Note that if you want to run your project with GPU support, you must have the NVIDIA Container Toolkit installed. Moreover, you need to pass additional Docker runtime arguments, e.g. mlflow run . -A gpus=all, which makes all available GPUs accessible to the Docker container.

Running the project with Conda

Running the project using Conda is possible but discouraged, since system-intelligence currently only fully supports Linux. Comment out docker_env and comment in conda_env in the MLproject file. Then run the project using e.g. mlflow run .. GPUs will automatically be detected and used.

FAQ

I am using Docker but no GPUs are used for training!

Please ensure that CUDA is configured, that the NVIDIA Container Toolkit is installed and that you pass -A gpus=all when running the project.

mlflow-xgboost

Purpose

mlflow-xgboost is an MLflow-based template designed for XGBoost machine learning models. The project is fully CPU and GPU deterministic, with system-intelligence integration. Additionally, Conda and Docker are supported out of the box.

Design

The package follows the mlf-core convention of a single environment.yml file in conjunction with an mlf-core based Dockerfile. As required by MLflow, an MLproject file serves as the entry point and defines the parameters.

├── AUTHORS.rst
├── .bandit.yml
├── CHANGELOG.rst
├── CODE_OF_CONDUCT.rst
├── Dockerfile
├── docs
│   ├── authors.rst
│   ├── changelog.rst
│   ├── code_of_conduct.rst
│   ├── conf.py
│   ├── index.rst
│   ├── make.bat
│   ├── Makefile
│   ├── model.rst
│   ├── readme.rst
│   ├── requirements.txt
│   ├── _static
│   │   └── custom_cookietemple.css
│   └── usage.rst
├── .editorconfig
├── environment.yml
├── exploding_springfield
│   ├── data_loading
│   │   └── data_loader.py
│   ├── exploding_springfield.py
│   └── mlf_core
│       └── mlf_core.py
├── .flake8
├── .github
│   ├── ISSUE_TEMPLATE
│   │   ├── bug_report.md
│   │   ├── feature_request.md
│   │   ├── general_question.md
│   │   └── sync_notify.md
│   ├── pull_request_template.md
│   └── workflows
│       ├── build_docs.yml
│       ├── mlf_core_lint.yml
│       ├── pr_to_master_from_patch_release_only.yml
│       ├── run_bandit.yml
│       ├── run_flake8_linting.yml
│       ├── sync.yml
│       ├── train_cpu.yml
│       └── publish_docker.yml
├── .gitignore
├── LICENSE
├── mlf_core.cfg
├── .mlf_core.yml
├── MLproject
├── README.rst
└── .readthedocs.yml

Included frameworks/libraries

  1. MLflow as the primary framework for parameter and artifact logging.

  2. XGBoost as the primary machine learning library.

  3. system-intelligence to fetch all hardware related information.

  4. Preconfigured readthedocs

  5. Eight Github workflows:

  1. build_docs.yml, which builds the readthedocs documentation.

  2. run_flake8_linting.yml, which runs flake8 linting.

  3. pr_to_master_from_patch_release_only.yml; please read the pr_to_master_from_patch_release_only workflow documentation.

  4. train_cpu.yml, which trains the model on the CPU for a small number of epochs. Requires the data to be accessible.

  5. sync.yml, which checks whether a new version of mlflow-xgboost is available and submits a pull request if so.

  6. run_mlf_core_lint.yml, which runs mlf-core lint to verify that the project adheres to all mlf-core standards.

  7. run_bandit.yml, which runs Bandit to find any security issues.

  8. publish_docker.yml, which builds and pushes a Docker container to Github Packages.

Usage

It is strongly advised to use Docker to run mlf-core models, since support for operating systems other than Linux is limited and Docker greatly simplifies dependency management.

Building the Docker container

The name (=tag) of the Docker container is specified under image: in the MLproject file. If you created your project with mlf-core’s Github support, your Docker container should automatically be built and pushed to your Github Container Registry, and its name should match the one specified in the MLproject file. If the Docker container is not available in your Github Container Registry for any reason, you must build it locally. Run docker build -t ghcr.io/GITHUB_USERNAME/PROJECT_SLUG:version ., where PROJECT_SLUG is your project’s name and version is the current project version. The MLproject file can always be examined to determine the required Docker container tag.

Running the project with Docker

After having built the Docker container, you can launch your project with mlflow run .. The Docker container will automatically spin up.

Note that if you want to run your project with GPU support, you must have the NVIDIA Container Toolkit installed. Moreover, you need to pass additional Docker runtime arguments, e.g. mlflow run . -A gpus=all, which makes all available GPUs accessible to the Docker container.

Running the project with Conda

Running the project using Conda is possible but discouraged, since system-intelligence currently only fully supports Linux. Comment out docker_env and comment in conda_env in the MLproject file. Then run the project using e.g. mlflow run .. GPUs will automatically be detected and used.

FAQ

I am using Docker but no GPUs are used for training!

Please ensure that CUDA is configured, that the NVIDIA Container Toolkit is installed and that you pass -A gpus=all when running the project.

package-prediction

Purpose

package-prediction is a template designed to easily distribute PyPI packages of machine learning models. The template only provides the boilerplate code to load models and perform predictions. Data wrangling or model training should be done using the mlflow templates.

Design

The package is closely related to cookietemple’s cli-python template (https://github.com/cookiejar/cookietemple). It is primarily based on setuptools to build the package and uses a Github workflow to easily upload the package to PyPI. Any prediction boilerplate code is machine learning library specific.
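
Once published, installing and trying out the package might look like the following sketch; the package name and console entry point are hypothetical placeholders derived from the example project tree:

# Install the published package from PyPI (hypothetical name)
pip install exploding-springfield
# Invoke the click-based command line interface (hypothetical entry point)
exploding-springfield --help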

├── AUTHORS.rst
├── .bandit.yml
├── CHANGELOG.rst
├── CODE_OF_CONDUCT.rst
├── .coveragerc
├── Dockerfile
├── docs
│   ├── authors.rst
│   ├── changelog.rst
│   ├── code_of_conduct.rst
│   ├── conf.py
│   ├── index.rst
│   ├── installation.rst
│   ├── make.bat
│   ├── Makefile
│   ├── readme.rst
│   ├── requirements.txt
│   ├── _static
│   │   └── custom_cookietemple.css
│   └── usage.rst
├── .editorconfig
├── exploding_springfield
│   ├── cli_pytorch.py
│   ├── cli_tensorflow.py
│   ├── cli_xgboost.py
│   ├── data
│   │   └── xgboost_test_data.tsv
│   ├── __init__.py
│   └── models
│       └── xgboost_test_model.xgb
├── .gitattributes
├── .github
│   ├── dependabot.yml
│   ├── ISSUE_TEMPLATE
│   │   ├── bug_report.md
│   │   ├── feature_request.md
│   │   └── general_question.md
│   ├── pull_request_template.md
│   └── workflows
│       ├── build_docs.yml
│       ├── build_package.yml
│       ├── pr_to_master_from_patch_release_only.yml
│       ├── publish_package.yml
│       ├── run_bandit.yml
│       ├── run_flake8_linting.yml
│       ├── run_mlf_core_lint.yml
│       └── sync.yml
├── .gitignore
├── LICENSE
├── Makefile
├── makefiles
│   ├── Linux.mk
│   └── Windows.mk
├── MANIFEST.in
├── mlf_core.cfg
├── .mlf_core.yml
├── README.rst
├── .readthedocs.yml
├── requirements_dev.txt
├── requirements.txt
├── setup.cfg
└── setup.py

Included frameworks/libraries

  1. setuptools for code packaging

  2. click for the command line interface

  3. One of Pytorch, Tensorflow or XGBoost as the machine learning library.

  4. Preconfigured readthedocs

  5. Six Github workflows:

  1. build_docs.yml, which builds the readthedocs documentation.

  2. build_package.yml, which builds the package.

  3. run_flake8_linting.yml, which runs flake8 linting.

  4. publish_package.yml, which publishes the package to PyPI. Note that it only runs on a Github release and requires the PyPI secrets to be set up.

  5. run_bandit.yml, which runs Bandit to discover security issues in your Python code.

  6. pr_to_master_from_patch_release_only.yml; please read the pr_to_master_from_patch_release_only workflow documentation.

Publishing the package to PyPI

Ensure that your package builds and passes the twine checks; the build_package.yml workflow verifies both. If the workflow passes, you should open a pull request to master and merge it after review. The only thing left to do is to create a release on Github. Ensure that your PyPI secrets are set; follow the instructions on Encrypted Secrets if required.
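
To verify locally, a rough sketch of what build_package.yml checks (assuming a standard setuptools layout, as in the template):

# Build source and wheel distributions
python setup.py sdist bdist_wheel
# Check the built distributions for common packaging problems
twine check dist/*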

Shared FAQ

How do I access my data when running inside a Docker container?

mlf-core projects by default mount the host’s /data directory to /data inside the Docker container. Hence, place your files in a /data folder on the host and access them inside the container under /data.
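
For example (the dataset file name is a placeholder):

# On the host: place the data in /data
mkdir -p /data
cp train.csv /data/
# Inside the container the same file is then available at /data/train.csv
mlflow run .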

How do I publish my documentation?

mlf-core ships with a full, production-ready Read the Docs setup and a complete gh-pages setup.

Read the Docs

You need to import your documentation on the Read the Docs website. Do not forget to sync your account first to see your repository. Your documentation will then be available at https://repositoryname.readthedocs.io/.

Github Pages

Your documentation is automatically pushed to the gh-pages branch. Follow the documentation on configuring a publishing source for your Github Pages site and select the gh-pages branch. Your documentation will then be available at https://username.github.io/repositoryname.

How do I add a new template?

Please follow Adding new templates.