Mahler: Orchestration engine for high-latency systems and dynamic workflows

Source code: github.com/bouthilx/mahler

This project is still a prototype. There is no documentation and there are no tests yet.

For my research projects, I developed a framework that automatically sets up all available clusters and deploys my experiments on them. We typically have very few user rights on supercomputers, which makes it difficult to use the orchestration tools developed for the cloud. With Mahler, I can wrap the supercomputer schedulers to gain more control over my workflow, better resilience, and better automation. Thanks to this framework, I was able to run all the experiments for our paper “Unreproducible Research is Reproducible”, slightly more than 39k of them, in less than 5 days without any manual intervention.