ThunderstormDistributor - Job Queuing System for Dynamic Clusters


ThunderstormDistributor is a queuing system that distributes jobs and computational workload across dynamic clusters in the cloud. It assigns jobs so as to maximize CPU and memory usage while preventing oversubscription of compute nodes. It also collects statistics on individual compute nodes and jobs and graphs the distribution of disk, network, CPU, and memory usage over time, which helps in optimizing and tuning computational workflows.
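
As a rough illustration of what assigning jobs without oversubscribing nodes can look like, here is a minimal first-fit sketch in Python. It is not ThunderstormDistributor's actual code or algorithm; the names Node, Job, and assign, and the choice of first-fit placement, are assumptions made for this example only.

```python
from dataclasses import dataclass


@dataclass
class Job:
    job_id: int
    cpus: int      # CPU cores the job reserves (hypothetical field)
    mem_mb: int    # memory the job reserves, in MB (hypothetical field)


@dataclass
class Node:
    name: str
    cpus: int
    mem_mb: int
    used_cpus: int = 0
    used_mem_mb: int = 0

    def fits(self, job: Job) -> bool:
        # A job fits only if neither CPU nor memory would be oversubscribed.
        return (self.used_cpus + job.cpus <= self.cpus
                and self.used_mem_mb + job.mem_mb <= self.mem_mb)


def assign(jobs, nodes):
    """Place each job on the first node with room; queue the rest."""
    placements = {}   # job_id -> node name
    pending = []      # jobs that fit on no node right now
    for job in jobs:
        target = next((n for n in nodes if n.fits(job)), None)
        if target is None:
            pending.append(job)
            continue
        target.used_cpus += job.cpus
        target.used_mem_mb += job.mem_mb
        placements[job.job_id] = target.name
    return placements, pending


nodes = [Node("a", cpus=4, mem_mb=8000), Node("b", cpus=2, mem_mb=4000)]
jobs = [Job(1, 3, 4000), Job(2, 2, 2000), Job(3, 1, 5000), Job(4, 1, 1000)]
placements, pending = assign(jobs, nodes)
print(placements, [j.job_id for j in pending])
```

In this sketch, a job that fits on no node stays in the pending list instead of being forced onto a busy machine; a real scheduler would retry pending jobs as running jobs finish or as the cluster grows.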


RainforestCluster and ThunderstormDistributor

I have used these two programs together to render a fractal movie based on my fractal art (fullscreen is best):

This is a screenshot of the statistics viewer:


Open-Source Version Download (Beta)

This version of ThunderstormDistributor includes job queuing across multiple machines on dynamic clusters, including user-specific execution, memory, CPU, real time, and wall time reservations and limits, and full statistics collection functionality. It also supports dynamic resource allocation, which prevents node oversubscription, and job management, including login, cancellation, and termination. RainforestCluster can set up ThunderstormDistributor for a custom cluster on Amazon's EC2 cloud service. ThunderstormDistributor can also run on other clusters, including local or static ones, but these require manual setup. It has no Amazon dependencies and requires only the Qt library and Linux (kernel 2.6.18+). Because it is in beta, debug logging is turned on, and it supports only one queue.
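
To make the idea of per-job limits concrete, here is a small Python sketch of comparing a job's measured usage against its limits. It is an illustration under assumptions, not the way ThunderstormDistributor enforces limits: the Limits and Usage types, their field names, and the exceeded function are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limits:
    max_mem_mb: int          # hypothetical memory limit, in MB
    max_cpu_seconds: float   # hypothetical CPU time limit
    max_wall_seconds: float  # hypothetical wall time limit


@dataclass(frozen=True)
class Usage:
    mem_mb: int
    cpu_seconds: float
    wall_seconds: float


def exceeded(usage: Usage, limits: Limits) -> list:
    """Return the names of the limits that the job has gone over."""
    checks = [
        ("memory", usage.mem_mb, limits.max_mem_mb),
        ("cpu_time", usage.cpu_seconds, limits.max_cpu_seconds),
        ("wall_time", usage.wall_seconds, limits.max_wall_seconds),
    ]
    return [name for name, used, cap in checks if used > cap]


limits = Limits(max_mem_mb=2048, max_cpu_seconds=3600.0, max_wall_seconds=7200.0)
ok = exceeded(Usage(mem_mb=1500, cpu_seconds=100.0, wall_seconds=200.0), limits)
over = exceeded(Usage(mem_mb=4096, cpu_seconds=100.0, wall_seconds=9000.0), limits)
print(ok, over)
```

A scheduler built along these lines could run such a check periodically and terminate or flag any job for which the returned list is not empty.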

ThunderstormDistributor is available for download under the MIT license on its SourceForge project page.

You can view the README, usage guide, and custom installation instructions (for use with or without RainforestCluster) here.