
The best Python libraries for parallel processing

By admin

Oct 24, 2024



Parsl

Short for “Parallel Scripting Library,” Parsl lets you take computing jobs and split them across multiple systems using roughly the same syntax as the Pool objects in Python’s multiprocessing module. It also lets you stitch together different computing tasks into multi-step workflows, which can run in parallel, in sequence, or via map/reduce operations.
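To make those three workflow shapes concrete, here is a minimal sketch. It uses the standard library's concurrent.futures as a stand-in rather than Parsl itself, so everything runs on one machine; the task functions square and add are made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def square(x):
    """Stand-in task; imagine something expensive here."""
    return x * x


def add(a, b):
    """Stand-in combining step."""
    return a + b


# Threads keep the sketch simple; a distributed system would ship these
# tasks to other processes or machines instead.
with ThreadPoolExecutor(max_workers=4) as pool:
    # Parallel: independent tasks are submitted together and run concurrently.
    futures = [pool.submit(square, n) for n in range(5)]
    parallel_results = [f.result() for f in futures]

    # Sequence: the second step needs the first step's result.
    first = pool.submit(square, 3)
    second = pool.submit(add, first.result(), 1)
    sequential_result = second.result()

    # Map/reduce: map a task over the inputs, then fold the outputs together.
    mapped = pool.map(square, range(5))
    reduced_result = reduce(add, mapped)

print(parallel_results, sequential_result, reduced_result)
```

The point of the sketch is the shape of the code: each submitted task hands back a future, and a workflow is just a matter of which futures feed which later tasks.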

Parsl can execute native Python applications, and it can also run any external application by way of shell commands. Your Python code is written like normal Python code, save for a special function decorator that marks the entry point to your work. The job-submission system also gives you fine-grained control over how things run on the targets: for example, the number of cores per worker, the amount of memory per worker, CPU affinity controls, how often to poll for timeouts, and so on.
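Parsl's own decorators are python_app and bash_app, and calling a decorated function returns a future rather than a plain value. The sketch below imitates that pattern with the standard library only, so it is an illustration of the idea and not Parsl's implementation; the decorator name background_app and both task functions are hypothetical.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
from functools import wraps

# A small local pool stands in for the remote workers a real system would use.
_pool = ThreadPoolExecutor(max_workers=2)


def background_app(func):
    """Hypothetical decorator: calling the function returns a Future."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        return _pool.submit(func, *args, **kwargs)
    return wrapper


@background_app
def word_count(text):
    # Ordinary Python; only the decorator changes how it is invoked.
    return len(text.split())


@background_app
def run_external(args):
    # Runs an external program and returns its standard output.
    done = subprocess.run(args, capture_output=True, text=True, check=True)
    return done.stdout.strip()


# Both calls return immediately with futures; the work happens in the pool.
count_future = word_count("split work across many workers")
echo_future = run_external(
    [sys.executable, "-c", "print('hello from a subprocess')"]
)

count = count_future.result()
output = echo_future.result()
print(count, output)

_pool.shutdown()
```

The function bodies stay plain Python, which is the property the paragraph above describes: the decorator is the only thing that marks a function as a unit of dispatched work.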

One excellent feature Parsl offers is a set of prebuilt templates for dispatching work to a variety of high-end computing resources. These include not only staples like AWS or Kubernetes clusters, but also supercomputing resources (assuming you have access) like Blue Waters, ASPIRE 1, Frontera, and so on. (Parsl was co-developed with the aid of many of the institutions that built such hardware.)


