Dask for task scheduling

4:25pm - 5:05pm on Friday, October 6 in PennTop North

Matthew Rocklin

Audience Level:
Intermediate
Watch:
https://youtu.be/hiPvmeLhInw

Overview

Dask is a library for parallel and distributed computing in Python, best known for parallelizing libraries like NumPy and pandas. This talk discusses using Dask for task-scheduling workloads, such as those typically handled by Celery or Airflow, in a scalable and accessible manner.

Description

Most previous talks on Dask focus on “big data” collections like distributed pandas dataframes. In this talk we’ll diverge a bit and cover more real-time and fine-grained settings. We’ll discuss Dask’s concurrent.futures interface, integration with async/await syntax, dynamic workload handling, and more. The talk is aimed more at the web-backend crowd than at the data-science crowd.
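As a rough illustration of the futures-style, dynamic workloads mentioned above, here is a minimal sketch that is not taken from the talk itself. It assumes `dask.distributed` is installed and uses `Client.get_executor()` to obtain a `concurrent.futures`-compatible executor; if the import fails, it falls back to a standard-library thread pool so the same code still runs. The names `step`, `run_dynamic`, and `open_executor` are hypothetical helpers invented for this example.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from contextlib import contextmanager


def step(n):
    """Toy unit of work: halve n (integer division)."""
    return n // 2


def run_dynamic(executor, seeds):
    """Submit one task per seed, then keep submitting follow-up tasks
    as results arrive, until every chain reaches 1 or less.
    Returns the total number of tasks submitted."""
    pending = {executor.submit(step, n) for n in seeds}
    submitted = len(pending)
    while pending:
        # Block until at least one task finishes, then react to it.
        done, pending = wait(pending, return_when=FIRST_COMPLETED)
        for future in done:
            value = future.result()
            if value > 1:
                # New work is discovered only once a result is known.
                pending.add(executor.submit(step, value))
                submitted += 1
    return submitted


@contextmanager
def open_executor():
    """Yield a concurrent.futures-style executor, backed by Dask if available."""
    try:
        from dask.distributed import Client
    except ImportError:
        # Assumption for this sketch: fall back to stdlib threads without Dask.
        with ThreadPoolExecutor(max_workers=2) as pool:
            yield pool
        return
    # In-process scheduler and workers; the dashboard is disabled for simplicity.
    client = Client(processes=False, dashboard_address=None)
    try:
        yield client.get_executor()
    finally:
        client.close()


if __name__ == "__main__":
    with open_executor() as executor:
        print(run_dynamic(executor, [8, 5, 3]))  # prints 6
```

Because `run_dynamic` only relies on `submit` and standard futures, the same function could be pointed at a remote Dask cluster by constructing the `Client` with a scheduler address instead of `processes=False`.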
