About the workshop¶
We will take a look at the basic concepts of data pipelines as well as practical use cases using Python and libraries such as pandas, matplotlib, and TensorFlow.
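As a taste of what is to come, here is a minimal sketch of a pipeline step with pandas: load some raw records, clean them, then aggregate. The data and column names here are made up for illustration; the workshop's actual examples will differ.

```python
import pandas as pd

# Hypothetical raw readings; one record is missing its value
raw = pd.DataFrame({
    "city": ["Manchester", "Leeds", "Manchester", "Leeds"],
    "temp_c": [12.5, 9.0, 14.0, None],
})

# Transform: drop incomplete records, then aggregate per city
clean = raw.dropna(subset=["temp_c"])
summary = clean.groupby("city", as_index=False)["temp_c"].mean()
```

Even this tiny example has the extract → transform → aggregate shape that larger pipelines build on.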
About you:¶
- Some experience using the command line
- Intermediate Python knowledge / use
- Able to apply what we learn and adapt it to your own use cases
- Interested in data and systems
- Aspiring or current data engineer
- Some knowledge about systems and databases (enough to be dangerous)
Our focus for the day¶
- Greater understanding of how to build data pipelines using Python and libraries in the Python scientific ecosystem
- Focus on concepts (rather than complex implementations)
- Practical knowledge application
- Create the building blocks needed for your day-to-day work
Keeping on track¶
You will find 🚦 markers across the tutorial examples. We will use these to check how folks are doing during the workshop (if following along in person). They indicate practical or hands-on portions of the tutorial.
Additional tutorial (PyCon US)¶
For another (much longer) tutorial integrating MySQL and Twitter stream data, check out https://github.com/trallard/airflow-tutorial
I also have the following planned for the upcoming months:
- Deploying Airflow in Kubernetes (AKS)
- In-depth programmatic report generation with Airflow and papermill
- Airflow + dagster https://github.com/dagster-io/dagster
- Airflow + R?