Soyel Alam

A technically minded, enthusiastic, and dedicated IT professional with diverse experience in big data, data warehousing, and cloud computing. Eager to grow and improve my IT skills further.

Latest Posts

⛔️ No more broken dags with Simple Airflow Unit Testing - Aug 11, 2025

📬 Airflow Failure Alert Templates - Jun 01, 2025

✻ Making Airflow DAGs Stateful: When a DAG Should Wait for Its Past Run - Dec 29, 2021

✰ Dr. PySpark: How I Learned to Stop Worrying and Love Data Pipeline Testing - Dec 18, 2020

Highlights

Dr. PySpark: How I Learned to Stop Worrying and Love Data Pipeline Testing

One major challenge in data pipeline implementation is testing the pipeline code reliably, because the outcome of the code is tightly coupled to the data and the environment.
One way to overcome this is to run and test the pipeline against immutable data, so that the results of ETL functions can be matched against known outputs. This blog post presents a model of self-contained data pipelines with CI/CD.
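
As a rough illustration of matching ETL results against known outputs, here is a minimal sketch in plain Python rather than PySpark. The function `total_sales_by_region`, the record layout, and the fixture values are all hypothetical and are not taken from the post.

```python
# Hypothetical ETL step: the name and record layout are illustrative only.
def total_sales_by_region(rows):
    """Sum the `amount` field per `region` over an iterable of dict records."""
    totals = {}
    for row in rows:
        totals[row["region"]] = totals.get(row["region"], 0) + row["amount"]
    return totals


# Fixed input fixture: checked in with the code and never regenerated.
FIXTURE_ROWS = (
    {"region": "north", "amount": 10},
    {"region": "south", "amount": 5},
    {"region": "north", "amount": 7},
)

# Known output, worked out by hand from the fixture above.
EXPECTED_TOTALS = {"north": 17, "south": 5}


def test_total_sales_by_region():
    # The test depends only on the fixture, not on any live data source.
    assert total_sales_by_region(FIXTURE_ROWS) == EXPECTED_TOTALS
```

Because the input never changes, a failing assertion points at the code rather than at the data or the environment.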

I want all the tasks in the DAG to finish! before the next DAG run

In an ideal world, an Airflow task should represent an atomic transaction, so that a failure in the task does not lead to any inconsistency in the system.
But at times a transaction spans more than one task. In such cases, the entire Airflow DAG needs to finish before the next DAGRun is triggered.

In this post, we explain one such scenario and how we added a self-dependency on the past run of the same DAG in Airflow.

📬 Standardizing Airflow Alert Emails with HTML Templates

Failure emails from Airflow are often sparse and not very actionable. This post introduces a way to standardize and enrich alert emails using custom HTML templates that include metadata such as DAG owner, priority, tags, recent failure history, and direct links to the Airflow UI. We show how to configure this in your environment, test it with a failing DAG, and use it as a reusable pattern for better observability and response.

⛔️ No more broken dags with Simple Airflow Unit Testing

Broken Airflow DAGs waste time and block pipelines. This post walks through a simple yet powerful unit test for Airflow DAGs that catches syntax errors, import issues, and misconfigurations before they hit production. You'll see how to run these tests locally or in CI, and how to build them into a reliable guardrail for your data workflows.
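
To give a flavor of this kind of check, here is a standard-library sketch that imports every Python file in a folder and collects the failures. It is a stand-in, not the test from the post: with Airflow installed, the usual approach is to load the folder with `DagBag` and assert that its `import_errors` is empty. The helper name `find_import_errors` is hypothetical.

```python
import importlib.util
import pathlib
import traceback


def find_import_errors(dag_dir):
    """Import every .py file under dag_dir; return {path: traceback} for failures."""
    errors = {}
    for path in sorted(pathlib.Path(dag_dir).rglob("*.py")):
        spec = importlib.util.spec_from_file_location(path.stem, path)
        module = importlib.util.module_from_spec(spec)
        try:
            # Executes the file's top-level code, which is where syntax
            # errors and broken imports surface.
            spec.loader.exec_module(module)
        except Exception:
            errors[str(path)] = traceback.format_exc()
    return errors
```

A test would then assert that `find_import_errors("dags/")` returns an empty dict, where `dags/` is an assumed folder name. Note that this runs each file's top-level code, so it is best kept to a test environment.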

Experience

Cloud Data Engineer

Migrated legacy data-warehouse code and data to AWS and Snowflake using Spark and Airflow. Set up a PySpark project template using cookiecutter to standardize data pipelines. Developing Airflow DAGs to orchestrate tasks and writing custom, reusable Airflow operators. Set up an AWS and Airflow environment using Terraform. Maintaining and enhancing the existing data warehouse system in the Teradata and Hadoop ecosystem.

May 2018 - Present

Business Analyst

Analyzing Business Intelligence reporting requirements and translating them into data sourcing and modeling requirements, including facts, dimensions, star schemas, and snowflake schemas. Redesigned application processes, data interfaces, and the data retention and aggregation policy to reduce run time and storage by 30%.

March 2016 - August 2017

Data Warehouse Engineer

Developing Oracle packages using advanced PL/SQL concepts such as dynamic SQL, analytic functions, bulk collect, cursors, and hierarchical queries.
Producing logical and physical data models and data mappings using Erwin and Excel. Rewrote legacy Ab Initio graphs as standard PL/SQL code.

August 2011 - March 2016

Career

2023-

2021-2023

2018-2021

2017 - 2018

2007 - 2011
