AWS Certified Developer - Associate
Oracle Data Integrator 11g Certified Implementation Specialist
Master's in Computer Science
Bachelor in Technology
A technically enthusiastic and dedicated IT professional with diverse experience in big data, data warehousing, and cloud computing. Eager to grow and improve my IT skills further.
One major challenge in data pipeline implementation is reliably testing the pipeline code, because the outcome of the code is tightly coupled with the data and the environment.
One way to overcome the reliability challenge is to use immutable data to run and test the pipeline so that the result of ETL functions can be matched against known outputs.
This blog post focuses on providing a model of self-contained data pipelines with CI/CD.
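As a minimal sketch of the idea of testing against immutable data, the example below runs a pure transform over a read-only fixture and compares the result with a known output. The names `FIXTURE_ROWS`, `EXPECTED_ROWS`, and `transform`, as well as the row layout, are hypothetical and chosen only for illustration; they are not taken from the post.

```python
from decimal import Decimal
from types import MappingProxyType

# Hypothetical immutable input: a tuple of read-only row mappings,
# so no test or transform can modify the fixture by accident.
FIXTURE_ROWS = (
    MappingProxyType({"order_id": 1, "amount": "10.50", "country": "de"}),
    MappingProxyType({"order_id": 2, "amount": "4.25", "country": "US"}),
)

# Known output that the transform must reproduce for the fixture above.
EXPECTED_ROWS = [
    {"order_id": 1, "amount_cents": 1050, "country": "DE"},
    {"order_id": 2, "amount_cents": 425, "country": "US"},
]


def transform(rows):
    """Pure ETL step: builds new rows and never mutates its input."""
    return [
        {
            "order_id": row["order_id"],
            # Decimal avoids float rounding when converting to cents.
            "amount_cents": int(Decimal(row["amount"]) * 100),
            "country": row["country"].upper(),
        }
        for row in rows
    ]


def test_transform_matches_known_output():
    assert transform(FIXTURE_ROWS) == EXPECTED_ROWS
```

Because the fixture never changes and the transform has no side effects, the same assertion can run unchanged on a laptop and in a CI job.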
In an ideal world, an Airflow task should represent an atomic transaction, so that a failure in the task does not lead to any inconsistency in the system.
But at times, more than one task represents a single transaction. In such cases, the entire Airflow DAG needs to finish before the next DAGRun is triggered.
In this post, we explain one such scenario: how we added a self-dependency on the past run of the same DAG in Airflow.
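The gating rule described above can be modeled in a few lines of plain Python. This is only an illustration of the rule, not the approach used in the post and not Airflow API code: `DagRun`, `is_complete`, and `can_start_next_run` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class DagRun:
    """Hypothetical stand-in for one run of a DAG."""
    run_id: str
    # Maps task name to its state: "success", "failed", or "running".
    task_states: dict = field(default_factory=dict)

    def is_complete(self):
        # The run counts as complete only if it has tasks and all succeeded.
        return bool(self.task_states) and all(
            state == "success" for state in self.task_states.values()
        )


def can_start_next_run(previous_run):
    """The first run has no predecessor; later runs wait for a complete one."""
    return previous_run is None or previous_run.is_complete()


finished = DagRun("run_1", {"extract": "success", "load": "success"})
in_flight = DagRun("run_2", {"extract": "success", "load": "running"})

print(can_start_next_run(None))       # first run is allowed
print(can_start_next_run(finished))   # whole previous run succeeded
print(can_start_next_run(in_flight))  # previous transaction still open
```

The point of checking every task, not just the matching one, is that the transaction spans the whole DAG: a half-finished previous run blocks the next one.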
Migrated legacy data-warehouse code and data into AWS and Snowflake using Spark and Airflow. Set up a PySpark project template using cookiecutter to standardize data pipelines. Developed Airflow DAGs to orchestrate tasks and wrote custom reusable Airflow operators. Set up an AWS and Airflow environment using Terraform. Maintained and enhanced the existing data warehouse system in the Teradata and Hadoop ecosystem.
Analyzed Business Intelligence reporting requirements and translated them into data sourcing and modeling requirements, including facts, dimensions, star schemas, and snowflake schemas. Redesigned application processes, data interfaces, and data retention and aggregation policies to reduce run time and storage by 30%.
Developed Oracle packages using advanced PL/SQL concepts, including dynamic SQL, analytical functions, bulk collect, cursors, and hierarchical queries.
Produced logical and physical data models and data mappings using Erwin and Excel. Rewrote legacy Ab Initio graphs as standard PL/SQL code.