Aviation Week Data Pipeline - Pratt & Whitney


Pratt & Whitney – Aviation Week Utilization Pipeline Integration

To enhance operational visibility and aircraft performance analytics, Pratt & Whitney developed an external data ingestion pipeline to integrate Aviation Week utilization data into its enterprise Azure-based data lake.

Pipeline Architecture Overview:

1. Data Extraction with Apache Spark:

  • Aviation Week provides usage statistics and asset performance data through RESTful APIs and flat file dumps.
  • A Spark-based extraction engine was developed to ingest large volumes of structured and semi-structured data (CSV, JSON) from the source system.
  • Spark jobs run in Databricks notebooks or on a standalone Spark cluster and normalize the incoming data to match Pratt & Whitney’s bronze layer schema; a minimal sketch of such a job is shown below.
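
A minimal PySpark sketch of this extraction step follows. The landing-zone paths, column names, and Delta output location are illustrative assumptions rather than the actual Pratt & Whitney configuration.

    # Illustrative PySpark extraction job: paths and column names are assumed.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("aviation_week_extract").getOrCreate()

    # Flat-file dumps (CSV) and API exports (JSON) from an assumed landing zone.
    csv_df = spark.read.option("header", True).csv("/landing/aviation_week/csv/")
    json_df = spark.read.json("/landing/aviation_week/json/")

    # Normalize both sources to a common bronze-layer shape (columns assumed).
    bronze_df = (
        csv_df.select(
            F.col("tail_number"),
            F.col("flight_hours").cast("double"),
            F.col("cycles").cast("int"),
            F.to_timestamp("event_ts").alias("event_ts"),
        )
        .unionByName(
            json_df.select(
                F.col("tailNumber").alias("tail_number"),
                F.col("flightHours").cast("double").alias("flight_hours"),
                F.col("cycles").cast("int"),
                F.to_timestamp("eventTimestamp").alias("event_ts"),
            )
        )
        .withColumn("ingest_date", F.current_date())
    )

    # Land the normalized records in the bronze layer (Delta format assumed, as on Databricks).
    (bronze_df.write.mode("append")
        .partitionBy("ingest_date")
        .format("delta")
        .save("/mnt/bronze/aviation_week_utilization"))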

2. Real-Time Stream Ingestion with Kafka:

  • Spark writes the transformed data to Apache Kafka topics, decoupling downstream consumers and enabling real-time processing.
  • A custom Kafka enrichment service computes two derived fields (a sketch follows this list):
      • “Aging values”: the delta between the event timestamp and the current date, used for trend analysis.
      • “Snapshot values”: the state of aircraft utilization captured at predefined intervals (daily/hourly) to build a historical change log.
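
A simplified sketch of such an enrichment service is shown below, using the kafka-python client. The topic names, broker address, field names, and interval tagging are assumptions made for illustration, and event timestamps are assumed to arrive as ISO-8601 strings.

    # Illustrative Kafka enrichment loop: topics, fields, and broker are assumed.
    import json
    from datetime import datetime, timezone
    from kafka import KafkaConsumer, KafkaProducer

    consumer = KafkaConsumer(
        "aviation_week.utilization.raw",                      # assumed source topic
        bootstrap_servers="kafka:9092",
        value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    )
    producer = KafkaProducer(
        bootstrap_servers="kafka:9092",
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )

    for message in consumer:
        record = message.value
        event_ts = datetime.fromisoformat(record["event_ts"])
        if event_ts.tzinfo is None:
            event_ts = event_ts.replace(tzinfo=timezone.utc)  # assume UTC if no offset

        now = datetime.now(timezone.utc)

        # Aging value: days elapsed between the event timestamp and the current date.
        record["aging_days"] = (now - event_ts).days

        # Snapshot value: tag each record with the daily/hourly interval it belongs to,
        # so downstream consumers can rebuild utilization state as of any point in time.
        record["snapshot_date"] = now.strftime("%Y-%m-%d")
        record["snapshot_hour"] = now.strftime("%H")

        producer.send("aviation_week.utilization.enriched", value=record)  # assumed sink topic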

3. Workflow Orchestration with Apache Airflow:

  • All ingestion, transformation, and quality checks are orchestrated through Apache Airflow.
  • DAGs schedule Spark jobs, Kafka producers, and downstream processing at controlled intervals (e.g., hourly for streaming, daily for batch snapshots).
  • Airflow also triggers alerting in case of ingestion failures or SLA breaches (an illustrative DAG is sketched below).
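
An Airflow DAG for the daily batch path might look like the following sketch. The DAG id, schedule, job commands, SLA, and retry settings are assumptions, not the production definitions.

    # Illustrative Airflow DAG: task names, schedule, and commands are assumed.
    from datetime import datetime, timedelta
    from airflow import DAG
    from airflow.operators.bash import BashOperator

    default_args = {
        "owner": "data-platform",
        "retries": 2,
        "retry_delay": timedelta(minutes=10),
        "sla": timedelta(hours=1),        # SLA misses surface through Airflow alerting
        "email_on_failure": True,
    }

    with DAG(
        dag_id="aviation_week_daily_batch",
        start_date=datetime(2024, 1, 1),
        schedule_interval="@daily",       # daily batch snapshots; hourly streaming runs in a separate DAG
        catchup=False,
        default_args=default_args,
    ) as dag:
        extract = BashOperator(
            task_id="spark_extract",
            bash_command="spark-submit /jobs/aviation_week_extract.py",            # assumed job path
        )
        publish = BashOperator(
            task_id="kafka_publish",
            bash_command="python /jobs/publish_utilization_to_kafka.py",           # assumed producer script
        )
        validate = BashOperator(
            task_id="great_expectations_checkpoint",
            bash_command="great_expectations checkpoint run aviation_week_bronze", # assumed checkpoint name
        )

        extract >> publish >> validate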

4. Data Quality Validation with Great Expectations:

  • Each batch of incoming data is passed through a Great Expectations checkpoint before landing in the bronze layer.
  • Validations include the following (sketched below):
      • Schema conformity
      • Null checks on critical fields (tail number, flight hours)
      • Distribution checks on numeric fields (e.g., cycles per flight)
  • Validation results are stored and visualized, with failed records quarantined in a separate path in the data lake.
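
The snippet below gives a flavor of the checks such a checkpoint enforces, using Great Expectations’ pandas convenience API (available in classic, pre-1.0 versions of the library) for brevity rather than a full checkpoint configuration. Column names, value bounds, and the batch path are illustrative assumptions.

    # Illustrative Great Expectations checks: columns, bounds, and path are assumed.
    import great_expectations as ge
    import pandas as pd

    batch = pd.read_parquet("/landing/aviation_week/latest_batch.parquet")  # assumed batch location
    ge_batch = ge.from_pandas(batch)

    # Schema conformity: the batch must carry exactly the expected bronze columns.
    ge_batch.expect_table_columns_to_match_ordered_list(
        ["tail_number", "flight_hours", "cycles", "event_ts"]
    )

    # Null checks on critical fields.
    ge_batch.expect_column_values_to_not_be_null("tail_number")
    ge_batch.expect_column_values_to_not_be_null("flight_hours")

    # Distribution check on a numeric field (bounds are placeholders).
    ge_batch.expect_column_values_to_be_between("cycles", min_value=0, max_value=50)

    results = ge_batch.validate()
    if not results.success:
        # In the real pipeline, the failing batch would be routed to the quarantine path.
        print("Validation failed:", results.statistics)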

Outcome:

  • Enabled near real-time and historical analysis of third-party aircraft utilization data.
  • Improved fleet readiness analytics, maintenance forecasting, and integration with internal engine performance dashboards.
  • Established a reusable external ingestion framework for additional aerospace industry datasets.

