Shambhu Adhikari

Senior Data Engineer

Building large-scale data pipelines and modern data platforms across AWS, Azure, and GCP. Specializing in distributed ETL/ELT workflows, lakehouse architectures, and GenAI-powered data solutions.

About

Senior Data Engineer with extensive experience building large-scale data pipelines and modern data platforms across AWS, Azure, and GCP environments. Proven track record in designing distributed systems and automating complex data processing workflows.

Key Achievements

  • Designed distributed ETL/ELT workflows using PySpark, Apache Airflow, AWS Glue, Azure Data Factory, and GCP Dataflow
  • Built lakehouse architectures with Delta Lake, Iceberg, and Hudi for batch and streaming workloads
  • Implemented GenAI-powered data quality pipelines using OpenAI APIs, LangChain, and custom prompt engineering
  • Developed ML-ready feature pipelines supporting end-to-end model lifecycle and real-time inference
  • Created cost-optimized multi-cloud deployments using Terraform and CI/CD workflows
  • Secured sensitive data in compliance with HIPAA, HITRUST, and PCI-DSS standards

Technical Skills

Big Data

Databricks, Spark, Kafka, Airflow, Hadoop, Hive, Impala, Tez, ZooKeeper, NiFi, Oozie

Databases

HBase, Cassandra, MongoDB, Oracle, MySQL, Redshift, Synapse, BigQuery

Languages

.NET, PySpark, Python, Scala, SQL, PL/SQL, Shell Scripting, Java

Cloud Technologies

AWS (S3, EMR, EC2, Lambda, Glue, Redshift, Athena, Kinesis, Step Functions, SageMaker)
Azure (Data Factory, Databricks, Synapse, Data Lake Gen2, Event Hubs, Purview)
GCP (BigQuery, Dataflow, Cloud Composer)
Snowflake

AI & ML Tools

Generative AI, OpenAI, LangChain, RAG Pipelines, FAISS, Pinecone, GPU Computing

Data Formats

Parquet, Avro, ORC, Delta, Iceberg, JSON, XML, CSV

DevOps & Tools

Terraform, Jenkins, Bamboo, GitHub Actions, Azure DevOps, Docker, Git, Bitbucket

Visualization

Looker, Power BI, QuickSight, Tableau

Professional Experience

Sr. Data Engineer

United Airlines

Aug 2024 - Present
NW, NJ
  • Designed and implemented a fully automated ETL pipeline on AWS using S3, Glue, Redshift, and Step Functions
  • Built event-driven workflows using Amazon EventBridge and Step Functions to orchestrate complex tasks
  • Engineered distributed data processing jobs on AWS EMR using PySpark, processing 15+ TB of data weekly
  • Tuned Amazon Redshift for high-concurrency workloads, reducing query latency by 40%
  • Integrated Amazon Kinesis Data Streams and Firehose for real-time data ingestion
AWS, S3, Glue, EMR, Redshift, PySpark, Python, Terraform, dbt

Sr. ETL Data Engineer

American Express

Jan 2023 - Jul 2024
NY, NY
  • Designed and maintained scalable ETL pipelines using Python, SQL, and Apache Airflow
  • Used AWS Glue and EMR with PySpark to process over 12 TB of financial data daily
  • Implemented AI-enriched fraud detection data pipelines for real-time model scoring
  • Tuned ETL job performance using advanced partitioning, reducing processing time by 40%
  • Led migration of legacy ETL processes to modern Spark-based lakehouse architecture
AWS, Glue, EMR, Redshift, Airflow, PySpark, Python, Kafka, Great Expectations

Azure Data Engineer

Cedar Gate Company

May 2018 - Dec 2021
CT
  • Designed scalable data pipelines using Azure Data Factory to ingest healthcare data from 30+ sources
  • Built Delta Lake architecture on Azure Data Lake Gen2 with bronze, silver, and gold layers
  • Engineered distributed ETL pipelines using Azure Databricks (PySpark) for multi-terabyte datasets
  • Integrated FHIR APIs and HL7 feeds for real-time clinical data ingestion
  • Led the migration from on-premises SSIS to cloud-native ADF and Databricks, achieving a 40% cost reduction
Azure, Data Factory, Databricks, Synapse, PySpark, Power BI, FHIR, HL7

Big Data Developer

Cotiviti

Mar 2016 - Apr 2018
Remote
  • Developed Spark applications using PySpark and Spark-SQL for data extraction and transformation
  • Used Spark Streaming to consume real-time data from Kafka and store it in HDFS
  • Built on-premises data pipelines using Kafka and Spark for real-time data analysis
  • Created Tableau dashboards using Hive outputs for data visualization
  • Used AWS EMR to transform and move large amounts of data into S3 and DynamoDB
Hadoop, AWS, EMR, Spark, Kafka, Hive, HBase, Snowflake, Tableau

Hadoop Developer

Mango Software

Jul 2014 - Feb 2016
Remote
  • Built end-to-end ETL pipelines using Hadoop ecosystem components
  • Used Hive to analyze partitioned and bucketed data for reporting metrics
  • Created shell scripts, Oozie workflows, and Coordinator jobs for automation
  • Implemented performance optimizations including distributed cache and map-side joins
  • Implemented ACID-compliant Hive managed tables to support SCD Type 1 updates
Hadoop, Hive, Sqoop, Oozie, Spark, Kafka, Python, Scala, Tableau

Education

Master of Data Science

University of New Haven

Connecticut, US

Bachelor of Computer Science and Engineering

Ansal University

Gurugram, India

Get In Touch

I'm always open to discussing new opportunities, collaborations, or data engineering challenges.

© 2025 Shambhu Adhikari. Built with Next.js and Tailwind CSS.