Lokeswar Kudumula

Data Engineer

Data engineer specializing in scalable ETL pipelines, cloud data warehousing (Snowflake), real-time streaming (Kafka, Spark), and analytics transformation (dbt). Experienced in building end-to-end data platforms with production Kubernetes infrastructure, orchestration (Airflow), and data quality automation.

Remote · Joined Aug 2025

Experience

Data Engineer

Renro (連路)

Feb 2026 - Present · Remote (Personal Project)

Independent data engineering project building a live trip planning platform with a medallion-architecture data warehouse, CDC ingestion, real-time streaming, dbt transformations, PySpark batch/streaming jobs, and bi-hourly reverse-ETL, all on production Kubernetes.

  • Architected 4-layer medallion data warehouse (BRONZE/SILVER/GOLD/ML_FEATURES) with 25+ dbt models implementing incremental loading, SCD Type 2 change tracking, and automated PII masking for compliance.
  • Built end-to-end CDC pipeline using Debezium streaming PostgreSQL changes to Kafka (7 topics), with Kafka Connect sinking to Snowflake BRONZE schema, achieving sub-5-minute data latency.
  • Developed 4 PySpark batch and streaming jobs processing real-time expense data with geo enrichment, daily aggregations, ML feature engineering, and 30-second micro-batch processing windows.
  • Orchestrated 5 Airflow DAGs coordinating hourly dbt transformations, 30-minute Soda quality gates, daily Spark batch processing, and bi-hourly reverse-ETL syncing warehouse aggregates to application database.
  • Implemented cross-layer data reconciliation ensuring sub-5% row-count drift between pipeline stages, with 8 automated Soda quality checks and alerting rules for proactive issue detection.
  • Deployed production k8s data platform with 3 Helm charts managing Strimzi Kafka cluster, Spark Operator for distributed processing, and full observability with Prometheus/Grafana monitoring.
  • Designed 17-table PostgreSQL schema migrating from managed Neon to self-hosted in-cluster Postgres for improved cost efficiency and control, supporting real-time analytics queries.
  • Built analytics-powered trip planning application backend with custom JWT auth (RBAC, MFA, API keys), AI assistant using Claude API with pgvector RAG for semantic search, and real-time collaboration features.
Tech: dbt · Snowflake · PySpark · Airflow · Kafka · Debezium CDC · PostgreSQL · pgvector · Kubernetes · Spark Operator · Strimzi · Soda Core · Python · Helm
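The automated PII masking mentioned above can be illustrated with a minimal, self-contained sketch. This is a toy stand-in, not the dbt implementation used in the pipeline; the field names and salt are hypothetical. Deterministic salted hashing is shown because it hides raw values while keeping masked columns joinable across tables and runs:

```python
import hashlib


def mask_pii(record: dict, pii_fields: set, salt: str = "demo-salt") -> dict:
    """Return a copy of `record` with PII fields replaced by salted SHA-256 digests.

    Deterministic hashing preserves join keys across runs while hiding raw values.
    """
    masked = dict(record)
    for field in pii_fields:
        value = masked.get(field)
        if value is not None:
            digest = hashlib.sha256((salt + str(value)).encode()).hexdigest()
            masked[field] = digest[:16]  # truncated digest is still stable per input
    return masked
```

The same input always maps to the same digest, so `mask_pii` can be applied independently in every layer without breaking referential integrity.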

Data Engineer

MugenLink Network

Aug 2025 - Present · Remote (Personal Project)

Independent data engineering project building a live blockchain analytics platform that processes 6 cryptocurrency chains with automated ETL, a Snowflake data warehouse, dbt transformations, and Airflow orchestration on k3s infrastructure.

  • Architected end-to-end ETL pipeline processing 17GB+ daily across 6 cryptocurrencies, reducing data latency from hours to minutes with fault-tolerant checkpoint/resume and incremental loading.
  • Designed 3-tier Snowflake Data Warehouse with 13 RBAC roles, RSA key-pair authentication, and secure multi-team access for 8 service accounts across RAW, TRANSFORMED, and ANALYTICS layers.
  • Developed 30 dbt transformation models with incremental materialization strategies, improving build performance by 20-40% and enabling wallet-level transaction tracing and cross-chain analytics.
  • Implemented Change Data Capture with Snowflake Streams and Tasks, enabling real-time change detection and event-driven incremental processing across 30 RAW tables for near real-time analytics.
  • Configured Kafka Connect with Snowflake Sink Connector for continuous data ingestion via Snowpipe Streaming, reducing pipeline latency from hours to minutes with schema evolution support.
  • Built FastAPI analytics API with async SQLAlchemy 2.0, Redis caching layer for query acceleration, and distributed tracing, serving sub-second responses for complex blockchain queries.
  • Deployed production k8s data platform with Airflow orchestration, Prometheus monitoring, auto-scaling, and zero-trust networking, achieving 99.9% uptime for all data services.
  • Established 8-stage CI/CD pipeline with GitHub Actions enforcing 70%+ test coverage across 223 tests, automated security scanning, and data quality gates before production deployment.
Tech: Python 3.13 · Snowflake · dbt · Airflow · ETL/ELT · Kafka · CDC · FastAPI · SQLAlchemy · AWS S3 · Kubernetes · Redis · Prometheus
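The fault-tolerant checkpoint/resume pattern noted above can be sketched in plain Python. This is a simplified illustration under assumed names (`run_pipeline`, a JSON checkpoint file), not the production ingestion code: each batch is loaded in order, the offset is persisted atomically after every batch, and a rerun after a crash skips work that already completed:

```python
import json
from pathlib import Path


def load_checkpoint(path: Path) -> int:
    """Return the last committed batch offset, or 0 on a first run."""
    if path.exists():
        return json.loads(path.read_text())["offset"]
    return 0


def save_checkpoint(path: Path, offset: int) -> None:
    # Write-then-rename keeps the checkpoint file atomic on POSIX filesystems.
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps({"offset": offset}))
    tmp.replace(path)


def run_pipeline(batches: list, ckpt: Path, sink: list) -> None:
    """Process batches in order, skipping any completed before a crash."""
    start = load_checkpoint(ckpt)
    for i, batch in enumerate(batches):
        if i < start:
            continue  # already ingested in a previous run
        sink.extend(batch)  # stand-in for the real load step
        save_checkpoint(ckpt, i + 1)
```

Because the checkpoint advances only after a batch lands in the sink, a crash mid-run re-processes at most the in-flight batch rather than the whole backlog.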

System Engineer

Infosys Ltd. (Proximus)

Aug 2021 - Mar 2023 · Trivandrum, India

System engineer supporting telecom infrastructure operations for a major European carrier serving 4M+ subscribers.

  • Automated server health monitoring with Ansible playbooks and Bash scripts, reducing manual operational checks by 87% and improving early issue detection.
  • Reduced mean time to resolution by 40% by implementing centralized log analysis and anomaly detection using ELK Stack across production environments.
  • Optimized SQL queries for identity data validation, improving processing speed by 60% through query tuning and strategic indexing.
  • Managed CI/CD pipelines for 10+ microservices on OpenShift, maintaining a 99.5% deployment success rate and reducing release cycle time.
Tech: Ansible · ELK Stack · SQL · OpenShift · Jenkins · Dynatrace · Bash · Linux

Projects

MugenLink Network — Blockchain Analytics Platform

Featured

Production data engineering platform processing 17GB+ daily across 6 cryptocurrency chains with automated ETL, Snowflake data warehouse, dbt transformations, real-time CDC, Airflow orchestration, and FastAPI analytics API on Kubernetes.

  • Designed and deployed end-to-end ETL pipeline ingesting 6 cryptocurrency chains (Bitcoin, Ethereum, etc.) into Snowflake, processing 17GB+ daily with encrypted S3 staging and fault-tolerant checkpoint/resume.
  • Architected 3-tier Snowflake Data Warehouse (RAW/TRANSFORMED/ANALYTICS) with 13 RBAC roles, RSA key-pair authentication, and secure multi-team access for 8 service accounts.
  • Developed 30 dbt transformation models across 5 layers with incremental materialization strategies, enabling wallet-level transaction tracing and cross-chain portfolio analytics with 20-40% faster builds.
  • Implemented real-time Change Data Capture with Snowflake Streams and Tasks, enabling event-driven incremental processing across 30 RAW tables with sub-minute detection latency.
  • Configured Kafka Connect with Snowflake Sink Connector for continuous data ingestion via Snowpipe Streaming, reducing pipeline latency from hours to minutes with automatic schema evolution.
  • Built FastAPI 0.117+ analytics API with async SQLAlchemy 2.0, Redis caching layer for query acceleration, distributed tracing, and comprehensive 223-test suite (70%+ coverage).
  • Deployed production k3s data platform with Airflow orchestration, Prometheus/Grafana monitoring, Traefik ingress, auto-scaling, and full observability achieving 99.9% uptime.
  • Established 8-stage CI/CD pipeline with GitHub Actions enforcing data quality gates, security scanning, test coverage requirements, and container vulnerability analysis before production deployment.
Tech: Python 3.13 · Snowflake · dbt · Airflow · ETL/ELT · Kafka · CDC · FastAPI · Redis · Kubernetes · Prometheus · AWS S3
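The Redis cache-aside layer behind the analytics API can be sketched with an in-process stand-in. This toy `TTLCache` (a hypothetical name, not the real Redis client code) shows the pattern: serve a hit from the cache, otherwise run the expensive query once and store the result with a time-to-live:

```python
import time


class TTLCache:
    """Minimal in-process stand-in for a Redis cache: entries expire after `ttl` seconds."""

    def __init__(self, ttl: float = 60.0):
        self.ttl = ttl
        self._store = {}  # key -> (expires_at, value)

    def get(self, key: str):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() > expires_at:
            del self._store[key]  # lazy eviction on read
            return None
        return value

    def set(self, key: str, value) -> None:
        self._store[key] = (time.monotonic() + self.ttl, value)


def cached_query(cache: TTLCache, key: str, run_query):
    """Cache-aside: return a cached result, else execute the query and cache it."""
    result = cache.get(key)
    if result is None:
        result = run_query()
        cache.set(key, result)
    return result
```

With a real Redis deployment the `get`/`set` calls map onto `GET`/`SETEX`, but the control flow of the cache-aside pattern is the same.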

Renro (連路) — Trip Planning Data Platform

Featured

Live trip planning platform with end-to-end data engineering: medallion data warehouse, real-time CDC streaming, dbt transformations, PySpark batch/streaming jobs, Airflow orchestration, and AI-powered features on production Kubernetes.

  • Architected 4-layer medallion data warehouse (BRONZE/SILVER/GOLD/ML_FEATURES) with 25+ dbt models implementing incremental loading, SCD Type 2 change tracking, and automated PII masking.
  • Built end-to-end CDC pipeline using Debezium streaming PostgreSQL changes to Kafka (7 topics), with Kafka Connect sinking to Snowflake BRONZE schema, achieving sub-5-minute latency.
  • Developed 4 PySpark batch and streaming jobs processing real-time trip expense data with geo enrichment, daily aggregations, ML feature engineering, and 30-second micro-batch windows.
  • Orchestrated 5 Airflow DAGs coordinating hourly dbt transformations, 30-minute Soda quality gates, daily Spark batch processing, and bi-hourly reverse-ETL syncing warehouse aggregates to application database.
  • Implemented cross-layer data reconciliation ensuring sub-5% row-count drift between pipeline stages, with 8 automated Soda quality checks and proactive alerting for data quality issues.
  • Deployed production k8s data platform with 3 Helm charts managing Strimzi Kafka cluster, Spark Operator for distributed processing, and full observability with Prometheus/Grafana dashboards.
  • Designed 17-table PostgreSQL schema migrating from managed Neon to self-hosted in-cluster Postgres for cost efficiency, supporting real-time analytics queries and pgvector semantic search.
  • Built analytics-powered trip planning application with Next.js 16, custom JWT auth (RBAC, MFA, API keys), AI assistant using Claude API with RAG, and real-time SSE collaboration features.
Tech: dbt · Snowflake · PySpark · Airflow · Kafka · Debezium CDC · PostgreSQL · pgvector · Kubernetes · Spark Operator · Strimzi · Soda Core · Python · Helm
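The cross-layer reconciliation check above can be illustrated with a short sketch. This is a simplified stand-in for the Soda-based checks, with hypothetical function names: compare row counts between adjacent layers and raise an alert when the relative drift exceeds a threshold (5% here, matching the sub-5% target):

```python
def row_count_drift(upstream: int, downstream: int) -> float:
    """Percent row-count drift between two pipeline layers, relative to upstream."""
    if upstream == 0:
        return 0.0 if downstream == 0 else 100.0
    return abs(upstream - downstream) / upstream * 100.0


def reconcile_layers(counts: dict, threshold: float = 5.0) -> list:
    """Check adjacent layers (e.g. BRONZE -> SILVER -> GOLD) in order and
    return an alert string for every hop whose drift exceeds the threshold."""
    alerts = []
    layers = list(counts.items())
    for (up_name, up), (down_name, down) in zip(layers, layers[1:]):
        drift = row_count_drift(up, down)
        if drift > threshold:
            alerts.append(f"{up_name}->{down_name}: {drift:.1f}% drift exceeds {threshold}%")
    return alerts
```

In the real pipeline the counts would come from warehouse queries per layer; the alerting rule itself is just this comparison.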

Skills

Data Engineering & ETL

ETL/ELT Pipelines
Apache Airflow
dbt (30+ models)
Snowflake
Data Modeling
Data Warehousing
Schema Design
Incremental Loading
CDC (Debezium)
Reverse ETL
Medallion Architecture

Streaming & Real-Time

Apache Kafka
Kafka Connect
Strimzi Operator
PySpark (Batch/Stream)
Spark Operator
Event-Driven Architecture
Micro-batch Processing
Stream Processing

Programming & Databases

Python 3.13
SQL
Bash
FastAPI
SQLAlchemy 2.0
PostgreSQL
Async/Await
Pydantic
pytest

Analytics Engineering

dbt Transformations
SCD Type 2
Incremental Models
Data Quality (Soda)
PII Masking
Cross-Layer Reconciliation
ML Feature Engineering

Cloud & Infrastructure

Kubernetes (k3s)
Helm
Docker
GitHub Actions
Traefik
AWS S3
Doppler
CI/CD Pipelines
Linux

Observability & Quality

Prometheus
Grafana
Soda Core Quality Gates
OpenTelemetry
Distributed Tracing
Automated Alerting
70%+ Test Coverage
Data Reconciliation

Education

Master of Professional Studies

Data Science

University of Maryland, Baltimore County

Baltimore, MD · May 2025

Bachelor of Technology

Mechanical Engineering

Sree Vidyanikethan Engineering College

Tirupati, India · June 2021

About Me

Data engineer specializing in building scalable, production-grade data pipelines and cloud data warehouses. I design end-to-end ETL/ELT systems, implement medallion-architecture warehouses, orchestrate complex data workflows, and deploy resilient distributed streaming platforms — with a strong focus on automation, data quality, observability, and platform reliability.

Current Work:

At Renro (連路) (Feb 2026 - Present), I'm building a live trip planning platform with a complete medallion-architecture data warehouse (BRONZE/SILVER/GOLD/ML_FEATURES) featuring 25+ dbt models, end-to-end Debezium CDC streaming to Kafka and Snowflake, 4 PySpark batch/streaming jobs for geo enrichment and ML features, 5 Airflow DAGs with Soda quality gates, and bi-hourly reverse-ETL syncing aggregates back to the application database. The data platform runs on production Kubernetes with Strimzi Kafka, Spark Operator, and cross-layer reconciliation ensuring sub-5% drift.

At MugenLink Network (Aug 2025 - Present), I lead data engineering for a live blockchain analytics platform processing 17GB+ daily across 6 cryptocurrency chains. I architected the ETL pipeline with fault-tolerant checkpointing, designed a 3-tier Snowflake warehouse with 13 RBAC roles, developed 30 dbt transformation models with incremental materialization, and implemented real-time CDC with Snowflake Streams and Kafka Connect — all orchestrated via Airflow on a production k3s cluster with comprehensive monitoring.

Get in Touch

Interested in collaborating or have questions about the platform? Feel free to reach out.