Lokeswar Kudumula

Data Engineer

Data engineer specializing in scalable ETL pipelines, cloud data warehousing (Snowflake), real-time streaming (Kafka, Spark), and analytics transformation (dbt). Experienced in building end-to-end data platforms with production Kubernetes infrastructure, orchestration (Airflow), and data quality automation.

Remote · Joined Aug 2025

Experience

Data Engineer

Renro (連路)

Feb 2026 - Present · Remote (Personal Project)

Independent data engineering project building a live trip planning platform with a medallion-architecture data warehouse, CDC ingestion, real-time streaming, dbt transformations, PySpark batch/streaming jobs, and bi-hourly reverse-ETL, all on production Kubernetes.

  • Architected 4-layer medallion data warehouse (BRONZE/SILVER/GOLD/ML_FEATURES) with 25+ dbt models implementing incremental loading, SCD Type 2 change tracking, and automated PII masking for compliance.
  • Built end-to-end CDC pipeline using Debezium streaming PostgreSQL changes to Kafka (7 topics), with Kafka Connect sinking to Snowflake BRONZE schema, achieving sub-5-minute data latency.
  • Developed 4 PySpark batch and streaming jobs processing real-time expense data with geo enrichment, daily aggregations, ML feature engineering, and 30-second micro-batch processing windows.
  • Orchestrated 5 Airflow DAGs coordinating hourly dbt transformations, 30-minute Soda quality gates, daily Spark batch processing, and bi-hourly reverse-ETL syncing warehouse aggregates to application database.
  • Implemented cross-layer data reconciliation ensuring sub-5% row-count drift between pipeline stages, with 8 automated Soda quality checks and alerting rules for proactive issue detection.
  • Deployed production k8s data platform with 3 Helm charts managing Strimzi Kafka cluster, Spark Operator for distributed processing, and full observability with Prometheus/Grafana monitoring.
  • Designed 17-table PostgreSQL schema migrating from managed Neon to self-hosted in-cluster Postgres for improved cost efficiency and control, supporting real-time analytics queries.
  • Built analytics-powered trip planning application backend with custom JWT auth (RBAC, MFA, API keys), AI assistant using Claude API with pgvector RAG for semantic search, and real-time collaboration features.
Tech: dbt · Snowflake · PySpark · Airflow · Kafka · Debezium CDC · PostgreSQL · pgvector · Kubernetes · Spark Operator · Strimzi · Soda Core · Python · Helm
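The automated PII masking mentioned above can be illustrated with a minimal, self-contained sketch. This is a toy stand-in, not the dbt implementation used in the pipeline; the field names and salt are hypothetical. Deterministic salted hashing is shown because it hides raw values while keeping masked columns joinable across tables and runs:

```python
import hashlib


def mask_pii(record: dict, pii_fields: set, salt: str = "demo-salt") -> dict:
    """Return a copy of `record` with PII fields replaced by salted SHA-256 digests.

    Deterministic hashing preserves join keys across runs while hiding raw values.
    """
    masked = dict(record)
    for field in pii_fields:
        value = masked.get(field)
        if value is not None:
            digest = hashlib.sha256((salt + str(value)).encode()).hexdigest()
            masked[field] = digest[:16]  # truncated digest is still stable per input
    return masked
```

The same input always maps to the same digest, so `mask_pii` can be applied independently in every layer without breaking referential integrity.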

Data Engineer

MugenLink Network

Aug 2025 - Present · Remote (Personal Project)

Independent data engineering project building a live blockchain analytics platform that processes 6 cryptocurrency chains with automated ETL, a Snowflake data warehouse, dbt transformations, and Airflow orchestration on k3s infrastructure.

  • Architected end-to-end ETL pipeline processing 17GB+ daily across 6 cryptocurrencies, reducing data latency from hours to minutes with fault-tolerant checkpoint/resume and incremental loading.
  • Designed 3-tier Snowflake Data Warehouse with 13 RBAC roles, RSA key-pair authentication, and secure multi-team access for 8 service accounts across RAW, TRANSFORMED, and ANALYTICS layers.
  • Developed 30 dbt transformation models with incremental materialization strategies, improving build performance by 20-40% and enabling wallet-level transaction tracing and cross-chain analytics.
  • Implemented Change Data Capture with Snowflake Streams and Tasks, enabling real-time change detection and event-driven incremental processing across 30 RAW tables for near real-time analytics.
  • Configured Kafka Connect with Snowflake Sink Connector for continuous data ingestion via Snowpipe Streaming, reducing pipeline latency from hours to minutes with schema evolution support.
  • Built FastAPI analytics API with async SQLAlchemy 2.0, Redis caching layer for query acceleration, and distributed tracing, serving sub-second responses for complex blockchain queries.
  • Deployed production k8s data platform with Airflow orchestration, Prometheus monitoring, auto-scaling, and zero-trust networking, achieving 99.9% uptime for all data services.
  • Established 8-stage CI/CD pipeline with GitHub Actions enforcing 70%+ test coverage across 223 tests, automated security scanning, and data quality gates before production deployment.
Tech: Python 3.13 · Snowflake · dbt · Airflow · ETL/ELT · Kafka · CDC · FastAPI · SQLAlchemy · AWS S3 · Kubernetes · Redis · Prometheus
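The fault-tolerant checkpoint/resume pattern noted above can be sketched in plain Python. This is a simplified illustration under assumed names (`run_pipeline`, a JSON checkpoint file), not the production ingestion code: each batch is loaded in order, the offset is persisted atomically after every batch, and a rerun after a crash skips work that already completed:

```python
import json
from pathlib import Path


def load_checkpoint(path: Path) -> int:
    """Return the last committed batch offset, or 0 on a first run."""
    if path.exists():
        return json.loads(path.read_text())["offset"]
    return 0


def save_checkpoint(path: Path, offset: int) -> None:
    # Write-then-rename keeps the checkpoint file atomic on POSIX filesystems.
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps({"offset": offset}))
    tmp.replace(path)


def run_pipeline(batches: list, ckpt: Path, sink: list) -> None:
    """Process batches in order, skipping any completed before a crash."""
    start = load_checkpoint(ckpt)
    for i, batch in enumerate(batches):
        if i < start:
            continue  # already ingested in a previous run
        sink.extend(batch)  # stand-in for the real load step
        save_checkpoint(ckpt, i + 1)
```

Because the checkpoint advances only after a batch lands in the sink, a crash mid-run re-processes at most the in-flight batch rather than the whole backlog.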

System Engineer

Infosys Ltd. (Proximus)

Aug 2021 - Mar 2023 · Trivandrum, India

System engineer supporting telecom infrastructure operations for a major European carrier serving 4M+ subscribers.

  • Automated server health monitoring with Ansible playbooks and Bash scripts, reducing manual operational checks by 87% and improving early issue detection.
  • Reduced mean time to resolution by 40% by implementing centralized log analysis and anomaly detection using ELK Stack across production environments.
  • Optimized SQL queries for identity data validation, improving processing speed by 60% through query tuning and strategic indexing.
  • Managed CI/CD pipelines for 10+ microservices on OpenShift, maintaining a 99.5% deployment success rate and reducing release cycle time.
Tech: Ansible · ELK Stack · SQL · OpenShift · Jenkins · Dynatrace · Bash · Linux

Projects

MugenLink Network — Blockchain Analytics Platform

Featured

Production data engineering platform processing 17GB+ daily across 6 cryptocurrency chains with automated ETL, Snowflake data warehouse, dbt transformations, real-time CDC, Airflow orchestration, and FastAPI analytics API on Kubernetes.

  • Designed and deployed end-to-end ETL pipeline ingesting 6 cryptocurrency chains (Bitcoin, Ethereum, etc.) into Snowflake, processing 17GB+ daily with encrypted S3 staging and fault-tolerant checkpoint/resume.
  • Architected 3-tier Snowflake Data Warehouse (RAW/TRANSFORMED/ANALYTICS) with 13 RBAC roles, RSA key-pair authentication, and secure multi-team access for 8 service accounts.
  • Developed 30 dbt transformation models across 5 layers with incremental materialization strategies, enabling wallet-level transaction tracing and cross-chain portfolio analytics with 20-40% faster builds.
  • Implemented real-time Change Data Capture with Snowflake Streams and Tasks, enabling event-driven incremental processing across 30 RAW tables with sub-minute detection latency.
  • Configured Kafka Connect with Snowflake Sink Connector for continuous data ingestion via Snowpipe Streaming, reducing pipeline latency from hours to minutes with automatic schema evolution.
  • Built FastAPI 0.117+ analytics API with async SQLAlchemy 2.0, Redis caching layer for query acceleration, distributed tracing, and comprehensive 223-test suite (70%+ coverage).
  • Deployed production k3s data platform with Airflow orchestration, Prometheus/Grafana monitoring, Traefik ingress, auto-scaling, and full observability achieving 99.9% uptime.
  • Established 8-stage CI/CD pipeline with GitHub Actions enforcing data quality gates, security scanning, test coverage requirements, and container vulnerability analysis before production deployment.
Tech: Python 3.13 · Snowflake · dbt · Airflow · ETL/ELT · Kafka · CDC · FastAPI · Redis · Kubernetes · Prometheus · AWS S3
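The Redis cache-aside layer behind the analytics API can be sketched with an in-process stand-in. This toy `TTLCache` (a hypothetical name, not the real Redis client code) shows the pattern: serve a hit from the cache, otherwise run the expensive query once and store the result with a time-to-live:

```python
import time


class TTLCache:
    """Minimal in-process stand-in for a Redis cache: entries expire after `ttl` seconds."""

    def __init__(self, ttl: float = 60.0):
        self.ttl = ttl
        self._store = {}  # key -> (expires_at, value)

    def get(self, key: str):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() > expires_at:
            del self._store[key]  # lazy eviction on read
            return None
        return value

    def set(self, key: str, value) -> None:
        self._store[key] = (time.monotonic() + self.ttl, value)


def cached_query(cache: TTLCache, key: str, run_query):
    """Cache-aside: return a cached result, else execute the query and cache it."""
    result = cache.get(key)
    if result is None:
        result = run_query()
        cache.set(key, result)
    return result
```

With a real Redis deployment the `get`/`set` calls map onto `GET`/`SETEX`, but the control flow of the cache-aside pattern is the same.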

Renro (連路) — Trip Planning Data Platform

Featured

Live trip planning platform with end-to-end data engineering: medallion data warehouse, real-time CDC streaming, dbt transformations, PySpark batch/streaming jobs, Airflow orchestration, and AI-powered features on production Kubernetes.

  • Architected 4-layer medallion data warehouse (BRONZE/SILVER/GOLD/ML_FEATURES) with 25+ dbt models implementing incremental loading, SCD Type 2 change tracking, and automated PII masking.
  • Built end-to-end CDC pipeline using Debezium streaming PostgreSQL changes to Kafka (7 topics), with Kafka Connect sinking to Snowflake BRONZE schema, achieving sub-5-minute latency.
  • Developed 4 PySpark batch and streaming jobs processing real-time trip expense data with geo enrichment, daily aggregations, ML feature engineering, and 30-second micro-batch windows.
  • Orchestrated 5 Airflow DAGs coordinating hourly dbt transformations, 30-minute Soda quality gates, daily Spark batch processing, and bi-hourly reverse-ETL syncing warehouse aggregates to application database.
  • Implemented cross-layer data reconciliation ensuring sub-5% row-count drift between pipeline stages, with 8 automated Soda quality checks and proactive alerting for data quality issues.
  • Deployed production k8s data platform with 3 Helm charts managing Strimzi Kafka cluster, Spark Operator for distributed processing, and full observability with Prometheus/Grafana dashboards.
  • Designed 17-table PostgreSQL schema migrating from managed Neon to self-hosted in-cluster Postgres for cost efficiency, supporting real-time analytics queries and pgvector semantic search.
  • Built analytics-powered trip planning application with Next.js 16, custom JWT auth (RBAC, MFA, API keys), AI assistant using Claude API with RAG, and real-time SSE collaboration features.
Tech: dbt · Snowflake · PySpark · Airflow · Kafka · Debezium CDC · PostgreSQL · pgvector · Kubernetes · Spark Operator · Strimzi · Soda Core · Python · Helm
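The cross-layer reconciliation check above can be illustrated with a short sketch. This is a simplified stand-in for the Soda-based checks, with hypothetical function names: compare row counts between adjacent layers and raise an alert when the relative drift exceeds a threshold (5% here, matching the sub-5% target):

```python
def row_count_drift(upstream: int, downstream: int) -> float:
    """Percent row-count drift between two pipeline layers, relative to upstream."""
    if upstream == 0:
        return 0.0 if downstream == 0 else 100.0
    return abs(upstream - downstream) / upstream * 100.0


def reconcile_layers(counts: dict, threshold: float = 5.0) -> list:
    """Check adjacent layers (e.g. BRONZE -> SILVER -> GOLD) in order and
    return an alert string for every hop whose drift exceeds the threshold."""
    alerts = []
    layers = list(counts.items())
    for (up_name, up), (down_name, down) in zip(layers, layers[1:]):
        drift = row_count_drift(up, down)
        if drift > threshold:
            alerts.append(f"{up_name}->{down_name}: {drift:.1f}% drift exceeds {threshold}%")
    return alerts
```

In the real pipeline the counts would come from warehouse queries per layer; the alerting rule itself is just this comparison.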

Skills

Data Engineering & ETL

ETL/ELT Pipelines
Apache Airflow
dbt (30+ models)
Snowflake
Data Modeling
Data Warehousing
Schema Design
Incremental Loading
CDC (Debezium)
Reverse ETL
Medallion Architecture

Streaming & Real-Time

Apache Kafka
Kafka Connect
Strimzi Operator
PySpark (Batch/Stream)
Spark Operator
Event-Driven Architecture
Micro-batch Processing
Stream Processing

Programming & Databases

Python 3.13
SQL
Bash
FastAPI
SQLAlchemy 2.0
PostgreSQL
Async/Await
Pydantic
pytest

Analytics Engineering

dbt Transformations
SCD Type 2
Incremental Models
Data Quality (Soda)
PII Masking
Cross-Layer Reconciliation
ML Feature Engineering

Cloud & Infrastructure

Kubernetes (k3s)
Helm
Docker
GitHub Actions
Traefik
AWS S3
Doppler
CI/CD Pipelines
Linux

Observability & Quality

Prometheus
Grafana
Soda Core Quality Gates
OpenTelemetry
Distributed Tracing
Automated Alerting
70%+ Test Coverage
Data Reconciliation

Education

Master of Professional Studies

Data Science

University of Maryland, Baltimore County

Baltimore, MD · May 2025

Bachelor of Technology

Mechanical Engineering

Sree Vidyanikethan Engineering College

Tirupati, India · June 2021

About Me

Data engineer specializing in building scalable, production-grade data pipelines and cloud data warehouses. I design end-to-end ETL/ELT systems, implement medallion-architecture warehouses, orchestrate complex data workflows, and deploy resilient distributed streaming platforms — with a strong focus on automation, data quality, observability, and platform reliability.

Current Work:

At Renro (連路) (Feb 2026 - Present), I'm building a live trip planning platform with a complete medallion-architecture data warehouse (BRONZE/SILVER/GOLD/ML_FEATURES) featuring 25+ dbt models, end-to-end Debezium CDC streaming to Kafka and Snowflake, 4 PySpark batch/streaming jobs for geo enrichment and ML features, 5 Airflow DAGs with Soda quality gates, and bi-hourly reverse-ETL syncing aggregates back to the application database. The data platform runs on production Kubernetes with Strimzi Kafka, Spark Operator, and cross-layer reconciliation ensuring sub-5% drift.

At MugenLink Network (Aug 2025 - Present), I lead data engineering for a live blockchain analytics platform processing 17GB+ daily across 6 cryptocurrency chains. I architected the ETL pipeline with fault-tolerant checkpointing, designed a 3-tier Snowflake warehouse with 13 RBAC roles, developed 30 dbt transformation models with incremental materialization, and implemented real-time CDC with Snowflake Streams and Kafka Connect — all orchestrated via Airflow on a production k3s cluster with comprehensive monitoring.

Get in Touch

Interested in collaborating or have questions about the platform? Feel free to reach out.