Lokeswar Kudumula
Data Engineer
Data engineer specializing in scalable ETL pipelines, cloud data warehousing (Snowflake), real-time streaming (Kafka, Spark), and analytics transformation (dbt). Experienced in building end-to-end data platforms with production Kubernetes infrastructure, orchestration (Airflow), and data quality automation.
Experience
Data Engineer
Renro (連路)
Independent data engineering project building a live trip planning platform with a medallion-architecture data warehouse, CDC ingestion, real-time streaming, dbt transformations, PySpark batch/streaming jobs, and bi-hourly reverse-ETL on production Kubernetes.
- Architected 4-layer medallion data warehouse (BRONZE/SILVER/GOLD/ML_FEATURES) with 25+ dbt models implementing incremental loading, SCD Type 2 change tracking, and automated PII masking for compliance.
- Built end-to-end CDC pipeline using Debezium streaming PostgreSQL changes to Kafka (7 topics), with Kafka Connect sinking to Snowflake BRONZE schema, achieving sub-5-minute data latency.
- Developed 4 PySpark batch and streaming jobs processing real-time expense data with geo enrichment, daily aggregations, ML feature engineering, and 30-second micro-batch processing windows.
- Orchestrated 5 Airflow DAGs coordinating hourly dbt transformations, 30-minute Soda quality gates, daily Spark batch processing, and bi-hourly reverse-ETL syncing warehouse aggregates to application database.
- Implemented cross-layer data reconciliation ensuring sub-5% row-count drift between pipeline stages, with 8 automated Soda quality checks and alerting rules for proactive issue detection.
- Deployed production k8s data platform with 3 Helm charts managing Strimzi Kafka cluster, Spark Operator for distributed processing, and full observability with Prometheus/Grafana monitoring.
- Designed a 17-table PostgreSQL schema and migrated it from managed Neon to self-hosted in-cluster Postgres for improved cost efficiency and control, supporting real-time analytics queries.
- Built analytics-powered trip planning application backend with custom JWT auth (RBAC, MFA, API keys), AI assistant using Claude API with pgvector RAG for semantic search, and real-time collaboration features.
Data Engineer
MugenLink Network
Independent data engineering project building a live blockchain analytics platform processing data from 6 cryptocurrency chains with automated ETL, a Snowflake data warehouse, dbt transformations, and Airflow orchestration on k3s infrastructure.
- Architected end-to-end ETL pipeline processing 17GB+ daily across 6 cryptocurrencies, reducing data latency from hours to minutes with fault-tolerant checkpoint/resume and incremental loading.
- Designed 3-tier Snowflake data warehouse with 13 RBAC roles, RSA key-pair authentication, and secure multi-team access for 8 service accounts across RAW, TRANSFORMED, and ANALYTICS layers.
- Developed 30 dbt transformation models with incremental materialization strategies, improving build performance by 20-40% and enabling wallet-level transaction tracing and cross-chain analytics.
- Implemented Change Data Capture with Snowflake Streams and Tasks, enabling real-time change detection and event-driven incremental processing across 30 RAW tables for near real-time analytics.
- Configured Kafka Connect with Snowflake Sink Connector for continuous data ingestion via Snowpipe Streaming, reducing pipeline latency from hours to minutes with schema evolution support.
- Built FastAPI analytics API with async SQLAlchemy 2.0, Redis caching layer for query acceleration, and distributed tracing, serving sub-second responses for complex blockchain queries.
- Deployed production k3s data platform with Airflow orchestration, Prometheus monitoring, auto-scaling, and zero-trust networking, achieving 99.9% uptime for all data services.
- Established 8-stage CI/CD pipeline with GitHub Actions enforcing 70%+ test coverage across 223 tests, automated security scanning, and data quality gates before production deployment.
System Engineer
Infosys Ltd. (Proximus)
System engineer supporting telecom infrastructure operations for a major European carrier serving 4M+ subscribers.
- Automated server health monitoring with Ansible playbooks and Bash scripts, reducing manual operational checks by 87% and improving early issue detection.
- Reduced mean time to resolution by 40% by implementing centralized log analysis and anomaly detection using ELK Stack across production environments.
- Optimized SQL queries for identity data validation, improving processing speed by 60% through query tuning and strategic indexing.
- Managed CI/CD pipelines for 10+ microservices on OpenShift, maintaining a 99.5% deployment success rate and reducing release cycle time.
Projects
MugenLink Network — Blockchain Analytics Platform
Production data engineering platform processing 17GB+ daily across 6 cryptocurrency chains with automated ETL, a Snowflake data warehouse, dbt transformations, real-time CDC, Airflow orchestration, and a FastAPI analytics API on Kubernetes.
- Designed and deployed end-to-end ETL pipeline ingesting 6 cryptocurrency chains (Bitcoin, Ethereum, etc.) into Snowflake, processing 17GB+ daily with encrypted S3 staging and fault-tolerant checkpoint/resume.
- Architected 3-tier Snowflake data warehouse (RAW/TRANSFORMED/ANALYTICS) with 13 RBAC roles, RSA key-pair authentication, and secure multi-team access for 8 service accounts.
- Developed 30 dbt transformation models across 5 layers with incremental materialization strategies, enabling wallet-level transaction tracing and cross-chain portfolio analytics with 20-40% faster builds.
- Implemented real-time Change Data Capture with Snowflake Streams and Tasks, enabling event-driven incremental processing across 30 RAW tables with sub-minute detection latency.
- Configured Kafka Connect with Snowflake Sink Connector for continuous data ingestion via Snowpipe Streaming, reducing pipeline latency from hours to minutes with automatic schema evolution.
- Built FastAPI 0.117+ analytics API with async SQLAlchemy 2.0, Redis caching layer for query acceleration, distributed tracing, and comprehensive 223-test suite (70%+ coverage).
- Deployed production k3s data platform with Airflow orchestration, Prometheus/Grafana monitoring, Traefik ingress, auto-scaling, and full observability achieving 99.9% uptime.
- Established 8-stage CI/CD pipeline with GitHub Actions enforcing data quality gates, security scanning, test coverage requirements, and container vulnerability analysis before production deployment.
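The Redis caching layer mentioned above follows a standard cache-aside read path. A minimal sketch, run here against a dict-backed stand-in so it works without a Redis server (the key scheme and TTL are assumptions, not the platform's actual values):

```python
import json
import time
from typing import Callable

# Minimal in-memory stand-in for a Redis client (get/setex subset only).
class FakeRedis:
    def __init__(self):
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry and entry[1] > time.monotonic():
            return entry[0]
        self._store.pop(key, None)  # expired or missing
        return None

    def setex(self, key, ttl, value):
        self._store[key] = (value, time.monotonic() + ttl)

def cached_query(cache, key: str, ttl: int, run_query: Callable[[], dict]) -> dict:
    """Cache-aside: serve from cache on hit, else run the query and populate."""
    hit = cache.get(key)
    if hit is not None:
        return json.loads(hit)
    result = run_query()                       # expensive warehouse/DB query
    cache.setex(key, ttl, json.dumps(result))  # TTL bounds staleness
    return result
```

With a real client the `FakeRedis` instance is swapped for `redis.Redis(...)`; the endpoint logic is unchanged, which is what makes the caching layer easy to test in isolation.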
Renro (連路) — Trip Planning Data Platform
Live trip planning platform with end-to-end data engineering: medallion data warehouse, real-time CDC streaming, dbt transformations, PySpark batch/streaming jobs, Airflow orchestration, and AI-powered features on production Kubernetes.
- Architected 4-layer medallion data warehouse (BRONZE/SILVER/GOLD/ML_FEATURES) with 25+ dbt models implementing incremental loading, SCD Type 2 change tracking, and automated PII masking.
- Built end-to-end CDC pipeline using Debezium streaming PostgreSQL changes to Kafka (7 topics), with Kafka Connect sinking to Snowflake BRONZE schema, achieving sub-5-minute latency.
- Developed 4 PySpark batch and streaming jobs processing real-time trip expense data with geo enrichment, daily aggregations, ML feature engineering, and 30-second micro-batch windows.
- Orchestrated 5 Airflow DAGs coordinating hourly dbt transformations, 30-minute Soda quality gates, daily Spark batch processing, and bi-hourly reverse-ETL syncing warehouse aggregates to application database.
- Implemented cross-layer data reconciliation ensuring sub-5% row-count drift between pipeline stages, with 8 automated Soda quality checks and proactive alerting for data quality issues.
- Deployed production k8s data platform with 3 Helm charts managing Strimzi Kafka cluster, Spark Operator for distributed processing, and full observability with Prometheus/Grafana dashboards.
- Designed a 17-table PostgreSQL schema and migrated it from managed Neon to self-hosted in-cluster Postgres for cost efficiency, supporting real-time analytics queries and pgvector semantic search.
- Built analytics-powered trip planning application with Next.js 16, custom JWT auth (RBAC, MFA, API keys), AI assistant using Claude API with RAG, and real-time SSE collaboration features.
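The cross-layer reconciliation above reduces to a per-table row-count comparison between adjacent pipeline stages. A minimal sketch (the 5% threshold mirrors the description; table names and the check's shape are illustrative):

```python
def reconcile(counts_upstream: dict, counts_downstream: dict,
              max_drift: float = 0.05) -> list[str]:
    """Flag tables whose row counts drift more than max_drift between layers."""
    failures = []
    for table, up in counts_upstream.items():
        down = counts_downstream.get(table, 0)
        if up == 0:
            # Empty upstream: any downstream rows count as full drift.
            drift = 0.0 if down == 0 else 1.0
        else:
            drift = abs(up - down) / up
        if drift > max_drift:
            failures.append(f"{table}: drift {drift:.1%} exceeds {max_drift:.0%}")
    return failures
```

In practice the counts come from `SELECT COUNT(*)` per table in each layer (or from Soda check results), and a non-empty failure list fails the Airflow task so bad loads never reach GOLD.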
Skills
Data Engineering & ETL
Streaming & Real-Time
Programming & Databases
Analytics Engineering
Cloud & Infrastructure
Observability & Quality
Education
Master of Professional Studies
Data Science
University of Maryland, Baltimore County
Bachelor of Technology
Mechanical Engineering
Sree Vidyanikethan Engineering College
About Me
Data engineer specializing in building scalable, production-grade data pipelines and cloud data warehouses. I design end-to-end ETL/ELT systems, implement medallion-architecture warehouses, orchestrate complex data workflows, and deploy resilient distributed streaming platforms — with a strong focus on automation, data quality, observability, and platform reliability.
Current Work:
At Renro (連路) (Feb 2026 - Present), I'm building a live trip planning platform with a complete medallion-architecture data warehouse (BRONZE/SILVER/GOLD/ML_FEATURES) featuring 25+ dbt models, end-to-end Debezium CDC streaming to Kafka and Snowflake, 4 PySpark batch/streaming jobs for geo enrichment and ML features, 5 Airflow DAGs with Soda quality gates, and bi-hourly reverse-ETL syncing aggregates back to the application database. The data platform runs on production Kubernetes with Strimzi Kafka, Spark Operator, and cross-layer reconciliation ensuring sub-5% drift.
At MugenLink Network (Aug 2025 - Present), I lead data engineering for a live blockchain analytics platform processing 17GB+ daily across 6 cryptocurrency chains. I architected the ETL pipeline with fault-tolerant checkpointing, designed a 3-tier Snowflake warehouse with 13 RBAC roles, developed 30 dbt transformation models with incremental materialization, and implemented real-time CDC with Snowflake Streams and Kafka Connect — all orchestrated via Airflow on a production k3s cluster with comprehensive monitoring.