Saurabh Chhajed

Lead Data/AI Engineer

Thoughtworks India Pvt Ltd, Hyderabad, India

Oct 2024 - Present

Client: Leading U.S. Home Improvement Retailer

Spearheaded architecture and delivery of a scalable Promotion Analytics Engine, driving campaign insights and analytics based on 20+ financial parameters.
Developed robust, production-grade ETL pipelines built on dbt models, combining PySpark and BigQuery for batch and near-real-time transformations.
Integrated LightGBM regression models via BigQuery ML to enable accurate volume forecasting.
Led a team of 10 engineers, overseeing design, architecture, and code reviews to ensure quality and scalability.
Collaborated with stakeholders across Product, Data Science, and Architecture.

Lead Data Engineer (LMTS)

Salesforce India Pvt Ltd, Hyderabad, India

June 2022 - Sept 2024

Salesforce's Unified Intelligence Platform (UIP) is an enterprise-scale internal data lake and analytics ecosystem, facilitating petabyte-scale data ingestion, exploration, transformation, and visualization.

Led the architecture and development of a metadata-driven ingestion pipeline processing petabytes of data, integrating Kafka, Spark, Scala, Trino, and Airflow for scalable batch and streaming ingestion.
Designed GDPR-compliant data leak scanners with sampling techniques, cutting scanning costs by 40%.
Built a high-throughput Leak Management Pipeline that cleans leaked PII across 3,000+ record types and several petabytes of data.
Engineered an advanced tokenization service securing sensitive identifiers while enabling efficient analytics.
Developed Airflow Operators and frameworks to streamline ingestion and exploration workflows.
Implemented system monitoring and alerting with Grafana and PagerDuty for real-time system visibility.
Created a workload analytics dashboard using Apache Superset, optimizing Spark cluster resource utilization by ~20%.
Led a team of 5 engineers, driving design reviews, code quality initiatives, and operational improvements.

Lead Data/ML Engineer

American Express (via Impetus Technologies Inc.), Phoenix, Arizona, USA

Dec 2014 - June 2022

Architected and built a merchant recommender system using an ensemble of CatBoost, collaborative filtering, and Word2Vec models on Spark, enhancing Amex marketing personalization.
Developed end-to-end ML pipelines: feature extraction, model training, hyperparameter tuning (distributed grid search), monitoring, and deployment to online scoring systems.
Reduced hyperparameter tuning time by 40% through distributed Spark-based optimization.
Designed and deployed microservices-based model serving architecture for real-time, geo-personalized merchant recommendations.
Built Model/Feature Monitoring solutions to track GINI, PSI, and accuracy metrics, ensuring model health.
Spearheaded development of an online offer personalization engine with Hadoop, Hive, MapR-DB, and Elasticsearch, improving campaign launch speed by 50%.
Collaborated cross-functionally with Product, Data Science, and Marketing teams using Agile/SAFe methodologies.

Application Developer

JP Morgan Chase & Co., India

Aug 2012 - Dec 2014

Developed a multi-cluster distributed data management platform for high-availability, low-latency processing.
Built real-time Search and Analytics solutions using ELK Stack (Elasticsearch, Logstash, Kibana).
Designed real-time order update systems using distributed caching (GemFire).
Evangelized code quality tools (Sonar, Jira, Crucible), improving team code health.
Conducted a Hadoop PoC for analyzing cross-application usage patterns.
Worked on messaging products providing end-to-end integration between business-critical applications, handling approximately 2-3 million message exchanges daily.
Gained exposure to multithreading and Java performance tuning, including garbage collection (GC) algorithms and their tuning.

Systems Engineer

General Electric (GE) Company (TCS), India

Nov 2009 - July 2012

Led re-architecture of a large-scale ASP/IIS application to a Java/Spring microservices framework.
Designed and developed RESTful APIs consumed by multiple clients.
Improved critical business process execution time by 50%, saving $30,000.
Optimized database queries and developed complex PL/SQL procedures.

Professional Summary

Professional Experience

Technical Skills

Big Data Platforms

Cloud Platforms

Programming

Distributed Systems

Data Security

Additional Technologies

DevOps & CI/CD

Education

Institute of Engineering and Technology, Indore, MP

Certifications & Publications

Awards & Accolades