Data Projects

Showcasing my work in data analysis, machine learning, and data engineering projects that transform raw data into actionable insights.


Customer Churn Prediction Model

Built a machine learning pipeline that predicts customer churn with 92% accuracy for a SaaS platform.

Role: Data Scientist & ML Engineer

Technologies: Python, Scikit-learn, Pandas, NumPy, TensorFlow, MLflow, Airflow, PostgreSQL
Testing: Pytest, integration tests

Key Achievements:

  • Developed feature engineering pipeline processing 50+ customer attributes
  • Implemented ensemble model combining XGBoost, Random Forest, and Neural Networks
  • Built MLflow experiment tracking for model versioning
  • Created Airflow DAGs for automated model retraining
  • Reduced customer churn by 25% through targeted interventions
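
The portfolio does not say how the XGBoost, random forest, and neural network outputs were combined. One common option is soft voting: average each model's predicted churn probability (optionally weighted) and apply a threshold. The sketch below assumes every model already emits a per-customer probability; the function names, equal default weights, and 0.5 threshold are illustrative assumptions, not the project's configuration.

```python
from typing import List, Optional, Sequence


def soft_vote(model_probs: Sequence[Sequence[float]],
              weights: Optional[Sequence[float]] = None) -> List[float]:
    """Weighted mean of each model's churn probability for every customer.

    model_probs[m][i] is model m's predicted churn probability for customer i.
    """
    if not model_probs:
        raise ValueError("need at least one model")
    n_models, n_customers = len(model_probs), len(model_probs[0])
    if any(len(p) != n_customers for p in model_probs):
        raise ValueError("every model must score the same customers")
    # Equal weights by default, i.e. plain soft voting.
    w = list(weights) if weights is not None else [1.0] * n_models
    if len(w) != n_models or any(x < 0 for x in w) or sum(w) == 0:
        raise ValueError("need one non-negative weight per model, not all zero")
    total = sum(w)
    return [sum(w[m] * model_probs[m][i] for m in range(n_models)) / total
            for i in range(n_customers)]


def predict_churn(probs: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Turn averaged probabilities into 0/1 churn labels."""
    return [1 if p >= threshold else 0 for p in probs]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of customers whose churn label was predicted correctly."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("need equal-length, non-empty label lists")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Hypothetical probabilities from three already-trained models.
    xgb = [0.9, 0.2, 0.6, 0.1]
    rf = [0.8, 0.3, 0.4, 0.2]
    nn = [0.7, 0.1, 0.7, 0.4]
    labels = predict_churn(soft_vote([xgb, rf, nn]))
    print(labels, accuracy([1, 0, 1, 0], labels))
```

Soft voting keeps each model's confidence rather than counting hard votes; per-model weights could be tuned on a validation set if one model is consistently stronger.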

Real-Time Sales Analytics Dashboard

Created a comprehensive analytics platform processing 1M+ daily transactions for a retail chain.

Role: Data Engineer & Analytics Developer

Technologies: Python, Apache Spark, Kafka, Databricks, Power BI, Azure, SQL
Testing: Pytest, integration tests

Key Contributions:

  • Built ETL pipelines with Apache Spark processing 10TB+ of data daily
  • Implemented real-time streaming with Kafka for live sales tracking
  • Created data warehouse schema optimized for analytical queries
  • Developed Power BI dashboards with drill-down capabilities
  • Achieved sub-second query response times for executive dashboards
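
Live sales tracking rests on aggregating the transaction stream into time buckets. The portfolio does not show that code, so the sketch below illustrates a tumbling-window sum per store in plain Python, standing in for what Spark Structured Streaming or a Kafka consumer would do in production. The `SaleEvent` fields and the 60-second window are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class SaleEvent:
    # Hypothetical event shape; a real Kafka topic would define its own schema.
    store_id: str
    amount: float
    ts: int  # event time, epoch seconds


def tumbling_window_totals(events: Iterable[SaleEvent],
                           window_seconds: int = 60) -> Dict[Tuple[int, str], float]:
    """Sum sales per (window start, store) using fixed, non-overlapping windows."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    totals: Dict[Tuple[int, str], float] = defaultdict(float)
    for e in events:
        # Align each event to the start of the window that contains it.
        window_start = e.ts - (e.ts % window_seconds)
        totals[(window_start, e.store_id)] += e.amount
    return dict(totals)


if __name__ == "__main__":
    stream = [
        SaleEvent("store-1", 20.0, 0),
        SaleEvent("store-1", 5.0, 59),
        SaleEvent("store-2", 12.5, 30),
        SaleEvent("store-1", 7.5, 61),
    ]
    for (start, store), total in sorted(tumbling_window_totals(stream).items()):
        print(start, store, total)
```

Streaming engines additionally use watermarks to decide when a window is complete and how to treat late events; this sketch assumes the whole stream is available before the totals are read.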

Fraud Detection System

Developed an ML-based fraud detection system processing 100K+ transactions per minute.

Role: ML Engineer

Technologies: Python, PyTorch, Redis, Elasticsearch, Kubernetes, Prometheus
Testing: Pytest, performance tests

Key Features:

  • Built deep learning models detecting fraudulent patterns in real time
  • Implemented feature store with Redis for low-latency inference
  • Created anomaly detection pipeline using isolation forests
  • Developed A/B testing framework for model comparison
  • Reduced false positives by 40% while maintaining a 99% fraud detection rate
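
An isolation forest scores a point by how few random axis-aligned splits it takes to isolate it: anomalies sit in sparse regions and end up on short paths. The portfolio does not say which implementation was used (scikit-learn ships one as `sklearn.ensemble.IsolationForest`). The dependency-free sketch below follows the original formulation (random subsample per tree, height limit of ceil(log2 psi), score 2^(-E[h(x)]/c(psi))); the tree count, subsample size, and feature vectors are illustrative assumptions.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Optional, Sequence

EULER_GAMMA = 0.5772156649


def c(n: int) -> float:
    """Average path length of an unsuccessful BST search among n points."""
    if n > 2:
        return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n
    if n == 2:
        return 1.0
    return 0.0


@dataclass
class Node:
    size: int = 0          # training points that reached this leaf
    feature: int = -1      # split feature (internal nodes only)
    split: float = 0.0     # split threshold (internal nodes only)
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build_tree(X: List[Sequence[float]], depth: int, limit: int,
               rng: random.Random) -> Node:
    """Recursively partition X with random axis-aligned splits."""
    if depth >= limit or len(X) <= 1:
        return Node(size=len(X))
    q = rng.randrange(len(X[0]))        # random feature
    lo = min(x[q] for x in X)
    hi = max(x[q] for x in X)
    if lo == hi:                        # nothing to split on this feature
        return Node(size=len(X))
    p = rng.uniform(lo, hi)             # random threshold inside the range
    return Node(
        feature=q,
        split=p,
        left=build_tree([x for x in X if x[q] < p], depth + 1, limit, rng),
        right=build_tree([x for x in X if x[q] >= p], depth + 1, limit, rng),
    )


def path_length(x: Sequence[float], node: Node) -> float:
    """Edges traversed to reach a leaf, plus c(size) for unresolved points."""
    depth = 0
    while node.left is not None and node.right is not None:
        node = node.left if x[node.feature] < node.split else node.right
        depth += 1
    return depth + c(node.size)


class IsolationForest:
    def __init__(self, n_trees: int = 100, sample_size: int = 256, seed: int = 0):
        self.n_trees = n_trees
        self.sample_size = sample_size
        self.rng = random.Random(seed)
        self.trees: List[Node] = []
        self.psi = 0

    def fit(self, X: Sequence[Sequence[float]]) -> "IsolationForest":
        data = list(X)
        if len(data) < 2:
            raise ValueError("need at least two training points")
        self.psi = min(self.sample_size, len(data))
        limit = math.ceil(math.log2(self.psi))  # height limit per tree
        self.trees = [
            build_tree(self.rng.sample(data, self.psi), 0, limit, self.rng)
            for _ in range(self.n_trees)
        ]
        return self

    def score(self, x: Sequence[float]) -> float:
        """Anomaly score in (0, 1]; values close to 1 suggest an anomaly."""
        mean_h = sum(path_length(x, t) for t in self.trees) / len(self.trees)
        return 2.0 ** (-mean_h / c(self.psi))
```

In a fraud setting the feature vectors would be per-transaction features (for example, ones served from the Redis feature store), and the alert threshold on the score would be tuned against labelled fraud cases.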

Marketing Attribution Analysis

Built a multi-touch attribution model analyzing customer journeys across 20+ marketing channels.

Role: Data Analyst & Engineer

Technologies: Python, R, BigQuery, Tableau, Google Analytics, Segment
Testing: Pytest, validation tests

Project Highlights:

  • Implemented Markov chain attribution model for customer journey analysis
  • Built data pipeline integrating multiple marketing platforms
  • Created automated reporting system with Tableau
  • Developed statistical models for budget optimization
  • Increased marketing ROI by 35% through data-driven insights
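
The portfolio names a Markov chain model but not how credit was assigned. A common way to turn such a chain into attribution is the removal effect: estimate transition probabilities from observed journeys, compute the probability of reaching conversion from the start state, recompute it with each channel removed (its visits treated as non-converting), and share conversions in proportion to the drop. The sketch below assumes a first-order chain and journeys given as ordered channel lists with a converted flag; the state names and sample journeys are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Optional, Sequence, Tuple

START, CONV, NULL = "(start)", "(conversion)", "(null)"
Journey = Tuple[Sequence[str], bool]  # (ordered channels touched, converted?)
Transitions = Dict[str, Dict[str, float]]


def transition_probs(journeys: Sequence[Journey]) -> Transitions:
    """Estimate first-order transition probabilities between states."""
    counts: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for path, converted in journeys:
        states = [START, *path, CONV if converted else NULL]
        for a, b in zip(states, states[1:]):
            counts[a][b] += 1
    return {a: {b: n / sum(nxt.values()) for b, n in nxt.items()}
            for a, nxt in counts.items()}


def conversion_prob(T: Transitions, removed: Optional[str] = None,
                    tol: float = 1e-12, max_iter: int = 10_000) -> float:
    """Probability of reaching CONV from START; a removed channel acts like NULL."""
    value = {s: 0.0 for s in T}
    value[CONV], value[NULL] = 1.0, 0.0
    for _ in range(max_iter):  # iterate until absorption probabilities settle
        delta = 0.0
        for s, nxt in T.items():
            if s == removed:
                continue  # value[removed] stays 0: visits to it never convert
            new = sum(p * value[t] for t, p in nxt.items())
            delta = max(delta, abs(new - value[s]))
            value[s] = new
        if delta < tol:
            break
    return value[START]


def markov_attribution(journeys: Sequence[Journey]) -> Dict[str, float]:
    """Split total conversions across channels in proportion to removal effect."""
    T = transition_probs(journeys)
    base = conversion_prob(T) if T else 0.0
    if base == 0.0:
        raise ValueError("no conversions to attribute")
    channels = sorted({ch for path, _ in journeys for ch in path})
    effects = {ch: 1.0 - conversion_prob(T, removed=ch) / base for ch in channels}
    total_effect = sum(effects.values())
    conversions = sum(1 for _, converted in journeys if converted)
    if total_effect == 0.0:
        return {ch: 0.0 for ch in channels}
    return {ch: conversions * e / total_effect for ch, e in effects.items()}


if __name__ == "__main__":
    journeys = [
        (["search", "email"], True),
        (["search"], False),
        (["social", "email"], True),
        (["social"], False),
    ]
    print(markov_attribution(journeys))
    # {'email': 1.0, 'search': 0.5, 'social': 0.5}
```

Here every conversion passes through email, so removing it drops the conversion probability to zero and email receives the largest share; the attributed values always sum to the observed conversion count.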

IoT Sensor Data Platform

Engineered a data platform processing 50M+ IoT sensor readings daily for a smart city initiative.

Role: Data Platform Engineer

Technologies: Python, Apache Flink, Cassandra, InfluxDB, Grafana, AWS
Testing: Pytest, integration tests, performance tests

Technical Achievements:

  • Built streaming data pipeline with Apache Flink
  • Implemented time-series storage with InfluxDB
  • Created anomaly detection for sensor malfunctions
  • Developed Grafana dashboards for real-time monitoring
  • Achieved 99.99% data processing reliability
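
The portfolio does not say how sensor malfunctions were detected. A trailing-window z-score is one simple, streaming-friendly approach, shown here as an assumption: each reading is compared with the mean and standard deviation of the readings just before it. The window length, threshold, and sample temperatures are illustrative.

```python
from collections import deque
from statistics import mean, pstdev
from typing import Deque, Iterable, Iterator, Tuple


def rolling_zscore_anomalies(readings: Iterable[float], window: int = 30,
                             threshold: float = 3.0) -> Iterator[Tuple[int, float, float]]:
    """Yield (index, value, z) for readings far from the trailing-window mean.

    Each reading is compared only with the `window` readings before it,
    so a spike cannot inflate its own baseline.
    """
    if window < 2:
        raise ValueError("window must hold at least two readings")
    history: Deque[float] = deque(maxlen=window)
    for i, value in enumerate(readings):
        if len(history) == window:
            mu = mean(history)
            sigma = pstdev(history, mu)
            # A perfectly flat window has no spread to compare against; skip it.
            if sigma > 0:
                z = (value - mu) / sigma
                if abs(z) > threshold:
                    yield i, value, z
        history.append(value)


if __name__ == "__main__":
    temps = [20.0, 20.5, 19.5, 20.0, 20.5, 19.5, 20.0, 20.5, 19.5, 20.0, 45.0, 20.0]
    for i, value, z in rolling_zscore_anomalies(temps, window=5):
        print(f"reading {i}: {value} (z = {z:.1f})")
```

Because flagged readings are still appended to the window, a sustained shift becomes the new baseline after `window` readings; excluding flagged values instead would keep alerting on a sensor that has genuinely drifted.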