Hello, I'm

Rishabh Jain

Data Engineer

About Me

I’m Rishabh, a Lead Data Engineer based in Seattle, WA, USA. I have extensive experience developing big data pipelines and working in cloud ecosystems. I am also skilled in data warehousing, ETL development, and writing scalable code.

  • Python
  • SQL
  • Scala
  • Spark
  • DBT
  • Airflow
  • AWS
  • Redshift
  • Snowflake
  • BigQuery
  • Kafka
  • Kubernetes
  • Terraform
  • Tableau
Download CV

What I do

Data Engineer

I love working with data, finding insights by building data pipelines, and handling end-to-end big data systems.

Cloud Engineer

Having worked across several cloud technologies, I am passionate about designing and building robust cloud ecosystems that follow best practices.

Python Developer

Python is my preferred programming language. I have used it for data transformation with libraries such as Pandas and NumPy, for PySpark jobs, and for building various ML models.

Technical Skills

SQL
95%
Python
90%
Apache Spark (PySpark / Scala)
85%
DBT
85%
AWS (Redshift, S3, EMR, Glue, EKS)
85%
Apache Airflow
80%
Snowflake / Redshift
80%
Kubernetes / Docker / Terraform
75%

Professional Skills

  • Enthusiastic
  • Teamwork
  • Communication
  • Leadership

Work Experience

Lead Data Engineer Two Circles, Vancouver

May 2025 - Present
Responsibilities:
  • Own end-to-end architecture of the clickstream analytics platform ingesting 10M+ events/day from NFL+ and premium media properties driving content engagement, subscription retention, and executive KPI reporting.
  • Drove the design of a 100+ model DBT warehouse on Redshift with conformed fact and dimension layers, SCD Type 2 snapshots, and SLA-backed freshness contracts enabling self-service analytics for 30+ downstream consumers.
  • Led a Redshift cost & performance overhaul (WLM redesign, sort/dist key strategy, materialized views, workload isolation) cutting compute spend 25% and report latency 40% while doubling concurrent query throughput.
  • Architected the unified subscription revenue pipeline consolidating Cleeng and Recurly into a single source of truth, shortening finance close from T+5 to T+1.
  • Established the team's engineering standards, PR review framework, schema migration playbook, and rollback automation; hired and mentored 2 engineers.

Senior Data Engineer WineDirect, Vancouver

Jun 2023 - May 2025
Responsibilities:
  • Founded and productionized the dbt practice from zero, establishing the modeling layer on Redshift, Airflow orchestration patterns, testing conventions, and documentation standards; shipped 50+ models in the first year.
  • Re-architected critical Spark pipelines (partitioning strategy, broadcast joins, adaptive query execution, intelligent caching), cutting processing time by 25% and lowering AWS compute spend across petabyte-scale workloads.
  • Architected a centralized S3 Data Lake holding petabytes of historical commerce data and designed the client-facing API access layer enabling thousands of wineries and retailers to query their data programmatically.
  • Owned the CI/CD pipeline for schema migrations on GitHub Actions, with automated rollback and blue/green deployment patterns, driving production data incidents down 60% QoQ.
  • Drove data quality maturity by introducing dbt tests, Great Expectations checks, and freshness SLAs across critical revenue and inventory pipelines.

Data Engineer Skillz, Vancouver

Jun 2021 - May 2023
Responsibilities:
  • Drove the end-to-end Redshift to Snowflake migration covering 400+ tables across marketing, product, and finance domains, with zero data loss and zero downtime for downstream consumers.
  • Built the real-time ingestion layer on Snowpipe, Tasks, and External Tables, reducing data latency from hours to minutes for gaming and campaign event streams powering attribution and ML feature pipelines.
  • Owned query optimization and schema design for high-cardinality event tables, cutting dashboard load times 3–5x on critical campaign reports.
  • Established the dbt + Great Expectations testing framework across analytics pipelines, defining SLA tiers and quality gates that became the team's standard for production data releases.

Big Data Engineer Intern Samsung Electronics, Canada

May 2020 - Dec 2020
Responsibilities:
  • Migrated production Spark Scala jobs from AWS EMR to EKS, leveraging Spot Instances and Kubernetes autoscaling to reduce infrastructure costs by 15% while processing petabytes of device telemetry.
  • Built Airflow-orchestrated ETL pipelines executing SQL and Python workloads across terabyte-scale datasets, with retry, alerting, and SLA tracking baked in.
  • Provisioned cloud infrastructure with Terraform, AWS Auto Scaling Groups, IAM, and EKS clusters enabling repeatable, version-controlled deployments across dev/stage/prod.
  • Contributed to internal functional data engineering standards covering containerization, CI/CD, and distributed Spark tuning.

Software Engineer, Data Tesco, Bengaluru, India

Aug 2015 - Aug 2019
Responsibilities:
  • Built and maintained ETL pipelines in IBM DataStage 11.5, integrating structured and semi-structured sources (Parquet, CSV, JSON) to deliver consolidated datasets for banking analytics.
  • Engineered solutions to handle schema evolution and scaling challenges across multi-source ingestion frameworks.
  • Seconded to the UK for one year to partner directly with business stakeholders, gathering requirements and building a personalized offers engine spanning Savings, Mortgages, and Loans products.

Education

Master's in Computer Science, Big Data from Simon Fraser University, Canada

2019-2021

  • CGPA: 4.0 / 4.3
  • Relevant courses taken:
    • Machine Learning
    • Statistics
    • Big Data Systems
    • Algorithms

Bachelor's in Computer Science from Vellore Institute of Technology, India

2011-2015

  • CGPA: 9.10 / 10
  • Relevant courses taken:
    • Data Mining
    • Database
    • Cloud Computing

Academic and Personal Projects

    Strategic Asset Manager

    A system that evaluates an investment decision by taking into account the stock’s historical performance, global news sentiment, and the company’s EDGAR reports.

  • Parsed EDGAR reports and applied sentiment analysis, using the result as a feature in the stock price prediction model.
  • Used AWS Comprehend to find insights and relationships in text. Built a chatbot with AWS Lex: a BERT model hosted on EC2, together with files stored on S3, answered user questions about EDGAR reports, and Lambda acted as the central coordinator between all components, working with Lex to deliver the output.
    • Python
    • NLP
    • AWS
    • SQL
    • Keras
    • Scikit-Learn
    Live Demo Github

    Architecture of the chatbot designed as part of the project.

    Real Time Speech Activated Assistant (RETINA)

    This project aims to act as an interactive assistant that helps people complete tasks. The user speaks to the system to command it to detect objects in a live video feed.

    The user issues the wake-up command and then speaks the object name; the system returns a bounding box and spoken feedback on whether the object is present.

    • Machine Learning
    • YOLO
    • Python
    • Tensorflow
    • CNN
    • RNN
    Live Demo Github Poster

    The application continuously monitors a live webcam feed captured through OpenCV. In parallel, after a wake-up command, speech input is passed to the system and converted to text using Speech Recognizer.

    Find Your Home

    Find Your Home is an online platform that helps landlords and their tenants reach consensus and improves transparency between them. A landlord wants to list and publicize properties, manage tenants, and attend to service requests to meet their side of the deal. A potential tenant, on the other hand, wants to find and compare properties or raise service requests for their existing property.

    Our solution aims to simplify the interactions among tenants, landlords, and their properties by bringing them into a common domain and establishing a relationship that alleviates this plethora of problems, making the pipe dream a reality.

    • Python
    • Azure
    • AWS
    • REST API
    • Scala
    • Kubernetes
    • Docker
    Live Demo Github


Certifications

    CKAD: Certified Kubernetes Application Developer

    View
    AWS Certified Data Analytics – Specialty

    View
    Tableau Desktop Specialist

    View
    Data Science Methodology

    View

Featured Posts

    Time Series Forecasting: A Deep Dive

    John is a hotel manager tasked with forecasting room bookings for the next season so that the hotel can make staff and inventory available.

    Read More

    Proportions are what’s really needed

    Financial markets investment decisions are more than just crunching numbers. It is tough for the majority of us without any formal training to gain the necessary information to make investment decisions.

    Read More