Hello I'm

Rishabh Jain

Data Engineer

About Me

I’m Rishabh, a Data Engineer based in Vancouver, Canada. I have extensive experience developing Big Data pipelines and working in cloud ecosystems. I am also skilled in data warehousing, ETL development and writing scalable code.

  • Python
  • Spark
  • SQL
  • Airflow
  • DBT
  • Kubernetes
  • Snowflake
  • AWS
  • Glue
Download CV

What I do

Data Engineer

I love working with data, finding insights by building data pipelines and handling end-to-end Big Data systems.

Cloud Engineer

With experience working in different cloud technologies, it's my passion to develop and architect robust cloud ecosystems by following best practices.

Python Developer

Python is my preferred programming language. I have used libraries like Pandas and NumPy for data transformation, and have also worked with PySpark and built various ML models.

Technical Skills

SQL
95%
Python
85%
Spark
80%
Kubernetes
75%
AWS
70%
Airflow
65%

Professional Skills

  • Enthusiastic
  • Teamwork
  • Communication
  • Leadership

Work Experience

Data Engineer WineDirect, Vancouver

May 2023 - Present
Responsibilities:
  • Designed and implemented ETL pipelines using AWS Glue and PySpark, reducing AWS data processing time by 25% and improving data workflow efficiency.
  • Introduced and productionized DBT within the data platform, improving data quality and governance using various custom and singular tests.
  • Optimized Snowflake performance by analyzing query profiles, fine-tuning warehouse configuration and using built-in optimization tools, reducing costs by 25%.
  • Extensively worked on data modeling, leveraging Slowly Changing Dimensions and daily partitioned snapshots to optimize historical data tracking.
  • Architected a centralized Data Lake to store and manage terabytes of historical data, enabling seamless access for clients through a client-facing API.
  • Streamlined database schema migrations using Flyway, ensuring version control and minimizing downtime during deployments.
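As an illustration of the Slowly Changing Dimension idea mentioned above, here is a minimal Type 2 sketch in plain Python. It is not the production implementation: the function name `apply_scd2`, the `valid_from` / `valid_to` columns and the in-memory list standing in for a warehouse table are all assumptions made for the example.

```python
from datetime import date
from typing import Dict, List, Optional


def apply_scd2(history: List[Dict], key: str, incoming: Dict, as_of: date) -> List[Dict]:
    """Apply one incoming record to a Type 2 history table (hypothetical layout).

    The open row for a key has valid_to set to None. If the incoming
    attributes differ, the open row is closed and a new open row is appended.
    """
    current: Optional[Dict] = next(
        (row for row in history
         if row[key] == incoming[key] and row["valid_to"] is None),
        None,
    )
    if current is not None:
        unchanged = all(current.get(col) == val for col, val in incoming.items())
        if unchanged:
            return history  # nothing changed, keep the open row as-is
        current["valid_to"] = as_of  # close the previous version
    history.append({**incoming, "valid_from": as_of, "valid_to": None})
    return history
```

Each change to a tracked attribute leaves the old row in place with a closing date, so the full history of a record can be queried later.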

Data Engineer Skillz, Vancouver

Jun 2021 - May 2023
Responsibilities:
  • Worked on complex SQL queries, contributed to database schema design, and performed query optimization and tuning.
  • Migrated data from Redshift to Snowflake, maintaining data integrity and consistency while tackling schema evolution and enforcement.
  • Spearheaded the integration of the Snowflake cloud data warehouse, which enhanced data analytics and reporting capabilities for cross-functional teams.
  • Optimized alert thresholds for data integrity metrics, contributing to a 30% reduction in unnecessary notifications, enabling the team to concentrate on genuine data integrity challenges and enhancing overall productivity.

Big Data Developer Intern Samsung Electronics, Canada

May 2020 - Dec 2020
Responsibilities:
  • Migrated Spark jobs from AWS EMR to EKS, using Spot Instances to reduce cost
  • Created Airflow/AWS data pipelines to run ETL batches executing complex SQL queries and Python code
  • Provisioned infrastructure using Terraform to create AWS ASG for running Spark Scala code on Kubernetes
  • Worked on Kubernetes, Docker, CI/CD, Distributed systems and Functional Data Engineering paradigm

Data Engineer Tesco Bank, United Kingdom

April 2017 - Mar 2018
Responsibilities:
  • Traveled to the UK for one year to understand the business requirements and build a robust application for sending personalised offers across different banking products like Savings, Mortgages, Loans, etc.
  • Worked on requirement gathering, BSD, ETL technical designs, business analysis, data modelling and the build phase across different project life cycles at Tesco Bank

Software Engineer Data Tesco, India

Aug 2015 - Jan 2019
Responsibilities:
  • Worked on multiple ETL projects and created end-to-end ETL pipelines
  • Effectively processed huge data sets by deploying advanced querying, visualization and analytics tools
  • Successfully processed structured, semi-structured and unstructured data sets to identify and assess key takeaways

Education

Master's in Computer Science, Big Data from Simon Fraser University, Canada

2019-2021

  • CGPA: 4.0 / 4.3
  • Relevant courses taken:
  • Machine Learning
  • Statistics
  • Big Data systems
  • Algorithms

Bachelor's in Computer Science from Vellore Institute of Technology, India

2011-2015

  • CGPA: 9.10 / 10
  • Relevant courses taken :
  • Data Mining
  • Database
  • Cloud computing

Academic and Personal Projects

    Strategic Asset Manager

    System that evaluates an investment decision taking into account the stock’s historical performance, global news sentiment and company’s Edgar reports.

  • Parsed Edgar reports and applied sentiment analysis, using the result as a feature in the stock price prediction model.
  • Used AWS Comprehend to find insights and relationships in text using machine learning. Created a chatbot using AWS Lex. BERT was hosted on EC2 and files were stored on S3; these were used to answer user questions on Edgar reports. Lambda was the central coordinator between all components, working with Lex to deliver the output.
    • Python
    • NLP
    • AWS
    • SQL
    • Keras
    • Scikit-Learn
    Live Demo Github

    Architecture of the chatbot designed as part of the project.

    Real Time Speech Activated Assistant (RETINA)

    This project acts as an interactive assistant that helps people accomplish tasks. The user speaks to the system to command it to detect objects in a live video feed.

    The user issues the wake-up command followed by the object name as speech, and the system returns a bounding box and speech feedback regarding the presence of the object.

    • Machine Learning
    • YOLO
    • Python
    • Tensorflow
    • CNN
    • RNN
    Live Demo Github Poster

    The application continuously watches a live webcam feed captured through OpenCV. In parallel, after a wake-up command, the speech input is passed to the system and converted to text using Speech Recognizer.
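The matching step described above can be sketched roughly as follows. This is a simplified illustration rather than the project's code: the wake word `"retina"`, the `Detection` structure, the confidence threshold and the feedback wording are all assumptions, and the speech-to-text and object detection stages are replaced by plain inputs.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Detection:
    """One detected object in a video frame (hypothetical structure)."""
    label: str
    box: Tuple[int, int, int, int]  # (x, y, width, height) in pixels
    score: float


def parse_command(transcript: str, wake_word: str = "retina") -> Optional[str]:
    """Return the object name that follows the wake word, or None if absent."""
    words = transcript.lower().split()
    if wake_word not in words:
        return None
    target = " ".join(words[words.index(wake_word) + 1:])
    return target or None


def respond(transcript: str, detections: List[Detection],
            min_score: float = 0.5) -> Tuple[List[Detection], str]:
    """Match the requested object against detections and build the feedback text."""
    target = parse_command(transcript)
    if target is None:
        return [], "No command heard."
    hits = [d for d in detections if d.label == target and d.score >= min_score]
    if hits:
        return hits, f"I can see {len(hits)} {target} in the frame."
    return [], f"I cannot see any {target} in the frame."
```

In a full pipeline, the transcript would come from the speech recognizer, the detections from the object detector running on each frame, and the returned message would be passed to a text-to-speech step.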

    Find Your Home

    Find Your Home is an online platform that helps landlords and their tenants reach a common understanding and improves transparency. A landlord wants to list and publicize properties, manage tenants and attend to service requests to meet their side of the deal. A potential tenant, on the other hand, wants to find and compare properties or raise service requests for their existing property.

    Our solution aims to simplify the interactions amongst tenants, landlords and their properties by bringing them into a common domain and establishing a relationship that alleviates these problems.

    • Python
    • Azure
    • AWS
    • REST API
    • Scala
    • Kubernetes
    • Docker
    Live Demo Github


    Certifications

    CKAD: Certified Kubernetes Application Developer

    View
    AWS Certified Data Analytics – Specialty

    View
    Tableau Desktop Specialist

    View
    Data Science Methodology

    View

    Featured Posts

    Time Series Forecasting: A Deep Dive

    John is a hotel manager who has been given the task of forecasting room bookings for the next season so that the hotel can make staff and inventory available.

    Read More

    Proportion are what’s really needed

    Financial markets investment decisions are more than just crunching numbers. It is tough for the majority of us without any formal training to gain the necessary information to make investment decisions.

    Read More