Hello, I'm

Rishabh Jain

Data Engineer

About Me

I’m Rishabh, a Data Engineer based in Vancouver, Canada. I have extensive experience developing Big Data pipelines and working with cloud ecosystems. I am also skilled in data warehousing, ETL development, and machine learning.

  • Python
  • Spark
  • SQL
  • Airflow
  • DBT
  • Kubernetes
  • StreamSets
  • AWS
  • Terraform
Download CV

What I do

Data Engineer

I love working with data, finding insights by building data pipelines, and handling end-to-end Big Data systems.

Cloud Engineer

With experience working in different cloud technologies, it's my passion to develop and architect robust cloud ecosystems by following best practices.

Python Developer

Python is my preferred programming language. I have used it for data transformation with libraries such as Pandas and NumPy, for PySpark jobs, and for building various ML models.

Technical Skills

SQL
95%
Python
85%
Spark
80%
Kubernetes
75%
AWS
70%
Airflow
65%

Professional Skills

  • Enthusiastic
  • Teamwork
  • Communication
  • Leadership

Work Experience

Data Engineer Skillz, India

Jun 2021 - Present
Responsibilities:
  • Building systems that provide real-time streaming analytics and an event-processing pipeline based on a fast data architecture.
  • Building an enterprise-grade data warehouse to support both analytical needs and next-generation data infrastructure.
  • Building a data integration toolkit for backend services.
  • Improving monitoring and alarms for issues that affect data integrity and replication lag.

Big Data Developer Intern Samsung Electronics, Canada

May 2020 - Dec 2020
Responsibilities:
  • Migrated Spark jobs from AWS EMR to EKS, running them on Spot instances to reduce cost.
  • Created Airflow/AWS data pipelines to run ETL batches that execute complex SQL queries and Python code.
  • Provisioned infrastructure using Terraform to create AWS ASGs for running Spark Scala code on Kubernetes.
  • Worked with Kubernetes, Docker, CI/CD, distributed systems, and the functional data engineering paradigm.

ETL Developer ITC Infotech, India

Jan 2019 - Aug 2019
Responsibilities:
  • Reporting - Supported business growth by manipulating and analysing data on customer habits and creating user-friendly reports.
  • Data Processing & Management - Analysed source systems to identify business rules for data profiling, data flows, data transformations, and cleansing.
  • Worked with cloud-based technologies such as S3, DynamoDB, Elasticsearch, EC2, and CloudFormation.

Data Analyst Tesco Bank, United Kingdom

Apr 2017 - Mar 2018
Responsibilities:
  • Traveled to the UK for one year to understand the business requirements and build a robust application for sending personalised offers across different banking products such as Savings, Mortgages, and Loans.
  • Worked on requirement gathering, BSD, ETL technical designs, business analysis, data modelling, and the build phase across different project life cycles at Tesco Bank.

Data Engineer Tesco, India

Aug 2015 - Jan 2019
Responsibilities:
  • Worked on multiple ETL projects and created end-to-end ETL pipelines.
  • Processed huge data sets by deploying advanced querying, visualization, and analytics tools.
  • Processed structured, semi-structured, and unstructured data sets to identify and assess key takeaways.

Education

Master's in Computer Science, Big Data from Simon Fraser University, Canada

2019-2021

  • CGPA: 4.0 / 4.3
  • Relevant courses taken:
    • Machine Learning
    • Statistics
    • Big Data systems, Algorithms

    Bachelor's in Computer Science from Vellore Institute of Technology, India

    2011-2015

  • CGPA: 8.27 / 10
  • Relevant courses taken:
    • Data Mining
    • Database
    • Cloud computing

Academic and Personal Projects

    Strategic Asset Manager

    A system that evaluates an investment decision by taking into account the stock’s historical performance, global news sentiment, and the company’s EDGAR reports.

  • Parsed EDGAR reports and applied sentiment analysis to produce a feature for the stock price prediction model.
  • Used AWS Comprehend to find insights and relationships in text using machine learning.
  • Created a chatbot using AWS Lex. BERT was hosted on EC2, and the files used to answer user questions on EDGAR reports were stored on S3. Lambda acted as the central coordinator between all components, working with Lex to deliver the output.
    • Python
    • NLP
    • AWS
    • SQL
    • Keras
    • Scikit-Learn
    Live Demo Github

    Architecture of the chatbot designed as part of the project.

    Real Time Speech Activated Assistant (RETINA)

    This project aims to act as an interactive assistant that helps people accomplish tasks. The user speaks to the system to command it to detect objects in a live video feed.

    The user issues the wake-up command and then speaks the name of an object. The system returns a bounding box and spoken feedback on whether the object is present.

    • Machine Learning
    • YOLO
    • Python
    • Tensorflow
    • CNN
    • RNN
    Live Demo Github Poster

    The application continuously monitors a live webcam feed captured through OpenCV. In parallel, after a wake-up command, a speech input is passed to the system and converted to text using Speech Recognizer.

    Find Your Home

    Find Your Home is an online platform that helps landlords and tenants reach a shared understanding and improves transparency between them. A landlord wants to list and publicize properties, manage tenants, and attend to service requests to uphold their side of the deal. A potential tenant, on the other hand, wants to find and compare properties, or raise service requests for their current property.

    Our solution aims to simplify interactions among tenants, landlords, and their properties by bringing them into a common domain and establishing a relationship that alleviates these problems, making the pipe dream a reality.

    • Python
    • Azure
    • AWS
    • REST API
    • Scala
    • Kubernetes
    • Docker
    Live Demo Github


    Certifications

    CKAD: Certified Kubernetes Application Developer

    AWS Certified Data Analytics – Specialty

    Tableau Desktop Specialist

    Data Science Methodology

    Featured Posts

    Time Series Forecasting: A Deep Dive

    John is a hotel manager tasked with forecasting room bookings for the next season so that the hotel can plan its staffing and inventory.


    Proportion are what’s really needed

    Investment decisions in financial markets involve more than just crunching numbers. For most of us without formal training, it is hard to gather the information needed to make them.
