What is Data Science?

Data Science requires the following skills:

- Domain Knowledge (Finance/Banking/Retail/Supply Chain/Healthcare/Insurance)

- Statistical Skills (Measures of Central Tendencies/Non-Central Tendencies)

- Programming (Engineering Skills)

- Mathematical Skills (Average)
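As a small illustration of the statistical skills listed above, the sketch below computes a few measures of central tendency and non-central tendency using Python's standard `statistics` module. The `ages` list is a made-up sample, and reading "non-central tendencies" as quartiles is an assumption made here, not something the list spells out.

```python
from statistics import mean, median, mode, quantiles

ages = [23, 25, 25, 31, 46]  # hypothetical sample, not from any real dataset

# Measures of central tendency: where the "middle" of the data sits.
central = {"mean": mean(ages), "median": median(ages), "mode": mode(ages)}

# Measures of non-central tendency (assumed here to mean quartiles):
# three cut points that split the sorted data into four parts.
q1, q2, q3 = quantiles(ages, n=4)

print(central)
print("quartiles:", q1, q2, q3)
```

The second quartile is the median, so `q2` and `central["median"]` agree for this sample.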

The purpose of Data Science is to apply the above skills using modern technology and processing power.

Statistical knowledge has existed for years, and the algorithms have existed for decades.

The term Data Science, however, merely describes combining technology with these existing statistical skills.

Data Science enables business problem solving and decision making.

Role of a Data Scientist:

- Ask "right" Questions

- Acquire Data from various Organizational Sources (File Handling/File Formats)

    - Structured Data (tabular/CSV)

    - Semi-structured Data (e.g. JSON/HTML/XML file formats)

    - Unstructured Data (Text files/Audio/Video files)

    - Web Scraping (Getting Data through Web/View Source HTML)

- Wrangle Data

    - Data Cleansing (NULL Removal/Duplicate Removal etc.)

    - Data Manipulation/Transformation (bringing data into a structured format)

        - Impute Missing Values

        - Create New Variables

    - Challenges:

        - Takes up to 70% of a Data Scientist's time

        - Unexpected Data Format (e.g. XML instead of CSV)

        - Erroneous Data

        - Voluminous data to be manipulated

        - Classifying data into linear (e.g. Age) or clustered (e.g. Age range/slab)

- Explore Data

    - Data Discovery

    - Data Pattern (Plot the Data/Bar Charts/Histogram/Scatter Plot etc.)

    - Challenges:

        - Determining the relationship between Observation, Feature & Response

            - Observations: Rows in the table

            - Variables: Columns in the table

            - Features (Variables): Columns which are fed into models (also called Independent Variables)

            - Response (Variable): Column which is to be studied (also called Dependent Variable)

- Apply Mathematical/Statistical Model

    - Chosen based on the nature of the problem we are trying to solve

    - Regression/Classification etc.

- Machine Learning

    - Wrangled Data can be input to a Machine Learning Algorithm to improve results

- Present Outcomes/Communicate to Stakeholders/Conclude or Predict

    - Data Visualization (Tableau/Dashboards/Charts)

    - Data Report

    - Data Products (Recommendations based on previous Data/Choices e.g. YouTube/Facebook)
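The acquire, wrangle, and model steps in the list above can be sketched end to end in a few lines. This is only a minimal illustration under stated assumptions: the sample records, the column names `age` and `income`, the helper functions `acquire`, `wrangle`, and `fit_line`, the choice of median imputation, and the choice of a one-feature linear regression are all hypothetical, and only the Python standard library is used so the sketch runs as is.

```python
import csv
import io
import json
from statistics import median

# Hypothetical sample data: similar records arriving in two different formats.
CSV_TEXT = """age,income
25,30000
32,42000
32,42000
41,
50,65000
"""
JSON_TEXT = '[{"age": 58, "income": 74000}, {"age": 36, "income": 47000}]'


def acquire():
    """Read structured (CSV) and semi-structured (JSON) data into one list of dicts."""
    rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
    rows += json.loads(JSON_TEXT)
    return rows


def wrangle(rows):
    """Convert types, drop duplicates, impute missing income, add an age slab."""
    cleaned, seen = [], set()
    for row in rows:
        age = int(row["age"])
        income = float(row["income"]) if row["income"] not in ("", None) else None
        key = (age, income)
        if key in seen:  # duplicate removal
            continue
        seen.add(key)
        cleaned.append({"age": age, "income": income})
    known = [r["income"] for r in cleaned if r["income"] is not None]
    fill = median(known)  # impute missing values with the median (an assumed choice)
    for r in cleaned:
        if r["income"] is None:
            r["income"] = fill
        r["age_slab"] = f"{r['age'] // 10 * 10}s"  # new variable: "20s", "30s", ...
    return cleaned


def fit_line(xs, ys):
    """Ordinary least squares for one feature: y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


rows = wrangle(acquire())
feature = [r["age"] for r in rows]      # feature (independent variable)
response = [r["income"] for r in rows]  # response (dependent variable)
slope, intercept = fit_line(feature, response)
print(f"{len(rows)} observations, income = {slope:.1f} * age + {intercept:.1f}")
```

Each dict in `rows` is one observation, `age` plays the role of the feature, and `income` is the response. With real data, the exploration and presentation steps (plots, reports) would sit between wrangling and modelling and after it.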

