What is Data Science?
Data Science requires the following skills:
- Domain Knowledge (Finance/Banking/Retail/Supply Chain/Healthcare/Insurance)
- Statistical Skills (Measures of Central Tendency and Dispersion)
- Programming (Engineering Skills)
- Mathematical Skills (Average)
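The statistical measures mentioned in the list above can be computed with Python's standard `statistics` module. The following is a minimal sketch: the `ages` sample is made up for illustration, and the range and standard deviation are included as two common measures of spread to complement the measures of central tendency.

```python
import statistics

# Hypothetical sample: ages of eight customers (made up for illustration).
ages = [23, 25, 25, 31, 38, 41, 47, 58]

# Measures of central tendency
mean_age = statistics.mean(ages)      # arithmetic average
median_age = statistics.median(ages)  # middle value of the sorted data
mode_age = statistics.mode(ages)      # most frequent value

# Measures of spread around the center
age_range = max(ages) - min(ages)     # largest value minus smallest value
stdev_age = statistics.stdev(ages)    # sample standard deviation

print(mean_age, median_age, mode_age, age_range, round(stdev_age, 2))
```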
The purpose of Data Science is to apply the above skills using modern technology and processing power.
Statistical knowledge has existed for years, and the algorithms have existed for decades.
The term Data Science merely describes combining modern technology with these existing statistical skills.
Data Science supports business problem solving and decision making.
Role of a Data Scientist:
- Ask the "right" Questions
- Acquire Data from various Organizational Sources (File Handling/File Formats)
  - Structured Data (tabular/CSV)
  - Semi-structured Data (e.g. JSON/HTML/XML file formats)
  - Unstructured Data (Text/Audio/Video files)
  - Web Scraping (getting data from the web, e.g. from a page's HTML source)
- Wrangle Data
  - Data Cleansing (NULL Removal/Duplicate Removal etc.)
  - Data Manipulation/Transformation (bringing data into a structured format)
    - Impute Missing Values
    - Create New Variables
  - Challenges:
    - Takes up to 70% of a Data Scientist's time
    - Unexpected Data Format (e.g. XML instead of CSV)
    - Erroneous Data
    - Voluminous Data to be manipulated
    - Classifying data as continuous (e.g. Age) or binned into groups (e.g. Age range/slab)
- Explore Data
  - Data Discovery
  - Data Patterns (plot the data: Bar Charts/Histograms/Scatter Plots etc.)
  - Challenges:
    - Determining the relationship between Observations, Features & Response
      Observations: Rows in the table
      Variables: Columns in the table
      Features (Variables): Column(s) that are fed into models. (Also called Independent Variables)
      Response (Variable): The column that is to be studied. (Also called the Dependent Variable)
- Apply Mathematical/Statistical Model
  - Chosen based on the nature of the problem we are trying to solve
  - Regression/Classification etc.
  - Machine Learning
    - Wrangled Data can be fed into a Machine Learning Algorithm to improve results
- Present Outcomes/Communicate to Stakeholders/Conclude or Predict
  - Data Visualization (Tableau/Dashboards/Charts)
  - Data Reports
  - Data Products (Recommendations based on previous Data/Choices, e.g. YouTube/Facebook)
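
The acquire, wrangle, explore and model steps above can be sketched end to end using only Python's standard library. This is a minimal sketch, not a definitive implementation: the CSV contents, the column names (`customer_id`, `age`, `spend`), the 20-year age slabs and the helper `predict_spend` are all hypothetical, and a simple least-squares line stands in for whatever model the problem actually calls for.

```python
import csv
import io
import statistics

# Hypothetical raw data; in practice this would come from a file or a database.
RAW_CSV = """customer_id,age,spend
1,25,120
2,,150
3,40,210
3,40,210
4,55,300
5,35,
"""

# Acquire: parse structured (CSV) data into a list of dict rows.
rows = list(csv.DictReader(io.StringIO(RAW_CSV)))

# Cleanse: drop exact duplicates and rows with no response value (spend).
seen = set()
clean = []
for row in rows:
    key = tuple(row.values())
    if key in seen or row["spend"] == "":
        continue
    seen.add(key)
    clean.append(row)

# Impute: replace missing ages with the median of the known ages.
known_ages = [int(r["age"]) for r in clean if r["age"] != ""]
median_age = statistics.median(known_ages)
for r in clean:
    r["age"] = int(r["age"]) if r["age"] != "" else median_age
    r["spend"] = float(r["spend"])
    # Create a new variable: bucket the age into a 20-year slab.
    low = int(r["age"] // 20 * 20)
    r["age_band"] = f"{low}-{low + 19}"

# Explore: average spend per age band.
by_band = {}
for r in clean:
    by_band.setdefault(r["age_band"], []).append(r["spend"])
band_means = {band: statistics.mean(values) for band, values in by_band.items()}

# Model: simple linear regression of spend (response) on age (feature),
# fitted by ordinary least squares.
xs = [r["age"] for r in clean]
ys = [r["spend"] for r in clean]
x_mean, y_mean = statistics.mean(xs), statistics.mean(ys)
slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sum(
    (x - x_mean) ** 2 for x in xs
)
intercept = y_mean - slope * x_mean


def predict_spend(age):
    """Predict spend for a given age using the fitted line."""
    return intercept + slope * age
```

Calling `predict_spend(50)` returns the fitted estimate for a 50-year-old customer. On real data, libraries such as pandas and scikit-learn would typically handle these steps.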