Data Analytics Overview

Data Analytics Process

->Business Problem 

->Data Acquisition

->Data Wrangling

->Exploratory Data Analysis

->Data Exploration

->Conclusion or Prediction

->Communication back to Stakeholders


Data Exploration: Model Selection 

- Consider factors affecting the Response Variable (Variable to be studied)

- Model Selection is a combination of ALGORITHM & DATA

- Model Selection depends on hypothesis testing

- The selected Model should be accurate, to avoid repeated rounds of re-selection

- Selection depends on the type of problem to be solved.

- e.g. Do NOT select a REGRESSION Model to solve a CLASSIFICATION Problem
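To make the regression-versus-classification point concrete, here is a minimal sketch of a rough check on the Response Variable. The helper `suggest_problem_type`, its `max_classes` cutoff, and the sample data are all hypothetical, made up for illustration; this is a heuristic, not a rule, and domain knowledge should override it.

```python
def suggest_problem_type(response_values, max_classes=10):
    """Rough heuristic: guess whether a response variable calls for
    classification or regression. `max_classes` is an arbitrary cutoff."""
    values = list(response_values)
    if not values:
        raise ValueError("response variable has no observations")

    distinct = set(values)
    # bool is a subclass of int in Python, so exclude it explicitly.
    is_numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )

    # Many distinct numeric values suggest a continuous target (regression);
    # labels or a handful of distinct values suggest classes (classification).
    if is_numeric and len(distinct) > max_classes:
        return "regression"
    return "classification"


# Hypothetical response variables, made up for illustration.
churned = ["yes", "no", "no", "yes", "no"]
house_prices = [101.5 + 3.25 * i for i in range(50)]

print(suggest_problem_type(churned))
print(suggest_problem_type(house_prices))
```

Note that a small numeric sample with few distinct values would be labelled as classification by this cutoff, which is one reason to treat the result as a prompt for thought rather than an answer.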

Exploratory Data Analysis (EDA)

- Approach: Study the Data, Recommend suitable Models that best fit the Data

- Focus: Study the outliers, Data structure

- Assumption: Minimal or no assumptions; findings are presented from all the underlying Data, with no Data loss

- Techniques: Graphical (Uses Statistical functions for input data)

- Histogram (summarizes the distribution of univariate dataset)

- Scatter plot (represents relationships between two variables X&Y)

- Does a change in Y depend on a change in X?

- Techniques: Quantitative (Numeric output for input data)

- Measures of Central Tendency

- Mean (average)

- Median (middle value)

- Mode (most frequent value; peaks of the distribution)

- Measures of Spread

- Variance (Shows volatility of the dataset)

- Standard Deviation

- Inter-quartile Range

- Try available Algorithms with the Dataset to obtain best Algorithm (Model)
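The quantitative measures listed above can be computed with Python's standard `statistics` module. A minimal sketch, assuming a small made-up sample (`data` and `bin_width` are illustrative values, not from any real dataset); a crude text histogram stands in for a graphical plot:

```python
import statistics
from collections import Counter

# Hypothetical sample: values made up for illustration only.
data = [2, 4, 4, 5, 7, 9, 10, 12, 12, 12, 15]

# Measures of central tendency
mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)

# Measures of spread (sample variance and sample standard deviation)
variance = statistics.variance(data)
std_dev = statistics.stdev(data)

# quantiles(n=4) returns the three quartile cut points Q1, Q2, Q3.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

print(f"mean={mean:.2f} median={median} mode={mode}")
print(f"variance={variance:.2f} std_dev={std_dev:.2f} IQR={iqr}")

# Crude text histogram: count how many values fall into each bin of width 5.
bin_width = 5
bins = Counter((x // bin_width) * bin_width for x in data)
for start in sorted(bins):
    print(f"{start:>2}-{start + bin_width - 1:<2} {'#' * bins[start]}")
```

The standard deviation is the square root of the variance, so it is expressed in the same units as the data, which usually makes it easier to interpret. The Inter-quartile Range ignores the extremes, so it is less sensitive to outliers than either.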

Conclusion or Prediction

- Involves heavy use of mathematical/statistical functions

- Machine Learning

HYPOTHESIS: Meaning?

- Possible explanation for a phenomenon or statement, e.g. "Using Medicine X cures Cancer"

- Hypothesis Building provides testable explanations of a problem or observation

- Involves 2 variables, one dependent on another

- Test: Dependent variable changes when the independent variable changes

- Consider equation of a linear regression: Y = mX + C

Y is dependent variable

X is independent variable

m is slope of the line

C is the Y-intercept (the value of Y when X = 0)

- Involves domain expertise to make sense of the data

- Helps construct new features from raw Data automatically or manually
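The linear relationship Y = mX + C described above can be estimated from paired observations with ordinary least squares. A minimal sketch in plain Python: the helper name `fit_line` and the sample points are hypothetical, chosen so that the data lie exactly on Y = 2X + 1.

```python
def fit_line(xs, ys):
    """Return (m, c) for the least-squares line Y = mX + C."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Sum of squared deviations of X, and of cross-deviations of X and Y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("X must vary to estimate a slope")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    m = sxy / sxx             # slope of the line
    c = mean_y - m * mean_x   # Y-intercept
    return m, c


# Hypothetical data generated exactly from Y = 2X + 1.
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]

m, c = fit_line(xs, ys)
print(f"slope m = {m}, intercept C = {c}")
```

A slope clearly different from zero is consistent with the dependent variable changing as the independent variable changes. On real, noisy data a formal hypothesis test would still be needed to judge whether the slope differs from zero by more than chance.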

