Data Analytics Overview
Data Analytics Process
->Business Problem
->Data Acquisition
->Data Wrangling
->Exploratory Data Analysis
<-Data Exploration
<-Conclusion or Prediction
<-Communication back to Stakeholders
Data Exploration: Model Selection
- Consider factors affecting the Response Variable (Variable to be studied)
- Model Selection is a combination of ALGORITHM & DATA
- Model Selection depends on hypothesis testing
- The model should be accurate enough to avoid repeated iterations
- Selection depends on the type of problem to be solved.
-e.g. DO NOT select a REGRESSION model to solve a CLASSIFICATION problem
Exploratory Data Analysis (EDA)
- Approach: Study the data, then recommend suitable models that best fit it
- Focus: Study outliers and the structure of the data
- Assumptions: Minimal or none; results are presented using all of the underlying data, with no data loss
- Techniques: Graphical (plots of the input data built with statistical functions)
- Histogram (summarizes the distribution of a univariate dataset)
- Scatter plot (represents the relationship between two variables, X and Y)
-Does a change in Y depend on a change in X?
- Techniques: Quantitative (numeric output for the input data)
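- Measures of Central Tendency
- Mean (average)
- Median (middle value)
- Mode (most frequent value; the peak of the distribution)
- Measures of Spread
- Variance (shows how volatile, or spread out, the dataset is)
- Standard Deviation (square root of the variance)
- Interquartile Range (Q3 minus Q1, the spread of the middle 50% of the data)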
- Try the available algorithms on the dataset to find the best one (the model)
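The graphical and quantitative EDA techniques listed above can be sketched with Python's standard `statistics` module. This text-only sketch rests on the following assumptions:

- The helper names are hypothetical.
- The "plots" are character grids standing in for real charts.
- Variance and standard deviation use the population formulas. `statistics.variance` and `statistics.stdev` give the sample versions.
- Quartiles use the module's default `exclusive` method. Other tools may use a different convention and report a slightly different IQR.
- Outliers are flagged with Tukey's common 1.5 × IQR fences.

```python
import statistics


# --- Graphical: text-mode stand-ins for a histogram and a scatter plot ---

def histogram_counts(values, n_bins=5):
    """Count values in n_bins equal-width bins spanning min(values)..max(values)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # all values equal: fall back to width 1
    counts = [0] * n_bins
    for v in values:
        # Clamp so the maximum value lands in the last bin
        counts[min(int((v - lo) / width), n_bins - 1)] += 1
    return counts


def text_scatter(xs, ys, width=20, height=10):
    """Draw '*' at each (x, y) point on a width x height character grid."""
    x_lo, x_hi = min(xs), max(xs)
    y_lo, y_hi = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        col = round((x - x_lo) / ((x_hi - x_lo) or 1) * (width - 1))
        row = round((y - y_lo) / ((y_hi - y_lo) or 1) * (height - 1))
        grid[height - 1 - row][col] = "*"  # flip so larger Y is drawn higher
    return "\n".join("".join(line) for line in grid)


# --- Quantitative: central tendency, spread, and IQR-based outliers ---

def describe(data):
    """Return common measures of central tendency and spread."""
    q1, _, q3 = statistics.quantiles(data, n=4)  # default 'exclusive' method
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "mode": statistics.mode(data),
        "variance": statistics.pvariance(data),  # population variance
        "std_dev": statistics.pstdev(data),      # square root of the variance
        "iqr": q3 - q1,
    }


def iqr_outliers(data, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in data if v < low or v > high]


sample = [2, 4, 4, 4, 5, 5, 7, 9]
for i, count in enumerate(histogram_counts(sample, n_bins=4)):
    print(f"bin {i}: {'#' * count}")
print(text_scatter([1, 2, 3, 4, 5], [2, 4, 5, 4, 6], width=10, height=5))
print(describe(sample))
print(iqr_outliers(sample + [50]))  # [50]
```

In a real project the same measures usually come from a plotting or dataframe library. The point of the sketch is only to show what each measure computes.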
Conclusion or Prediction
- Involves heavy use of mathematical/statistical functions
- Machine Learning
HYPOTHESIS: Meaning?
- A possible explanation for a phenomenon or statement, e.g. "Medicine X cures cancer"
- Hypothesis building provides testable explanations of a problem or observation
- Involves 2 variables, one dependent on the other
- Test: Does the dependent variable change when the independent variable changes?
- Consider the equation of a linear regression: Y = mX + C
Y is the dependent variable
X is the independent variable
m is the slope of the line
C is the Y-intercept (the value of Y when X = 0)
- Involves domain expertise to make sense of the data
- Helps construct new features from raw Data automatically or manually