Statistics in Data Science

 What is Statistics?

- The study of the collection, analysis, interpretation, presentation, and organization of data

- Problem Statement -> Data Analysis -> Informed Business Decisions (Problems Solved)

- Mean, median, and mode are examples of descriptive statistics (measures of central tendency)

- Statistical Principles (assumptions behind common methods):

  - Many parametric methods assume the data are approximately normally distributed

  - Linear Regression: the relationship between the variables should be linear
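The measures and the linearity assumption noted above can be sketched in a few lines of Python. This is a minimal illustration, not a full diagnostic: the numbers are made up, and `pearson_r` is a hypothetical helper written here for the example. A Pearson correlation close to +1 or -1 is consistent with a linear relationship, but it does not prove one, so a scatter plot is still worth checking.

```python
import math
import statistics

# Hypothetical sample values, made up for illustration.
values = [2, 3, 3, 5, 7, 10]

mean = statistics.mean(values)      # arithmetic average
median = statistics.median(values)  # middle value of the sorted data
mode = statistics.mode(values)      # most frequent value

print("mean:", mean, "median:", median, "mode:", mode)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariation / (spread_x * spread_y)


# Hypothetical paired observations where y is exactly 2 * x.
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

r = pearson_r(x, y)
print("Pearson r:", round(r, 3))
```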

- Categories of Statistics

  - Descriptive Statistics

    - summarizes the data at hand; sufficient on its own when we have complete data for a given population

  - Inferential Statistics

    - used when data for a given population is incomplete (e.g. exit polls)

    - used when it is not feasible to examine every member of the population or analyze the entire population's data

    - we study a random sample and use it to make inferences about the population
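To make the contrast concrete, here is a rough sketch in the spirit of the exit-poll example. The population of votes is simulated and entirely hypothetical, and the interval uses the common normal approximation for a proportion, which is one of several possible choices.

```python
import math
import random

rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: 100,000 voters, 1 = voted for candidate A.
population = [1] * 52_000 + [0] * 48_000

# Descriptive: with the full population, the share is computed directly.
true_share = sum(population) / len(population)

# Inferential: only a random sample is observed, as in an exit poll.
sample = rng.sample(population, k=1_000)
p_hat = sum(sample) / len(sample)

# Approximate 95% confidence interval (normal approximation).
std_error = math.sqrt(p_hat * (1 - p_hat) / len(sample))
low, high = p_hat - 1.96 * std_error, p_hat + 1.96 * std_error

print(f"population share: {true_share:.3f}")
print(f"sample estimate:  {p_hat:.3f} (95% CI {low:.3f} to {high:.3f})")
```

The sample estimate will usually land near the population share, but the interval is a reminder that any single sample carries uncertainty.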

- Statistical Analysis Considerations

  - the purpose is clear and well-defined

  - document the questions in advance

  - define the population of interest (based on the purpose)

  - determine the sample (based on the purpose of the study)

  - the sample must be random so that it represents the characteristics of the population
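The last three considerations can be sketched as follows. The customer records, the field names (`id`, `region`, `active`), and the sample size of 50 are all assumptions invented for this illustration; the point is only the order of steps: define the population of interest first, then draw a simple random sample from it.

```python
import random

# Hypothetical records, invented for illustration.
customers = [
    {"id": i, "region": "north" if i % 2 == 0 else "south", "active": i % 3 != 0}
    for i in range(1, 1001)
]

# Step 1: define the population of interest (here: active customers only).
population = [c for c in customers if c["active"]]

# Step 2: draw a simple random sample without replacement.
rng = random.Random(0)  # fixed seed so the sketch is reproducible
sample = rng.sample(population, k=50)


def north_share(records):
    """Fraction of records whose region is 'north'."""
    return sum(r["region"] == "north" for r in records) / len(records)


# Step 3: rough check that the sample resembles the population.
print(f"north share in population: {north_share(population):.2f}")
print(f"north share in sample:     {north_share(sample):.2f}")
```

With a random sample the two shares should be broadly similar, although a sample of 50 can still differ noticeably by chance.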

