## What's new?

## Variance Inflation Factor - Computer

Unfortunately, not all collinearity problems can be detected by inspection of the correlation matrix: it is possible for collinearity to exist between three or more variables even if no pair of variables has a particularly high correlation. We call this situation multicollinearity.

Instead of inspecting the correlation matrix, a better way to assess collinearity is to compute the variance inflation factor (VIF). The VIF is the ratio of the variance of β̂_j when fitting the full model divided by the variance of β̂_j if fit on its own. The smallest possible value for VIF is 1, which indicates the complete absence of collinearity. Typically in practice there is a small amount of collinearity among the predictors. As a rule of thumb, a VIF value that exceeds 5 or 10 indicates a problematic amount of collinearity.

The formula is in ISL Chapter 3 (Linear Regression), at the end of Section 3.3, p. 102.
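
In practice the VIF is usually computed as 1 / (1 - R²_j), where R²_j is the R² from regressing predictor j on all the other predictors; this is equivalent to the variance-ratio definition above. A minimal numpy sketch (the data below are hypothetical, generated so that x3 is nearly a linear combination of x1 and x2, which is exactly the multicollinearity case a correlation matrix can miss):

```python
import numpy as np

def vif(X):
    """VIF for each column of predictor matrix X (n samples x p predictors).
    VIF_j = 1 / (1 - R_j^2), where R_j^2 is the R^2 from regressing
    predictor j on all the other predictors (with an intercept)."""
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ beta
        r2 = 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))
        out[j] = 1.0 / (1.0 - r2)
    return out

# Hypothetical data: x3 is almost x1 + x2, so the three predictors are
# jointly collinear even though no single pairwise correlation is extreme.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
x3 = x1 + x2 + rng.normal(scale=0.1, size=200)
X = np.column_stack([x1, x2, x3])
print(np.round(vif(X), 2))  # all well above the 5-10 rule of thumb
```

With only x1 and x2 (independent draws), the same function returns values near 1, the no-collinearity baseline.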


written time : 2017-08-20 21:15:33.0

## Outliers/High Leverage Observations and Influential Points - Computer

In this section, we learn the distinction between outliers and high leverage observations. In short:

- An outlier is a data point whose response **y** does not follow the general trend of the rest of the data.
- A data point has high leverage if it has "extreme" predictor **x** values. With a single predictor, an extreme x value is simply one that is particularly high or low. With multiple predictors, extreme x values may be particularly high or low for one or more predictors, or may be "unusual" combinations of predictor values (e.g., with two predictors that are positively correlated, an unusual combination might be a high value of one predictor paired with a low value of the other).

Note that, for our purposes, we consider a data point to be an outlier only if it is extreme with respect to the other y values, not the x values.

A data point is influential if it unduly influences any part of a regression analysis, such as the predicted responses, the estimated slope coefficients, or the hypothesis test results. Outliers and high leverage data points have the potential to be influential, but we generally have to investigate further to determine whether or not they are actually influential.

One advantage of the case in which we have only one predictor is that we can look at simple scatter plots in order to identify any outliers and influential data points. Let's take a look at a few examples that should help to clarify the distinction between the two types of extreme values.

from https://onlinecourses.science.psu.edu/stat501/node/337
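
Numerically, the leverage of observation i is the i-th diagonal element of the hat matrix H = X(XᵀX)⁻¹Xᵀ, and Cook's distance is a standard way to check influence by combining the residual with the leverage. A small numpy sketch with made-up data containing one high-leverage point (x = 10):

```python
import numpy as np

# Toy data (hypothetical): five ordinary points plus one far-out x value.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 10.0])
y = np.array([1.1, 1.9, 3.2, 3.9, 5.1, 14.0])

# Design matrix with an intercept column.
X = np.column_stack([np.ones_like(x), x])
n, p = X.shape

# Hat matrix H = X (X'X)^{-1} X'; its diagonal gives each point's leverage.
H = X @ np.linalg.inv(X.T @ X) @ X.T
leverage = np.diag(H)

# Fit the regression and get residuals.
beta = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta
mse = resid @ resid / (n - p)

# Cook's distance combines residual size and leverage to measure influence.
cooks_d = (resid**2 / (p * mse)) * (leverage / (1 - leverage) ** 2)

# Common rule of thumb: leverage above 2p/n counts as "high leverage".
print("leverage:", np.round(leverage, 3))
print("high leverage:", leverage > 2 * p / n)
print("Cook's D:", np.round(cooks_d, 3))
```

Here the x = 10 point has by far the largest leverage and Cook's distance, even though its residual is modest: high leverage plus a trend-breaking y is what makes a point influential.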


written time : 2017-08-20 20:54:42.0

## Qual prep 1 - Computer

stat 2

- 1-way RM ANOVA(wg(ssub, err))

- 2-way ANOVA

stat 3

- pooled proportion to SE(pooled q!!, sqrt((pooledP * pooledQ) / N))

- proportion: 2-way table

ml 0

- Forward, State Prob.

- EM: theta is the emission

- GMM: minimize Euclidean distance / variance

- SVM (maximal margin) -> which points??

ml 1

- bootstrap : lim(1-1/n)^n = e^-1
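
The bootstrap line above can be checked by simulation: the probability that a given observation never appears in a bootstrap sample of size n is (1 - 1/n)^n, which tends to e⁻¹ ≈ 0.368. A quick numpy check:

```python
import numpy as np

# P(a given observation is NOT in a bootstrap sample of size n) = (1 - 1/n)^n -> e^-1.
rng = np.random.default_rng(42)
n = 1000
trials = 2000
miss = 0
for _ in range(trials):
    sample = rng.integers(0, n, size=n)  # bootstrap: sample indices with replacement
    if 0 not in sample:                  # was observation 0 left out entirely?
        miss += 1
print(miss / trials, (1 - 1 / n) ** n, np.exp(-1))  # all three close to 0.368
```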

ml 2

- WCV is 2 times the cluster variance

- Random Forest

- Effective df

- Stacked classifier

- Ensemble: p(majority)
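
The p(majority) reminder can be made concrete: for k independent classifiers, each correct with probability p, the majority vote is correct with probability given by a binomial tail sum. A short sketch (the numbers here are illustrative):

```python
from math import comb

def p_majority(p, k):
    """Probability that a majority vote of k independent classifiers,
    each correct with probability p, is correct (k odd)."""
    return sum(comb(k, i) * p**i * (1 - p) ** (k - i)
               for i in range(k // 2 + 1, k + 1))

# Five classifiers at 60% accuracy: the ensemble beats any single one.
print(round(p_majority(0.6, 5), 5))  # 0.68256
```

With more (independent) classifiers the majority vote keeps improving, which is the usual motivation given for ensembling.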


written time : 2017-08-18 23:44:14.0