# Hypothesis Space / Underfitting / Overfitting / Bias / Variance

Yao Yao on May 15, 2018

## Hypothesis Space

Let's say you have an unknown target function $f:X \to Y$ that you are trying to capture by learning. In order to capture the target function you have to come up with some hypotheses $h_1, \dots, h_n$ where $h_i \in H$. Here, $H$ is your hypothesis space or set.
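As a toy illustration (my own sketch, not from the original post: the target $f$, the sample size, and the grid of candidates are all assumed), the hypothesis space $H$ below is a finite grid of lines $h_w(x) = w_0 x + w_1$, and learning is just picking the $h_i \in H$ that best matches samples of $f$:

```python
import numpy as np

# Unknown target function f: X -> Y (assumed here for illustration).
def f(x):
    return 2.0 * x + 1.0

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=50)
Y = f(X)

# Hypothesis space H: lines h_w(x) = w0*x + w1, represented by a
# coarse grid of candidate hypotheses h_1, ..., h_n.
candidates = [(w0, w1) for w0 in np.linspace(-3, 3, 13)
                       for w1 in np.linspace(-3, 3, 13)]

def mse(w, X, Y):
    w0, w1 = w
    return np.mean((w0 * X + w1 - Y) ** 2)

# "Learning" = selecting the hypothesis in H closest to f on the sample.
best = min(candidates, key=lambda w: mse(w, X, Y))
print(best)
```

Here $f$ happens to lie inside $H$, so the search recovers it exactly; in general the best $h_i \in H$ is only an approximation of $f$.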

ISL has a figure illustrating this.

## Bias / Variance

Bias - How far off our predictions are, on average, from the true values.

Variance - How much our predictions change across different training sets.
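Both quantities can be estimated empirically. The sketch below (my own illustration; the target function, noise level, model, and query point are all assumed) fits a straight line to many independently drawn training sets and measures the bias and variance of the predictions at a single point:

```python
import numpy as np

rng = np.random.default_rng(42)

def true_f(x):
    return np.sin(x)

x0 = 1.0                      # query point where we measure bias/variance
preds = []
for _ in range(500):          # many independent training sets
    X = rng.uniform(0, 3, size=20)
    Y = true_f(X) + rng.normal(0, 0.3, size=20)
    w = np.polyfit(X, Y, deg=1)        # fit a straight line to this set
    preds.append(np.polyval(w, x0))

preds = np.array(preds)
bias = preds.mean() - true_f(x0)   # how far the *average* prediction is off
variance = preds.var()             # how much predictions vary across sets
print(bias, variance)
```

Note that both numbers describe the model class plus fitting procedure, averaged over training sets, not any single trained line.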

## Underfitting / Overfitting

If your algorithm works well on the points in your data set but not on new points, then it has overfit the data set. And if your algorithm works poorly even on the points in your data set, then it has underfit the data set.

• Let $\mathbf{w}$ be the hypothesis that algorithm $A$ learns on the training dataset $D_{train}$
• Let $\mathbf{w'}$ be the optimum hypothesis on $D_{train}$
• Let $\mathbf{w''}$ be the optimum hypothesis on $D_{test}$

• Overfitting: $\mathbf{w}$ is close to $\mathbf{w'}$, but far from $\mathbf{w''}$
• Underfitting: $\mathbf{w}$ is far from $\mathbf{w'}$, and usually also far from $\mathbf{w''}$
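A quick way to see both failure modes numerically is to compare training and test error. In the sketch below (the polynomial degrees, noise level, and sample sizes are arbitrary choices of mine), a high-degree polynomial drives the training MSE near zero while the test MSE stays large (overfitting), whereas a degree-1 fit leaves both errors high (underfitting):

```python
import numpy as np

rng = np.random.default_rng(7)

def true_f(x):
    return np.sin(2 * x)

X_train = rng.uniform(0, 3, size=15)
Y_train = true_f(X_train) + rng.normal(0, 0.2, size=15)
X_test = rng.uniform(0, 3, size=200)
Y_test = true_f(X_test) + rng.normal(0, 0.2, size=200)

def errors(deg):
    # Fit a degree-`deg` polynomial on the training set,
    # then measure MSE on both training and test points.
    w = np.polyfit(X_train, Y_train, deg)
    train = np.mean((np.polyval(w, X_train) - Y_train) ** 2)
    test = np.mean((np.polyval(w, X_test) - Y_test) ** 2)
    return train, test

for deg in (1, 4, 12):
    tr, te = errors(deg)
    print(f"degree {deg:2d}: train MSE {tr:.3f}, test MSE {te:.3f}")
```

In the notation above: the degree-12 fit lands close to $\mathbf{w'}$ but far from $\mathbf{w''}$, while the degree-1 fit is far from both.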

Underfitting is also known as high bias.

Overfitting is also known as high variance.

A simple way to “fix” your understanding would be to say that, “linguistically”, the underfitting models are biased “away” from training data.

It might be better, however, to rely on a slightly deeper understanding than plain linguistic intuition here, so bear with me for a couple of paragraphs.

The terms “bias” and “variance” do not describe a single trained model. Instead, they are meant to describe the space of possible models among which you will be picking your fit, as well as the method you will use to select this best fit.

No matter what space and method you choose, the model that you find as a result of training is most often not the “true” model that generated your data. The “bias” and “variance” are the names of two important factors, which contribute to your error.

1. Firstly, your space of models / fitting method may be initially biased. That is, the “true” model may not be part of your model space at all. And even if it were, you may be using a fitting method which deliberately misses the correct answer on average, thus introducing the “bias error”.
2. Secondly, your space of models may be so large that you will have a hard time finding that particular “true” model within your space. This factor is known as the “variance error”.

Consequently, when you are dealing with “narrow” model spaces, you are in a situation when the true model is most probably not within your space (“high bias”), however, you have little problem finding the best possible model within that narrow space (“low variance”). On the other hand, when you fit a model from a “large” space, even though the true model may be part of it (“low bias”), there will be millions of confusingly similar models for you to pick from, and hence a very low chance to stumble upon the correct one (“high variance”).

• Adding a new feature $x_j$ introduces a new parameter $w_j$, so $\mathbf{w}$ gains a dimension and the hypothesis space grows (which helps against underfitting)
• Removing a feature $x_i$ removes the corresponding parameter $w_i$, so $\mathbf{w}$ loses a dimension and the hypothesis space shrinks (which helps against overfitting)
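This effect can be checked with ordinary least squares. In the sketch below (the quadratic target and noise level are assumed for illustration), adding the feature $x^2$ enlarges the hypothesis space from two-parameter lines to three-parameter parabolas, and the training MSE drops accordingly:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(-2, 2, size=100)
Y = X ** 2 + rng.normal(0, 0.1, size=100)   # quadratic target

# Feature set {1, x}: w has 2 dimensions.
A1 = np.column_stack([np.ones_like(X), X])
w1, *_ = np.linalg.lstsq(A1, Y, rcond=None)
mse1 = np.mean((A1 @ w1 - Y) ** 2)

# Adding feature x_j = x^2 adds parameter w_j: w now has 3 dimensions,
# so the hypothesis space is strictly larger.
A2 = np.column_stack([np.ones_like(X), X, X ** 2])
w2, *_ = np.linalg.lstsq(A2, Y, rcond=None)
mse2 = np.mean((A2 @ w2 - Y) ** 2)

print(mse1, mse2)   # larger space fits the training data at least as well
```

The larger space always does at least as well on training data; whether it also does better on test data is exactly the overfitting question above.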