This page belongs to EPFL's web archive and is no longer updated.

Bias/Variance Tradeoff

Hello,

In week 6, slide 3, page 3, when computing delta for the 3 different cases, I don't understand the calculation that leads to the bias term in the wrong model and to 0 (no bias) for the correct and true models. What am I missing?

Thanks in advance,

Jérémy

Posted by Jeremy Gotteland on Wednesday 8 January 2014 at 18:00
Comments
Delta is given by 1/n E(||y+ - yHat||^2) where ||y+ - yHat||^2 is given by the formula on the slide. Since E(cross terms) = 0, we'll only need to compute the expectation of the first three terms. The first term is a constant so the expectation is just the term itself. The expectations of epsilon^T H_diamond epsilon and epsilon_+^T epsilon_+ can be evaluated using the same trace trick as in the midterm. The resulting expression for delta is the one given for the wrong model.

When X_diamond corresponds to the true model, i.e., X_diamond = X_heart, mu^T (I-H_diamond) mu = 0 and we get the expression given for the true model.

Obtaining the expression for the correct model is a bit more complicated. When the columns of X_heart are orthogonal to the extra columns in X_diamond, it is pretty easy to see that again mu^T (I-H_diamond) mu = 0. When we don't have orthogonality, I believe that the result follows from the same matrix inversion formula we used in Serie 10, Exercise 2, but I haven't verified this.
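The trace trick step can be sketched as follows. This is only a sketch under assumptions that are not spelled out in the thread: y = mu + epsilon and y_+ = mu + epsilon_+, where epsilon and epsilon_+ are independent with mean zero and covariance sigma^2 I_n, and X_diamond has p linearly independent columns, so that tr(H_diamond) = p. The symbols sigma^2 and p are introduced here for illustration and may not match the notation on the slide.

```latex
\begin{align*}
y_+ - \hat{y} &= (I - H_\diamond)\mu + \epsilon_+ - H_\diamond \epsilon, \\
\|y_+ - \hat{y}\|^2 &= \mu^T (I - H_\diamond)\mu
  + \epsilon^T H_\diamond \epsilon
  + \epsilon_+^T \epsilon_+
  + \text{cross terms}, \\
E\left(\epsilon^T H_\diamond \epsilon\right)
  &= E\left(\operatorname{tr}(H_\diamond \epsilon \epsilon^T)\right)
   = \operatorname{tr}\left(H_\diamond E(\epsilon \epsilon^T)\right)
   = \sigma^2 \operatorname{tr}(H_\diamond)
   = p\,\sigma^2, \\
E\left(\epsilon_+^T \epsilon_+\right)
  &= \operatorname{tr}\left(E(\epsilon_+ \epsilon_+^T)\right)
   = n\,\sigma^2, \\
\Delta &= \frac{1}{n} E\|y_+ - \hat{y}\|^2
   = \frac{1}{n}\,\mu^T (I - H_\diamond)\mu
   + \sigma^2\left(1 + \frac{p}{n}\right).
\end{align*}
```

The first two lines use that H_diamond is symmetric and idempotent, so (I - H_diamond)^T (I - H_diamond) = I - H_diamond and H_diamond^T H_diamond = H_diamond. Under these assumptions, the only part of delta that depends on mu is the term mu^T (I - H_diamond) mu.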
Posted by Mikael Kuusela on Friday 10 January 2014 at 19:45
Hi,
I was wondering in which situation we would not have orthogonality for a correct model, since M(X_heart) is a subset of M(X_diamond).

Thanks in advance

Jean-Claude
Posted by Jean-Claude Ton on Saturday 11 January 2014 at 11:25
By orthogonality I mean that X_heart^T X_+ = 0, where X_+ are the extra columns in X_diamond. I.e., each column in X_heart is orthogonal to each column in X_+. You are right that M(X_heart) is a subset of M(X_diamond), but as far as I can see this doesn't guarantee anything about the orthogonality of the columns.
Posted by Mikael Kuusela on Sunday 12 January 2014 at 23:31
OK, thanks. Would you be so kind as to point out the mistake in my reasoning, please?
Since mu is in M(X_heart), which is a subset of M(X_diamond), and (I - H_diamond) is the projection onto the orthogonal complement of M(X_diamond),
mu^T (I - H_diamond) = 0.
Thanks in advance
Jean-Claude
Posted by Jean-Claude Ton on Monday 13 January 2014 at 10:07
Everything you say is true, but how do you deduce from this that the columns of X_heart would be orthogonal to the columns of X_+? As far as I can see, this shows that mu is orthogonal to a set of vectors in the orthogonal complement of M(X_diamond) which is trivially true.
Posted by Mikael Kuusela on Wednesday 15 January 2014 at 18:11
Oh, I've misunderstood your question then.
However you said that
"When the columns of X_heart are orthogonal to the extra columns in X_diamond, it is pretty easy to see that again mu^T (I-H_diamond) mu = 0."
but my reasoning shows that mu^T (I - H_diamond) = 0, which you said is always true. So why do we need the columns of X_heart to be orthogonal to the extra columns in X_diamond?
Posted by Jean-Claude Ton on Wednesday 15 January 2014 at 18:22
Ah, yes, very good! Your argument shows that the bias term vanishes whenever X_diamond is a correct model or the true model. I tried doing this by writing out the matrix products, which becomes very messy unless you assume orthogonality. But indeed, by just thinking about the geometry of mu^T (I - H_diamond), we can deduce that this is equal to 0 for any linearly independent columns.
Posted by Mikael Kuusela on Wednesday 15 January 2014 at 18:43
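The conclusion of the thread can be checked numerically with a small sketch. It computes mu^T (I - H_diamond) mu as the squared norm of the residual of mu after projecting onto the column space, using Gram-Schmidt in plain Python instead of forming H_diamond explicitly. The vectors x1, x2, x_plus and the coefficients in mu are made up for illustration and do not come from the course material.

```python
import math

def dot(u, v):
    """Euclidean inner product of two vectors given as lists."""
    return sum(a * b for a, b in zip(u, v))

def orthonormal_basis(columns, tol=1e-12):
    """Gram-Schmidt: orthonormal basis of the span of the given columns."""
    basis = []
    for col in columns:
        w = list(col)
        for q in basis:
            c = dot(q, w)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(dot(w, w))
        if norm > tol:  # skip columns that are (numerically) dependent
            basis.append([wi / norm for wi in w])
    return basis

def bias_term(columns, mu):
    """mu^T (I - H) mu, where H projects onto the span of the columns.

    Since I - H is symmetric and idempotent, this equals ||(I - H) mu||^2.
    """
    r = list(mu)
    for q in orthonormal_basis(columns):
        c = dot(q, r)
        r = [ri - c * qi for ri, qi in zip(r, q)]
    return dot(r, r)

# Hypothetical example data (n = 4).
x1 = [1.0, 1.0, 1.0, 1.0]       # column of X_heart
x2 = [1.0, 2.0, 3.0, 4.0]       # column of X_heart
x_plus = [1.0, 4.0, 9.0, 16.0]  # extra column, NOT orthogonal to x1 or x2
mu = [2.0 * a + 0.5 * b for a, b in zip(x1, x2)]  # mu lies in M(X_heart)

bias_true = bias_term([x1, x2], mu)             # X_diamond = X_heart
bias_correct = bias_term([x1, x2, x_plus], mu)  # X_heart plus an extra column
bias_wrong = bias_term([x1], mu)                # a needed column is missing

print("true model:   ", bias_true)
print("correct model:", bias_correct)
print("wrong model:  ", bias_wrong)
```

In this example the bias term is zero up to rounding error for both the true and the correct model, even though x_plus is not orthogonal to the columns of X_heart, and it is strictly positive for the wrong model, where mu no longer lies in M(X_diamond).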