This page belongs to EPFL's web archive and is no longer updated.

# SMAT - Linear Models Course

Practical 2 : the data

Hi,

In the presentation of the data, is it necessary to include boxplots and scatter plots of all the observations, or should we just include plots for the weight and horsepower?

Clément

Posted by Clément Genetet at 13:33
Practical 2 : Still some influential observations after cleaning

Hello,

In part e), after fitting the model we chose, we noticed 4 observations that are both outliers and leverage points, so we decided to remove them. But after refitting the model without those observations, a new observation appears that is both an outlier and a leverage point.

Is that normal? If so, should we refit the model again without that observation?

Jérémy

Posted by Jeremy Gotteland at 11:07
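On the question above: residuals and leverages are always computed relative to the current fit, so removing points and refitting can cause other points to be flagged. A minimal sketch on simulated data, assuming the rule-of-thumb cut-offs 2p/n for leverage and 2 for standardised residuals (the data frame `d`, the helper `flag_points`, and both cut-offs are illustrative assumptions, not values from the practical):

```r
set.seed(1)
n <- 50
d <- data.frame(x = rnorm(n))
d$y <- 1 + 2 * d$x + rnorm(n)
# Plant one point that is far out in x and far from the line (illustration only)
d$x[1] <- 6
d$y[1] <- -10

# Hypothetical helper: rows that are both high-leverage and large-residual
flag_points <- function(fit) {
  p <- length(coef(fit))                        # number of estimated coefficients
  m <- nobs(fit)                                # number of observations in this fit
  high_leverage <- hatvalues(fit) > 2 * p / m   # rule-of-thumb cut-off (assumption)
  large_resid   <- abs(rstandard(fit)) > 2      # rule-of-thumb cut-off (assumption)
  which(high_leverage & large_resid)
}

fit1 <- lm(y ~ x, data = d)
bad1 <- flag_points(fit1)

# Refit without the flagged rows, then run the same check on the new fit
d2 <- if (length(bad1) > 0) d[-bad1, ] else d
fit2 <- lm(y ~ x, data = d2)
bad2 <- flag_points(fit2)
```

Comparing `bad1` and `bad2` shows whether the second fit flags anything new; whether to remove further points is a modelling decision that the code does not settle.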
Model containing only the weight.

Hello,

For practical 2, the forward BIC method gives us a model containing only the weight. Everyone agrees that this is an error. Here is the code that produces this model:

cars.new <- cars[,-c(1:10)]

m0 <- lm(y ~ 1, data = cars)

mod.fwdBIC <- step(m0, scope = formula(cars.new) , direction="forward", k = log(82))

Thanks in advance,

Arthur

Posted by Arthur Simon Lucien Waltz at 15:11
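For comparison, a sketch of how forward selection with a BIC penalty is often set up with `step()`, on simulated data. The data frame `d` and the columns `x1`, `x2`, `x3` are hypothetical, and this does not diagnose the result reported above. The scope is written out explicitly as a list of one-sided `lower` and `upper` formulas, and `k = log(n)` turns the default AIC penalty into the BIC penalty.

```r
set.seed(42)
n <- 82
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$y <- 1 + 2 * d$x1 - 3 * d$x2 + rnorm(n, sd = 0.5)   # x3 is pure noise

m0 <- lm(y ~ 1, data = d)   # null model, fitted on the same data frame as the scope
fwd_bic <- step(m0,
                scope = list(lower = ~ 1, upper = ~ x1 + x2 + x3),
                direction = "forward",
                k = log(nrow(d)),   # BIC penalty
                trace = 0)          # suppress the step-by-step printout

selected <- attr(terms(fwd_bic), "term.labels")   # covariates kept by the search
```

Fitting the null model on the same data frame that the scope refers to keeps the response and the candidate covariates unambiguous.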
Practical 2

Hi,

I just want to know if the question 2.e) means that for each criterion (BIC or AIC) we must find a unique model. So it doesn't matter if we go forward or backward?

Thanks!

Posted by Jad Abou-Moussa at 9:09
Report 2 - questions

Hello,

We have several questions about the second report.

1) In point 2)c) we have to check for collinearity problems, but should we use "the picket fence" or another method?

2) In point 2)d), how do we obtain the matrix of plots used to check collinearity? (cf. the body fat example from the course)

3) In point 2)e), which two models should we examine further: the ones from AIC/BIC, or the ones with all the variables/reduced variables?

Thanks a lot!

Have a nice day.

Posted by Camille Marie Montalcini at 12:37
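On question 2): a scatterplot matrix is commonly drawn in R with `pairs()`, and `cor()` gives the matching correlation matrix. A minimal sketch on simulated covariates (the data frame `X` and its columns are made up for illustration, and whether this is the plot intended in the course is an assumption):

```r
set.seed(7)
n <- 60
X <- data.frame(weight = rnorm(n, 1500, 300))
X$horsepower <- 0.08 * X$weight + rnorm(n, sd = 10)   # built to be strongly related to weight
X$length     <- rnorm(n, 450, 30)                     # unrelated covariate

pdf(NULL)              # null device so the script also runs non-interactively
pairs(X)               # one scatterplot per pair of covariates
invisible(dev.off())

R <- cor(X)            # pairwise correlations, same layout as the plot matrix
```

In an interactive session the `pdf(NULL)` and `dev.off()` lines can be dropped so that the plot appears on screen.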
R becomes crazy when squared

Hi,

For fun, we computed R² in R the way it is given in the slides, namely by computing

(t(yHat) %*% yHat)/(t(y) %*% y).

However, we were surprised that the result (which we won't give here for obvious reasons) was not the same as the "Multiple R-squared" reported by summary(myLinearModel).

Should we worry? If not, which value should we use for the practical?

Nice evening, Philémon

Posted by Philémon Orphée Favrod at 18:05
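One possible source of the discrepancy: for a model with an intercept, `summary()` reports the centred R², which measures variation around the mean of y, whereas the ratio quoted in the post uses uncentred sums of squares. A minimal sketch on simulated data (the variables `x` and `y` are illustrative, and the slide formula is taken as quoted in the post):

```r
set.seed(3)
n <- 40
x <- rnorm(n)
y <- 10 + 2 * x + rnorm(n)     # large intercept, so the two versions differ clearly
fit  <- lm(y ~ x)
yHat <- fitted(fit)

# Uncentred version: same value as (t(yHat) %*% yHat) / (t(y) %*% y)
r2_uncentred <- sum(yHat^2) / sum(y^2)

# Centred version: explained over total variation about the mean of y
r2_centred <- sum((yHat - mean(y))^2) / sum((y - mean(y))^2)

# Value printed as "Multiple R-squared" by summary()
r2_summary <- summary(fit)$r.squared
```

Here `r2_centred` agrees with `r2_summary`, while `r2_uncentred` is larger because the term n times the squared mean of y is added to both numerator and denominator.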
Practical 1 - R misunderstanding

Hello,

We do not understand why

y <- 100/cars[,7]

x1 <- cars[,25]

x2 <- cars[,13] / x1

lm(y ~ x1 + x2)

Produces a different model fit than

lm(100/CityMPG ~ Weight + (Horsepower/Weight) , data=cars)

Can you explain that behaviour?

Jérémy

Posted by Jeremy Gotteland at 14:08
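A likely explanation, assuming the column positions in the first snippet match the names in the second: inside a model formula, `/` on the right-hand side is a formula operator (nesting) rather than arithmetic division, so `Horsepower/Weight` expands into several terms. Wrapping the ratio in `I()` makes R evaluate it arithmetically. A sketch on simulated data (the data frame `d` and its values are invented; only the column names follow the post):

```r
set.seed(11)
n <- 30
d <- data.frame(Weight = runif(n, 900, 2000), Horsepower = runif(n, 50, 250))
d$CityMPG <- 100 / (2 + 0.004 * d$Weight + 20 * d$Horsepower / d$Weight +
                      rnorm(n, sd = 0.2))

# Ratio computed beforehand, as in the first snippet of the post
y  <- 100 / d$CityMPG
x1 <- d$Weight
x2 <- d$Horsepower / x1
fit_manual <- lm(y ~ x1 + x2)

# "/" read as a formula operator: more terms than intended
fit_formula <- lm(100 / CityMPG ~ Weight + (Horsepower / Weight), data = d)

# I() protects the arithmetic, giving the same fit as fit_manual
fit_protected <- lm(100 / CityMPG ~ Weight + I(Horsepower / Weight), data = d)

n_terms_formula   <- length(attr(terms(fit_formula), "term.labels"))
n_terms_protected <- length(attr(terms(fit_protected), "term.labels"))
```

Printing `coef(fit_formula)` next to `coef(fit_protected)` shows the extra term introduced by the unprotected `/`.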
Practical 1 - plots

Hello,

When we're asked to analyse the data in paragraph 2, the method suggests that we draw box plots and scatter plots of the response vs. each covariate.

But box plots and scatter plots are basically the same when plotting the response versus weight or horsepower, as every car has a different value for these variables. By contrast, if we want to analyse the response vs. the type of transmission, box plots are useful.

Do we have to choose between scatter plots and box plots? Do we need to discuss the relevance of every variable, or only of the ones we think are relevant?

Regards

Posted by Pierre Morel at 17:29
Exercise 2, series 7

Hi,

In part (a), the first term (namely the log-likelihood) of the generalized likelihood ratio is evaluated at estimators of mu and sigma squared. Which estimators are they? The MSE?

Thanks!

Posted by Philémon Orphée Favrod at 12:23