Lesson 6 December 2019

Practice with regression and statistical hypothesis testing

Exercise 1

Load the library MASS and consider the dataset “crabs”, which has 200 rows and 8 columns, describing 5 morphological measurements on 50 crabs each of two colour forms and both sexes, of the species Leptograpsus variegatus collected at Fremantle, W. Australia.

  1. Create a new dataset that contains only the variables sex, FL and RW. Compute the mean, quartiles and standard deviation of the numerical variables. The following analyses refer to this reduced dataset.

  2. Using histograms, visualize the distribution of the numerical variables.

  3. Using a box plot, visualize the variable RW by sex. Comment on the result.

  4. Define a linear model for the variable RW as a function of FL (predictor). Comment on the results and plot the regression line against the experimental data.

  5. Define a linear model for the variable RW as a function of FL and sex (predictors). Compare the results with the previous regression model and comment on them.
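
One possible way to approach the five steps above is sketched here in R. It is only an outline, not the official solution: the object names crabs_red, fit1 and fit2 are chosen for illustration, and the interpretation of the output is left to the reader.

```r
library(MASS)  # provides the crabs dataset

# 1. Reduced dataset with sex, FL and RW only
crabs_red <- crabs[, c("sex", "FL", "RW")]
summary(crabs_red)                       # mean and quartiles (counts for sex)
sapply(crabs_red[, c("FL", "RW")], sd)   # standard deviations

# 2. Histograms of the two numerical variables, side by side
par(mfrow = c(1, 2))
hist(crabs_red$FL, main = "FL", xlab = "FL")
hist(crabs_red$RW, main = "RW", xlab = "RW")
par(mfrow = c(1, 1))

# 3. Box plot of RW for each sex
boxplot(RW ~ sex, data = crabs_red, ylab = "RW")

# 4. Simple linear regression of RW on FL, with the fitted line
fit1 <- lm(RW ~ FL, data = crabs_red)
summary(fit1)
plot(RW ~ FL, data = crabs_red)
abline(fit1, col = "red")

# 5. Regression of RW on FL and sex, compared with the simpler model
fit2 <- lm(RW ~ FL + sex, data = crabs_red)
summary(fit2)
anova(fit1, fit2)  # F test for the contribution of sex
```

Comparing the adjusted R-squared values reported by the two summary() calls, together with the anova() table, gives a starting point for the comment requested in step 5.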

Exercise 2

Read the dataset about the bike sharing service in Washington (description).

  1. Check whether all the variables are read correctly; if not, convert them to the correct type.

  2. Plot the variable cnt against the date.

  3. Using the likelihood ratio test, discuss whether the mean numbers of rented bikes in May 2011 and May 2012 are statistically different.

  4. Compare the results with those of t.test and comment on them.
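
The steps above could be sketched in R as follows. Because the real file is not reproduced here, the example runs on simulated data: the data frame bike, its column dteday and all the numbers in it are assumptions made for illustration, and only the column name cnt comes from the exercise. The likelihood ratio test shown assumes normally distributed counts with a common variance in the two months, which may differ from the formulation used in the lesson.

```r
set.seed(1)  # simulated data, NOT the real bike sharing dataset

# Hypothetical stand-in for the daily data: the date is stored as text,
# as it often is right after reading a CSV file.
days <- seq(as.Date("2011-01-01"), as.Date("2012-12-31"), by = "day")
bike <- data.frame(
  dteday = as.character(days),
  cnt    = rpois(length(days),
                 lambda = ifelse(format(days, "%Y") == "2011", 3400, 5600))
)

# 1. Inspect the column types and convert the date column
str(bike)
bike$dteday <- as.Date(bike$dteday)

# 2. Daily count against the date
plot(cnt ~ dteday, data = bike, type = "l")

# 3. Likelihood ratio test on the May data:
#    H0 = one common mean, H1 = a separate mean for each year
may      <- subset(bike, format(dteday, "%m") == "05")
may$year <- factor(format(may$dteday, "%Y"))
fit0 <- lm(cnt ~ 1, data = may)
fit1 <- lm(cnt ~ year, data = may)
lr   <- as.numeric(2 * (logLik(fit1) - logLik(fit0)))
p_lr <- pchisq(lr, df = 1, lower.tail = FALSE)  # asymptotic chi-squared, 1 df
c(statistic = lr, p.value = p_lr)

# 4. Two-sample t test under the same equal-variance assumption
tt <- t.test(cnt ~ year, data = may, var.equal = TRUE)
tt
```

Under these assumptions the two statistics are linked by lr = n * log(1 + t^2 / (n - 2)), where n is the number of May days, so the two tests can only disagree through their reference distributions. Note that t.test uses the Welch correction by default; var.equal = TRUE is set here only to match the model behind the likelihood ratio.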

Solutions

Exercise 3

Download the dataset containing the weather information for the same period as in Exercise 2.

  1. Merge the two datasets (weather and bike sharing). The following analyses refer to this merged dataset.

  2. From this new dataset, select the rows for spring 2012. The following analyses refer to this reduced dataset.

  3. Using a histogram, visualize the distribution of the numerical variable cnt.

  4. Using a box plot, visualize the variable cnt for each value of weathersit. Comment on the result.

  5. Use t.test to determine whether the mean number of rented bikes is statistically different between days with nice weather (weathersit == 1) and days with less nice weather (weathersit > 1). Comment on the results.

  6. Define a linear model for the variable cnt as a function of atemp (predictor). Comment on the results and plot the regression line against the experimental data.

  7. Define a linear model for the variable cnt as a function of atemp and weathersit (predictors). Compare the results with the previous regression model and comment on them.
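
A possible outline of the workflow above, again in R and on simulated data. The two tables rentals and weather, the key column dteday, the simulated values and the choice of 20 March to 20 June as "spring" are all assumptions made for this sketch; only cnt, weathersit and atemp are names taken from the exercise.

```r
set.seed(2)  # simulated data only, not the real datasets

days <- seq(as.Date("2012-01-01"), as.Date("2012-12-31"), by = "day")
n    <- length(days)
doy  <- as.numeric(format(days, "%j"))  # day of the year

# Hypothetical weather table
weather <- data.frame(
  dteday     = days,
  weathersit = sample(1:3, n, replace = TRUE, prob = c(0.6, 0.3, 0.1)),
  atemp      = 0.5 + 0.25 * sin(2 * pi * (doy - 110) / 366) + rnorm(n, sd = 0.05)
)

# Hypothetical rentals table
rentals <- data.frame(
  dteday = days,
  cnt    = round(2000 + 8000 * weather$atemp -
                 700 * (weather$weathersit - 1) + rnorm(n, sd = 600))
)

# 1. Merge the two tables on the date
bike <- merge(rentals, weather, by = "dteday")

# 2. Rows for spring 2012 (assumed here: 20 March to 20 June)
spring <- subset(bike, dteday >= as.Date("2012-03-20") &
                       dteday <= as.Date("2012-06-20"))
spring$weathersit <- factor(spring$weathersit)

# 3. Histogram of the daily count
hist(spring$cnt, main = "cnt, spring 2012", xlab = "cnt")

# 4. Box plot of cnt for each weather category
boxplot(cnt ~ weathersit, data = spring)

# 5. Welch t test: nice weather (weathersit == 1) against the rest
spring$nice <- spring$weathersit == "1"
tt <- t.test(cnt ~ nice, data = spring)
tt

# 6. Simple regression of cnt on atemp, with the fitted line
fit1 <- lm(cnt ~ atemp, data = spring)
summary(fit1)
plot(cnt ~ atemp, data = spring)
abline(fit1, col = "red")

# 7. Adding weathersit as a categorical predictor and comparing the models
fit2 <- lm(cnt ~ atemp + weathersit, data = spring)
summary(fit2)
anova(fit1, fit2)
```

Treating weathersit as a factor rather than a number is a modelling choice: it lets each weather category have its own shift in cnt instead of forcing equal steps between categories.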

Link to the Datasaurus distribution.

© 2017-2020 Federico Reali