# Exam 1 Instructions & Study Guide

**Instructions**

**Exam Date and Time:** Friday, June 17 at 7pm EST (__no make-ups__). ALL students are required to take Exam 1 at the same time (7pm EST on June 17), which is the time designated by the University of Delaware for this course.

**Format:** Online on Canvas and Zoom. Exam 1 will be proctored by Professor Kourpas. You will need to log into Zoom at the same time and keep your camera on. You will not be able to take the exam and receive a grade if your computer camera is not on in Zoom. The exam monitoring software on Canvas will note what programs you have open during Exam 1, so any use of email, chat, or other online programs is strictly prohibited and will be reported, resulting in the appropriate University disciplinary action.

Absolutely no use of cellphones or multiple computers during the exam.

**Zoom Link for Exam 1:** https://udel.zoom.us/j/95130647263

Do not try to access the link before Exam 1.

**Lectures covered in Exam 1**: The first 8 lectures will be covered (Lectures 1-8), with emphasis on the Linear Regression lectures (Lectures 4-8).

**Time Limit:** 90 minutes (DSS students will have adjusted time). __You can’t start, stop, and continue! Time starts from the time you first access the exam!__ If you are a DSS student, you will need to have approval from the DSS office sent directly to Professor Kourpas (some students have already done so).

**Access to Material:** You can have access to your computer, including lecture slides, lecture videos, R software, and scripts you have built as part of this class (absolutely no email/chat, or any material and notes that have not been part of my course). In some parts, you will be required to run models in R, transfer your results to Canvas, and answer questions. In other parts, you will be required to interpret R output and make decisions. All answers will be submitted on Canvas. Some questions will be essay format, some will be multiple choice.

**Exams should be completed individually! Any indications of academic dishonesty will be reported and result in enforcing the appropriate University disciplinary action.**

# Study Guide & Hints

The following study guide was designed as a summary to help you study and prepare for the cumulative Final. It is not supposed to be an exhaustive list of questions or topics.

## Basic R Programming and Statistical Concepts: Lectures 1-3

- Know the different types of variables and how to recognize them (continuous, discrete, categorical, binary, nominal, ordinal, etc.)
- Understand the difference between independent variables (predictors) and dependent variables (response, outcome) and how to use them in regression.
- Understand how to read files in R, and how to use and recognize the right syntax of the read command. Example: > d=read.csv("Packages.csv", header=TRUE)
- Understand the basic R commands, and how to use and recognize them. Examples: length, mean, median, unique, quantile, dim, cor
- Understand the use of the “which” command, how to use it appropriately and how to recognize its meaning/logic.
- Know how to use the aggregate command. The aggregate command can have more than one variable in the grouping list. Understand the logic.

Example: >aggregate(d$driversworking, by=list(d$year, d$weekend), mean)
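The basic commands and the two-variable aggregate call above can be tried on a small data frame. This is a minimal sketch: the data values are made up for illustration and are not the course dataset, and the names `n_rows`, `avg_all`, `years`, and `by_group` are hypothetical.

```r
# Hypothetical data standing in for the course dataset (values are made up).
d <- data.frame(
  year           = c(2020, 2020, 2020, 2021, 2021, 2021),
  weekend        = c(0, 1, 1, 0, 0, 1),
  driversworking = c(10, 6, 8, 12, 14, 7)
)

# Basic commands applied to one column
n_rows  <- length(d$driversworking)  # number of observations
avg_all <- mean(d$driversworking)    # overall mean
years   <- unique(d$year)            # distinct values of year

# Mean of driversworking for every (year, weekend) combination.
# Naming the list elements gives readable column names in the result;
# the aggregated values land in a column called x.
by_group <- aggregate(d$driversworking,
                      by = list(year = d$year, weekend = d$weekend),
                      FUN = mean)
print(by_group)
```

Each row of the result is one combination of the grouping variables, so two grouping variables with two levels each give up to four rows.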

- Understand how to use the histogram command in R, and how to interpret histograms. What do histograms show us?

Example: hist(d$avghoursperdriver), hist(fit$resid), hist(tstresid)
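To see what a histogram counts, it can help to look at the object that `hist` returns as well as the picture. The sketch below uses made-up values (the vector `avghours` is hypothetical, not course data).

```r
# Hypothetical values of average hours per driver (made up for illustration).
avghours <- c(6.5, 7.0, 7.2, 7.8, 8.0, 8.1, 8.3, 8.9, 9.4, 11.5)

# Draw the histogram: each bar counts how many observations fall in a bin.
h <- hist(avghours, main = "Average hours per driver", xlab = "Hours")

# hist() also returns the bin boundaries and the count in each bin.
print(h$breaks)
print(h$counts)
```

The counts always add up to the number of observations, and the shape of the bars shows where the values concentrate and whether the distribution is skewed or has outliers.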

- Understand how to use the plot command in R to create scatterplots, and how to interpret scatterplots. What do scatterplots show us?

Example: plot(d$avghoursperdriver, d$driversworking)
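A scatterplot is often read together with the correlation coefficient, which summarizes the direction and strength of the linear relationship the plot shows. The sketch below assumes a small made-up data frame; only the column names come from the example above.

```r
# Hypothetical data (values are made up): as hours per driver fall,
# the number of drivers working rises.
d <- data.frame(
  avghoursperdriver = c(9.5, 9.0, 8.4, 8.0, 7.6, 7.1),
  driversworking    = c(6, 7, 8, 10, 12, 14)
)

# Scatterplot: first argument on the x-axis, second on the y-axis.
plot(d$avghoursperdriver, d$driversworking,
     xlab = "Average hours per driver", ylab = "Drivers working")

# Correlation: always between -1 and 1; the sign gives the direction.
r <- cor(d$avghoursperdriver, d$driversworking)
print(r)
```

Here the points slope downward, so the correlation is negative; a value near zero would instead suggest no clear linear relationship.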

- Know how to use and interpret correlation in R (i.e., cor function in R). What does correlation mean? (just a number means nothing…)
- Know how to divide and cut a dataset, creating a new subset of the data in R. You have to be able to recognize and understand the syntax and logic.

Example: w = d[which(d$age> 25),]
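The subsetting example above works in two steps: `which` returns the row positions where the condition is true, and the square brackets keep those rows and all columns. A minimal sketch on made-up data (the `name` column and the values are hypothetical):

```r
# Hypothetical data with one missing age.
d <- data.frame(
  name = c("A", "B", "C", "D", "E"),
  age  = c(22, 31, 25, 40, NA)
)

# Step 1: positions of the rows where age > 25 is TRUE.
# Rows where the comparison is NA are dropped by which().
idx <- which(d$age > 25)
print(idx)   # 2 4

# Step 2: keep those rows; the empty slot after the comma means "all columns".
w <- d[idx, ]
print(w)

# Without which(), the NA comparison produces an extra row filled with NA.
w_no_which <- d[d$age > 25, ]
print(nrow(w_no_which))   # 3
```

Note that age 25 is excluded because the condition is strictly greater than 25.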

## Linear Regressions (simple & multiple): Lectures 4-8

- Understand when to use Linear versus Logistic Regression.
- Understand and recognize the R command for building a linear regression. Example: > fit = lm(d$avghoursperdriver ~ d$pctoversized + d$weatherconditions)
- Understand what statistical significance means and how to test for it. What does it mean if a variable is not statistically significant?

- To test for significance at standard levels of significance (e.g., α=0.05) use stars or p-values (if p-value is less than the level of significance α, then the variable/coefficient is statistically significant). P-values allow you to test statistical significance at non-standard levels of significance (e.g., α=0.0375).
- Understand how to build a model equation after you have tested for statistical significance.
- Make sure you know how to interpret the intercept in a model, the coefficients/slopes, and the Coefficient of Determination (R²).
- If a variable is not statistically significant, it is a candidate for removal. DO NOT interpret the associated coefficient, because we do not have enough evidence (at the chosen significance level α) that the coefficient is different from zero, since we failed to reject the null hypothesis. So what happens then? The correct thing is to rerun the model without the predictor variables that are not statistically significant. IF really pressed for time, some business analytics people and statisticians just write the model with the rest of the coefficients (rest of independent variables), but this is less preferred.
- Now what happens to the coefficient of determination (R²)? The coefficient of determination is still valid (for all independent variables in the model, even the ones that are not statistically significant) IF the associated p-value of the F-test is less than the significance level α (e.g., usually 0.05). So you can still make the statement of interpretation that X% of the variation in the dependent variable (Y) is explained by the independent variables (X’s, including the non-significant ones). The only time the R² is not interpretable is if ALL independent variables are not statistically significant.
- What is left unexplained is 100% minus the R².
- Note: If you add additional independent variables to an existing model, R-squared CAN NEVER decrease, even if the variables added are not significant.
- Incremental impacts on the dependent variable (Y) and __economic significance__ have to do with the sign (increase or decrease) and magnitude (how much) of the associated coefficients of independent variables. Model fit (how good your model is, how much of the variation in the dependent variable is explained by the independent variables) is given by the coefficient of determination (R²).
- Sometimes the intercept in a model may not be interpretable (e.g., negative weight, see example on Lecture 8, pages 19-21 and Lecture 8 Video/Part 3) but it should still be included in the model as it helps us build the line of best fit.
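The regression points above can be checked on simulated data. This is a sketch under stated assumptions: the data are randomly generated (not the course dataset), `weatherconditions` is treated as numeric, and `noise` is a hypothetical pure-noise predictor added only to show that R-squared does not drop when a variable is added.

```r
set.seed(1)
n <- 100

# Simulated data (an assumption for illustration only).
d <- data.frame(
  pctoversized      = runif(n, 0, 30),
  weatherconditions = rnorm(n)
)
d$avghoursperdriver <- 6 + 0.1 * d$pctoversized +
  0.5 * d$weatherconditions + rnorm(n)

# Fit the multiple linear regression.
fit <- lm(avghoursperdriver ~ pctoversized + weatherconditions, data = d)
s   <- summary(fit)

# Coefficient table: Estimate, Std. Error, t value, Pr(>|t|).
coefs <- s$coefficients
pvals <- coefs[, "Pr(>|t|)"]

# A coefficient is statistically significant when its p-value < alpha.
# This also works for non-standard levels such as 0.0375.
alpha       <- 0.05
significant <- pvals < alpha
print(significant)

# R-squared, and the share of variation left unexplained.
r2          <- s$r.squared
unexplained <- 1 - r2

# p-value of the overall F-test, computed from the stored F statistic.
f        <- s$fstatistic
f_pvalue <- pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE)

# Adding a pure-noise predictor cannot lower R-squared.
d$noise <- rnorm(n)
fit2 <- lm(avghoursperdriver ~ pctoversized + weatherconditions + noise,
           data = d)
r2_bigger <- summary(fit2)$r.squared
print(c(r2 = r2, r2_with_noise = r2_bigger))
```

The formula here uses `data = d` rather than the `d$` prefix shown in the lecture example; both fit the same model, and the `data =` form gives shorter coefficient names in the output.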

Good luck!