The Prescott County mayor, Robert (Pete) Smith, has been worried for some time that housing values in the county have been declining. Pete said to the county commission recently,
“Our housing stock is getting so old and tired, and I’m afraid that if we don’t start building new homes, our children will just move away to Orlando or even, heaven forbid, to South Florida. I think that we need to study this situation and do something about it right now!”
What Pete did not say, but each of the commissioners knew, was that his brother-in-law, Bo Bradley, is a developer who wants the commission to rezone 350 acres in the north county for a new development. This land is currently envisioned as a county park, but old Bo wants to develop it. Bo said recently, “What this county needs is my development, not some old park for the deer.” Bo, it seems, is more interested in making money than in protecting undeveloped land.
To get this study going, Pete has asked you to look at some recent sales of homes in the county to understand what is going on with the housing stock, and then to project what values five typical homes could bring. What he is hoping is that the values will be so low that the commission will want to rezone those 350 acres for Bo’s development.
Using the Excel data file that has been provided, you are to completely answer the following questions:
1. What is the current status of the housing stock in the county?
a. To do this, you will create a one-variable summary using StatTools and analyze the age of the recently sold homes, their average sale price, and the number of bedrooms, bathrooms, and cars that can be garaged.
b. What do the skewness and kurtosis tell you about these data?
c. Would it be better to use the interquartile range to analyze these data, and why or why not? (A yes or no answer is not sufficient.)
2. Using two Q-Q plots, do you consider the data for price and square footage to be normally distributed or not, and why?
3. Using a correlation in StatTools with all six of the variables, how is each variable correlated with the others? Again, be specific.
4. Doing a scatterplot of price versus square footage and adding a trend line to the plot, what does this tell you about the data?
a. Now do a scatterplot of price versus age and add a trend line. What does this tell you about the question of new homes versus price?
5. Next do a multiple regression using price as the dependent variable, and all other variables as independent variables:
a. Do any of the variables have a p-value (for the t-test on its coefficient) that is greater than the alpha (.05) for this assignment? If so, delete them, rerun the regression, and compare and contrast the old regression with the new regression that omits one or more of the variables.
b. Is the F-ratio for this/these regressions significant? Why?
c. Do the r-squared values for this/these regressions appear to be valid? Do they show that the regression explained a sufficient amount of the total variation?
6. Using the coefficients from this regression, estimate the selling prices for the following typical Prescott County homes:
Home number | Square footage | Age of the home | Number of bedrooms | Number of bathrooms | Size of the garage (cars) | Projected Selling Price (determined by you) |
1 | 1,850 | 25 | 3 | 2 | 1 | |
2 | 2,200 | 14 | 4 | 3 | 2 | |
3 | 3,000 | 5 | 5 | 4 | 3 | |
4 | 3,400 | 5 | 5 | 5 | 3 | |
5 | 2,200 | 40 | 3 | 2 | 1 | |
7. Based on the projected selling price of these homes, will Mayor Pete convince the county commission to rezone the land for his brother-in-law, or does the county get a new park? Defend your answer based on the statistics that you have calculated in this case study. Your argument should be no more than 300 words.
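As an illustration of the arithmetic behind question 6, the short Python sketch below applies a set of regression coefficients to the five typical homes. It assumes the usual linear form, predicted price = intercept + the sum of each coefficient times its variable. The function name `predict_price` and the coefficient values are hypothetical placeholders chosen only so the arithmetic is easy to check; they are not estimated from the Prescott data and should be replaced with the coefficients from your own regression output.

```python
# The five typical homes from question 6:
# (square footage, age, bedrooms, bathrooms, garage size in cars)
HOMES = {
    1: (1850, 25, 3, 2, 1),
    2: (2200, 14, 4, 3, 2),
    3: (3000, 5, 5, 4, 3),
    4: (3400, 5, 5, 5, 3),
    5: (2200, 40, 3, 2, 1),
}


def predict_price(intercept, slopes, home):
    """Return intercept + sum(slope_i * x_i) for one home.

    `slopes` must be in the same order as the values in `home`.
    """
    return intercept + sum(b * x for b, x in zip(slopes, home))


# PLACEHOLDER coefficients (hypothetical, NOT from the Prescott data).
# Here every home is simply priced at 100 per square foot.
placeholder_intercept = 0.0
placeholder_slopes = (100.0, 0.0, 0.0, 0.0, 0.0)

for number, home in HOMES.items():
    price = predict_price(placeholder_intercept, placeholder_slopes, home)
    print(f"Home {number}: projected price {price:,.0f}")
```

If a variable is dropped from the regression in question 5, it would also need to be dropped from both `slopes` and each home tuple so the two stay aligned.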
You are to do the following tests, including explaining and analyzing the results of these statistical tests:
1. A one variable summary for all variables (single output with all variables on it).
2. A scatterplot with the price versus square footage, add a trend line and show the R² value on the plot.
3. A correlation coefficient showing all variables – analyze the results.
4. Histograms for all variables – 6 different histograms.
5. A multiple regression with all 6 coefficients calculated correctly. Explain the r-squared value, what it means, the F-value and what it means, and the T-test values and what they mean.
a. Are there any variables that could be deleted from the regression? If there are, rerun the regression without the variable(s).
b. If you remove one or more variables, then recalculate the regression.
6. Predict the home selling prices, then complete and include Table One, found below, in your written report:
Home number | Square footage | Age of the home | Number of bedrooms | Number of bathrooms | Size of the garage (cars) | Projected Selling Price (determined by you) |
Price | Sq. Feet | Age | Bedrooms | Bathrooms | Garage |
110000 | 1000 | 28 | 3 | 1 | 1 |
133500 | 1400 | 23 | 3 | 1 | 1 |
112500 | 1248 | 58 | 3 | 4 | 1 |
141750 | 1106 | 12 | 2 | 1 | 1 |
195250 | 2112 | 78 | 2 | 6 | 2 |
132250 | 1078 | 33 | 2 | 1 | 1 |
136000 | 952 | 13 | 2 | 3 | 2 |
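For readers who want to check the descriptive statistics by hand, the following Python sketch computes a mean, median, interquartile range, skewness, kurtosis, and a Pearson correlation for the seven rows listed above, using only the standard library. It is a sketch under stated assumptions: the helper names are hypothetical, the skewness and kurtosis use the simple population formulas (StatTools may apply a different small-sample convention, so values need not match exactly), and seven rows are far too few to support conclusions about the county.

```python
import statistics

# The seven rows listed above:
# (price, sq_feet, age, bedrooms, bathrooms, garage)
ROWS = [
    (110000, 1000, 28, 3, 1, 1),
    (133500, 1400, 23, 3, 1, 1),
    (112500, 1248, 58, 3, 4, 1),
    (141750, 1106, 12, 2, 1, 1),
    (195250, 2112, 78, 2, 6, 2),
    (132250, 1078, 33, 2, 1, 1),
    (136000, 952, 13, 2, 3, 2),
]
NAMES = ("Price", "Sq. Feet", "Age", "Bedrooms", "Bathrooms", "Garage")


def column(name):
    """Return one variable as a list of values."""
    i = NAMES.index(name)
    return [row[i] for row in ROWS]


def iqr(values):
    """Interquartile range using the statistics module's default method."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


def skewness(values):
    """Population skewness: the mean of the cubed z-scores."""
    m, s = statistics.fmean(values), statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)


def excess_kurtosis(values):
    """Population excess kurtosis: mean of z-scores to the 4th power, minus 3."""
    m, s = statistics.fmean(values), statistics.pstdev(values)
    return sum(((v - m) / s) ** 4 for v in values) / len(values) - 3


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


price = column("Price")
print("mean price:   ", round(statistics.fmean(price), 2))
print("median price: ", statistics.median(price))
print("IQR of price: ", iqr(price))
print("skewness:     ", round(skewness(price), 3))
print("kurtosis:     ", round(excess_kurtosis(price), 3))
print("r(price, sqft):", round(pearson(price, column("Sq. Feet")), 3))
```

Comparing the mean with the median, and the standard deviation with the interquartile range, is one way to see how much a single high-priced sale pulls the summary statistics.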