SI 644 – Introductory Statistics & Data Analysis
Office hours: Thursdays 1-3PM, DIAD Lab and/or 314 West Hall. Usually, I will wander through the DIAD at 1PM or shortly thereafter. Once I’ve talked to anyone there who wants to talk to me, I’ll retreat to 314 West Hall, where people can drop in. If you’d like to schedule a time, send email.
email: presnick@umich.edu
Administrative Assistant: Sharon Mahoney 764-1858, mahoneys@umich.edu
This course teaches the fundamentals of statistics, that is, the ability to describe data samples and draw inferences about the populations from which they were drawn. It should also sharpen individual intuition about how to read data, interpret data, and judge others’ claims about data.
Specifically, at the end of this course students should be able to:
This course should be useful to a wide variety of students, both as preparation for more advanced courses and as a means to professional advancement. Throughout your life, you will need to make judgments based on data. This class is designed to help you do that. If you ever wanted to answer questions like “Are minorities being treated fairly?” or “How well do spam filters work?” then this class will help. Tools from statistics can help you rule out competing theories and judge the strength of relationships. It is not just about “the numbers” but rather about thinking clearly about what data do and do not imply. Critical thinking about data is part of good citizenship in a modern society: it was a prerequisite for following some of the controversies after the last U.S. presidential election and the last U.S. census. (Hopefully, we won’t have quite as much material from this year’s voting process, but we’ll be ready if there is!)
Skills from this course will also guide you in many professional tasks. Here are some of the professional tasks that information professionals perform where they can make use of statistical analysis:
· Appraising and selecting documents for archives
· Evaluating user interface alternatives
· Redesigning web sites based on usage history
· Assessing demand for potential new product or service offerings
· Estimating the cost of providing a service
· Evaluating the outcomes of programs and services
· Evaluating the effectiveness of products and government policies
· Collecting, summarizing, and interpreting trend data about organizations in a sector or industry
· Assessing an organization’s compliance with internal or government-imposed policies
· Presenting government statistical data to lay audiences
· Conducting academic research
Required:
· Statistics (2003) by McClave & Sincich (9th ed). Prentice Hall
Recommended:
· Student solutions manual (for odd-numbered exercises)
· Guide to exercises using Excel
Supplemental:
· Statistics – Concepts & Controversies (2000) by David Moore (5th ed). W.H. Freeman & Co.
We’ll try to do as little calculation as possible by hand, though you might have to dust off your multiplication tables for the first couple of weeks, to get the feel of things.
Excel is ubiquitous and it or some equivalent will probably be available throughout the rest of your career. It’s available in the DIAD Lab. Everything we cover in this course can be done with Excel.
That said, if you’re doing anything even a little complicated, it’s a lot better to do your analysis using a statistical package that lets you write text files that you can edit, debug, and insert comments in. Perhaps most importantly, such text files can be re-run when you get an updated data set. (Often, through initial statistical analysis, you discover some error in the input data, which leads you to get a revised data set.)
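To make the point about re-runnable analyses concrete, here is a minimal sketch of a scripted analysis. It is written in Python purely for illustration, not in any of the packages discussed in this section. The function name `summarize`, the column name `score`, and both tiny data sets are hypothetical; the first data set contains a made-up data-entry error (250 instead of 25) of the kind that initial analysis often uncovers.

```python
import csv
import io
import statistics


def summarize(csv_text, column):
    """Return count, mean, median, and sample standard deviation of one column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Skip blank cells so a missing value does not crash the script.
    values = [float(row[column]) for row in reader if row[column].strip()]
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),
    }


# Hypothetical data: the original file has a data-entry error in row 3.
original = "id,score\n1,22\n2,25\n3,250\n4,27\n"
revised = "id,score\n1,22\n2,25\n3,25\n4,27\n"

# The same commented script runs unchanged on the revised data set.
print(summarize(original, "score"))
print(summarize(revised, "score"))
```

The mean drops from 81.0 to 24.75 once the error is corrected, while the median barely moves (26.0 to 25.0). Because the analysis lives in a text file, getting the revised numbers is a matter of re-running it rather than repeating each step by hand.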
Personally, I use a program called stata, which is excellent for anyone who has any programming experience. For $39, you can get a 1-year license to “Small Stata” and a copy of the “Getting Started” manual. See http://www.stata.com/order/new/edu/gradplans/gp3-order.html.
SPSS is another statistical package, often preferred by psychology and sociology researchers (stata is preferred by economists). SPSS may be available in the DIAD Lab. SPSS has a lot more menu-driven options than stata, and also has a command syntax, though it’s a bit unwieldy and not well documented. It does not have built-in functions for some of the more sophisticated statistical models that stata does, and it is not extensible with user programming as stata is. We will not be covering any statistical methods that are beyond the limits of SPSS’ capabilities.
A variety of other packages for statistical analysis are also available. I have research collaborators who swear by SAS and JMP, but I’ve never used them. The textbook provides examples using MINITAB, which must have been (or may still be) popular with some group of researchers or educators. The textbook also provides examples using a TI graphing calculator, though I don’t know why anyone would use one if they had access to a laptop or desktop computer running Excel or a statistical application.
Exercises from the textbook will be assigned for each class session. Generally, these will be odd-numbered exercises, for which answers are available in the student manual. These will not be graded. However, you are expected to work these problems before class, and I may call on students in class to provide and explain their answers to these exercises; how well you do will affect your class preparation and participation grade.
· Class preparation and participation 10%
· 3 problem sets 30%
· Midterm exam 20%
· Final exam 40%
You are encouraged to notice and share with the class good and bad examples and statistical issues from the popular press. Discussion is enabled on the CourseTools site and you are encouraged to use it.
| # | Date | Topics/readings | Exercises to complete before class | Assignments due |
| --- | --- | --- | --- | --- |
| | September | | | |
| | 7 | 1.1 The Science of Statistics; 1.2 Types of Statistical Applications; 1.3 Fundamental Elements of Statistics; 1.4 Types of Data; 1.5 Collecting Data; 1.6 The Role of Statistics in Critical Thinking | Chapter 1: 1-11, 13 | |
| | 9 | 2.1 Describing Qualitative Data; 2.2 Graphical Methods for Describing Quantitative Data; 2.3 Summation Notation; 2.4 Numerical Measures of Central Tendency | Chapter 2: 1, 2, 9, 19, 33, 41a, 43, 47, 50, 53. Generate stem-and-leaf, frequency tables for measurement classes and histogram for the BrainPMI dataset from exercise 25 | |
| | 14 | 2.5 Numerical Measures of Variability; 2.6 Interpreting the Standard Deviation; 2.7 Numerical Measures of Relative Standing; 2.8 Methods for Detecting Outliers; 2.9 Graphic Bivariate Relationships; 2.10 Distorting the Truth with Descriptive Techniques | Chapter 2: 54, 55a, 57a, 59, 60, 61, 65, 66, 67, 71, 75, 81, 82 abef, 84, 93, 96, 97, 103, 106, 109, 111, 140 | |
| | 16 | | NO CLASS: Jewish Religious Holiday | |
| | 21 | 3.1 Events, Sample Spaces and Probability; 3.2 Unions and Intersections; 3.3 Complementary Events; 3.4 The Additive Rule and Mutually Exclusive Events; 3.5 Conditional Probability | Chapter 3: 1a, 5, 8, 9, 11, 21, 22, 29, 33, 41, 44, 45, 55ad | |
| | 23 | 3.6 The Multiplicative Rule and Independent Events; 3.7 Random Sampling; 3.8 Some Counting Rules | Chapter 3: 59, 69, 75, 76a, 77a, 79c (Use Excel or stata for this), 85a, 87b, 99, Game Show Strategy (p. 165) | |
| | 28 | 4.1 Two Types of Random Variables; 4.2 Probability Distributions for Discrete Random Variables; 4.3 Expected Values of Discrete Random Variables; 4.4 The Binomial Random Variable | Chapter 4: 3, 7, 11, 25, 33ab, 37bc, 39ab, 43, 51, 55, 109 | |
| | 30 | 5.1 Continuous Probability Distributions; 5.2 The Uniform Distribution; 5.3 The Normal Distribution; 5.4 Descriptive Methods for Assessing Normality | Chapter 5: 1, 3abcf, 4a, 7, 15abc, 17abcf, 19ab, 21a, 23a, 25ad, 30, 41, 45, 46, 47ab, 53 | |
| 7 | October | | | |
| | 5 | 6.1 What is a Sampling Distribution?; 6.2 Properties of Sampling Distributions: Unbiasedness and Minimum Variance | Chapter 6: 1, 7, 35abc, 11 | |
| | 7 | 6.3 The Central Limit Theorem; 5.5 Approximating a Binomial Distribution with a Normal Distribution | Chapter 6: 15abc, 18, 21, 31, 34ab, 35d; Chapter 5: 55, 67 | |
| | 12 | 7.1 Large-Sample Confidence Interval for a Population Mean; 7.2 Small-Sample Confidence Interval for a Population Mean | Chapter 7: 1, 3, 7, 9, 13, 19, 23ab, 25bd, 31, 69, 81 | Problem Set 1 due |
| | 14 | 7.3 Large-Sample Confidence Interval for a Population Proportion; 7.4 Determining the Sample Size | Chapter 7: 35, 40, 45, 51, 57, 61 (also, for 61, what sample size is needed to estimate within .02 with 95% confidence?) | Problem Set 1 returned |
| | 19 | | | |
| | 21 | 8.1 The Elements of a Test of Hypothesis; 8.2 Large-Sample Test of Hypothesis About a Population Mean; 8.3 Observed Significance Levels: p-Values | Chapter 8: 1, 5, 7, 15, 17abg, 19, 29, 30, 31, 37, 39 | |
| | 26 | 8.4 Small-Sample Test of Hypothesis About a Population Mean; 8.5 Large-Sample Test of Hypothesis About a Population Proportion | Chapter 8: 45, 49ab, 51, 55, 61ab, 66, 69, 75, 123 (would it be more appropriate to do a hypothesis test or a confidence interval?) | |
| | 27 | | | MIDTERM EXAM, 1-3PM |
| | 28 | 9.1 Comparing Two Population Means: Independent Sampling; 9.2 Comparing Two Population Means: Paired Difference Experiments; 9.3 Comparing Two Population Proportions: Independent Sampling; 9.4 Determining the Sample Size | Chapter 9: 1, 5, 11, 25, 27ab, 29, 35, 38, 43, 50, 63, 67 | |
| 8 | November | | | |
| | 2 | 10.1 Elements of a Designed Experiment; 10.2 The Completely Randomized Design; 10.3 Multiple Comparisons of Means | Chapter 10: 5, 9, 11, 13, 14, 15, 16, 19, 22, 26, 31, 34 | |
| | 4 | 10.4 The Randomized Block Design; 10.5 Factorial Experiments | Chapter 10: 41, 47, 51, 53, 55, 61, 63, 67, 84 | Problem Set 2 distributed |
| | 9 | Guest Lecture: Simulations and Applications, Yan Chen | | |
| | 11 | 13.1 Categorical Data and the Multinomial Experiment; 13.2 Testing Categorical Probabilities: One-Way Table; 13.3 Testing Categorical Probabilities: Two-Way (Contingency) Table; 13.4 A Word of Caution About Chi-Square Tests | Chapter 13: 3, 5, 6, 11, 18, 19, 25 | |
| | 16 | 11.1 Probabilistic Models; 11.2 Fitting the Model: The Least Squares Approach; 11.3 Model Assumptions | Chapter 11: 1ac, 2ac, 5ac, 9, 19, 23 | |
| | 18 | 11.4 An Estimator of σ²; 11.5 Assessing the Utility of the Model: Making Inferences About the Slope β1; 11.6 The Coefficient of Correlation | Chapter 11: 27, 33, 35, 45, 47 | Problem Set 2 due |
| | 23 | 11.7 The Coefficient of Determination; 11.8 Using the Model for Estimation and Prediction; 11.9 A Complete Example | 49, 51bc, 53, 56, 63, 64, 70 | |
| | 25 | | NO CLASS: Thanksgiving | |
| | 30 | 12.1 Multiple Regression Models; 12.2 The First-Order Model: Estimating and Interpreting the β Parameters; 12.3 Model Assumptions; 12.4 Inferences About the Individual β Parameters; 12.5 Checking the Overall Utility of a Model; 12.6 Using the Model for Estimation and Prediction | Chapter 12: 3, 5, 7abc, 17, 18, 28, 29, 31 | Problem Set 2 returned |
| 4 | December | | | |
| | 2 | 12.7 Model Building: Interaction Models; 12.8 Model Building: Quadratic and Other Higher-Order Models | 38, 41, 43a, 49, 51, 53 | |
| | 7 | 12.9 Model Building: Qualitative (Dummy) Variable Models; 12.10 Model Building: Models with Both Quantitative and Qualitative Variables; 12.11 Model Building: Comparing Nested Models | 65, 77, 79, 87, 89, 90 | |
| | 9 | 12.13 Residual Analysis: Checking the Regression Assumptions; 12.14 Some Pitfalls: Estimability, Multicollinearity, and Extrapolation; Statistics in Action: "Wringing the Bell Curve" (p. 691) | 102, 105, 107, 109 | |
| | 14 | 14.1 Introduction: Distribution-Free Tests; 14.2 Single Population Inferences: The Sign Test; 14.4 Comparing Two Populations: The Wilcoxon Signed Rank Test for the Paired Difference Experiment | | Turn in Problem Set 3 if you'd like it to be graded by Thursday |
| | 16 | | Review session, regular class time | Problem Set 3 due; solution set distributed |
| | 17 | Alternate Final Exam, 12:30-2:30, 409 West Hall | | |
| | 21 | | Final Exam 4PM-6PM, 409 West Hall | |