SI 544 – Introductory Statistics & Data Analysis
Office hours: Mondays 4-5PM, 3246C SI North. You can also sign up to go to lunch somewhere near West Hall most Tuesdays at noon. Signups are at this wiki page.
email: presnick@umich.edu
GSI: Jahna Otterbacher
Office hours: Mondays 7-9 PM in the DIAD and Wednesdays 4:30-5:30 in 3204 SI North.
email: jahna@umich.edu
Administrative Assistant: Sharon Mahoney 764-1858, mahoneys@umich.edu
This course teaches the fundamentals of statistics, that is, the ability to describe data samples and draw inferences about the populations from which they were drawn. It should also sharpen individual intuition about how to read data, interpret data, and judge others’ claims about data.
Specifically, at the end of this course students should be able to use computer-based statistics packages to:
This course should be useful to a wide variety of students both as preparation for more advanced courses and as a means to professional advancement. Throughout your life, you will need to make judgments based on data. This class is designed to help you do that. If you ever wanted to answer questions like “Are minorities being treated fairly?” or “How well do spam filters work?” then this class will help. Tools from statistics can help you rule out competing theories and judge the strength of relationships. It is not just about “the numbers” but rather thinking clearly about what data do and do not imply. Critical thinking about data is part of good citizenship in a modern society— for example, it was a pre-requisite for following some of the controversies during the 2000 U.S. presidential voting process and determining population counts after the most recent U.S. census.
Skills from this course will also guide you in many professional tasks. Here are some of the professional tasks that information professionals perform where they can make use of statistical analysis:
· Setting policies for sampling documents to save in archives
· Evaluating user interface alternatives
· Redesigning web sites based on usage history
· Assessing demand for potential new product or service offerings
· Estimating the cost of providing a service
· Evaluating the outcomes of programs and services
· Evaluating the effectiveness of products and government policies
· Collecting, summarizing, and interpreting trend data about organizations in a sector or industry
· Assessing an organization’s compliance with internal or government-imposed policies
· Presenting government statistical data to lay audiences
· Conducting academic research
Required:
· Statistics (2006) by McClave & Sincich (10th ed). Prentice Hall. (Note: For those who may wish to use the 9th edition that was used last year in this course, we have included the respective chapter/section and exercise numbers on the syllabus, to the extent possible.)
Recommended:
· Student solutions manual (for odd-numbered exercises). Comes packaged with the 10th edition textbook.
We’ll try to do as little calculation as possible by hand, though you might have to dust off your multiplication tables for the first couple weeks, to get the feel of things.
We will be using Stata in this course. While many of the exercises can be done in a spreadsheet such as Excel, if you’re doing something even a little complicated, it’s a lot better to do your analysis using a statistical package. Packages such as Stata let you write text files that you can edit, debug, and insert comments in. Perhaps most importantly, such text files can be re-run when you get an updated dataset. (Often, initial statistical analysis uncovers some error in the input data, which leads you to get a revised dataset.)
Intercooled Stata is available to you in the DIAD (as a NAL object). For those who may be interested in obtaining a personal copy, for $39, you can get a 1-year license to “Small Stata” and a copy of the “Getting Started” manual. See http://www.stata.com/order/new/edu/gradplans/gp3-order.html. However, “Small Stata” may not be sufficient to handle the size of datasets we use in some of the exercises and problem sets. Intercooled Stata is also available from the same site, but at a somewhat higher price.
SPSS is another statistical package, often preferred by psychology and sociology researchers (Stata is preferred by economists). SPSS has a lot more menu-driven options than Stata, and also has a command-line syntax, though it’s a bit unwieldy and not well documented. It does not have built-in functions for some of the more sophisticated statistical models that Stata does, and it is not extensible with user programming as Stata is. We will not be covering any statistical methods that are beyond the limits of SPSS’ capabilities. The instructors will not be using any SPSS examples in the course; you are welcome to use it if you prefer it over Stata, but we are not providing support for it.
A variety of other packages for statistical analysis are also available. I have research collaborators who swear by SAS and JMP, but I’ve never used them. The textbook provides examples using MINITAB, which must have been (or still be) popular with some group of researchers or educators. The textbook also provides examples using a TI graphing calculator—I don’t know why anyone would use one if they had access to a laptop or desktop computer running Excel or a statistical application.
Exercises from the textbook will be assigned for each class session. Generally, these will be odd-numbered exercises, for which answers are available in the student manual. These will not be graded. However, you are expected to work these problems before class, and I may call on students in class to provide and explain their answers to these exercises; how prepared you are will affect your class preparation and participation grade.
· Class preparation and participation 8%
· 7 problem sets (best 6 scores count) 42%
· Midterm exam 15%
· Final exam 35%
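As a rough illustration of how these weights combine, here is a minimal Python sketch (not Stata, and not an official grade calculator). It assumes that every score is expressed on a 0-100 scale and that the six counted problem sets are weighted equally, so each is worth 7% of the total; the function name `course_grade` is made up for this example.

```python
def course_grade(participation, problem_sets, midterm, final):
    """Combine component scores (each on a 0-100 scale) into an overall score.

    Assumption: the best 6 of the 7 problem sets are averaged with equal weight.
    """
    if len(problem_sets) != 7:
        raise ValueError("expected exactly 7 problem set scores")
    # Drop the lowest problem set score and average the remaining six.
    best_six = sorted(problem_sets, reverse=True)[:6]
    ps_average = sum(best_six) / 6
    # Weights from the list above: 8% + 42% + 15% + 35% = 100%.
    return (0.08 * participation
            + 0.42 * ps_average
            + 0.15 * midterm
            + 0.35 * final)


# Hypothetical scores, for illustration only.
overall = course_grade(90, [80, 95, 70, 88, 92, 85, 60], 75, 82)
print(round(overall, 2))  # 82.85
```

In this made-up example the lowest problem set score (60) is dropped before averaging, so a single bad problem set does not lower the overall score.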
You are encouraged to notice and share with the class good and bad examples of statistics use, and statistical issues that you find outside of class. Discussion is enabled on the CTools site and you are encouraged to use it.
These links were compiled in 2004 by student Jude Yew.
· Hyperstat - online site for the statistically clueless: http://davidmlane.com/hyperstat/index.html
· Nice general introduction to quantitative methods with interesting case studies and links: http://glass.ed.asu.edu/stats/
· The algebra of summation notation: http://www.math.ucdavis.edu/~kouba/CalcTwoDIRECTORY/summationdirectory/Summation.html (solutions: http://www.math.ucdavis.edu/~kouba/CalcTwoDIRECTORY/summationsoldirectory/SummationSol.html)
· Explanation of the summation sign: http://www.psychstat.smsu.edu/introbook/sbk12m.htm
· Webpage which explains the basics of summation (sigma) notation with links to computer algebra systems like Maple, Mathematica, and the TI-92, by Frank Wattenberg, Department of Mathematics, Montana State University: http://www.math.montana.edu/frankw/ccp/general/sigma/learn.htm
· Clear explanation of product & summation notations by Illinois State University: http://www.math.ilstu.edu/day/courses/old/305/contentsummationnotation.html#sums
· Wikipedia's very comprehensive explanation of the factorial notation: http://en.wikipedia.org/wiki/Factorial
Instances of plagiarism and other violations of academic integrity will be taken seriously. It is expected that students will turn in their own work (e.g. both answers to questions and code) on class assignments and exams. You are strongly encouraged to form study groups to work on the ungraded exercises assigned for each class-- working together in that situation would not count as plagiarism.
If you have any special needs, please let the instructors know early on so that your needs may be accommodated.
| # | Date | Topics/readings | Exercises to complete before class | Exercises to complete before class (Edition 9) | Assignments due |
| September | | | | | |
| 1 | 6 | 1.1 The Science of Statistics; 1.2 Types of Statistical Applications; 1.3 Fundamental Elements of Statistics; 1.4 Types of Data; 1.5 Collecting Data; 1.6 The Role of Statistics in Critical Thinking. In-class exercise: 1.25 | Chapter 1: 1-11, 13 | Chapter 1: 1-11, 13 | |
| 2 | 8 (Class meets in the DIAD) | 2.1 Describing Qualitative Data; 2.2 Graphical Methods for Describing Quantitative Data. In-class exercises: 2.55a, 2.62, 2.64, 2.70, 2.74a, 2.77, 2.78, 2.79, 2.87, 2.117c | Chapter 2: 4, 5, 15; generate stem-and-leaf, frequency tables for measurement classes and histogram for the BrainPMI dataset from exercise 37 | Chapter 2: 1, 2, 9; generate stem-and-leaf, frequency tables for measurement classes and histogram for the BrainPMI dataset from exercise 25 | Problem Set 1 (Stata) distributed |
| 3 | 13 (Class meets in the DIAD) | 2.3 Summation Notation; 2.4 Numerical Measures of Central Tendency; 2.5 Numerical Measures of Variability | 43, 55a, 51, 53, 62, 64abcd, 70, 74a, 75a, 77, 78, 79, 84 | 33, 41a, 42, 43, 47, 50abcd | |
| 4 | 15 | 2.6 Interpreting the Standard Deviation; 2.7 Numerical Measures of Relative Standing; 2.8 Methods for Detecting Outliers; 2.9 Graphing Bivariate Relationships; 2.10 Distorting the Truth with Descriptive Techniques | Chapter 2: 86, 87, 91, 106abef, 104, 117, 124, 125, 134, 139, 141, 175, 177, 184 | Chapter 2: 54, 55a, 57a, 59, 60, 61, 65, 66, 67, 71, 75, 81, 82abef, 84, 93, 96, 97, 103, 106, 109, 111, 140 | |
| 5 | 20 | 3.1 Events, Sample Spaces and Probability; 3.2 Unions and Intersections; 3.3 Complementary Events; 3.4 The Additive Rule and Mutually Exclusive Events; 3.5 Conditional Probability | Chapter 3: 9a, 10, 15, 16, 17, 29, 30, 45, 49, 63, 71, 73, 89 | Chapter 3: 1a, 5, 8, 9, 11, 21, 22, 29, 33, 41, 44, 45, 55ad | Problem Set 1 due |
| 6 | 22 | 3.6 The Multiplicative Rule and Independent Events; 3.7 Random Sampling | Chapter 3: 69, 85, 177, 96a, 97a, 99c (Use Stata for this) | Chapter 3: 59, 69, 75, 76a, 77a, 79c (Use Stata for this) | |
| 7 | 27 | 4.1 Two Types of Random Variables; 4.2 Probability Distributions for Discrete Random Variables; 4.3 Expected Values of Discrete Random Variables; 4.4 The Binomial Random Variable | Chapter 4: 3, 11, 15, 33, 43ab, 49bc, 51ab, 57, 63, 130 | Chapter 4: 3, 7, 11, 25, 33ab, 37bc, 39ab, 43, 51, 55, 109 | Problem Set 1 returned |
| 8 | 29 | 5.1 Continuous Probability Distributions; 5.2 The Uniform Distribution; 5.3 The Normal Distribution; 5.4 Descriptive Methods for Assessing Normality | Chapter 5: 3, 5abcf, 6a, 117, 20abc, 23abcf, 25ab, 24a, 29a, 30ad, 36, 47, 53, 56, 57abcd, 65 | Chapter 5: 1, 3abcf, 4a, 7, 15abc, 17abcf, 19ab, 21a, 23a, 25ad, 30, 41, 45, 46, 47ab, 53 | |
| October | | | | | |
| 9 | 4 (Paul away, Jahna lectures) | 6.1 The Concept of a Sampling Distribution; 6.2 Properties of Sampling Distributions: Unbiasedness and Minimum Variance | Chapter 6: 3, 9, 49abc, 17 | Chapter 6: 1, 7, 35abc, 11 | Problem Set 2 due; Problem Set 3 distributed |
| 10 | 6 | 6.3 Central Limit Theorem; 5.5 Approximating a Binomial Distribution with a Normal Distribution | Chapter 6: 27abc, 26, 31, 41, 48, 49d. Chapter 5: 69, 131 | Chapter 6: 15abc, 18, 21, 31, 34ab, 35d. Chapter 5: 55, 67 | |
| 11 | 11 | We're one session behind here, so just catching up | | | Problem Set 2 returned |
| 12 | 13 (Paul away, Jahna lectures) | 7.1 Identifying the Target Parameter; 7.2 Large-Sample Confidence Interval for a Population Mean; 7.3 Small-Sample Confidence Interval for a Population Mean | Chapter 7: 7, 9a, 3, 5, 19, 23, 29ab, 31bd, 37, 83, 99 | Chapter 7: 1, 3, 7, 9, 13, 19, 23ab, 25bd, 31, 69, 81 | |
| | 18 | UM Study Break | | | |
| 13 | 20 | 7.4 Large-Sample Confidence Interval for a Population Proportion; 7.5 Determining the Sample Size | Chapter 7: 42, 48, 53, 63, 69, 75 (also, for 75, what sample size is needed to estimate within .02 with 95% confidence?) | Chapter 7: 35, 40, 45, 51, 57, 61 (also, for 61, what sample size is needed to estimate within .02 with 95% confidence?) | |
| 14 | 25 | 8.1 The Elements of a Test of Hypothesis; 8.2 Large-Sample Test of Hypothesis About a Population Mean; 8.3 Observed Significance Levels: p-Values | Chapter 8: 1, 5, 7, 147, 21abg, 23, 35, 38, 39, 45, 47 | Chapter 8: 1, 5, 7, 15, 17abg, 19, 29, 30, 31, 37, 39 | Problem Set 3 due; Problem Set 4 distributed |
| | 27 | 8.4 Small-Sample Test of Hypothesis About a Population Mean; 8.5 Large-Sample Test of Hypothesis About a Population Proportion | Chapter 8: 53, 57ab, 59, 137, 73ab, 130, 151, 145 (would it be more appropriate to do a hypothesis test or a confidence interval?); 148, estimate the probability that a #1 seed will lose to a #16 seed. (Hint: consider the margin of victory data, not just won-loss record. Make whatever heroic assumptions you need, but be clear about what they are.) | Chapter 8: 45, 49ab, 51, 55, 61ab, 66, 69, 75, 123 (would it be more appropriate to do a hypothesis test or a confidence interval?) | Problem Set 3 returned |
| November | | | | | |
| 15 | 1 | MIDTERM EXAM (covers chapters 1-8; open book and open notes; calculators OK; but no computers) | | | |
| 16 | 3 | 9.1 Identifying the Target Parameter; 9.2 Comparing Two Population Means: Independent Sampling; 9.3 Comparing Two Population Means: Paired Difference Experiments; 9.4 Comparing Two Population Proportions: Independent Sampling; 9.5 Determining the Sample Size | Chapter 9: 6, 3, 15, 29, 33ab, 35, 43, 122, 51, 58, 75, 79 | Chapter 9: 1, 5, 11, 25, 27ab, 29, 35, 38, 43, 50, 63, 67 | Midterm exam returned |
| 17 | 8 | 10.1 Elements of a Designed Experiment; 10.2 The Completely Randomized Design; 10.3 Multiple Comparisons of Means | Chapter 10: 5, 11, 17, 19, 20, 21, 22, 24, 28, 34, 37, 38, 46 | Chapter 10: 5, 9, 11, 13, 14, 15, 16, 19, 22, 26, 31, 34 | Problem Set 4 due; Problem Set 5 distributed |
| 18 | 10 | 10.4 The Randomized Block Design; 10.5 Factorial Experiments | Chapter 10: 57, 65, 110, 73, 75, 83, 85, 114, 108 | Chapter 10: 41, 47, 51, 53, 55, 61, 63, 67, 84 | |
| 19 | 15 | 13.1 Categorical Data and the Multinomial Experiment; 13.2 Testing Categorical Probabilities: One-Way Table; 13.3 Testing Categorical Probabilities: Two-Way (Contingency) Table; 13.4 A Word of Caution About Chi-Square Tests | Chapter 13: 5, 2, 6, 11, 22, 23, 31, 49 | Chapter 13: 3, 5, 6, 11, 18, 19, 25 | Problem Set 5 due |
| 20 | 17 | 11.1 Probabilistic Models; 11.2 Fitting the Model: The Least Squares Approach; 11.3 Model Assumptions | Chapter 11: 5ac, 6ac, 9ac, 4, 26, 28 | Chapter 11: 1ac, 2ac, 5ac, 9, 19, 23 | |
| 21 | 22 | 11.4 An Estimator of σ^2; 11.5 Assessing the Utility of the Model: Making Inferences About the Slope β1; 11.6 The Coefficient of Correlation | Chapter 11: 32, 42, 48, 59, 119 | Chapter 11: 27, 33, 35, 45, 47 | |
| | 24 | No Class: Thanksgiving | | | |
| 22 | 29 | 11.7 The Coefficient of Determination; 11.8 Using the Model for Estimation and Prediction; 11.9 A Complete Example | Chapter 11: 65, 68bc, 107, 118, 87, 88, 92 | Chapter 11: 49, 51bc, 53, 56, 63, 64, 70 | Problem Set 6 due; Problem Set 7 distributed; Problem Set 5 returned |
| December | | | | | |
| 23 | 1 | 12.1 Multiple Regression Models; 12.2 The First-Order Model: Estimating and Interpreting the β Parameters; 12.3 Inferences About the Individual β Parameters and the Overall Utility of a Model; 12.4 Using the Model for Estimation and Prediction | Chapter 12: 7, 9, 13abc, 11, 12, 26, 143, 30 | Chapter 12: 3, 5, 7abc, 17, 18, 28, 29, 31 | Problem Set 6 returned |
| 24 | 6 | 12.5 Model Building: Interaction Models; 12.6 Model Building: Quadratic and Other Higher-Order Models | Chapter 12: 38, 41, 43a, 51, 53, 55 | Chapter 12: 38, 41, 43a, 49, 51, 53 | |
| 25 | 8 | 12.7 Model Building: Qualitative (Dummy) Variable Models; 12.8 Model Building: Models with Both Quantitative and Qualitative Variables; 12.9 Model Building: Comparing Nested Models | Chapter 12: 67, 80, 79, 89, 93, 90 | Chapter 12: 65, 77, 79, 87, 89, 90 | |
| 26 | 13 | 12.11 Residual Analysis: Checking the Regression Assumptions; 12.12 Some Pitfalls: Estimability, Multicollinearity, and Extrapolation; 14.1 Introduction: Distribution-Free Tests; 14.2 Single Population Inferences: The Sign Test; 14.4 Comparing Two Populations: The Wilcoxon Signed Rank Test for the Paired Difference Experiment | Chapter 12: 116, 117, 120, 153. Kraut et al. article (see the "Resources" section of the CTools site for the PDF) | Chapter 12: 102, 105, 107, 109 | |
| | 15 | | | | Problem Set 7 due |
| | 19 | Review session 3-5PM, 311 West Hall | | | Problem Set 7 returned |
| | 21 | Final Exam 10:30AM-12:30PM, 311 West Hall | | | |