SI 644 – Introductory Statistics & Data Analysis

 

 

 

Contact Information

Professor: Paul Resnick

Office hours: Thursdays 1-3PM, DIAD Lab and/or 314 West Hall. Usually, I will wander through the DIAD Lab at 1PM or shortly thereafter. Once I’ve talked to anyone there who wants to talk to me, I’ll retreat to 314 West Hall, where people can drop in. If you’d like to schedule a time, send email.

email: presnick@umich.edu

 

Administrative Assistant: Sharon Mahoney 764-1858, mahoneys@umich.edu

 

Course Objectives                                                                                   

This course teaches the fundamentals of statistics: the ability to describe data samples and to draw inferences about the populations from which they were drawn. It should also sharpen your intuition about how to read data, interpret data, and judge others’ claims about data.

 

Specifically, at the end of this course students should be able to:

 

Why you should take this course

This course should be useful to a wide variety of students, both as preparation for more advanced courses and as a means to professional advancement. Throughout your life, you will need to make judgments based on data. This class is designed to help you do that. If you have ever wanted to answer questions like “Are minorities being treated fairly?” or “How well do spam filters work?”, this class will help. Tools from statistics can help you rule out competing theories and judge the strength of relationships. Statistics is not just about “the numbers” but about thinking clearly about what data do and do not imply. Critical thinking about data is part of good citizenship in a modern society: it was a prerequisite for following some of the controversies after the last U.S. presidential election and the last U.S. census. (Hopefully, we won’t have quite as much material from this year’s voting process, but we’ll be ready if there is!)

Skills from this course will also guide you in many professional tasks. Here are some of the professional tasks that information professionals perform where they can make use of statistical analysis:

·        Appraising and selecting documents for archives

·        Evaluating user interface alternatives

·        Redesigning web sites based on usage history

·        Assessing demand for potential new product or service offerings

·        Estimating the cost of providing a service

·        Evaluating the outcomes of programs and services

·        Evaluating the effectiveness of products and government policies

·        Collecting, summarizing, and interpreting trend data about organizations in a sector or industry

·        Assessing an organization’s compliance with internal or government-imposed policies

·        Presenting government statistical data to lay audiences

·        Conducting academic research

Texts (all available on reserve at the library)

Required:

·        Statistics (2003) by McClave & Sincich (9th ed). Prentice Hall

Recommended:

·        Student solutions manual (for odd-numbered exercises)

·        Guide to exercises using Excel

Supplemental:

·        Statistics – Concepts & Controversies (2000) by David Moore (5th ed). W.H. Freeman & Co.

Software

We’ll try to do as little calculation as possible by hand, though you might have to dust off your multiplication tables for the first couple weeks, to get the feel of things.

Excel is ubiquitous and it or some equivalent will probably be available throughout the rest of your career. It’s available in the DIAD Lab.  Everything we cover in this course can be done with Excel.

That said, if you’re doing anything even a little complicated, it’s a lot better to do your analysis using a statistical package that lets you write text files that you can edit, debug, and insert comments in. Perhaps most importantly, such text files can be re-run when you get an updated data set. (Often, through initial statistical analysis, you discover some error in the input data, which leads you to get a revised data set.)
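To make the point about re-runnable analysis files concrete, here is a minimal sketch in Python rather than in one of the packages discussed in this section. The file name, column name, and data values are hypothetical, and the function `summarize` is only an illustration, not part of the course materials.

```python
import csv
import statistics
import tempfile
from pathlib import Path


def summarize(csv_path, column):
    """Read one numeric column from a CSV file and return basic descriptive statistics."""
    with open(csv_path, newline="") as f:
        # Skip rows where the value is blank (a simple stand-in for data cleaning).
        values = [float(row[column]) for row in csv.DictReader(f) if row[column].strip()]
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation (n - 1 denominator)
    }


# Hypothetical data file, written here only so that the sketch runs on its own.
data_file = Path(tempfile.mkdtemp()) / "scores.csv"
data_file.write_text("id,score\n1,72\n2,85\n3,91\n4,\n5,68\n")

# Re-running this same line after the file is corrected repeats the whole analysis.
print(summarize(data_file, "score"))
```

Because every step lives in the file, a corrected data set only requires running the file again, and the comments record why each step was taken.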

Personally, I use a program called Stata, which is excellent for anyone who has any programming experience. For $39, you can get a 1-year license to “Small Stata” and a copy of the “Getting Started” manual. See http://www.stata.com/order/new/edu/gradplans/gp3-order.html.

SPSS is another statistical package, often preferred by psychology and sociology researchers (Stata is preferred by economists). SPSS may be available in the DIAD Lab. SPSS has a lot more menu-driven options than Stata, and also has a command syntax, though it’s a bit unwieldy and not well documented. It does not have built-in functions for some of the more sophisticated statistical models that Stata does, and it is not extensible with user programming as Stata is. We will not be covering any statistical methods that are beyond the limits of SPSS’s capabilities.

A variety of other packages for statistical analysis are also available. I have research collaborators who swear by SAS and JMP, but I’ve never used them. The textbook provides examples using MINITAB, which must have been (or may still be) popular with some group of researchers or educators. The textbook also provides examples using a TI graphing calculator. I don’t know why anyone would use one if they had access to a laptop or desktop computer running Excel or a statistical application.

Assignments & Grading

Exercises from the textbook will be assigned for each class session. Generally, these will be odd-numbered exercises, for which answers are available in the student solutions manual. These will not be graded. However, you are expected to work these problems before class, and I may call on students in class to provide and explain their answers to these exercises; how you respond will affect your class preparation and participation grade.

·        Class preparation and participation        10%

·        3 problem sets                                      30%

·        Midterm exam                                      20%

·        Final exam                                             40%
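As a rough illustration of how these weights combine, here is a minimal Python sketch. The weights come from the list above; the component scores, the 0-100 scale, and the function name `course_grade` are assumptions made for the example, not part of the course policy.

```python
# Weights taken from the grading breakdown above (they sum to 100%).
WEIGHTS = {
    "participation": 0.10,
    "problem_sets": 0.30,
    "midterm": 0.20,
    "final": 0.40,
}


def course_grade(scores):
    """Return the weighted average of component scores (each assumed to be on a 0-100 scale)."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores, for illustration only.
example = {"participation": 90, "problem_sets": 85, "midterm": 80, "final": 75}
print(round(course_grade(example), 1))
```

Since the final exam carries the largest weight, a change of a few points there moves the overall result more than the same change on any other component.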

You are encouraged to notice and share with the class good and bad examples and statistical issues from the popular press.  Discussion is enabled on the CourseTools site and you are encouraged to use it.

Schedule

 

#

Date

Topics/readings

Exercises to complete before class

Assignments due

 

September

 

 

 

 

7

Chapter 1 slides

1.1 The Science of Statistics

1.2 Types of Statistical Applications

1.3 Fundamental Elements of Statistics

1.4 Types of Data

1.5 Collecting Data

1.6 The Role of Statistics in Critical Thinking

Chapter 1

1-11, 13

 

 

9

Chapter 2 slides

2.1 Describing Qualitative Data

2.2 Graphical Methods for Describing Quantitative Data

2.3 Summation Notation

2.4 Numerical Measures of Central Tendency

Chapter 2

1, 2, 9, 19, 33, 41a, 43, 47, 50, 53

Generate stem-and-leaf, frequency tables for measurement classes and histogram for the BrainPMI dataset from exercise 25

 

 

14

2.5 Numerical Measures of Variability

2.6 Interpreting the Standard Deviation

2.7 Numerical Measures of Relative Standing

2.8 Methods for Detecting Outliers

2.9 Graphic Bivariate Relationships

2.10 Distorting the Truth with Descriptive Techniques

Chapter 2

54, 55a, 57a, 59, 60, 61, 65, 66, 67, 71, 75, 81, 82 abef, 84, 93, 96, 97, 103, 106, 109, 111, 140

 

 

16

 

NO CLASS—Jewish Religious Holiday

 

 

21

Chapter 3 slides

3.1 Events, Sample Spaces and Probability

3.2 Unions and Intersections

3.3 Complementary Events

3.4 The Additive Rule and Mutually Exclusive Events

3.5 Conditional Probability

Chapter 3

1a, 5, 8, 9, 11, 21, 22, 29, 33, 41, 44, 45, 55ad

 

 

23

3.6 The Multiplicative Rule and Independent Events

3.7 Random Sampling

3.8 Some Counting Rules

Chapter 3

59, 69, 75, 76a, 77a, 79c (use Excel or Stata for this), 85a, 87b, 99, Game Show Strategy (p. 165)

 

 

28

Chapter 4 slides

4.1 Two Types of Random Variables

4.2 Probability Distributions for Discrete Random Variables

4.3 Expected Values of Discrete Random Variables

4.4 The Binomial Random Variable

Chapter 4

3, 7, 11, 25, 33ab, 37bc, 39ab, 43, 51, 55, 109

 

 

30

Chapter 5 slides

5.1 Continuous Probability Distributions

5.2 The Uniform Distribution

5.3 The Normal Distribution

5.4 Descriptive Methods for Assessing Normality

 

Chapter 5

1, 3abcf, 4a, 7, 15abc, 17abcf, 19ab, 21a, 23a, 25ad, 30, 41

45, 46, 47ab, 53

 

7

October

 

 

 

 

5

Chapter 6 slides

6.1 What is a Sampling Distribution?

6.2 Properties of Sampling Distributions: Unbiasedness and Minimum Variance

Chapter 6

1, 7, 35abc, 11

 

 

7

6.3 The Central Limit Theorem

5.5 Approximating a Binomial Distribution with a Normal Distribution

Chapter 6

15abc, 18, 21, 31, 34ab, 35d

Chapter 5

55, 67

 

 

12

Chapter 7 slides

7.1 Large-Sample Confidence Interval for a Population Mean

7.2 Small-Sample Confidence Interval for a Population Mean

 

Chapter 7

1, 3, 7, 9, 13, 19, 23ab, 25bd, 31, 69, 81

Problem Set 1 due

 

14

7.3 Large-Sample Confidence Interval for a Population Proportion

7.4 Determining the Sample Size

Chapter 7

35, 40, 45, 51, 57, 61 (also, for 61, what sample size is needed to estimate within .02 with 95% confidence?)

Problem Set 1 returned

 

19

 

 

 

 

21

Chapter 8 slides

8.1 The Elements of a Test of Hypothesis

8.2 Large-Sample Test of Hypothesis About a Population Mean

8.3 Observed Significance Levels: p-Values

Chapter 8

1, 5, 7, 15, 17abg, 19, 29, 30, 31, 37, 39

 

 

26

8.4 Small-Sample Test of Hypothesis About a Population Mean

8.5 Large-Sample Test of Hypothesis About a Population Proportion

Chapter 8

45, 49ab, 51, 55, 61ab, 66, 69, 75, 123 (would it be more appropriate to do a hypothesis test or a confidence interval?);
 p. 373, estimate the probability that a #1 seed will lose to a #16 seed. (Hint: consider the margin of victory data, not just won-loss record. Make whatever heroic assumptions you need, but be clear about what they are.)

 

 

27

 

 

MIDTERM EXAM, 1-3PM

 

28

Chapter 9 slides

9.1 Comparing Two Population Means: Independent Sampling

9.2 Comparing Two Population Means: Paired Difference Experiments

9.3 Comparing Two Population Proportions: Independent Sampling

9.4 Determining the Sample Size

Chapter 9

1, 5, 11, 25, 27ab, 29, 35, 38, 43, 50, 63, 67

 

8

November

 

 

 

 

2

Chapter 10 slides

10.1 Elements of a Designed Experiment

10.2 The Completely Randomized Design

10.3 Multiple Comparisons of Means

 

Chapter 10

5, 9, 11, 13, 14, 15, 16, 19, 22, 26, 31, 34

 

 

4

Supplemental slides

10.4 The Randomized Block Design

10.5 Factorial Experiments

Chapter 10

41, 47, 51, 53, 55, 61, 63, 67, 84

Problem Set 2 distributed

9

Guest Lecture: Simulations and Applications

Yan Chen

   
 

11

Chapter 13 slides

13.1 Categorical Data and the Multinomial Experiment

13.2 Testing Categorical Probabilities: One-Way Table

13.3 Testing Categorical Probabilities: Two-Way (Contingency) Table

13.4 A Word of Caution About Chi-Square Tests

Chapter 13

3, 5, 6, 11, 18, 19, 25
Siskel and Ebert (bottom of p. 730)

 

 

16

Chapter 11 Slides

11.1 Probabilistic Models

11.2 Fitting the Model: The Least Squares Approach

11.3 Model Assumptions

Chapter 11

1ac, 2ac, 5ac, 9, 19, 23

 

 

18

11.4 An Estimator of σ²

11.5 Assessing the Utility of the Model: Making Inferences About the Slope β₁

11.6 The Coefficient of Correlation

Chapter 11

27, 33, 35, 45, 47

Problem Set 2 due

 

23

11.7 The Coefficient of Determination

11.8 Using the Model for Estimation and Prediction

11.9 A Complete Example

49, 51bc, 53, 56, 63, 64, 70

 

 

25

 

No Class-- Thanksgiving

 

 

30

Chapter 12 slides

12.1 Multiple Regression Models

12.2 The First-Order Model: Estimating and Interpreting the β Parameters

12.3 Model Assumptions

12.4 Inferences About the Individual β Parameters

12.5 Checking the Overall Utility of a Model

12.6 Using the Model for Estimation and Prediction

Chapter 12

3, 5, 7abc, 17, 18, 28, 29, 31

Problem Set 2 returned

4

December

 

 

 

 

2

12.7 Model Building: Interaction Models

12.8 Model Building: Quadratic and Other Higher-Order Models

38, 41, 43a, 49, 51, 53

 

 

7

12.9 Model Building: Qualitative (Dummy) Variable Models

12.10 Model Building: Models with Both Quantitative and Qualitative Variables

12.11 Model Building: Comparing Nested Models

65, 77, 79, 87, 89, 90

 

 

9

12.13 Residual Analysis: Checking the Regression Assumptions

12.14 Some Pitfalls: Estimability, Multicollinearity, and Extrapolation

Statistics in Action: "Wringing the Bell Curve" (p. 691)

102, 105, 107, 109

 

 

14

Chapter 14

14.1 Introduction: Distribution-Free Tests

14.2 Single Population Inferences: The Sign Test

14.4 Comparing Two Populations: The Wilcoxon Signed Rank Test for the Paired Difference Experiment

 

Turn in Problem Set 3 if you'd like it to be graded by Thursday

 

16

 

Review session, regular class time

Problem Set 3 due; solution set distributed

17

Alternate Final Exam, 12:30-2:30, 409 West Hall

 

21

 

Final Exam 4PM-6PM, 409 West Hall