SI 544 – Introductory Statistics & Data Analysis


Contact Information

Professor: Paul Resnick

Office hours: Mondays 4-5 PM, 3246C SI North. You can also sign up to go to lunch somewhere near West Hall most Tuesdays at noon. Sign-ups are at this wiki page.

email: presnick@umich.edu

GSI: Jahna Otterbacher

Office hours: Mondays 7-9 PM in the DIAD and Wednesdays 4:30-5:30 in 3204 SI North.

email: jahna@umich.edu

Administrative Assistant: Sharon Mahoney 764-1858, mahoneys@umich.edu

Course Objectives

This course teaches the fundamentals of statistics, that is, the ability to describe data samples and draw inferences about the populations from which they were drawn. It should also sharpen your intuition about how to read data, interpret data, and judge others' claims about data.

Specifically, at the end of this course students should be able to use computer-based statistics packages to:

Why you should take this course

This course should be useful to a wide variety of students, both as preparation for more advanced courses and as a means to professional advancement. Throughout your life, you will need to make judgments based on data, and this class is designed to help you do that. If you have ever wanted to answer questions like "Are minorities being treated fairly?" or "How well do spam filters work?", this class will help. Tools from statistics can help you rule out competing theories and judge the strength of relationships. It is not just about the numbers but about thinking clearly about what data do and do not imply. Critical thinking about data is part of good citizenship in a modern society; for example, it was a prerequisite for following some of the controversies surrounding the 2000 U.S. presidential voting process and the determination of population counts after the most recent U.S. census.

Skills from this course will also guide you in many professional tasks. Here are some of the professional tasks that information professionals perform where they can make use of statistical analysis:

·        Setting policies for sampling documents to save in archives

·        Evaluating user interface alternatives

·        Redesigning web sites based on usage history

·        Assessing demand for potential new product or service offerings

·        Estimating the cost of providing a service

·        Evaluating the outcomes of programs and services

·        Evaluating the effectiveness of products and government policies

·        Collecting, summarizing, and interpreting trend data about organizations in a sector or industry

·        Assessing an organization's compliance with internal or government-imposed policies

·        Presenting government statistical data to lay audiences

·        Conducting academic research

Texts (all available on reserve at the library)

Required:

·        Statistics (2006) by McClave & Sincich (10th ed.). Prentice Hall. (Note: For those who wish to use the 9th edition, which was used last year in this course, we have included the corresponding chapter/section and exercise numbers in the syllabus, to the extent possible.)

Recommended:

·        Student solutions manual (for odd-numbered exercises). Comes packaged with the 10th edition textbook.

 

Software

We'll try to do as little calculation by hand as possible, though you might have to dust off your multiplication tables for the first couple of weeks to get the feel of things.

We will be using Stata in this course. While many of the exercises can be done in a spreadsheet such as Excel, if you're doing anything even a little complicated, it's much better to do your analysis in a statistical package. Packages such as Stata let you write your analysis as text files that you can edit, debug, and annotate with comments. Perhaps most importantly, such files can be re-run when you get an updated dataset. (Initial statistical analysis often reveals some error in the input data, which leads you to obtain a revised dataset.)

Intercooled Stata is available to you in the DIAD (as a NAL object). If you are interested in a personal copy, $39 buys a 1-year license to Small Stata and a copy of the Getting Started manual; see http://www.stata.com/order/new/edu/gradplans/gp3-order.html. However, Small Stata may not be able to handle the size of the datasets used in some of the exercises and problem sets. Intercooled Stata is also available from the same site, at a somewhat higher price.

SPSS is another statistical package, often preferred by psychology and sociology researchers (Stata is preferred by economists). SPSS has many more menu-driven options than Stata, and it also has a command-line syntax, though it's a bit unwieldy and not well documented. It lacks built-in functions for some of the more sophisticated statistical models that Stata has, and it is not extensible through user programming as Stata is. None of the statistical methods we cover are beyond SPSS's capabilities, however. The instructors will not use any SPSS examples in the course; you are welcome to use it if you prefer it to Stata, but we will not provide support for it.

A variety of other packages for statistical analysis are also available. I have research collaborators who swear by SAS and JMP, but I've never used them. The textbook provides examples using MINITAB, which must have been (or still be) popular with some group of researchers or educators. The textbook also provides examples using a TI graphing calculator; I don't know why anyone would use one if they had access to a laptop or desktop computer running Excel or a statistical application.

Assignments & Grading

Exercises from the textbook will be assigned for each class session. Generally, these will be odd-numbered exercises, whose answers are available in the student solutions manual. These exercises will not be graded. However, you are expected to work them before class, and I may call on students in class to present and explain their answers; this will affect the class preparation and participation grade.

·        Class preparation and participation: 8%

·        7 problem sets (best 6 scores count): 42%

·        Midterm exam: 15%

·        Final exam: 35%

You are encouraged to notice and share with the class good and bad examples of statistics use, and statistical issues that you find outside of class.  Discussion is enabled on the CTools site and you are encouraged to use it.

 

Some Useful Math Refresher Links

These links were compiled in 2004 by student Jude Yew.

General introduction to statistics:

·         Hyperstat - online site for the statistically clueless
http://davidmlane.com/hyperstat/index.html

·         Nice general introduction to quantitative methods with interesting case studies and links.
http://glass.ed.asu.edu/stats/

Summation and product notation:

·         The algebra of summation notation:
http://www.math.ucdavis.edu/~kouba/CalcTwoDIRECTORY/summationdirectory/Summation.html
(solutions: http://www.math.ucdavis.edu/~kouba/CalcTwoDIRECTORY/summationsoldirectory/SummationSol.html)

·         Explanation of the summation sign:
http://www.psychstat.smsu.edu/introbook/sbk12m.htm

·         Webpage which explains the basics of summation (sigma) notation with links to Computer algebra systems like Maple, Mathematica, and the TI-92 by Frank Wattenberg, Department of Mathematics, Montana State University
http://www.math.montana.edu/frankw/ccp/general/sigma/learn.htm

·         Clear explanation of product & summation notations by Illinois State University
http://www.math.ilstu.edu/day/courses/old/305/contentsummationnotation.html#sums

Factorial notation:

·         Wikipedia's very comprehensive explanation of the factorial notation
http://en.wikipedia.org/wiki/Factorial

 

Academic Integrity and Special Needs

Instances of plagiarism and other violations of academic integrity will be taken seriously. Students are expected to turn in their own work (e.g., both answers to questions and code) on class assignments and exams. You are strongly encouraged to form study groups to work on the ungraded exercises assigned for each class; working together in that situation does not count as plagiarism.

If you have any special needs, please let the instructors know early on so that your needs may be accommodated.

 

Schedule

Columns: #, Date, Topics/readings, Exercises to complete before class (10th edition), Exercises to complete before class (9th edition), Assignments due

September

1 6

Chapter 1 slides

1.1 The Science of Statistics

1.2 Types of Statistical Applications

1.3 Fundamental Elements of Statistics

1.4 Types of Data

1.5 Collecting Data

1.6 The Role of Statistics in Critical Thinking

 

In-class exercise: 1.25

Chapter 1

1-11, 13

Chapter 1

1-11, 13

 

 

2 8

(Class meets in the DIAD)

Chapter 2 slides

2.1 Describing Qualitative Data

2.2 Graphical Methods for Describing Quantitative Data

 

In-class exercises: 2.55a, 2.62, 2.64, 2.70, 2.74a, 2.77, 2.78, 2.79, 2.87, 2.117c

Chapter 2

4, 5, 15; generate stem-and-leaf displays, frequency tables for measurement classes, and a histogram for the BrainPMI dataset from exercise 37

 

Chapter 2

1, 2, 9; generate stem-and-leaf displays, frequency tables for measurement classes, and a histogram for the BrainPMI dataset from exercise 25

 

Problem Set 1 (Stata) distributed

3 13

(Class meets in the DIAD)

2.3 Summation Notation

2.4 Numerical Measures of Central Tendency

2.5 Numerical Measures of Variability

43, 55a, 51, 53, 62, 64abcd, 70, 74a, 75a, 77, 78, 79, 84

33, 41a, 42, 43, 47, 50abcd

 

4 15

 

2.6 Interpreting the Standard Deviation

2.7 Numerical Measures of Relative Standing

2.8 Methods for Detecting Outliers

2.9 Graphing Bivariate Relationships

2.10 Distorting the Truth with Descriptive Techniques

Chapter 2

86, 87, 91, 106abef, 104, 117, 124, 125, 134, 139, 141, 175, 177, 184

Chapter 2

54, 55a, 57a, 59, 60, 61, 65, 66, 67, 71, 75, 81, 82abef, 84, 93, 96, 97, 103, 106, 109, 111, 140

 

5 20

Chapter 3 slides

3.1 Events, Sample Spaces and Probability

3.2 Unions and Intersections

3.3 Complementary Events

3.4 The Additive Rule and Mutually Exclusive Events

3.5 Conditional Probability

 

Chapter 3

9a, 10, 15, 16, 17, 29, 30, 45, 49, 63, 71, 73, 89

 

Chapter 3

1a, 5, 8, 9, 11, 21, 22, 29, 33, 41, 44, 45, 55ad

Problem Set 1 due;
Problem Set 2 distributed
 

6 22

3.6 The Multiplicative Rule and Independent Events

3.7 Random Sampling

Chapter 3

69, 85, 177, 96a, 97a, 99c (Use Stata for this)

Chapter 3

59, 69, 75, 76a, 77a, 79c (Use Stata for this)

 

7 27

Chapter 4 slides

4.1 Two Types of Random Variables

4.2 Probability Distributions for Discrete Random Variables

4.3 Expected Values of Discrete Random Variables

4.4 The Binomial Random Variable

Chapter 4

3, 11, 15, 33, 43ab, 49bc, 51ab, 57, 63, 130

Chapter 4

3, 7, 11, 25, 33ab, 37bc, 39ab, 43, 51, 55, 109

Problem Set 1 returned

8 29

Chapter 5 slides

5.1 Continuous Probability Distributions

5.2 The Uniform Distribution

5.3 The Normal Distribution

5.4 Descriptive Methods for Assessing Normality

 

Chapter 5

3, 5abcf, 6a, 117, 20abc, 23abcf, 25ab, 24a, 29a, 30ad, 36, 47, 53, 56, 57abcd, 65

Chapter 5

1, 3abcf, 4a, 7, 15abc, 17abcf, 19ab, 21a, 23a, 25ad, 30, 41

45, 46, 47ab, 53

 

October

9 4

(Paul away, Jahna lectures)

Chapter 6 slides

6.1 The Concept of a Sampling Distribution

6.2 Properties of Sampling Distributions: Unbiasedness and Minimum Variance

Chapter 6

3, 9, 49abc, 17

Chapter 6

1, 7, 35abc, 11

Problem Set 2 due;

Problem Set 3 distributed

10 6

6.3 Central Limit Theorem

5.5 Approximating a Binomial Distribution with a Normal Distribution

Chapter 6

27abc, 26, 31, 41, 48, 49d

Chapter 5

69, 131

Chapter 6

15abc, 18, 21, 31, 34ab, 35d

Chapter 5

55, 67

 

 

11 11

We're one session behind at this point, so this session is for catching up.

Problem Set 2 returned

12 13

(Paul away, Jahna lectures)

Chapter 7 slides

7.1 Identifying the Target Parameter

7.2 Large-Sample Confidence Interval for a Population Mean

7.3 Small-Sample Confidence Interval for a Population Mean

 

Chapter 7

7, 9a, 3, 5, 19, 23, 29ab, 31bd, 37, 83, 99

Chapter 7

1, 3, 7, 9, 13, 19, 23ab, 25bd, 31, 69, 81

 

 

18

UM Study Break

 

 

 

13 20

7.4 Large-Sample Confidence Interval for a Population Proportion

7.5 Determining the Sample Size

Chapter 7

42, 48, 53, 63, 69, 75 (also, for 75, what sample size is needed to estimate within .02 with 95% confidence?)

Chapter 7

35, 40, 45, 51, 57, 61 (also, for 61, what sample size is needed to estimate within .02 with 95% confidence?)

 

14 25

Chapter 8 slides

8.1 The Elements of a Test of Hypothesis

8.2 Large-Sample Test of Hypothesis About a Population Mean

8.3 Observed Significance Levels: p-Values

Chapter 8

1, 5, 7, 147, 21abg, 23, 35, 38, 39, 45, 47

Chapter 8

1, 5, 7, 15, 17abg, 19, 29, 30, 31, 37, 39

Problem Set 3 due; Problem Set 4 distributed

  27

8.4 Small-Sample Test of Hypothesis About a Population Mean

8.5 Large-Sample Test of Hypothesis About a Population Proportion

Chapter 8

53, 57ab, 59, 137, 73ab, 130, 151, 145 (would it be more appropriate to do a hypothesis test or a confidence interval?); 148, estimate the probability that a #1 seed will lose to a #16 seed. (Hint: consider the margin of victory data, not just won-loss record. Make whatever heroic assumptions you need, but be clear about what they are.)

Chapter 8

45, 49ab, 51, 55, 61ab, 66, 69, 75, 123 (would it be more appropriate to do a hypothesis test or a confidence interval?);
 p. 373, estimate the probability that a #1 seed will lose to a #16 seed. (Hint: consider the margin of victory data, not just won-loss record. Make whatever heroic assumptions you need, but be clear about what they are.)

Problem Set 3 returned

November

15 1 MIDTERM EXAM (covers chapters 1-8; open book and open notes; calculators OK, but no computers)

 

16 3

Chapter 9 slides

9.1 Identifying the Target Parameter

9.2 Comparing Two Population Means: Independent Sampling

9.3 Comparing Two Population Means: Paired Difference Experiments

9.4 Comparing Two Population Proportions: Independent Sampling

9.5 Determining the Sample Size

Chapter 9

6, 3, 15, 29, 33ab, 35, 43, 122, 51, 58, 75, 79

Chapter 9

1, 5, 11, 25, 27ab, 29, 35, 38, 43, 50, 63, 67

Midterm exam returned

17 8

Chapter 10 slides

10.1 Elements of a Designed Experiment

10.2 The Completely Randomized Design

10.3 Multiple Comparisons of Means

 

Chapter 10

5, 11, 17, 19, 20, 21, 22, 24, 28, 34, 37, 38, 46

Chapter 10

5, 9, 11, 13, 14, 15, 16, 19, 22, 26, 31, 34

Problem Set 4 due;

Problem Set 5 distributed

 

18 10

10.4 The Randomized Block Design

10.5 Factorial Experiments

Chapter 10

57, 65, 110, 73, 75, 83, 85, 114, 108

Chapter 10

41, 47, 51, 53, 55, 61, 63, 67, 84

 

19 15

Chapter 13 slides

13.1 Categorical Data and the Multinomial Experiment

13.2 Testing Categorical Probabilities: One-Way Table

13.3 Testing Categorical Probabilities: Two-Way (Contingency) Table

13.4 A Word of Caution About Chi-Square Tests

Chapter 13

5, 2, 6, 11, 22, 23, 31, 49

Chapter 13

3, 5, 6, 11, 18, 19, 25
Siskel and Ebert, bottom of p. 730

Problem Set 5 due;
Problem Set 6 distributed;
Problem Set 4 returned

20 17

Chapter 11 slides

11.1 Probabilistic Models

11.2 Fitting the Model: The Least Squares Approach

11.3 Model Assumptions

Chapter 11

5ac, 6ac, 9ac, 4, 26, 28

Chapter 11

1ac, 2ac, 5ac, 9, 19, 23

 

21 22

11.4 An Estimator of σ²

11.5 Assessing the Utility of the Model: Making Inferences About the Slope β₁

11.6 The Coefficient of Correlation

Chapter 11

32, 42, 48, 59, 119

Chapter 11

27, 33, 35, 45, 47

 

  24

 

No class: Thanksgiving

 

 

22 29

11.7 The Coefficient of Determination

11.8 Using the Model for Estimation and Prediction

11.9 A Complete Example

Chapter 11

65, 68bc, 107, 118, 87, 88, 92

 

Chapter 11

49, 51bc, 53, 56, 63, 64, 70

Problem Set 6 due;

Problem Set 7 distributed;

Problem Set 5 returned;

December

23 1

Chapter 12 slides

12.1 Multiple Regression Models

12.2 The First-Order Model: Estimating and Interpreting the β Parameters

12.3 Inferences About the Individual β Parameters and the Overall Utility of a Model

12.4 Using the Model for Estimation and Prediction

Chapter 12

7, 9, 13abc, 11, 12, 26, 143, 30

Chapter 12

3, 5, 7abc, 17, 18, 28, 29, 31

Problem Set 6 returned

24 6

12.5 Model Building: Interaction Models

12.6 Model Building: Quadratic and Other Higher-Order Models

Chapter 12

38, 41, 43a, 51, 53, 55

Chapter 12

38, 41, 43a, 49, 51, 53

 

25 8

12.7 Model Building: Qualitative (Dummy) Variable Models

12.8 Model Building: Models with Both Quantitative and Qualitative Variables

12.9 Model Building: Comparing Nested Models

Chapter 12

67, 80, 79, 89, 93, 90

Chapter 12

65, 77, 79, 87, 89, 90

 

26 13

12.11 Residual Analysis: Checking the Regression Assumptions

12.12 Some Pitfalls: Estimability, Multicollinearity, and Extrapolation

 

Chapter 14

14.1 Introduction: Distribution-Free Tests

14.2 Single Population Inferences: The Sign Test

14.4 Comparing Two Populations: The Wilcoxon Signed Rank Test for the Paired Difference Experiment

Chapter 12

116, 117, 120, 153

 

Kraut et al. article (see the "Resources" section of the CTools site for the PDF)

Chapter 12

102, 105, 107, 109

 

  15    

 

Problem Set 7 due

  19

 

Review session 3-5PM, 311 West Hall

 

Problem Set 7 returned

  21

 

Final Exam 10:30AM-12:30PM, 311 West Hall