Gail's

SPSS Lab Workshop

 

Winter 2002

Research Methods

 

 

 

 

 

 

Gail Johnson

The Evergreen State College

Masters of Public Administration Program

867-6739

johnsong@evergreen.edu

 

 

 

 

Table of Contents

Purpose and Objectives

Basic Concepts

SPSS Overview

The Data Sets Used in Course

Tasks

Opening a Data Set

Creating a Data Set

Entering Data

Saving and Closing

Analyzing Data

Frequencies

Central Tendency: Mode, Median, Mean

Measures of Dispersion and Standard Deviation

Cross Tabs (contingency tables)

Comparing Means

Confidence Intervals

Statistical Significance

1. Chi-Square: Tests for Nominal and Ordinal Data

2. T-Test: Testing for Statistical Significance for Interval and Ratio Data

3. One-way ANOVA (Analysis of Variance)

Relationship: Measures of Association

Simple Regression

Multiple Regression

Managing Output

Moving SPSS Tables to Documents

Graphics

 

SPSS Workshop Guide

Purpose and Objectives:

The purpose of this workshop is to provide hands-on experience in using SPSS.

Note: you may find it helpful to try the tutorial on your own at a later time.

These exercises are designed to familiarize you with the following:

# Opening a Data Set

# Adding variables to a data set

# Saving a Data Set

# Analyzing a Data Set

Working with Computers: General Advice

Approach the computer with a calm mind.

Basic Concepts:

Data Set: A specific set of data in a file format that resembles a spreadsheet with rows and columns. The rows contain the information for each person or case; each row is called a "record." The columns contain the specific information for each person: age, gender, income, etc. Every record will have a unique identifier, usually called an ID.

Variables: These are the specific items which are collected and entered into the columns of a data set. For example, in the typical survey data set, age, gender, race, income, etc. would be considered variables. Each variable has a unique name (less than 9 characters), e.g., V8.

Variable Labels: Variable labels are closer to the English language, so the printed output makes sense to you; e.g., V8 is labeled PoliticalView.

Values: Each variable is comprised of values. If the variable is gender, then the values are male and female. In a database, however, values are typically coded to take up less room. So gender might be coded as M or F, or gender could be coded as 0 or 1.

Value Labels: These labels show up on your output. So, in our gender example, instead of having 0s and 1s on the output, the computer will print out the value labels of Male and Female. You can create and change value labels whenever you want.
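To make the idea of values and value labels concrete, here is a small sketch in Python (not SPSS). The coding scheme (0 = Male, 1 = Female) follows the gender example above; the names GENDER_LABELS and label_values are made up for this illustration.

```python
# Illustrative coding scheme based on the gender example in the text.
GENDER_LABELS = {0: "Male", 1: "Female"}


def label_values(codes, labels):
    """Replace each stored code with its value label; unknown codes are left as they are."""
    return [labels.get(code, code) for code in codes]


coded_column = [0, 1, 1, 0]
print(label_values(coded_column, GENDER_LABELS))
```

The data set keeps storing the short codes; only the printed output uses the labels, which is why you can change the labels whenever you want without touching the data.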

SPSS Overview:

SPSS looks like a spreadsheet with rows and columns. Every row represents an individual record. Like other Windows applications, SPSS is menu driven. Typically, the menu is at the top of the screen.

Menu Options:

File: To open, save, or close a data set.

Edit: To cut, copy, paste, find data or text. Options: you can make changes to how things appear. For example, you can change how the variables appear on the lists in the analysis boxes: names or labels, alphabetically or in order of the file.

View: Allows you to change how you view the screen, i.e., make it larger or smaller.

Data: To insert cases or variables.

Transform: Main one we will use. To recode variables; this is useful when you want to change interval level data, such as age, into categories (18-25, 26-35 etc.).

Analyze: This is where all the fun takes place. This menu contains all the tools you will need to crunch numbers and analyze data.

Graphs: This menu will allow you to create a variety of charts, line, bar, and pie, from your data.

Utilities: This allows you to view variable labels, variable definitions, and data set information.

Help: Help.
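The recoding described under Transform (turning an exact age into a category) can be sketched in Python as follows. This is only an illustration of the idea, not SPSS syntax. The first two bands (18-25, 26-35) come from the text; the remaining bands and the name recode_age are assumptions added for the example.

```python
# Age bands: the first two come from the text; the last two are assumed for illustration.
AGE_BANDS = [
    (18, 25, "18-25"),
    (26, 35, "26-35"),
    (36, 45, "36-45"),
    (46, 120, "46 or over"),
]


def recode_age(age):
    """Return the label of the band that contains `age`, or None if no band matches."""
    for low, high, label in AGE_BANDS:
        if low <= age <= high:
            return label
    return None


ages = [19, 25, 26, 40, 63]
print([recode_age(a) for a in ages])
```

Ages that fall outside every band come back as None, which plays the role of a missing value.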

You will also find that right clicking the mouse will give you options. You can cut and paste to make creating data sets less labor intensive.

Determining How Variables are Coded:

Output: Whenever you analyze some data, SPSS will create an output file. Each set can be copied, cut, and pasted into other programs such as Word or Word Perfect much like you would text or clip art.

You can also save the output to a disk or to the hard drive.

You can print out the output, as you would any document.

Warning: Each time you analyze data, the new analysis is added to the output file. When you tell the computer to print, all the output will print out. What this means is that if you have stuff you do not want, delete it before printing to save trees.

Data Sets

You have been given a disk with data sets on it. Put your name and phone number on the disk. The following data sets are to be used for your assignments:

USPA Fall 2000: All the students in USPA classes were asked to complete a student survey in Fall 2000. A copy of the survey is on your Lab disk. Some variables have been deleted so the data set has fewer than 50 variables. It is similar to the 1997 survey in Appendix I in the Practical Guide. You will use this data set for your data analysis paper.

File: USPA2000.sav

GSS Surveys: These are short versions of the General Social Survey, which is a random sample of U.S. adults. These data sets include some demographic information as well as respondents' attitudes on a variety of political issues, and music preferences. These sets contain mostly categorical and ordinal data, but there are a few variables with interval level data.

Files: gss91pol.sav and gss93.sav

Welfare Data: This data set contains mostly interval level data. Information from all 50 states includes poverty data, AFDC benefits for a family of three, teen pregnancy, etc.

File: welfare.sav

 

The Workshop Begins:

1. Opening SPSS

Steps:

2. Opening an Existing Data Set

Steps:

3. Creating a Data Set

Now you can practice creating a data set in the "workshop.sav" file. Someone started to set up this data set but did not finish. The survey questions are on page 6. You will finish creating this data set. Your first task is to create the remaining variables (with variable names, variable labels and value labels).

Steps:

Now that you have completed creating a variable label and value labels for one variable, move on to the next questions and finish creating the variables. Instead of just using var1, var2, etc., try to give them names that will help you during data analysis time. Age, race, program, etc. might be useful. If you use an English name as your variable name, you don't need to label it.

Note: number of credit hours is a real number. You do not label the values for real numbers: they stand as they are.

Also note: You might want to add missing values to the Teach1-Teach9 variables. You can either type in 9s and 6s as the discrete values for each one or you can:

4. Entering Data

Once you have set up your data set, you are ready to enter the data. You will find 4 surveys on pages 7-10. Please enter the data into the data set.

Go back to the bottom left of your screen and click on Data View. You can use the "tab" key or the arrow keys to move around the cells in the database. Hit the "enter" key or tab to the next column once you have typed in the appropriate value.

It is often easiest to work in pairs for this, with one person calling out the data and the other person typing it. It is easy to make errors in typing (trust me, I know). Go back and verify all your entries.

5. Saving and Closing a Data File

Steps:

Question: 3. Effectiveness of Instructional Approaches:

 

Considering the USPA courses you have taken to date, how effective or ineffective are each of the following instructional approaches in helping you learn the material?

Columns, left to right: Very Effective (1); Generally Effective (2); As Effective as Ineffective or Neither (3); Generally Ineffective (4); Very Ineffective (5).

Instructional Approaches               1   2   3   4   5
a. Lectures                          [ ] [ ] [ ] [ ] [ ]
b. Guest speakers                    [ ] [ ] [ ] [ ] [ ]
c. Case studies                      [ ] [ ] [ ] [ ] [ ]
d. Videos                            [ ] [ ] [ ] [ ] [ ]
e. Class discussions                 [ ] [ ] [ ] [ ] [ ]
f. Individual student presentations  [ ] [ ] [ ] [ ] [ ]
g. Group student presentations       [ ] [ ] [ ] [ ] [ ]
h. Group student projects            [ ] [ ] [ ] [ ] [ ]
i. Exams                             [ ] [ ] [ ] [ ] [ ]
j. Research papers                   [ ] [ ] [ ] [ ] [ ]

Question: 11. What is your present age group?

_____21 to 25 _____26 to 35 _____36 to 45 _____ 46 or over.

Question: 12. What is your gender?

_____ Male _____ Female.

Question: 16. In which program are you currently enrolled?

____MPA _____MUS ____Ph.D., ____Other Program(MS/MA) ____Not enrolled.

Question: 17. Including this semester, approximately how many credit hours have you completed?

________ credit hours.

ID: 01

Question: 3. Effectiveness of Instructional Approaches:

 

Considering the USPA courses you have taken to date, how effective or ineffective are each of the following instructional approaches in helping you learn the material?

Columns, left to right: Very Effective; Generally Effective; As Effective as Ineffective or Neither; Generally Ineffective; Very Ineffective.

Instructional Approaches
a. Lectures                          [ ] [ ] [ ] [x] [ ]
b. Guest speakers                    [ ] [x] [ ] [ ] [ ]
c. Case studies                      [x] [ ] [ ] [ ] [ ]
d. Videos                            [ ] [x] [ ] [ ] [ ]
e. Class discussions                 [x] [ ] [ ] [ ] [ ]
f. Individual student presentations  [ ] [ ] [x] [ ] [ ]
g. Group student presentations       [ ] [ ] [x] [ ] [ ]
h. Group student projects            [x] [ ] [ ] [ ] [ ]
i. Exams                             [ ] [ ] [ ] [x] [ ]
j. Research papers                   [x] [ ] [ ] [ ] [ ]

Question: 11. What is your present age group?

_____21 to 25 _____26 to 35 _____36 to 45 __x___ 46 or over.

Question: 12. What is your gender?

_____ Male ___x__ Female.

Question: 16. In which program are you currently enrolled?

____MPA _____MUS __x__Ph.D., ____Other Program(MS/MA) ____Not enrolled.

Question: 17. Including this semester, approximately how many credit hours have you completed?

_____36_____ credit hours.

 

ID: 02

Question: 3. Effectiveness of Instructional Approaches:

 

Considering the USPA courses you have taken to date, how effective or ineffective are each of the following instructional approaches in helping you learn the material?

Columns, left to right: Very Effective; Generally Effective; As Effective as Ineffective or Neither; Generally Ineffective; Very Ineffective.

Instructional Approaches
a. Lectures                          [x] [ ] [ ] [ ] [ ]
b. Guest speakers                    [ ] [x] [ ] [ ] [ ]
c. Case studies                      [ ] [ ] [x] [ ] [ ]
d. Videos                            [ ] [x] [ ] [ ] [ ]
e. Class discussions                 [x] [ ] [ ] [ ] [ ]
f. Individual student presentations  [ ] [ ] [ ] [x] [ ]
g. Group student presentations       [ ] [ ] [ ] [ ] [x]
h. Group student projects            [ ] [ ] [x] [ ] [ ]
i. Exams                             [ ] [x] [ ] [ ] [ ]
j. Research papers                   [ ] [x] [ ] [ ] [ ]

Question: 11. What is your present age group?

__x___21 to 25 _____26 to 35 _____36 to 45 _____ 46 or over.

Question: 12. What is your gender?

___x__ Male _____ Female.

Question: 16. In which program are you currently enrolled?

__x__MPA _____MUS ____Ph.D., ____Other Program(MS/MA) ____Not enrolled.

Question: 17. Including this semester, approximately how many credit hours have you completed?

___9_______ credit hours.

ID: 03

Question: 3. Effectiveness of Instructional Approaches:

 

Considering the USPA courses you have taken to date, how effective or ineffective are each of the following instructional approaches in helping you learn the material?

Columns, left to right: Very Effective; Generally Effective; As Effective as Ineffective or Neither; Generally Ineffective; Very Ineffective.

Instructional Approaches
a. Lectures                          [x] [ ] [ ] [ ] [ ]
b. Guest speakers                    [ ] [ ] [ ] [ ] [ ]
c. Case studies                      [x] [ ] [ ] [ ] [ ]
d. Videos                            [ ] [ ] [x] [ ] [ ]
e. Class discussions                 [ ] [x] [ ] [ ] [ ]
f. Individual student presentations  [ ] [ ] [ ] [ ] [x]
g. Group student presentations       [ ] [ ] [ ] [x] [ ]
h. Group student projects            [ ] [ ] [x] [ ] [ ]
i. Exams                             [ ] [ ] [ ] [x] [ ]
j. Research papers                   [ ] [x] [ ] [ ] [ ]

Question: 11. What is your present age group?

_____21 to 25 _____26 to 35 __x___36 to 45 _____ 46 or over.

Question: 12. What is your gender?

_____ Male ___x__ Female.

Question: 16. In which program are you currently enrolled?

___x_MPA _____MUS ____Ph.D., ____Other Program(MS/MA) ____Not enrolled.

Question: 17. Including this semester, approximately how many credit hours have you completed?

_____9_____ credit hours.

 

 

ID: 04

Question: 3. Effectiveness of Instructional Approaches:

 

Considering the USPA courses you have taken to date, how effective or ineffective are each of the following instructional approaches in helping you learn the material?

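Columns, left to right: Very Effective; Generally Effective; As Effective as Ineffective or Neither; Generally Ineffective; Very Ineffective.

Instructional Approaches
a. Lectures                          [x] [ ] [ ] [ ] [ ]
b. Guest speakers                    [ ] [x] [ ] [ ] [ ]
c. Case studies                      [ ] [ ] [ ] [ ] [ ]
d. Videos                            [ ] [ ] [ ] [x] [ ]
e. Class discussions                 [ ] [ ] [ ] [ ] [x]
f. Individual student presentations  [ ] [ ] [ ] [ ] [x]
g. Group student presentations       [ ] [ ] [ ] [x] [ ]
h. Group student projects            [ ] [x] [ ] [ ] [ ]
i. Exams                             [x] [ ] [ ] [ ] [ ]
j. Research papers                   [x] [ ] [ ] [ ] [ ]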

Question: 11. What is your present age group?

____x_21 to 25 _____26 to 35 _____36 to 45 _____ 46 or over.

Question: 12. What is your gender?

____x_ Male _____ Female.

Question: 16. In which program are you currently enrolled?

____MPA ___x__MUS ____Ph.D., ____Other Program(MS/MA) ____Not enrolled.

Question: 17. Including this semester, approximately how many credit hours have you completed?

_____120___ credit hours.

 

Analyzing Data: Overview

Nominal: This is data which has a name, e.g., a person's job, their sex, the city they live in. Nominal data should not be averaged.

Ordinal: This is data which has an order (low to high). Survey data is frequently ordinal data: how strongly do you agree or disagree? Gail's advice: ordinal data should not be averaged.

Interval and Ratio: SPSS calls these Scale. These measures are real numbers, such as: age, weight, dollars. Interval and ratio data can be analyzed using the same statistical procedures.

Descriptive Analysis

A. Using SPSS to get Frequencies and Percent Distributions:

SPSS Exercise 1: What percent of the respondents from the Fall 1997 survey were male or female?

Steps:

 

SPSS Exercise 2: What is the frequency and percent distribution for the number of credits reported in the Fall 1997 survey?

Steps:

 

Output # 2:

CREDITS Credit hours completed

        Value   Frequency   Percent   Valid Percent   Cumulative Percent
Valid   0       32          20.5      20.5            20.5
        3       16          10.3      10.3            30.8
        6       16          10.3      10.3            41.0
        7       1           .6        .6              41.7
        9       15          9.6       9.6             51.3
        12      12          7.7       7.7             59.0
        13      1           .6        .6              59.6
        15      11          7.1       7.1             66.7
        18      9           5.8       5.8             72.4
        21      6           3.8       3.8             76.3
        22      2           1.3       1.3             77.6
        24      1           .6        .6              78.2
        25      1           .6        .6              78.8
        26      2           1.3       1.3             80.1
        27      3           1.9       1.9             82.1
        30      7           4.5       4.5             86.5
        33      4           2.6       2.6             89.1
        34      1           .6        .6              89.7
        36      5           3.2       3.2             92.9
        39      2           1.3       1.3             94.2
        40      1           .6        .6              94.9
        43      1           .6        .6              95.5
        45      2           1.3       1.3             96.8
        49      2           1.3       1.3             98.1
        60      1           .6        .6              98.7
        153     1           .6        .6              99.4
        190     1           .6        .6              100.0
        Total   156         100.0     100.0

 

Note: Output # 2 will be long because it tells you how many people (and what percent) reported each value of credit hours.

How to Interpret the Output:

The second column contains the values for the variable that were reported. In this table, it is the exact number that they reported. You might note that there are no missing values in the table. The next column is the frequency column, which shows how frequently each number of credit hours was reported. For example: 32 people reported they had not completed any credit hours and 12 reported they had completed 12 credit hours. The next is the percent column. This shows the percent distribution of all the people who were included in the survey. 21% reported 0 credit hours and 8% reported 12 credit hours.

The next column shows the valid percent, that is, the percent of the people who gave an answer. In the previous example, there were missing values. In this example, everyone answered the question, so the percent and valid percent columns are the same. The last column, the cumulative percent column, adds the percents as you go down the column. 21% reported 0 credits. 10% reported 3 credit hours, which gets added to the 21%, for a total of 31%. If we want to see where the majority are, we find 51% reported having 9 credit hours or less. A majority is anything over 50%.
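If you want to see the arithmetic behind the Percent and Cumulative Percent columns, here is a quick sketch in Python (not SPSS). The frequencies are copied from Output # 2; the function name frequency_table is made up for this illustration.

```python
# Frequencies copied from Output # 2: credit hours -> number of respondents.
credit_counts = {
    0: 32, 3: 16, 6: 16, 7: 1, 9: 15, 12: 12, 13: 1, 15: 11, 18: 9,
    21: 6, 22: 2, 24: 1, 25: 1, 26: 2, 27: 3, 30: 7, 33: 4, 34: 1,
    36: 5, 39: 2, 40: 1, 43: 1, 45: 2, 49: 2, 60: 1, 153: 1, 190: 1,
}


def frequency_table(counts):
    """Return (value, frequency, percent, cumulative percent) rows, sorted by value."""
    total = sum(counts.values())
    rows = []
    running = 0
    for value in sorted(counts):
        running += counts[value]
        rows.append((value, counts[value],
                     100 * counts[value] / total,
                     100 * running / total))
    return rows


for value, freq, pct, cum in frequency_table(credit_counts):
    print(f"{value:>5} {freq:>5} {pct:6.1f} {cum:6.1f}")
```

Because no one skipped this question, the valid percent would be identical to the percent column here.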

Extra Exercise:

1. Are students more likely to prefer lecture classes (teach1) as compared to class discussions (teach5)?

2. Are students more likely to prefer exams (teach9) as compared to research papers (teach10)?

3. What percent of students would recommend the program (recommnd)?

B: Using SPSS to Get Means, Medians and Modes

Exercise 3: How would you best describe central tendency of the number of credit hours of the respondents?

Steps:

Output 3: Credit hours: Statistics

CREDITS Credit hours completed

N   Valid     156
    Missing   0
Mean          15.58
Median        9.00
Mode          0
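The three measures in Output 3 can be checked by hand. Here is a short Python sketch (not SPSS) that rebuilds the 156 answers from the frequencies in Output # 2 and uses Python's standard statistics module. The variable names are made up for this illustration.

```python
import statistics

# Frequencies copied from Output # 2: credit hours -> number of respondents.
credit_counts = {
    0: 32, 3: 16, 6: 16, 7: 1, 9: 15, 12: 12, 13: 1, 15: 11, 18: 9,
    21: 6, 22: 2, 24: 1, 25: 1, 26: 2, 27: 3, 30: 7, 33: 4, 34: 1,
    36: 5, 39: 2, 40: 1, 43: 1, 45: 2, 49: 2, 60: 1, 153: 1, 190: 1,
}

# Expand the table back into one entry per respondent.
credits = [value for value, n in credit_counts.items() for _ in range(n)]

mean_credits = statistics.mean(credits)
median_credits = statistics.median(credits)
mode_credits = statistics.mode(credits)

print(f"mean={mean_credits:.2f} median={median_credits} mode={mode_credits}")
```

Notice how far apart the mean and the median are: a couple of very large answers (153 and 190) pull the mean up, which is worth keeping in mind when you decide which measure best describes central tendency.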

 

C. Using SPSS to Get the Standard Deviation

You have options for doing this. You can use the options in the frequency window, as you did above.

Exercise 4: What is the dispersion of credit hours?

Steps:

We can use a different analysis in SPSS to get the standard deviation: the Descriptives command. Which one you use is a matter of preference.

Steps:
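For the curious, this is roughly what the standard deviation calculation looks like when done by hand. The sketch below is Python, not SPSS; it uses the frequencies from Output # 2 and the sample formula (dividing by N - 1). The function name std_dev_from_counts is made up for this illustration.

```python
import math

# Frequencies copied from Output # 2: credit hours -> number of respondents.
credit_counts = {
    0: 32, 3: 16, 6: 16, 7: 1, 9: 15, 12: 12, 13: 1, 15: 11, 18: 9,
    21: 6, 22: 2, 24: 1, 25: 1, 26: 2, 27: 3, 30: 7, 33: 4, 34: 1,
    36: 5, 39: 2, 40: 1, 43: 1, 45: 2, 49: 2, 60: 1, 153: 1, 190: 1,
}


def std_dev_from_counts(counts):
    """Sample standard deviation (N - 1 in the denominator) from a frequency table."""
    n = sum(counts.values())
    mean = sum(value * freq for value, freq in counts.items()) / n
    squared_deviations = sum(freq * (value - mean) ** 2
                             for value, freq in counts.items())
    return math.sqrt(squared_deviations / (n - 1))


print(f"standard deviation = {std_dev_from_counts(credit_counts):.2f}")
```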

D: The Concept of Crosstabs (contingency tables): Used with Nominal and Ordinal data.

Cross tabs are perhaps the most interesting and fun statistical tool you can use. They can be used to describe two or more variables, as well as examine relationships.

Dependent Variable: This is the variable you want to explain.

Independent Variable: This is the variable you think affects your dependent variable or causes it to change.

 

Using SPSS to Do Crosstabs:

Exercise 6: Suppose we want to see if gender matters in terms of the extent to which respondents will recommend the program to others.

Steps:

Note: always ask for percentages in the row or column where you placed your independent variable.

Output # 6:

SEX Sex * Extent of recommendation Crosstabulation

                                      Extent of Recommendation
                                      Strongly   Generally   Possibly   Generally not   Total
SEX Sex   Male     Count              17         26          7          1               51
                   % within SEX Sex   33.3%      51.0%       13.7%      2.0%            100.0%
          Female   Count              19         20          7          1               47
                   % within SEX Sex   40.4%      42.6%       14.9%      2.1%            100.0%
Total              Count              36         46          14         2               98
                   % within SEX Sex   36.7%      46.9%       14.3%      2.0%            100.0%

Interpreting the Output:

Of the 51 men, 33% would strongly recommend and 51% would generally recommend. Now look at the female row. How many women are there? 47. What percent of them strongly recommended the program? 40% strongly recommended and 43% generally recommended the program.

Gail's advice: I look for at least a 10 percentage point difference between the results before I am willing to say there is a difference in the extent to which men and women would recommend this program. These results do not have a 10 point difference.
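Here is a small Python sketch (not SPSS) of what "% within SEX" means: each count is divided by its own row total. The counts come from Output # 6; the function name row_percentages is made up for this illustration.

```python
# Counts from Output # 6, in column order:
# Strongly, Generally, Possibly, Generally not.
recommend_counts = {
    "Male": [17, 26, 7, 1],
    "Female": [19, 20, 7, 1],
}


def row_percentages(table):
    """Percentage each cell within its own row (the independent variable is in the rows)."""
    return {group: [100 * count / sum(row) for count in row]
            for group, row in table.items()}


percents = row_percentages(recommend_counts)
for group, row in percents.items():
    print(group, [f"{p:.1f}%" for p in row])

# Gap between women and men who would strongly recommend, in percentage points.
strongly_gap = percents["Female"][0] - percents["Male"][0]
print(f"difference = {strongly_gap:.1f} points")
```

The gap for "strongly recommend" comes out to about 7 points, which is short of the 10 point rule of thumb.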

Exercise 6A: Let's review how to set up the tables:

You can set up the table differently. You can put sex in the column and recommnd in the row. When you go to Cells, you want to click on column percentages, since that is where you placed the independent variable.

Steps:

Even more confusing: The computer will do whatever you ask. So what does it look like if we click on other cell options?

Exercise 7: Go back to your analysis on sex and recommendation. Go through the steps to set up the cross tab analysis, but when you go into cells, click on the options for percent: expected, row, column, total. Print this out and bring this output to class. We will go over this mondo bizarro table!!

Exercise 8:

a. Are people from different programs (program) more or less similar in their willingness to recommend (recommnd) the program?

b. Do students vary in their preference for research papers (teach10) and exams (teach9) based on what program (MPA, MUS or Ph.D.) they are in?

 

E. Using SPSS to Compare Means:

In situations where your dependent variable is interval/ratio level data, and your independent variable is nominal, you can compare the means.

Exercise 9: Do men earn more than women?

To answer this question, we need to use a different data set.

You need to open a new data set: GSS91pol.sav

Steps:

Output 9

Income recoded to dollars

Respondent's Sex   Mean       N      Std. Deviation   Median
Male               38965.11   609    25470.86         32500.00
Female             33096.23   756    25365.95         27500.00
Total              35714.65   1365   25570.58         32500.00

 

Exercise 10: Compare means using the GSS91pol data:

a. Find out if men and women watch the same amount of TV (TVhrs).

b. Find out if there is a difference in the age of marriage (agewed) between men and women.

Exercise 11: Is there a difference in income (income recoded to dollars; variable name incomdol) by educational degree (RS Highest Degree)?

Output # 11:

Report

Income recoded to dollars

RS Highest Degree   Mean       N      Std. Deviation
Less than HS        18021.08   249    18217.58
High school         33188.21   704    22530.35
Junior college      41129.31   87     22645.84
Bachelor            49033.56   216    25542.65
Graduate            62275.46   108    25168.22
Total               35738.27   1364   25565.06

 

Concept of Inferential Statistics: Working with Sample Data

A. Confidence Levels and Confidence Intervals.

Sometimes we want to make estimates about the larger population based on our sample. Because we are working with a sample, there is always some amount of error. But we still want to be able to say with some degree of confidence (usually 95%) that the average income falls between two points. One way to estimate averages for the larger population is by constructing a confidence interval. Typically, you will ask the computer to tell you, with 95% confidence, the range in which the true mean would fall if we had surveyed everyone in the population.

Using SPSS to Calculate Confidence Intervals:

Exercise 12: What is the best estimate of the average salary of the population, and the 95% confidence interval, based on our sample?

Steps:

Output 12:

Descriptives

                                                Statistic       Std. Error
Mean                                            35714.65        692.11
95% Confidence Interval for Mean: Lower Bound   34356.94
95% Confidence Interval for Mean: Upper Bound   37072.36
5% Trimmed Mean                                 34682.13
Median                                          32500.00
Variance                                        653854367.273
Std. Deviation                                  25570.58
Minimum                                         500
Maximum                                         87500
Range                                           87000
Interquartile Range                             41250.00
Skewness                                        .714            .066
Kurtosis                                        -.516           .132
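To see where the standard error and the confidence interval in Output 12 come from, here is a rough Python sketch (not SPSS). It starts from the mean, standard deviation, and N (1365 respondents) reported in the outputs. One assumption to flag: the sketch uses the normal curve (about 1.96 standard errors), while SPSS uses the t distribution, so the bounds differ from Output 12 by a dollar or so. The function name is made up for this illustration.

```python
import math
from statistics import NormalDist


def mean_confidence_interval(mean, sd, n, confidence=0.95):
    """Normal-approximation confidence interval for a mean, from summary statistics."""
    standard_error = sd / math.sqrt(n)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return standard_error, mean - z * standard_error, mean + z * standard_error


# Mean, standard deviation, and N for income, taken from the outputs.
se, lower, upper = mean_confidence_interval(35714.65, 25570.58, 1365)
print(f"Std. Error = {se:.2f}")
print(f"95% CI = {lower:.2f} to {upper:.2f}")
```

The key idea: the standard error is the standard deviation divided by the square root of the sample size, so bigger samples give narrower intervals.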

 

Exercise 13: Based on our sample:

What is our best estimate of average number of siblings in the population?

What is our best estimate of the average age that people in the population wed?

Print out and Bring to Class.

Another Analysis: You can use Explore to obtain confidence intervals if you are comparing means.

Exercise 14: Suppose you want to know the means and the confidence intervals of the average salaries for men and women.

Steps:

Output 14:

Descriptives: Income for Men and Women

Respondent's Sex                                           Statistic       Std. Error
Male     Mean                                              38965.11        1032.13
         95% Confidence Interval for Mean: Lower Bound     36938.13
         95% Confidence Interval for Mean: Upper Bound     40992.08
         5% Trimmed Mean                                   38254.70
         Median                                            32500.00
         Variance                                          648764713.845
         Std. Deviation                                    25470.86
Female   Mean                                              33096.23        922.55
         95% Confidence Interval for Mean: Lower Bound     31285.16
         95% Confidence Interval for Mean: Upper Bound     34907.30
         5% Trimmed Mean                                   31792.99
         Median                                            27500.00
         Variance                                          643431538.750
         Std. Deviation                                    25365.95

 

 

B. Concept of Statistical Significance:

When you are working from a random sample, you must figure out whether you could have gotten your results by chance. If the significance value is .05 or less, we will say the results are statistically significant, and therefore the results we observe are not likely to be caused by random chance.

 

1. Using SPSS to Do Chi-Square Tests for Nominal and Ordinal Data:

To do a chi square, you first do the cross tab analysis.

Exercise 15: Is there a statistically significant difference between men's and women's willingness to recommend the program?

Steps:

Note: this is the Chi Square for the crosstab you did earlier that answered whether or not men and women differed in their responses on recommending the program (Exercise 6, output 6). As you recall, we did not see much of a difference. The Chi Square Table looks like this:

Output 15:

Chi-Square Tests

                               Value   df   Asymp. Sig. (2-sided)
Pearson Chi-Square             .732    3    .866
Likelihood Ratio               .733    3    .865
Linear-by-Linear Association   .136    1    .713
N of Valid Cases               98

 

Interpreting the Output:

Remember, we are pretending this is a random sample from a large population of students for this exercise.

The only column of importance is the last one: Asymp. Sig. (2-sided). For the results to be statistically significant, this value should be .05 or less. Clearly, .866 is greater than .05. The relationship between sex and extent to which respondents recommended the program is not statistically significant.
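If you are wondering where the Pearson Chi-Square value of .732 comes from, the sketch below works it out in Python (not SPSS) from the counts in the Exercise 6 crosstab. It compares each observed count with the count you would expect if sex made no difference. The function name is made up for this illustration, and the sketch stops at the chi-square value and degrees of freedom; it does not compute the significance level.

```python
def pearson_chi_square(table):
    """Return (chi-square, degrees of freedom) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi_square = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were unrelated.
            expected = row_totals[i] * col_totals[j] / n
            chi_square += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi_square, df


# Rows: Male, Female. Columns: Strongly, Generally, Possibly, Generally not.
observed_counts = [
    [17, 26, 7, 1],
    [19, 20, 7, 1],
]

chi_square, df = pearson_chi_square(observed_counts)
print(f"chi-square = {chi_square:.3f}, df = {df}")
```

With 3 degrees of freedom, a chi-square has to reach about 7.815 to be significant at the .05 level, and this one is nowhere near that.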

Exercise 16: Is political outlook (Liberal/conservative) (politics) related to views on legalization of marijuana (grass)?

Open the GSS91pol data set. Do a cross tab with a Chi Square test for statistical significance. Remember to click Cells and ask for percentages wherever you placed your independent variable.

2. T-Test: Testing for Statistical Significance for Interval and Ratio Data:

The T-Test is a statistic for interval and ratio data that determines whether a difference in means is statistically significant.

One Sample: Use when you are comparing sample results to a known population.

Paired Means: Use when two variables are paired like in a pre and post test: before and after design. You must use the same people for your results to be correct.

Independent Means: Use when you are comparing the means of two groups (for example, men and women).

Using SPSS to Do a One Sample T-Test

Exercise 17: Suppose you were trying to figure out the number of hours worked per week and whether the results you got were a pretty accurate reflection of the population in general. Use GSS91pol.sav

One Sample T-Test: Steps:

Output 17:

One-Sample Statistics

                                        N     Mean    Std. Deviation   Std. Error Mean
HRS1 Number of Hours Worked Last Week   900   41.76   14.63            .49

One-Sample Test

Test Value = 0

(Lower and Upper are the bounds of the 95% Confidence Interval of the Difference.)

                                        t        df    Sig. (2-tailed)   Mean Difference   Lower   Upper
HRS1 Number of Hours Worked Last Week   85.636   899   .000              41.76             40.81   42.72
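The t value in Output 17 is just the difference between the sample mean and the test value, divided by the standard error. Here is a Python sketch (not SPSS) using the numbers from the One-Sample Statistics table. The function name is made up, and the comparison value of 40 hours in the second call is a hypothetical "known population" figure chosen only for illustration.

```python
import math


def one_sample_t(sample_mean, sd, n, test_value):
    """Return (t, degrees of freedom) for a one-sample t-test from summary statistics."""
    standard_error = sd / math.sqrt(n)
    return (sample_mean - test_value) / standard_error, n - 1


# Same setup as Output 17: Test Value = 0.
t_zero, df = one_sample_t(41.76, 14.63, 900, test_value=0)
print(f"t = {t_zero:.2f}, df = {df}")

# Hypothetical comparison against a 40-hour work week (an assumed value).
t_forty, _ = one_sample_t(41.76, 14.63, 900, test_value=40)
print(f"t against 40 hours = {t_forty:.2f}")
```

With a test value of 0, the test only asks whether people work more than zero hours, so the huge t is no surprise. The test becomes informative when the test value is a figure you actually know for the population.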

 

Using SPSS to Do an Independent Sample T-Test:

Let's go back to our gender question about income. In prior analysis, we found that men earned more, on average, than women (Exercise 9).

Exercise 18: Are these differences in income statistically significant? We use an independent sample t-test.

Steps:

Output 18:

Group Statistics

Respondent's Sex   N     Mean       Std. Deviation   Std. Error Mean
Male               609   38965.11   25470.86         1032.13
Female             756   33096.23   25365.95         922.55

Independent Samples Test

                              Levene's Test for        t-test for Equality of Means
                              Equality of Variances
                              F      Sig.              t       df         Sig. (2-tailed)
Equal variances assumed       .474   .491              4.241   1363       .000
Equal variances not assumed                            4.239   1299.580   .000

 

Interpreting the Output:

We get two tables. The first table tells us the mean and standard deviation. The second gives us the statistical test for significance. This is presented in the last column, Sig. (2-tailed). The significance value is less than .05, so the difference in income is statistically significant. We can ignore all the rest of this stuff.
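You do not need the rest of the table to answer the question, but if you are curious how the two t values in Output 18 are produced, here is a Python sketch (not SPSS) that rebuilds them from the Group Statistics table. The function names are made up for this illustration. The first function pools the two variances (the "equal variances assumed" row); the second is the Welch version (the "equal variances not assumed" row).

```python
import math


def pooled_t(m1, s1, n1, m2, s2, n2):
    """t and df when the two groups are assumed to share one variance."""
    df = n1 + n2 - 2
    pooled_variance = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
    standard_error = math.sqrt(pooled_variance * (1 / n1 + 1 / n2))
    return (m1 - m2) / standard_error, df


def welch_t(m1, s1, n1, m2, s2, n2):
    """t and approximate df when equal variances are not assumed."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    t = (m1 - m2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Mean, standard deviation, N for men and for women (Group Statistics table).
men = (38965.11, 25470.86, 609)
women = (33096.23, 25365.95, 756)

t_pooled, df_pooled = pooled_t(*men, *women)
t_welch, df_welch = welch_t(*men, *women)
print(f"equal variances assumed:     t = {t_pooled:.3f}, df = {df_pooled}")
print(f"equal variances not assumed: t = {t_welch:.3f}, df = {df_welch:.2f}")
```

The two versions barely differ here because the standard deviations for men and women are nearly the same.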

Exercise 19:

How likely is it that the average number of hours spent watching TV (TVhours) reported by our respondents is a fairly accurate portrayal of the population as a whole?

Are there differences in the average number of hours of TV watched by men and women, and are those results statistically significant?

 

Analysis of Variance:

You may have noticed that the t-test grouping variable only accepts two values. When you have a variable with more than two values, such as race, religion and educational degrees, you will need to use one-way ANOVA (analysis of variance) to test for statistical significance.

Exercise 20: Is there a statistically significant difference in income based on educational degree (highest degree obtained)?

Steps:

Output 20:

Descriptives

Income recoded to dollars

                                                    95% Confidence Interval for Mean
                 N      Mean       Std. Deviation   Lower Bound   Upper Bound   Minimum   Maximum
Less than HS     249    18021.08   18217.58         15747.22      20294.94      500       87500
High school      704    33188.21   22530.35         31521.05      34855.37      500       87500
Junior college   87     41129.31   22645.84         36302.83      45955.79      2000      87500
Bachelor         216    49033.56   25542.65         45607.95      52459.18      500       87500
Graduate         108    62275.46   25168.22         57474.50      67076.42      5500      87500
Total            1364   35738.27   25565.06         34380.35      37096.19      500       87500

ANOVA

Income recoded to dollars

                 Sum of Squares     df     Mean Square       F        Sig.
Between Groups   199504503974.035   4      49876125993.509   98.048   .000
Within Groups    691314308342.681   1359   508693383.622
Total            890818812316.716   1363
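The F value in the ANOVA table compares how much the group means differ from each other (between groups) with how much individuals differ inside each group (within groups). The Python sketch below (not SPSS) rebuilds F from the N, mean, and standard deviation of each degree group in the Descriptives table. The function name is made up for this illustration, and because the inputs are rounded, the result only matches SPSS to about two decimal places.

```python
# (label, N, mean, standard deviation) for each degree group, from Output 20.
degree_groups = [
    ("Less than HS", 249, 18021.08, 18217.58),
    ("High school", 704, 33188.21, 22530.35),
    ("Junior college", 87, 41129.31, 22645.84),
    ("Bachelor", 216, 49033.56, 25542.65),
    ("Graduate", 108, 62275.46, 25168.22),
]


def one_way_anova(groups):
    """Return (F, df between, df within) from per-group summary statistics."""
    total_n = sum(n for _, n, _, _ in groups)
    grand_mean = sum(n * mean for _, n, mean, _ in groups) / total_n
    # Between groups: how far each group mean sits from the grand mean.
    ss_between = sum(n * (mean - grand_mean) ** 2 for _, n, mean, _ in groups)
    # Within groups: spread of individuals around their own group mean.
    ss_within = sum((n - 1) * sd ** 2 for _, n, _, sd in groups)
    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


f_ratio, df_between, df_within = one_way_anova(degree_groups)
print(f"F = {f_ratio:.2f}, df = ({df_between}, {df_within})")
```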

     

 

VII. Relationships

Sometimes it is important to determine how strong the relationship is between variables. The strength of a relationship is generally measured from zero to one, with 1 being a perfect relationship and 0 being no relationship at all. Most of the time you will see the relationship reported as a number which has a plus (+) or a minus (-) sign next to it. These signs refer to whether the relationship is an inverse (-) relationship, where as one goes up the other goes down, or a positive (+) relationship, where the variables change in the same direction: as one variable goes up so does the other, or as one goes down so does the other.

Measures of Association:

Using SPSS to Measure Relationships for Nominal Data:

Exercise 21: Is one's religious preference related to one's views on programs to provide birth control to teenagers? Use the GSS91pol data set.

Steps:

Chi-Square Tests

                               Value    df   Asymp. Sig. (2-sided)
Pearson Chi-Square             32.672   12   .001
Likelihood Ratio               35.095   12   .000
Linear-by-Linear Association   22.588   1    .000
N of Valid Cases               970

a 6 cells (30.0%) have expected count less than 5. The minimum expected count is 3.17.

Symmetric Measures

                                  Value   Approx. Sig.
Nominal by Nominal   Phi          .184    .001
                     Cramer's V   .106    .001
N of Valid Cases                  970

a Not assuming the null hypothesis.

b Using the asymptotic standard error assuming the null hypothesis.

 

Interpreting the Output:

The first table is our crosstab, which you are expert at interpreting by now. The second table is our test for statistical significance, which you also know how to interpret. The last table is our strength of association measures. Although the mathematics behind these measures varies, the results are often very similar. The second column tells you the name of the measure of association. The Value column shows the strength of the relationship. Since neither measure is all that far from 0, this is a weak relationship. The next column tells you the statistical significance. In this analysis, the results are statistically significant.
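Both Phi and Cramer's V are built directly from the chi-square value, which is why they tend to tell the same story. The Python sketch below (not SPSS) rebuilds both from the Pearson Chi-Square (32.672) and N (970) in the output above. One assumption to flag: the table dimensions (5 by 4) are not shown in the output; they are inferred here from the 12 degrees of freedom and the footnote's 20 cells. The function names are made up for this illustration.

```python
import math


def phi_coefficient(chi_square, n):
    """Phi: square root of chi-square divided by the number of cases."""
    return math.sqrt(chi_square / n)


def cramers_v(chi_square, n, n_rows, n_cols):
    """Cramer's V: like Phi, but adjusted for the size of the table."""
    smaller_dimension = min(n_rows, n_cols)
    return math.sqrt(chi_square / (n * (smaller_dimension - 1)))


# Chi-square and N from the output; the 5 x 4 table shape is an inference.
phi = phi_coefficient(32.672, 970)
v = cramers_v(32.672, 970, n_rows=5, n_cols=4)
print(f"Phi = {phi:.3f}, Cramer's V = {v:.3f}")
```

For tables larger than 2 by 2, Cramer's V is usually the safer one to report, because it stays between 0 and 1 no matter how big the table is.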

Exercise 22:

Is there a relationship between gender and views on spanking? Is that relationship statistically significant?

Is there a relationship between religious preference and spanking? Is that relationship statistically significant?

2. Measuring Relationships for Ordinal Data:

Tau b: Tau b is considered to be a conservative measure because it takes ties into account; therefore the results tend to be smaller. A tau b of .2 or more is respectable.

Gamma: Gamma is based upon probability. If you select a pair of individuals at random, it tells you how much more likely they are to be ranked in the same order on both variables than in the opposite order (tied pairs are ignored). A gamma of .3 or more is worth examining.

Spearman's Rho: Converts the data to ranks (highest to lowest) and then measures the correlation between the ranks.
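The difference between gamma and tau b can be sketched in a few lines of Python. This is an illustration only: the 3 by 3 table of counts is hypothetical (it is not from the GSS data), and the function names are made up for this example. Both measures start from the same counts of pairs ranked in the same order (concordant) and in opposite order (discordant); tau b uses a larger denominator that accounts for ties, so it comes out smaller.

```python
import math

def concordant_discordant(table):
    """Count concordant and discordant pairs in an ordered crosstab."""
    n_rows, n_cols = len(table), len(table[0])
    concordant = discordant = 0
    for i in range(n_rows):
        for j in range(n_cols):
            for k in range(i + 1, n_rows):      # cases in a higher row
                for l in range(n_cols):
                    if l > j:                   # also higher column: same order
                        concordant += table[i][j] * table[k][l]
                    elif l < j:                 # lower column: opposite order
                        discordant += table[i][j] * table[k][l]
    return concordant, discordant

def gamma(table):
    """Gamma ignores tied pairs entirely."""
    c, d = concordant_discordant(table)
    return (c - d) / (c + d)

def tau_b(table):
    """Kendall's tau b keeps pairs tied on only one variable in the denominator."""
    c, d = concordant_discordant(table)
    n = sum(sum(row) for row in table)
    all_pairs = n * (n - 1) / 2
    tied_on_rows = sum(t * (t - 1) / 2 for t in (sum(row) for row in table))
    tied_on_cols = sum(t * (t - 1) / 2 for t in (sum(col) for col in zip(*table)))
    return (c - d) / math.sqrt((all_pairs - tied_on_rows) * (all_pairs - tied_on_cols))

# Hypothetical counts: rows and columns are both ordered low to high.
hypothetical_table = [
    [10, 5, 2],
    [4, 8, 5],
    [1, 6, 9],
]

print(f"gamma = {gamma(hypothetical_table):.3f}")
print(f"tau b = {tau_b(hypothetical_table):.3f}")
```

For this made-up table gamma is about .64 and tau b about .45, which shows why the rule of thumb for a "respectable" tau b is lower than the one for gamma.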

Using SPSS: Output for Relationships Using Ordinal Data

Exercise 23: What is the strength of the relationship between political ideology and legalization of marijuana?

Steps:

You will get a normal crosstabs table like the one for exercise 16. At the bottom, the computer will generate additional tables with the measures of association.

Output 23:

Symmetric Measures

                                         Value   Asymp. Std. Error   Approx. T   Approx. Sig.
Ordinal by Ordinal   Kendall's tau-b     .210    .029                6.917       .000
                     Kendall's tau-c     .220    .032                6.917       .000
                     Gamma               .391    .052                6.917       .000
N of Valid Cases                         899

Using SPSS to Do Interval Level Correlations:

The most common measure of the relationship between interval level variables is Pearson's r. The coefficient compares the data to an ideal, perfectly linear relationship and to no relationship at all. It then assigns a score whose size runs from 0 to 1 depending on how close the data are to the ideal, with a plus or minus sign showing the direction of the relationship.
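As a rough illustration of what the coefficient measures, the following Python sketch computes Pearson's r by hand. The function name and the small education and income lists are hypothetical and are not taken from the GSS data set; SPSS does this calculation for you.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How the two variables move together around their means.
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # How much each variable varies on its own.
    variation_x = sum((x - mean_x) ** 2 for x in xs)
    variation_y = sum((y - mean_y) ** 2 for y in ys)
    return co_variation / math.sqrt(variation_x * variation_y)

# Hypothetical data: years of education and income in dollars.
years_of_school = [8, 10, 12, 12, 14, 16, 16, 18]
income_dollars = [18000, 15000, 30000, 22000, 28000, 50000, 35000, 45000]

print(f"r = {pearson_r(years_of_school, income_dollars):.3f}")

# A perfect positive and a perfect inverse relationship for comparison.
print(pearson_r([1, 2, 3], [2, 4, 6]))   # close to +1
print(pearson_r([1, 2, 3], [6, 4, 2]))   # close to -1
```

The closer the points fall to a straight line, the closer r is to +1 or -1; scattered points give a value near 0.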

Exercise 24: What is the relationship between education and income? Our hypothesis is that those with more education would earn more. Both years of education and income are at least interval level data.

Steps:

Output 24:

Correlations

                                                              EDUC     INCOMDOL
EDUC Highest Year of School Completed   Pearson Correlation   1.000    .460
                                        Sig. (2-tailed)       .        .000
                                        N                     1496     1362
INCOMDOL Income recoded to dollars      Pearson Correlation   .460     1.000
                                        Sig. (2-tailed)       .000     .
                                        N                     1362     1365

** Correlation is significant at the 0.01 level (2-tailed).

 

 

Regression Analysis:

When doing regression analysis you will be working with an equation. This equation will allow you to describe a data set, to estimate population parameters, to infer causality, and to forecast.

Measures of Association: R

r = multiple correlation coefficient (overall fit).

r2 = proportion of explained variation (a.k.a. coefficient of determination).

1-r2 = proportion of unexplained variation.
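As a quick worked example of these three quantities, the Python sketch below starts from the correlation of .460 between education and income reported in Output 24. The variable names are only for illustration.

```python
# Pearson correlation between EDUC and INCOMDOL, from Output 24.
r = 0.460

# Squaring r gives the proportion of explained variation.
r_squared = r ** 2

# Whatever is left over is the proportion of unexplained variation.
unexplained = 1 - r_squared

print(f"r               = {r:.3f}")
print(f"r squared       = {r_squared:.4f}")
print(f"1 - r squared   = {unexplained:.4f}")
```

So a correlation of .460 means that about 21% of the variation is explained and about 79% is not.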

Using SPSS to Do a Simple Regression

Exercise 25: Let's look at education and income, using a simple regression.

Steps:

Select Income (INCOMDOL) and place it in the Dependent box.

Select EDUC (highest year of education) and place it in the Independent box.

 

Output 25:

Model Summary

Model   R      R Square   Adjusted R Square   Std. Error of the Estimate
1       .460   .212       .211                22699.57

a Predictors: (Constant), EDUC Highest Year of School Completed

 

ANOVA

Model            Sum of Squares     df     Mean Square        F         Sig.
1   Regression   188426936995.327   1      188426936995.327   365.685   .000
    Residual     700768148035.877   1360   515270697.085
    Total        889195085031.204   1361

a Predictors: (Constant), EDUC Highest Year of School Completed

b Dependent Variable: INCOMDOL Income recoded to dollars

Coefficients

                                            Unstandardized Coefficients   Standardized Coefficients
Model                                       B            Std. Error       Beta                        t        Sig.
1   (Constant)                              -13936.587   2671.534                                     -5.217   .000
    EDUC Highest Year of School Completed   3802.543     198.848          .460                        19.123   .000

a Dependent Variable: INCOMDOL Income recoded to dollars

 

 

Interpreting the Output:

We have three tables. The first table, the Model Summary, shows us how well the model explains variation in income. The R square is .21, which indicates a moderate relationship: 21% of all the variation in income is explained by education. The second table, ANOVA, tells us the statistical significance of the model. We want to look at the last column, which gives us the statistical significance. In this case, it is less than .05, so the results are statistically significant. The Coefficients table shows the impact of education on income. For every additional year of education, income goes up by about $3,803. This is also statistically significant.
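The constant and the coefficient from the Coefficients table form the regression equation: predicted income = constant + (B for education x years of education). The Python sketch below plugs in the numbers from Output 25; the function name is made up for this illustration, and the years chosen (12 and 16) are just examples.

```python
# Values copied from the Coefficients table in Output 25.
CONSTANT = -13936.587   # B for (Constant)
B_EDUC = 3802.543       # B for EDUC, dollars per year of school

def predicted_income(years_of_education):
    """Income predicted by the simple regression equation."""
    return CONSTANT + B_EDUC * years_of_education

for years in (12, 16):
    print(f"{years} years of school: ${predicted_income(years):,.0f}")

# Each extra year of school changes the prediction by the coefficient.
print(f"Difference per year: ${predicted_income(13) - predicted_income(12):,.0f}")
```

With these coefficients the equation predicts about $31,694 for 12 years of school and about $46,904 for 16 years. Because the constant is negative, the equation gives negative incomes for very low values of education, so it should only be used within the range of the data.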

Using SPSS to Do Multiple Regression:

To determine the regression coefficients and the constant, you follow the same steps as for a bivariate analysis. Once you have entered your dependent variable, you can simply enter more than one independent variable. The computer will generate the coefficients, the constant, and the R value.

Exercise 26: What explains income?

Steps:

Output 26:

Model Summary

Model   R      R Square   Adjusted R Square   Std. Error of the Estimate
1       .577   .333       .330                21048.82

a Predictors: (Constant), DWELOWN Homeowner or Renter, AGEWED Age When First Married, EDUC Highest Year of School Completed

 

ANOVA

Model            Sum of Squares     df    Mean Square       F         Sig.
1   Regression   160736543078.188   3     53578847692.729   120.931   .000
    Residual     321656336117.017   726   443052804.569
    Total        482392879195.205   729

a Predictors: (Constant), DWELOWN Homeowner or Renter, AGEWED Age When First Married, EDUC Highest Year of School Completed

b Dependent Variable: INCOMDOL Income recoded to dollars

Coefficients

                                            Unstandardized Coefficients   Standardized Coefficients
Model                                       B            Std. Error       Beta                        t        Sig.
    (Constant)                              7174.054     5114.637                                     1.403    .161
    EDUC Highest Year of School Completed   3880.293     255.097          .481                        15.211   .000
    AGEWED Age When First Married           -142.368     161.734          -.027                       -.880    .379
    DWELOWN Homeowner or Renter             -13176.228   1601.745         -.253                       -8.226   .000

a Dependent Variable: INCOMDOL Income recoded to dollars

 

Interpreting the Output:

How good is our model?

The Model Summary gives us the answer here in the R square value. This tells us how much of the variation in income is explained by this model. Overall, this model explains 33% of the variation (R square = .333), which indicates there is a moderately strong relationship here. (Note: this is higher than what we got from our simple regression, so we are on the right track here.)
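The numbers in the Model Summary can be traced back to the sums of squares in the ANOVA table. The Python sketch below, which is only an illustration and not something you need to do in SPSS, recomputes R square, adjusted R square, the F value, and the standard error of the estimate from the Output 26 figures. The variable names are made up for this example.

```python
import math

# Values copied from the ANOVA table in Output 26.
ss_regression = 160736543078.188
ss_residual = 321656336117.017
ss_total = 482392879195.205
df_regression = 3      # number of independent variables
df_residual = 726

# R square: share of the total variation explained by the model.
r_squared = ss_regression / ss_total

# Adjusted R square: penalizes the model for each added variable.
n_cases = df_regression + df_residual + 1
adjusted_r_squared = 1 - (1 - r_squared) * (n_cases - 1) / df_residual

# F: explained variation per variable over unexplained variation per case.
mean_square_regression = ss_regression / df_regression
mean_square_residual = ss_residual / df_residual
f_value = mean_square_regression / mean_square_residual

# Std. Error of the Estimate: typical size of a prediction error, in dollars.
std_error_of_estimate = math.sqrt(mean_square_residual)

print(f"R square          = {r_squared:.3f}")
print(f"Adjusted R square = {adjusted_r_squared:.3f}")
print(f"F                 = {f_value:.3f}")
print(f"Std. error        = {std_error_of_estimate:.2f}")
```

These match the values SPSS prints: .333, .330, 120.931, and 21048.82.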

Is our overall model statistically significant?

The ANOVA table gives us a lot of information. The only thing of importance for us is the last column, which gives us the significance. In this case, it is less than .05, so the model is statistically significant.

What explains income?

The Coefficient Table: The unstandardized coefficients tell you how much change there will be in income for every unit change in each independent variable while holding the others constant. (Sometimes we talk about this as the independent effect of each of the variables in our equation on income.) So if we look at the independent effect of education while holding age when first married (agewed) and home ownership (dwelown) constant, for every additional year of education we get a $3,880 increase in income. Looking at home ownership, we see a negative sign, indicating an inverse relationship. In this case, we need to know how this variable was coded (right click on the variable to get information when you are in the regression window). This variable was coded 0 if the respondent owned a home and 1 if the respondent rented. This gets interpreted as follows: as we move from a home owner (0) to a renter (1), income goes down by $13,176. The relationship is inverse only because of the way the data were coded.
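"Holding the others constant" can be seen by plugging values into the full regression equation. The Python sketch below uses the coefficients from Output 26; the function name and the example respondent (16 years of school, first married at 25) are hypothetical and chosen only for illustration.

```python
# Values copied from the Coefficients table in Output 26.
CONSTANT = 7174.054
B_EDUC = 3880.293        # per year of school
B_AGEWED = -142.368      # per year of age when first married
B_DWELOWN = -13176.228   # 0 = owns home, 1 = rents

def predicted_income(educ, agewed, dwelown):
    """Income predicted by the multiple regression equation."""
    return CONSTANT + B_EDUC * educ + B_AGEWED * agewed + B_DWELOWN * dwelown

# Same education and age when married; only home ownership changes.
owner = predicted_income(educ=16, agewed=25, dwelown=0)
renter = predicted_income(educ=16, agewed=25, dwelown=1)

print(f"Home owner: ${owner:,.0f}")
print(f"Renter:     ${renter:,.0f}")
print(f"Difference: ${renter - owner:,.0f}")
```

For this example the equation predicts about $65,700 for the owner and about $52,523 for the renter. The gap of $13,176 is exactly the DWELOWN coefficient, because nothing else in the equation changed.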

Standardized Coefficients (Beta weights): Our independent variables are not measured in the same units; we use years for education and homeowner/renter for dwelling, so their unstandardized coefficients are not directly comparable. The computer therefore standardizes them so we can see which has the stronger effect on the dependent variable. The result is called the Standardized Coefficient (or beta, or beta weight). In this case, education has the largest beta weight (.481, compared with -.253 for home ownership).

 

Other Stuff:

Managing the Output:

Deleting stuff from an Output file:

Option A: right click on the chart or table, then hit delete button.

Option B: go to the file menu on the left side of your screen. Click on the yellow file folder, and it will delete everything in that folder when you hit the delete button. Or you can click on the specific files and hit delete button.

Print an Output File

Click File > Print

You can print only the material you have selected and highlighted by clicking Selection.

Or you can print the whole file (remember to delete stuff you don't want) by clicking All Visible Output.

Helpful Hint: You can save time and ink by clicking on properties and selecting econofast print.

You can choose between Portrait (which this is) or Landscape.

Crosstabs print better using landscape.

Note: all your output gets saved to a single file until you close it (either save or delete). If you hit print, all the output done in the output session will be printed out.

Saving an Output file:

I don't usually save output since it is easy enough to recreate. But you may want to save the output from assignments at least until after you receive feedback.

 

 

Managing the Hardcopy Output:

I tend to save the hardcopy output while I am working on a report because it maintains an audit trail, as GAO folks call it. I label the output and keep it in a looseleaf binder. I label it so that I can find it when I write a draft report using the data and can cite the source (e.g. Table 1 from Run 1, pages 12-20). This makes it easier to find the data if there is a question about the analysis, or if I want to have someone check that my numbers match (an accuracy check).

You can label SPSS output before printing or saving.

Exercise: For example, you want to label the run that has information about demographics:

You can change the label of any table as well. Double clicking creates an edit box.

 

Moving a SPSS Table into a Word Document:

You may find it helpful to move SPSS tables into your report. First you need to have a Word document open. You can do this by hitting the minimize (-) button at the top of the SPSS window, then opening Word and opening a blank file. Got it? Then minimize that and go back to SPSS.

Let's say you want to move the table reporting sex to your Word document.

Steps:

Creating Summary Tables: You cannot create a single table in SPSS that shows the percent rating various instructional approaches as effective. You will have to create the table in Word or WordPerfect, and type in the percent distributions.

 

Using Graphs to Analyze and Display Data

This section will walk you through using the Graphs options. Remember to use bar or pie charts for ordinal and nominal variables.

Exercises: Show a picture of the percent of respondents in each program.

Steps:

(A) Bar Charts:

Editing the chart: You edit the chart in the output box. Double click on the chart. This will pop up the chart editor. You can change the fill, the color, the lines, bar style, bar labels, and the direction (from vertical to horizontal). You need to click Apply in the little pop-up boxes for each of these changes. When done with the editor, click the X button.

You can move the chart to a Word document. In the Output, double click on the chart. It will come up as a box. Right click, and select Copy as Object or Copy. I am not sure how much difference there is, but play with this; the differences may become clear.

(B) Pie Charts:

Editing the chart: Go to the editor by double clicking. You can pull out a slice by clicking on that slice and then selecting the button that looks like Pac-Man; if you don't like it, click Pac-Man again. You can change color, fill and lines.

You can add percents to the slices. While still in the chart editor box, select Chart from the menu at the top of the editor box, then click Options. In Options, select Percent (clicking on it gives it a check mark). Click on Format, and change decimal places to 0, then click OK to get out of the Options box.

(C) Cluster Charts:

Note: These are the same variables as used in crosstabs. This is another way of describing those variables, using the charts option. Once you are familiar with creating charts to represent your data analysis, you can explore the other options SPSS provides (line charts, area charts, stacked bar charts, scatter charts, boxplots, etc.) to better describe your data graphically.

Again, you can go to the chart editor (double click on chart) and make whatever changes you would like. Remember, color does not always reproduce well in black and white, so you might find using different fill to be more effective.