SPSS Lab Workshop
Winter 2002
Research Methods
Gail Johnson
The Evergreen State College
Masters of Public Administration Program
867-6739
johnsong@evergreen.edu
Table of Contents
Purpose and Objectives
Basic Concepts
SPSS Overview
The Data Sets Used in the Course
Tasks
Opening a Data Set
Creating a Data Set
Entering Data
Saving and Closing
Analyzing Data
Frequencies
Central Tendency: Mode, Median, Mean
Measures of Dispersion and Standard Deviation
Cross Tabs (Contingency Tables)
Comparing Means
Confidence Intervals
Statistical Significance
1. Chi-Square: Tests for Nominal and Ordinal Data
2. T-Test: Testing for Statistical Significance for Interval and Ratio Data
3. One-Way ANOVA (Analysis of Variance)
Relationship: Measures of Association
Simple Regression
Multiple Regression
Managing Output
Moving SPSS Tables to Documents
Graphics
SPSS Workshop Guide
Purpose and Objectives:
The purpose of this workshop is to provide hands-on experience in using SPSS.
Note: you may find it helpful to try the tutorial on your own at a later time.
These exercises are designed to familiarize you with the following:
Opening a Data Set
Adding variables to a data set
Saving a Data Set
Analyzing a Data Set
Working with Computers: General Advice
Approach the computer with a calm mind.
Basic Concepts:
Data Set: A specific set of data in a file format that resembles a spreadsheet, with rows and columns. Each row contains the information for one person or case; this is called a "record." The columns contain the specific information for each person: age, gender, income, etc. Every record has a unique identifier, usually called an ID.
Variables: These are the specific items that are collected and entered into the columns of a data set. For example, in a typical survey data set, age, gender, race, income, etc. would be considered variables. Each variable has a unique name (less than 9 characters), for example, V8.
Variable Labels: Variable labels are closer to plain English, so the output makes sense to you when it prints out. For example, V8 is labeled PoliticalView.
Values: Each variable consists of values. If the variable is gender, then the values are male and female. In a database, however, values are typically coded to take up less room. So gender might be coded as M or F, or as 0 or 1.
Value Labels: These labels show up on your output. So, in our gender example, instead of having 0s and 1s on the output, the computer will print out the value labels of Male and Female. You can create and change value labels whenever you want.
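The difference between a stored code and the label that prints on the output can be pictured as a small data structure. The following is a minimal Python sketch, not SPSS itself; the class and method names are hypothetical, and the coding 0 = Male, 1 = Female is an assumption for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Variable:
    """A variable as described above: short name, longer label, value labels."""
    name: str                                         # short variable name, e.g. "V8"
    label: str                                        # variable label printed on output
    value_labels: dict = field(default_factory=dict)  # stored code -> printed label

    def show(self, code):
        """Return the value label for a stored code, or the raw code if unlabeled."""
        return self.value_labels.get(code, code)


# Hypothetical coding: 0 = Male, 1 = Female.
gender = Variable(name="gender", label="Respondent's gender",
                  value_labels={0: "Male", 1: "Female"})

stored = [0, 1, 1, 0]                       # what is actually in the data set
printed = [gender.show(c) for c in stored]  # what appears on the output
print(printed)  # ['Male', 'Female', 'Female', 'Male']
```

Because the labels live in a lookup table rather than in the data, you can change a label at any time without retyping a single case.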
SPSS Overview:
SPSS looks like a spreadsheet with rows and columns. Every row represents an individual record. Like other Windows applications, SPSS is menu driven. Typically, the menu is at the top of the screen.
Menu Options:
File: To open, save, or close a data set.
Edit: To cut, copy, paste, or find data or text. Under Options, you can change how things appear. For example, you can change how the variables appear on the lists in the analysis boxes: by name or by label, and alphabetically or in file order.
View: Allows you to change how you view the screen, e.g., make it larger or smaller.
Data: To insert cases or variables.
Transform: This is the main one we will use. It lets you recode variables, which is useful when you want to change interval level data, such as age, into categories (18-25, 26-35, etc.).
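A recode is simply a rule that maps each exact value to a category. Here is a minimal Python sketch of that idea, not SPSS's Transform menu; the function name is hypothetical, and the cut points (18-25, 26-35, 36-45, 46 or over) are assumptions loosely based on the age groups used in the workshop survey.

```python
def recode_age(age):
    """Recode an exact age (interval data) into an age-group category.

    The cut points are illustrative assumptions, not SPSS defaults.
    Ages below 18 return None, i.e. they are treated as missing here.
    """
    if age < 18:
        return None
    if age <= 25:
        return "18-25"
    if age <= 35:
        return "26-35"
    if age <= 45:
        return "36-45"
    return "46 or over"


ages = [19, 25, 26, 40, 52]
print([recode_age(a) for a in ages])
# ['18-25', '18-25', '26-35', '36-45', '46 or over']
```

Note that the categories must not overlap and must cover every possible value; otherwise some cases fall into two groups or none.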
Analyze: This is where all the fun takes place. This menu contains all the tools you will need to crunch numbers and analyze data.
Graphs: This menu allows you to create a variety of charts (line, bar, and pie) from your data.
Utilities: This allows you to view variable labels, variable definitions, and data set information.
Help: Help.
You will also find that right clicking the mouse will give you options. You can cut and paste to make creating data sets less labor intensive.
Determining How Variables are Coded:
Output: Whenever you analyze data, SPSS creates an output file. Each table can be copied, cut, and pasted into other programs such as Word or WordPerfect, much as you would text or clip art.
You can also save the output to a disk or to the hard drive.
You can print out the output, as you would any document.
Warning: Each time you analyze data, the new analysis is added to the output file. When you tell the computer to print, all the output will print out. What this means is that if you have stuff you do not want, delete it before printing to save trees.
Data Sets
You have been given a disk with data sets on it: Put your name and phone number on the disk. The following data sets are to be used for your assignments:
USPA Fall 2000: All the students in USPA classes were asked to complete a student survey in Fall 2000. A copy of the survey is on your Lab disk. Some variables have been deleted, so the data set has fewer than 50 variables. It is similar to the 1997 survey in Appendix I of the Practical Guide. You will use this data set for your data analysis paper.
File: USPA2000.sav
GSS Surveys: These are short versions of the General Social Survey, which is a random sample of U.S. adults. These data sets include some demographic information as well as respondents' attitudes on a variety of political issues and their music preferences. These sets contain mostly categorical and ordinal data, but there are a few variables with interval level data.
Files: gss91pol.sav and gss93.sav
Welfare Data: This data set contains mostly interval level data. Information from all 50 states includes poverty data, AFDC benefits for a family of three, teen pregnancy, etc.
File: welfare.sav
The Workshop Begins:
1. Opening SPSS
Steps:
2. Opening an Existing Data Set
Steps:
3. Creating a Data Set
Now you can practice creating a data set using the "workshop.sav" file. Someone started to set up this data set but did not finish. The survey questions are on page 6. You will finish creating this data set. Your first task is to create the remaining variables (with variable names, variable labels, and value labels).
Steps:
Now that you have created a variable label and value labels for one variable, move on to the next questions and finish creating the variables. Instead of just using var1, var2, etc., try to give them names that will help you at data analysis time: age, race, program, etc. might be useful. If you use an English name as your variable name, you don't need to label it.
Note: Number of credit hours is a real number. You do not label the values of real numbers; they stand as they are.
Also note: You might want to add missing values to the Teach1 through Teach9 variables. You can either type in 9s and 6s as the discrete missing values for each one, or you can:
4. Entering Data
Once you have set up your data set, you are ready to enter the data. You will find 4 surveys on pages 7-10. Please enter the data into the data set.
Go back to the bottom left of your screen and click on Data View. You can use the "Tab" key or the arrow keys to move around the cells in the data set. Press the "Enter" key, or tab to the next column, once you have typed in the appropriate value.
It is often easiest to work in pairs for this, with one person calling out the data and the other person typing it. It is easy to make errors in typing (trust me, I know). Go back and verify all your entries.
5. Saving and Closing a Data File
Steps:
Question: 3. Effectiveness of Instructional Approaches:
Considering the USPA courses you have taken to date, how effective or ineffective are each of the following instructional approaches in helping you learn the material?
Instructional Approaches | Very Effective (1) | Generally Effective (2) | As Effective as Ineffective or Neither (3) | Generally Ineffective (4) | Very Ineffective (5)
a. Lectures | | | | |
b. Guest speakers | | | | |
c. Case studies | | | | |
d. Videos | | | | |
e. Class discussions | | | | |
f. Individual student presentations | | | | |
g. Group student presentations | | | | |
h. Group student projects | | | | |
i. Exams | | | | |
j. Research papers | | | | |
Question: 11. What is your present age group?
_____21 to 25 _____26 to 35 _____36 to 45 _____ 46 or over.
Question: 12. What is your gender?
_____ Male _____ Female.
Question: 16. In which program are you currently enrolled?
____MPA _____MUS ____Ph.D., ____Other Program(MS/MA) ____Not enrolled.
Question: 17. Including this semester, approximately how many credit hours have you completed?
________ credit hours.
ID: 01
Question: 3. Effectiveness of Instructional Approaches:
Considering the USPA courses you have taken to date, how effective or ineffective are each of the following instructional approaches in helping you learn the material?
Instructional Approaches | Very Effective | Generally Effective | As Effective as Ineffective or Neither | Generally Ineffective | Very Ineffective
a. Lectures | | | | x |
b. Guest speakers | | x | | |
c. Case studies | x | | | |
d. Videos | | x | | |
e. Class discussions | x | | | |
f. Individual student presentations | | | x | |
g. Group student presentations | | | x | |
h. Group student projects | x | | | |
i. Exams | | | | x |
j. Research papers | x | | | |
Question: 11. What is your present age group?
_____21 to 25 _____26 to 35 _____36 to 45 __x___ 46 or over.
Question: 12. What is your gender?
_____ Male ___x__ Female.
Question: 16. In which program are you currently enrolled?
____MPA _____MUS __x__Ph.D., ____Other Program(MS/MA) ____Not enrolled.
Question: 17. Including this semester, approximately how many credit hours have you completed?
_____36_____ credit hours.
ID: 02
Question: 3. Effectiveness of Instructional Approaches:
Considering the USPA courses you have taken to date, how effective or ineffective are each of the following instructional approaches in helping you learn the material?
Instructional Approaches | Very Effective | Generally Effective | As Effective as Ineffective or Neither | Generally Ineffective | Very Ineffective
a. Lectures | x | | | |
b. Guest speakers | | x | | |
c. Case studies | | | x | |
d. Videos | | x | | |
e. Class discussions | x | | | |
f. Individual student presentations | | | | x |
g. Group student presentations | | | | | x
h. Group student projects | | | x | |
i. Exams | | x | | |
j. Research papers | | x | | |
Question: 11. What is your present age group?
__x___21 to 25 _____26 to 35 _____36 to 45 _____ 46 or over.
Question: 12. What is your gender?
___x__ Male _____ Female.
Question: 16. In which program are you currently enrolled?
__x__MPA _____MUS ____Ph.D., ____Other Program(MS/MA) ____Not enrolled.
Question: 17. Including this semester, approximately how many credit hours have you completed?
___9_______ credit hours.
ID: 03
Question: 3. Effectiveness of Instructional Approaches:
Considering the USPA courses you have taken to date, how effective or ineffective are each of the following instructional approaches in helping you learn the material?
Instructional Approaches | Very Effective | Generally Effective | As Effective as Ineffective or Neither | Generally Ineffective | Very Ineffective
a. Lectures | x | | | |
b. Guest speakers | | | | |
c. Case studies | x | | | |
d. Videos | | | x | |
e. Class discussions | | x | | |
f. Individual student presentations | | | | | x
g. Group student presentations | | | | x |
h. Group student projects | | | x | |
i. Exams | | | | x |
j. Research papers | | x | | |
Question: 11. What is your present age group?
_____21 to 25 _____26 to 35 __x___36 to 45 _____ 46 or over.
Question: 12. What is your gender?
_____ Male ___x__ Female.
Question: 16. In which program are you currently enrolled?
___x_MPA _____MUS ____Ph.D., ____Other Program(MS/MA) ____Not enrolled.
Question: 17. Including this semester, approximately how many credit hours have you completed?
_____9_____ credit hours.
ID: 04
Question: 3. Effectiveness of Instructional Approaches:
Considering the USPA courses you have taken to date, how effective or ineffective are each of the following instructional approaches in helping you learn the material?
Instructional Approaches | Very Effective | Generally Effective | As Effective as Ineffective or Neither | Generally Ineffective | Very Ineffective
a. Lectures | x | | | |
b. Guest speakers | | x | | |
c. Case studies | | | | |
d. Videos | | | | x |
e. Class discussions | | | | | x
f. Individual student presentations | | | | | x
g. Group student presentations | | | | x |
h. Group student projects | | x | | |
i. Exams | x | | | |
j. Research papers | x | | | |
Question: 11. What is your present age group?
____x_21 to 25 _____26 to 35 _____36 to 45 _____ 46 or over.
Question: 12. What is your gender?
____x_ Male _____ Female.
Question: 16. In which program are you currently enrolled?
____MPA ___x__MUS ____Ph.D., ____Other Program(MS/MA) ____Not enrolled.
Question: 17. Including this semester, approximately how many credit hours have you completed?
_____120___ credit hours.
Analyzing Data: Overview
Nominal: This is data that has a name, e.g., a person's job, their sex, or the city they live in. Nominal data should not be averaged.
Ordinal: This is data that has an order (low to high). Survey data is frequently ordinal: how strongly do you agree or disagree? Gail's advice: ordinal data should not be averaged.
Interval and Ratio: SPSS calls these Scale. These measures are real numbers, such as: age, weight, dollars. Interval and ratio data can be analyzed using the same statistical procedures.
Descriptive Analysis
A. Using SPSS to get Frequencies and Percent Distributions:
SPSS Exercise 1: What percent of the respondents from the Fall 1997 survey were male or female?
Steps:
SPSS Exercise 2: What is the frequency and percent distribution for the number of credits reported in the Fall 1997 survey?
Steps:
Output # 2:
CREDITS Credit hours completed
 | Values | Frequency | Percent | Valid Percent | Cumulative Percent
Valid | 0 | 32 | 20.5 | 20.5 | 20.5
 | 3 | 16 | 10.3 | 10.3 | 30.8
 | 6 | 16 | 10.3 | 10.3 | 41.0
 | 7 | 1 | .6 | .6 | 41.7
 | 9 | 15 | 9.6 | 9.6 | 51.3
 | 12 | 12 | 7.7 | 7.7 | 59.0
 | 13 | 1 | .6 | .6 | 59.6
 | 15 | 11 | 7.1 | 7.1 | 66.7
 | 18 | 9 | 5.8 | 5.8 | 72.4
 | 21 | 6 | 3.8 | 3.8 | 76.3
 | 22 | 2 | 1.3 | 1.3 | 77.6
 | 24 | 1 | .6 | .6 | 78.2
 | 25 | 1 | .6 | .6 | 78.8
 | 26 | 2 | 1.3 | 1.3 | 80.1
 | 27 | 3 | 1.9 | 1.9 | 82.1
 | 30 | 7 | 4.5 | 4.5 | 86.5
 | 33 | 4 | 2.6 | 2.6 | 89.1
 | 34 | 1 | .6 | .6 | 89.7
 | 36 | 5 | 3.2 | 3.2 | 92.9
 | 39 | 2 | 1.3 | 1.3 | 94.2
 | 40 | 1 | .6 | .6 | 94.9
 | 43 | 1 | .6 | .6 | 95.5
 | 45 | 2 | 1.3 | 1.3 | 96.8
 | 49 | 2 | 1.3 | 1.3 | 98.1
 | 60 | 1 | .6 | .6 | 98.7
 | 153 | 1 | .6 | .6 | 99.4
 | 190 | 1 | .6 | .6 | 100.0
 | Total | 156 | 100.0 | 100.0 |
Note: Output #2 will be long because it tells you how many people (and what percent) reported each value of credit hours.
How to Interpret the Output:
The second column contains the values of the variable that respondents reported; here, each value is the exact number of credit hours a respondent gave. Note that there are no missing values in this table. The next column, Frequency, shows how many respondents reported each value. For example, 32 people reported that they had not completed any credit hours, and 12 reported that they had completed 12 credit hours. The Percent column shows the percent distribution across all the people who were included in the survey: 21% reported 0 credit hours and 8% reported 12 credit hours.
The Valid Percent column shows the percentage among only the people who gave an answer. In the previous example, there were missing values; in this example, everyone answered the question, so the Percent and Valid Percent columns are the same. The last column, Cumulative Percent, adds up the percentages as you go down the column: 21% reported 0 credits and 10% reported 3 credit hours, which together make 31%. If we want to see where the majority falls, we find that 51% reported having 9 credit hours or fewer. A majority is anything over 50%.
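The Percent, Valid Percent, and Cumulative Percent columns are simple arithmetic on the frequencies. The Python sketch below (an illustration, not how SPSS computes them internally) rebuilds the 156 responses from the counts in Output #2 and recomputes those columns; treating `None` as a missing value is an assumption of the sketch, and `frequency_table` is a hypothetical name.

```python
from collections import Counter


def frequency_table(values):
    """Rebuild the columns of an SPSS-style frequency table for one variable.

    None is treated as a missing value (an assumption of this sketch).
    Percent uses all cases; Valid Percent and Cumulative Percent use only
    the non-missing cases.
    """
    total = len(values)
    valid = [v for v in values if v is not None]
    counts = Counter(valid)
    rows, running = [], 0
    for value in sorted(counts):
        freq = counts[value]
        running += freq
        rows.append((
            value,
            freq,
            round(100 * freq / total, 1),          # Percent
            round(100 * freq / len(valid), 1),     # Valid Percent
            round(100 * running / len(valid), 1),  # Cumulative Percent
        ))
    return rows


# Counts from Output #2, expanded back into the 156 individual responses.
published = {
    0: 32, 3: 16, 6: 16, 7: 1, 9: 15, 12: 12, 13: 1, 15: 11, 18: 9,
    21: 6, 22: 2, 24: 1, 25: 1, 26: 2, 27: 3, 30: 7, 33: 4, 34: 1,
    36: 5, 39: 2, 40: 1, 43: 1, 45: 2, 49: 2, 60: 1, 153: 1, 190: 1,
}
credits = [value for value, n in published.items() for _ in range(n)]

for row in frequency_table(credits)[:5]:
    print(row)
# (0, 32, 20.5, 20.5, 20.5)
# (3, 16, 10.3, 10.3, 30.8)
# (6, 16, 10.3, 10.3, 41.0)
# (7, 1, 0.6, 0.6, 41.7)
# (9, 15, 9.6, 9.6, 51.3)
```

The cumulative column is computed from the running count rather than by adding rounded percentages, so small rounding errors do not pile up as you go down the table.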
Extra Exercise:
1. Are students more likely to prefer lecture classes (teach1) as compared to class discussions (teach5)?
2. Are students more likely to prefer exams (teach9) as compared to research papers (teach10)?
3. What percent of students would recommend the program (recommnd)?
B: Using SPSS to Get Means, Medians and Modes
Exercise 3: How would you best describe the central tendency of the number of credit hours of the respondents?
Steps:
Output 3: Credit Hours: Statistics
CREDITS Credit hours completed
Statistic | Value
N (Valid) | 156
N (Missing) | 0
Mean | 15.58
Median | 9.00
Mode | 0
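You can check Output 3 by hand. This Python sketch rebuilds the 156 responses from the counts in Output #2 and uses the standard-library `statistics` module; it is a cross-check, not SPSS output. The mean (15.58) sits well above the median (9) because a few very large values, such as 153 and 190, pull the mean upward, which is why the median is often the better summary for skewed data like these.

```python
import statistics

# Counts from Output #2, expanded back into the 156 individual responses.
published = {
    0: 32, 3: 16, 6: 16, 7: 1, 9: 15, 12: 12, 13: 1, 15: 11, 18: 9,
    21: 6, 22: 2, 24: 1, 25: 1, 26: 2, 27: 3, 30: 7, 33: 4, 34: 1,
    36: 5, 39: 2, 40: 1, 43: 1, 45: 2, 49: 2, 60: 1, 153: 1, 190: 1,
}
credits = [value for value, n in published.items() for _ in range(n)]

mean_credits = statistics.mean(credits)      # sum of values / number of cases
median_credits = statistics.median(credits)  # middle value (average of 78th and 79th here)
mode_credits = statistics.mode(credits)      # most frequent value

print(round(mean_credits, 2), median_credits, mode_credits)  # 15.58 9.0 0
```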
C. Using SPSS to Get the Standard Deviation
You have options for doing this. You can use the options in the frequency window, as you did above.
Exercise 4: What is the dispersion of credit hours?
Steps:
You can also use a different SPSS procedure, the Descriptives command, to get the standard deviation. Which one you use is a matter of preference.
Steps:
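Whichever procedure you use, SPSS reports the sample standard deviation, which divides by n - 1. The Python sketch below writes that formula out step by step for the credit-hour data (rebuilt from the Output #2 counts) and checks it against the standard library's `statistics.stdev`; `sample_sd` is a hypothetical name and the result is an illustration, not SPSS output.

```python
import math
import statistics

# Counts from Output #2, expanded back into the 156 individual responses.
published = {
    0: 32, 3: 16, 6: 16, 7: 1, 9: 15, 12: 12, 13: 1, 15: 11, 18: 9,
    21: 6, 22: 2, 24: 1, 25: 1, 26: 2, 27: 3, 30: 7, 33: 4, 34: 1,
    36: 5, 39: 2, 40: 1, 43: 1, 45: 2, 49: 2, 60: 1, 153: 1, 190: 1,
}
credits = [value for value, n in published.items() for _ in range(n)]


def sample_sd(values):
    """Square root of the summed squared deviations from the mean, over (n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))


sd = sample_sd(credits)
print(round(sd, 2))
print(math.isclose(sd, statistics.stdev(credits)))  # True
```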
D: The Concept of Crosstabs (contingency tables): Used with Nominal and Ordinal data.
Cross tabs are perhaps the most interesting and fun statistical tool you can use. They can be used to describe two or more variables, as well as to examine relationships.
Dependent Variable: This is the variable you want to explain.
Independent Variable: This is the variable you think affects your dependent variable or causes it to change.
Using SPSS to Do Crosstabs:
Exercise 6: Suppose we want to see if gender matters in terms of extent to which respondents will recommend the program to others.
Steps:
Note: You always calculate percentages within the row or column where you place your independent variable, so that each category of the independent variable adds up to 100%.
Output # 6:
SEX Sex * Extent of Recommendation Crosstabulation (columns: Extent of Recommendation)
Sex | | Strongly | Generally | Possibly | Generally not | Total
Male | Count | 17 | 26 | 7 | 1 | 51
 | % within SEX Sex | 33.3% | 51.0% | 13.7% | 2.0% | 100.0%
Female | Count | 19 | 20 | 7 | 1 | 47
 | % within SEX Sex | 40.4% | 42.6% | 14.9% | 2.1% | 100.0%
Total | Count | 36 | 46 | 14 | 2 | 98
 | % within SEX Sex | 36.7% | 46.9% | 14.3% | 2.0% | 100.0%
Interpreting the Output:
Of the 51 men, 33% would strongly recommend and 51% would generally recommend. Now look at the female row. How many women are there? 47. What percent of them strongly recommended the program? 40% strongly recommended and 43% generally recommended the program.
Gail's advice: I look for a difference of at least 10 percentage points between groups before I am willing to say there is a difference in the extent to which men and women would recommend this program. These results do not have a 10-point difference: the largest gap, in "generally" recommending (51% of men vs. 43% of women), is about 8 points.
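Percentaging by the independent variable and then applying the 10-point rule of thumb can be sketched in a few lines of Python. The counts come from Output #6; the function and variable names are hypothetical, and this is a hand check rather than SPSS's Crosstabs procedure.

```python
# Counts from Output #6: rows = sex (independent variable),
# columns = extent of recommendation (dependent variable).
categories = ["Strongly", "Generally", "Possibly", "Generally not"]
counts = {
    "Male": [17, 26, 7, 1],
    "Female": [19, 20, 7, 1],
}


def row_percentages(table):
    """Percentage within each row, because the independent variable is in the rows."""
    return {group: [100 * n / sum(row) for n in row] for group, row in table.items()}


pct = row_percentages(counts)
gaps = {cat: abs(m - f) for cat, m, f in zip(categories, pct["Male"], pct["Female"])}

for group, row in pct.items():
    print(group, [round(p, 1) for p in row])
# Male [33.3, 51.0, 13.7, 2.0]
# Female [40.4, 42.6, 14.9, 2.1]

largest = max(gaps, key=gaps.get)
print(largest, round(gaps[largest], 1))              # Generally 8.4
print("meets 10-point rule:", gaps[largest] >= 10)  # meets 10-point rule: False
```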
Exercise 6A: Let's review this setting-up-the-tables stuff:
You can set up the table differently. You can put sex in the column and recommnd in the row. When you go to Cells, you want to click on percentages in the column, since that is where you placed the independent variable.
Steps:
Even more confusing: The computer will do whatever you ask. So what does it look like if we click on other cell options?
Exercise 7: Go back to your analysis on sex and recommendation. Go through the steps to set up the crosstab analysis, but when you go into Cells, click on the options for expected counts and for row, column, and total percentages. Print this out and bring this output to class. We will go over this mondo bizarro table!!
Exercise 8:
a. Are people from different programs (program) more or less similar in their willingness to recommend (recommnd) the program?
b. Do students vary in their preference for research papers (Teach10) and exams (teach9) based on what program (MPA, MUS, or Ph.D.) they are in?
E. Using SPSS to Compare Means:
In situations where your dependent variable is interval/ratio level data, and your independent variable is nominal, you can compare the means.
Exercise 9: Do men earn more than women?
To answer this question, we need to use a different data set.
You need to open a new data set: GSS91pol.sav
Steps:
Output 9
Income recoded to dollars
Respondent's Sex | Mean | N | Std. Deviation | Median
Male | 38965.11 | 609 | 25470.86 | 32500.00
Female | 33096.23 | 756 | 25365.95 | 27500.00
Total | 35714.65 | 1365 | 25570.58 | 32500.00
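The Total row in Output 9 is not the simple average of the two group means ((38965.11 + 33096.23) / 2 = 36030.67); it is the average weighted by each group's N, because there are more women than men in the sample. A short Python check using only the numbers in Output 9 (the dictionary layout is just for illustration):

```python
# Group summaries from Output 9 (income recoded to dollars).
groups = {
    "Male": {"n": 609, "mean": 38965.11},
    "Female": {"n": 756, "mean": 33096.23},
}

total_n = sum(g["n"] for g in groups.values())
# Weighted mean: each group mean counts in proportion to its number of cases.
overall_mean = sum(g["n"] * g["mean"] for g in groups.values()) / total_n
gap = groups["Male"]["mean"] - groups["Female"]["mean"]

print(total_n, round(overall_mean, 2))  # 1365 35714.65
print(round(gap, 2))                    # 5868.88
```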
Exercise 10: Compare means using the GSS91pol data:
a. Find out if men and women watch the same amount of TV (TVhrs).
b. Find out if there is a difference in the age of marriage (agewed) between men and women.
Exercise 11: Is there a difference in income (income recoded to dollars; variable name incomdol) by educational degree (RS Highest Degree)?
Output # 11:
Report: Income recoded to dollars
RS Highest Degree | Mean | N | Std. Deviation
Less than HS | 18021.08 | 249 | 18217.58
High school | 33188.21 | 704 | 22530.35
Junior college | 41129.31 | 87 | 22645.84
Bachelor | 49033.56 | 216 | 25542.65
Graduate | 62275.46 | 108 | 25168.22
Total | 35738.27 | 1364 | 25565.06
Concept of Inferential Statistics: Working with Sample Data
A. Confidence Levels and Confidence Intervals.
Sometimes we want to make estimates about the larger population based on our sample. Because we are working with a sample, there is always some amount of error. But we still want to be able to say with some degree of confidence (usually 95%) that the population's average income falls between two points. One way to generalize an average back to the larger population is by constructing a confidence interval. Typically, you will ask the computer for the range within which, with 95% confidence, the true mean would fall if we had surveyed everyone in the population.
Using SPSS to Calculate Confidence Intervals:
Exercise 12: What is the best estimate of the average salary of the population, and the 95% confidence interval, based on our sample?
Steps:
Output 12:
Descriptives
Measure | Statistic | Std. Error
Mean | 35714.65 | 692.11
95% Confidence Interval for Mean: Lower Bound | 34356.94 |
95% Confidence Interval for Mean: Upper Bound | 37072.36 |
5% Trimmed Mean | 34682.13 |
Median | 32500.00 |
Variance | 653854367.273 |
Std. Deviation | 25570.58 |
Minimum | 500 |
Maximum | 87500 |
Range | 87000 |
Interquartile Range | 41250.00 |
Skewness | .714 | .066
Kurtosis | -.516 | .132
Exercise 13: Based on our sample:
What is our best estimate of average number of siblings in the population?
What is our best estimate of the average age that people in the population wed?
Print out and Bring to Class.
Another Analysis: You can also use Explore to obtain confidence intervals when you are comparing means.
Exercise 14: Suppose you want to know the average salaries of men and women and the confidence intervals for those averages.
Steps:
Output 14:
Descriptives: Income for Men and Women
Respondent's Sex | Measure | Statistic | Std. Error
Male | Mean | 38965.11 | 1032.13
 | 95% Confidence Interval for Mean: Lower Bound | 36938.13 |
 | 95% Confidence Interval for Mean: Upper Bound | 40992.08 |
 | 5% Trimmed Mean | 38254.70 |
 | Median | 32500.00 |
 | Variance | 648764713.845 |
 | Std. Deviation | 25470.86 |
Female | Mean | 33096.23 | 922.55
 | 95% Confidence Interval for Mean: Lower Bound | 31285.16 |
 | 95% Confidence Interval for Mean: Upper Bound | 34907.30 |
 | 5% Trimmed Mean | 31792.99 |
 | Median | 27500.00 |
 | Variance | 643431538.750 |
 | Std. Deviation | 25365.95 |
B. Concept of Statistical Significance:
When you are working from a random sample, you must figure out whether you might have gotten your results by chance. If the significance value (the probability reported by the test) is .05 or less, we will say the results are statistically significant; that is, the results we observe are not likely to be due to random chance alone.
1. Using SPSS to Do Chi-Square Tests for Nominal and Ordinal Data:
To do a chi-square test, you first set up the crosstab analysis.
Exercise 15: Is there a statistically significant difference between men's and women's willingness to recommend the program?
Steps:
Note: This is the chi-square test for the crosstab you did earlier that examined whether men and women differed in their responses on recommending the program (Exercise 6, Output 6). As you recall, we did not see much of a difference. The chi-square table looks like this:
Output 15:
Chi-Square Tests
Test | Value | df | Asymp. Sig. (2-sided)
Pearson Chi-Square | .732 | 3 | .866
Likelihood Ratio | .733 | 3 | .865
Linear-by-Linear Association | .136 | 1 | .713
N of Valid Cases | 98 | |
Interpreting the Output:
Remember, for this exercise we are pretending this is a random sample from a large population of students.
For our purposes, the only column of importance is the last one, Asymp. Sig. (2-sided). For the results to be statistically significant, this value should be .05 or less. Clearly, .866 is greater than .05, so the relationship between sex and the extent to which respondents would recommend the program is not statistically significant.
Exercise 16: Is political outlook (Liberal/conservative) (politics) related to views on legalization of marijuana (grass)?
Open the GSS91pol data set. Do a crosstab with a chi-square test for statistical significance. Remember to click on Cells and request percentages for wherever you placed your independent variable.
2. T-Test: Testing for Statistical Significance for Interval and Ratio Data:
The t-test is used with interval and ratio data to determine whether a mean, or a difference between means, is statistically significant.
One Sample: Use when you are comparing sample results to a known population value.
Paired Means: Use when two variables are paired, as in a pre-test and post-test (before-and-after) design. You must use the same people in both measurements for your results to be correct.
Independent Means: Use when you are comparing the means of two groups (for example, men and women).
Using SPSS to Do a One Sample T-Test
Exercise 17: Suppose you want to figure out the average number of hours worked per week and whether the result you got is a pretty accurate picture of the population in general. Use GSS91pol.sav.
One Sample T-Test: Steps:
Output 17:
One-Sample Statistics
Variable | N | Mean | Std. Deviation | Std. Error Mean
HRS1 Number of Hours Worked Last Week | 900 | 41.76 | 14.63 | .49
One-Sample Test (Test Value = 0)
Variable | t | df | Sig. (2-tailed) | Mean Difference | 95% CI of the Difference: Lower | 95% CI of the Difference: Upper
HRS1 Number of Hours Worked Last Week | 85.636 | 899 | .000 | 41.76 | 40.81 | 42.72
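The one-sample t statistic is the difference between the sample mean and the test value, divided by the standard error. Output 17 used a test value of 0, which only asks whether people worked more than zero hours; to compare your sample with a known population figure, you would enter that figure as the test value. In the Python sketch below, the function name is hypothetical and the 40-hour benchmark is an illustrative value, not one taken from the GSS.

```python
import math


def one_sample_t(mean, sd, n, test_value=0.0):
    """One-sample t statistic and degrees of freedom from summary statistics."""
    se = sd / math.sqrt(n)  # standard error of the mean
    return (mean - test_value) / se, n - 1


# Summary numbers from Output 17, which used a test value of 0.
t, df = one_sample_t(mean=41.76, sd=14.63, n=900)
print(round(t, 1), df)  # 85.6 899

# Hypothetical benchmark: does the mean differ from a 40-hour week?
t40, _ = one_sample_t(mean=41.76, sd=14.63, n=900, test_value=40)
print(round(t40, 2))  # 3.61
```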
Using SPSS to Do an Independent Sample T-Test:
Let's go back to our gender question about income. In the prior analysis (Exercise 9), we found that men earned more, on average, than women.
Exercise 18: Are these differences in income statistically significant? We use an independent sample t-test.
Steps:
Output 18:
Group Statistics
Respondent's Sex | N | Mean | Std. Deviation | Std. Error Mean
Male | 609 | 38965.11 | 25470.86 | 1032.13
Female | 756 | 33096.23 | 25365.95 | 922.55
Independent Samples Test
Assumption | F (Levene's Test for Equality of Variances) | Sig. (Levene's Test) | t | df | Sig. (2-tailed)
Equal variances assumed | .474 | .491 | 4.241 | 1363 | .000
Equal variances not assumed | | | 4.239 | 1299.580 | .000
Interpreting the Output:
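We get two tables. The first table gives the mean and standard deviation for each group. The second gives us the statistical test for significance, presented in the last column, Sig. (2-tailed). The significance value is less than .05 (SPSS displays .000 when the value rounds to less than .0005), so the difference in income between men and women is statistically significant. We can ignore all the rest of this stuff.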
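Both rows of the t-test in Output 18 can be recomputed from the group statistics alone. This Python sketch implements the pooled-variance t-test ("Equal variances assumed") and the Welch t-test with Welch-Satterthwaite degrees of freedom ("Equal variances not assumed"); it should agree with Output 18 up to rounding of the inputs. The function name is hypothetical.

```python
import math


def independent_t(n1, mean1, sd1, n2, mean2, sd2):
    """Independent-samples t-tests from summary statistics.

    Returns the pooled ("equal variances assumed") and Welch
    ("equal variances not assumed") versions as (t, df) pairs.
    """
    v1, v2 = sd1**2 / n1, sd2**2 / n2  # squared standard errors of each mean
    diff = mean1 - mean2

    # Pooled version: one combined variance estimate for both groups.
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    t_pooled = diff / math.sqrt(pooled_var * (1 / n1 + 1 / n2))

    # Welch version: separate variances, Welch-Satterthwaite df.
    t_welch = diff / math.sqrt(v1 + v2)
    df_welch = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

    return (t_pooled, n1 + n2 - 2), (t_welch, df_welch)


pooled, welch = independent_t(609, 38965.11, 25470.86, 756, 33096.23, 25365.95)
print(f"pooled: t = {pooled[0]:.3f}, df = {pooled[1]}")
print(f"Welch:  t = {welch[0]:.3f}, df = {welch[1]:.2f}")
# Output 18 reports 4.241 (df 1363) and 4.239 (df 1299.580).
```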
Exercise 19:
How likely is it that the average number of hours spent watching TV (TVhours) reported by our respondents is a fairly accurate portrayal of the population as a whole?
Are there differences in the average number of hours of TV watched by men and women, and are those results statistically significant?
Analysis of Variance:
You may have noticed that the t-test grouping variable only accepts two values. When you have a variable with more than two values, such as race, religion, or educational degree, you will need to use one-way ANOVA (analysis of variance) to test for statistical significance.
Exercise 20: Is there a statistically significant difference in income based on educational degree (highest degree obtained)?
Steps:
Output 20:
Descriptives
Income recoded to dollars
RS Highest Degree | N | Mean | Std. Deviation | 95% CI for Mean: Lower Bound | 95% CI for Mean: Upper Bound | Minimum | Maximum
Less than HS | 249 | 18021.08 | 18217.58 | 15747.22 | 20294.94 | 500 | 87500
High school | 704 | 33188.21 | 22530.35 | 31521.05 | 34855.37 | 500 | 87500
Junior college | 87 | 41129.31 | 22645.84 | 36302.83 | 45955.79 | 2000 | 87500
Bachelor | 216 | 49033.56 | 25542.65 | 45607.95 | 52459.18 | 500 | 87500
Graduate | 108 | 62275.46 | 25168.22 | 57474.50 | 67076.42 | 5500 | 87500
Total | 1364 | 35738.27 | 25565.06 | 34380.35 | 37096.19 | 500 | 87500
ANOVA
Income recoded to dollars
Source | Sum of Squares | df | Mean Square | F | Sig.
Between Groups | 199504503974.035 | 4 | 49876125993.509 | 98.048 | .000
Within Groups | 691314308342.681 | 1359 | 508693383.622 | |
Total | 890818812316.716 | 1363 | | |
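One-way ANOVA splits the total variation into variation between the group means and variation within the groups; F is the ratio of their mean squares. Since Output 20 reports each group's N, mean, and standard deviation, the ANOVA table can be rebuilt from those summaries. A minimal Python sketch (the function name is hypothetical; small differences from SPSS come from the rounded inputs):

```python
# Group summaries (n, mean, standard deviation) from Output 20.
groups = {
    "Less than HS": (249, 18021.08, 18217.58),
    "High school": (704, 33188.21, 22530.35),
    "Junior college": (87, 41129.31, 22645.84),
    "Bachelor": (216, 49033.56, 25542.65),
    "Graduate": (108, 62275.46, 25168.22),
}


def one_way_anova(summaries):
    """One-way ANOVA from per-group (n, mean, sd) summaries."""
    n_total = sum(n for n, _, _ in summaries)
    grand_mean = sum(n * m for n, m, _ in summaries) / n_total
    # Between groups: how far each group mean sits from the grand mean.
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in summaries)
    # Within groups: spread of cases around their own group mean.
    ss_within = sum((n - 1) * sd**2 for n, _, sd in summaries)
    df_between = len(summaries) - 1
    df_within = n_total - len(summaries)
    f = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f


ssb, ssw, dfb, dfw, f = one_way_anova(list(groups.values()))
print(dfb, dfw, round(f, 2))  # 4 1359 98.05
```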
VII. Relationships
Sometimes it is important to determine how strong the relationship between variables is. Measures of association generally range in strength from 0 to 1, with 1 being a perfect relationship and 0 being no relationship at all. Most of the time, you will see the relationship reported as a number with a plus (+) or minus (-) sign next to it. The sign tells you the direction: in an inverse (-) relationship, as one variable goes up the other goes down; in a positive (+) relationship, the variables change in the same direction, so as one goes up so does the other, and as one goes down so does the other.
Measures of Association:
Using SPSS to Measure Relationships for Nominal Data:
Exercise 21: Is one's religious preference related to one's views on programs to provide birth control to teenagers? Use the GSS91pol data set.
Steps:
Chi-Square Tests
Test | Value | df | Asymp. Sig. (2-sided)
Pearson Chi-Square | 32.672 (a) | 12 | .001
Likelihood Ratio | 35.095 | 12 | .000
Linear-by-Linear Association | 22.588 | 1 | .000
N of Valid Cases | 970 | |
a. 6 cells (30.0%) have expected count less than 5. The minimum expected count is 3.17.
Symmetric Measures
 | Measure | Value | Approx. Sig.
Nominal by Nominal | Phi | .184 | .001
 | Cramer's V | .106 | .001
N of Valid Cases | | 970 |
a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.
Interpreting the Output:
The first table is our crosstab, which you are expert at interpreting by now. The second table is our test for statistical significance, which you also know how to interpret. The last table gives our strength-of-association measures. Although the mathematics behind these measures varies, the results are often similar. The second column tells you the name of the measure of association. The Value column shows the strength of the relationship. Since both values are close to 0, this is a weak relationship. The last column tells you the statistical significance. In this analysis, the results are statistically significant (.001 is less than .05).
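Phi and Cramer's V are both computed from the chi-square value: phi is the square root of chi-square divided by N, and Cramer's V divides phi by the square root of (the smaller of the number of rows or columns, minus 1). That adjustment is why the two values differ in Output 21: for tables larger than 2 x 2, phi can exceed 1, so Cramer's V is generally the safer measure to report. A Python check using the numbers above (the function name is hypothetical, and the 4 x 5 table shape is inferred from the output rather than stated in it):

```python
import math


def phi_and_cramers_v(chi2, n, rows, cols):
    """Phi and Cramer's V computed from a Pearson chi-square value."""
    phi = math.sqrt(chi2 / n)
    v = phi / math.sqrt(min(rows, cols) - 1)
    return phi, v


# Exercise 21: chi-square = 32.672, N = 970, df = 12. Since df = (rows - 1) * (cols - 1)
# and V differs from phi, a 4 x 5 table is the consistent shape (an inference).
phi, v = phi_and_cramers_v(32.672, 970, rows=4, cols=5)
print(round(phi, 3), round(v, 3))  # 0.184 0.106
```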
Exercise 22:
Is there a relationship between gender and views on spanking? Is that relationship statistically significant?
Is there a relationship between religious preference and spanking? Is that relationship statistically significant?
2. Measuring Relationships for Ordinal Data:
Tau: Kendall's tau-b is considered a conservative measure because it takes tied pairs of cases into account, so its results tend to be smaller than gamma's. A tau-b of .2 or more is respectable.
Gamma: Gamma is based on pairs of cases. If you select two individuals at random, it compares the chances that they are ranked in the same order on both variables with the chances that they are ranked in opposite orders, ignoring pairs that are tied. A gamma of .3 or more is worth examining.
Spearman's Rho: Converts the values of each variable into ranks (highest to lowest) and then correlates the ranks, so it can also be used with interval data that have been ranked.
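To see why tau-b runs smaller than gamma, here is a minimal Python sketch of both measures computed from a crosstab of counts. It counts concordant pairs C (ranked in the same order on both variables) and discordant pairs D (opposite order). It assumes the rows and columns are already listed from lowest to highest category, and the 3 x 3 table in the example is hypothetical, not GSS data. SPSS also reports tau-c, which is omitted here.

```python
import math


def pair_counts(table):
    """Count concordant and discordant pairs in a crosstab of counts.

    Rows and columns must both be listed from lowest to highest category.
    Pairs tied on either variable are not counted here.
    """
    concordant = discordant = 0
    n_rows, n_cols = len(table), len(table[0])
    for i in range(n_rows):
        for j in range(n_cols):
            for k in range(i + 1, n_rows):   # cases higher on the row variable
                for l in range(n_cols):
                    if l > j:                # also higher on the column variable
                        concordant += table[i][j] * table[k][l]
                    elif l < j:              # lower on the column variable
                        discordant += table[i][j] * table[k][l]
    return concordant, discordant


def gamma(table):
    """Goodman and Kruskal's gamma: tied pairs are left out entirely."""
    c, d = pair_counts(table)
    return (c - d) / (c + d)


def tau_b(table):
    """Kendall's tau-b: tied pairs stay in the denominator, shrinking the value."""
    c, d = pair_counts(table)
    n = sum(sum(row) for row in table)
    all_pairs = n * (n - 1) / 2
    tied_rows = sum(t * (t - 1) / 2 for t in (sum(row) for row in table))
    tied_cols = sum(t * (t - 1) / 2 for t in (sum(col) for col in zip(*table)))
    return (c - d) / math.sqrt((all_pairs - tied_rows) * (all_pairs - tied_cols))


# Hypothetical 3 x 3 crosstab (rows: liberal, moderate, conservative;
# columns: low, medium, high agreement). Not GSS data.
example = [
    [20, 10, 5],
    [10, 20, 10],
    [5, 10, 20],
]
print(f"gamma = {gamma(example):.3f}, tau-b = {tau_b(example):.3f}")
```

Because a crosstab contains many tied pairs, tau-b will always be smaller in size than gamma for the same table, which is consistent with the gap between tau-b (.210) and gamma (.391) in Output 23.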
Using SPSS: Output for Relationships Using Ordinal Data
Exercise 23: What is the strength of the relationship between political ideology and legalization of marijuana?
Steps:
You will get a normal crosstabs table like the one for exercise 16. At the bottom, the computer will generate additional tables with the measures of association.
Output 23:
Symmetric Measures
| | | Value | Asymp. Std. Error | Approx. T | Approx. Sig. |
| Ordinal by Ordinal | Kendall's tau-b | .210 | .029 | 6.917 | .000 |
| | Kendall's tau-c | .220 | .032 | 6.917 | .000 |
| | Gamma | .391 | .052 | 6.917 | .000 |
| N of Valid Cases | | 899 | | | |
Using SPSS to Do Interval Level Correlations:
The most common measure of the relationship between interval level variables is Pearson's r. In effect, the coefficient compares the data set to an ideal data set in which all points fall on a straight line and to one with no relationship at all. It then assigns a score depending on how close the data come to the ideal. The score ranges from 0 (no linear relationship) to 1 (a perfect linear relationship), and the sign (+ or -) tells you the direction, so r runs from -1 to +1.
Exercise 24: What is the relationship between education and income? Our hypothesis is that those with more education earn more. Both years of education and income are at least interval level data.
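Here is a minimal Python sketch of Pearson's r. For comparison, it also includes Spearman's rho from the ordinal section, which is simply Pearson's r applied to ranks. Tied values get their average rank, which is the usual convention. The education and income figures in the example are made up for illustration; they are not the GSS values behind Output 24.

```python
import math


def pearson_r(x, y):
    """Pearson correlation: co-variation of x and y divided by their spreads."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]
    co_variation = sum(a * b for a, b in zip(dx, dy))
    return co_variation / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


def average_ranks(values):
    """Rank values from lowest (1) to highest, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j share the average rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the ranks instead of the values."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical years of school and incomes (dollars) for seven respondents.
educ = [8, 12, 12, 14, 16, 18, 20]
income = [15000, 22000, 30000, 28000, 45000, 52000, 90000]
print(f"Pearson's r = {pearson_r(educ, income):.3f}")
print(f"Spearman's rho = {spearman_rho(educ, income):.3f}")
```

Ranking from lowest to highest instead of highest to lowest does not change rho, as long as both variables are ranked the same way. Because rho only looks at order, a curved but always-rising pattern gives rho = 1 while r stays below 1.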
Steps:
Output 24:
Correlations
| | | EDUC Highest Year of School Completed | INCOMDOL Income recoded to dollars |
| EDUC Highest Year of School Completed | Pearson Correlation | 1.000 | .460 |
| | Sig. (2-tailed) | . | .000 |
| | N | 1496 | 1362 |
| INCOMDOL Income recoded to dollars | Pearson Correlation | .460 | 1.000 |
| | Sig. (2-tailed) | .000 | . |
| | N | 1362 | 1365 |
** Correlation is significant at the 0.01 level (2-tailed).
Regression Analysis:
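When doing regression analysis you will be working with an equation, Y = a + bX, where a is the constant and b is the coefficient of the independent variable. This equation allows you to describe a data set, to estimate population parameters, to forecast, and, with a sound research design, to support inferences about causality.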
Measures of Association: R
r = correlation coefficient (overall fit); in multiple regression it is written R, the multiple correlation coefficient.
r2 = proportion of explained variation (a.k.a. coefficient of determination).
1-r2 = proportion of unexplained variation.
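The arithmetic behind these definitions is short enough to show directly. This minimal Python sketch squares the education-income correlation from Output 24 (.460) to get the explained and unexplained shares of variation in income.

```python
def explained_share(r):
    """Return (explained, unexplained) proportions of variation for a correlation r."""
    r_squared = r ** 2          # coefficient of determination
    return r_squared, 1 - r_squared


# r = .460 is the education-income correlation reported in Output 24.
explained, unexplained = explained_share(0.460)
print(f"explained: {explained:.1%}, unexplained: {unexplained:.1%}")
```

This prints 21.2% explained and 78.8% unexplained. The 21.2% matches the R Square of .212 in the simple regression of income on education (Output 25), as it should when there is only one independent variable.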
Using SPSS to Do a Simple Regression
Exercise 25: Let's look at education and income using a simple regression.
Steps:
Select Income (INCOMDOL) and place it in the Dependent box
Select EDUC (highest year of school completed) and place it in the Independent(s) box
Output 25:
Model Summary
| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
| 1 | .460 | .212 | .211 | 22699.57 |
a Predictors: (Constant), EDUC Highest Year of School Completed
ANOVA
| Model | | Sum of Squares | df | Mean Square | F | Sig. |
| 1 | Regression | 188426936995.327 | 1 | 188426936995.327 | 365.685 | .000 |
| | Residual | 700768148035.877 | 1360 | 515270697.085 | | |
| | Total | 889195085031.204 | 1361 | | | |
a Predictors: (Constant), EDUC Highest Year of School Completed
b Dependent Variable: INCOMDOL Income recoded to dollars
Coefficients
| Model | | B (unstandardized) | Std. Error (unstandardized) | Beta (standardized) | t | Sig. |
| 1 | (Constant) | -13936.587 | 2671.534 | | -5.217 | .000 |
| | EDUC Highest Year of School Completed | 3802.543 | 198.848 | .460 | 19.123 | .000 |
a Dependent Variable: INCOMDOL Income recoded to dollars
Interpreting the Output:
We have three tables. The first table, the Model Summary, shows how well the model explains variation in income. The R square is .21, which indicates a moderate relationship: 21% of the variation in income is explained by education. The second table, ANOVA, tells us whether the model is statistically significant; we look at the last column (Sig.). In this case it is less than .05, so the results are statistically significant. The Coefficients table shows the impact of education on income: for every additional year of education, income goes up by about $3,803 (B = 3802.543). This is also statistically significant.
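Writing the Coefficients table as an equation, predicted income = -13936.587 + 3802.543 x years of school, makes the interpretation concrete. The Python sketch below uses only the estimates printed in Output 25. The 12- and 16-year inputs are illustrative choices of ours, and predictions are only sensible within the range of education found in the data.

```python
# Estimates copied from Output 25 (Coefficients and ANOVA tables).
CONSTANT = -13936.587
B_EDUC = 3802.543
SE_EDUC = 198.848
MS_REGRESSION = 188426936995.327
MS_RESIDUAL = 515270697.085


def predicted_income(years_of_school: float) -> float:
    """Regression equation: income = constant + B * years of school."""
    return CONSTANT + B_EDUC * years_of_school


# F in the ANOVA table is the ratio of the two mean squares.
f_stat = MS_REGRESSION / MS_RESIDUAL
# t in the Coefficients table is the coefficient divided by its standard error.
t_educ = B_EDUC / SE_EDUC

for years in (12, 16):
    print(f"{years} years of school -> predicted income ${predicted_income(years):,.0f}")
print(f"F = {f_stat:.3f}, t = {t_educ:.3f}")
```

The two predictions differ by four times B (about $15,210), and the F and t it prints match the 365.685 and 19.123 in the output.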
Using SPSS to Do Multiple Regression:
To determine the regression coefficients and the constant, you follow the same steps as for a bivariate analysis. Once you have entered your dependent variable, you simply enter more than one independent variable. SPSS will then generate the coefficients, the constant, and the R value.
Exercise 26: What explains income?
Steps:
Output 26:
Model Summary
| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
| 1 | .577 | .333 | .330 | 21048.82 |
a Predictors: (Constant), DWELOWN Homeowner or Renter, AGEWED Age When First Married, EDUC Highest Year of School Completed
ANOVA
| Model | | Sum of Squares | df | Mean Square | F | Sig. |
| 1 | Regression | 160736543078.188 | 3 | 53578847692.729 | 120.931 | .000 |
| | Residual | 321656336117.017 | 726 | 443052804.569 | | |
| | Total | 482392879195.205 | 729 | | | |
a Predictors: (Constant), DWELOWN Homeowner or Renter, AGEWED Age When First Married, EDUC Highest Year of School Completed
b Dependent Variable: INCOMDOL Income recoded to dollars
Coefficients
| Model | | B (unstandardized) | Std. Error (unstandardized) | Beta (standardized) | t | Sig. |
| 1 | (Constant) | 7174.054 | 5114.637 | | 1.403 | .161 |
| | EDUC Highest Year of School Completed | 3880.293 | 255.097 | .481 | 15.211 | .000 |
| | AGEWED Age When First Married | -142.368 | 161.734 | -.027 | -.880 | .379 |
| | DWELOWN Homeowner or Renter | -13176.228 | 1601.745 | -.253 | -8.226 | .000 |
a Dependent Variable: INCOMDOL Income recoded to dollars
Interpreting the Output:
How good is our model?
The Model Summary gives us the answer in the R square value, which tells us how much of the variation in income is explained by this model. Overall, this model explains 33% of the variation, which indicates a moderately strong relationship. (Note: this is higher than what we got from our simple regression, so we are on the right track. Because R square can only go up when variables are added, the Adjusted R Square, .330 here versus .211 before, is the fairer comparison.)
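R square and adjusted R square can both be recomputed from the ANOVA tables. This minimal Python sketch assumes the usual adjustment formula, 1 - (1 - R square)(n - 1)/(n - k - 1), where k is the number of independent variables. The number of cases n is taken as the Total df plus one (1,362 and 730).

```python
def r_squared(ss_regression: float, ss_total: float) -> float:
    """Share of total variation explained by the model."""
    return ss_regression / ss_total


def adjusted_r_squared(r2: float, n_cases: int, n_predictors: int) -> float:
    """Penalize R square for the number of independent variables in the model."""
    return 1 - (1 - r2) * (n_cases - 1) / (n_cases - n_predictors - 1)


# Sums of squares from the ANOVA tables; cases = Total df + 1.
models = {
    "simple (Output 25)": (188426936995.327, 889195085031.204, 1362, 1),
    "multiple (Output 26)": (160736543078.188, 482392879195.205, 730, 3),
}
results = {}
for name, (ss_reg, ss_tot, n, k) in models.items():
    r2 = r_squared(ss_reg, ss_tot)
    adj = adjusted_r_squared(r2, n, k)
    results[name] = (r2, adj)
    print(f"{name}: R square = {r2:.3f}, adjusted = {adj:.3f}")
```

The values agree with the Model Summary tables (.212 and .211; .333 and .330). The two models were also fit on different numbers of cases. This is most likely because SPSS by default drops any case missing a value on any variable in the model (listwise deletion), so keep that in mind when comparing them.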
Is our overall model statistically significant?
The ANOVA table gives us a lot of information, but for our purposes the most important part is the last column, which gives the significance. In this case it is less than .05, so the model is statistically significant.
What explains income?
The Coefficients table: The unstandardized coefficients (B) tell you how much income changes for every one-unit change in each independent variable while holding the others constant. (Sometimes we talk about this as the independent effect of each variable in our equation on income.) So if we look at the independent effect of education while holding age when first married (AGEWED) and home ownership (DWELOWN) constant, each additional year of education brings a $3,880 increase in income. Age when first married is not statistically significant (Sig. = .379), so we cannot conclude that it affects income. Looking at home ownership, we see a negative sign, indicating an inverse relationship. To interpret it, we need to know how this variable was coded (right click on the variable to get this information when you are in the regression window). This variable was coded 0 if the respondent owned a home and 1 if the respondent rented. So as we move from homeowner (0) to renter (1), income goes down by $13,176. The negative sign is a product of this coding: renters have lower incomes than homeowners.
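One way to see what "holding the others constant" means is to plug two respondents into the equation who differ on only one variable. This minimal Python sketch uses the B values from Output 26. The respondent (16 years of school, first married at 25) is hypothetical. AGEWED is included because it is part of the model, even though its coefficient is not statistically significant.

```python
# Constant and B values copied from the Coefficients table in Output 26.
CONSTANT = 7174.054
B = {"educ": 3880.293, "agewed": -142.368, "dwelown": -13176.228}


def predicted_income(educ: float, agewed: float, dwelown: int) -> float:
    """Income predicted by the three-variable model; dwelown is 0 = owner, 1 = renter."""
    return (CONSTANT
            + B["educ"] * educ
            + B["agewed"] * agewed
            + B["dwelown"] * dwelown)


# Hypothetical respondent: 16 years of school, first married at 25, homeowner.
base = predicted_income(educ=16, agewed=25, dwelown=0)
one_more_year = predicted_income(educ=17, agewed=25, dwelown=0)
renter = predicted_income(educ=16, agewed=25, dwelown=1)

print(f"baseline prediction: {base:,.0f}")
print(f"change for one more year of school: {one_more_year - base:+,.0f}")
print(f"change from owner to renter: {renter - base:+,.0f}")
```

Whatever values you pick for the other variables, changing only education by one year shifts the prediction by exactly its B (+3,880). Changing only home ownership shifts it by -13,176, which is what the independent effect of each variable means.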
Standardized Coefficients (Beta weights): Our independent variables are not measured the same way. Education is measured in years and dwelling as homeowner/renter, so their unstandardized coefficients are not comparable. SPSS therefore standardizes them so we can see which has the stronger effect on the dependent variable. These are called Standardized Coefficients (or betas, or beta weights). Comparing their absolute values, education has the largest Beta weight (.481), followed by home ownership (-.253) and age when first married (-.027).
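The standardization is a simple rescaling: beta = B x (standard deviation of X) / (standard deviation of Y), so it describes the change in Y, in standard deviations, for a one standard deviation change in X. This minimal Python sketch shows the single-predictor case with hypothetical data; with several predictors, SPSS applies the same rescaling to each B.

```python
import statistics


def slope(x, y):
    """Least-squares slope (the unstandardized B) for a single predictor."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def standardized_beta(b, x, y):
    """Beta weight: B rescaled by the standard deviations of X and Y."""
    return b * statistics.stdev(x) / statistics.stdev(y)


# Hypothetical data: years of school and income in dollars.
educ = [8, 10, 12, 12, 14, 16, 16, 18, 20]
income = [14000, 21000, 26000, 31000, 30000, 42000, 50000, 47000, 65000]

b = slope(educ, income)
beta = standardized_beta(b, educ, income)
print(f"B = {b:,.1f} dollars per year, beta = {beta:.3f}")
```

With a single predictor, beta equals Pearson's r. This is why the Beta of .460 in Output 25 matches the correlation in Output 24. With several predictors, each beta is adjusted for the others, so compare their absolute sizes as above.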
Other Stuff:
Managing the Output:
Deleting stuff from an Output file:
Option A: right click on the chart or table, then hit the delete button.
Option B: go to the file tree on the left side of your screen. Click on the yellow folder icon, and everything in that folder will be deleted when you hit the delete button. Or you can click on specific items and hit the delete button.
Print an Output File
Click File > Print
You can print just the material you have selected and highlighted by clicking Selection.
Or you can print the whole file (remember to delete the stuff you don't want) by clicking All Visible Output.
Helpful Hint: You can save time and ink by clicking on properties and selecting econofast print.
You can choose between Portrait (which this is) or Landscape.
Crosstabs print better using landscape.
Note: all your output gets saved to a single file until you close it (either save or delete). If you print All Visible Output, all the output from the session will be printed out.
Saving an Output file:
I don't usually save output, since it is easy enough to recreate. But you may want to save the output from assignments, at least until after you receive feedback.
Managing the Hardcopy Output:
I tend to save the hardcopy output while I am working on a report because it maintains an audit trail, as GAO folks refer to it. I label output and keep it in a looseleaf binder. I label it in a way that lets me find it when I write a draft report using the data and lets me source it (e.g., Table 1 from Run 1, pages 12-20). It makes it easier to find the data if there is a question about the analysis or if I want someone to check that my numbers match (an accuracy check).
You can label SPSS output before printing or saving.
Exercise: For example, suppose you want to label the run that has information about demographics:
You can change the label of any table as well. Double clicking creates an edit box.
Moving a SPSS Table into a Word Document:
You may find it helpful to move SPSS tables into your report. First, open a Word document: minimize SPSS by hitting the minimize button (the dash) at the top of the SPSS window, then open up Word and open a blank file. Got it? Then minimize Word and go back to SPSS.
Let's say you want to move the table reporting sex to your Word document.
Steps:
Creating Summary Tables: In SPSS, you cannot create a single table that shows the percent of respondents rating various instructional approaches as effective. You will have to create the table in Word or WordPerfect and type in the percent distributions.
Using Graphs to Analyze and Display Data
This section will walk you through using the graph options. Remember to use bar or pie charts for ordinal and nominal variables.
Exercise: Show a picture of the percent of respondents in each program.
Steps:
(A) Bar Charts:
Editing the chart: You edit the chart in the Output window. Double click on the chart to pop up the chart editor. You can change the fill, the color, the lines, the bar style, the bar labels, and the direction (from vertical to horizontal). You need to click Apply in the little pop-up boxes for each of these changes. When you are done with the editor, click the X button.
You can move the chart to a Word document. In the Output window, double click on the chart; it will come up as a box. Right click, and select copy as object or copy. I am not sure how much difference there is, but play with this; the differences may become clear.
(B) Pie Charts:
Editing the chart: Go to the chart editor by double clicking on the chart. You can pull out a slice by clicking on that slice and then selecting the button that looks like Pac-Man; if you don't like it, click Pac-Man again. You can change the color, fill, and lines.
You can add percents to the slices. While still in the chart editor, select Chart from the menu at the top of the editor, then click Options. In Options, select Percent (clicking on it gives it a check mark). Click on Format, change the decimal places to 0, then click OK to get out of the Options box.
(C) Cluster Charts:
Note: These are the same variables as used in crosstabs; this is another way of describing those variables, using the charts option. Once you are familiar with creating charts to represent your data analysis, you can explore the other options SPSS provides (line charts, area charts, stacked bar charts, scatter charts, boxplots, etc.) to better describe your data graphically.
Again, you can go to the chart editor (double click on chart) and make whatever changes you would like. Remember, color does not always reproduce well in black and white, so you might find using different fill to be more effective.