In this presentation, we will discuss the use of Survey Monkey to collect data online. We will then show how to use SPSS to analyze the data.
When starting a project such as this, it is often best to think “backwards”. For example, what conclusions do you wish to draw from this research? Once you have a list of conclusions, consider each conclusion in turn and think about what type of analysis you would need to draw that conclusion. Next, think about the variable or variables that would be needed to run that analysis. Finally, consider how to write the question or questions that will provide the data needed for the analysis. As you can see, you are working backwards from the conclusions to the questions that will appear on the online survey.
There are many online services one could use for collecting survey data. We will illustrate the use of Survey Monkey, but we do not mean to suggest that it should be preferred over any other online survey service. In the same vein, we are analyzing the data with SPSS, but any statistical package could be used.
Let’s start off with a few comments about Survey Monkey.
The first thing that you will notice when you visit the Survey Monkey website is that there are different levels of service. Please visit the Survey Monkey website to see the current list of available services. Most students, and many researchers, are quickly drawn to the free service, because that is what fits easily into their budget. However, you need to carefully read what is available with the free service. You also need to pay very careful attention to what is not included in the free service. What is available as part of the free service changes over time, so please check the website for the most current information. At the time of this writing, one of the things not included in the free service is the ability to download the data you have collected. This means that you can do some analyses using Survey Monkey’s analysis tools, but you cannot save the data file to your computer to use in your own statistical package. For some purposes, that may be fine. However, for most researchers, this is a serious limitation. Additionally, with the free service you are limited in the number of items that you can have on your questionnaire and in the number of respondents that you can have. There are other limitations that may be more or less important to you. The point is that you should fully understand what you can and cannot do with each level of service so that you can make the best choice for your research project.
At the time of this writing, there are approximately 15 different question types available, including forced choice, multiple choice and open-ended. You can make some questions required, and you can have responses validated. You can have random assignment of surveys to respondents (A/B testing), and you can have skip patterns (e.g., if a respondent is male, he would skip questions regarding pregnancy and giving birth).
Now that we have done that, let’s talk about the questions that we hope our questionnaire will answer for us. In my example (which I admit is rather silly), I want to gather information to help answer the following questions:
- Do study habits differ between males and females?
- Do study habits change over time?
- Do study habits vary by major?
While these questions may seem straightforward, we will need very specific definitions of “differ”, “change” and “vary”. For example, with respect to the first question, when we ask “Do study habits differ between males and females?”, we need to be specific about how much difference is a meaningful difference to us. We also need to be specific about the aspect of study habits in which we are seeking a difference. Clearly we cannot ask about every aspect of study habits, so we need a fairly narrow definition so that we can include the right question or questions on the survey.
I have written a short, 10-item questionnaire on Survey Monkey. In this questionnaire I tried to use as many different types of questions as possible while still writing a reasonable questionnaire. I also tried to include a few problematic elements so that I can illustrate the difficulties that can arise.
Here is the questionnaire:
1. What is your gender?
2. What year of school are you in? (Enter a number) – Open-Ended Response
3. What is your major? – Open-Ended Response
4. Did you attend another college/university before coming to UCLA?
5. How do you study? (Rate each Always Often Rarely Never)
Read/rewrite notes from class
Listen to a recording of the lecture
Find more information on the internet
Read text assigned in class
Read text suggested in class
Work practice problems
Do assigned homework
Hire a tutor
Attend study sessions led by the TA
Study with a friend or classmate
Study in a group
Get tips from someone who took the class in a previous quarter
Other (please specify)
6. On average, how many hours do you study for each class you take?
0-5 6-10 11-15 16-20 More than 20
7. Please tell us where you prefer to study and where you actually do study. (Choose one response per column) – Prefer to study here; usually study here
Home
Library
Park/garden
Donut shop
Other people’s homes/apartments
Common room in dorms
Coffee shop
other
8. What kind of noise level do you prefer when studying? Please rank your answers from 1 to 4.
Absolute silence
A little background noise (such as soft music)
A moderate amount of background noise (such as a TV)
A lot of background noise (such as a party or concert)
The amount of background noise does not matter to me.
9. At what time of the day do you do your most effective studying? (Choose all that apply)
Any time that I have time to study
Morning
Afternoon
Evening
Night
10. If you could change one thing to make your studying more effective, what would it be? – Open-Ended Response
Before we discuss my silly little questionnaire, here are a few things to watch for as you write your questionnaire:
- Keep the number of questions reasonable. The longer the questionnaire is, the more likely respondents will quit before answering all of the questions.
- The order of the questions on your questionnaire matters. You may want to start with interesting questions (rather than, say, demographic questions, which respondents may find boring). Questions that respondents may not wish to answer, such as questions regarding income, should probably come at the end of the questionnaire.
- Do not make too many of the questions required. If a respondent wants to skip a question that you have made required, the respondent may simply stop answering the questions on the questionnaire entirely.
- Make sure that an exhaustive list of options is available. This does not mean including a list of dozens of options for each question. Rather, everyone who responds to your questionnaire should have an accurate response to select. For example, you can include the options of interest to you, and then include “other” or “don’t know” as an option. These can be a catch-all for options between which you do not need to distinguish. Here is a silly example to illustrate this point. Let’s say that your questionnaire is about animal companions. You may ask: Is your animal companion a cat or a dog? If my animal companion is a bird, which response should I select? In this case, you could add “other” to the list of possible responses if you were not interested in distinguishing between other types of animal companions. While this seems like an obvious example, there are occasions when the error is not so obvious. Let’s say that you wanted to ask about respondents’ age. You offer four categories: less than 20, 20 to 30, 30 to 40, 50 and older. If I am 30 years old, which response option should I select? If I am 45 years old, which response option should I select?
- Remember the distinction between having an opinion and the intensity of that opinion. For example, you may ask respondents if they like ice cream. Let’s say that half of the respondents like ice cream and half do not. From this, you may start to draw conclusions regarding the popularity of ice cream. However, if you additionally asked about the intensity of the opinion, you may find that the respondents who like ice cream like it a whole lot, while those who do not like ice cream tend to be rather mild in their dislike. Further, a distinction can be made between intensity and extremity. A respondent may hold an extreme opinion with little intensity, while another respondent may hold a middle-of-the-road opinion with great fervor.
- Check the wording of each question on your questionnaire. To the greatest extent possible, you want to minimize the possibility that some respondents will misunderstand what you are asking or understand it in a manner other than you intend. Remember that people from different generations may have different ways of expressing their opinions, and that the same wording may have a very different meaning to persons of different generations.
- Pilot your questionnaire. Ask some friends to take your survey and give you feedback. Next, export the data from Survey Monkey into SPSS. You want to be certain that this process will go smoothly before you start collecting your real data. You may need to pilot your survey multiple times before you get it “just right”. Take the time to do this.
- Be careful with open-ended questions. Questions of this type often require lots of coding when data collection is complete. Think about the coding scheme before you start collecting data. Open-ended questions also tend to have more missing data than closed-ended questions (the type of question for which respondents choose from a list of possible responses). This can be problematic if you have only one open-ended question to address a hypothesis.
- Do not change questions on your questionnaire while you are collecting data. Even if the response options do not change, the interpretation of the results will change. The time to change, alter and rewrite questions is after you pilot your questionnaire and before you start to collect your data. Remember that you may need to pilot your questionnaire more than once. While this can be frustrating, it is usually much less frustrating than needing to exclude the respondents who responded to the earlier version of the question. (In other words, the responses to the earlier form of the question are usually set to missing.) Another reason not to change questions while you are collecting data is that sometimes how you phrase one question influences responses on a later question.
- Consider how you will handle missing data. Not all respondents will answer all of the questions on your questionnaire. The resulting missing data can complicate the analysis of the data. If you have only a few respondents to your questionnaire, then your options for handling missing data are relatively few. If you have thousands of respondents, then you have more options, but each of these will take time and careful consideration. If you are planning on using multiple imputation to handle your missing data, you may need to add some questions to your questionnaire to use as auxiliary variables in the multiple imputation.
There are many good books on the topic of writing good questionnaire items. Most of these texts give examples of mistakes to avoid and best practices to follow. If this is your first time writing a questionnaire, I suggest that you read through some of these before you pilot your questionnaire. If you have already put one or more questionnaires in the field, browsing the latest and greatest findings in this area is never a bad idea.
Let’s talk about a few more issues that need to be considered before you can start collecting your data.
Other issues
Coverage
Coverage refers to the percentage of the population of interest that is included in the sampling frame. Let’s return to our idea of working backwards. You want to generalize the results from your questionnaire to a specific population, such as adults in the United States in the current year or the students at a particular university in a particular year. Clearly, you cannot get every member of the population to complete your questionnaire, so you select some of the members of the population to complete your questionnaire. The sampling frame is the list of members of the population. In reality, you will probably never have a complete sampling frame. However, that does not mean that you can’t have good coverage. Having good coverage means that you have a reasonable number of respondents from each segment of the population of interest. For example, you would not have good coverage if your population of interest was adults in California in 2013 and only 5% of your sample was female.
Formal sample versus a convenience sample
In a formal sample, you would carefully define your population of interest and then develop a sampling frame. You would send your questionnaire to selected respondents included in the sampling frame. This requires a lot of time, a lot of work, and you may not have the information that you need to do this. Instead, many researchers use a convenience sample. A convenience sample is just what it sounds like: a convenient sample of respondents. As you might guess, there can be some difficulties with a convenience sample. One potential difficulty could be poor coverage; for example, you may not get as many female respondents as you would like. In fact, convenience samples are rarely representative of the population of interest. Because of this, caution needs to be used when generalizing the results beyond those particular respondents. Some researchers try to develop weights to correct for the differences between their sample and known population characteristics, but creating these weights presents its own set of difficulties.
Human subjects’ approval
Most researchers (and all researchers from UCLA) need to have approval for their survey before they can begin collecting data. Please see the UCLA Office of the Human Research Protection Program website for more information.
Now, after lots of thinking, planning, writing and rewriting questions, developing an analysis plan, obtaining human subjects’ approval, piloting the questionnaire, making necessary modifications and collecting data, the time has finally come to export the data file from Survey Monkey to your computer. In other words, it is time again to make a decision.
Exporting data
There are several options for exporting data from Survey Monkey. My suggestion is to export the data in several different formats, as some formats will require less preparation for analysis than other formats. If you are able to export the data in SPSS format, then most of the work will be done for you (except variable names for your survey items). Also, you will almost always want to export the individual rows of data rather than the summary data. If you want summaries of your data, you can create them from the individual rows of data. However, you may not be able to do all of the analyses that you want from the summary data that you download from Survey Monkey, and there is no way to get back to the individual rows of data if you downloaded only the summary data.
Below we will show four ways that you could export the individual rows of data. No matter which of these four methods of data extraction you use, you will probably want to rename the variables. The SPSS command rename variables can be used in a few different ways to rename variables, and you can rename multiple variables in a single call to the command. Also, many of the variables come into SPSS as string variables (even if the variable contains numbers), so you will need to convert those to numeric variables to use them in analyses such as ANOVA and regression. Again, there are many ways to do this in SPSS. After you have renamed your variables and converted them into numeric form, you may want to recode or collapse some variables. We will show some examples of this. Also, if a variable has no responses, it will be a numeric variable. We will see an example of this (in question 7, for example).
Let’s see a few examples of some of the commands used most commonly to prepare the data for analysis. Please remember that all commands in SPSS MUST end in a period. Also, think about how you would prefer to organize your SPSS syntax file. One way to organize the file is to work with each variable in turn. For example, you might rename, make a numeric version of the variable, add a variable label and then a value label to the first variable, and then go through the same process for the second variable, and so on. Another way is to do each type of task for each variable. For example, you might recode all of the variables in one section, and then add value labels in another section. You may find a different way to organize your syntax file. How you organize the file doesn’t really matter, but having some form of organization does matter. This will help ensure that you don’t forget to do something. It will also make the file more understandable if you need to share it with someone else. You will also want to add comments to the syntax file. You can do this either by using the comment command or by starting the line with an asterisk. As with all other commands in SPSS, you need to end your comment with a period. In my experience, these types of files tend to get “inherited”, meaning that the person who wrote the syntax file is no longer working on the project and someone else needs to step in and take over. You should try to write a syntax file that you would want to inherit.
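As a small sketch of the two comment styles just described (the comment text itself is only illustrative and is not taken from the workshop's syntax files):

```spss
* Question 4: create a numeric version of other_univ_s.
comment The crosstabs at the end of this section checks the new variable.
```

Both lines are treated as comments, and each one ends with a period like any other command.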
The get file command
I would suggest that this be the first command in your syntax file. It is one way to ensure that you are running your syntax on the correct data file. Here is an example of the command:
get file = "D:\data\survey monkey 2013\spreadsheet_condensed_actual_values.sav".
Although sometimes not technically necessary, I would suggest that you enclose the path specification and data file name in quotes. This is necessary if you have blanks in a folder name or the data file name.
The get data command
The get data command is used to import data into SPSS. For example, you would use this command if you were trying to import data in an Excel file into SPSS.
get data
  /type = xlsx
  /file = "d:\data\Survey Monkey 2013\Sheet_1_export_0.xlsx"
  /sheet = name "Sheet_1_export_0"
  /cellrange = full
  /readnames = on.
The save command
Once you have finished your data cleaning tasks, I suggest that you save your dataset with a new name. When you do your analyses, you can open and use this dataset rather than re-running all of the data cleaning commands that are necessary to transform your raw dataset into an analysis-ready dataset. Also, it is possible that your analysis-ready dataset may contain only the numeric versions of your variables. Here is an example of this command:
save outfile = "D:\data\survey monkey 2013\numerical_values_cleaned.sav".
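If you do want an analysis-ready file that holds only the numeric versions of your variables, one option is the drop subcommand of the save command. This is only a sketch: the file name analysis_ready.sav is hypothetical, and the list of string variables to drop would need to match your own dataset.

```spss
* Save a copy of the cleaned data without the original string variables.
* The file name and the variable list below are placeholders.
save outfile = "analysis_ready.sav"
  /drop = gender other_univ_s.
```

The dataset that is open in SPSS is not changed by this; only the saved copy omits the listed variables.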
The rename variables command
The rename variables command does exactly what you think it should do: it renames variables.
When Survey Monkey names variables, it often uses the first part of the question. Here are a few examples from my survey: WhatyearofschoolareyouinEnteranumber, DidyouattendanothercollegeuniversitybeforecomingtoUCLA, Pleasetelluswhereyouprefertostudyandwhereyouactually, and WhatkindofnoiseleveldoyoupreferwhenstudyingPleasera. Some of the variable names are more than 50 characters long. Now you can see why you would want to rename your variables! You have some choices as to how you write your rename variables syntax, as well as how you organize your syntax file. One possibility is that you rename each of your variables one at a time. If you do this, you may want to do any other operations regarding that variable next, so that your syntax file is organized by variables. Alternatively, you could rename all of your variables in a single call to the rename variables command.
rename variables whatisyourgender = gender.
rename variables (WhatyearofschoolareyouinEnteranumberOpenEndedRe
  WhatisyourmajorOpenEndedResponse
  DidyouattendanothercollegeuniversitybeforecomingtoUCLA
  = yearschool major other_univ_s).
The compute command
The compute command is one of the commands that can be used to create a new variable. When cleaning the data, you can use the compute command to create a numeric version of a string variable, or you can use it to create a collapsed version of a multi-category variable. Here are some examples:
compute other_univ = $sysmis.
if other_univ_s = "No" other_univ = 0.
if other_univ_s = "Yes" other_univ = 1.
variable labels other_univ "Did you attend another college or university before coming to UCLA?".
value labels other_univ 0 "no" 1 "yes".
crosstabs /tables other_univ_s by other_univ.
compute q9total = sum(q9any to q9night).
freq var = q9total.
The if command
The if command is another command that you can use to create a new variable. You can also use the if command to recode the value of one variable based on the value of another variable. This command can also be used to create a numeric version of a string variable.
compute female = $sysmis.
if gender = "Female" female = 1.
if gender = "Male" female = 0.
value labels female 0 "male" 1 "female".
crosstabs /tables gender by female.
The recode command
The recode command is used, of course, to recode variables. It can also be used to create a new variable (with the into keyword) and convert certain string variables into numeric variables (with the convert keyword).
recode q8silence_s q8little_s q8moderate_s q8lots_s q8nomatter_s (convert)
  into q8silence q8little q8moderate q8lots q8nomatter.
exe.
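The recode command with the into keyword can also collapse a variable into fewer categories. The sketch below assumes that yearschool already holds clean numeric values; the cut points, the new variable name year3 and the labels are hypothetical and are not part of the original workshop files.

```spss
* Collapse year in school into three groups and keep the original variable.
recode yearschool (1 thru 2 = 1) (3 thru 4 = 2) (5 thru highest = 3) into year3.
value labels year3 1 "years 1-2" 2 "years 3-4" 3 "year 5 or later".
* Check the collapsed variable against the original.
crosstabs /tables yearschool by year3.
```

Because the result is written to a new variable, any value not covered by the recode specifications, including system-missing, is system-missing in year3.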
The autorecode command
The autorecode command does exactly what its name suggests: it automatically recodes variables. Sometimes this command is very useful, and other times it produces undesired (or unexpected) results. Care should be taken when one or more of the variables has missing values.
autorecode q8silence_s to q8nomatter_s /into q8silence q8little q8moderate q8lots q8nomatter /group.
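One way to take that care is to ask autorecode to treat blank strings as missing and to print the mapping it created. This is a sketch for a single variable; major comes from the renaming step shown earlier, while major_n is a hypothetical name for the new numeric variable.

```spss
* Recode the string variable major into consecutive integers.
* Blank strings become user-missing, and print lists old and new values.
autorecode variables = major
  /into major_n
  /blank = missing
  /print.
```

Reading the printed table is a quick way to spot near-duplicate entries (for example, different spellings of the same major) that autorecode has assigned to separate codes.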
The crosstabs command
The crosstabs command is very useful for ensuring that the recoding of a variable went as planned. You can have one or more tables subcommands in your call to the crosstabs command. I strongly recommend that all recoded variables be checked against the original variable to be certain that the recode worked as intended. I understand that this can quickly become tedious, but making a mistake when recoding variables can cause lots of problems when you use that recoded variable in analyses, and at that point, the error may be quite difficult to uncover. Also, if you eliminate the original version of the variable from your dataset before you checked the recode, you may have considerable trouble finding the error.
crosstabs /tables q8silence_s by q8silence /tables q8little_s by q8little /tables q8moderate_s by q8moderate /tables q8lots_s by q8lots /tables q8nomatter_s by q8nomatter.
The alter type command
The alter type command has at least three purposes. It can change some string variables into numeric variables, it can alter the length of string variables, and it can change the format of numeric variables.
alter type q8silence_s q8little_s q8moderate_s q8lots_s q8nomatter_s (a2).
alter type yearschool (f2.0).
The value labels command
The value labels command associates descriptive text to the values of categorical variables. This is very useful, because it reminds you what the values of the variable mean. The descriptive text is also given in output involving the variable, which often makes the output easier to interpret.
value labels q7home to q7coffee 1 "Prefer to study here:" 2 "Usually study here:".
The variable labels command
The variable labels command allows you to associate descriptive text to a variable. For example, you may make the actual question from your questionnaire the variable label.
variable labels female "gender of respondent".
The delete variables command
The delete variables command does exactly what it says it does: it deletes variables from your dataset. You may find that some of the variables in your dataset have nothing but missing values; in other words, they are useless. You can use the delete variables command to remove these variables from the dataset. In general, we do not suggest that researchers remove string variables from the dataset once a numeric version of them has been created. Rather, we suggest that after all of the data cleaning has been done, that dataset gets saved. Then you can make a copy of that dataset and remove unneeded string variables from it. This will result in a dataset that is cleaned and rid of unnecessary variables; in other words, a dataset that is ready for analysis.
delete variables collectorid startdate enddate ipaddress emailaddress firstname lastname customdata.
The document command
The document command can be used to associate text with a dataset. You can use the add document command to make additional notes as needed. These commands are very useful for keeping important information with the dataset, as opposed to writing the important information in a notebook that may get separated from the dataset.
document These data were collected via Survey Monkey at www.surveymonkey.com/s/z3bxppp. Questions 2 and 9 are problematic. Some response options from question 5 and question 7 should be replaced. These data were extracted using the advanced spreadsheet option.
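A short sketch of adding a later note and then listing everything stored with the dataset; the text of the note is only an example.

```spss
* Append a note to the existing documentation, then print all of it.
add document "String versions of the question 8 items were kept after conversion".
display documents.
```

Unlike document, add document takes its text as a quoted string.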
The SPSS syntax files
Below are the SPSS syntax files that I wrote to prepare the data for analysis. Although these files have an .sps extension, they are simply text files. This means that you do not need SPSS to open the files; you can open them with any text editor, such as Notepad or WordPad. If you look at each of these files, you will notice that when the data were extracted using “SPSS format”, the least amount of data cleaning was necessary. The other extraction methods required more steps to prepare the data for analysis. Furthermore, there are many differences between the different types of extraction in terms of what needed to be done to prepare the data for analysis.
Advanced spreadsheet
If you download the data using this method, most of the variables will be string variables and the values will be words associated with the options. You can access the SPSS syntax used to clean the data for the example questionnaire used in this workshop here. The original comma-separated values file is here.
Actual values
If you download the data using this method, most of the variables will be string variables and the values will be words associated with the options. You can access the SPSS syntax used to clean the data for the example questionnaire used in this workshop here. The original comma-separated values file is here.
Numerical values
If you download the data using this method, most of the variables will be numeric variables and the values will be numbers associated with the options. You will not know from the information in the dataset which value labels should be associated with each numeric value, but you can probably get that information from the questionnaire. You can access the SPSS syntax used to clean the data for the example questionnaire used in this workshop here. The original comma-separated values file is here.
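As an illustration of adding the labels back by hand, the sketch below labels the question 5 items. It rests on several assumptions that you would need to verify against your own export: that the items were renamed q5notes through q5tips (hypothetical names), that they sit next to each other in the file, and that the values 1 to 4 follow the order of the response options on the questionnaire.

```spss
* Hypothetical variable names and an assumed 1 to 4 coding.
value labels q5notes to q5tips
  1 "Always"
  2 "Often"
  3 "Rarely"
  4 "Never".
* Inspect the distributions to confirm that the labels look plausible.
frequencies variables = q5notes to q5tips.
```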
SPSS format
If you download the data using this method, most of the variables will be numeric variables and the value labels will be correctly associated with the numeric values. Also, most of the variables will have variable labels. The amount of data cleaning needed should be much less than with the other possible methods of downloading the data. You can access the SPSS syntax used to clean the data for the example questionnaire used in this workshop here. The original SPSS data file is here.
Improvements that could (should!) be made to this questionnaire
Notice that there is a problem with the question about the year in school. It is a good thing that we caught this error in our pilot testing. Also, question 9 about when people study is difficult to analyze because we specified it as “choose all that apply”. While in theory we may want to know all of the times that people study, we need to consider how we are going to analyze that variable if we allow respondents to choose multiple responses. The point is that piloting your questionnaire is important for working out problems with items, but it also allows you to discover potential difficulties with analyses. Remember that you want to know how you are going to analyze data before you collect them.
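One possible way to summarize a “choose all that apply” item is a multiple response set. The sketch below assumes that q9any through q9night are adjacent numeric variables coded 1 when the option was selected; the group name studytime and its label are hypothetical.

```spss
* Treat the five question 9 indicators as one multiple dichotomy group.
mult response groups = studytime "Most effective study times" (q9any to q9night (1))
  /frequencies = studytime.
```

This only describes how often each option was chosen. It does not settle how the item would enter an analysis such as ANOVA or regression, which is why that decision is best made before data collection.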