Excel is not a statistical package. Despite this, Excel is likely the software most often used for data analysis. So, even though we do not advocate its use, here are some Excel tips. We offer these tips in the hopes that they make it easy to move your data out of Excel and into a statistical package appropriate to your analysis. Hopefully, you will find them worthwhile, even if only in a "know thine enemy" kind of way.
Looking at Data in Excel: Freeze Panes, Splits, and Hides
Excel has rows and columns, not variables. If your variables have names other than the column headings (A, B, C), you are not able to see the names as you scroll through your data. Nor can you easily look at more than about 50 consecutive rows and 15 consecutive columns at a time. However, Excel does include options that make it easier to view your data as you wish to view it.
Freeze Pane allows you to pick a cell and freeze the rows above the cell and the columns to the left of the cell. Then, you can see row headings while scrolling down and row IDs as you scroll across your dataset. You can also use Freeze Panes if you wish to have certain columns or rows side by side for easy comparison, even though the columns or rows are not side by side in your dataset.
To use Freeze Panes, select it from the Window drop-down list. To unfreeze panes, select Unfreeze Panes from the same list.
Split is a similar command. It divides your Excel window into four parts, and you can scroll down or across two of the four parts at a time. To split your spreadsheet, select a cell and choose Split from the Window drop down.
Another option that can make viewing data in Excel easier is the Hide option for rows and columns. Simply select any rows or columns that are, for the time being, getting in the way. Then choose Row or Column from the Format menu, and select Hide.
Very Special Pasting
A surprising number of Excel’s more subtle tricks use the Paste Special option after "copying" data. In Excel, if you wish to use a formula or a format from a certain cell repeatedly, you can do so using Copy and Paste Special. If you’ve calculated values using a formula, and you want to copy the values (and not the formulas) into other cells, you can use Paste Special. Paste Special can even be used to transpose data or (in a weird, way) apply mathematical operations to a set of cells.
The full set of options can be seen by selecting Paste Special from the Edit menu after selecting and copying cells.
Subsetting in Excel: Data Filtering
AutoFilter is a function that allows you to view and subset your data easily and quickly without making any permanent changes to your data. You can filter your data by selecting your worksheet or the portion of your worksheet of interest. The first row is assumed to contain headers, and drop-down menus are created for each column containing the values within the column. Using these menus you can temporarily subset your data. There are options that allow for somewhat complex filtering.
To filter your data in this way, choose Filter, then AutoFilter from the Data menu.
AutoFilter is also a useful tool for: identifying outliers or invalid values in certain variables, finding "subtotal" rows that may have been inserted into a long dataset, identifying the most common values that appear in a given column, identifying missing values.
User Written Formulas and Excel Functions
If you wish to create a new variable, conduct a check of an existing variable, or do any calculations based on other cells in Excel, you will find yourself writing formulas, using Excel functions, or doing both at once. To start, type "=" into a cell. This is necessary to tell Excel that you will be entering a formula or a function. You can type in a formula, referring to other cells by their column/row coordinate (A3, for example) and making use of Excels built in functions. These are worth exploring. Some are more useful than others. The Count functions are particularly nice.
"Programming" in Excel
If there is something that you will need to do repeatedly in Excel, it is possible to save these steps in a macro to be used as needed. For instance, you may want to use the same specific formatting for all tables you create in Excel with certain information in the header and footer above and below the table. The easiest way to create a macro in Excel is to select Macro from the Tools menu, and then Record New Macro. Once you’ve started "recording", execute the steps that you want tracked and will want to be repeated. When you’re done, click on the Stop Recording button on the macro window on the screen.
This might be useful if you have data arranged in strange, but consistent, ways and you need to put you data into a more comprehensible form.
Separating Variables in Excel
A common issue in Excel seems to be multiple variables in one column. Perhaps the best way to address this is the Text to Columns function. This allows you to specify how you would like to divide the contents in a given cell into multiple cells, and previews how your splitting strategy will work. This function can be found under the Data menu.
Common Problems in Excel
Assuming that we are looking at Excel primarily as a data vessel, ultimately aiming to get the data out of Excel and into a package appropriate for the given analysis, there are some common obstacles to moving data cleanly out of Excel.
1. Subtotal Rows – Excel makes it very easy to add subtotal rows to a dataset: sort on one variable, and then add a line between group breaks with subtotals. While this may be very useful for checking your data, it adds some very strange-looking observations to your dataset that can skew your analysis if they are not recognized and removed. If you are looking at an original dataset that has not been copied and pasted and passed around, these can be removed by selecting Data, then Subtotals, and then clicking the Remove All button. If you are not sure you are looking at the original dataset, AutoFilter can be good for finding such lines.
2. Pivot Tables – These are commonly used to summarize data. They are never a good thing when you are interested in transferring data from Excel to another package. If you are aiming to transfer data from Excel to another package, you will want the information that underlies the pivot table rather than the pivot table. This underlying data is usually in another tab in the same workbook.
3. Grand Totals – This is another variety of summary row that can look, at a glance, like another observation, but can have huge, misleading effects on results.
4. Data on Multiple Tabs – There are copy and paste methods and macro methods that are not perfect, but will often work. Using StatTransfer, it is likely easier to export each tab as a seperate dataset and then stack/merge them in SAS/Stata/SPSS. Before doing this, however, it is worth making sure that all of the worksheets in your file have the same column formatting. This can be done by formatting one worksheet in the way you would like to see on all others, and then using Paste Special to paste the formatting into all the other worksheets.
5. Bad Variable Names – Excel does not impose any rules on the text in a cell, so often the header row that one would like to use as the variable names contains spaces or names that begin with numbers that will be invalid in a statistical software. Spaces in the names can be removed with a find/replace on the header row.