’Tis the season for spring flowers and… surveys, apparently. Many groups are conducting surveys at YLS right now. If you’ve got a batch of data and it’s been a while since you’ve crunched numbers, here are some preliminary questions to consider…
1. Do you need to perform advanced statistical testing? If a pie chart will compellingly convey a key proportion to your audience (e.g., 1/3 of YLS students prefer X), or a bar chart will capture your story perfectly (e.g., X spent on category 1 items, Y spent on category 2 items), then you might want to churn out your results quickly to keep the discussion going. Excel can produce excellent pie/proportion, bar/categorical, and other charts.
Sometimes simple charts and summary statistics (34%, 2 of 3) won’t satisfy your audience. If that’s the case, you should ask…
2. What statistical work will my data permit? Different data “measurement levels” permit different statistical tests. There are three key levels:
-Nominal/categorical: nominal variables use numbers as placeholders, but the numbers themselves do not indicate true numerical differences. Race and gender are nominal. If males=1, females=2, and other=3, that doesn’t mean that females are double some aspect of males; the 1, 2, or 3 in the gender column simply records each person’s gender. The statistical possibilities with nominal data are quite limited: they boil down to little more than counting how many cases fall into each category and comparing those counts with the counts you would expect (the chi-square test).
-Ordinal: ordinal variables use numbers to record rank order, but the distances between ranks are vague. A typical ordinal scale is: 5=Extremely likely, 4=Likely, 3=Neutral, 2=Unlikely, 1=Extremely unlikely. Logically, a 5 is considerably more likely than a 2, but how much more? 300%? 1000%? We cannot truly say, but unlike with nominal data, we can at least say which responses are higher or lower than others, so order-based statistics such as the median become available. The statistical possibilities with ordinal data are more robust than with nominal data, but interval-ratio data gives us the most choice among statistical tests…
-Interval-ratio: interval-ratio variables use numbers to record equal, meaningful differences. A person who makes $75,000 makes $5,000 more than a person who makes $70,000, for instance. We can do many more types of quantitative slicing and dicing with this sort of data, such as computing averages and the size of differences.
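To make the three levels concrete, here is a minimal Python sketch using invented survey responses (all variable names and values are hypothetical). It shows the kind of summary each level supports: counts and a chi-square goodness-of-fit statistic for nominal data, the median for ordinal data, and differences and means for interval-ratio data.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical survey data; every value is invented for illustration.
gender = [1, 2, 2, 1, 3, 2, 1, 2]       # nominal: 1=male, 2=female, 3=other
likelihood = [5, 4, 3, 4, 2, 5, 1, 4]   # ordinal: 5=Extremely likely ... 1=Extremely unlikely
income = [70000, 75000, 82000, 64000]   # interval-ratio: annual income in dollars


def chi_square_gof(observed, categories):
    """Chi-square goodness-of-fit statistic, assuming every category is
    expected equally often: sum of (observed - expected)^2 / expected."""
    total = sum(observed[c] for c in categories)
    expected = total / len(categories)
    return sum((observed[c] - expected) ** 2 / expected for c in categories)


# Nominal: counting is meaningful; averaging the codes would run but mean nothing.
gender_counts = Counter(gender)
chi2 = chi_square_gof(gender_counts, categories=[1, 2, 3])

# Ordinal: order is meaningful, so the median is a sensible "middle" response.
median_likelihood = median(likelihood)

# Interval-ratio: differences and averages are meaningful.
income_gap = income[1] - income[0]
mean_income = mean(income)

print("gender counts:", dict(gender_counts))
print("chi-square statistic:", round(chi2, 2))
print("median likelihood:", median_likelihood)
print("income gap:", income_gap, "mean income:", mean_income)
```

The chi-square statistic computed here (1.75) would then be compared against a chi-square distribution with (number of categories minus 1) degrees of freedom. The sketch assumes every category is expected equally often, which may not match your question.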
To decide which statistical test(s) you can use, determine the measurement level of your independent variable(s) (the influencer, trigger, or pusher) and your dependent variable(s) (the outcome, the variable you really care about). Then consult this wonderful chart from UCLA: https://stats.idre.ucla.edu/
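As a rough illustration of that decision, here is a minimal Python sketch of a lookup keyed on the measurement levels of one independent and one dependent variable. The `SUGGESTED_TESTS` table, the `suggest_test` function, and the pairings are a simplified assumption based on common textbook pairings, not a reproduction of the UCLA chart; check the chart (and your data’s distribution and sample size) before relying on any suggestion.

```python
# Hypothetical, simplified lookup: (independent level, dependent level) -> test.
# Common textbook pairings only; NOT a copy of the UCLA chart.
SUGGESTED_TESTS = {
    ("nominal", "nominal"): "chi-square test of independence",
    ("nominal", "ordinal"): "Wilcoxon-Mann-Whitney (2 groups) or Kruskal-Wallis (3+ groups)",
    ("nominal", "interval-ratio"): "independent-samples t-test (2 groups) or one-way ANOVA (3+ groups)",
    ("ordinal", "nominal"): "logistic regression (binary outcome)",
    ("ordinal", "ordinal"): "Spearman rank correlation",
    ("ordinal", "interval-ratio"): "Spearman rank correlation",
    ("interval-ratio", "nominal"): "logistic regression (binary outcome)",
    ("interval-ratio", "ordinal"): "Spearman rank correlation",
    ("interval-ratio", "interval-ratio"): "Pearson correlation or simple linear regression",
}


def suggest_test(iv_level: str, dv_level: str) -> str:
    """Return a starting-point suggestion for an (IV level, DV level) pair."""
    try:
        return SUGGESTED_TESTS[(iv_level, dv_level)]
    except KeyError:
        raise ValueError(f"unknown levels: {iv_level!r}, {dv_level!r}") from None


# Race (nominal) predicting self-reported likely discrimination (ordinal):
print(suggest_test("nominal", "ordinal"))
```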
Here’s an example:
Hypothesis: 2L students of color will predict a higher likelihood of facing discrimination during the 2L summer hiring process than Caucasian students will.
Independent variable: Race (nominal, see survey item below)
Dependent variable: self-reported likely discrimination (ordinal, see below)
Race: What is your primary self-identified racial classification?
5=None of the above categories
Discrimination: Given your 1L experience, how likely are you to face discrimination during this hiring season (for your 2L summer)?
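Here is a minimal Python sketch of how this example might be tested. It assumes (hypothetically) that race is collapsed into two groups, students of color and Caucasian students, and that the discrimination item uses the 5-to-1 likelihood scale from the ordinal description above; the responses are invented. With a two-group nominal independent variable and an ordinal dependent variable, the Wilcoxon-Mann-Whitney test is a common choice; with three or more race categories, the Kruskal-Wallis test would be the analogous rank-based option. The function names are illustrative, not from any library.

```python
from math import erfc, sqrt


def average_ranks(values):
    """Rank values 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # 0-based positions i..j share ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_greater(x, y):
    """One-sided Mann-Whitney U test of H1: values in x tend to exceed y.

    Uses the normal approximation with a tie correction (no continuity
    correction), so treat it as a rough guide for small samples.
    Fails if every response is identical (zero variance).
    """
    n1, n2 = len(x), len(y)
    combined = list(x) + list(y)
    ranks = average_ranks(combined)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    # Tie correction: sum of (t^3 - t) over each group of t tied values.
    counts = {}
    for v in combined:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u1 - n1 * n2 / 2) / sqrt(var_u)
    p_one_sided = 0.5 * erfc(z / sqrt(2))  # P(Z > z) under the null
    return u1, z, p_one_sided


# Invented 1-5 responses (5 = Extremely likely), for illustration only.
students_of_color = [4, 5, 3, 4, 5, 2, 4]
caucasian_students = [2, 3, 1, 3, 2, 4, 1]

u, z, p = mann_whitney_greater(students_of_color, caucasian_students)
print(f"U = {u}, z = {z:.2f}, one-sided p = {p:.3f}")
```

Because the responses are invented, the result says nothing about actual YLS students; the sketch only shows the mechanics. For real analyses, a maintained implementation such as `scipy.stats.mannwhitneyu` (with `alternative="greater"`) is a safer choice than hand-rolled code.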
The questions “Do you need to perform advanced statistical testing?” and “What statistical work will my data permit?” are only the tip of the iceberg, but they’re a great place to start!
After answering those two questions, consider discussing further questions and issues with Sarah Ryan, Head of Empirical Legal Research Services at YLS, or the consultants at the university’s Statlab (they’re here late!). Also see the Empirical Legal Research section of our website. And happy crunching!
Image courtesy of Yale digital image collection