What is a study design?
An empirical legal study design is a pre-research plan of action. An effective study design enables you to cost your project (e.g., time, personnel), set boundaries, articulate your evidentiary standards and research rules, and explain your research to potential faculty supporters, co-authors, institutional review boards, funding agencies, fellowship committees, and more. For help with any aspect of creating a study design for your empirical legal research project, contact the Empirical Services Librarian, Michelle Hudson.
Components of a study design
Hypothesis and/or Research Question
The core of empirical research is the hypothesis being tested or the research question guiding the inquiry. In general, a good hypothesis includes specific independent and dependent variables, a verb or phrase that relates the variables to each other (e.g., X increases Y), and boundary words that include some groups, time periods, or conditions and exclude others. For a review of hypothesis terminology (e.g., one-tailed), visit the Research Methods Knowledge Base (RMKB).
Research questions do not offer testable premises the way hypotheses do (i.e., a claim tested against a contrary null hypothesis). Still, they should be “clear, focused, concise, complex and arguable.” Because you will need to answer the research question once data collection is complete, the question should indicate the boundaries of what you will collect (e.g., who, what, when, and where are included or excluded). For more on research question construction, see chapter 5 of Patton’s Qualitative Research & Evaluation Methods.
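The difference between a directional hypothesis ("X increases Y") and a non-directional one ("X changes Y") shows up in how the result is tested. As a minimal sketch, assuming a test statistic that is approximately standard normal under the null hypothesis (a z statistic), the snippet below computes one-tailed and two-tailed p-values with Python's standard library. The function name `p_value` and the value `z = 1.8` are hypothetical, chosen only for illustration.

```python
from statistics import NormalDist

# Standard normal distribution: the assumed null distribution of the statistic.
STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def p_value(z: float, alternative: str = "two-sided") -> float:
    """Return the p-value for a z statistic (hypothetical helper).

    alternative:
      "greater"   -> one-tailed test of a directional hypothesis (X increases Y)
      "less"      -> one-tailed test in the other direction (X decreases Y)
      "two-sided" -> non-directional hypothesis (X changes Y)
    """
    if alternative == "greater":
        return 1.0 - STANDARD_NORMAL.cdf(z)
    if alternative == "less":
        return STANDARD_NORMAL.cdf(z)
    if alternative == "two-sided":
        return 2.0 * (1.0 - STANDARD_NORMAL.cdf(abs(z)))
    raise ValueError(f"unknown alternative: {alternative!r}")


# Hypothetical z statistic, for illustration only.
z = 1.8
print(round(p_value(z, "greater"), 3))    # 0.036
print(round(p_value(z, "two-sided"), 3))  # 0.072
```

With the same evidence, the one-tailed version falls below a 0.05 threshold while the two-tailed version does not. This is one reason the direction of a hypothesis belongs in the study design, fixed before any data are collected.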
Literature review
No empirical study occurs in a vacuum. Previous scholarly discoveries, political and cultural happenings, and natural events provide context for new research. Typically, a quality literature review will identify a framing or guiding theory (particularly in disciplines such as anthropology and communication), list and describe seminal essays, survey debates in the field, and highlight unresolved theoretical, methodological, or applied research issues.
When a literature review is found lacking, readers assume the author is unprepared or unable to contribute fully to the discussion. A successful empirical literature review conveys the same competence as a well-Shepardized brief. In that vein, a thorough literature review should surface community norms, assumptions about rules and operating procedures (e.g., an acceptable significance level), and the state of affairs on a particular topic. Although the social sciences do not adhere to the principle of stare decisis, they do not encourage radical departures from accepted rules or solid prior findings (e.g., findings that have been replicated numerous times, showcased in a systematic review, etc.). At the same time, they caution against over-reliance on prior literature. As Michael Quinn Patton explains in chapter 5 of Qualitative Research & Evaluation Methods, “Review of relevant literature can… bring focus to a study. What is already known? Unknown? What are the cutting-edge theoretical issues? Yet, reviewing the literature can present a quandary in qualitative inquiry because it [might] bias the researcher’s thinking and reduce openness to whatever emerges in the field” (p. 226). Patton’s caution notwithstanding, it is still important for newer researchers to ground themselves in the literature of their field. As the RMKB advises: do the review [of the literature] early.
Research methodology
The research methodology, or how the study will unfold, is what most people think of when they hear “study design.” Only once the hypothesis/research question and literature review have set a clear direction and boundaries can the researcher describe what they plan to do. The research methodology flows logically from the prior parts of the study design and translates the study’s broader parameters into specific goals, foci, and activities. How the methodology section is organized depends on the study design’s purpose or template. The following are typical components of the “how the research will get done…” section of the study design.
Unit of analysis
The unit of analysis is the focal segment of study. Supposing that “the known universe” is the largest possible unit of study and a given sub-atomic particle is the smallest, a typical social science unit of analysis is somewhere in between. Units of analysis include: inter-state region, nation-state, province/state, metropolitan area, community/neighborhood, group, family, dyad, individual, etc. A researcher may be interested in a phenomenon that cuts across several units, such as a city-neighborhood-family issue. Selecting a focal unit of analysis clarifies (and sometimes prompts rewriting of) the research hypothesis or question. It also guides data collection or data searching and points to contextual topics for the literature review. For more on units of analysis, see chapter 5 of Patton’s Qualitative Research & Evaluation Methods.
Description of population
Once a unit of analysis has been selected, the researcher needs to describe who or what is in that unit. This can include geographic markers (e.g., census tracts 1413, 1414, 1415, 1418), demographic statistics (e.g., 76% under the age of 25), political and cultural indicators (e.g., 1 synagogue and 2 Christian churches within the study area), etc. The more thorough the population description, the better the researcher can analyze the representativeness of a sample of that unit, if the unit is large enough to sample. A rough rule of thumb is that if there are fewer than 100 people or items in a unit, it might be better to try to capture data from all or most of them rather than to sample.
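A detailed population description gives the researcher something to check a sample against. The sketch below, in Python, compares the share of under-25 respondents in a sample with the 76% figure from the example above using the normal approximation for a one-sample proportion test. The helper name `proportion_z` and the sample counts (96 of 150 respondents) are hypothetical, for illustration only.

```python
import math


def proportion_z(sample_count: int, sample_size: int, population_share: float) -> float:
    """z statistic comparing a sample proportion with a known population share.

    Hypothetical helper using the normal approximation for a one-sample
    proportion test; the approximation is reasonable only when
    sample_size * population_share and sample_size * (1 - population_share)
    are both comfortably large.
    """
    if not 0 < population_share < 1:
        raise ValueError("population_share must be strictly between 0 and 1")
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    sample_share = sample_count / sample_size
    standard_error = math.sqrt(population_share * (1 - population_share) / sample_size)
    return (sample_share - population_share) / standard_error


# Population description: 76% under age 25 (the illustrative figure above).
# Hypothetical sample: 96 of 150 respondents are under 25.
z = proportion_z(sample_count=96, sample_size=150, population_share=0.76)
print(f"sample share = {96 / 150:.2f}, z = {z:.2f}")  # sample share = 0.64, z = -3.44
```

A |z| well beyond roughly 1.96 suggests that under-25 residents are under-represented relative to the population description, a representativeness problem the sampling plan should anticipate.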
Sampling plan/techniques
When a population is sufficiently large, researchers usually select a sample of that population to study. If the sample is selected randomly and near-optimal data collection methods are followed, the researcher can use statistical techniques to infer information about the broader population. These conditions are a foundation of inferential statistics. While most of us know the ideal conditions for sampling (e.g., every member of the population has an equal chance of being selected, most of those selected agree to participate, etc.), sampling is perhaps the messiest part of inferential empirical work. The sampling plan/techniques portion of a study design should therefore read like a best-case/worst-case handbook: it should announce the researcher’s aims, describe impediments to those aims, and detail and justify workarounds. For example:
- Sampling aim: Randomly select 20% of New Haven city residents for a telephone survey
- Issue: No master list of New Haven residents exists
- Sub-issue: The “white pages” telephone listing skews older, with a median age of X (whereas New Haven’s median age is Y, according to the most recent census)
- Sub-issue: Voter registration lists skew wealthy, with a median income of…
- Workaround 1: Select a certain number of people (n) from the telephone listing by taking every 9th number (i.e., “systematic sampling”), because…
- Workaround 2: Select a random sample of 100 voters from the voter rolls, because…
- Workaround 3: Analyze the two prior samples and identify underrepresented groups; implement nonprobability sampling strategy Z, because…
Sampling is far too complex to unpack fully in this guide. For definitions of key sampling terms, see the RMKB. See also chapter 6 of Lawless et al.’s Empirical Methods in Law.
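The first two workarounds in the example above use different selection rules: taking every 9th entry from a list (systematic sampling) and drawing 100 entries at random (simple random sampling). The Python sketch below shows both. The sampling frames are hypothetical stand-ins for the telephone listing and voter rolls, and the frame sizes and the seed are invented for illustration.

```python
import random


def systematic_sample(frame, interval, start=None, rng=None):
    """Select every `interval`-th element of a sampling frame.

    A random starting point in [0, interval) is drawn unless `start` is given,
    so the sample does not always begin at the first entry of the list.
    """
    if interval < 1:
        raise ValueError("interval must be at least 1")
    rng = rng or random.Random()
    if start is None:
        start = rng.randrange(interval)
    return frame[start::interval]


def simple_random_sample(frame, n, rng=None):
    """Select n distinct elements, each equally likely to be chosen."""
    rng = rng or random.Random()
    return rng.sample(frame, n)


# Hypothetical sampling frames (not real data).
phone_listing = [f"listing-{i:04d}" for i in range(1, 901)]
voter_rolls = [f"voter-{i:05d}" for i in range(1, 5001)]

rng = random.Random(2024)  # fixed seed so the draw can be reproduced and documented
phone_sample = systematic_sample(phone_listing, interval=9, rng=rng)
voter_sample = simple_random_sample(voter_rolls, n=100, rng=rng)

print(len(phone_sample), len(voter_sample))  # 100 100
```

Neither technique fixes the coverage problems listed in the sub-issues: both draw only from whoever appears in the frame, so any skew in the telephone listing or voter rolls carries into the sample. Systematic sampling can also go wrong if the list has a pattern that repeats at the same interval. Recording the seed lets the draw be reproduced and reported.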
Data collection procedures
Once a plan for sampling has been established, the researcher can describe how research data will be collected from the individuals or groups in the sample. Data can be collected in myriad ways, including surveys, interviews, focus groups, participatory activities, observation, etc. Often, specific research instruments will be designed to collect the data (e.g., an interview script) and attached as appendices to the study design. Like every other part of the study design, data collection procedures and instruments need to map onto the hypothesis or research question. Most research methods textbooks contain several chapters on data collection procedures. See chapters 3-5 of Lawless et al.’s Empirical Methods in Law and chapters 5-7 of Patton’s Qualitative Research & Evaluation Methods.
Within the researcher’s discussion of data collection procedures, the issue of consent must also be addressed. When data is being collected directly from participants (e.g., via survey, interview) for the first time, the researcher must assess the participants’ willingness and ability to voluntarily consent to participation. This subject is discussed at length throughout Patton’s Qualitative Research & Evaluation Methods. A researcher needs to devote a portion of the study design to explaining how they will assess and record voluntary participation. Often, the researcher will create scripts and/or forms to aid in gaining and recording voluntary consent. Consent form templates are available from Yale’s Human Research website.
Data storage procedures
Data “storage” starts at the sampling stage and concludes years after the research is completed. Data about potential participants (e.g., telephone numbers), data from participants (e.g., a response to a question), and data related to the research (e.g., names of research assistants) all need to be stored reliably and ethically. In terms of reliability, the data needs to be available to the researcher consistently, nearly on demand, and in a readable and usable format. This aspect of data storage requires conscious planning on the part of the researcher and investments ranging from locks for file cabinets to updated Stata dictionary files. Funding agencies such as the NIH and NSF require researchers to include formal data management plans in their grant proposals, and Yale enacted its Research Data & Materials Policy in 2017.
In addition to the technical aspects of data management planning, research data needs to be handled in ways that safeguard potential and actual research participants from harm. Data storage ethics concentrate on two concerns: anonymity and confidentiality. Anonymity shields the identity of the participant from the researchers and/or readers of the study results. Confidentiality safeguards the participants’ data (e.g., answers to questions) from parties outside of the research process. Anonymity is like a veil; confidentiality is like a lock. Some studies provide both, but nearly every study aims at confidentiality. These safeguards cease to matter only once potentially compromising data has been destroyed, typically years after the study has concluded. For more on these topics, see the RMKB.
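One common way to put the "lock" into practice is pseudonymization: direct identifiers are replaced with random study IDs, and the key linking IDs back to people is stored separately from the response data. The Python sketch below is a minimal illustration under that assumption; the function `pseudonymize`, the field names, and the sample records are hypothetical. Because the key can re-identify participants, this supports confidentiality but does not by itself make the data anonymous.

```python
import secrets


def pseudonymize(records, identifier_fields=("name",)):
    """Split records into a de-identified dataset and a separate linking key.

    Hypothetical helper. Each record's direct identifiers are removed and
    replaced with a random study ID; the linking key maps study ID -> the
    removed identifiers and should be stored apart from the de-identified
    data (e.g., in a separately secured location). Remaining fields may
    still allow re-identification, so destroying the key is necessary but
    not always sufficient for anonymity.
    """
    deidentified = []
    linking_key = {}
    for record in records:
        study_id = "P-" + secrets.token_hex(4)
        while study_id in linking_key:  # regenerate on the unlikely collision
            study_id = "P-" + secrets.token_hex(4)
        linking_key[study_id] = {f: record[f] for f in identifier_fields if f in record}
        cleaned = {k: v for k, v in record.items() if k not in identifier_fields}
        cleaned["study_id"] = study_id
        deidentified.append(cleaned)
    return deidentified, linking_key


# Hypothetical survey responses (not real data).
responses = [
    {"name": "Participant A", "phone": "555-0100", "q1": "agree"},
    {"name": "Participant B", "phone": "555-0101", "q1": "disagree"},
]
data, key = pseudonymize(responses, identifier_fields=("name", "phone"))
for row in data:
    print(row)  # answers plus a random study ID; no name or phone number
```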
Data processing/analysis procedures
Once the researcher has constructed a hypothesis or research question, identified sampling strategies, addressed the protection of human subjects, etc., the researcher can complete the data processing/analysis sections of the study design. Essentially, these sections lay out a plan for data analysis in advance. They can include proposed statistical operations and/or qualitative analysis procedures (e.g., thematic analysis), as well as technical specifications related to those procedures (e.g., the desired significance level). Writing this part of the study design can stimulate refinement of other parts of the study design. For instance, once a researcher realizes that they want to employ a particular statistical test, they might revisit the level of measurement (e.g., nominal) of a particular survey item. A UCLA page entitled “What statistical analysis should I use?” [and how do I do that analysis in SAS, Stata, SPSS, and R] suggests some of the measurement level refinement that might result from the writing of this final section of the study design. For more on data processing/analysis, see chapters 7-13 of Lawless et al.’s Empirical Methods in Law and chapters 8-9 of Patton’s Qualitative Research & Evaluation Methods.
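The link between a survey item's level of measurement and the choice of test can be sketched as a lookup table. The Python snippet below is a deliberately simplified, hypothetical version of the kind of decision table the UCLA page offers: it covers only comparisons of independent groups, treats ratio data like interval data, and ignores sample size, pairing, and distributional assumptions that real test selection must weigh. The names `SUGGESTED_TESTS` and `suggest_test` are illustrative.

```python
# Hypothetical, simplified lookup: (outcome measurement level, number of
# independent groups) -> a commonly used test. Not a substitute for the
# fuller decision tables in methods texts.
SUGGESTED_TESTS = {
    ("nominal", "2"): "chi-square test (or Fisher's exact test when expected counts are small)",
    ("nominal", "3+"): "chi-square test",
    ("ordinal", "2"): "Wilcoxon-Mann-Whitney test",
    ("ordinal", "3+"): "Kruskal-Wallis test",
    ("interval", "2"): "independent-samples t-test",  # assumes roughly normal outcomes
    ("interval", "3+"): "one-way ANOVA",  # assumes roughly normal outcomes
}


def suggest_test(outcome_level: str, n_groups: int) -> str:
    """Suggest a test for comparing independent groups on one outcome."""
    level = outcome_level.lower()
    if level == "ratio":
        level = "interval"  # treated alike for this simplified lookup
    if n_groups < 2:
        raise ValueError("comparing groups requires at least two groups")
    groups = "2" if n_groups == 2 else "3+"
    try:
        return SUGGESTED_TESTS[(level, groups)]
    except KeyError:
        raise ValueError(f"no suggestion for outcome level {outcome_level!r}") from None


# The same question asked as agree/disagree (nominal) vs. a 1-5 rating (ordinal):
print(suggest_test("nominal", 2))
print(suggest_test("ordinal", 2))
```

Recasting an agree/disagree item as a 1-5 rating moves it to a different row of the table, which is the kind of back-and-forth between analysis plan and instrument design described above.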
Scott Matheson contributed resources for this guide, which was originally written by Sarah Ryan and has been updated and maintained by Michelle Hudson.