Designing Empirical Research

Week 5

 

Use the Handouts: Research Design and Research Soundness

 

RESEARCH DESIGN--How good is it?

In theory, we want to determine how accurate the estimated influences are. (Think of comparison to something other than 0 here.) How accurate the estimates are is determined by statistical considerations, to be sure, but statistics is only part of the picture. Without a well-planned design, nonrandom biases may be included in the estimates that are not "picked up" by statistical tests. The research design will often determine the validity (more to come) of the study by determining its soundness. Once a study is "sound," statistical estimates can be used to test hypotheses. Without a sound design, statistical tools are often inaccurate. Let's first examine the criteria by which research is judged and then build a design.

 

SOUNDNESS

External Validity (generalizability): Can the study be generalized? To what populations, settings, treatment variables, and measures? External validity depends on:

 

1.      Interactive effect of testing

Individuals may answer survey questions in reaction to the survey itself. (Remember that at some level all data is based on some data-gathering instrument like a survey.) Respondents may respond to a particular interviewer (interviewer bias), the phrasing of questions, or the flow of the questionnaire (e.g., asking about someone's health at the beginning of a survey may elicit a different response than asking about it after a series of questions about their diseases). If individuals are asked a series of questions repeatedly over time, they may learn the answers (e.g., pensions on NLSM).

 

2.      Selection interactions (between selection and experimental variables)

Individuals may be selected (consciously or unconsciously) for inclusion in a study because of a particular factor, which may interact with the key independent variable studied. For example, if we look at earnings of CSUH grads as compared to Chabot grads, CSUH graduation is contingent upon being accepted to CSUH (a condition for inclusion in the study), which would carry higher earnings than not being accepted, ceteris paribus. Asking people about health at a pharmacy may disproportionately select individuals in poor health because healthy individuals don't need prescriptions.

 

3.      Hawthorne effect (reactive effects of experimental arrangements)

Experiments, per se, may change people's reactions. In other words, people may react differently simply because they are part of a study. For example, if you pay respondents to take part in a study, they may react differently than they would without payment. This is a real concern for marketing types, who often pay individuals to try products but then find that, without the payment, people like the product less. (The Hawthorne experiment was one in which researchers noted that the productivity of workers increased whenever there was a change in environment. If you made the room brighter, productivity increased. If you rearranged the office, productivity increased. The researchers, working at the Hawthorne plant for which the effect is named, thought they had discovered that the light change raised productivity, until they found out it was the change, not the lights, that mattered.)

 

4.      Multiple treatment interference (effects of prior treatments are not erased)

The overstudied population's behavior may not reflect the "true" population simply because respondents are responding to another study, not the one at hand. They may know about certain programs (for example) simply because they were described in a research questionnaire from another study, or they may know how to answer questions based on phrasing because of constant surveying. Or (in the health field) the effect of one drug on health may reflect the fact that the participant was in another study and received a different drug there.

 

Internal Validity (Are the results meaningful (interpretable)?)

 

1.      History (the effect of outside events)

Specific events may occur while the study is under way (or between the first and last measurements) that could affect the key variables studied. For example, the wages (and job possibilities) of CSUH grads could be severely affected by a devastating earthquake in the Bay Area a month prior to graduation.

 

2.      Maturation (the effect of time)

Changes may occur simply with the passage of time and may not be related to specific events. For example, wages of CSUH grads may increase during their 4-year duration in school simply because respondents are four years older and more responsible and not because of the education they received.

 

3.      Testing (the effect of taking the test)

Individuals may learn from the survey instrument. For example, low-income individuals in government programs are often evaluated after every program. They could "learn" how to take the test after repeated testing.

 

4.      Instrumentation (the effect of how a test is scored or how a variable is measured)

Changing the way a variable is defined, how a test is scored, or how an interviewer rates an item may produce a different result. For example, if I define healthy as only individuals who rate their health as outstanding, and unhealthy as those who rate their health as very good, good, poor or very poor, I might get different results than if I defined healthy as those who rate their health as outstanding, very good, and good.
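The health example above can be sketched in a few lines. This is only an illustration: the response counts are made up, and the helper name `share_healthy` is hypothetical. It shows that the same answers give very different "percent healthy" figures depending on where the cutoff is drawn.

```python
from collections import Counter

# Hypothetical self-rated health responses (made-up counts, for illustration only).
responses = (
    ["outstanding"] * 10
    + ["very good"] * 25
    + ["good"] * 35
    + ["poor"] * 20
    + ["very poor"] * 10
)


def share_healthy(answers, healthy_labels):
    """Return the fraction of answers whose label counts as 'healthy'."""
    counts = Counter(answers)
    healthy = sum(counts[label] for label in healthy_labels)
    return healthy / len(answers)


# Narrow definition: only "outstanding" counts as healthy.
narrow = share_healthy(responses, {"outstanding"})
# Broad definition: "outstanding", "very good", and "good" all count as healthy.
broad = share_healthy(responses, {"outstanding", "very good", "good"})

print(f"Healthy (narrow definition): {narrow:.0%}")
print(f"Healthy (broad definition):  {broad:.0%}")
```

Nothing about the respondents changed between the two lines of output; only the instrument did.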

 

5.      Regression toward the mean (there's nowhere to go but up if you're the worst and nowhere to go but down if you're the best)

If you study respondents at an extreme tail of a measure, they are bound to move toward the mean. If I look at the individuals in all classes at CSUH who scored the lowest on their midterm exam to see whether they improve on the final, the answer will be yes. They can't go lower than the lowest in the class, so--by the laws of probability--some will go above the lowest.
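A small simulation can make the midterm example concrete. It is a sketch under a stated assumption that is not in the notes: each exam score is a stable "ability" plus independent exam-day luck, with made-up means and spreads. No tutoring or other "treatment" happens between the two exams, yet the lowest midterm scorers still improve on average.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

N_STUDENTS = 1000

# Assumption: score = stable ability + independent luck on each exam.
ability = [random.gauss(70, 8) for _ in range(N_STUDENTS)]
midterm = [a + random.gauss(0, 8) for a in ability]
final = [a + random.gauss(0, 8) for a in ability]

# Pick the bottom 10% of students by midterm score.
order = sorted(range(N_STUDENTS), key=lambda i: midterm[i])
bottom = order[: N_STUDENTS // 10]

bottom_midterm_mean = statistics.mean(midterm[i] for i in bottom)
bottom_final_mean = statistics.mean(final[i] for i in bottom)
class_final_mean = statistics.mean(final)

print(f"Bottom group, midterm mean: {bottom_midterm_mean:.1f}")
print(f"Bottom group, final mean:   {bottom_final_mean:.1f}")
print(f"Whole class, final mean:    {class_final_mean:.1f}")
```

The bottom group's final average rises toward the class average but stays below it, which is what regression toward the mean predicts when part of an extreme score is luck.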

 

6.      Selection bias at entrance (the effect of studying only the best)

CREAMING. If you study a population that is likely to succeed initially, your research results will likely show success. For example, if we see that all Fortune 500 firms during 2000 made a profit, we can't say the economy is robust or that all firms make a profit because we selected the best to begin with.  Creaming produces a biased sample and results cannot be extrapolated to the "true" population.

 

7.      Selection bias--experimental mortality (the effect of losing respondents)

If you are studying individuals who have a chance of dropping out of the study, you also have a biased sample. For example, a certain drug may be deemed to cure a virus because everyone alive after taking it no longer has the virus, BUT those who weren't cured by the drug died.
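The drug example can be put in numbers. The counts below are invented for illustration and the helper `cure_rate` is hypothetical; the point is only the gap between the rate measured on everyone who started and the rate measured on those still around to be surveyed.

```python
# Hypothetical trial: 100 patients take the drug (made-up numbers).
patients = (
    [{"cured": True, "alive": True}] * 40      # drug worked
    + [{"cured": False, "alive": False}] * 60  # drug failed and the patient died
)


def cure_rate(group):
    """Fraction of the group recorded as cured."""
    return sum(p["cured"] for p in group) / len(group)


# Only survivors can be observed at the end of the study.
survivors = [p for p in patients if p["alive"]]

rate_all = cure_rate(patients)         # everyone who started the study
rate_survivors = cure_rate(survivors)  # only those still in the sample

print(f"Cure rate among all who started: {rate_all:.0%}")
print(f"Cure rate among survivors only:  {rate_survivors:.0%}")
```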

 

8.      Interactions

Any of the above can interact and produce real problems (e.g., selection-maturation).

 

YOU CAN HAVE INTERNAL VALIDITY WITHOUT EXTERNAL VALIDITY (E.G., ACADEMY STUDY) BUT CANNOT HAVE EXTERNAL VALIDITY WITHOUT INTERNAL VALIDITY

 

Now let's look at the design issues. How can we design a research study so that the results have both external and internal validity? (Hint: it takes a lot of time and money.)

 

1.      Case Study                                              X (cross-sectional on one population: e.g., health care of individuals in Pleasanton)

 

Simple. The study is executed in one time period and for one group of individuals. One shot. Estimates are based on a guess as to what would have happened to the sample had they been in another location, subject to other circumstances, or subject to other TREATMENTS (e.g., college education).

 

If we look at the value of an MA in economics from CSUH and examine the earnings of all the individuals who took Research Methods, we have a case study. What would have happened without the degree (treatment)? What would happen in a different location, time, etc.?

 

Benefits:

·                   Cheap

·                   Allows the analyst to "pick up" on many outcomes that are not detected in rigorous analysis (especially if qualitative)

Drawbacks:

·                   External validity depends on reactive effects (perils of participant observation--the fact someone is watching)

·                   Little internal validity

·                   History--other events--may determine outcomes

·                   Maturation--time causes change (in earnings)

·                   Difficult to tell whether there are selection problems because there is no comparison group

 

 

2.      Before and After                                     OXO or OOXOO (longitudinal data: e.g., individuals before college, influence of college on building human capital, and labor market outcomes after college)

 

Examine participants before "treatment" (program, key activity to study) and then examine after "treatment". Can observe, survey, or "test".

 

If we examine individuals before they enter the MA program (e.g., survey them prior to entrance) and then survey them again after they leave the program, this would be a before-and-after design.

 

Benefits

·                   Selection problems reduced since a comparison is being made over time with the same individuals (e.g., those with above average earnings prior to entering MA program will have above average earnings after leaving it)

 

Drawbacks

·                   Selection into the sample (e.g., MA program) still a problem

·                   External validity issues are the same (is the sample representative?)

·                   Little internal validity (no external sample to compare behavior to)

·                   History--other events--may determine outcomes. Intensified because the time period is extended

·                   Maturation--time causes change (in earnings). Intensified because the time period is extended

·                   Regression toward mean may be an issue

 

 Can reduce drawbacks by…

·                   Repeating observations before and after "treatment"

·                   Using regression to predict expected before-after changes (to cover maturation--NOT regression toward the mean). For example, predict the pattern of earnings that would hold based on pre-"treatment" information and see whether the "treatment" had an impact (Y axis is earnings)


[Figure: earnings (Y axis) against time (X axis), with the point of "treatment" marked on the time axis.]

 


The dotted line is observed earnings and the straight line is the regression based on pre-"treatment" characteristics. If the observed line differs significantly from the predicted line, there is more evidence that the treatment is creating the earnings increase.
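The idea of projecting the pre-"treatment" pattern forward can be sketched as follows. Everything here is an assumption made for illustration: the earnings figures are invented, the trend is taken to be a straight line in time, and `fit_line` is a hypothetical helper that does ordinary least squares with one regressor. The sketch only computes the gap between observed and predicted earnings; it does not test whether that gap is statistically significant.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical earnings in $1,000s; the "treatment" happens after year 4.
pre_years = [0, 1, 2, 3, 4]
pre_earnings = [30.0, 31.2, 31.9, 33.1, 34.0]
post_years = [6, 7, 8]
post_earnings = [40.0, 41.5, 43.0]

# Fit the trend on pre-treatment observations only.
slope, intercept = fit_line(pre_years, pre_earnings)

# Extend that trend into the post-treatment years and compare with what was observed.
predicted = [intercept + slope * year for year in post_years]
gaps = [obs - pred for obs, pred in zip(post_earnings, predicted)]

for year, obs, pred, gap in zip(post_years, post_earnings, predicted, gaps):
    print(f"year {year}: observed {obs:.1f}, predicted {pred:.2f}, gap {gap:.2f}")
```

Because the prediction already builds in the growth that time alone would bring, a persistent positive gap is harder to blame on maturation.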

 

 

3.      Comparison Group Designs                 OOXOO

                                                                        OOOOO

Use a comparison group to represent the expected experience of the participants in the absence of the program. The key is to use a comparison group that represents the experience that the program participants would have had in the absence of the program. Unless there is random assignment to the "treatment" and the "comparison" group, it is hard to make a case that the groups are the same. For example:

·                   If we use individuals who applied to the MA program and didn't get in as the comparison group, they're different because they didn't meet the requirements initially, and therefore their experiences would lead to a different earnings trajectory (ceteris paribus) than the MA students.

·                   If we use neighbors (for example) who live in the same area as the MA students but who didn't apply to the program, the groups differ (even if we use individuals with the same socioeconomic characteristics) because the group who applied to the MA program may be more motivated (for example)

 

Can use variables in a regression model to statistically control for differences, but not all differences can be quantified (e.g., motivation). There is a vast literature on how to statistically control for initial differences in the samples, but currently--unless it's random assignment--you will be dinged big time (i.e., research estimates not believed) because of sample selection issues.

 

With a random control group, all qualified persons are contacted prior to treatment. Persons are randomly assigned between the control and treatment groups, and the control group is given a placebo (sugar pill). Placebos in labor research are often normal program services. Random control group designs get rid of most internal validity problems (external validity problems remain; see the drawbacks below).
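A short simulation suggests why random assignment is valued. It is a sketch under assumptions that are not in the notes: "motivation" is an unobserved trait on a 0-1 scale, and under self-selection a person's chance of enrolling equals their motivation. The variable names are hypothetical.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

# Assumption: motivation is unobserved and spread evenly between 0 and 1.
applicants = [random.random() for _ in range(2000)]

# Self-selection: the more motivated a person is, the more likely they enrol.
enrolled, not_enrolled = [], []
for motivation in applicants:
    if random.random() < motivation:
        enrolled.append(motivation)
    else:
        not_enrolled.append(motivation)
gap_self = statistics.mean(enrolled) - statistics.mean(not_enrolled)

# Random assignment: shuffle everyone, then split into two equal groups.
shuffled = applicants[:]
random.shuffle(shuffled)
half = len(shuffled) // 2
treatment, control = shuffled[:half], shuffled[half:]
gap_random = statistics.mean(treatment) - statistics.mean(control)

print(f"Motivation gap, self-selected groups:     {gap_self:.3f}")
print(f"Motivation gap, randomly assigned groups: {gap_random:.3f}")
```

Under self-selection the two groups differ in motivation before any treatment is given; under random assignment the difference is close to zero, even though motivation was never measured.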

 

If we survey individuals before they enter the CSUH MA program, along with a group of individuals not going into the program, and then survey both groups again after the "treatment group" of MA students leaves the program, this would be a before-and-after design with a comparison group. For the random control group, we would have to take a (random?) group of students and randomly assign them entrance into the program. The MDRC academies study is an example here.
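One common way to summarize a before-and-after design with a comparison group, which these notes do not name, is a difference-in-differences: take the treatment group's change and subtract the comparison group's change. The sketch below uses invented earnings figures (in $1,000s) and hypothetical variable names, and it assumes the comparison group's change is a fair stand-in for what the MA students would have experienced without the program.

```python
from statistics import mean

# Hypothetical earnings in $1,000s, surveyed before and after the program period.
ma_before = [40, 42, 45, 38]
ma_after = [52, 55, 57, 50]
comparison_before = [35, 37, 39, 33]
comparison_after = [40, 41, 44, 38]

# Change over time within each group.
change_ma = mean(ma_after) - mean(ma_before)
change_comparison = mean(comparison_after) - mean(comparison_before)

# The comparison group's change stands in for history and maturation.
estimate = change_ma - change_comparison

print(f"Change for MA students:      {change_ma:.2f}")
print(f"Change for comparison group: {change_comparison:.2f}")
print(f"Estimated program effect:    {estimate:.2f}")
```

If the groups were not randomly assigned, the estimate still carries whatever selection differences (e.g., motivation) affect earnings growth.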

 

 

Drawbacks

·                   Hawthorne effect (feel special because in a program)

·                   Neither group is representative of the general population

·                   Over-time validity and spatial validity

·                   Those not selected (who know it) may change their behavior, which affects the control group

·                   Ethical not to give someone the treatment?

·                   Politically feasible not to give someone the treatment?

·                   Staff may not understand random assignment and treat groups differently

·                   VERY EXPENSIVE AND ONLY FOR LARGE SCALE RESEARCH

 

 

4.      Regression Discontinuity (e.g., time series--is there a blip?)

Sometimes used if there are practical problems in random assignment.

 


Select criteria that can be ranked (must be cardinal) for selecting participants (e.g., income levels, size of companies) and use a cutoff point (e.g., before 1945) or plan discontinuities in a cross-sectional sample (e.g., income between $10,000 and $20,000, or companies not sized 50-150 employees). Post-treatment outcomes are then regressed separately on the rankings of the two groups (pre- and post-1945, or small and large firms). If significant differences exist between the two groups, support exists for treatment impacts--or for a period effect (WWII).
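The two separate regressions described above can be sketched as follows. The data are invented, the ranking variable `score` and the cutoff of 5 are hypothetical, and `fit_line` is an illustrative helper for one-regressor ordinary least squares. The sketch fits a line on each side of the cutoff and measures the jump between the two lines at the cutoff; it does not test whether the jump is statistically significant, and it assumes a straight line is the right specification on both sides.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


CUTOFF = 5  # hypothetical: units with score >= CUTOFF receive the treatment

# Hypothetical (score, outcome) pairs on each side of the cutoff.
below = [(0, 10.2), (1, 11.9), (2, 14.1), (3, 15.8), (4, 18.0)]
above = [(5, 25.1), (6, 26.8), (7, 29.2), (8, 30.9), (9, 33.0)]

# Regress the outcome on the ranking variable separately for each group.
slope_below, intercept_below = fit_line(*zip(*below))
slope_above, intercept_above = fit_line(*zip(*above))

# Evaluate both fitted lines at the cutoff; the difference is the estimated jump.
at_cutoff_below = intercept_below + slope_below * CUTOFF
at_cutoff_above = intercept_above + slope_above * CUTOFF
jump = at_cutoff_above - at_cutoff_below

print(f"Fitted value just below the cutoff: {at_cutoff_below:.2f}")
print(f"Fitted value just above the cutoff: {at_cutoff_above:.2f}")
print(f"Estimated jump at the cutoff:       {jump:.2f}")
```

If the true relationship is curved, forcing straight lines onto each side can manufacture a jump that is not there.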

 

[Figure: GDP (Y axis) against time (X axis), with the cutoff at 1945 marked on the time axis.]

 

 

 


Benefits

·                   Easy

 

Drawbacks

·                   Few natural discontinuities (although "natural experiments" are a godsend)

·                   What happens if regression is not properly specified?

·                   Still have external validity problems.

 

 
