The Planning Stage: Developing your Proposal
Week 2
Based on the diagram: Beginning the Design Process
1. State questions and hypotheses, identify variables (AKA focusing the study)
Most research starts with a description of a situation that logically produces a puzzle or question that the researcher wants to answer. From that question, a hypothesis is developed and tested. (We note that questions can also produce problem statements that propose solutions to the problem and evaluate alternative solutions instead of rival hypotheses. Because most economic research is hypothesis testing, we focus the course around the more theory-based research described last week.) See handout from Black.
Hypothesis: An expression of anticipated outcomes, as predicted by a given theory or expected consequences of principles applied to a specific situation, that result from actions, characteristics, or events.
Note the components:
· Anticipated outcomes (dv)
· Predicted by theory or principles
· Specific situation (controls)
· Actions, characteristics or events (iv)
Note that data are not mentioned. Data are the supporting evidence used to test the theory (reject the null).
Note that causation is implied by theory.
Note that the heart of the hypothesis is predicted relationships between variables (defined as nonconstant traits that allow us to consider the possibility of relationships).
Hypotheses must be testable. Each should be of sufficient scope to be resolvable with the given resources and stated so as to clearly define the problem to be investigated.
Most questions start too broad (as might be expected given the complexity of the world and the grandiose tools of economics) for the resources (time, brain power, and dollars) available. Most questions also start out too vague, leaving operationalization virtually impossible. Broad or vague questions produce untestable hypotheses.
Questions to ponder in developing hypotheses that will help your focus:
1. What are the core concepts to be studied?
· How are the core concepts to be defined?
· Is this definition realistic (i.e., can core concepts be operationalized)?
· What are alternative definitions?
· How might alternative definitions alter findings?
2. What is the time frame?
3. What is the specific situation (this places bounds on the research to make it manageable)?
4. What is the unit of analysis (individual or aggregate)?
5. What are the expected outcomes? (endogenous variables)
6. What are the predictor variables? (exogenous variables)
7. What are extraneous factors (control variables)?
8. What are the potential intervening factors that could produce irregularities in the relationship? (How are the variables related and under what conditions?)
9. What are the mechanisms by which relationships exist?
10. What are rival hypotheses?
11. What is the null hypothesis to be rejected? (no relationship exists)
2. Determine design structure
How will I structure the research (logical line of inquiry) to test the hypothesis? Key elements:
· Internal validity
The structure of the research design that allows us to draw unambiguous conclusions from our results.
Internal validity depends on:
1. Logical nature of the inquiry. If A → B then we can conclude C unambiguously. Logic is essential here. Does it make sense that wages are estimated in log form? Does it make sense that A and B are causally related? Do the data and population make sense? Use theory, logic, and common sense here.
2. Measurement
· If we alter the definition of the construct will we still get the same results?
· If we ask the question in a slightly different manner will we get the same results? (John and testing)
· Does the measure capture the construct? (education and skills)
· Error in measurement
· Random error (everyone randomly misreports income: some overreport and some underreport, by the same amount on average): the mean and estimates are accurate and the problem is not serious
· Constant error (everyone overestimates income by 10%): the intercept is off but relationships are OK
· Correlated error, when amount and direction vary with a characteristic of the respondent (low income overstate and high income understate): have to model/statistically correct for the error for the study to be valid
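The three error types above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the data are made up, income is the dependent variable and is measured in logs (so a 10% overstatement is a fixed shift), and the "correlated" case assumes reports are pulled halfway toward the mean. None of these numbers come from the course materials.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Hypothetical data: log income rises 0.10 per year of schooling.
educ = rng.uniform(8, 20, n)
true_log_inc = 1.0 + 0.10 * educ + rng.normal(0, 0.3, n)


def fit_line(x, y):
    """Return (intercept, slope) from a least-squares line of y on x."""
    slope, intercept = np.polyfit(x, y, 1)
    return float(intercept), float(slope)


# Random error: over- and under-reports cancel on average.
random_err = true_log_inc + rng.normal(0, 0.2, n)

# Constant error: everyone overstates income by 10%, a fixed shift in logs.
constant_err = true_log_inc + np.log(1.10)

# Correlated error (assumed form): low earners overstate and high earners
# understate, so reports are pulled halfway toward the mean.
correlated_err = true_log_inc - 0.5 * (true_log_inc - true_log_inc.mean())

fits = {
    "true": fit_line(educ, true_log_inc),
    "random": fit_line(educ, random_err),
    "constant": fit_line(educ, constant_err),
    "correlated": fit_line(educ, correlated_err),
}

for name, (intercept, slope) in fits.items():
    print(f"{name:>10}: intercept={intercept:.3f}  slope={slope:.3f}")
```

In this setup the random-error slope stays close to the true one, the constant-error fit differs only in its intercept, and the correlated-error slope is cut in half, which is why that case needs a statistical correction.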
· External validity
Can results of the study be generalized beyond the population at hand? Will the results change if I change:
· Location
· Population
· Samples
· Time periods
A study can have internal validity but lack external validity (e.g., Career Academy study). You can’t have external validity without internal validity.
· Modeling (defining relationships)
How are the variables to be studied related? Theory is essential here and statistical tests can support theory.
· Direct (recursive) A → B: Does A cause B? Is A correlated with B? (e.g., education increases wage)
· Indirect A → C → B (e.g., education produces skills which increase labor market productivity/wages)
· Direct and indirect A → B and A → C → B (e.g., education directly increases wages by motivating hard work and indirectly increases wages through skill acquisition).
· Spurious correlation/relationship: correlation without meaning; could be mismeasurement (e.g., age is found to be correlated with laser surgery, but age could merely be a proxy for income or poor eyesight and not a demand factor per se). Spurious correlation is a real problem if you don’t use theory.
· Nature of the relationship: what does the “line” look like?
· Log (as in wages and schooling)
· Squared (increasing at a decreasing/increasing rate—as in experience and wages)
· Linear (as in study time and grades)
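To make the direct/indirect distinction concrete, here is a minimal simulated sketch of the education, skills, and wages example. All coefficients are invented for illustration, and the helper `ols` is a hypothetical name, not a course tool. The idea: regressing wages on education alone picks up the total effect, while adding skills as a regressor isolates the direct effect.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

# Hypothetical structural model (all coefficients are made up):
#   skills   = 0.5 * educ + noise                   (A -> C)
#   log_wage = 0.04 * educ + 0.08 * skills + noise  (A -> B and C -> B)
educ = rng.normal(12, 2, n)
skills = 0.5 * educ + rng.normal(0, 1, n)
log_wage = 1.0 + 0.04 * educ + 0.08 * skills + rng.normal(0, 0.2, n)


def ols(y, *regressors):
    """Return OLS coefficients (intercept first) of y on the regressors."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


total = ols(log_wage, educ)[1]            # direct + indirect effect
direct = ols(log_wage, educ, skills)[1]   # effect holding skills fixed
indirect = total - direct                 # the part that runs through skills

print(f"total={total:.3f}  direct={direct:.3f}  indirect={indirect:.3f}")
```

Under these assumptions the total effect should come out near 0.08, split roughly evenly between the direct path and the path through skills. Which variables belong on which path is a question for theory; the regression only measures the split once the model is specified.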
· Ethics
Should we withhold money from individuals for the sake of research?
Do I have a right to information that you would find embarrassing for the sake of research?
Nazi medical research during WWII
3. Identify population and sample
· What is the unit of analysis (does the hypothesis rely on individual or aggregate data for testing)?
· Who/what is the group to be studied and who/what is the comparison group?
· From what time frame must the data be drawn?
· Cross-sectional (one observation per individual during one time period)
· Time series (one observation per individual with data collected at different time periods)
· Longitudinal (multiple observations of an individual with data collected at different time periods)
· Do data already exist?
· Do existing data bases cover the population desired, contain the relevant variables, and during the relevant time period?
· Are existing data bases in a form that I can use?
· Are existing data bases available to me (expense, confidentiality, ethics)?
· Is primary data collection necessary?
· Do I have access to the relevant population?
· Will respondents give me the necessary information?
· Do I have the resources to conduct primary data collection?
· Am I violating privacy of or offending potential respondents? Am I raising ethical dilemmas for them?
· How will I collect the data—phone, in person, mail?
4. Design instruments and classify: operationalize definitions
· Open-ended and close-ended questions
· Data levels
· Categorical/Nominal: Variables have name value only with at least two categories existing (e.g., gender, race, ethnicity, city/location). Usually developed into binary variables in estimations.
· Ordinal: Some rank order can be established but the intervals between the categories are not necessarily equal (e.g., hierarchies of occupational structure, utility)
· Interval: Rank order on a continuous scale (i.e., intervals between categories are the same) but there is no true zero point at which the trait is absent. Magnitude is what is measured (e.g., IQ, where a score of 0 would not mean zero intelligence).
· Ratio: Scale is continuous with an absolute zero with meaning (e.g., production)
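As a small illustration of how a nominal variable is "developed into binary variables," the sketch below builds 0/1 indicator columns in plain Python. The function name `dummy_code` and the sample cities are hypothetical. One category is left out as the reference group, which is the usual way to avoid perfect collinearity with the intercept.

```python
def dummy_code(values, reference):
    """Turn a nominal variable into 0/1 indicator columns.

    The reference category gets no column of its own, which avoids
    perfect collinearity with the regression intercept.
    """
    categories = sorted(set(values) - {reference})
    return {cat: [1 if v == cat else 0 for v in values] for cat in categories}


# Hypothetical sample of a nominal variable (city of residence).
city = ["Boston", "Denver", "Austin", "Denver", "Boston"]
dummies = dummy_code(city, reference="Austin")
print(dummies)
```

Each coefficient on a dummy is then read relative to the omitted group (here, Austin).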
5. Select statistical tests for resolving hypotheses
Description of the problem (means, variances, charts, tables) usually sets up the question
Regression is the statistical tool of choice of economists (OLS, 2SLS, Probit/Logit)
Modeling and tests for robustness are critical
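A minimal sketch of what "rejecting the null" looks like with OLS, written out by hand with NumPy rather than a statistics package. The data are simulated with an assumed true slope of 0.5, and the 1.96 cutoff is the usual large-sample two-sided 5% critical value. This is an illustration of the mechanics, not a recommended workflow.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500

# Hypothetical data in which x truly affects y (slope 0.5).
x = rng.normal(0, 1, n)
y = 2.0 + 0.5 * x + rng.normal(0, 1, n)

X = np.column_stack([np.ones(n), x])
beta = np.linalg.solve(X.T @ X, X.T @ y)        # OLS coefficients
resid = y - X @ beta
sigma2 = resid @ resid / (n - X.shape[1])       # residual variance
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
t_stat = beta / se                              # tests H0: coefficient = 0

# With n = 500, |t| above roughly 1.96 rejects the null at the 5% level.
reject_null = abs(t_stat[1]) > 1.96
print(f"slope={beta[1]:.3f}  se={se[1]:.3f}  t={t_stat[1]:.2f}  reject={reject_null}")
```

The null here is "no relationship exists" (slope equal to zero); a large t-statistic on the slope is the evidence against it.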
An Application: The labor market for entry level workers
1. State questions and hypotheses, identify variables
Several trends suggest that the demand for skills in the labor market is increasing:
· Increasing demand for skills that is often thought to be associated with technology (?)
· Declining relative wages and relatively high levels of unemployment among individuals with low levels of skills
· Increasing wages and relatively low levels of unemployment among individuals with high levels of skills
· (Increasing wage gap between skilled and unskilled)
and that the “need” for labor market wages is also increasing:
· Changes in federal legislation on welfare place lifetime (5 years) and continuous time (3 years) limits on general income assistance to individuals with children
Question arises: What skills do individuals with low levels of education need to gain employment at relatively high wages?
Hypothesis: In a given labor market, workers with low levels of education and work experience, but with more technical skills will have higher wages than similarly educated and experienced workers without technical skills.
2. Determine design structure
How can the research be structured to obtain:
· Individual level data in a restricted labor market (minimizes extraneous factors like heterogeneity in demand and prices so increases internal validity but reduces external validity and generalizability)
· Individual wages within a series of labor markets (better)
· Occupational wages that are offered by employers
· Detailed measures of skills (defined as?) with control factors of education and work experience (at a minimum). Can we ask individuals for this?
3. Identify population and sample
· Do secondary data exist on individuals (HSB, NLS-NC, Census)?
· Do secondary data exist on employers (multicity, Dun & Bradstreet/ES202)?
· Primary data collection
4. Design instruments and classify: operationalize definitions
· Identify employers
· Get details of skills
· Geographic restrictions
5. Select statistical tests for resolving hypotheses
· LnW earnings equations
· Potential Problems (limited variance, occupations are unit of analysis)
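One way the log-wage earnings equation for this application might be written is sketched below. This is an assumed specification for illustration, not necessarily the one used in the course: the variable names (Educ, Exper, Tech) and the squared experience term are choices made here, with Tech standing for whatever operational measure of technical skills is adopted.

```latex
\ln W_i = \beta_0 + \beta_1\,\mathrm{Educ}_i + \beta_2\,\mathrm{Exper}_i
        + \beta_3\,\mathrm{Exper}_i^{2} + \delta\,\mathrm{Tech}_i + \varepsilon_i

H_0:\ \delta = 0 \qquad \text{versus} \qquad H_1:\ \delta > 0
```

Here the sample would be restricted to workers with low levels of education and experience in a given labor market, and the hypothesis is supported if the null of no wage premium for technical skills is rejected.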