A cohort is a group of people who share a common characteristic or experience within a defined period (e.g., are born, leave school, lose their job, are exposed to a drug or a vaccine, etc.). Thus a group of people who were born on a day or in a particular period, say 1948, form a birth cohort. The comparison group may be the general population from which the cohort is drawn, or it may be another cohort of persons thought to have had little or no exposure to the substance under investigation, but otherwise similar. Alternatively, subgroups within the cohort may be compared with each other.
In medicine, a cohort study is often undertaken to obtain evidence to try to refute the existence of a suspected association between cause and disease; failure to refute a hypothesis strengthens confidence in it. Crucially, the cohort is identified before the appearance of the disease under investigation. The study groups, so defined, are observed over a period of time to determine the frequency of new cases (incidence) of the studied disease among them. The cohort therefore cannot be defined as a group of people who already have the disease. Distinguishing causality from mere correlation cannot usually be done with the results of a cohort study alone.
The advantage of cohort study data is the longitudinal observation of the individual through time, and the collection of data at regular intervals, so recall error is reduced. However, cohort studies are expensive to conduct, are sensitive to attrition and take a long time to generate useful data.
Some cohort studies track groups of children from their birth, and record a wide range of information (exposures) about them. The value of a cohort study depends on the researchers' capacity to stay in touch with all members of the cohort. Some of these studies have continued for decades.
An example of an epidemiologic question that can be answered by the use of a cohort study is: does exposure to X (say, smoking) correlate with outcome Y (say, lung cancer)? Such a study would recruit a group of smokers and a group of non-smokers (the unexposed group), follow them for a set period of time, and note differences in the incidence of lung cancer between the groups at the end of this time. The groups are matched on many other variables, such as economic status and other aspects of health, so that the variable being assessed, the independent variable (in this case, smoking), can be isolated as the cause of the dependent variable (in this case, lung cancer).
In this example, a statistically significant increase in the incidence of lung cancer in the smoking group as compared to the non-smoking group is evidence in favor of the hypothesis. However, rare outcomes, such as lung cancer, are generally not studied with the use of a cohort study, but are rather studied with the use of a case-control study.
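The incidence comparison described above is often summarized as a risk ratio (relative risk): the incidence in the exposed group divided by the incidence in the unexposed group. A minimal sketch, using hypothetical counts rather than real study data:

```python
def risk_ratio(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Risk ratio (relative risk): incidence among the exposed
    divided by incidence among the unexposed."""
    incidence_exposed = exposed_cases / exposed_total
    incidence_unexposed = unexposed_cases / unexposed_total
    return incidence_exposed / incidence_unexposed

# Hypothetical counts: 30 lung-cancer cases among 1000 smokers and
# 10 cases among 1000 non-smokers followed for the same period.
rr = risk_ratio(30, 1000, 10, 1000)
print(rr)  # 3.0: smokers had three times the incidence of non-smokers
```

A risk ratio of 1.0 would indicate no association; whether a ratio above 1.0 is statistically significant depends on the group sizes and a formal test.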
Shorter term studies are commonly used in medical research as a form of clinical trial, or means to test a particular hypothesis of clinical importance. Such studies typically follow two groups of patients for a period of time and compare an endpoint or outcome measure between the two groups.
Randomized controlled trials (RCTs) are a superior methodology in the hierarchy of evidence because they limit the potential for bias by randomly assigning one patient pool to an intervention and another patient pool to non-intervention (or placebo). This minimizes the chance that the incidence of confounding variables will differ between the two groups.
Nevertheless, it is sometimes not practical or ethical to perform RCTs to answer a clinical question. To take our example, if we already had reasonable evidence that smoking causes lung cancer then persuading a pool of non-smokers to take up smoking in order to test this hypothesis would generally be considered quite unethical.
An example of a cohort study that has been going on for more than 50 years is the Framingham Heart Study.
The largest cohort study in women is the Nurses' Health Study. Started in 1976, it is tracking over 120,000 nurses and has been analyzed for many different conditions and outcomes.
An example of a nested case-control study is "Inflammatory markers and the risk of coronary heart disease in men and women", a case-control analysis extracted from the Framingham Heart Study cohort.[2]
Household panel surveys are an important sub-type of cohort study. These draw representative samples of households and survey them, following all individuals through time, usually on an annual basis. Examples include the US Panel Study on Income Dynamics (since 1968), the German Socio-Economic Panel (since 1984), the British Household Panel Survey (since 1991), the Household, Income and Labour Dynamics in Australia Survey (since 2001) and the European Community Household Panel (1994-2001).
Because of the non-randomized allocation of subjects in a cohort study, several statistical approaches have been developed to reduce confounding from selection bias.
One comparison study assessed three approaches (multiple regression, the propensity score, and a grouped treatment variable) in their ability to predict treatment outcomes in a cohort of patients who refused randomization in a chemotherapy trial.[3] The study examined how well each approach could use the nonrandomized patients to replicate the results of the patients who consented to randomization. It found that the propensity score did not add to traditional multiple regression, while the grouped treatment variable was least successful.[3]
Multiple regression with the Cox proportional hazards model can be used to adjust for confounding variables. However, multiple regression can only correct for confounding by independent variables that have been measured.
Creating a grouped treatment variable attempts to correct for unmeasured confounding influences.[4] In the grouped treatment approach, the "treatment individually assigned is considered to be confounded by indication, which means that patients may be selected to receive one of the treatments because of known or unknown prognostic factors."[3] For example, in an observational study that included several hospitals, creating a variable for the proportion of patients exposed to the treatment may account for biases in each hospital in deciding which patients get the treatment.[3]
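The hospital example can be sketched in a few lines: for each hospital, compute the proportion of its patients who received the treatment, then attach that group-level rate to each patient as a covariate. The patient records below are hypothetical, for illustration only:

```python
from collections import defaultdict

# Hypothetical patient records from a multi-hospital observational study.
patients = [
    {"hospital": "A", "treated": 1}, {"hospital": "A", "treated": 1},
    {"hospital": "A", "treated": 0}, {"hospital": "A", "treated": 1},
    {"hospital": "B", "treated": 0}, {"hospital": "B", "treated": 0},
    {"hospital": "B", "treated": 1}, {"hospital": "B", "treated": 0},
]

# Proportion of patients treated at each hospital: the grouped treatment variable.
counts = defaultdict(lambda: [0, 0])  # hospital -> [treated count, total count]
for p in patients:
    counts[p["hospital"]][0] += p["treated"]
    counts[p["hospital"]][1] += 1
group_rate = {h: t / n for h, (t, n) in counts.items()}

# Attach the group-level rate to each patient for use as a regression covariate.
for p in patients:
    p["group_treatment"] = group_rate[p["hospital"]]

print(group_rate)  # {'A': 0.75, 'B': 0.25}
```

Including this group-level rate in the outcome model is intended to absorb hospital-level differences in how treatment was assigned.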
Principal components analysis was developed by Pearson in 1901.[8] Like multiple regression, principal components analysis can only correct for confounding by independent variables that have been measured.
The prior event rate ratio has been used to replicate, with observational data from electronic health records, the results of the Scandinavian Simvastatin Survival Study[9] and the HOPE and EUROPA trials.[10][11] Like the grouped treatment variable, the prior event rate ratio attempts to correct for unmeasured confounding influences. However, unlike the grouped treatment variable, which controls for the proportion of subjects selected for treatment, the prior event rate ratio uses the "ratio of event rates between the Exposed and Unexposed cohorts prior to study start time to adjust the study hazard ratio".[10]
A limitation of the prior event rate ratio is that it cannot be applied to outcomes that have not occurred prior to the onset of treatment. For example, the prior event rate ratio cannot control for confounding in studies of primary prevention.
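The adjustment described in the quotation above is simple arithmetic: divide the study hazard ratio by the ratio of pre-treatment event rates in the two cohorts. A minimal sketch with made-up rates (the real method involves modeling decisions beyond this division):

```python
def perr_adjusted_hr(study_hr, exposed_prior_rate, unexposed_prior_rate):
    """Adjust a study hazard ratio by the prior event rate ratio (PERR):
    the ratio of event rates between exposed and unexposed cohorts
    before the study start time."""
    prior_event_rate_ratio = exposed_prior_rate / unexposed_prior_rate
    return study_hr / prior_event_rate_ratio

# Hypothetical numbers: before treatment started, the exposed cohort had
# events at 0.04 per person-year versus 0.02 in the unexposed cohort,
# and the study observed a hazard ratio of 1.5.
adjusted = perr_adjusted_hr(1.5, 0.04, 0.02)
print(adjusted)  # 0.75: the apparent excess risk is attributed to baseline confounding
```

Here the exposed cohort was already at twice the risk before treatment, so the adjustment halves the observed hazard ratio.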
The propensity score was introduced by Rosenbaum in 1983.[12][13] The propensity score is the "conditional probability of receiving one of the treatments under comparison ... given the observed covariates."[3] The propensity score can only correct for confounding by independent variables that have been measured.
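In practice the propensity score is usually estimated with logistic regression on many covariates, but the definition can be illustrated more simply: within each stratum of a measured covariate, the observed probability of receiving treatment is a crude propensity score. The subjects below are hypothetical:

```python
from collections import defaultdict

# Hypothetical subjects: (covariate stratum, received treatment 0/1).
subjects = [
    ("low_risk", 0), ("low_risk", 0), ("low_risk", 1), ("low_risk", 0),
    ("high_risk", 1), ("high_risk", 1), ("high_risk", 0), ("high_risk", 1),
]

# Crude propensity score: within each covariate stratum, the observed
# probability of receiving treatment given the (measured) covariates.
counts = defaultdict(lambda: [0, 0])  # stratum -> [treated count, total count]
for stratum, treated in subjects:
    counts[stratum][0] += treated
    counts[stratum][1] += 1
propensity = {s: t / n for s, (t, n) in counts.items()}

print(propensity)  # {'low_risk': 0.25, 'high_risk': 0.75}
```

Subjects can then be matched or stratified on this score so that treated and untreated groups are comparable on the measured covariates, which is why the method cannot correct for covariates that were never measured.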
Cohort studies with propensity matching may[14] or may not[15] replicate the results of randomized controlled trials. This may depend on how closely the cohort study emulated the protocol of a randomized controlled trial, as was done in the RCT-DUPLICATE initiative.[14]
Sensitivity analysis can estimate how strong an unmeasured confounder would have to be to eliminate the effect of the factor under study.[17] An example of this analysis was a nonrandomized comparison of when to initiate treatment for asymptomatic Human Immunodeficiency Virus infection in the North American AIDS Cohort Collaboration on Research and Design (NA-ACCORD) study.[6]
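One widely used form of this sensitivity analysis is the E-value of VanderWeele and Ding (an illustrative choice here; the cited studies may have used other methods): the minimum strength of association, on the risk-ratio scale, that an unmeasured confounder would need with both the treatment and the outcome to fully explain away an observed risk ratio.

```python
import math

def e_value(rr):
    """E-value for an observed risk ratio: the minimum risk-ratio strength
    an unmeasured confounder would need with both exposure and outcome
    to fully explain away the observed association."""
    if rr < 1:
        rr = 1 / rr  # protective associations are treated symmetrically
    return rr + math.sqrt(rr * (rr - 1))

print(e_value(2.0))  # about 3.41
```

For an observed risk ratio of 2.0, a confounder associated with both exposure and outcome at a risk ratio of roughly 3.4 each could explain the association; weaker confounding could not.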
If statistically significant associations are found, the Bradford Hill criteria can help determine whether the associations represent true causality. The Bradford Hill criteria were proposed in 1965:[18]
Strength or magnitude of association?
Consistency of association across studies?
Specificity of association?
Temporality of association?
Plausibility based on biological knowledge?
Biological gradient or dose-response relationship?
Coherence? Does the proposed association explain other observations?
Many scales and checklists have been proposed for assessing the quality of cohort studies.[20] The most common items assessed with these tools are:
Selecting study participants (92% of tools)
Measurement of study variables (exposure, outcome and/or confounding variables) (86% of tools)
Sources of bias (including recall bias, interviewer bias and biased loss to follow-up but excluding confounding) (86% of tools)
Control of confounding (78% of tools)
Statistical methods (78% of tools)
Conflict of interest (3% of tools)
Of these tools, only one was designed for use in comparing cohort studies in any clinical setting for the purpose of conducting a systematic review of cohort studies[21]; however, this tool has been described as "extremely complex and require considerable input to calculate raw scores and to convert to final scores, depending on the primary study design and methods".[20]
The Newcastle-Ottawa Scale (NOS) may help assess the quality of nonrandomized studies.[22][23]
Rare outcomes, or those that develop slowly over long periods, are generally not studied with a cohort study but rather with a case-control study. Retrospective studies may exaggerate associations.[29]
Randomized controlled trials (RCTs) are a superior methodology in the hierarchy of evidence, because they limit the potential for bias by randomly assigning one patient pool to an intervention and another patient pool to non-intervention (or placebo). This minimizes the chance that the incidence of confounding variables will differ between the two groups.[30][31]
Empiric comparisons of observational studies and RCTs conflict: some find evidence of exaggerated results from cohort studies,[32][33][34][35][36] while others do not.[37][38]
Adams TD, Gress RE, Smith SC, et al. (2007). "Long-term mortality after gastric bypass surgery". N. Engl. J. Med. 357 (8): 753–61. doi:10.1056/NEJMoa066603. PMID 17715409.
Pai JK, Pischon T, Ma J, et al. (2004). "Inflammatory markers and the risk of coronary heart disease in men and women". N. Engl. J. Med. 351 (25): 2599–610. doi:10.1056/NEJMoa040967. PMID 15602020.
Hill J (2008). "Discussion of research using propensity-score matching: Comments on 'A critical appraisal of propensity-score matching in the medical literature between 1996 and 2003' by Peter Austin, Statistics in Medicine". Stat Med. 27 (12): 2055–2061. doi:10.1002/sim.3245. PMID 18446836.