The Journals of Gerontology Series B: Psychological Sciences and Social Sciences 58:S338-S346 (2003)
© 2003 The Gerontological Society of America


RESEARCH ARTICLE

Trends in Scores on Tests of Cognitive Ability in the Elderly U.S. Population, 1993–2000

Willard L. Rodgers, Mary Beth Ofstedal and A. Regula Herzog

Survey Research Center, University of Michigan, Ann Arbor.

Address correspondence to Dr. Willard L. Rodgers, Institute for Social Research, University of Michigan, Ann Arbor, MI 48104-1248. E-mail: wrodgers{at}umich.edu


    Abstract
 
Objective. This study investigates cohort differences in cognitive functioning among Americans aged 70 or older in 1993, 1995, 1998, and 2000.

Methods. The study draws on self-respondent data from four waves of the Asset and Health Dynamics Among the Oldest Old Study and the Health and Retirement Study surveys collected between 1993 and 2000. Cognitive performance scores for each of four components (immediate recall, delayed recall, serial 7s, and mental status) and their sum are compared across cohorts, unadjusted and with adjustments for survey design features and demographic characteristics.

Results. Unadjusted scores suggest cohort improvements in several components of cognitive functioning between 1993 and 1998, and little change between 1998 and 2000. However, these improvements largely disappear after confounding features of the survey design (changes in age distribution of the sample across waves and prior exposure to the cognitive tests) and changes in the demographic composition of the sample (race, ethnicity, and gender) are adjusted for.

Discussion. There appears to have been little improvement of cognitive functioning across recent cohorts of older Americans. However, the study points out the complexities of using panel data to study cohort differences, particularly when the measures of interest are likely influenced by prior wave participation. Future studies based on other data sources are needed.

Findings that suggest a cohort-related decline in disability among recent cohorts of older Americans have generated considerable interest in the research as well as the policy arena. Several studies have described lessened functional impairment measured by physical functions such as lifting and carrying and climbing and walking and by activities of daily living (ADLs) such as dressing and eating among recent cohorts of elderly people as compared with previous cohorts when assessed at the same chronological age (Freedman & Martin, 1998, 2000; Manton, Corder, & Stallard, 1993, 1997). Such findings have important implications for projecting the ability of older adults to function independently and care for themselves and for their need for family care, support services, and institutionalization; in short, for the costs of supporting America's oldest population. Whereas past investigations have focused primarily on physical impairments, it is well recognized that the ability to function independently depends at least as much on cognitive functioning as on physical functioning, and that this will increasingly be the case in an environment of growing technological and informational complexity. It is therefore of equal interest to learn whether the suggested cohort-related decline extends to cognitive functioning.

A few studies have in fact suggested that impairments of instrumental ADLs (IADLs) such as shopping for food and making telephone calls have declined at least as much as physical impairment and ADLs across cohorts (Crimmins, Saito, & Reynolds, 1997; Freedman & Soldo, 1994; Manton et al., 1997; Waidman & Liu, 2000). IADLs are thought to depend more heavily on cognitive functioning than ADLs. Similarly, cohort-related declines in dementia have been reported (Corder & Manton, 2001; Manton, Stallard, & Corder, 1995, 1998). Broad societal changes would seem to be consistent with such findings. For example, the quantity and quality of education have increased throughout this century, and subsequent cohorts have reaped the benefit of improving education. During this same time, the workplace was transformed from predominantly labor and low-tech jobs to service and high-tech jobs. Both education and complex work benefit cognitive functioning (Fillenbaum, Hughes, Heyman, George, & Blazer, 1988; Kohn & Schooler, 1983; Mortimer & Graves, 1993; Schmand et al., 1997).

Probably the most direct test to date of improvement in cognitive functioning among more recent cohorts of older Americans was reported by Freedman, Aykan, and Martin (2001). These authors used nationally representative longitudinal data from the 1993 wave of the Asset and Health Dynamics Among the Oldest Old Study (AHEAD) and the 1998 wave of the Health and Retirement Study (HRS) to compare two cohorts of Americans aged 70 and older in 1993 and in 1998 in terms of severe cognitive impairment, as measured by means of a short battery of standard cognitive tests. They reported a decline from 6.1% to 3.6% between 1993 and 1998 in severe cognitive impairment and found that this decline was not explained, at least in full, by differences in demographic, socioeconomic, and health composition of the two cohorts. This represents a rather dramatic shift over a relatively short period of time and deserves notice.

The combined HRS and AHEAD studies—which began in 1992 and 1993, respectively, and were merged in 1998—represent an attractive opportunity for investigating cohort differences in cognitive performance. First, the study is designed to be nationally representative of Americans who are 51 years old and older. Second, interviews are conducted every 2 years and the sample is periodically replenished with incoming cohorts (every 6 years). Third, a short battery of cognitive performance measures is administered as part of each interview.

Despite this attractive design, some of its specific features must be carefully considered with regard to making valid cohort comparisons. One set of design features affects the composition of the sample. First, to date, the HRS survey has added new birth cohorts only once, in 1998, providing at that point a sample representative of the population born in 1947 or before. The remaining biennial surveys only sought interviews with those in a specific age range (e.g., aged 51–61 in 1992; aged 70+ in 1993) and thus are not representative of the entire 51+ age range. Second, proxy respondents are interviewed when selected respondents are unable or unwilling to participate themselves. Third, not all respondents who are unable or unwilling to participate for themselves are represented by a proxy. Some die, some are lost to follow-up, some simply refuse, and for some no proxy is available. Many of these reasons for attrition are related to cognitive impairment; those who are more impaired are more likely to become nonrespondents, and therefore attrition may bias the samples.

Another set of design features refers to the specific cognitive performance measures included in the studies. First, the HRS is a panel study in which the same respondents are re-interviewed in each survey wave. Despite replenishment of newly entering cohorts in later surveys, most respondents had been surveyed before, sometimes repeatedly. A number of studies that have examined individual-level changes in cognitive functioning over time have found strong learning effects associated with repeated cognitive testing (Jacqmin-Gadda, Fabrigoule, Commenges, & Dartigues, 1997; Unger, van Belle, & Heyman, 1999; Zelinski & Burnight, 1997); these effects tend to be most pronounced between the first and second exposure to the cognitive tests (Jacqmin-Gadda et al., 1997). Such learning effects may confound cohort comparisons using the HRS–AHEAD studies. Second, as we have noted before (Herzog & Rodgers, 1999), the changing historic and societal context can affect knowledge of information from real life, confounding comparisons of cognitive performance over time; knowledge of the President's and Vice President's names is a good example, in that it is likely to be influenced by the timing of elections or other political events featuring these individuals vis-à-vis the interview. Third, the cognitive tests were revised slightly after the AHEAD 1993 survey, a change that possibly further confounds the comparisons. Most importantly, the single list of 10 nouns for the free recall test that was used in the 1993 wave was replaced starting in the 1995 wave with four lists (each containing 10 nouns) assigned to respondents in a counterbalanced manner (for more detail, see Ofstedal, McAuley, & Herzog, 2001). Preliminary comparisons of the four new lists in Wave 2 indicated equivalence of list difficulty (Herzog & Rodgers, 1999). However, more recent data (discussed in the following paragraphs) provide some evidence that one of the lists is more difficult than the other three, and also that the list administered at Wave 1 is more difficult than any of those introduced in Wave 2. Fourth, HRS uses a mixed-mode design according to which baseline interviews, and follow-up interviews with those over the age of 80, were mostly conducted face to face, whereas follow-up interviews with younger respondents were done mostly over the telephone. Cognitive performance is possibly related to the mode by which it is measured, and the mixing of modes therefore creates a possible source of confounding of differences and changes in cognitive scores.


    METHODS
 
Data Sets
The data used in this study are from the AHEAD and HRS panel studies. AHEAD is a longitudinal survey of a nationally representative sample of persons who were born in 1923 or before and who were living in the community at the time of the baseline interview in 1993–1994. Interviews were conducted with sampled respondents and their spouses (of any age). To date, respondents have been reinterviewed three times: in 1995–1996, 1998, and 2000. The response rate was 80% at baseline, and follow-up response rates for surviving respondents have reached or exceeded 94% in all subsequent waves. The HRS, which is similar in design and content to AHEAD, began in 1992 as a longitudinal survey of a somewhat younger cohort (those born in 1931–1941). The HRS respondents have also been reinterviewed every 2 years (in 1994, 1996, 1998, and 2000), and response rates have tended to be slightly higher than those for AHEAD. The AHEAD and HRS surveys were combined in 1998 and the sample was supplemented with new birth cohorts so as to be representative of the entire U.S. household population born in 1947 or before (i.e., aged 51 and older in 1998). Data from four waves of interviews were collated for the present analysis: AHEAD 1993 and 1995, and HRS 1998 and 2000. Information was included only about respondents who were at least 70 years old at the time of the specific interview. Further detail on the design of the AHEAD and HRS studies is provided elsewhere (Juster & Suzman, 1995; Soldo, Hurd, Rodgers, & Wallace, 1997).

Cognitive Measures
In this article we focus exclusively on the cognitive performance tests that were administered to self-respondents in all four waves. These cognitive performance measures were chosen for AHEAD and HRS to cover a range of cognitive skills and to be suitable for administration over the telephone. To this end, many of the items were drawn from the Telephone Interview for Cognitive Status (TICS) screen, which is modeled after the widely used Mini-Mental State Exam and has been validated specifically for phone use (Brandt, Spencer, & Folstein, 1988). The individual items may also be mapped onto the cognitive dimensions of fluid intelligence or process (counting backwards), crystallized intelligence or product (naming objects and president), orientation in time (dates), immediate memory (immediate free noun recall), delayed memory (delayed free noun recall), and working memory (serial 7s). For both immediate and delayed free recall tests, the interviewer read a list of 10 nouns, and respondents were asked to repeat as many as they could immediately afterward and again several minutes later in the interview. In the serial 7s subtraction test, respondents were asked to subtract 7 from 100 for a total of five times. Respondents were also asked to name the date (day, month, year, and day of the week), to name cactus and scissors from a brief description, and to identify the U.S. President by last name. Finally, respondents were required to count backward from 20. On the basis of prior factor analyses (Herzog & Wallace, 1997), we retained for the analyses reported in this article the Immediate Word Recall (scores ranging from 0 to 10), the Delayed Word Recall (from 0 to 10), the Serial 7s (scores ranging from 0 to 5), and a composite Mental Status (MS) score, formulated by adding scores on all remaining items, resulting in a composite score ranging from 0 to 9. (We omitted from our analyses an item asking for the name of the U.S. Vice President, because an inspection of the proportion answering this correctly each year strongly suggested that there was a period effect that would have contaminated the cohort effects on which we wanted to focus our attention. Details on the scientific justification for excluding this item are provided in the working version of this article, which is available upon request from W. Rodgers.) We also formed an overall cognitive score that is the sum of the four components. Alpha (reliability) coefficients for the overall score range from .72 to .76 across the four waves.
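The scoring just described can be sketched in a few lines of Python. This is only an illustration: the class and function names are hypothetical, the component ranges are those given above, and the cutoff of 10 is the "low total score" threshold reported in the Results section.

```python
from dataclasses import dataclass

# Maximum score per component, as reported in the text.
MAX_SCORES = {
    "immediate_recall": 10,
    "delayed_recall": 10,
    "serial_7s": 5,
    "mental_status": 9,
}


@dataclass(frozen=True)
class CognitiveScores:
    """Hypothetical container for one respondent's four component scores."""
    immediate_recall: int
    delayed_recall: int
    serial_7s: int
    mental_status: int

    def total(self) -> int:
        """Overall score: the sum of the four components (0 to 34)."""
        for name, maximum in MAX_SCORES.items():
            value = getattr(self, name)
            if not 0 <= value <= maximum:
                raise ValueError(f"{name}={value} is outside 0..{maximum}")
        return sum(getattr(self, name) for name in MAX_SCORES)


def is_low_score(scores: CognitiveScores, cutoff: int = 10) -> bool:
    """True when the total is at or below the cutoff (10 or fewer)."""
    return scores.total() <= cutoff


respondent = CognitiveScores(immediate_recall=5, delayed_recall=3,
                             serial_7s=4, mental_status=8)
print(respondent.total(), is_low_score(respondent))
```

With the Vice President item excluded, the four component maxima sum to 34, which is the upper bound of the overall score under this reading.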

Missing data rates on the cognitive measures are not particularly high: approximately 2% (averaged across waves, among those aged 70 and older at each wave) for the immediate and delayed recall tasks; from 0.2% to 1.9% for the items on the MS component; and almost 8% for the serial 7s test. It is clear, however, that the data are not missing at random. Those persons with poor cognitive ability are more likely to refuse to participate in these tests, as evidenced first by looking at the average scores on the other components for those with and without missing data on any one component (see also Herzog & Wallace, 1997) and second by looking at the average scores at one wave for those with and without missing data at either the preceding or the following wave. Those with any missing data at wave t (t = 1, 2, or 3) but none at wave t + 1 have total cognitive scores 6 or 7 points lower at wave t + 1 than do those with no missing data at wave t, and similarly those with any missing data at wave t (t = 2, 3, or 4) but none at wave t - 1 have total cognitive scores 6 or 7 points lower at wave t - 1 than do those with no missing data at wave t. Moreover, and directly relevant to the purposes of this article, the proportion of missing data is not constant across waves. Across the four waves for the 11 items in the cognitive tests considered in this article, the item missing data rate was 1.4%, but it ranged from a high of 2.5% at Wave 1 to a low of 0.8% at Wave 3.

Because the data are not missing at random, it is important to take account of missing data rather than to delete cases with any missing data. We have done so by using an imputation procedure that took account of both stable and time-varying covariates and of covariation of the cognitive measures both within and across waves. (The imputations were done with IVEware; see Raghunathan, Lepkowski, Van Hoewyk, & Solenberger, 2001. The three-stage procedure that we used is described in the working paper version of this article.)

Analysis Methods
To assess trends in the cognitive test scores, we first examined unadjusted average scores for each wave for the overall cognitive scale and each of the four component scales. We then used multivariate regression techniques to evaluate the extent to which the observed trends could be accounted for by study design features and by differences in the demographic composition of cohorts across waves. We consider these factors extraneous to cohort differences. Finally, we evaluated the hypothesis that cohort differences in educational attainment could explain cohort differences in cognition.

Regression models were estimated by pooling the samples across all four waves and including indicators for the interview wave to test for differences in average score across waves. Four models were estimated: Model 1 includes only the wave indicators (the unadjusted model), Model 2 adds the study design variables, Model 3 adds demographic characteristics of respondents, and Model 4 adds education. Following these primary analyses, we conducted a set of additional analyses to evaluate the potential confounding effects of interview mode, differences in word lists, and selection effects. Results of those refinements are discussed briefly at the end of the article. Because persons in nursing homes and other institutional settings are not represented at Wave 1, respondents in nursing homes at later waves are excluded from these analyses to avoid the introduction of a spurious change component. In all analyses, the data were weighted to take account of both the sample design (which included oversamples of Blacks and Mexican Americans) and poststratification adjustments to Current Population Survey estimates of the proportion of the household population in cells defined by race, gender, marital status, and age ranges. Moreover, all standard errors (and significance levels) were adjusted to account for the complex sample design of the AHEAD–HRS studies and the overlap in cases between waves.


    RESULTS
 
Description of Samples at All Waves
Respondent characteristics that will be treated as covariates of cognitive status are listed in the top panel of Table 1, with means or proportions shown for those with self-interviews at each of the four waves. The average age increased almost 2 years between Waves 1 and 2; this reflects the study design, because in those two waves the target population was restricted to individuals born in 1923 or earlier. At Wave 3, younger birth cohorts were represented, and the average ages at Waves 3 and 4 are approximately the same as those at Wave 1. The differences in proportions with a prior self-interview also reflect the AHEAD–HRS design: In Waves 2 and 4 almost all respondents had been previously interviewed, whereas in Wave 3 new cohorts were sampled and interviewed for the first time. Table 1 shows that there are also differences on other characteristics across waves. Level of education increased monotonically across the four waves. This we interpret as a real difference in the target population, one that reflects a trend toward higher educational attainment in more recent birth cohorts. The proportion of respondents who were female and who reported themselves to be Black or African American did not differ significantly across waves, but the proportion who said that they were Hispanic increased from approximately 3% at Waves 1 and 2 to approximately 4% at Waves 3 and 4.


 
Table 1. Average Values on Covariates and Cognitive Scores for Self-Respondents.

 
Description of Cognitive Scores at All Waves
The mean scores on each of the four cognitive tests at each of the four waves are shown in the second part of Table 1. The number of words recalled both immediately and after a delay increased significantly from Wave 1 to Wave 2, and again from Wave 2 to Wave 3 ( in each case), but the change in average scores on each of these measures between Wave 3 and Wave 4 was small, negative, and not statistically significant. Scores on the MS scale increased significantly every wave. The pattern for the serial 7s scores is, by contrast, nonmonotonic: there is a significant decline from Wave 1 to Wave 2 (), followed by a return to the Wave 1 level at Waves 3 and 4.

The next to last row of Table 1 shows the mean values on the total cognitive score. There are statistically significant increases between Waves 1 and 2 and between Waves 2 and 3 (), but the difference between Waves 3 and 4 is small and not statistically significant. The final row of Table 1 shows that the proportion of respondents with low (10 or fewer) total scores declined monotonically across the four waves, from 6% at Wave 1 to 4% at Wave 4.

Adjustments for Design-Driven Covariates
The observed changes in the cognitive scores presented in Table 1 may be affected by factors related to the design of the study and thus may not accurately reflect true differences in the target population. As noted before, two such factors are of special concern because they are expected to be related to performance on cognitive measures—the changes in the age distribution of the sample at the different waves and the prior experience with the cognitive tests for many of those interviewed after Wave 1. The correlations between age and each of the component cognitive scores are negative, in the range of -.16 (for serial 7s) to -.30 (for delayed word recall), and that between age and the total score is -.33. These correlations are strong enough that the changes in the age distribution across waves (which are statistically significant as shown in Table 1) could contribute substantially to the differences in cognitive scores between waves. The prior experience with cognitive tests is due to the nature of a longitudinal design, which generally builds in a confounding between period (i.e., the dates on which the data are collected) and exposure to the content of the data collection instrument. If all respondents enter the study at the same time and are reinterviewed on the same schedule, then it is impossible to separate the effects of period from those of experience. The design of the HRS is such that prior exposure to the cognitive tests is not completely confounded with period (because, e.g., new birth cohorts were interviewed for the first time at Wave 3), but there is a strong relationship. The implication is that the pattern of scores that we have observed across waves may be biased because of the effects of prior exposure to the tests.

To take account of these two design-driven covariates and thereby obtain more accurate estimates of changes over time in the cognitive abilities of the target population, we used regression analyses in which the dependent variable was one of the cognitive measures and the predictors included (in addition to dummy variables for the wave of data collection) the age of the respondent and a dummy variable indicating whether or not the respondent had done a prior self-interview (and so whether or not the respondent had previously taken the cognitive tests as part of the HRS interview).
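The logic of this adjustment can be illustrated with a toy calculation. The sketch below is not the authors' analysis: the respondents, the score rule, and every number in it are invented, the data are noise-free, and no survey weights or design-based standard errors are used. It only shows how wave differences can appear in an unadjusted model (wave dummies alone) and vanish once age and prior exposure are included, when the true wave effect is zero by construction.

```python
def ols(rows, y):
    """Least-squares coefficients via the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot_row = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot_row] = a[pivot_row], a[col]
        pivot = a[col][col]
        a[col] = [v / pivot for v in a[col]]
        for r in range(k):
            if r != col:
                factor = a[r][col]
                a[r] = [vr - factor * vc for vr, vc in zip(a[r], a[col])]
    return [a[i][k] for i in range(k)]


# Hypothetical respondents: (wave, age, prior_self_interview).
people = []
for age in range(70, 90):
    people.append((1, age, 0))                      # Wave 1: nobody tested before
    people.append((2, age + 2, 1))                  # Wave 2: same panel, 2 years older
    people.append((3, age, 1 if age >= 75 else 0))  # Wave 3: new younger cohorts join


def score(age, prior):
    """Invented score rule with NO true wave effect."""
    return 22.0 - 0.3 * (age - 70) + 1.0 * prior


def wave_dummies(wave):
    """Indicators for Waves 2 and 3 (Wave 1 is the reference)."""
    return [1.0 if wave == 2 else 0.0, 1.0 if wave == 3 else 0.0]


y = [score(age, prior) for _, age, prior in people]

# Unadjusted model: intercept + wave indicators only.
model1 = ols([[1.0] + wave_dummies(w) for w, _, _ in people], y)
# Adjusted model: adds age (years past 70) and prior exposure.
model2 = ols(
    [[1.0] + wave_dummies(w) + [float(a - 70), float(p)] for w, a, p in people], y
)

print("Unadjusted wave coefficients:", [round(b, 3) for b in model1[1:3]])
print("Adjusted wave coefficients:  ", [round(b, 3) for b in model2[1:3]])
print("Age and prior-exposure coefficients:", [round(b, 3) for b in model2[3:5]])
```

Because a Wave 3 sample that mixes new and returning respondents breaks the perfect overlap between wave and prior exposure, the adjusted model is identified; with only Waves 1 and 2 in this toy setup, the prior-exposure indicator would coincide with the Wave 2 indicator and the two effects could not be separated.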

The findings from the regression analyses for the total cognitive score are displayed in Table 2. (Parallel regression analyses predicting each of the four component scores were also run but cannot be included in this article because of space constraints. They are included in the working paper version.) Results from logit regression analyses predicting the probability of a low total score are displayed in Table 3. Model 1 in Table 2 simply repeats the information given in Table 1 in a different format to facilitate comparisons with Model 2, which adds the design-driven covariates. Looking first at the coefficients for the covariates included in Model 2, we see that age has the expected negative relationship to the cognitive score, and the expected positive relationship to the probability of a low total score ( in each case), with an estimated decline of more than .3 points on the total score per year. Prior exposure to the cognitive tests is also strongly related to the cognitive scores: those who have previously been interviewed score over 1 point higher, in total, than those interviewed for the first time, after age is adjusted for.


 
Table 2. Regressions to Total Cognitive Score for Self-Respondents.

 

 
Table 3. Logistic Regressions to Low Cognitive Score.

 
After the design-driven covariates have been accounted for, all of the differences in cognitive scores between waves are estimated to be much smaller than the unadjusted differences, and the adjusted means for Waves 2 and 4 are in fact lower than that for Wave 1. The only improvement that is statistically significant is that for the total score between Waves 2 and 3, and that is followed by a significant decline between Waves 3 and 4. None of the differences between waves in the probability of low total scores remains statistically significant.

Adjusting for Respondent Characteristics
To further understand the observed changes in the cognitive scores across the four waves, we estimated a third regression model to take account of respondent demographic characteristics that are related to the cognitive scores. These covariates include race, ethnicity, and gender (in addition to age, which was already included as a design-driven covariate in Model 2). Real changes in the composition of the elderly population on these characteristics across the approximately 6-year period could affect the aggregate cognitive performance. Taking these characteristics into account does not change the pattern of statistically significant differences between waves, either for the total scores or for the probability of a low score.

Educational attainment represents another respondent characteristic of interest, but in this case the rationale is somewhat different. We hypothesize that educational activities are causally related to cognitive performance. The literature on crystallized intelligence, also termed knowledge component or product (Salthouse, 1999), suggests that formal education plays an important role in developing cognitive performance (Perlmutter & Nyquist, 1990). On the basis of this argument we believe that it is inappropriate to adjust for educational attainment because the growing educational attainment across more recent cohorts is part of the phenomenon of increasing cognitive performance. Of course, the direction of causation between education and cognitive performance may be the reverse: a higher level of cognitive performance enables higher levels of educational attainment; in other words, education is the outcome. On the basis of this argument it would be particularly inappropriate to control education away, we suggest, because we would in part be controlling for the very characteristic that we are trying to explain and thereby we would produce biased estimates of the cognitive cohort trends. Nevertheless, because of the wide interest among researchers in the role of educational attainment in cognitive functioning, we test a model that includes three indicators of educational attainment (Table 2). The first indicator is the number of years of education, 0–17, that the respondent reported completing. The other two indicators are dummy variables for completion of high school and obtaining a college degree, respectively.

These educational variables explain a fairly substantial proportion of the variance in the cognitive scores as evidenced by the increase in R² from Model 3 to Model 4. However, despite the substantial relationship between education and cognitive scores and the substantial increases in educational attainment across the four waves (as shown in Table 1), the coefficients for the wave indicators in the fully controlled model (Model 4) do not differ much from those in the model without the education controls (Model 3). The only statistically significant difference between any pair of waves on the total cognitive score that remains after education as well as the design-driven and demographic characteristics are controlled for is the decline between Waves 3 and 4.

Refinements
Mode of data collection
One characteristic that is not included in any of the models shown in Tables 2 and 3 is the mode of the interview. Those who were interviewed by telephone had substantially higher average scores on all four of the cognitive tests ( in each case). Moreover, the proportion of respondents interviewed face to face varied considerably across the four waves, ranging from a low of 29% at Wave 4 to a high of 49% at Wave 3. However, we do not think it is appropriate to include mode as a covariate when we are trying to understand the trend in cognitive scores. This is because the interview mode may well be the consequence of cognitive status, rather than an influence on cognitive scores, so that a model that included mode would be misspecified and the estimates of the effects of other variables, such as wave, would be biased. By design, the preferred mode of interviewing depends on the age of the respondent, and age is a known predictor of cognitive status, so the inclusion of mode would bias the estimate of the effect of age on cognitive status; but the actual mode of the interview is influenced by characteristics of the respondents, including their physical and cognitive limitations. Across the four waves, 28% of those aged 70–79 were interviewed face to face, and 41% of those aged 80 and older were interviewed by telephone. An experiment conducted as part of the HRS–AHEAD research, in which interview modes were randomly assigned, further supports the conclusion that mode per se is not related to cognitive scores (for details, see Herzog & Rodgers, 1999).

Differences between word lists
There are two important patterns with respect to the adjusted trends in components of the cognitive scores: Scores on the two word recall tasks were consistently higher at Wave 2 than at Wave 1, and scores on the serial 7s task were consistently lower at each of the later waves relative to Wave 1. We have not found a satisfactory explanation for the lower scores on the serial 7s task at the three later waves; we suspect that it may lie in differences in how the interviewers were trained to administer the task at the first wave compared with later waves. We have, however, found a possible explanation of the increase in scores on the immediate and delayed word recall tasks from Wave 1 to later waves: this may lie in the change in the word lists introduced at Wave 2. As already described, at Wave 1 all respondents heard a single list of 10 words, but starting at Wave 2 respondents were randomly assigned to one of four word lists, all four of them different from that used at Wave 1. All four lists show a positive trend across the first three waves: that is, at Wave 2 the respondents recalled more of the words on all four of the lists than respondents at Wave 1, and those who heard a given list at Wave 3 recalled more of those words than did the (different) respondents who heard that list at Wave 2. It appears, however, that the four word lists are not equivalent to one another, and in particular one of the lists appears to be more difficult than the three other lists. This difference between the word lists becomes even more dramatic when the scores are adjusted by using a regression analysis that includes the design-driven covariates and respondent characteristics listed for Model 3 in Table 2. The adjusted scores for those given List 2 at Waves 2–4 are at least slightly lower (and both substantially and significantly lower at Wave 4) than those of all respondents at Wave 1, whereas those given the three other lists had consistently (and sometimes significantly) higher scores at Waves 2–4. These results show the sensitivity of the scores on the word recall tasks to the specific words in the list, and the difficulty of coming up with equivalent lists.

There is no direct evidence concerning the difficulty of the word list used at Wave 1 relative to the four lists used at subsequent waves. We have examined the difficulty of individual words used in these lists as assessed by the proportion of respondents who were able to recall each word in the immediate and delayed recall tasks, and although there is a great variability in difficulty, we did not find any outliers—that is, no words that were so much more difficult or easy than other words that their inclusion in a particular list could explain why that list was easier or harder than other lists. A plausible hypothesis, however, is that the observed increase from Wave 1 to later waves is partly, if not completely, explained by the original word list being more difficult, on the average, than the lists used at subsequent waves.

Selection effects
Although the data analyzed in this article come from a panel study, the specific respondents from whom cognitive scores were obtained are not constant from wave to wave because of features of the HRS–AHEAD design described earlier, such as sample selection, proxy interviewing, moving into a nursing home, and sample attrition related to death, loss to follow-up, and refusal. Together, these factors add up to a substantial turnover in those who were self-interviewed from one wave to the next. Of all the 19,069 self-interviews at Waves 1, 2, and 3, 20% (3,890) did not provide a self-interview in the second wave of any two-wave pair (the "dropouts"). Conversely, of the 18,859 self-interviews at Waves 2, 3, and 4, 20% (3,680) did not provide a self-interview in the first wave of any two-wave pair (the "recruits"). The dropouts and recruits, moreover, are unlike those who provided self-interviews at both waves of any two-wave pair (the "stay-ins"). The dropouts are approximately 3 years older on average than the stay-ins, whereas the recruits are an average of more than 6 years younger than the stay-ins, in large part reflecting the addition of new, younger birth cohorts at Wave 3. Stay-ins are more likely to be female than either recruits or dropouts. Stay-ins also have a higher average income and net worth than either recruits or dropouts. Minority groups (Blacks and those interviewed in Spanish) make up a larger proportion of the dropouts than of the stay-ins, Blacks make up a smaller proportion of the recruits, and those interviewed in Spanish and of non-Black or non-White ethnicity make up a larger proportion of the recruits.
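The turnover figures quoted above can be checked with a few lines of arithmetic. The sketch below uses only the four counts given in the text; the variable names are ours and are introduced purely for illustration. It also checks that the number of stay-ins is the same whether it is derived from the first or the second wave of the pooled two-wave pairs.

```python
# Counts quoted in the text (self-interviews pooled over two-wave pairs).
first_wave_interviews = 19_069   # self-interviews at Waves 1, 2, and 3
dropouts = 3_890                 # no self-interview at the following wave
second_wave_interviews = 18_859  # self-interviews at Waves 2, 3, and 4
recruits = 3_680                 # no self-interview at the preceding wave

# Shares of dropouts and recruits among the pooled self-interviews.
dropout_share = dropouts / first_wave_interviews
recruit_share = recruits / second_wave_interviews

# Stay-ins can be counted from either side of a pair; the counts should agree.
stay_ins_from_first = first_wave_interviews - dropouts
stay_ins_from_second = second_wave_interviews - recruits

print(f"dropout share: {dropout_share:.1%}")
print(f"recruit share: {recruit_share:.1%}")
print(stay_ins_from_first, stay_ins_from_second)
```

Both shares round to the 20% reported in the text, and the two stay-in counts coincide, which is what one would expect if every stay-in interview is counted once as the first and once as the second member of a pair.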

Each of the conditions that result in missing cognitive scores (mortality, inability to participate, moving into a nursing home, and unwillingness to participate) may be related to cognitive status. If the prevalence of each of these conditions were constant across waves, then they could be ignored because they would have no impact on changes in the average observed scores between waves. There is evidence, though, that mortality rates among those over the age of 70 have declined in the U.S. population, and that the proportion of interviews done by proxy informants has increased across the four waves of this study, from 11% at Wave 1 to 15% at Wave 4.
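The reasoning in this paragraph can be written out as a short worked equation. The notation is ours and is only a sketch: we assume that the observed mean score at wave t can be written as the mean that would be observed with no missing scores, \mu_t, plus a bias term b_t arising from the conditions that produce missing scores, and that a constant prevalence of those conditions implies a constant bias.

```latex
\bar{y}^{\,\mathrm{obs}}_{t} = \mu_{t} + b_{t},
\qquad
\bar{y}^{\,\mathrm{obs}}_{t} - \bar{y}^{\,\mathrm{obs}}_{s}
  = (\mu_{t} - \mu_{s}) + (b_{t} - b_{s}).
```

Under these assumptions, if b_t = b_s the second term vanishes and the observed change between waves s and t equals the change in \mu. If mortality or proxy rates shift between waves, b_t and b_s may differ, and the observed change then mixes the two terms.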

To quantitatively assess the success with which Model 3 takes account of the probability of providing cognitive scores (or, conversely, of not doing a self-interview), we estimated Heckman (1976) selection models. This procedure simultaneously models level on a dependent variable and the probability that the level on the dependent variable is missing, and it obtains maximum likelihood estimates of the prediction model by taking the selection model into account. The Heckman procedure was implemented by using Stata (StataCorp, 2001) to predict the total cognitive score, using the design-driven and respondent characteristics included in Model 3, and using the following, partially overlapping set of variables to predict participation at each wave: age, gender, race (Black, White, or other), language in which interview was conducted (English or Spanish), logarithm of total household income and of total net worth, whether the respondent reported having had a stroke, and marital status (married, living with a partner, or other).
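For readers unfamiliar with selection models, the following minimal sketch shows the textbook form of the correction that such a model implies. It is not the estimation procedure used here (which was maximum likelihood in Stata), and it assumes the standard setup in which the errors of the outcome and selection equations are jointly normal. Under that assumption, the expected score among those observed equals the outcome prediction plus rho * sigma * lambda, where lambda is the inverse Mills ratio evaluated at the selection index. All function names and numbers below are hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def inverse_mills(index: float) -> float:
    """Return phi(index) / Phi(index), the inverse Mills ratio."""
    return _STD_NORMAL.pdf(index) / _STD_NORMAL.cdf(index)


def expected_observed_score(outcome_index: float, selection_index: float,
                            rho: float, sigma: float) -> float:
    """Expected score among respondents who are observed (selected).

    outcome_index   : x'beta, the prediction from the outcome equation
    selection_index : z'gamma, the index from the selection (probit) equation
    rho             : correlation between the two equations' error terms
    sigma           : standard deviation of the outcome error
    """
    return outcome_index + rho * sigma * inverse_mills(selection_index)


# Hypothetical values, for illustration only (not estimates from this study).
for rho in (0.0, 0.5):
    print(rho, round(expected_observed_score(20.0, 1.0, rho, 4.0), 3))
```

When rho is zero the correction term vanishes and the observed-sample expectation equals the outcome prediction. In this textbook setup, that is the case in which estimates with and without the selection equation would be expected to agree.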

As shown in Table 4, the coefficients for the regression model, and in particular those for the wave indicator variables, are quite similar whether or not selection is taken into account.


Table 4. Estimate Model 3 for Total Cognitive Score, With and Without Taking Account of Selection Effects.

 

    DISCUSSION
In this article we investigated potential trends in cognitive functioning across consecutive cohorts of Americans aged 70 years old and older. We used the 1993, 1995, 1998, and 2000 waves of the AHEAD and HRS studies. These are powerful studies because they collect cognitive performance measures on large representative samples of America's old population. At the same time, these studies use a complex, multipurpose design that poses considerable challenges to the investigation of cohort differences in cognitive functioning, including a panel design with slightly different definitions of the sample at each wave, across-wave modifications in the measurement of cognitive functioning, a mixed mode of telephone and face-to-face interviews, and various forms and levels of attrition across waves.

Our major finding is that there is little improvement of cognitive functioning across the most recent cohorts of Americans aged 70 years old and older. However, we did not reach this conclusion until we had performed a number of statistical adjustments for the nature of the cognitive measurement and for the changing composition of the cohort samples. Specifically, we removed one particular item from the cognitive battery about naming the Vice President because the item appeared very dependent on the specific political context at each wave and thus was not a good indicator of individual cognitive functioning. Using regression analysis, we then adjusted the so-modified cognitive performance scores for differences between the cohorts that purely reflected features of the HRS–AHEAD design and that might artifactually produce cohort differences: repeated exposure to the cognitive measures necessitated by a panel design, and the changing age composition caused by the sample replenishment in some, but not all, waves. The adjustments drastically reduced the apparent increase in unadjusted scores of cognitive performance across waves, in most cases leaving only nonsignificant differences. In other words, the difference between the first and later waves was largely a function of the fact that respondents faced the cognitive tests for the first time in 1993 but most of them were familiar with the tests in later waves. We then additionally adjusted the modified cognitive scores for the sociodemographic characteristics of gender, race, and ethnicity. These characteristics varied across AHEAD and HRS waves and are known to relate to cognitive functioning; thus they could artifactually produce cohort differences in cognition. Controlling for these characteristics did not change the statistical significance of differences between waves from the pattern observed after controlling only for the design-driven covariates. Our final adjustment was for the respondent's level of educational attainment, which also did not change the significance of any between-wave differences.

We also addressed potential effects of attrition that varied somewhat across waves and of the mixed-mode design. Using the Heckman procedure, we tested the potential biasing effect of attrition and found little or none. On the potential confounding by mode, we argued for not adjusting for it, because the assignment of mode is entirely confounded with respondent characteristics, and including in the regression a mode effect in addition to the respondent characteristics would lead to a misspecified equation.

Our results differ from the results by Freedman and colleagues (2001, 2002). These authors had demonstrated a decline in cognitive impairment between the cohorts captured in the 1993 and the 1998 AHEAD waves. We believe that the differences in findings can be attributed to a number of differences between their studies and ours. First, Freedman used only the first and the third AHEAD waves. According to our analyses, the first wave was somewhat atypical, particularly when no adjustments for previous exposure to the cognitive tests were made. The other three waves were more similar. Second, Freedman did not adjust for previous exposure. Our analyses (and the literature) demonstrate that previous exposure improves performance scores on cognitive tests and that this factor was particularly critical in explaining differences between the first wave, when respondents saw the cognitive measures for the first time, and later waves, when most respondents had seen the cognitive tests before. Third, we modified the total cognitive score to remove a "misbehaving" item, whereas Freedman did not. Fourth, Freedman combined cognition data from self-respondents and proxy respondents and developed a cut-point to dichotomize respondents into those with and without severe cognitive impairment. In our analysis we focused only on self-respondents and used both continuous measures for the total and component scores, as well as a cut-point for the total cognition score that captures both moderate and severe cognitive impairment.

Other aspects of the analyses by Freedman and her colleagues were similar in objectives to ours but utilized different methodologies to address the objectives. For example, Freedman and colleagues were as concerned as we were about the missing answers to some of the cognitive tests, but where they established in a sensitivity-type analysis the range of effects with different assumptions about the nature of the missing answers, we imputed the missing answers by reference to other nonmissing answers. (We also did a type of sensitivity analysis: we redid all of the models shown in Tables 2 and 3 by using only cases that had no missing data on any of the cognitive items. The coefficients were all either not statistically significant or, if significant, indicated decline rather than improvement in cognitive status across the four waves.) Similarly, both sets of authors were concerned about nonrespondents to the entire interview. The analysis by Freedman and associates again used a sensitivity analysis to test the influence of different assumptions about the nature of those not available for an interview. We instead estimated a Heckman-type selection model to evaluate the effect of interview nonresponse. Neither their analysis nor our analysis of item and interview nonresponse changed the respective main findings; Freedman and her colleagues continue to find cohort differences, and we continue not to find them.

Our analyses have their own limitations. First, our observation period is relatively short. We compare cohorts of 70-year-old and older Americans at four time points spanning a 7-year period. The societal changes mentioned at the beginning of this article may require a longer time period to exert their effect on cognitive functioning. Second, and in a related fashion, the cohorts are broadly defined in our investigation and as a consequence overlap to a considerable extent; in other words, the 1923+ cohort in 1993 encompasses still mostly the same birth years as the 1930+ cohort in 2000. This fact makes the finding of pronounced cohort differences between them quite unlikely. Third, in our investigation cohort effects cannot be separated from time effects because we do not track cohorts younger than 70 years of age. Therefore, it is possible that the lack of cognitive improvement observed for old Americans is in fact true for the entire population over the years of 1993, 1995, 1998, and 2000. Fourth, the battery of cognitive tests, although more detailed than is typical for a representative population survey, is limited in its dimensionality and reliability compared with the extensive batteries used by mainstream cognitive psychologists. This is a limitation inherent in many large, multipurpose studies and is best addressed by coordinating smaller intensive studies with larger, less intensive ones, thereby allowing the strengths of both types to be combined. We believe that, with these qualifications in mind, our article provides a piece of the answer to the exceedingly important question about whether cognitive performance has improved across recent cohorts of older Americans.


    Acknowledgments
 
This study was supported by a cooperative agreement (U01-AG09740) between the National Institute on Aging (NIA) and the University of Michigan.

W. Rodgers and M. B. Ofstedal dedicate this article to the memory of their dear colleague and coauthor, A. Regula Herzog, who was the person primarily responsible for the development of the cognitive section of the HRS and AHEAD interviews and who fully participated in all phases of the writing of this article prior to its submission but who died before it could be published.


    Footnotes
 
Decision Editor: Charles F. Longino, Jr., PhD

Received for publication August 19, 2002. Accepted for publication January 29, 2003.


    References



