RESEARCH ARTICLE
1 Age and Cognitive Performance Research Centre, University of Manchester, England.
2 Department of Biomathematics and Statistics, University of Lancaster, England.
3 Department of Psychology, University of Northumbria, England.
Address correspondence to Patrick Rabbitt, Age and Cognitive Performance Research Centre, Zochonis Building, University of Manchester, Manchester, M13 9PL, United Kingdom. E-mail: rabbitt{at}psy.man.ac.uk
| Abstract |
|---|
Many excellent longitudinal studies of cognitive change have had the same basic aims. The most general has been to determine the average form of the trajectory of age-related change and, in particular, whether or not the average rates of change accelerate in old age (e.g., Hertzog & Schaie, 1988; Rabbitt, 1993a; Schaie & Strother, 1968). A corollary aim has been to determine whether rates of change differ between different mental abilities or are similar for all (e.g., Arenberg, 1974; Colsher & Wallace, 1991; Heron & Chown, 1967; Hertzog & Schaie, 1988; Hultsch, Hertzog, Small, McDonald-Miszczak, & Dixon, 1992; Johansson, Zarit, & Berg, 1992; Lansen, 1997; Owens, 1953, 1966; Powell, 1994; Rabbitt, 1993a; Schaie, 1996; Schaie & Labouvie-Vief, 1974; Schaie & Strother, 1968; Schaie & Willis, 1993; Terman & Oden, 1947, 1959). A third aim has been to test how rates of cognitive change are affected by demographic factors such as educational and social advantage (e.g., Bosworth, Schaie, & Willis, 1999; Evans et al., 1993; Forner, 1972), by gender (e.g., Bosworth et al., 1999; Voitenko & Tokar, 1983), by epidemiological factors such as general health (e.g., Bell, Rose, & Damon, 1972; Birren, Butler, Greenhouse, Sokoloff, & Yarrow, 1963; Costa & McCrae, 1980; McInnes & Rabbitt, 1997; Rabbitt, Bent, & McInnes, 1997), by specific pathologies (e.g., Hertzog, Schaie, & Gribbin, 1978), by maintenance of physical mobility and engagement in everyday physical activities (e.g., Clark, 1960; Clement, 1974; Dirken, 1972; McInnes & Rabbitt, 1997), or by genetic factors (e.g., Bank & Jarvik, 1978; Payton et al., 2003; Pendleton et al., 2002; Terman & Oden, 1947, 1959). This raises the general issue of the extent and etiology of individual differences in trajectories of aging. Prima facie, because individuals are affected in different ways and to different extents by their lifestyles, health histories, and genetic factors, we might expect that their trajectories of aging correspondingly diverge so that variance in performance between members of a sample will increase as the members age (Morse, 1993; Rabbitt, 1982, 1993a). It follows that individual differences in rates of change provide more information about the functional determinants of cognitive aging than do average trajectories of decline.
Achievement of these general aims has been frustrated by persistent methodological problems. One issue is that analyses have simply regressed performance data across successive time-assay points. Neglect of longitudinal correlations in the data can lead to incorrect inference. A second issue is that when participants are repeatedly assessed on the same or similar tasks, improvements with practice may lead to underestimates of true rates of change and, in particular, may disguise an age-related acceleration of rate of decline. Further, if participants improve more on some tasks than others, analyses may incorrectly conclude that the particular mental abilities that support these tasks decline at different rates. Useful discussions and some empirical investigations (e.g., Zelinski & Burnight, 1997; Zelinski, Gilewski, & Stewart, 1993; Zelinski & Stewart, 1998) suggest that, because patterns of correlations between scores on different tests remain stable across successive assessments, improvements are similar across tasks and thus do not mimic or mask differences in the rates of decline of different mental abilities. This hope has been vitiated by cross-sectional studies showing that practice effects vary with complex interactions between individuals' overall levels of general fluid mental ability and the particular kinds of tasks on which they are compared. Less able and older individuals show greater initial and overall improvements on easy tasks (Rabbitt, 1993b), but on difficult tasks the more able and younger show much greater immediate and sustained gains (Rabbitt, Banerji, & Szemanski, 1989). Similar interactions among the effects of practice, of individual differences in ability, and of task difficulty in longitudinal data would conceal age-related declines on simple tasks on which less able older individuals show relatively greater improvements and exaggerate apparent declines on difficult tasks on which they show relatively smaller improvements. Unless analyses determine the relative sizes of practice effects both for different tasks and for older and younger and more and less able individuals, they will incorrectly estimate true rates of overall age-related decline and may also misleadingly suggest that performance declines more rapidly on some tasks than on others.
A third, well-documented, but incompletely resolved methodological problem has been that older, frailer, and less able participants drop out of longitudinal studies earlier than the younger, healthier, and more able. Thus, successive data points reflect the performance of a progressively more elite subset of the original sample, and the true extent of cognitive changes is disguised (e.g., Baltes, 1968; Forner, 1972; Lachman, Lachman, & Taylor, 1982; Lindenberger, Singer, & Baltes, 2002; Mason & Mason, 1973; Nesselroade & Baltes, 1979; Palmore, 1978; Schaie, Labouvie, & Barrett, 1973; Schlesselman, 1973a, 1973b; Schulsinger, Knop, & Mednick, 1981).
Parenthetically, a typical practice in longitudinal investigations is not to recruit a single sample of participants who are thereafter followed until the study ends but rather to continue to recruit new waves of participants, at least throughout the early years. The possibility that the cohorts recruited in successive waves may differ from each other both in demographics and in overall levels of ability gives rise to a corollary, and largely unexplored, methodological problem that may be termed the "drop-in effect." Unless analyses take recruitment cohort differences into account, estimates of rates of cognitive decline will be misleading, especially if cohorts differ more on performance of some tasks than of others.
Apart from obvious age differences, participants who withdraw early from longitudinal studies tend to have poor levels of general health, education, and socioeconomic advantage. Men also tend to drop out earlier than women (Rabbitt, Watson, Donlan, Bent, & McInnes, 1994). Such trends can lead to complex misinterpretations. For example, because women tend to perform better than men on some verbal learning tasks (Rabbitt, Donlan, Watson, McInnes, & Bent, 1996; Rabbitt et al., 2002), the rates at which verbal learning declines with age may be underestimated unless gender differences in drop-out are taken into consideration.
Some investigators have estimated the effects of selective drop-out by comparing patterns of differences between age groups observed in initial cross-sectional screenings of a volunteer population against the patterns of age-related changes that become apparent as longitudinal data are accumulated. Because patterns of age-related differences revealed by cross-sectional and longitudinal comparisons seem very similar, investigators have concluded that selective drop-out may not always lead to serious misinterpretations (e.g., Sliwinski & Buschke, 1999; Zelinski & Burnight, 1997; Zelinski et al., 1993; Zelinski & Stewart, 1998). The comparison of cross-sectional against longitudinal trends is a useful exploratory step that can tell us whether drop-out affects the relative amount of change in different cognitive abilities, but it does not reveal the extent to which drop-out has masked the actual amount of change. In particular, such comparisons do not show whether substantial progressive increases in variability between members of an aging population have been masked by selective withdrawal of the oldest and less able. We believe that only longitudinal studies can properly address all of the effects of drop-out on population changes in performance over time. Thus, a main aim of the analyses described here was to model changes over time to take account of drop-out effects.
To consider how this may be done, we find it important to distinguish among three different drop-out scenarios (Lindenberger, 2002; Rubin, 1976). The first is the completely random drop-out: The drop-out process is independent of the measurement process. The second is the random drop-out: The drop-out process is dependent on the observed measurements prior to drop-out but is independent of the measurements that would have been observed had the participant not withdrawn. The third is the informative drop-out: The drop-out process is dependent on the measurements that would have been observed had the participant not dropped out. Not surprisingly, analyses made under the informative drop-out assumption are fraught with difficulty. The results of such analyses typically depend on modeling assumptions that are difficult or impossible to check from observed data. For example, in most observational studies it is extremely difficult even to identify the precise time at which a participant made a decision to drop out. In contrast, analyses under the assumption of completely random drop-out are generally straightforward because no distinction need be made between measurements that are unavailable because of drop-out and those that are unavailable because they were never intended. Put another way, completely random drop-out implies that the incomplete data can simply be treated as if from an unbalanced experimental design, with no commonality to the times at which measurements are made on different subjects.
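The three scenarios can be stated compactly in the notation commonly used for missing-data mechanisms. The symbols here are illustrative assumptions rather than notation taken from this article: D is the occasion at which a participant drops out, Y_obs denotes the measurements recorded before drop-out, and Y_mis denotes the measurements that would have been recorded afterwards.

```latex
\begin{align*}
\text{Completely random drop-out:} \quad
  & P(D \mid Y_{\mathrm{obs}}, Y_{\mathrm{mis}}) = P(D) \\
\text{Random drop-out:} \quad
  & P(D \mid Y_{\mathrm{obs}}, Y_{\mathrm{mis}}) = P(D \mid Y_{\mathrm{obs}}) \\
\text{Informative drop-out:} \quad
  & P(D \mid Y_{\mathrm{obs}}, Y_{\mathrm{mis}}) \ \text{depends on}\ Y_{\mathrm{mis}}
\end{align*}
```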
The simplicity of analysis under the completely random drop-out assumption is bought at a price. If this assumption is invalid, then so may be the resulting inferences about the measurement process. However, if likelihood-based methods of inference are used, validity is retained under the weaker assumption of random drop-out. This is important because longitudinal data are typically correlated over time. This means that even when the true drop-out process is informative, the most recent measurements on a given subject before drop-out are partially predictive of the missing measurements after drop-out. By allowing for the effects of these measurements on drop-out (which is what the random drop-out assumption implies), we can partly compensate for the missing information (see, e.g., Scharfstein, Rotnitzky, & Robins, 1999, and the associated discussion). To illustrate how likelihood-based methods automatically make this kind of compensation, Diggle, Liang, and Zeger (1994, chap. 11) presented a simulated data set from a model in which the mean response is constant over time, but the probability of drop-out for any given subject at any given time is a decreasing function of that subject's most recent measurement. The effect of this random drop-out mechanism is that low-responding subjects progressively drop out, leading to an apparent rising trend in the mean response over time as the observed mean is calculated from the progressively more selective subpopulation of survivors. This rising trend is what would be estimated by a naive regression analysis of the data that ignores both the drop-out process and the longitudinal correlation in the data.
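The mechanism can be reproduced in a few lines of simulation. The sketch below is not the Diggle, Liang, and Zeger data set; it is a minimal illustration under assumed values (a constant true mean of 50, a subject-level effect that makes scores correlated over time, and a hypothetical logistic drop-out rule that depends only on the most recent observed score).

```python
import math
import random
import statistics


def simulate_random_dropout(n_subjects=2000, n_waves=5, seed=0):
    """Return (observed_means, complete_means), one value per wave."""
    rng = random.Random(seed)
    observed = [[] for _ in range(n_waves)]   # scores of participants still in the study
    complete = [[] for _ in range(n_waves)]   # scores of everyone, as if nobody dropped out
    for _ in range(n_subjects):
        # A subject-level effect makes repeated scores correlated over time.
        subject_level = rng.gauss(0.0, 10.0)
        in_study = True
        for wave in range(n_waves):
            # The true mean is 50 at every wave: there is no real trend.
            score = 50.0 + subject_level + rng.gauss(0.0, 5.0)
            complete[wave].append(score)
            if in_study:
                observed[wave].append(score)
                # Assumed rule: drop-out probability falls as the latest observed score rises.
                p_drop = 1.0 / (1.0 + math.exp((score - 45.0) / 5.0))
                if rng.random() < p_drop:
                    in_study = False
    observed_means = [statistics.fmean(scores) for scores in observed]
    complete_means = [statistics.fmean(scores) for scores in complete]
    return observed_means, complete_means


observed_means, complete_means = simulate_random_dropout()
for wave, (obs, full) in enumerate(zip(observed_means, complete_means), start=1):
    print(f"wave {wave}: survivors {obs:5.1f}   everyone {full:5.1f}")
```

The survivors' mean climbs from wave to wave although the mean for the full sample stays flat, which is the spurious trend a naive regression on the observed data would report.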
An important implication is that, for longitudinal data with drop-out, there is no reason why a fitted mean response curve should track the observed mean response trajectory of the survivors. The observed means estimate the mean response conditional on not dropping out before the end of a determined census period. In contrast, a model fitted by likelihood-based methods under the assumption of random drop-out estimates what the mean response would have been if it had been possible to follow up the entire study population. The unconditional and conditional means coincide only if the data are uncorrelated in time, or if the drop-out process is completely random. We would argue that neither of these assumptions is plausible for most data from longitudinal studies of cognitive aging. We therefore conclude that a likelihood-based analysis under a random drop-out assumption is a sensible analytic strategy because it focuses on the complete study population, rather than on the progressively self-selected subpopulation of subjects who do not drop out of the study. We used this analysis to examine the data set described in the paragraphs that follow.
The present study was made to investigate the extent to which improvements associated with repeated testing, together with selective drop-out and drop-in effects, have obscured answers to the basic questions about the nature of age-related cognitive changes. The analyses also addressed substantive hypotheses on the true relationships that would be revealed when practice effects have been identified and drop-out has been taken into consideration.
First, practice effects would be found to be substantial and large enough to disguise the true rates and forms of trajectories of longitudinal changes. Second, the sizes of practice effects would be found to differ both between different kinds of tasks and also between more and less intellectually able, and so implicitly younger and older, participants (as has been found in brief, cross-sectional laboratory studies by Rabbitt, 1993b, and Rabbitt et al., 1989). Third, when the true forms of trajectories of age-related changes can be established, rates of cognitive change will be found to accelerate with age and to differ with task demands. It will also be possible to more accurately determine the extents to which individual differences in rates of change vary with demographic variables such as gender and socioeconomic status and with individual differences in level of general fluid mental ability (gf). Fourth, after the effects of demographics, gender, and individual differences in ability have been taken into consideration, variance in cognitive performance between participants will be found to increase significantly as the sample ages.
To examine the effects of practice, we used a drop-out and drop-in random effects model to analyze data from successive presentations of the same battery of six different tasks during a 17-year longitudinal study of 5,899 healthy, community resident, older people. The model allowed us to determine the true sizes of practice effects, the extent to which practice effects differ between tasks, and the extent to which practice effects and Task × Practice interactions differed between individuals of different ages and levels of gf.
| METHODS |
|---|
Procedure
Volunteers traveled independently to laboratories in Manchester or Newcastle-upon-Tyne, where they were tested in groups of 10 to 15. Sessions were conducted in large quiet rooms by two experimenters who checked that participants with visual or auditory problems had brought their prescribed prostheses and were not inconvenienced. Sessions lasted, on average, for 90 min with 15-min tea and coffee breaks. Volunteers each received £5 (U.K.) for each session to cover their travel expenses. The tests were administered over two successive sessions within a period of 8 weeks.
Cognitive Tests
During the first testing session, volunteers completed the Heim (1970) AH4-1 and AH4-2 intelligence tests, and the Raven (1965) Mill Hill "A" and "B" (MHA and MHB) vocabulary tests. During the second session, they completed a cumulative verbal learning (CVL) task and a verbal free recall (VFR) task.
Both AH4 tests are well-standardized measures of gf and correlate strongly (R = .7 to .8) with instruments such as the Wechsler Adult Intelligence Scale battery (see Heim, 1970). In each test, volunteers are given untimed and unscored practice on 5 demonstration problems with provided solutions and then have 10 min to solve as many as possible out of a total of 65 problems. AH4-1 problems include equal numbers of logical reasoning tests, verbal comparisons, and arithmetic and number series problems. AH4-2 problems are nonverbal, involving addition and subtraction of complex shapes, completion of logical series of shapes, and matching by mental rotation of irregular shapes. For each AH4 test the scores analyzed are the numbers of problems correctly solved in 10 min.
The MHA recognition vocabulary test comprises a list of 33 different rare words, each accompanied by a set of 6 different words from which participants must select the most appropriate synonym. The MHB production vocabulary test comprises a list of 33 different rare words for each of which participants provide as accurate and precise a definition as possible (Raven, 1965). Both MH tests are untimed. Scores analyzed consist of the number of correct synonyms identified or definitions given.
In the CVL test, a list of 15 different three-syllable words matched for concreteness and for frequency (1/10,000, Kucera & Francis, 1967, norms) is projected, one item at a time, by a Kodak "Carousel" projector at a rate of 1.5/s. Words appear in Times Roman print, boldface, as a string that is 150 cm long and 15 cm high on a projection screen that is no further than 5 m from any member of the group tested. The room is blacked out to maximize visibility, and participants' results are not recorded unless the participants remembered to bring prescribed spectacles and have no difficulty reading the displays. After the first presentation of the 15 words, participants write down as many as they can recall, in any order they wish, on the first page of an answer booklet. They then turn to a new page, the 15 words are presented again in a new random order, and recall is again attempted. Scores analyzed are the total numbers of words correctly recalled over four such presentations.
In the VFR test, 30 words, selected and matched as for the CVL task, are presented once, in random order, at a rate of 1 item/1.5 s. Volunteers then immediately write down as many as they can recall in any order they wish. Scores analyzed are the total numbers of words correctly recalled.
Description of Test Data and Exploratory Analysis
Because the maximum possible scores on the tests range from 30 to 65, each participant's raw scores were converted to a percentage correct to provide a common measurement scale. Mean scores at entry, broken down by the selected demographic groups, are shown in Table 3.
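The conversion to a common scale can be sketched as follows. The maximum scores are read from the test descriptions above; the value of 60 for CVL (15 words over four presentations) is an inference from that description rather than a figure stated in the text, and the function name is hypothetical.

```python
# Maximum possible raw score per test, read from the Cognitive Tests section.
# CVL = 15 words x 4 presentations = 60 is an inference, not a stated figure.
MAX_SCORE = {
    "AH4-1": 65,
    "AH4-2": 65,
    "MHA": 33,
    "MHB": 33,
    "CVL": 60,
    "VFR": 30,
}


def percent_correct(test, raw_score):
    """Convert a raw score to percentage correct for the named test."""
    maximum = MAX_SCORE[test]
    if not 0 <= raw_score <= maximum:
        raise ValueError(f"{test} raw score must lie between 0 and {maximum}")
    return 100.0 * raw_score / maximum


print(percent_correct("AH4-1", 39))  # 39 of 65 problems solved
print(percent_correct("VFR", 12))    # 12 of 30 words recalled
```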
Our substantive hypotheses concern claims that, first, particular elements of β are nonzero, reflecting real, group-level effects on test outcomes; second, particular terms in the variance matrix of the random effects are nonzero, reflecting dependencies between the different outcomes achieved by a given subject on different tests or between a given subject's general level of achievement and his or her rate of change with age (in both cases, measured relative to the subject's peer group as defined by the explanatory variables).
The normalized score corresponding to each Yijk is Yijk*, where
[Equation 2]
Combining Equations 1 and 2 leads to
[Equation 3]
Comparing Equations 1 and 3 with respect to the random effect terms A and B, we see that they are essentially identical. Specifically, if the Aij and Bij are multivariate normally distributed, then so are Aij* and Bij* and vice versa. Further, because the transformations from the variants with and without asterisks are made componentwise, then any two random effects that are uncorrelated before transformation remain so after transformation and vice versa. In particular, a test for whether any two "without asterisk" random effects are or are not correlated is equivalent to a test that the corresponding "with asterisk" random effects are or are not correlated.
With respect to the regression parameters, the important difference between Equations 1 and 3 is that the β* parameter has acquired a subscript i. This implies that the substantive meaning of a hypothesis involving interactions between explanatory variables and cognitive tests is indeed different on the raw and standardized scales (e.g., a hypothesis that the average effect of a 1-year increase in age is numerically the same across all cognitive tests). However, the substantive meaning of a hypothesis concerning main effects is unchanged (e.g., a hypothesis that an increase in age does or does not affect the average response to a particular cognitive test).
The overall conclusion is that although the numerical values of estimated parameters would be affected by a change from raw to standardized scores, the substantive conclusions sought from the analysis and claimed in this article are not.
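The invariance argument can be made concrete with a sketch under assumed forms. Suppose, purely for illustration, that Equation 1 is a linear mixed model for the score on test i of subject j at occasion k, with a random intercept and a random slope on time, and that the normalization subtracts a test-specific constant and divides by a test-specific positive constant. The symbols m_i, s_i, t_ijk, and Z_ijk, and the assumption that the first element of x_ijk is 1, are introduced here and are not confirmed by the text.

```latex
\begin{align*}
Y_{ijk} &= x_{ijk}^{\top}\beta + A_{ij} + B_{ij}\,t_{ijk} + Z_{ijk}
  && \text{(assumed form of Equation 1)} \\
Y_{ijk}^{*} &= \frac{Y_{ijk} - m_i}{s_i}, \qquad s_i > 0
  && \text{(assumed form of Equation 2)} \\
Y_{ijk}^{*} &= x_{ijk}^{\top}\beta_i^{*} + A_{ij}^{*} + B_{ij}^{*}\,t_{ijk} + Z_{ijk}^{*}
  && \text{(assumed form of Equation 3)}
\end{align*}
where
\begin{align*}
\beta_i^{*} &= \frac{\beta}{s_i} - \frac{m_i}{s_i}\,e_1, \qquad e_1 = (1, 0, \dots, 0)^{\top}, \\
A_{ij}^{*} &= \frac{A_{ij}}{s_i}, \qquad
B_{ij}^{*} = \frac{B_{ij}}{s_i}, \qquad
Z_{ijk}^{*} = \frac{Z_{ijk}}{s_i}, \\
\operatorname{Corr}\bigl(A_{ij}^{*}, B_{ij}^{*}\bigr)
  &= \frac{\operatorname{Cov}(A_{ij}, B_{ij})/s_i^{2}}
          {\sqrt{\operatorname{Var}(A_{ij})/s_i^{2}}\,\sqrt{\operatorname{Var}(B_{ij})/s_i^{2}}}
   = \operatorname{Corr}\bigl(A_{ij}, B_{ij}\bigr).
\end{align*}
```

Under these assumptions the rescaling is linear and componentwise, so normality and correlations of the random effects carry over unchanged, while the regression coefficients become test specific through the divisor s_i.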
Model specification
The first methodological issue to which we have drawn attention is the need to take account of the fact that measurements taken over time on the same individual tend to be correlated. There are several approaches to model this correlation structure, and the choice depends on the main scientific questions of interest. Our present aims are (a) to identify factors predictive of cognitive decline at the population level and (b) to gain insight into individual differences relative to the population levels. This leads us naturally to the random effects model, which has two parts: a model for the average response over time for respondents with given values of all explanatory variables, and a model for the random variation about the mean response. For the second component, we postulate a set of latent variables, or "random effects," which represent deviations of individual respondents from the population average for some relevant features. This random variation occurs in addition to the residual variation.
To address the substantive hypotheses set out in the introduction, we need to describe for each cognitive test how the population-average scores depend on the following set of explanatory variables: age, gender, socioeconomic status (SES), city of origin (Manchester or Newcastle), wave of recruitment to the study (cohort), level of general intellectual ability (gf) as indexed by AH4 test scores, and whether tests are being taken for the first, second, third, or fourth time (practice effects). Accordingly, the analysis was made to address the combined effects of age, gender, SES, and practice effects, which are of substantive interest, whereas effects distinguishing cities and year-of-entry cohorts are included as a means of adjusting for unidentified confounding factors. The effects of differences in level of intellectual ability (gf) are then also explored.
We now introduce the following notation. For a given response, let Yij denote the percent correct score for the ith subject on the jth occasion. Hence i = 1, ..., n (the number of subjects with at least one response measure) and j = 1, ..., 4. We use xijk, where k = 1, ..., p, to denote the values of the set of p explanatory variables associated with each Yij. In particular, let xij1 denote age (in years over 49, the minimum entry age) and xij2 denote age squared. Improvement at the three repeat test occasions (j ≥ 2) is modeled as a series of step functions. Then the mean value of Yij is µij defined by
[Equation]
Further random effects (such as quadratic age terms or practice effects) are not considered. As described in the paragraphs that follow, age appears to have little or no effect on the MH tests. For this pair of tasks, we therefore omitted the subject-specific random slope effects Bi from the model. We used maximum likelihood, using the mle( ) function within the Splus software environment, for model estimation under the assumption that drop-out was random (i.e., the probability that a respondent drops out may depend on his or her observed measurement history, but not on the unobserved responses).
We fitted separate models to each response, adopting the same method for selecting the mean structure.
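The age and practice part of the mean structure can be sketched as a row of explanatory variables. This is one reading of the description above, not the authors' code: practice is coded as cumulative step indicators for occasions 2, 3, and 4, the other covariates (gender, SES, city, entry year) are omitted for brevity, and the coefficient values are hypothetical numbers chosen only to exercise the functions.

```python
MIN_ENTRY_AGE = 49  # age is measured in years over 49, the minimum entry age


def mean_design_row(age, occasion):
    """Return [intercept, age over 49, age squared, step j>=2, step j>=3, step j>=4]."""
    if occasion not in (1, 2, 3, 4):
        raise ValueError("occasion must be 1, 2, 3, or 4")
    years_over_49 = float(age - MIN_ENTRY_AGE)
    # Each step switches on at its occasion and stays on afterwards,
    # so its coefficient is the increase between successive occasions.
    steps = [1.0 if occasion >= j else 0.0 for j in (2, 3, 4)]
    return [1.0, years_over_49, years_over_49 ** 2] + steps


def mean_score(row, coefficients):
    """Linear predictor: sum of coefficient times explanatory variable."""
    return sum(x * b for x, b in zip(row, coefficients))


# Hypothetical coefficients, not estimates from the article.
example_coefficients = [60.0, 0.0, -0.01, 4.0, 1.0, 1.0]

row = mean_design_row(age=69, occasion=2)
print(row)
print(mean_score(row, example_coefficients))
```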
| RESULTS |
|---|
|
|
|---|
Selection of the Mean Structure
Based on the empirical evidence of a declining age effect for the AH4-1, AH4-2, CVL, and the VFR tests, the following steps are used to derive a mean model for these responses.
Assuming that the relationship between cognitive decline and age is captured adequately by a linear and a quadratic term, we adopt the following step-down approach to test whether there is any evidence that the overall age trajectory differs with SES, gender, or practice. Based on participants from Newcastle who, overall, were followed up for longer than those from Manchester, a sequence of models is fitted. Under the assumption that the improvement at the second data-collection point adequately captures the Practice × Age interaction, we fit models allowing the quadratic curve to depend on any combination of SES, gender, and improvement. We then fit simpler interaction models, retaining the simpler model at each step if the generalized likelihood ratio test (LRT) for the additional terms is nonsignificant at the conventional 5% level. We do not separately test for an interaction with the linear or quadratic component of age. Hence, a test comprises 2(m - 1) degrees of freedom, where m is the number of levels of the factor in question.
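A single step of this procedure can be sketched as below. The function name and the log-likelihood values are hypothetical. Because the degrees of freedom, 2(m - 1), are always even, the chi-square tail probability has a closed form and no statistics library is needed; the usual large-sample chi-square approximation for the LRT is assumed.

```python
import math


def lrt_p_value(loglik_full, loglik_reduced, n_levels):
    """Likelihood ratio test for dropping a factor-by-age (linear and quadratic) interaction.

    Returns (statistic, degrees of freedom, p value).
    """
    if n_levels < 2:
        raise ValueError("a factor needs at least two levels")
    statistic = 2.0 * (loglik_full - loglik_reduced)
    if statistic < 0:
        raise ValueError("the fuller model should fit at least as well")
    df = 2 * (n_levels - 1)
    half = statistic / 2.0
    # Chi-square upper tail for even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
    p_value = math.exp(-half) * sum(
        half ** i / math.factorial(i) for i in range(df // 2)
    )
    return statistic, df, p_value


# Hypothetical maximized log-likelihoods for a two-level factor such as gender.
statistic, df, p_value = lrt_p_value(-1000.0, -1002.0, n_levels=2)
keep_simpler_model = p_value >= 0.05
print(statistic, df, round(p_value, 4), keep_simpler_model)
```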
There are no significant interactions with age for AH4-1 or VFR, whereas there is a significant single interaction for Improvements at Visit 2 × Age for AH4-2, and for Gender × Age for CVL. Repeating these steps but considering only main effect practice terms results in the same "final" models for AH4-1, CVL, and VFR, whereas for AH4-2 the age quadratic now depends on the Gender × SES combination. Although they are statistically significant, all these interactions have very small effects that we considered unimportant in substantive terms. We also noted that the covariance matrix estimates for the random effects were robust to the choice of mean model (i.e., main effects only or with age interactions). For these reasons we proceed to fit models without interactions to the combined data set, including a term for city of residence, and test the quadratic age trend against the linear term for these four responses. In all cases this was highly significant (p < .0001, 2 df).
For the vocabulary tests the linear age term based on the complete Manchester and Newcastle data set was statistically significant for MHB but not MHA scores, with estimated mean changes of -0.06 and -0.03 percentage points per annum, respectively. Although these trends are clearly too slight to be of substantive importance, predicting declines of only 0.6 and 0.3 percentage points, respectively, over 10 years, the linear age term was retained in order to better estimate the covariance structure.
Before describing the mean estimates for each response pair and comparing the parameter estimates of substantive interest, namely the age trends and practice components across tests, we now give a brief explanation of the parameter estimates. In all models the first level of each factor is the reference group. Thus, the intercept parameter represents the percentage score for a respondent who is in socioeconomic category (C) 1, female, aged 49, resident in Manchester, and taking the test for the first time in 1983. The improvement parameters measure the average step increase between successive testing occasions (j ≥ 2). The effects for socioeconomic categories (Cs) from 2 to 4-5 represent the estimated difference in mean percent correct scores between socioeconomic groups 2 to 4-5 and socioeconomic group 1. The entry year values represent the difference in scores between entry years 1984 to 1992 and 1983, and the city term gives the mean difference between Newcastle and Manchester participants. This allows us to test the substantive working hypotheses that, when practice and drop-out have been taken into consideration, tests will show accelerated age-related declines, and that rates of decline will be seen to differ between tests, and also between individuals of higher and lower gf.
AH4-1 and AH4-2 Scores
Tables 4 and 5 summarize the estimated mean effects for AH4-1 and AH4-2, respectively. The AH4-1 intercept of 66.0 is 4.9 points higher than the AH4-2 intercept. The fact that both quadratic coefficients are negative indicates that scores on both these tests show accelerated decline with age. The rates of decline are very similar. For example, consider the entry scores: On AH4-1, the average score for a 60-year-old would be 64.6, and for a 70-year-old, 58.6 (a 6-point drop), falling to 46.5 (a further 12-point fall) for an 80-year-old. On the AH4-2 task, the corresponding scores would be 58.2, 52.1 (6-point drop), and 38.6 (a further 13-point drop). This supports the first working hypothesis that, after practice and drop-out are taken into consideration, at least on tests of gf, the rates of decline are seen to accelerate with increasing age.
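The acceleration can be checked directly from the predicted entry scores quoted above. The short sketch below uses only those six quoted values and compares the drop over each successive decade; the function name is hypothetical.

```python
# Predicted entry scores quoted in the text, keyed by age.
entry_scores = {
    "AH4-1": {60: 64.6, 70: 58.6, 80: 46.5},
    "AH4-2": {60: 58.2, 70: 52.1, 80: 38.6},
}


def decade_drops(scores_by_age):
    """Return the fall in score across each successive pair of ages."""
    ages = sorted(scores_by_age)
    return [
        round(scores_by_age[younger] - scores_by_age[older], 1)
        for younger, older in zip(ages, ages[1:])
    ]


for test, scores in entry_scores.items():
    first, second = decade_drops(scores)
    print(f"{test}: {first} points from 60 to 70, {second} points from 70 to 80")
```

For both tests the second decade's drop is about twice the first, which is the pattern expected from a negative quadratic age coefficient. The second AH4-2 difference computes to 13.5 from the quoted values, which the text reports as a 13-point drop.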
|
|
On average, men performed significantly better than women on both of the AH4 tests. Implications of this male advantage are discussed in the paragraphs that follow. Tables 4 and 5 show that, after the other covariates in the model are adjusted for, scores on both AH4 tests vary markedly with SES category. Participants in C1 score at least 20 points more, on average, than participants in C4-5. The overall effects of SES are as expected. Another demographic factor, city of residence, also affects AH4 test scores. On the AH4-1, Manchester residents score higher than Newcastle residents, but on the AH4-2, this city effect is not significant.
There are also significant differences between waves of recruitment; that is, there are significant drop-in effects. Individuals who entered the study in 1989 and 1990 have markedly higher scores on both AH4 tests than those who entered in the first year of the study, 1983. Note that a principal aim of the analysis is to examine the hypothesis that rates of change over time are affected both by demographic factors and by overall levels of gf, and this cannot be properly addressed if this drop-in effect is neglected.
Cumulative Verbal Learning and Verbal Free Recall Scores
Tables 6 and 7 summarize these results for CVL and VFR, respectively. Note that the intercept estimate of 75.6 for CVL is more than double the 35.5 point score on VFR, which is the lowest score overall.
|
|
On both the CVL and VFR tests, women performed significantly better than men. On the CVL and VFR tests, the differences in trends for SES Cs are much less marked than on the AH4 tests. Again, the largest difference is between C1 and C4-5 (10.9 and 6.7 percentage points for CVL and VFR, respectively). On these tests, as on all others, average scores are higher for Manchester than for Newcastle residents, but in this case these differences are not significant. Interestingly, the mean CVL scores are from 3.9 to 8.3 points higher for participants who entered the study from 1984 onward, compared with those entering in 1983. VFR scores show a similar but less marked recruitment wave, or drop-in, effect.
Mill Hill A and Mill Hill B Scores
Tables 8 and 9 show the results for MHA and MHB, respectively. The intercept estimate for the MHA test is 80.6, which is higher than for any other test and is 15.4 points higher than the intercept estimate of 65.1 for the MHB test.
|
|
Tables 8 and 9 show negligible positive or negative changes of 1.5 or less on MHA at all repeat testings, and on MHB for the first and second repeat testings. The estimated improvement of 9 percentage points on MHB on the last testing occasion is an unexplained anomaly. Scores on both the MHA and MHB tests are significantly higher for men than for women, and for Manchester residents than for Newcastle residents. The differences between SES Cs were substantial, and they were comparable in magnitude with the effect sizes for AH4. Note that, in contrast to all other tests, on MHA the average entry year scores for cohorts recruited from 1984 onward were lower, though not significantly so, than for those starting in 1983. There was no clear pattern for MHB. This illustration that successive recruitment cohorts may differ on some tests though not on others emphasizes that drop-in effects may be complex and must be taken into consideration when one analyzes longitudinal data.
Comparisons of Practice Effects Between Tasks
For AH4-1, AH4-2, and CVL, rates of decline are similar and accelerate with age. For VFR, the rate of decline is linear, and MH scores remain stable over time. The size of the average practice effect on the second occasion varies between tasks: Gains of over 4.5% points on the AH4 tests contrast with gains of 1% or less on the other tasks. On the third and fourth occasion, a gain of over 4.5 for AH4-2 and CVL contrasts with negative estimates for several other tasks. However, note that these differences may reflect a lack of fit between the quadratic and improvement parameters at older ages for which relatively few data points are available.
A further question about practice effects is whether their sizes vary with the interval between successive repetitions of the tasks. The fact that some individuals missed particular retesting sessions but later returned to the study allowed us to make a secondary analysis to determine whether the substantial practice effect of over 4.5 points on the second occasion of taking the AH4-1 and AH4-2 tests varies with the duration of the interval between initial and second experiences. Restricting the analysis to individuals with scores at entry and at the second scheduled visit for each test, we found that the interval between these two time points ranged from 1 to 8 years. Because of small numbers in the lowest and highest categories, the categories we used were ≤2, 3, 4, 5, 6, or 7+ years. We fitted a model to each response, replacing the single-step function at the second occasion by a six-level term corresponding to the gap times. For the AH4-1 response, the mean practice effects are similar across the intervals, ranging from 3.5 to 5.2 points, with no clear trend over time. The estimates are slightly more variable for AH4-2, ranging from 3.8 to 7.9. For both responses, this model provides an improvement in fit compared with the simpler model with practice at the second visit coded as a two-level factor (p = .006 and p < .0001, 5 df). That is to say, the average sizes of improvement caused by a previous experience of either AH4 test remained the same over intervals of 2 to 7 years.
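The six-level gap term described above can be sketched as a simple recoding of the interval between the first and second testing occasions. This is only an illustration: the function name `gap_category` and the string labels are hypothetical, and the sketch assumes that the lowest category pools intervals of 1 and 2 years and the highest pools 7 and 8 years, as the observed range of 1 to 8 years suggests.

```python
def gap_category(years_between_tests):
    """Map the interval (whole years) between first and second testing to one of six levels."""
    if years_between_tests < 1:
        raise ValueError("interval must be at least 1 year")
    if years_between_tests <= 2:
        return "<=2"   # assumed pooling of the sparse 1- and 2-year intervals
    if years_between_tests >= 7:
        return "7+"    # assumed pooling of the sparse 7- and 8-year intervals
    return str(years_between_tests)

# The observed intervals ranged from 1 to 8 years; recoding yields six levels.
levels = sorted({gap_category(y) for y in range(1, 9)})
print(len(levels), levels)
```

A factor coded this way replaces the single second-occasion indicator, so the model estimates one practice effect per gap level rather than one overall.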
Deviation Around the Mean Response: Random Effects
The final hypotheses examined are that participants' rates of cognitive decline vary with their overall levels of gf, and that variance in performance between participants increases as the study continues, and the mean age of the sample increases.
In these models, the $A_i$ reflect the extent to which individuals deviate from the average response value, and the $B_i$ measure their deviations in slope, that is, in rates of decline. The maximum likelihood estimates (and 95% confidence intervals, or CIs) of the standard deviations ($\sigma_A$ and $\sigma_B$) and correlation ($\rho_{AB}$) of these random effects, assumed to be normally distributed with mean zero, are presented in the table notes of Tables 4 through 9. After adjustment for covariates, the individual estimates of the $A_i$ and $B_i$ are of interest as they can be used to predict intercepts and slopes of individual trajectories of change. They are usually estimated as the conditional expectation of the effects given the observed data, and they are sometimes termed empirical Bayes (EB) estimates. In addition, histograms and scatterplots can be used to detect unusual individuals.
From the values of the $\hat{A}_i$ and $\hat{B}_i$ estimates for individuals based on the AH4-1 model, we calculate sample standard deviation estimates of 12.84 and 0.15. These are considerably smaller than the maximum likelihood estimates (MLEs) of 15.43 and 0.41. The estimated sample correlation is weakly negative (-0.29), whereas the MLE is strongly negative ($\rho_{AB} = -0.51$, 95% CI of -0.58 to -0.44). Discrepancies of similar magnitudes are observed for the other responses. This suggests that the empirically observed estimates do indeed substantially underestimate the true variability in the random effects. In fact, for any linear combination of the random effects, the variability of the EB estimates is less than or equal to the true variability in the random effects (Verbeck & Molenberghs, 1997, chap. 3). This latter result provides theoretical support for our findings.
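The shrinkage of EB estimates can be illustrated with a deliberately simplified case. The sketch below assumes a balanced random-intercept-only model with known variances, which is not the model fitted in this article; the values of `sigma_a`, `sigma_e`, `n_obs`, and `n_people` are hypothetical and chosen only for illustration. In that setting the EB estimate is the person's mean score multiplied by a shrinkage weight, and its standard deviation is the true one multiplied by the square root of that weight, so it can never exceed the true value.

```python
import random
import statistics

def eb_shrinkage_demo(sigma_a=15.0, sigma_e=8.0, n_obs=3, n_people=5000, seed=1):
    """Simulate y_ij = a_i + e_ij and return (SD of EB estimates, theoretical SD)."""
    rng = random.Random(seed)
    # Shrinkage weight for a balanced random-intercept model with known variances.
    weight = sigma_a**2 / (sigma_a**2 + sigma_e**2 / n_obs)
    eb_estimates = []
    for _ in range(n_people):
        a_i = rng.gauss(0.0, sigma_a)  # true random effect for this person
        scores = [a_i + rng.gauss(0.0, sigma_e) for _ in range(n_obs)]
        # Conditional expectation of a_i given the person's scores.
        eb_estimates.append(weight * statistics.fmean(scores))
    # Var(EB) = weight^2 * (sigma_a^2 + sigma_e^2 / n_obs) = weight * sigma_a^2.
    theoretical_sd = weight**0.5 * sigma_a
    return statistics.stdev(eb_estimates), theoretical_sd

sample_sd, theoretical_sd = eb_shrinkage_demo()
print(f"SD of EB estimates: {sample_sd:.2f}, theory: {theoretical_sd:.2f}, true SD: 15.00")
```

The fewer or noisier the observations per person, the smaller the weight and the stronger the shrinkage. This is consistent with sample standard deviations of EB estimates falling below the maximum likelihood estimates.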
Within this model, the issue of whether individuals' trajectories of cognitive decline vary with their basal levels of mental ability (AH4 test scores) can be approached only if we make a strong assumption that there is a particular age before which decline proceeds at a constant rate. Given this assumption, we see that the estimates of $\rho_{AB}$ from the models have substantive value. Assuming that this critical age is the lowest entry age in the sample, 49 years, we find that, among individuals who were aged 49 at entry, those with higher initial AH4-1 scores tended to show relatively more rapid cognitive decline on AH4-1 than did those with lower initial scores. As shown in Tables 4 through 7, the correlation estimates for AH4-1, AH4-2, CVL, and VFR are all significantly negative.
Nevertheless, it is important to note that these correlations are arbitrary and depend on the age used for "centering." For example, if age 65 is used instead of age 49, the correlation is approximately zero for the AH4-1 scores, indicating parity in rates of change for individuals at all levels of AH4-1 scores. They become increasingly positive as centering ages older than 65 are selected, indicating faster rates of decline for individuals with lower AH4-1 scores.
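The dependence of the intercept-slope correlation on the centering age can be sketched numerically. The code below assumes a simplified model with only a random intercept and a random linear slope in age, so that re-centering at a later age replaces the intercept by the intercept plus the age shift times the slope. It plugs in the AH4-1 maximum likelihood estimates reported above (15.43, 0.41, and -0.51); the function name and the chosen ages are illustrative. Because the inputs are rounded and the fitted models are richer than this sketch, the output need not reproduce the article's figures exactly.

```python
import math

def corr_at_centering_age(center_age, sigma_a=15.43, sigma_b=0.41,
                          rho_ab=-0.51, base_age=49):
    """Correlation between the slope B_i and the intercept re-expressed at center_age."""
    shift = center_age - base_age
    cov_ab = rho_ab * sigma_a * sigma_b
    # The intercept at the new centering age is A_i + shift * B_i.
    cov_new = cov_ab + shift * sigma_b**2
    var_new = sigma_a**2 + 2 * shift * cov_ab + shift**2 * sigma_b**2
    return cov_new / (math.sqrt(var_new) * sigma_b)

for age in (49, 65, 75, 85):
    print(age, round(corr_at_centering_age(age), 2))
```

Under these assumptions the correlation is negative at age 49, close to zero in the mid-to-late 60s, and increasingly positive at older centering ages, which is the qualitative pattern described in the text.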
The question of whether variability between participants increases with sample age can be addressed in a similar way. For each response, the $\sigma_A$ standard deviation estimate can be used to calculate the 95% expected range (the range over which 95% of the population values would fall) for the intercept. For example, the 95% expected ranges for the AH4-1 and AH4-2 mean entry levels are 66.0 ± 1.96 × 15.4 = [35.8–96.3] and [32.5–89.8], respectively. The expected ranges for the other tests are smaller because the $\sigma_A$ estimates are smaller. We can also examine the relative variability of the $A_i$ with respect to the corresponding fixed effect estimate. Based on the parameter estimates presented in Table 4, the relative variability is 15.4/66.0 = 23% for AH4-1. Estimates of similar size were obtained for AH4-2 (24%), VFR (27%), and MHB (21%), whereas CVL and MHA showed markedly less relative variability (14% and 15%).
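The expected-range and relative-variability calculations for AH4-1 can be reproduced with a few lines of arithmetic. This sketch uses the rounded values quoted in the text (intercept 66.0, standard deviation 15.4); the function names are illustrative, and differences in the final digit from the published range reflect rounding of the inputs.

```python
def expected_range(mean, sd, z=1.96):
    """Range expected to contain about 95% of normally distributed values."""
    half_width = z * sd
    return mean - half_width, mean + half_width

def relative_variability(mean, sd):
    """Random-effect standard deviation as a fraction of the fixed-effect estimate."""
    return sd / mean

# Rounded AH4-1 values taken from the text.
low, high = expected_range(66.0, 15.4)
rel = relative_variability(66.0, 15.4)
print(f"95% expected range: {low:.1f} to {high:.1f}; relative variability: {rel:.0%}")
```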
The $\sigma_B$ estimates ranged from 0.41 to 0.49 on the AH4 and CVL responses, whereas there was less variability in the individual slopes for the VFR response ($\sigma_B = 0.23$). Approximately 95% of the individual $B_i$ values lie within $\pm 1.96\,\sigma_B$ of the zero mean. Although the $\sigma_B$ estimates for these four responses appear small, they are amplified by the multiplication with age (in years over 49) in the model. For example, a difference of 0.45 (roughly 1 SD on the AH4 and CVL tests) in the slopes for any two participants with equal cognitive function on entry to the study would result in a difference of 4.5 and 9.0 percentage points after 10 and 20 years, respectively. Because participants have different rates of decline, this implies that between-individual variability in performance increases with sample age and also that this increase in variability is most marked for those tasks with large between-participant variability. It is important to note that, because the effects of covariates such as gender, SES, and city of residence were taken into consideration in computing this variance, they cannot provide functional explanations for it. Because age is modeled as a quadratic function in the mean part of the models, neither the expected ranges nor the relative variability of the $B_i$ can be calculated. Finally, under the specified random effect models, the residual standard deviation estimates were similar between tests and ranged from 5.7 to 9.0 percentage points.
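The amplification of a slope difference over time is simple arithmetic, sketched below for the example in the text. It assumes two participants with identical entry scores whose random slopes differ by a fixed amount and who are otherwise described by the same model; the function name is illustrative.

```python
def score_gap(slope_difference, years_since_entry):
    """Gap in percentage points between two participants with equal entry scores."""
    return slope_difference * years_since_entry

# A slope difference of 0.45 is roughly 1 SD on the AH4 and CVL tests.
gaps = {years: score_gap(0.45, years) for years in (10, 20)}
for years, gap in gaps.items():
    print(f"after {years} years: {gap:.1f} percentage points")
```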
| DISCUSSION |
|---|
Practice Effects
Practice effects are significant and substantial on both AH4 tests and on the CVL test. For example, on the second occasion, gains of over 4.5 percentage points on the AH4 tests contrast with improvements of 1% or less on the other tasks. On the third and fourth occasions, an improvement of over 4.5 is predicted for AH4-2 and CVL, contrasting with negative estimates for several other tasks. Note that these gains from practice are comparable with the declines in average scores, after practice and drop-out have been taken into consideration, of 6 points between age 60 and 70 (64.6 and 58.6, respectively) on the AH4-1 and 6 points (58.2 to 52.1, respectively) on the AH4-2.
The sizes of practice effects do differ between tasks, and between older and younger individuals. On AH4-1 and AH4-2 tasks, practice effects were indeed markedly greater for older than for younger participants, with estimated improvements between first and second testing of 1.5–2.5% for a 49-year-old as against over 4.5% for a 70-year-old. On all other tasks they ranged from only 0.1 to 1.6 points, and they were independent of age. Neglect of practice effects leads to underestimation of the true extent of age-related changes and may disguise the fact that they are accelerated rather than linear. Further, marked differences in practice effects between tasks and age groups may be misinterpreted as evidence that brain aging affects performance on some tasks, and so some mental abilities, earlier and more severely than others.
Practice improvements were greatest between the first and second encounters with a task, and were thereafter modest. At first sight this seems paradoxical because considerable bodies of evidence, such as those reviewed by Kausler (1990), show that age slows the learning of novel tasks. This would lead us to expect that, the older individuals are, the less they should improve during a longitudinal study. One explanation for this counterintuitive finding is that older individuals perform poorly when they first encounter novel cognitive tests because they need longer to understand what the tests demand of them and to accommodate to an unfamiliar environment (Rabbitt, 1993b). On this premise the large and long-lasting practice gains observed between the first and subsequent test sessions during this longitudinal study reflect not only specific task learning but also general familiarity with the testing environment and procedures. Note that this possibility carries the awkward methodological implication that, even if particular tasks are not repeated, for example, by using "parallel forms," increasing familiarity with the general testing procedures may benefit older participants more than younger participants and so counteract age differences in rates of decline. We suggest that these findings also have theoretical implications. Difficulties in coping with task novelty, and marked gains once initial problems have been overcome, are characteristics of patients with focal prefrontal cortical damage (Burgess, 1997). In this context the present findings may be interpreted as further evidence for age-related declines in "executive" functions supported by the prefrontal cortex that enable us to cope with novel tasks (Burgess & Shallice, 1996; Lowe & Rabbitt, 1998; Shallice & Burgess, 1991). This behavioral evidence has been assumed to reflect neurophysiological findings that the prefrontal cortex suffers earlier and more rapid neurophysiological and cerebrovascular changes than other areas (Gur, Gur, Orbist, Skolnik, & Reivitch, 1987; Haugh & Eggers, 1991; Scheibel & Scheibel, 1975; Shaw et al., 1984). In this framework of interpretation, it is a surprising new finding that, once experienced, tasks and testing situations do not regain "novelty" through disuse, even over periods as long as 7 years.
Drop-Out
We have argued that these likelihood-based analyses under random drop-out assumptions allow good estimates of what actual trajectories of change would have been had drop-out not occurred and so permit more realistic estimates of how rates of age-related cognitive change differ between age groups, socioeconomic groups, and gender groups. Note, however, that these analyses adjust for, but do not give information about, drop-out effects. The relationships between volunteers' propensity to drop out and their cognitive measurement profiles, their gender, socioeconomic category, or general health status are different questions of substantive interest in their own right. We propose to investigate these relationships by using informative drop-out models. The results will be reported separately in due course.
The analysis also detected, and took into consideration, significant differences between the average levels of ability of cohorts recruited at different points during the study. These drop-in effects differed between tasks. On the AH4 and CVL tests, cohort recruitment differences were large enough so that interpretation of the data would have been affected if they had been neglected. In contrast, they were negligible on the MH vocabulary tests. This implies that analyses must not assume that cohort differences on any single "benchmark" test can be taken as representative of differences on all other tests.
The remaining working hypotheses were that, after practice and drop-out effects had been considered, it would be possible to more accurately determine actual rates of changes, and so to discover whether these are constant or are accelerated by increasing age, and whether they differ between different kinds of tasks, between more and less able individuals, and with demographic factors such as gender and socioeconomic advantage. Finally, it was predicted that after all of the aforementioned factors had been taken into consideration, variance in cognitive performance between members of a sample would be seen to significantly increase as the members age.
Does Rate of Cognitive Decline Accelerate With Sample Age?
After practice and drop-out effects were adjusted for, there was clear evidence that rates of decline accelerated with age on the two AH4 tests and the CVL task.
Do Scores on Different Cognitive Tests Decline at Different Rates?
On the AH4-1 and AH4-2 tests and on the CVL task, declines accelerated with age. Declines in VFR scores were less marked and were linear rather than accelerated. On the MHA and MHB vocabulary tests, there was little or no decline. This last finding agrees with the consensus of previous studies that declines in tasks that are assumed to be supported by gf contrast with stability on tests such as the MHA and MHB vocabulary tests, in which performance is supported by "crystallized" knowledge acquired over a lifetime and maintained by practice in old age (Horn, 1982). The different trajectories of change for the CVL and VFR tests also provide a longitudinal confirmation of Horn's (1982) many cross-sectional demonstrations that age affects performance on some tests of fluid mental abilities more than on others.
How Are Rates of Decline Affected by Gender, by Level of Socioeconomic Advantage, and by Individual Differences in General Intellectual Ability?
On average, men performed better than women on the AH4-1 and AH4-2 and MHA and MHB, but women performed better than men on the CVL and VFR tasks. Superiority of men on the AH4 and MH tests may partly be explained by the fact that, for these generations of participants, women had much poorer educational and career opportunities, most especially in the industrial North of England. The finding of superiority of women on CVL and VFR tests confirms and extends cross-sectional comparisons within this sample by Rabbitt and colleagues (1996). The gender effect on CVL scores appears to be complex, because it also depends on age. The advantage in CVL scores for women is relatively small at young to middle ages and thereafter widens. One possible explanation for this might be that because women live longer they also retain mental competence later in life. However, this seems unlikely because there is no similar Gender x Age interaction on any other task. In our view, and in the absence of other evidence, the particular advantage for verbal learning (CVL) is as likely to reflect lifestyle factors as intrinsic differences in the level and the maintenance of particular mental abilities. These and other hints of interactions between differences in lifestyle and preservation of particular abilities in old age require further investigation.
There were marked differences in cognitive performance between socioeconomic categories on all tests. The mean difference between occupational groups C1 and C45 was over 20 percentage points on the AH4 and MH tasks and 7–11 percentage points on the CVL and VFR tasks. In spite of this clear evidence that SES affects overall levels of performance, there is no evidence that it differentially affects rates of decline. This is unexpected because SES is a good proxy for many factors that are known to slow biological decline, such as level of general health and of lifetime health care, level of education, and exposure to toxicity (Kitagawa & Hauser, 1973). Socioeconomic disadvantage is also associated with higher and earlier mortality in later life, and there is robust evidence that approach to death reduces level of cognitive performance during longitudinal studies (Berkowitz, 1964; Bosworth et al., 1999; Botwinick, West, & Storandt, 1978; Jarvik & Blum, 1971; Johannsen & Berg, 1989; Lieberman, 1965; Rabbitt et al., 2002; Reimanis & Green, 1971; Riegel & Riegel, 1972; Riegel, Riegel, & Myer, 1967; Small & Backman, 1997). There is also evidence that socioeconomic disadvantage, and in particular lower educational attainment, is linked to the prevalence of Alzheimer's disease in old age (Bonaiuto, Rocca, & Lippi, 1990; Evans et al., 1993; Korczyn, Kahana, & Galper, 1991). Obviously, more detailed analyses exploring relationships among socioeconomic factors, age, and cognition in this particular population sample are required.
Even when effects of SES and gender are taken into account, Manchester residents perform significantly better than Newcastle residents on the AH4-1, MHA, and MHB tests. There is no evidence of any difference in performance between cities on CVL or VFR. These differences remain cryptic because the city term is likely to be a proxy for a variety of unidentified factors for which the modeling process could not control.
These analyses also suggest that the level of general intellectual ability of participants on entry to a longitudinal study may affect their rates of subsequent cognitive change, though not in the direction that previous research has led us to expect. If we make the reasonable assumption that, in members of this sample, cognitive decline can be dated from age 49 (the age of the youngest volunteers on entry to the study) and that individuals' rates of decline previous to age 49 had been constant, the analysis shows that after practice and drop-out effects had been considered, individuals who entered the study with higher overall levels of ability declined more rapidly than those who entered with lower levels of ability. This finding is inconsistent with previous suggestions that higher levels of performance in young adult life may be associated with longer retention of ability and with lower incidence of dementias and predementing conditions in old age (see, e.g., Snowden et al., 1996). It does, however, agree with an analysis of data from a subgroup of this sample by Rabbitt, Chetwynd, and McInnes (2003) based on the entirely different premise that, because individuals' scores on the MHA vocabulary test do not change with age, they can be used as proxies for their AH4 test scores in middle age and so can be compared against their current, observed AH4 test scores to estimate age-related losses.
Note, however, that the outcome of the present analysis depends on the age used for "centering" in the population. If age 65 is used for "centering," then rates of decline do not vary with levels of gf, and if ages older than 65 are used for "centering," then it appears that the less able decline more rapidly than the more able. The implications of these findings with regard to methods of analysis of individual differences in the forms of trajectories of cognitive change are currently being further explored.
As a Population Ages Does Variability Between Its Members Increase?
The standard deviation estimates provide useful insight into the amount of variability between individuals on each task. The estimated standard deviation for the linear rate of decline was similar for the AH4 and CVL tasks, ranging from 0.41 to 0.49, but for VFR it was only 0.23. The differences between the slopes for individuals give rise to increased variability in performance with age. For example, a pair of participants with equal cognitive function on entry to the study whose slopes differed by 0.4 would differ by 4 points after 10 years and by 8 points after 20 years. Differences of this size are of practical importance because they are large enough to provide useful insights into the functional causes of marked individual differences in rates of cognitive decline in old age.
There are two quite different reasons why, as the members of a sample age, they should increasingly diverge in terms of their levels of cognitive performance. One is that differing genetic legacies and lifetime health histories bring about differences in trajectories of biological aging, which will diverge over time (Rabbitt, 1982, 1993a). A second is that as people age and so become less able, their performance on any task on which they are tested varies more from moment to moment and, as a direct consequence, their average levels of performance also vary more from session to session and from day to day (Rabbitt, 1999; Rabbitt, Osman, Stollery, & Moore, 2001). As day-to-day variability increases for all members of a sample, so they will differ more with respect to each other when they are all tested on any single occasion. Thus, increasing variability between members of aging samples has at least two, functionally different, causes. The possibility of confounds between these effects means that any single, cross-sectional observation of members of a population at a particular time point will give us an inaccurate, and probably exaggerated, estimate of actual individual differences in trajectories of cognitive aging. For better estimates to be obtained, longitudinal data are essential; ideally, we also need to estimate, as far as possible, the effects of session-to-session or day-to-day variability by taking several samples of performance on each task at each successive longitudinal data point. Estimates of intrinsic within-participant variability obtained from these samples will allow long-term trends resulting from differences in trajectories of change to be more precisely determined. Such data will also be useful in showing the extent to which increases in the intrinsic variability of individuals' performance, as distinct from changes in their mean levels of performance, alter as they age.
| Footnotes |
|---|
Received for publication February 8, 2001. Accepted for publication August 18, 2003.