The Journals of Gerontology Series B: Psychological Sciences and Social Sciences 63:P235-P240 (2008)
© 2008 The Gerontological Society of America


RESEARCH ARTICLE

Age and Ability Affect Practice Gains in Longitudinal Studies of Cognitive Change

Patrick Rabbitt, Mary Lunn, Danny Wong and Mark Cobain

1 Department of Experimental Psychology, University of Oxford, England, and University of West Australia
2 Department of Statistics, University of Oxford, England.
3 Department of Research and Development, Unilever PLC, London, England.


    Abstract
 
During a 20-year longitudinal study, 5,842 participants aged 49 to 93 years significantly improved over two to four successive experiences of the Heim AH4-1 intelligence test (first published in 1970), even with between-test intervals of 4 years and longer. After we considered significant attrition by death and dropout and the effects of gender, socioeconomic advantage, and recruitment cohort, we found that participants with high intelligence test scores showed greater improvement than did those with lower intelligence test scores. Practice gains also reduced with age, even after we took into consideration the individual differences in intelligence test scores. This emphasizes the methodological point that neglect of individual differences in improvement during longitudinal studies underestimates age-related changes in younger and more able participants and the theoretical point that, like all experiences during everyday life, participation in longitudinal studies alters the ability of aging humans to cope with cognitive demands to different extents according to their baseline abilities.

Key Words: Cognitive gain • Longitudinal study • Practice gains

A main goal for cognitive gerontology is to study individual differences in trajectories of cognitive change in old age and so to identify factors that accelerate and retard changes in mental abilities. This requires longitudinal designs in which participants repeatedly take the same, or very similar, cognitive tests. A new problem then arises because practice gains that are due to repeated testing can disguise declines associated with increasing age and frailty. This has long been recognized as a theoretical possibility (Schaie, 1965; Schaie, Labouvie, & Barrett, 1973) and has been shown to be a problem in the assessment of test–retest reliability of clinical diagnostic measures (Beglinger, 2005; Dikmen, Heaton, Grant, & Temkin, 1999; Falleti, 2006; Kulik, Chen-Lin, Kulik, & Bangert, 1984; Mitrushina & Satz, 1991). However, it is only recently that the development of new statistical models has made it possible to demonstrate formally that when test–retest intervals are as long as 4 to 7 years, practice gains are large enough to cause serious underestimations of the true rates of declines caused by increasing age, pathology, and frailty (Ferrer, Salthouse, McArdle, Stewart, & Schwartz, 2004; Ferrer, Salthouse, Stewart, & Schwartz, 2004; Rabbitt, Diggle, Holland, & McInnes, 2004; Rabbitt, Diggle, Smith, Holland, & McInnes, 2001). Because it is clear that practice gains occur in longitudinal studies, a further possibility is that they may vary between individuals. This seems probable because many brief laboratory experiments have found that elderly people learn new material more slowly (for reviews, see Craik & Jennings, 1992; Kausler, 1990; and Rabbitt, 2002). Another possibility, also suggested by comparisons on brief training studies, is that as intervals between successive testing become very long, older participants will forget more than young participants and so appear to benefit less from repeated testing. In either case, the acceleration of age-related cognitive decline would be overestimated.

There is also evidence that practice gains may vary with individual differences in general fluid intelligence (gF). Rabbitt, Bithell, Perdicou, Stollery, and Moore (2007, submitted) practiced 93 volunteers aged from 61 to 82 years on eight different tests of verbal learning, spatial learning, motor learning, and information processing speed. These researchers found that initial scores and subsequent rates of improvement were uncorrelated between different cognitive tests but were significantly predicted by unadjusted scores on three different tests of gF, namely, the Heim (1970) AH4-1 and AH4-2 and the Cattell and Cattell (1960) Culture Fair test. Other studies have also shown that, irrespective of their ages, individuals with higher intelligence test scores learn new material more rapidly and remember it better than do those with lower intelligence test scores (e.g., Rabbitt & Anderson, 2006). This made it useful for us to check whether there are indeed marked individual differences in practice gains during longitudinal studies and whether these gains systematically differ with an individual's age between 60 and 82 years and with an individual's intelligence test scores on entry to a longitudinal study.

Apart from these methodological issues, individual differences in practice gains are in themselves substantively interesting as a further index of the age-related cognitive changes that longitudinal studies purport to assess. To confidently extrapolate from age-related changes observed in laboratory studies to changes in the ability to manage everyday life, we must recognize reciprocities between what people do and what happens to them and, even in old age, how and by how much their experiences alter their abilities.

In order to estimate and compare practice gains accurately, we also need to take account of some other frequently neglected factors. One is self-selection on entry. Volunteers for demanding longitudinal studies are unusually healthy, able, and highly motivated members of their age groups (Lachman, Lachman, & Taylor, 1982). This elite bias may be unavoidable but, because volunteers are typically recruited in successive waves over many years, it is at least possible to check whether apparent differences in trajectories of change and in practice effects are affected by demographic factors such as gender, age, differences between recruitment cohorts, differences in levels of socioeconomic advantage, and geographical locations of residence. A more difficult problem is to allow for selective attrition that occurs because older, frailer, and less able individuals die and withdraw earlier than do others. The longer studies continue, the more elite and atypical of their age groups the survivors become (e.g., Lachman, Lachman, & Taylor, 1982; Rabbitt, 2002; Rabbitt, Lunn, & Wong, 2005; Rabbitt, Watson, Donlan, Bent, & McInnes, 1994a, 1994b; Rabbitt, Wong, & Lunn, in press; Schaie et al., 1973). Because true trajectories of change cannot be estimated unless deaths and dropouts are taken into consideration, these must also be included in the analysis.

Data collected during the University of Manchester longitudinal study of cognitive change in healthy old age, described in detail elsewhere (Rabbitt, Diggle, Holland, McInnes, Bent, et al., 2004), allowed us to examine individual differences in improvements that were due to practice during two to four successive experiences of the Heim (1970) AH4-1 intelligence test, administered to participants at 4-year intervals over total periods of 8 to 16 years, after the effects of recruitment cohort, city of residence, gender, and socioeconomic advantage and selective attrition by death and dropout had been taken into consideration.


    METHODS
 
Participants, Procedure, and Materials
Researchers recruited a sample of 5,842 volunteers, that is, 2,615 residents of Greater Manchester and 3,277 residents of Newcastle-upon-Tyne, United Kingdom, by appeals on local media and by word of mouth. The 1,711 men were between the ages of 49 and 93 years (M = 65.6, SD = 7.7), and the 4,131 women were between 49 and 92 years (M = 64.4, SD = 7.8). All traveled independently to the Department of Psychology at either the University of Manchester or the University of Newcastle, where they completed batteries of cognitive tasks in quiet rooms supervised by two experienced testers. Participants were each reimbursed expenses of £5 (UK) per session. A search by HM Registry Office UK identified all 2,342 deaths between 1983 and the close of the census on July 31, 2004. Between 1983, when the study began, and July of 2003, when it ended, there were 3,204 dropouts, of whom 1,208 also died before the 2004 census. Because many dropouts could only be identified by failures to return for further testing, dates of dropout are recorded as the last session attended. The remaining 1,996 dropouts survived the July 2004 census.

Earlier analyses by some of us and our colleagues, namely, Rabbitt, Diggle, Holland, McInnes, Bent, and colleagues (2004), examined only Newcastle residents and found strong practice effects on the Heim (1970) AH4-1 intelligence test, contrasting with relatively slight, though still significant, effects on verbal learning tasks and vocabulary tests. As AH4-1 scores show the greatest practice gains, they were selected as being the most sensitive indices of possible individual differences in practice effects. The AH4-1 intelligence test consists of 64 problems with equal numbers of logic problems, verbal comparisons, arithmetic problems, and number series. After introductory practice on one question from each of these categories, participants answer as many problems as possible within 10 minutes. Scores are the percentages of correct answers. The results analyzed are from one Heim (1970) AH4-1 group intelligence test, which was included in a test battery that was repeated at 4-year intervals. Results for Manchester and Newcastle are closely similar and so we combine them.

Previous analyses of data from the Newcastle sample by our earlier group (Rabbitt, Diggle, Holland, McInnes, Bent, et al., 2004) found that participants' levels of performance on cognitive tests markedly vary with their levels of socioeconomic advantage (SEA) as categorized by reference to the UK Office of Population Census and Surveys Classification of Occupational Categories (1980). Categories are SEA C1 (n = 261), made up of professionals such as doctors, lawyers, senior managers, and academics; SEA C2 (n = 1,854), made up of other professionals such as schoolteachers, pharmacists, and junior managers; SEA C3N (n = 2,064), made up of skilled nonmanual workers such as secretaries; SEA C3M (n = 771), made up of skilled manual workers such as craftsmen, joiners, fitters, and machinists; SEA C4 (n = 433), made up of nonskilled nonmanual workers such as clerical assistants and storekeepers; and SEA C5 (n = 40), made up of nonskilled manual workers such as laborers, cleaners, and janitors. The remaining 427 did not record occupational data. Our group found that cognitive test scores also significantly differ with gender: Men scored higher on tests of gF and women higher on tests of verbal memory and learning. Recruitment cohorts also differ significantly in test scores and levels of socioeconomic advantage. Accordingly, we also entered occupational category and gender into the analyses.

We found that cognitive test scores, and rates of change, markedly varied between subsets of individuals who completed the study between 1983 and 2003, those who died during the course of the study, those who dropped out during the study but survived beyond a census of deaths completed by HM Registry of Births, Marriages and Deaths, Stockport UK in July of 2004, and those who dropped out during the study but subsequently also died before the 2004 census (Rabbitt, Lunn, & Wong, 2005). As the incidence and timing of deaths and dropouts might affect the sizes of practice effects, we found it necessary to test for interactions between practice and death–dropout status. Accordingly, we divided participants into 11 subgroups according to their histories of survival, dropout, or dropout followed by death with respect to the time points of the four quadrennial test sessions at which the Heim (1970) AH4-1 test was administered. These groups were as follows.

Group C completed the study and survived the 2004 census of deaths (n = 1,510); Group D1 died between Test Session 1 and Test Session 2 (n = 365); Group D2 died between Test Session 2 and Test Session 3 (n = 409); Group D3 died between Test Session 3 and Test Session 4 (n = 246); Group D4 died between Test Session 4 and the 2004 census of deaths (n = 116); Group WD1 withdrew before Test Session 2 and subsequently died (n = 745); Group WD2 withdrew before Test Session 3 and subsequently died (n = 354); Group WD3 withdrew before Test Session 4 and subsequently died (n = 109); Group W1 withdrew before Test Session 2 and survived beyond the 2004 census (n = 1,013); Group W2 withdrew before Test Session 3 and survived beyond the 2004 census (n = 595); and Group W3 withdrew before Test Session 4 and survived the 2004 census (n = 388).
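The grouping above can be tallied with a short sketch. The group labels and sizes are copied from the list; the function names and the four broad class names are hypothetical, introduced here only for illustration.

```python
# Group sizes exactly as listed in the text above.
GROUP_SIZES = {
    "C": 1510,
    "D1": 365, "D2": 409, "D3": 246, "D4": 116,
    "WD1": 745, "WD2": 354, "WD3": 109,
    "W1": 1013, "W2": 595, "W3": 388,
}


def attrition_class(group: str) -> str:
    """Map a group label to a broad attrition history (class names are illustrative)."""
    if group == "C":
        return "completed"
    if group.startswith("WD"):  # check "WD" before "W" and "D"
        return "withdrew_then_died"
    if group.startswith("W"):
        return "withdrew_survived"
    if group.startswith("D"):
        return "died_in_study"
    raise ValueError(f"unknown group label: {group!r}")


def tally(sizes: dict) -> dict:
    """Sum group sizes within each broad attrition class."""
    totals = {}
    for group, n in sizes.items():
        cls = attrition_class(group)
        totals[cls] = totals.get(cls, 0) + n
    return totals


totals = tally(GROUP_SIZES)
dropouts = totals["withdrew_then_died"] + totals["withdrew_survived"]
print(totals)
print("all dropouts:", dropouts)
```

The two withdrawal classes come to 1,208 and 1,996, which together give the 3,204 dropouts reported under Participants. The eleven sizes as printed sum to 5,850 rather than 5,842; the sketch does not attempt to reconcile the two figures.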

Because our study aim was to examine, independently, the effects of age and of general intellectual ability on practice effects, we found it necessary to have a different measure of ability than scores on the AH4-1, for which practice data were analyzed. This was available because the Heim (1970) AH4-2, a nonverbal test of general fluid ability, had been administered to all participants on each testing occasion contemporaneously with the AH4-1. Within the entire sample, the correlation between age-unadjusted percentage correct of AH4-1 and AH4-2 scores is r = .84, so AH4-2 scores are a good independent measure of gF and, specifically, of individual differences in ability on the AH4-1. The AH4-2 intelligence test also consists of 64 problems. Each consists of a series of five line drawings with a missing element that must be supplied from among five provided alternatives. Solutions of series require mental rotation, addition, and subtraction of irregular shapes and recognition of logical progressions. After introductory practice on one question from each of these categories, participants attempt to solve as many problems as possible within 10 minutes. Scores used are percentages of correct answers.

We conducted our analysis by using a linear mixed effects model with fixed effects including age, demographic factors such as gender, socioeconomic advantage, city of residence, and recruitment cohort, AH4-2 score, practice session, and death and withdrawal group. The random effects were an intercept for each individual and an individual age effect, so that variance between individuals increased with age. A key point of the analysis was the introduction of the group effect, allowing the groups from C to W3 to influence the regression coefficients of the usual fixed effects. This is a so-called pattern-mixture model. If the regression on the groups, which could include interaction terms, is significant, then this model allows the time and type of dropout to influence the effects of other factors (Little, 1993).
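One way to write down a model of the kind just described is sketched below. The notation is an assumption introduced here for illustration and is not taken from the article: y_{ij} is the AH4-1 percentage score of participant i at session j, x_{ij} collects the fixed-effect covariates, a_{ij} is age centered at a reference value, g(i) is the participant's death and withdrawal group (C to W3), and the group-specific vector δ_g (set to zero for a reference group) lets group membership shift the usual regression coefficients.

```latex
\begin{aligned}
y_{ij} &= \mathbf{x}_{ij}^{\top}\boldsymbol{\beta}
        + \mathbf{x}_{ij}^{\top}\boldsymbol{\delta}_{g(i)}
        + b_{0i} + b_{1i}\,a_{ij} + \varepsilon_{ij}, \\
(b_{0i},\, b_{1i})^{\top} &\sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma}),
\qquad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^{2}), \\
\operatorname{Var}\!\left(b_{0i} + b_{1i}\,a_{ij}\right)
  &= \Sigma_{00} + 2\,a_{ij}\,\Sigma_{01} + a_{ij}^{2}\,\Sigma_{11}.
\end{aligned}
```

The last line shows why a random age slope makes between-person variance depend on age: the variance of the random part is quadratic in centered age, so it can grow as participants get older.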


    RESULTS
 
Effects of Age on Improvement With Practice
The results of a linear mixed effects pattern-mixture model are shown in Table 1. Age is centered at 70 years, because this is close to the mean age of the sample. The linear effect of age is highly significant and the significant quadratic age term indicates that rates of decline in participants' AH4-1 scores accelerate as they grow older. The significant difference for gender occurs because average scores for men are 1.57% higher than average scores for women. There are also highly significant effects of socioeconomic advantage and of recruitment year. There are highly significant differences between groups with different histories of survival, death, and dropout over successive time points during the study. Finally, after variance associated with all of these other factors has been taken into consideration, practice gains are robustly significant. Participants score significantly higher on their second, third, or fourth experience than they do on their first experience of the AH4-1 test. The key finding of interest is the significant interaction between the effects of age and the amount of practice gains between Sessions 1 and 2, Sessions 1 and 3, and Sessions 1 and 4. These occur because, on all these transitions, the older participants benefit less from practice than the younger participants do. Another new finding is the Age x Gender interaction that occurs because men experience more rapid age-related declines than do women. No other interactions are significant, so there is no evidence that the amount of practice gain is affected by proximity to death or to dropout.
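To make the Age x Practice interaction concrete, the following sketch shows how a negative interaction coefficient translates into smaller practice gains at older ages once age is centered at 70. The coefficient values are placeholders chosen for illustration, not the estimates in Table 1, and the function and dictionary names are hypothetical.

```python
AGE_CENTER = 70.0  # the article centers age at 70 years

# Placeholder coefficients for illustration only; NOT the Table 1 estimates.
EXAMPLE_COEF = {
    "practice_s2": 3.0,          # gain at Session 2 vs Session 1 for a 70-year-old
    "age_x_practice_s2": -0.1,   # change in that gain per year of age
}


def practice_gain(age: float, session: int, coef: dict) -> float:
    """Model-implied gain at `session` relative to Session 1 for a given age.

    Only the practice main effect and the Age x Practice interaction enter,
    because all other terms cancel when comparing sessions at a fixed age.
    """
    centered_age = age - AGE_CENTER
    main = coef[f"practice_s{session}"]
    interaction = coef[f"age_x_practice_s{session}"]
    return main + interaction * centered_age


gain_at_60 = practice_gain(60, 2, EXAMPLE_COEF)
gain_at_80 = practice_gain(80, 2, EXAMPLE_COEF)
print(gain_at_60, gain_at_80)
```

With these placeholder values the implied Session 2 gain is larger at age 60 than at age 80, which is the direction of the effect reported above. Centering means the practice main effect is read as the gain for a 70-year-old.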



 
Table 1. Estimated Parameters for the Mean AH4-1 Percentage Score.

 
Effects of Intelligence on Improvement With Practice
To examine the effects of individual differences in gF (AH4-2 scores) on practice gains, an obvious procedure was to fit a linear term in AH4-2 and examine the interactions. However, in the present case this was clearly inappropriate, and the fit of this model was not good because there is little practice effect for the lowest and highest scoring individuals. We also considered a linear + quadratic term in AH4-2. Again the fit to the data was not good. At this stage it became apparent that (a) the relationship is complicated and (b) the precise form of this complex relationship is of substantive interest in interpreting the pattern of individual differences in practice effects. The form of the interaction reveals that estimates of "true" practice effects may be miscomputed both because of "ceiling" effects for the most able and "floor" effects for the least able. In other words, the relationship between practice gains and AH4-2 scores is represented by an inverted U function, and this feature of the data is best revealed by subdivision into three groups. Given a requirement for simplicity of interpretation, we should look for two change points to give lowest, middling, and highest score groups. This selected model gave a good fit to the data, and the interaction was significant. This procedure would clearly have been inappropriate had it been intended to be used to predict individual trajectories. This is not the case. It is only intended to test for overall differences between ability groups. The results of fitting this final model are shown in Table 2.
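The three-group subdivision can be sketched as a simple binning step. The article does not report the two change points, so the cut values used below (30 and 60) are purely hypothetical, as are the function name and the convention that a score equal to a cut point falls in the higher group.

```python
from bisect import bisect_right

LABELS = ("low", "medium", "high")


def ability_group(ah4_2_pct: float, low_cut: float, high_cut: float) -> str:
    """Assign an AH4-2 percentage score to one of three ability groups.

    `low_cut` and `high_cut` are the two change points. A score equal to a
    cut point is placed in the higher of the two adjacent groups (an
    assumption of this sketch, not something stated in the article).
    """
    if not 0.0 <= ah4_2_pct <= 100.0:
        raise ValueError("score must be a percentage between 0 and 100")
    if not low_cut < high_cut:
        raise ValueError("low_cut must be below high_cut")
    return LABELS[bisect_right((low_cut, high_cut), ah4_2_pct)]


# Hypothetical change points, for illustration only.
groups = [ability_group(s, 30.0, 60.0) for s in (12.5, 30.0, 45.0, 60.0, 92.0)]
print(groups)
```

Entering the resulting group as a categorical factor, and interacting it with practice session, is one way a non-monotonic (inverted U) relationship can be tested without assuming a linear or quadratic form.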



 
Table 2. Estimated Parameters for the Mean AH4-1 Percentage Score: Final Model.

 
Because of the strong correlation between AH4-1 and AH4-2 scores, we expected the overall effects of level of ability to be highly significant. After we took level of ability into consideration, we found that the linear but not the quadratic effects of age (centered at the average age of 70 years) and the effects of socioeconomic advantage, recruitment year, and survival, death, and withdrawal history were similar to those found in the first analysis. An interesting further detail is that the effect of gender was now abolished. We interpret this as evidence that all of the variance in AH4-1 scores that is associated with differences between men and women was now picked up by scores on another, different, well-validated nonverbal test of gF—the AH4-2. In other words, differences in AH4-1 test scores between men and women are not due to any factor, for which gender is a proxy, other than differences in the particular kind of general fluid mental ability that is measured by the AH4-2 and AH4-1 tests. The significant Age x Gender interaction replicates that found in the first analysis, with women showing less age-related decline in AH4-1 scores. The Age x Practice interactions found in the first analysis still remain significant, even after the effects of differences in AH4-2 scores have been taken into consideration. In other words, greater age significantly reduces practice gains, even after individual differences in general fluid mental ability have been taken into account. We interpret this as new evidence that not all of the age-related cognitive changes that lead to declines in efficiency of learning (practice effects) can be explained by differences in gF.

The significant interaction between level of ability and practice gains is a new finding, but the complex nature of this relationship requires interpretation. For both the low-ability group and the high-ability group, there are negative interactions with practice. That is, both the low-ability and the high-ability groups benefit less from practice than does the medium-ability group. For the low-ability group this interaction is significant for comparisons between Session 1 and each of Sessions 2, 3, and 4. For the high-ability group the interaction is significant for the comparison between Session 1 and Session 2 but not for the comparisons between Sessions 1 and 3 or Sessions 1 and 4.

There is also a significant interaction between age and low level of ability relative to age and either medium or high ability. The age-related decline of –0.52 per year of age overstates the amount of decline for those individuals at the bottom level of the ability range. It is possible that the inclusion of this interaction is what renders the quadratic effect of age no longer significant in this model. As in the first analysis described earlier, the absence of any interaction between practice effects and the occurrence or timing of deaths or withdrawal means that there is no evidence that approach to death or dropout affects the amount of practice gains after differences in ability have been taken into consideration.


    DISCUSSION
 
The present analyses of data from the combined Manchester and Newcastle samples replicate the main findings of earlier analyses of data from the Newcastle longitudinal samples by Rabbitt, Diggle, Holland, McInnes, Bent, and colleagues (2004) and the Manchester and Newcastle sample by Rabbitt, Lunn, and Wong (2005). There are marked effects of socioeconomic advantage and of year of entry to the study. Men score higher than women but women experience less rapid age-related decline. This is of interest because it probably reflects the fact that, in Western industrialized societies, women live longer than men and so experience slower biological changes and retain their competence to a later calendar age, as distinct from biological age. A further new detail in exploration of gender differences is that differences between men and women on one test of gF, the AH4-1, disappear when scores on a different, highly correlated test of gF, that is, the AH4-2, are taken into consideration. Thus we have no evidence that factors other than levels of gF are responsible for gender differences in performance of intelligence tests.

The significant effects of survival, death, and dropout history replicate earlier effects reported by Rabbitt, Lunn, and Wong (2005) and Rabbitt, Lunn, and Wong (2007, submitted). Performance declines with approach to either death or dropout, and the amounts of decline preceding death and dropout are closely similar. Comparing equivalent time points of dropout, we see that the effects of approaching dropout are greater when dropout is shortly followed by death than when dropout is followed by survival. This empirically shows that not only the effects of age-related decline but also the effects of other factors such as health or socioeconomic advantage, gender differences, and practice (all of which influence rates of changes in cognitive performance over time) are miscalculated unless both the occurrence and timing of both deaths and dropouts are also logged and taken into consideration.

The main point of the present analyses is that, even after we take into account the effects of initial selection and selective attrition of a sample, the marked practice effects during a prolonged longitudinal study found by some of us in earlier research (Rabbitt, Diggle, Holland, & McInnes, 2004) are replicated on a different and very much larger sample of participants (i.e., the Manchester and Newcastle group rather than just the Newcastle cohort of the University of Manchester longitudinal study). As in our earlier analyses (Rabbitt, Diggle, Holland, & McInnes, 2004), significant practice improvements are found even when intervals between successive presentations of the task are as long as 4 years. Indeed, the present analyses show that even the oldest participants show gains with quadrennial experiences of the AH4-1 test over periods of 8 and of 12 years.

The new questions asked by these analyses were whether practice gains during a longitudinal study differ between individuals of different ages and different levels of ability. The first analysis shows that younger participants gain more from practice than do relatively older participants. The second analysis shows that, even after variance associated with age differences has been taken into account, higher scores on one test of general intellectual ability, the AH4-2 intelligence test, are associated with significant increments in practice gains on another, the AH4-1. The further finding that age differences in practice remain significant even after effects of differences in unadjusted intelligence test scores have been considered makes the additional new point that the reduction in practice effects with increasing age cannot entirely be attributed to the fact that people's levels of gF decline as they grow older. This is inconsistent with recent speculations that age-related differences in all cognitive skills can be parsimoniously treated as differences in gF (e.g., Anderson, 1992; Deary, 2000). The particular age-related cognitive changes that reduce practice gains in this longitudinal study are not entirely captured by scores on a well-validated test of gF.

Because the form of the interaction between AH4-2 test score group and practice gains is complex, it requires interpretation. Both the high-ability and the low-ability groups show smaller practice gains than does the medium-ability group. Many participants in the high-ability group scored the maximum possible, and many others had such high scores that there was little scope for improvement. Thus ceiling effects offer a sufficient explanation as to why high-ability participants show less improvement. Within the range of scores in which ceiling effects are no longer possible so that practice gains can appear, the significant advantage for the middle-ability group over the low-ability group shows that practice gains do increase with gF.

We must conclude that all previous analyses of longitudinal data in which only mean values for practice effects have been calculated have, because of this, significantly underestimated the true amounts of age-related declines for younger and more able participants. For older and less able participants, who show significantly less improvement with practice, estimated rates of age-related declines have been closer to their "true" values. This finding highlights a nontrivial issue because, unless individual differences in practice gains are modeled in further analyses, we cannot confidently address some theoretically interesting and socially important questions such as whether more and less able individuals experience different rates of cognitive decline and whether data show generation cohort effects. For example, do young participants in longitudinal studies, who have benefited from recent improvements in socioeconomic conditions and in medical care, experience less rapid cognitive decline than earlier generations who have, historically, been disadvantaged in these respects? To find reliable answers to these questions, we must measure and consider individual differences in practice effects.
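The underestimation argument can be illustrated with simple arithmetic, under the simplifying assumption that an observed change between two sessions is the sum of true age-related change and a practice gain. The annual rate of –0.52 is the figure quoted in the Results; the two practice gains are hypothetical values chosen for illustration, and the function names are not from the article.

```python
def observed_change(true_annual_change: float, years: float, practice_gain: float) -> float:
    """Observed score change = true change over the interval + practice gain."""
    return true_annual_change * years + practice_gain


def corrected_annual_change(observed: float, years: float, assumed_gain: float) -> float:
    """Annual change recovered after subtracting an assumed practice gain."""
    return (observed - assumed_gain) / years


TRUE_RATE = -0.52   # points per year, the age coefficient quoted in the Results
YEARS = 4           # interval between successive test sessions
MEAN_GAIN = 2.0     # hypothetical sample-average practice gain
OWN_GAIN = 3.0      # hypothetical gain for a younger, more able participant

obs = observed_change(TRUE_RATE, YEARS, OWN_GAIN)
rate_mean_corrected = corrected_annual_change(obs, YEARS, MEAN_GAIN)
rate_own_corrected = corrected_annual_change(obs, YEARS, OWN_GAIN)
print(round(obs, 2), round(rate_mean_corrected, 2), round(rate_own_corrected, 2))
```

In this toy case the participant's score rises slightly over 4 years despite a true decline. Correcting with the sample-average gain recovers an annual decline of about 0.27 points, roughly half the true 0.52, whereas correcting with the participant's own gain recovers the true rate.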

Perhaps the most general point that these analyses make is that individuals' cognitive abilities are altered by their involvement in prolonged longitudinal studies, just as by their involvements in all other aspects of their everyday lives. The variety and the particular nature of our experiences significantly alter our abilities to cope with demands made by our everyday lives or, indeed, by psychological tests. People do not simply and inexorably decline in mental ability as they grow old. Their everyday experiences may also degrade or enhance their ability to cope with the demands that their lives make upon them. We may speculate that, as well as learning to cope with particular life demands and situations, people also gradually learn to adapt to and cope with the changes in their mental abilities brought about by neurophysiological aging. It is striking that, even in elderly individuals, very brief episodes of practice, even as brief as 10 minutes on first encounter with the simple problems in the AH4-1, can bring about domain-specific improvements that last for 4 to 7 years (see Rabbitt, Diggle, Holland, McInnes, Bent, et al., 2004). It therefore seems less interesting to continue to explore changes in individuals' scores on particular cognitive tests than to study interactions between their baseline levels of ability and their life experiences, and so the levels of competence they can achieve and so also the extent to which they can maintain, in old age, their ability to cope with their lives and with any laboratory experiments into which they may be inveigled.


    Footnotes
 
Decision Editor: Thomas M. Hess, PhD

Received for publication November 30, 2005. Accepted for publication January 2, 2008.


    References
 




