RESEARCH ARTICLE |
Department of Psychology, University of Virginia, Charlottesville.
| Abstract |
|---|
It is often assumed, at least implicitly, that people can be characterized as having a fixed level of a cognitive ability that can be accurately evaluated with a single assessment. However, to the extent that performance on cognitive tasks varies from one occasion to the next, as has been reported in numerous recent studies (e.g., Hertzog, Dixon, & Hultsch, 1992; Hultsch, MacDonald, Hunter, Levy-Bencheton, & Strauss, 2000; Li, Aggen, Nesselroade, & Baltes, 2001; Nesselroade & Salthouse, 2004; Rabbitt, Osman, Moore, & Stollery, 2001; Rapport, Brines, Axelrod, & Thiesen, 1997; Salinsky, Storzbach, Dodrill, & Binder, 2001; Salthouse & Berish, 2005; Shammi, Bosman, & Stuss, 1998), single assessments may represent only one of many possible levels of performance that could have been observed for that individual, and hence they are potentially misleading (cf. Hultsch & MacDonald, 2004; Nesselroade, 1991).
Much of the prior research on within-person variability has focused on reaction time or other speed variables. These types of variables are often used as measures of transient states of arousal or alertness, and thus it is not surprising to find that they exhibit within-person variability. Only a few studies have investigated within-person variability with cognitive variables measured in terms of accuracy rather than in time or speed. In one such study, Hultsch and colleagues (2000) reported results on two memory tasks (word recognition and story recognition) across four occasions from 15 healthy older adults and 30 adults with dementia or osteoarthritis. In another study, Li and associates (2001) examined performance on three memory tasks (digit span, text memory, and spatial recognition) in 25 older adults across 25 sessions. Both studies reported substantial across-occasion variability for the measures of memory accuracy. However, in neither study was there any mention of how the test versions administered on different occasions were equated for difficulty, and therefore it is possible that some of the within-person (across-occasion) variability in those studies was attributable to differences in the difficulty of the versions. Only the study by Li and associates provided information about the reliability of the measures of within-person variability of cognitive performance, and the estimates were disappointingly low. That is, these researchers reported correlations between the variabilities computed across the first and the second half of the occasions, and between the variabilities computed across the odd- and even-numbered occasions, but both sets of correlations were quite low (i.e., Mdn = 0.20 and Mdn = 0.32, respectively). Finally, another limitation of the previous studies is that the sample sizes of normal healthy adults were fairly small, and only a narrow range of ages was represented.
We designed the study reported here to address these limitations by using a moderately large sample of adults (N = 143) across a wide age range (18 to 97 years), who each performed a battery of 13 different cognitive tests on multiple occasions. Different test versions were administered on each occasion, but we carried out adjustments for version differences on the basis of data from another group of individuals who performed the versions in a counterbalanced order across sessions.
There were two major issues of interest in this study. One concerned the magnitude of within-person variability in different measures of cognitive performance, and the implications that short-term variability might have for the interpretation of longitudinal change. Of particular interest is whether longitudinal change either could be difficult to detect or could be confused with short-term fluctuation, if the magnitude of short-term within-person variability is large relative to any systematic within-person change that may be occurring. To the extent that this might be the case, it is worth considering whether an individual's within-person variability might be used to help calibrate the magnitude of his or her change (Salthouse, Kausler, & Saults, 1986). Because some of the participants in this study had performed several of the tasks 3 years earlier, we were able to compare different ways of evaluating longitudinal change.
The second major issue concerned the nature of within-person variability, particularly whether it is merely random noise or is a reflection of a meaningful individual difference characteristic that might be uniquely informative about the performance capabilities of the individual. One type of information relevant to this issue is the reliability of the within-person variability measures, because only if they were reliable would it be useful to characterize people as systematically differing in their degree of across-occasion variability. A second type of relevant information is the magnitude of correlations among measures of within-person variability for different cognitive variables. The rationale is that if the correlations were found to be moderately high, then the influences contributing to variability are unlikely to reflect measurement error or determinants that are specific to a particular variable.
When investigating within-person variability, researchers need to consider two aspects: the number of occasions of measurement and the type of assessment in each occasion. For the first aspect, a minimum of two occasions is needed to evaluate within-person variability, but if a researcher is interested in attempting to determine the asymptotic level of performance, then 50 or more sessions may be required (e.g., Kliegl, Smith, & Baltes, 1989; Salthouse & Somberg, 1982). The optimum number of occasions obviously depends on the specific goals of the study, because if a researcher is interested in decomposing the variability in terms of factors related to learning, or to cyclical variations, then he or she will need a relatively large number of occasions. However, pragmatic considerations usually mean that there is a trade-off between the number of occasions available from each individual and the number of individuals with some estimate of within-person variability. Because our earlier work (Nesselroade & Salthouse, 2004) indicated that three occasions provided sufficient information to warrant comparisons of individual differences in measures of within-person variability, we had each participant in the current study perform the tests on three occasions.
The second aspect that researchers need to consider in studies of within-person variability is how they can be confident that the assessments on different occasions are equivalent, and that any variation observed from one occasion to the next is not attributable to differences in the particular items or versions that are administered on a given occasion. This is seldom a concern with reaction-time tasks and certain memory tasks in which the trials or items are either identical or very similar, and are sampled randomly within and across occasions. However, version equivalence is more of a concern with tests of other types of cognitive abilities, and there are a number of possible solutions to this problem. One approach is to use exactly the same version of the tests on each occasion. Although this obviously eliminates version differences, it may lead to an underestimate of within-person variability if at least some of the performance on later occasions is determined by one's memory for specific items from earlier occasions, or to an overestimate if awareness of the repeated items leads to feelings of resentment or boredom. Another approach is to use different versions of the tests on each occasion, and merely assume that they are equivalent without any explicit evaluation. However, to the extent that the versions are not truly equivalent, some of the observed variability may be attributable to version differences rather than to fluctuations within the individual. What might be the ideal approach would be to use different versions on each occasion that have been precisely equated by means of a procedure such as item response theory. Although this method would ensure that none of the across-occasion variation is attributable to differences in the tests, researchers would need a considerable amount of data to determine the characteristics of each item prior to conducting the research of primary interest. 
A compromise approach, and the one we adopt in this study, is to use different versions of the tests on each occasion, but to collect data from another sample of individuals who performed the versions in counterbalanced order to allow the difficulty levels of the different versions to be adjusted statistically.
| METHODS |
|---|
Because the test versions could have differed in difficulty, we conducted a preliminary study in which 60 young adults performed the three versions of each test in different orders. That is, 10 individuals each performed the tests in the six possible orders (O-A-B, O-B-A, etc.). The results from the preliminary study revealed that there were significant version differences in most of the tests, and thus we used the following procedure to adjust the scores on the A and B versions of each test to approximately match the mean of the O version. First, we computed linear regression equations from the data in the preliminary study to predict the scores on the O version of the test from the score on either the A or the B version. Second, we used the intercept and slope parameters from these equations to create adjusted A and B scores for each participant in the current study. To illustrate, the regression equation predicting the digit symbol score on the O version from the score on the A version based on the data from the preliminary study was DSO = 13.45 + 0.83(DSA). The application of these parameters to a participant in the current study with a score of 75 on version A would therefore result in an adjusted version A score of 75.7, or 13.45 + 0.83(75). Our rationale for the adjustment procedure was that the results from a study in which the test versions were administered in counterbalanced order can be used to equate the difficulty of the new versions to that of the original version (see endnote).
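The adjustment described above can be sketched in a few lines. This is only an illustration, not the code used in the study: the function names are hypothetical, and an ordinary least-squares fit is assumed for the regression of O-version scores on A- or B-version scores. The intercept and slope in the usage example are the digit symbol values reported in the text.

```python
from statistics import mean


def fit_version_adjustment(alt_scores, original_scores):
    """Fit a least-squares line predicting O-version scores from
    alternate-version (A or B) scores; returns (intercept, slope).
    Hypothetical helper, assuming an ordinary least-squares fit."""
    mx, my = mean(alt_scores), mean(original_scores)
    sxx = sum((x - mx) ** 2 for x in alt_scores)
    sxy = sum((x - mx) * (y - my)
              for x, y in zip(alt_scores, original_scores))
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, slope


def adjust_score(score, intercept, slope):
    """Express an alternate-version score on the O-version scale."""
    return intercept + slope * score


# Digit symbol parameters reported in the text: DSO = 13.45 + 0.83(DSA)
adjusted = adjust_score(75, intercept=13.45, slope=0.83)
print(round(adjusted, 1))  # 75.7
```

In practice the parameters would be estimated once per test and version from the calibration sample and then applied unchanged to every participant in the primary sample.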
| RESULTS |
|---|
The values in Table 4 indicate that there is considerable within-person variability in each of the measures of cognitive functioning. Moreover, the ratios of within-person to between-person variability were actually somewhat larger for the vocabulary, fluid cognition, and episodic memory variables than for the perceptual speed variables that are most similar to the types of variables examined in prior studies of within-person variability. Another point to note in this table is that people differ in the magnitude of their within-person variability. That is, the (between-person) standard deviations of the within-person (across-occasion) standard deviations, reported in the third column of Table 4, are all moderately large. The variation across occasions is therefore not simply a reflection of a situation in which everybody is affected to the same degree.
Table 5 contains correlations of the within-person standard deviation and mean across the three occasions, and the correlations of these variables with age before and after control of the other variable. The values in the second column indicate that most of the correlations between an individual's mean and his or her across-session variability were negative, which indicates that better (higher) performance was associated with smaller across-session variability.
We investigated the question of whether people can be characterized as more or less variable across different types of cognitive tests by computing correlations of the within-person standard deviations for the 13 cognitive tests. The correlations ranged from -.15 to +.48, but the median was only .05, and it was still only .09 when we ignored the sign of the correlation. We also computed correlations after partialling age from both variables, and in subsamples with a narrower range of ages. The median age-partialled correlation was .05, and the median correlations for participants 18 to 39, 40 to 59, and 60 to 97 years of age were, respectively, .06, .01, and .11.
One possible reason for the low correlations is weak reliability of the within-person variability measures. We investigated this possibility by obtaining estimates of the reliability of the within-person standard deviations by using the standard deviations from scores on different pairs of sessions as the "items" in the coefficient alpha. That is, for each individual, we computed three standard deviations for each test variable based on the scores in sessions 1 and 2, the scores in sessions 1 and 3, and the scores in sessions 2 and 3, and we then treated these three standard deviations (each based on two scores) as items in the computation of the coefficient alpha.
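The reliability procedure just described can be sketched as follows. This is a minimal illustration under stated assumptions: the sample (n − 1) standard deviation is assumed for each pair of sessions, the standard formula for coefficient alpha is used, and the session scores and function names are hypothetical rather than taken from the study.

```python
from itertools import combinations
from statistics import stdev, variance


def pairwise_sds(session_scores):
    """Standard deviations for sessions (1, 2), (1, 3), and (2, 3).
    Assumes the sample (n - 1) standard deviation."""
    return [stdev(pair) for pair in combinations(session_scores, 2)]


def coefficient_alpha(item_rows):
    """Coefficient alpha for rows of item scores (one row per person)."""
    k = len(item_rows[0])
    item_vars = [variance([row[j] for row in item_rows]) for j in range(k)]
    total_var = variance([sum(row) for row in item_rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical scores on one test for five people across three sessions
scores = [
    (38, 40, 41),
    (30, 39, 35),
    (22, 23, 22),
    (45, 37, 41),
    (28, 29, 33),
]
items = [pairwise_sds(person) for person in scores]
print(round(coefficient_alpha(items), 2))
```

Because the three pairwise standard deviations are computed from the same three scores, they are not independent items, so the resulting alpha is best read as a rough index of consistency.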
The reliability estimates derived in this manner ranged from .42 to .71, with a median of .59. Although these values are lower than the psychometric standard of .70, they nevertheless indicate that the measures of within-person variability were sufficiently reliable to sustain meaningful correlational patterns if they were to exist in the data. What is most important is that, when we adjusted the correlations among the within-person standard deviations for these estimates of reliability, they were still quite small, with a median of only .10. At least on the basis of these results, therefore, it does not appear that individuals who exhibit large across-occasion variability in one cognitive measure are any more likely than the average individual to exhibit large across-occasion variability in other cognitive measures.
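The text does not state which formula was used to adjust the correlations for reliability. A common choice, assumed here purely for illustration, is the classical correction for attenuation, in which the observed correlation is divided by the square root of the product of the two reliabilities. The function name and the input values are hypothetical, chosen to be of the same order as the medians reported above.

```python
from math import sqrt


def disattenuate(r_observed, reliability_x, reliability_y):
    """Classical correction for attenuation (assumed here; the adjustment
    actually applied in the study is not specified in the text)."""
    return r_observed / sqrt(reliability_x * reliability_y)


# Illustrative inputs only: an observed correlation of .05 between two
# variability measures, each with a reliability of .59
print(round(disattenuate(0.05, 0.59, 0.59), 3))
```

With reliabilities in the range reported here, the correction increases a correlation by a factor of roughly 1.4 to 2.4, so small observed correlations remain small after adjustment.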
We also conducted an exploratory factor analysis on the within-person standard deviations. As we would expect from the low correlations, the factor analysis did not reveal much evidence of structure. The correlation matrix had six eigenvalues greater than one, and thus we extracted six factors (iterative principal axes) and rotated them by means of promax. There were three variables with loadings greater than .4 on the first factor (i.e., spatial relations, paper folding, and paired associates), two on the second factor (i.e., synonym vocabulary and antonym vocabulary), four on the third factor (i.e., vocabulary, logical memory, recall, and paired associates), two on the fourth factor (i.e., letter comparison and recall), and one each on the fifth (i.e., matrix reasoning) and sixth (i.e., pattern comparison) factors. The relatively large number of factors with modest loadings of the variables on each factor, and the diversity of the variables loading on each factor, suggest that there is little or no structure in the available measures of within-person variability. This finding is substantially different from that observed in analyses of the means, because there is clear evidence of a strong structure among the means of these variables (e.g., Salthouse, 2004; Salthouse & Ferrer-Caja, 2003).
The other major issue of interest in this study concerned the relation between short-term variability and longitudinal change. Because people vary in the magnitude of short-term within-person variability, it is possible that the same absolute value of longitudinal change could have quite different meanings for different people. For some individuals the change might be well within their normal range of fluctuation, but for others it might represent an extreme value. We could examine this possibility in the current data because 18 of the participants (age at first assessment, M = 57.1) had performed the original version of six of the tests 3 years earlier. The tests common across the 2001 and 2004 assessments were Vocabulary, Picture Vocabulary, Digit Symbol, Logical Memory, Word Recall, and Paired Associates tests. This sample is too small for meaningful statistical analyses, but it is useful for illustrating the point that change in the original units of measurement may not have the same functional meaning for different people.
The longitudinal patterns were generally similar for each variable and can be illustrated with the results from the Word Recall test. The values for individual participants on this variable are portrayed in Figure 2, with the solid circles indicating the 2001 score and the open circles with bars representing the mean and standard deviation, respectively, of the three scores in 2004. An inspection of the figure indicates that, for many of the individuals, the scores were higher at the later assessment. Although these gains could reflect true improvements in ability, we suspect that a substantial proportion of the performance gains are attributable to retest effects (e.g., Ferrer, Salthouse, Stewart, & Schwartz, 2004; Salthouse, Schroeder, & Ferrer, 2004).
This point can be illustrated by considering two individuals in Figure 2 who were 44 and 46 years of age in 2001. These two individuals had similar word recall scores of 38 and 36 in 2001, and similar means across the three assessments in 2004 of 40 and 39, respectively. However, their across-occasion standard deviations in 2004 were 0.7 and 4.9, respectively, which indicates that the slightly larger absolute difference for the 46-year-old individual (i.e., 3 vs 2) was actually substantially smaller than that of the 44-year-old when it is expressed in within-person standard deviation units (i.e., 0.6 vs 2.9).
The preceding example suggests that, depending on the method used to calibrate change, different conclusions might be reached about the magnitude and correlates of change. In order to examine this issue more systematically, Table 6 contains information relevant to different methods of calibrating change for the six variables with longitudinal data. In addition to containing the means and between-person standard deviations for the 2001 (Time 1, or T1) score, the mean across the three 2004 (Time 2, or T2) scores, and the standard deviation of the T2 scores, Table 6 also summarizes three ways of expressing the within-person change from T1 to T2. The simplest and most frequently used method of representing change is the difference between the relevant scores at each occasion (i.e., the mean of the T2 scores minus the T1 score) in the original units of measurement. A second method is the difference scaled in T1 between-person standard deviation units (e.g., Ivnik et al., 1999; Schaie, 1996), and a third method is the difference relative to each individual's T2 within-person standard deviation (Salthouse et al., 1986).
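The three ways of expressing change can be written out as a short sketch. The function names are hypothetical. The individual values are those given in the text for the two Word Recall participants who were 44 and 46 years of age in 2001; the between-person standard deviation is a made-up placeholder, because the Table 6 value is not reproduced here.

```python
def raw_change(t1, t2_mean):
    """Change in the original units of measurement."""
    return t2_mean - t1


def change_in_between_person_sd_units(t1, t2_mean, t1_between_sd):
    """Change scaled by the T1 between-person standard deviation."""
    return (t2_mean - t1) / t1_between_sd


def change_in_within_person_sd_units(t1, t2_mean, t2_within_sd):
    """Change scaled by the individual's own T2 within-person SD."""
    return (t2_mean - t1) / t2_within_sd


# Values from the text: 2001 score, 2004 mean, 2004 within-person SD
person_44 = {"t1": 38, "t2_mean": 40, "t2_within_sd": 0.7}
person_46 = {"t1": 36, "t2_mean": 39, "t2_within_sd": 4.9}

HYPOTHETICAL_BETWEEN_SD = 6.0  # placeholder, not a value from Table 6

for person in (person_44, person_46):
    raw = raw_change(person["t1"], person["t2_mean"])
    between = change_in_between_person_sd_units(
        person["t1"], person["t2_mean"], HYPOTHETICAL_BETWEEN_SD)
    within = change_in_within_person_sd_units(
        person["t1"], person["t2_mean"], person["t2_within_sd"])
    print(raw, round(between, 2), round(within, 1))
```

The last column reproduces the values of 2.9 and 0.6 within-person standard deviation units cited for these two individuals, and shows that the ordering of the two people reverses relative to the raw differences of 2 and 3.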
| DISCUSSION |
|---|
The existence of within-person variability in cognitive functioning has important implications for the evaluation of change. Most contemporary researchers assess change in the original units of measurement, and thus they implicitly assume that the units have the same meaning for everyone. A number of researchers have attempted to evaluate individual change relative to the variability that exists across people (e.g., Ivnik et al., 1999; Schaie, 1996), but this is not ideal because between-person variability is only a crude approximation of within-person variability. Salthouse and colleagues (1986) proposed that researchers might obtain more sensitive assessments of change by expressing the change for each individual relative to his or her own across-occasion variability. This method is analogous to the computation of an effect size for each individual, and it shares the property that normal variability is taken into account when the magnitude of an effect is specified.
Conclusions about the magnitude of change, about between-person variation in change, and about correlates of change will therefore vary depending on how the change is assessed. The optimal method to be used in assessing change will obviously depend on the specific question of interest. For example, a comparison in the absolute units of measurement may be more meaningful if the variable is scaled in a ratio level of measurement such as units of time, or if it represents progress toward an absolute criterion. However, change calibrated relative to each individual's within-person variability may be more meaningful if the goal is to investigate change in units that are functionally equivalent in different people.
Although large individual differences in amount of within-person variability are evident in every variable in Tables 4 and 6 and in Figure 2, there were no significant correlations between age and the measures of within-person variability after we adjusted for influences associated with the mean. This pattern is similar to that recently reported by Salthouse and Berish (2005) in several analyses, and it seems to suggest that information about an individual's short-term variability may not have any unique predictive power beyond what is available from his or her mean level of performance. However, it should be noted that in samples of older adults, Hultsch and colleagues (2000) and Rabbitt and associates (2001) have reported that within-person variability in reaction time was correlated with level of performance in other cognitive tasks. Therefore, it may be the case that within-person variability only provides unique information with variables assessing performance speed rather than performance accuracy, or in samples of individuals likely to be experiencing substantial change in level of cognitive functioning.
In our study there was also little evidence of structure in the measures of within-person variability either in the raw correlations or in the exploratory factor analysis, before or after we adjusted for reliability of the measures. This pattern suggests that influences contributing to across-occasion variability in these variables are specific to particular variables and are not shared across different variables, even those assumed to reflect the same cognitive ability. In other words, even though the reliability estimates suggest that within-person variability is not simply random fluctuation, there is no evidence in these data that people who exhibit high across-occasion variability for one cognitive variable exhibit high across-occasion variability for other cognitive variables. Li and associates (2001) also failed to find much evidence of structure among the measures of within-person variability for several memory measures assessed across 25 occasions. Important questions for future research are the determination of what is responsible for the across-occasion variability in cognitive performance, and why there are such weak relations among the within-person variability measures from different cognitive tasks.
In conclusion, there is now considerable evidence that calls into question the adequacy of the classical notion of a fixed true score as an ideal focus of measurement efforts. Theoretical concepts and analytical methods should therefore reflect this shift of thinking if progress is to be made in describing, measuring, and explaining behavior and behavior change.
| Acknowledgments |
|---|
| Footnotes |
|---|
Although it would have been desirable to use individuals of the same age and ability range as those in the primary study for this calibration study, college students were more readily available and could be compensated with credit towards a course requirement rather than with money. These young adults had somewhat higher average levels of performance than the age-heterogeneous sample in the primary study on many of the variables. However, most of the relations between the predictor (i.e., version A or B) and criterion (i.e., version O) variables in the primary sample were linear (i.e., for the linear relation, median R² = .532; for the quadratic relation, median R² = .004; and for the cubic relation, median R² = .002), indicating that it is reasonable to extrapolate from one region of the distribution to the entire distribution.
Received for publication June 15, 2005. Accepted for publication October 10, 2005.
| References |
|---|