The Journals of Gerontology Series B: Psychological Sciences and Social Sciences 61:S52-S56 (2006)
© 2006 The Gerontological Society of America


BRIEF REPORT

Test–Retest Reliability of Subclinical Status for Functional Limitation and Disability

Douglas K. Miller1, Elena M. Andresen2, Theodore K. Malmstrom3, J. Philip Miller4 and Fredric D. Wolinsky5

1 Center for Aging Research, Indiana University, and Regenstrief Institute, Inc., Indianapolis.
2 Health Services R&D Service, Department of Veterans Affairs Medical Center, and College of Public Health and Health Professions, University of Florida, Gainesville.
3 Department of Psychiatry, School of Medicine, Saint Louis University, St. Louis, Missouri.
4 Division of Biostatistics, Washington University School of Medicine, St. Louis, Missouri.
5 College of Public Health, University of Iowa, and Iowa City VAMC, Iowa City.

Address correspondence to Douglas K. Miller, MD, IU Center for Aging Research, 1050 Wishard Blvd., RG-6, Indianapolis, IN 46202. E-mail: dokmille@iupui.edu


    Abstract
Objectives. Subclinical status for functional limitation and disability helps explain pathways to difficulties with functional limitation and disability, but data on its measurement stability are minimal. We evaluated the reproducibility of measuring subclinical status in a random subset of 92 community-dwelling St. Louis African Americans aged 49 to 65 years.

Methods. We examined test–retest reliability of subclinical status using Fried's measurement method of changing either the frequency or method of task performance for five functional limitations, three basic activities of daily living (ADLs), and four instrumental ADLs, as well as summary scales reflecting these three constructs. We also performed sensitivity analyses of test–retest interval and alternative definitional approaches (using only method, only frequency, or both).

Results. Weighted kappas for individual tasks across three performance levels (high functioning, subclinical status, and task difficulty) indicated moderate agreement for one task and substantial agreement for 11 tasks. Intraclass correlation coefficients for the three scales demonstrated outstanding agreement. The most reproducible definition of subclinical status involved the either/or method.

Discussion. Excellent test–retest reproducibility was demonstrated in this population-based sample of late middle-aged African Americans using Fried's method of measuring subclinical status.

Functional limitation and disability are core concepts in health status and health-related quality-of-life trajectories among older adults (Andresen, Rothenberg, & Zimmer, 1996; Wolinsky & Miller, in press). Functional limitation generally refers to reported difficulty performing muscular-skeletal activities (e.g., going up and down stairs, walking a half mile), while disability reflects reported difficulty performing basic activities of daily living (ADLs) and instrumental ADLs (IADLs; Wolinsky & Miller). As originally proposed by Fried and colleagues (Fried, Herdman, Kuhn, Rubin, & Turano, 1991), the concept of subclinical status (or "preclinical disability") is useful in understanding the earlier stages of functional limitation and disability. Theoretically, the ascertainment of subclinical status prior to progression to self-reported difficulty should provide an early warning system that can be successfully used to promote functional recovery and prevent the onset of task disability, which is crucial in the context of the Institute of Medicine's (2003) recent emphasis on disability prevention as a priority area for improving health care quality (Wolinsky, Miller, Andresen, Malmstrom, & Miller, 2005a). Although several approaches to operationalize subclinical status have been proposed (e.g., Binder et al., 2002; Hazuda, Gerety, Lee, Mulrow, & Lichtenstein, 2002), an easy and frequently used method is Fried's (Fried et al., 1996). In it, subclinical status is said to exist if the subject reports no difficulty performing a task but reports having modified his/her task performance either in terms of its frequency or method of performance.

Using this definition, subclinical status for functional limitations, ADLs, and IADLs has been shown to be prevalent and to be highly predictive of subsequent incident difficulty in samples of community-dwelling individuals either in or near their senior years. Fried and colleagues (1996) demonstrated subclinical status prevalence ranging from 2% to 33% for 27 tasks in a convenience sample of 231 adults aged 59 to 90 years. We showed subclinical prevalence among subjects who reported no task difficulty ranging from 10% to 40% for five functional limitations, three ADLs, and four IADLs in a population-based sample of 998 urban-dwelling African Americans aged 49 to 65 years. Moreover, both Fried and colleagues (Fried, Bandeen-Roche, Chaves, & Johnson, 2000) and we (Wolinsky et al., 2005a) have also shown that task-specific subclinical status is a consistent and strong predictor of the subsequent development of reported task difficulty, even after adjusting for relevant covariates.

Despite these useful attributes of subclinical status for functional limitations and disability, information on the reliability of its measurement is sparse. The only published data that we are aware of come from the Fried and colleagues (1996) study. In that report, the test–retest reliability of five tasks (dressing, preparing meals, self-managing medications, grasping and handling small objects, and walking a half mile) was assessed in 93 subjects selected at random from the sample of 231. Kappas ranged from 0.74 to 0.86. Despite the innovative nature of this work, Fried and colleagues' results have important limitations. Because it was not feasible to have their subjects return for reevaluation, the retest interval they used was only several hours long, increasing the possibility of recall and short-term learning effects and thus of reliability estimates that were biased high. In addition, only five tasks were evaluated in a convenience sample of primarily Caucasian subjects.

In this study, we extended Fried and colleagues' (1996) work using a population-based, somewhat younger sample from a single, important minority group (African Americans) coupled with a more appropriate retest interval (DeVellis, 2003) to examine the reliability of measuring subclinical status for 12 tasks. We also conducted sensitivity analyses to determine whether alternative definitional approaches to subclinical status (change in method only, change in frequency only, or change in both) would be equally or more reproducible than the original either/or approach. If so, the need to ask subjects about both method and frequency modifications might be reduced, assuming that the new definition also retained the predictive validity of the original definition.


    METHODS
Study sample
The African American Health (AAH) study is a longitudinal population-based cohort of African Americans from metropolitan St. Louis, Missouri. The basic study design has been described previously (Miller et al., 2004; Wolinsky, Miller, Andresen, Malmstrom, & Miller, 2004). In brief, 998 age- and race-eligible subjects born between 1936 and 1950 were recruited from two geographic strata that were designed to maximize the socioeconomic differences between them. One stratum involved the same poor inner-city area that served as the catchment area for an earlier study of older African Americans (Miller et al., 1996), and the other involved more affluent, near-northwest suburbs. The recruitment proportion was 76% (77% in the inner city and 75% in the suburbs). In addition to race and age, exclusion criteria included a Mini-Mental State Exam (Folstein, Folstein, & McHugh, 1975) score of less than 16 (Molloy et al., 1996) and living in an institution (nursing home or assisted living center). All AAH procedures were approved by the Institutional Review Board at the supervising institutions.

Fifty interviewers conducted an initial in-home interview and functional assessment averaging about 2.5 hours in length on the 998 participants between September 2000 and July 2001. In a substudy, 114 of the 998 subjects were randomly selected for retesting, and 92 (81%) completed the in-home repeated assessments. For 80 of these subjects, interviewers were matched for the two interviews, and 12 subjects had a different interviewer at retest than at test. Questions in the substudy included the difficulty and subclinical status questions for 12 tasks (five functional limitations, three ADLs, and four IADLs). There were no statistically significant differences between subjects participating in the substudy and other participants in the main study in terms of age, gender, income, education, 11 self-reported medical conditions, or reported level of difficulty (none, subclinical status, or difficulty) for any of the 12 tasks addressed in this article, with the exception of preparing meals and congestive heart failure. For preparing meals, 73% of the subjects in the retest subset had high functioning, 13% had subclinical status, and 14% had difficulty versus 76%, 19%, and 5%, respectively, in the other main study participants (p = .002). Twelve percent of the retest group reported congestive heart failure, while only 5% of those in the main study did (p = .004). Given that 30 comparisons were performed, these two differences were most likely due to chance. The interval between test and retest was 5 to 45 days (M = 18, median = 19, interquartile range = 13–22). For the most part, these intervals fit within the range identified as optimal to minimize subject and interview bias (several days to several weeks; DeVellis, 2003).

Functional limitation, ADL, and IADL items
Tasks addressed included five functional limitations (stoop, crouch, or kneel; lift and carry 10 pounds; go up and down 10 steps; grasp or handle ["for example, picking up a dime from the table"]; and walking a half mile), three ADLs (bathing or showering, dressing, and getting in and out of bed or chairs), and four IADLs (preparing meals, performing light housework, performing heavy housework, and managing medications). We ascertained task difficulty using the wording and approach of the Second Longitudinal Study on Aging (LSOA-II; National Center for Health Statistics, 1998). For example, we asked subjects, "Because of a health or physical problem do you have any difficulty bathing or showering?" Subjects who reported any difficulty or inability to perform a task were considered to have "difficulty." We then asked subjects who reported no difficulty performing a task whether, because of a health or physical problem, they had altered either the method or frequency of task performance since age 40. Subjects who reported neither difficulty nor task modification were considered to have "high functioning," and those who reported no difficulty but either modified the method or reduced the task frequency were determined to have "subclinical status" for that task. Subjects could also indicate that they did not perform the task for reasons other than health or physical problems, and this opportunity existed at both the test and retest administration. We excluded subjects who responded, "Don't do for other reasons" at the time of either test or retest from the reliability assessment (see Table 1 for number of subjects included for each task). Thus, for each item, we placed each subject into one of three mutually exclusive and exhaustive functional categories: high functioning, subclinical status, or difficulty.
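The three-level categorization described above can be expressed as a short decision rule. The following is a minimal sketch in Python, not the study's own scoring code; the function name, argument names, and category labels are hypothetical and chosen only for illustration.

```python
from typing import Optional

HIGH_FUNCTIONING = "high functioning"
SUBCLINICAL = "subclinical status"
DIFFICULTY = "difficulty"


def classify_task(has_difficulty: bool,
                  changed_method: bool = False,
                  changed_frequency: bool = False,
                  not_done_other_reasons: bool = False) -> Optional[str]:
    """Return the functional category for one task at one interview.

    Returns None when the subject does not perform the task for reasons
    other than health or physical problems (excluded from reliability
    assessment for that task).
    """
    if not_done_other_reasons:
        return None
    # Any reported difficulty or inability takes precedence; the
    # modification questions were asked only of subjects without difficulty.
    if has_difficulty:
        return DIFFICULTY
    # Either/or rule: a change in method OR in frequency is sufficient.
    if changed_method or changed_frequency:
        return SUBCLINICAL
    return HIGH_FUNCTIONING
```

Under this sketch, a subject who reports no difficulty bathing but bathes less often because of a health problem would be classified as having subclinical status for that task.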


Table 1. Test–Retest Results for Three Levels of Disability or Functional Limitation (High Functioning, Subclinical Status, and Difficulty) in the St. Louis African American Health Cohort.

 
Analysis
We assessed retest reliability for the three-level categorization for each individual task by weighted kappa (κ; Armstrong, White, & Saracci, 1992) according to Cohen's method (Cohen, 1968; Fleiss, 1981). In addition, we computed scales separately for the five functional limitation items, the three ADL items, and the four IADL items using a simple sum of items in the scale and the following category weights: high functioning = 0, subclinical status = 1, and difficulty = 2. We examined retest reliability of the scales using a two-way random-effects intraclass correlation coefficient (ICC) for the summary scale scores (Fleiss; Statistical Package for the Social Sciences [SPSS], 2004). Fleiss and Cohen (1973) have shown that the ICC is mathematically equivalent to kappa and weighted kappa. Therefore, we interpreted both kappa and ICC levels using established conventions for kappa, in which reliability values ranging from 0.40 to 0.59 may be considered moderate, 0.60 to 0.79 substantial, and ≥ 0.80 outstanding (Landis & Koch, 1977). When possible, we calculated 95% confidence intervals (95% CI) for each kappa and ICC to estimate their precision based on the size of the retest sample (Fleiss).
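For readers who wish to reproduce this type of analysis, the following Python sketch illustrates the calculations named above. It is not the SPSS procedure used in the study, and several details are assumptions: the text does not state whether linear or quadratic disagreement weights were used for kappa (the sketch defaults to linear), nor which form of the two-way random-effects ICC was reported (the sketch computes the single-measure, absolute-agreement form). All function names are hypothetical, and the label returned for values below 0.40 is ours.

```python
from typing import Sequence

# Category weights as stated in the text.
HIGH, SUBCLINICAL, DIFFICULTY = 0, 1, 2


def weighted_kappa(test: Sequence[int], retest: Sequence[int],
                   n_categories: int = 3, quadratic: bool = False) -> float:
    """Cohen's weighted kappa for paired ratings coded 0..n_categories-1."""
    if len(test) != len(retest) or len(test) == 0:
        raise ValueError("test and retest must be non-empty and equal in length")
    n, k = len(test), n_categories
    counts = [[0] * k for _ in range(k)]
    for a, b in zip(test, retest):
        counts[a][b] += 1
    row = [sum(counts[i]) for i in range(k)]
    col = [sum(counts[i][j] for i in range(k)) for j in range(k)]

    def disagreement(i: int, j: int) -> float:
        d = abs(i - j) / (k - 1)
        return d * d if quadratic else d  # weighting scheme is an assumption

    cells = [(i, j) for i in range(k) for j in range(k)]
    observed = sum(disagreement(i, j) * counts[i][j] for i, j in cells) / n
    expected = sum(disagreement(i, j) * row[i] * col[j] for i, j in cells) / (n * n)
    if expected == 0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return 1.0 - observed / expected


def scale_score(item_codes: Sequence[int]) -> int:
    """Simple sum of item codes (0, 1, or 2 per item) within one scale."""
    return sum(item_codes)


def icc_two_way_random(test: Sequence[float], retest: Sequence[float]) -> float:
    """Single-measure, absolute-agreement ICC for two occasions (assumed form)."""
    if len(test) != len(retest) or len(test) < 2:
        raise ValueError("need at least two subjects with paired scores")
    n, k = len(test), 2
    grand = (sum(test) + sum(retest)) / (n * k)
    row_means = [(a + b) / k for a, b in zip(test, retest)]
    col_means = [sum(test) / n, sum(retest) / n]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for x in list(test) + list(retest))
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)


def interpret(value: float) -> str:
    """Bands used in the text; the label below 0.40 is our own."""
    if value >= 0.80:
        return "outstanding"
    if value >= 0.60:
        return "substantial"
    if value >= 0.40:
        return "moderate"
    return "below moderate"
```

In this sketch, each subject's item codes at test and at retest would be passed to `weighted_kappa` task by task, while `scale_score` totals for each of the three scales would be passed to `icc_two_way_random`. Confidence intervals are not computed here.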


    RESULTS
The 92 subjects in the retest group had an average age of 56.6 years (SD = 4.6, range = 50–65), 63% were women, 38% were divorced or separated, 16% were widowed, and 14% were single. Other characteristics of the retest group have been reported previously (Andresen, Malmstrom, Miller, Miller, & Wolinsky, 2005; Wolinsky, Miller, Andresen, Malmstrom, & Miller, 2005b) and are not repeated here.

Depending on the task, subclinical status was somewhat more prevalent (e.g., bathe, light housework) or less prevalent (e.g., heavy housework, walk a half mile) than difficulty and ranged from 10.8% for managing medications to 23.1% for lift and carry 10 pounds (Table 1). Kappas across the three levels for the individual tasks ranged from 0.40 for managing medications to 0.78 for going up and down 10 steps. Kappas indicated moderate agreement for one task and substantial agreement for 11 tasks. ICCs for the three scales (functional limitations, ADL, and IADL) were higher than the individual item kappas and showed outstanding agreement in each case.

To examine the potential effect of the test–retest interval on the results, we examined the item and scale reliability separately in the group whose test–retest interval was 5 to 18 days and in the group with a 19- to 45-day interval. Including all individual items, kappas for the shorter interval group averaged 0.045 higher than for the longer interval group, but kappas for the three domain summary scores were 0.023 higher for the longer interval group (data not shown; available from first author on request).

We also examined the reasons for subclinical status determination (changed method only, changed frequency only, or changed both) at both the test and retest assessments for respondents in the retest subgroup who had no difficulty for that task at either assessment. These analyses were performed to investigate (a) which of the two factors (changed method or changed frequency) was the more common reason for subclinical status determination, (b) whether the reason for subclinical status was stable from test to retest, and thus (c) whether one of the two factors could be dropped without reducing reliability. Across the 12 items taken together, change in frequency was the more common reason for subclinical status, but this varied across the items and generally in content-appropriate ways (Table 2). For example, change in frequency was by far the more frequent reason for subclinical status determination for light housework, but change in method was more common for managing medications. However, when kappas could be determined for agreement regarding which reason caused a subclinical status determination (seven items), they ranged from 0.17 to 0.66. These kappas were substandard for five tasks and moderate for two tasks.


Table 2. Consistency of Subclinical Status (SCS) Classification by Report of Change (Δ) in Method, Frequency, or Both for Respondents in the Retest Group with no Task Difficulty at Either Evaluation.

 
Finally, we examined the reason (method or frequency) for subclinical status classification in those subjects with subclinical status for each specific task at both interviews. Although the sample sizes were small, it was clear that the reported reason that caused subclinical status determination varied for some subjects from test to retest.


    DISCUSSION
This study has generally confirmed excellent reliability of measuring subclinical functional limitation and disability when using the method proposed by Fried and colleagues (1996) of permitting either change in method or change in frequency (or both) to indicate subclinical status. The only task with less than substantial agreement was managing medications, and even that item met acceptable reliability standards (κ = 0.40). The reliability was especially strong when the items were combined into scales according to their functional category. Moreover, by demonstrating this finding for 12 tasks in a population-based group of community-dwelling African Americans who were about 17 years younger on average than Fried's sample, we have extended their findings to a representative sample of different age and race, to a larger group of functional tasks, and to a more appropriate test–retest interval.

Kappas in our sample averaged 0.14 lower for the five tasks that Fried and colleagues (1996) also examined. The biggest difference related to managing medications, for which the reliability in our study was 0.40 versus 0.86 in theirs. If that one item is removed, then the difference in reliability for the other four items between the two studies is only 0.07. The somewhat lower reliability in our study is plausible given that the shorter test–retest interval in their investigation would permit greater recall of prior responses.

Although our examination of the specific reasons for subclinical status determination was limited somewhat by the relatively small number of subjects in each category (Table 2), the results strongly suggest that both the change-in-method and change-in-frequency questions must be used to determine subclinical status, for two reasons. First, the factor that resulted in subclinical status determination varied across tasks. Second, the reason for subclinical status (change in method or change in frequency) varied from test to retest even within each task, so that the either/or combined measure produced excellent reliability, whereas either factor on its own would have generated a less reliable measure. These data suggest that the distinction between change in method and change in frequency may not have been entirely clear to the subjects. Qualitative investigation of subjects' interpretation of and response to these questions, for example using cognitive interviewing techniques (Krause, 2002), would help clarify these issues.

Potential limitations of our investigation should be kept in mind. We examined the test–retest reliability for only 12 tasks in a single minority population of restricted age in a single locale. Moreover, while most of the retest interviews were performed by the same interviewer as the original interview, in 12 cases interviewers differed between test and retest assessments. Despite these potential limitations, the consistency of our findings both internally and in comparison with the findings of Fried and colleagues (1996) suggests robust reliability for this method of determining subclinical status.


    Acknowledgments
 
This research was supported by a grant from the National Institute on Aging to Dr. D. K. Miller (R01 AG-10436). The opinions expressed here are those of the authors and do not necessarily reflect those of the funding agencies or academic, research, and governmental institutions involved.


    Footnotes
 
Decision Editor: Charles F. Longino, Jr., PhD

Received for publication February 9, 2005. Accepted for publication July 29, 2005.

