RESEARCH ARTICLE |
a Agency for Healthcare Research and Quality, Rockville, Maryland
b National Center for Health Statistics, Hyattsville, Maryland
John A. Fleishman, Center for Cost and Financing Studies, Agency for Healthcare Research and Quality, 2101 East Jefferson Street, Rockville, MD 20852 E-mail: jfleishm{at}ahrq.gov.
Decision Editor: Fredric D. Wolinsky, PhD
| Abstract |
|---|
Methods. Data came from the 1994/1995 National Health Interview Survey Disability Supplement. Analyses focused on 5,750 adult respondents who received help or supervision with at least one of 11 activities of daily living/instrumental activities of daily living tasks. We estimated gender and age group (18–39, 40–69, and 70+) differences in disability, using multiple-indicator/multiple-cause models, which treat functional disability as a latent trait.
Results. Nine items manifested significant DIF by age or gender; DIF was especially large for "shopping" and "money management." Without adjusting for DIF, middle-aged persons were less disabled than elderly persons, and women were less disabled than men among nonelderly persons. After adjusting for DIF, middle-aged persons did not differ from elderly persons, and gender differences within age groups were not significant.
Discussion. Comparisons of disability across sociodemographic groups need to take DIF into account. Future research should examine the causes of DIF and develop alternative question wordings that reduce DIF effects.
Functional disability refers to limitations in the performance of basic daily activities necessary to maintain personal hygiene or reside independently in the community. Functional disability includes limitations in activities of daily living (ADLs), such as bathing, dressing, and eating, and instrumental activities of daily living (IADLs), such as shopping, meal preparation, and managing finances (Katz, Ford, Moskowitz, Jackson, and Jaffe 1963; Lawton and Brody 1969). Accurate measurement of functional disability is important at both the population and individual levels. Knowledge of the prevalence and incidence of functional disability in the population is essential for anticipating demand for services and for program planning. At the individual level, functional disability is assessed to determine eligibility for participation in long-term care programs, and to assist in discharge and care planning.
In addition to information on the presence or absence of any functional disability, assessment of the severity of disability among the disabled serves important purposes. Measurement of the severity of functional disability is especially important in monitoring changes in health status for persons with chronic illnesses, whose progress may be measured in partial recovery of function instead of complete absence of limitations. Aggregate severity information can also be used for program planning and group comparisons.
Although functional disability is often measured by asking about difficulty performing ADL and IADL tasks, a tradition in this literature, beginning with Katz and colleagues (1963), distinguishes between dependence and independence. Operationally, in this tradition, disability is assessed by asking whether a person receives human or mechanical help with these tasks (Spector and Fleishman 1998). The present study falls in this tradition and measures the severity of functional disability in terms of receipt of human help or supervision with ADL and IADL tasks.
ADL and IADL assessment instruments were originally designed for elderly or chronically ill adults (Katz et al. 1963). Some national surveys, however, ask the same questions about functional disability of both elderly and nonelderly adults. Findings from the 1994 National Health Interview Survey Disability Supplement (NHIS-D) indicate that functional disability rates increase in older age groups. The proportion of people receiving help with ADLs or IADLs is about 1% for persons aged 18–29, 10% for persons aged 70–74, and 80% for those aged 95 and over (Spector, Fleishman, Pezzin, and Spillman 2000). Data from this study also show that adult women of all ages are more likely than adult men to receive help with ADLs or IADLs: 12.8% versus 2.4% for persons under age 65 and 20% versus 12% for those 65 and older.
Age and gender comparisons of the severity of disability show a slightly different pattern. Spector and Fleishman (1998) assessed severity of functional disability by measuring the number of IADL and ADL items for which a person received human help. Among an elderly population with some disability, they found that people aged 80 and older reported more severe disability than those who were younger. Elderly men also reported more severe disability than elderly women, even though elderly women were more likely to be disabled.
Despite the widespread use of ADL and IADL questions across the age spectrum, the measurement properties of these scales have not been established. Valid comparison of disability severity across age or gender groups requires that the measure be comparable in these groups. The validity of any comparison of functional disability across age or gender groups (or groups differing in other characteristics) rests on the assumption that the measure is invariant across these different groups. If extraneous factors influence people to respond differently in one group than in another, then the resulting lack of invariance confounds group comparisons. In other words, for valid group comparisons, measures should not be affected by differential item functioning (DIF; Camilli and Shepherd 1994; Millsap and Everson 1993; Thissen, Steinberg, and Wainer 1993).
In terms of measurement theory, responses to survey questions are considered to be observed indicators of an unobserved latent variable. In the present context, measured responses to questions about receipt of help with ADLs and IADLs are viewed as reflecting a latent factor of functional disability. This factor is not observed directly, but indirectly through its effect on observed responses to ADL or IADL questions. If individuals at the same level of underlying disability differ in their responses to a specific item, depending on their age, gender, or other characteristics, then the item exhibits DIF. (The term "item bias" is often used, as well. DIF is a more neutral term referring to the existence of differential response patterns in different groups, whereas item bias specifically implies invidious group comparisons resulting from DIF [Camilli and Shepherd 1994; Marshall, Mungas, Weldon, Reed, and Haan 1997].) The mere existence of a group difference in observed disability is not sufficient to demonstrate DIF, which reflects group differences in responses to an item after respondents' status on the latent factor is controlled (Millsap and Everson 1993). If DIF is present, then observed group differences will reflect something other than the latent factor. In the present case, if DIF is present, observed differences between age groups or genders in responses to ADL and IADL items will be a combination of real differences in latent disability and group-specific effects. It is, therefore, important to be able to separate true differences from measurement differences when comparing groups.
This study examines the extent to which comparisons of the severity of functional disability across age and gender groups are affected by DIF. Prior literature raises the possibility of DIF by gender and age in IADL and ADL items. Concerns about the comparability of IADL questions for men and women have existed since the 1960s, when functional disability measures were first being developed (Lawton and Brody 1969). Tasks like laundry or meal preparation, especially in cohorts of elderly persons, may represent gender-typed activities that men may not normally do, and thus men may be more likely to receive help when they attempt them (Allen, Mor, Raveis, and Houts 1993). Although evidence of gender DIF for functional disability items has been found for elderly persons (Spector and Fleishman 1998), gender DIF has not been studied among nonelderly persons. In an attempt to reduce potential gender bias when assessing functional disability, surveys often ask whether nonperformance or help is caused by a "health problem or a disability or physical, mental, or emotional problems" (Spector et al. 2000). It is not known, however, to what extent this strategy is successful in eliminating DIF.
In addition to gender DIF, age-related DIF may occur. Nonelderly persons may not respond in the same way to certain ADL/IADL questions as elderly persons. Elderly and nonelderly respondents may have different perceptions of disability, different levels of support, varying role expectations, or varying coping styles that may result in a reluctance either to report receiving assistance or to seek assistance in the first place (Groot 2000). If these tendencies are particularly strong for a subset of items, DIF may result. Physical or mental impairments that are more prevalent in certain age groups and affect only a few items may also produce DIF.
In this study, we compare three age groups (18–39, 40–69, and 70 and over) and two gender groups to assess DIF in commonly used ADL and IADL items. To gauge the potential impact of DIF, we estimate age and gender differences in underlying disability levels controlling for DIF and not controlling for DIF. We identify items that have particularly large DIF effects and compare results from models with and without these items to test the stability of our results.
| Methods |
|---|
Functional Disability Measures
Respondents were asked a series of questions about ADLs and IADLs. ADLs included bathing, dressing, eating, using the toilet, getting in or out of bed or chairs, and getting around inside the home; IADLs included preparing meals, shopping, managing money, using the telephone, doing heavy housework, and doing light housework. Respondents were asked whether they get help from another person for each ADL because of a physical, mental, or emotional problem; a separate question asked whether the person needs to be reminded or needs to have someone close by when performing each task. For each of the six ADLs, we created a dichotomous variable, which had a value of one if the person answered yes to either of these questions. For each of the six IADLs, respondents aged 18 or older were asked whether they get help or supervision from another person because of a physical, mental, or emotional problem; we created dichotomous variables that had values of one if the person received help or supervision performing each task. Use of equipment to perform any task, in the absence of human help or supervision, was not counted as being disabled.
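The coding rule described above can be sketched as follows. This is an illustration only: the function and argument names are hypothetical and are not NHIS-D variable names.

```python
def code_adl_item(gets_help: bool, needs_reminder_or_standby: bool,
                  uses_equipment: bool = False) -> int:
    """Return 1 if the respondent gets human help or supervision with an ADL."""
    # uses_equipment is accepted but deliberately ignored: equipment use
    # without human help or supervision is not counted as disability.
    return int(gets_help or needs_reminder_or_standby)


def code_iadl_item(gets_help_or_supervision: bool) -> int:
    """Return 1 if the respondent gets help or supervision with an IADL."""
    return int(gets_help_or_supervision)
```

Under this sketch, a respondent who only uses a grab bar when bathing would be coded 0 on bathing, whereas one who needs someone standing by would be coded 1.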
Multiple-Indicator/Multiple-Cause (MIMIC) Models
Initial analyses were conducted on the 10,371 adult NHIS-D respondents who received help or supervision with at least one of the 12 ADL/IADL tasks. The remaining 134,440 (93%) respondents, persons who answered "no" to all items, have no variation in their responses and consequently contribute no information regarding item performance. More important, because the focus of the analyses is on measuring the severity of disability among persons with any disability, including the large number of respondents with no disabilities in the analyses would obscure systematic variation among the disabled.
Standard DIF assessment procedures assume that all items measure a single underlying latent trait. Recommended practice is to demonstrate the existence of a single dominant dimension (Hambleton, Swaminathan, and Rogers 1991). One indication of a single dominant dimension is if the magnitude of the first eigenvalue of the item correlation matrix is large, relative to the second eigenvalue (Lord 1980). We examined eigenvalues of the interitem correlation matrix based on the 10,371 adult NHIS-D respondents who received help or supervision with at least one of the ADL/IADL tasks. Because the items were dichotomous, tetrachoric correlations were computed.
There are several approaches to measuring DIF. We used a MIMIC latent variable model. MIMIC models have been used previously to investigate DIF in depression screening scales (Gallo, Anthony, and Muthen 1994; Grayson, Mackinnon, Jorm, Creasey, and Broe 2000). The MIMIC model postulates that a latent factor gives rise to associations among several observed indicators. In the present case, IADL and ADL items are observed indicators assumed to measure an unobserved functional disability factor. In addition, the MIMIC model extends the standard factor analysis model by including observed exogenous variables that affect the latent factor. In the present instance, latent disability is regressed on age and gender. Finally, the model includes direct effects from the exogenous age and gender variables to the indicators. Thus, the MIMIC model distinguishes two ways in which group differences in ADL/IADL may manifest themselves. First, age or gender groups may be more or less disabled, which in turn affects responses to the observed indicators. Second, age or gender groups may differ in their responses to particular items, over and above any differences in disability. Such direct effects correspond to DIF; they represent systematic differences in item responses controlling for the latent factor.
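In equation form, the structure just described might be written as below. The notation is an assumption introduced here for illustration and does not come from the original analysis: $y_{ij}$ is person $i$'s observed response to item $j$, $y^{*}_{ij}$ is an underlying continuous response with threshold $\tau_j$, $\eta_i$ is latent disability, and $x_{ik}$ are the age-gender indicators.

```latex
\begin{aligned}
y_{ij} &= 1 \ \text{if } y^{*}_{ij} > \tau_j, \quad \text{otherwise } 0 \\
y^{*}_{ij} &= \lambda_j \eta_i + \sum_{k} \beta_{jk} x_{ik} + \varepsilon_{ij} \\
\eta_i &= \sum_{k} \gamma_k x_{ik} + \zeta_i
\end{aligned}
```

In this notation, the $\gamma_k$ capture group differences in latent disability, and a nonzero $\beta_{jk}$ is a direct effect of group $k$ on item $j$, that is, DIF. A model without DIF corresponds to setting every $\beta_{jk}$ to zero.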
Many authors have used Item Response Theory (IRT) models to investigate DIF (e.g., Morales, Reise, and Hays 2000; Teresi, Kleinman, and Ocepek-Welikson 2000a). We decided to estimate MIMIC models instead for two reasons. First, it has been shown that a dichotomous factor analysis model (without exogenous covariates) is a reparameterization of, and therefore equivalent to, the standard two-parameter IRT model (McDonald 1999; Muthen and Lehman 1985; Takane and de Leeuw 1987). Second, procedures for testing DIF in an IRT framework become cumbersome when there are more than two groups. The MIMIC model has the advantage that multiple exogenous variables can be included simultaneously.
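The equivalence noted above can be made concrete with the usual conversion between the two parameterizations. The sketch below assumes a probit (normal-ogive) link, a latent factor with unit variance, and latent response variables with unit variance; the function name and the example values are hypothetical.

```python
import math
from typing import Tuple


def factor_to_irt(loading: float, threshold: float) -> Tuple[float, float]:
    """Convert a standardized loading and threshold to normal-ogive IRT parameters.

    Returns (discrimination, difficulty) for the two-parameter model.
    """
    if loading == 0.0 or abs(loading) >= 1.0:
        raise ValueError("standardized loading must be nonzero and between -1 and 1")
    # Residual standard deviation of the latent response is sqrt(1 - loading^2).
    discrimination = loading / math.sqrt(1.0 - loading ** 2)
    difficulty = threshold / loading
    return discrimination, difficulty
```

For example, a hypothetical item with loading .60 and threshold .30 would have discrimination .75 and difficulty .50 in the normal-ogive metric. High loadings therefore correspond to highly discriminating items, and high thresholds to items endorsed only at severe levels of disability.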
In the measurement part of the MIMIC model, the dichotomous ADL/IADL items were specified as indicators of a single latent disability factor. To identify the model, the loading of the toileting item on the latent factor was fixed to equal 1.0. (Preliminary analyses using IRT modeling showed no significant DIF for this item as a function of age or gender.) In the structural part of the model, the latent factor was regressed on five age-gender indicators. We classified respondents into three age groups: 18–39, 40–69, and 70 or older to differentiate young, middle-aged, and older adults. We included persons aged 65–69 in the middle-aged group because we wanted the elderly group to exclude the relatively healthy younger-old. Because we anticipated that the impact of gender-based DIF might vary in different age groups, we included interaction effects in the model. To examine each combination of age group and gender, analyses included five dummy variables (young men, young women, middle-aged men, middle-aged women, and elderly women). The reference category was men aged 70 or older. These two parts constituted the no-DIF model.
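The age-gender dummy coding described above could be set up along these lines. This is a minimal sketch; the variable and function names are hypothetical, and only the age cut points and the reference category are taken from the text.

```python
from typing import Dict

# The five indicators; men aged 70 or older are the omitted reference group.
GROUPS = ["young_men", "young_women", "middle_aged_men",
          "middle_aged_women", "elderly_women"]


def age_group(age: int) -> str:
    """Classify an adult as young (18-39), middle_aged (40-69), or elderly (70+)."""
    if age < 18:
        raise ValueError("analyses are restricted to adults aged 18 or older")
    if age <= 39:
        return "young"
    if age <= 69:
        return "middle_aged"
    return "elderly"


def age_gender_dummies(age: int, is_female: bool) -> Dict[str, int]:
    """Return the five 0/1 indicators; elderly men get all zeros."""
    label = f"{age_group(age)}_{'women' if is_female else 'men'}"
    return {group: int(group == label) for group in GROUPS}
```

With this coding, each coefficient on an indicator is interpreted as the difference between that group and men aged 70 or older.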
Next, each of the IADL and ADL items was examined individually for DIF. For each item, DIF was captured by a set of five direct effects, one from each dummy age/gender indicator to that item. A set of "forward inclusion" models was estimated, each adding the five age/gender DIF effects for one item to the no-DIF model. Items that did not manifest DIF were identified by a nonsignificant difference between the no-DIF model and the model containing DIF for that item. In view of the large sample size, this was a stringent test of the absence of DIF. These analyses validated the choice of toileting as an anchor item and identified other potential anchor items.
Next, we estimated a DIF model, which contained direct age-gender effects to all items, other than the no-DIF items identified previously. A series of "backwards elimination" models was then estimated, removing the five DIF effects from the DIF model one item at a time. We compared the chi-square for each of these models to that for the DIF model to identify items with especially severe DIF. Finally, to examine the extent to which adjusting for DIF affected estimates of age and gender differences in latent functional disability, we compared the magnitude of the direct effects of the age-gender indicators on the disability factor in the no-DIF and DIF models. We report unstandardized regression coefficients; because the estimated variance of the latent factor was .98, values of standardized coefficients were virtually identical to unstandardized ones.
The DIF and no-DIF MIMIC models were evaluated using standard criteria. A goodness of fit statistic, reflecting the discrepancy between the observed data (item means and covariances) and the model's predictions, can be compared with a chi-square distribution. However, because statistical power increases with sample size, chi-square goodness of fit tests should be viewed with caution because trivial differences often appear statistically significant. Consequently, we also examined other indicators of goodness of fit. The Comparative Fit Index (CFI) and the Tucker–Lewis Index (TLI) compare the substantive model to a baseline null model of independence among the observed variables; values of 0.95 or higher suggest acceptable fit (Hu and Bentler 1999). The root mean square error of approximation (RMSEA) assesses misfit per degree of freedom; values less than 0.08 suggest an acceptable fit, whereas values less than 0.05 suggest very good fit (Browne and Cudeck 1993).
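As a rough check on how the RMSEA relates to the chi-square statistic, the commonly cited point-estimate formula can be sketched as below. This assumes the textbook formula with N - 1 in the denominator; the exact computation used by the software is not stated in the text and may differ slightly.

```python
import math


def rmsea(chi_square: float, df: int, n: int) -> float:
    """Point estimate of RMSEA from a model chi-square, its df, and sample size."""
    if df <= 0 or n <= 1:
        raise ValueError("df must be positive and n must exceed 1")
    # Misfit per degree of freedom, floored at zero when chi_square < df.
    return math.sqrt(max(chi_square / df - 1.0, 0.0) / (n - 1))
```

Applied to the chi-square values reported in the Results with N = 5,750, this formula gives approximately .092 for the initial one-factor model, .072 for the model with residual covariances, and .067 for the no-DIF model, in line with the reported values.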
Among the subset of persons who received help with at least one task, 92% had complete data. To deal with missing data, we ran the MIMIC analyses twice. In one set of analyses, we removed any case with missing data (listwise deletion). In the second set of analyses, we assigned missing data a value of zero (i.e., did not receive help or supervision); this procedure reflects the fact that this response was most likely, occurring more than 92% of the time for each item among those aged 18 or older. Both sets of analyses led to the same conclusions, raising confidence that biases from missing data were minimal. We report analyses in which missing values were recoded to zero.
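The two missing-data strategies can be sketched as follows, assuming each respondent is a list of 0/1 item responses with `None` marking a missing value. The data layout and function names are hypothetical.

```python
from typing import List, Optional

Row = List[Optional[int]]


def listwise_delete(rows: List[Row]) -> List[Row]:
    """Keep only respondents with a valid response on every item."""
    return [row for row in rows if all(value is not None for value in row)]


def recode_missing_to_zero(rows: List[Row]) -> List[List[int]]:
    """Treat a missing response as 0 (did not receive help or supervision)."""
    return [[0 if value is None else value for value in row] for row in rows]
```

Listwise deletion shrinks the sample, whereas recoding keeps every respondent at the cost of assuming the most common response for missing items.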
All analyses were conducted using Mplus software, version 2.01 (Muthen and Muthen 1998). Because the observed indicator variables were dichotomous, we used weighted least squares estimation, which is appropriate for models containing categorical variables. All analyses incorporated the NHIS-D sampling weight, normalized so that the sum of the weights equaled the unweighted sample size.
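The weight normalization described above amounts to rescaling each weight by a common factor. A minimal sketch, with hypothetical names and example values:

```python
from typing import List


def normalize_weights(weights: List[float]) -> List[float]:
    """Rescale sampling weights so that they sum to the number of cases."""
    n = len(weights)
    total = sum(weights)
    if n == 0 or total <= 0:
        raise ValueError("weights must be non-empty with a positive sum")
    scale = n / total
    return [w * scale for w in weights]
```

Rescaling preserves the relative contribution of each respondent while keeping the effective sample size used in test statistics equal to the number of respondents rather than the population total.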
| Results |
|---|
Removing heavy housework from the analysis meant that persons who received help with only heavy housework were no longer considered disabled. Among the 10,371 persons who received help with at least one ADL/IADL task, 45% (n =4,621) indicated that they received help or supervision only with heavy housework. These persons had "no" responses to the remaining 11 items and thus provide little information regarding item performance. Consistent with our focus on measuring severity of disability among persons with disabilities, analyses were conducted on the remaining 5,750 respondents.
The first two eigenvalues of the tetrachoric correlation matrix of the remaining 11 items, calculated on the 5,750 disabled respondents, were 6.28 and 1.89, with the remaining eigenvalues less than 1.0. This is consistent with a single major dimension.
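To make the eigenvalue comparison concrete, the sketch below computes two simple summaries: the ratio of the first to the second eigenvalue, and the share of total variance carried by the first. It relies on the fact that the eigenvalues of a correlation matrix sum to the number of items. The function name is hypothetical, and no particular cutoff is implied.

```python
from typing import Tuple


def first_factor_dominance(first: float, second: float,
                           n_items: int) -> Tuple[float, float]:
    """Return (first/second eigenvalue ratio, share of total variance)."""
    if second <= 0 or n_items <= 0:
        raise ValueError("second eigenvalue and n_items must be positive")
    ratio = first / second
    # Trace of a correlation matrix equals the number of items.
    share = first / n_items
    return ratio, share
```

With the values reported here (6.28, 1.89, and 11 items), the first eigenvalue is a little over three times the second and accounts for roughly 57% of the total variance.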
Demographic Characteristics
Among the 5,750 adults with a functional disability on at least 1 of the 11 ADL/IADL tasks, 17% were between 18 and 39 years old; 38% were between 40 and 69 years old, and 45% were aged 70 or older. Almost two-thirds were women (63% vs. 37%). The proportion of women was greater in older age groups: 53% among the young, 59% among the middle aged, and 70% among the old.
Table 1 shows the mean number of tasks for which people received help or supervision, by age-gender groups. Overall, adults with functional disabilities on average received help with 3.34 of 11 ADL/IADL tasks. To examine age and gender differences, we conducted multiple regression analyses using SUDAAN to incorporate the complex sampling design. Men had significantly more disabilities than women (3.47 vs. 3.26, p < .01). Elderly respondents had more limitations than the younger groups (3.63, 3.15, and 2.97 for elderly, middle-aged, and young persons, respectively, p < .0001), but differences between middle-aged and young persons were not significant.
(χ² = 2,171.17, df = 44, p < .000), and the RMSEA was .092. In view of the somewhat high RMSEA, we examined derivatives among residual covariances to ascertain which ones might be contributing to lack of fit. We incorporated four covariances among residuals into the model: money management with telephoning, meal preparation with shopping and with light housework, and bathing with dressing. These residual covariances were between complementary tasks (e.g., bathing and dressing or meal preparation and shopping) or between tasks that share a strong cognitive component (i.e., money management and telephoning). Residual correlations ranged from .12 for bathing and dressing to .27 for money management with telephoning. The goodness of fit chi-square for the revised model was 1,218.89 (df = 40, p < .000), and CFI and TLI were .98 and .97, respectively. Inclusion of the residual covariance parameters reduced the RMSEA to .072. Table 3 presents the factor loadings and thresholds derived from the confirmatory factor analysis, including these four covariance parameters. Factor loadings were statistically significant (p < .001). The loadings were very high for all ADLs, ranging from .85 to .97; they were somewhat lower for IADLs, with managing money, light housework, and shopping having the lowest loadings, ranging from .47 to .58.
Identifying Items With DIF
To identify items with DIF, we estimated several MIMIC models; all incorporated four residual covariances, specified previously. The first model (no-DIF) contained no direct (DIF) effects from the five age-gender groups to individual items. The goodness of fit chi-square for the no-DIF model was 2,398.08 (df = 90, CFI = .96, TLI = .96, RMSEA = .067). We then estimated 11 models, each adding five DIF effects for one item. Models with DIF effects for toileting (χ² = 2,386.71, df = 85) and for getting around inside (χ² = 2,393.08, df = 85) did not differ significantly from the no-DIF model (p > .01), suggesting that DIF was not present for these items. Subsequent models did not include age-gender DIF for these two items.
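The nested-model comparisons in this section can be illustrated with a chi-square difference test. The sketch below subtracts the two chi-squares and evaluates the difference against a chi-square distribution, using a closed-form survival function for integer degrees of freedom so that only the standard library is needed. Simple subtraction is an assumption here; some weighted least squares estimators call for an adjusted difference test instead.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variate with integer df."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-h) * sum_{i < df/2} h^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= h / i
            total += term
        return math.exp(-h) * total
    # Odd df: erfc(sqrt(h)) + exp(-h) * sum_j h^(j + 1/2) / Gamma(j + 3/2)
    total = 0.0
    for j in range((df - 1) // 2):
        total += h ** (j + 0.5) / math.gamma(j + 1.5)
    return math.erfc(math.sqrt(h)) + math.exp(-h) * total


def chi2_difference_test(chi2_restricted: float, df_restricted: int,
                         chi2_full: float, df_full: int) -> float:
    """p-value for the improvement of the fuller model over the restricted one."""
    return chi2_sf(chi2_restricted - chi2_full, df_restricted - df_full)
```

For the toileting comparison, the difference is 2,398.08 - 2,386.71 = 11.37 on 5 degrees of freedom, which gives p of about .045; for getting around inside, the difference of 5.00 gives p of about .42. Both are above .01.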
The chi-square for a model with DIF for the remaining nine items (the DIF model) was 1,384.32 (df = 45). The difference in chi-squares between this model and the no-DIF model (1,013.84, with 45 degrees of freedom) was statistically significant. Thus, including differential effects for age and gender significantly improved the fit of the model. Other fit indices indicated acceptable overall fit (RMSEA = .072, TLI = .95, CFI = .98).
To combine the five DIF effects for each item into a summary statistic, we compared the chi-square of the full-DIF model (i.e., nine items with DIF) with the chi-square from a model that eliminated DIF effects for that specific item. The last column of Table 4 reports the chi-square difference for each item, with five degrees of freedom. Each of the nine items had a significant chi-square (p < .001). Three items had relatively small chi-square difference values: eating, dressing, and meal preparation. Two items, shopping and managing money, had chi-square values that were notably higher than the rest. Other items with relatively large DIF were doing light housework, bathing, and using the telephone.
To gauge the overall impact of adjusting for DIF for all nine items, we compared age-gender group effects on latent disability, with and without DIF adjustment. These effects can be interpreted as estimated differences in latent functional disability between each age-gender group and elderly men. As shown in the top section of Table 5, without adjusting for DIF, young women and both middle-aged groups appeared to be significantly less disabled than elderly men, but (surprisingly) young men were not. Controlling for DIF altered the estimates of the effects of age and gender on latent disability, and changed the conclusions about the relative disability of these groups. In the DIF model, the effects of being in the middle-aged group diminished for both men and women, resulting in a nonsignificant difference from elderly men. In contrast, the effect for young men became more negative, resulting in both young men and young women estimated to be significantly less disabled than elderly men.
To provide a sense of the relative impact of adjusting for DIF, the last column in Table 5 shows the percentage change in each coefficient effected by controlling for DIF. The magnitude of the DIF adjustment was large, except for young women. It was especially large for young men (decreasing the disability estimate by 264%) and middle-aged men (increasing the disability estimate by 80%).
Effects of Deleting Items
If an item shows large DIF, one option is to drop the item rather than try to adjust for it in a statistical model. We assessed whether deleting items with high chi-squares for DIF effects would affect the magnitude of the DIF adjustment. Managing money and shopping had chi-square values that were much larger than the other items (Table 4). Consequently, we re-estimated the no-DIF and the DIF models first deleting only shopping, then removing only money, and finally removing both items (sections 2–4 of Table 5).
When the "managing money" item was removed from the analysis, the no-DIF model estimates changed dramatically for young men, who appeared much less disabled, compared with old men. Removing shopping reduced the estimate of disability for elderly women. Nonetheless, even when these two most problematic items were removed, DIF adjustment remained important. Controlling for DIF produced, at a minimum, a 35% change in parameter estimates. Estimated differences between each age-gender group and elderly men were reduced by controlling for DIF. Significant effects for middle-aged men and elderly women became nonsignificant when DIF effects were included in the model. In general, without DIF adjustment, we would conclude disability differences across age-gender groups were greater than they actually were.
| Discussion |
|---|
After controlling for DIF, the gender effects on the latent factor show that women were slightly (but not significantly) less disabled than men in the middle-aged and elderly age groups. In contrast, the common finding in studies of functional disability is that women are more likely to have disabilities than men (Dawson, Hendershot, and Fulton 1987; U.S. Bureau of the Census 1990). The higher prevalence of any disability among women was replicated in this study: nearly 9% of women in the overall NHIS-D sample had a functional disability, compared with 5% of men. However, when analyses were restricted to the subset of persons with some functional disability, the level of disability tended to be more severe among men in the two older groups. Spector and Fleishman (1998) found similar results among elderly adults. These results highlight the importance of distinguishing factors that affect the prevalence of any disability in the general population from factors that affect the severity of disability among persons with disabilities.
IADL items, especially managing money and shopping, tended to have larger DIF effects than ADL items. Relative to men aged 70 and older, at the same disability level, young men were more likely to respond that they received help with managing money, and both men and women in the two younger age groups were less likely to respond that they were receiving help with shopping. DIF was less substantial for ADLs, except bathing.
Strategies for Dealing With DIF
Studies that include participants from across the age spectrum or examine gender differences and do not adjust for DIF may produce biased estimates of age or gender differences in functional disability. The potential existence of DIF needs to be addressed in the design and analysis phases of such studies. Statistical adjustment, by using latent variable models, is one approach to reducing the impact of DIF on group comparisons. Another approach is to reword questions that exhibit DIF. The NHIS-D questions themselves are very general, leaving substantial room for interpretation; thus, improvements may be possible.
As noted, a third strategy to reduce the impact of DIF is to delete problematic items, such as managing money. To assess whether removing problematic items was sufficient to reduce the impact of DIF, we compared models that excluded managing money and/or shopping. Removing these items did not eliminate DIF, suggesting the importance of statistically adjusting for DIF in making group comparisons of functional disability. When considering whether to delete an item from a scale, one must also consider the impact on the content validity of the instrument and the potential loss of information for measuring certain levels of the latent trait. If an item's threshold differs from those of other items, the item is measuring a point on the latent dimension that is not well represented by the other items in the scale, and removing the item may have a negative impact on precision or content validity.
Psychometric Properties of IADL and ADL Items
The present results are consistent with Spector and Fleishman's (1998) findings, among elderly people with disabilities, that ADL and IADL items do not form a clear hierarchy. In the present study, the ADL task of bathing had a threshold parameter similar in magnitude to some IADL items, whereas the IADL task of telephoning had a threshold parameter similar to some ADL items. As in the earlier analyses, the threshold parameters display a gap between shopping and the item with the next lowest threshold, indicating an area on the latent continuum of functional disability in which the items do not provide a great deal of information. In an analysis of IADL and ADL items in the National Long Term Care Survey, doing laundry followed shopping in terms of thresholds and had a high loading (Spector and Fleishman 1998). The addition of this item, which is not included in the NHIS-D, would likely improve discrimination among those with mild functional disabilities.
Heavy housework ("Doing heavy work around the house like scrubbing floors, washing windows, and doing heavy yard work") should not routinely be included in measures of functional disability. This item had virtually negligible correlations with all but one other ADL/IADL item and thus did not appear to indicate the same construct. Excluding heavy housework had a large impact on the size of the sample defined as disabled. If heavy housework was included among the tasks that define functional disability, 6.9% of adults were disabled. However, if receipt of help with heavy housework was excluded as an indicator of disability, then the estimated proportion of adults with disabilities dropped to 3.8%.
Latent trait analyses assume that the items all reflect a single underlying dimension. We used the criterion (Lord 1980) that a large first eigenvalue relative to the second was sufficient evidence to suggest unidimensionality. For 11 ADL/IADL items, the first two eigenvalues were 6.3 and 1.9. Using this criterion, the present study provides evidence that NHIS-D IADL and ADL items can be combined in a unidimensional scale. Others (e.g., Teresi et al. 2000a) have used similar eigenvalue patterns as evidence of unidimensionality. For example, Teresi and colleagues (2000b), in analyses of a cognitive screening measure, reported first eigenvalues ranging from 5.1 to 5.7 in different subgroups and second eigenvalues ranging from 1.4 to 1.5 as evidence of unidimensionality.
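The eigenvalue criterion can be made concrete with a little arithmetic. The following is a minimal sketch, not the authors' procedure: the function name `eigenvalue_summary` is hypothetical, and the variance share assumes the eigenvalues come from an item correlation matrix, whose eigenvalues sum to the number of items.

```python
def eigenvalue_summary(first, second, n_items):
    """Summarize how strongly the first eigenvalue dominates the second.

    Assumes the eigenvalues come from a correlation matrix, so that
    they sum to the number of items (the trace of the matrix).
    """
    ratio = first / second   # first-to-second eigenvalue ratio
    share = first / n_items  # share of total variance on the first dimension
    return ratio, share


# Values reported in the text for the 11 ADL/IADL items.
ratio, share = eigenvalue_summary(6.3, 1.9, 11)
print(f"first/second ratio: {ratio:.2f}")      # 3.32
print(f"share of total variance: {share:.1%}")  # 57.3%
```

Under these assumptions, the first eigenvalue is about 3.3 times the second and accounts for roughly 57% of the total variance. No cutoff is specified in the text, so the ratio is best read descriptively rather than as a formal test.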
Although we have met a statistical standard commonly used for determining unidimensionality, the existence of age and gender DIF, and the inclusion of correlated errors in the model, suggest that the scale is not perfectly unidimensional. The strong DIF exhibited by managing money, and the correlated error between money and telephoning, suggest the presence of a secondary cognitive factor. Of the 11 ADL/IADL items, managing money and using the telephone appear to have the strongest cognitive component. Two items alone, however, may not be sufficient to demonstrate the existence of a cognitive factor. IADL/ADL scales have been criticized for being insensitive to functional losses that result from cognitive deficits (Spector 1997; Tappen 1994). Future research could include items intended to tap functional loss associated with mild cognitive deficits, such as items in the Pfeffer Functional Activity Scale (Pfeffer, Kurosaki, Harrah, Chance, and Filos 1982). Combined with standard ADL/IADL items, such an expanded item pool may provide clearer evidence for a separate cognitive dimension of functional disability.
Limitations
Limitations of the analyses need to be acknowledged. The MIMIC model allows the loadings to vary across items. However, the MIMIC model inherently imposes the restriction that the loading for each item does not vary as a function of age or gender. Future research on the psychometric properties of ADL/IADL items should examine the comparability and equivalence of factor structures and loadings across age and gender groups.
Another limitation pertains to the sampling design. Although we incorporated the sampling weights into the analyses, software limitations precluded us from adjusting for other aspects of the complex sample, such as stratification and clustering. In part, this motivated our selection of a conservative .001 level for significance tests.
The chi-square goodness of fit tests for both the DIF and the no-DIF MIMIC models were statistically significant, indicating that the models' predictions did not perfectly match the observed means and correlations. However, a large sample size (5,750 in this study) inflates the value of the chi-square statistic. More troubling were the values of the RMSEA, which were still somewhat high, despite the expedient of estimating four disturbance covariances. To ascertain the degree to which imposing a single disability factor might contribute to lack of fit, we estimated a MIMIC model with two factors, corresponding to ADL and IADL items, respectively. The two-factor DIF model, with no correlated disturbances, had a chi-square of 1,844.59 (df = 48) and an RMSEA of .081. These values were worse than those for the single-factor DIF model with correlated disturbances, which suggests that a two-factor specification would not dramatically improve the model's fit. (The ADL and IADL factors correlated .67.)
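The reported RMSEA for the two-factor model can be approximately recovered from its chi-square, degrees of freedom, and sample size. The sketch below uses a common textbook point-estimate formula, RMSEA = sqrt(max(chi-square − df, 0) / (df × (N − 1))). This is an assumption: the estimation software used in the study may apply a slightly different variant (for example, N rather than N − 1, or adjustments for sampling weights). The function name `rmsea` is illustrative.

```python
import math


def rmsea(chi_square, df, n):
    """Point estimate of RMSEA from a model chi-square, its df, and sample size n.

    Uses a common textbook formula; specific software may differ slightly.
    """
    # Excess of the chi-square over its degrees of freedom, floored at zero.
    excess = max(chi_square - df, 0.0)
    return math.sqrt(excess / (df * (n - 1)))


# Two-factor DIF model values reported in the text.
two_factor = rmsea(1844.59, 48, 5750)
print(round(two_factor, 3))  # 0.081
```

Because the excess chi-square is divided by df × (N − 1), the RMSEA is far less sensitive to sample size than the chi-square test itself, which is why it is the more informative of the two fit indices here.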
The nature of the functional disability criterion (receiving human help or supervision) may shape the generalizability of the results. In many studies, rather than indicate receipt of help, respondents indicate how much difficulty they have performing ADL or IADL tasks. Using a criterion of difficulty may give rise to a different pattern of results. In particular, the criterion of receipt of help may result in higher interitem correlations than the difficulty criterion; once a network of helpers is activated, it may provide assistance with multiple tasks. Receiving help with ADL/IADL tasks may also reflect cultural or social factors, such as expectations concerning when it is appropriate to request or offer help with these tasks. Incorporating use of mechanical aids into the definition of disability may also alter the pattern of results. Future research should examine more closely the effect of the disability criterion on the psychometric properties of functional disability measures.
This study has focused on ADLs and IADLs as measures of functional disability. Other measures of functional disability include items assessing mobility or cognition (Kempen, Miedema, Ormel, and Molenaar 1996; Mahoney and Barthel 1965). The psychometric properties of several functional disability scales have been reported elsewhere (Cohen and Marino 2000; Spector 1996). In addition, measures of disability, more broadly construed, ascertain performance of social roles and participation in socially valued activities. Although our analyses do not address these other dimensions, our results suggest that researchers should be attentive to the possibility of DIF in these measures.
In conclusion, evidence of substantial age and gender DIF implies that adjustments for DIF are necessary when making comparisons of disability levels across age and gender groups. Concern with DIF first arose in educational testing. Correctly answering certain test items could potentially require extraneous information that was differentially available to members of certain sociodemographic groups. Decisions for individual students based on such biased items could be inappropriately disadvantageous. A similar situation may exist with measures of functional disability. ADL/IADL items are often used to determine eligibility for services or to allocate program resources to individual clients. Some states have initiated efforts to consolidate in one agency long-term care programs for elderly persons and for people with disabilities (e.g., Oregon, Texas, Wisconsin). This could increase the likelihood that clients of various ages would be compared. Based on this study's findings, if age-based DIF is ignored, one consequence could be that middle-aged persons may appear to be less disabled, compared with elderly persons, and may receive fewer program resources than their underlying severity of disability would merit. In addition, programs for nonelderly persons that use ADLs/IADLs as criteria for allocating resources may make inappropriate decisions due to gender DIF.
| Acknowledgments |
|---|
Received for publication November 12, 2001. Accepted for publication February 20, 2002.