RESEARCH ARTICLE |
a Department of Sociology and Gerontology Program, Purdue University, West Lafayette, Indiana
Kenneth F. Ferraro, Stone Hall, Purdue University, West Lafayette, Indiana 47907-1365 E-mail: ferraro{at}purdue.edu.
| Abstract |
|---|
Methods. Morbidity measures from adults in 2 large national surveys were used in both cross-sectional and longitudinal analyses.
Results. Although differences across the approaches are modest, the binary variable approach offers greater explanatory power and slightly higher R² values. Despite these advantages, statistical power is insufficient in some cases, especially for conditions that are relatively rare and/or that manifest modest differences on the outcome variable.
Discussion. Statistical power estimates are advisable when using the binary variable approach, especially if the list of diseases and health conditions is extensive. Although a simple count of diseases may be useful in some research applications, separate counts for serious and nonserious conditions should be more useful in many research projects while avoiding the risk of inadequate statistical power.
THE accurate measurement and use of morbidity information are critical for gerontological, epidemiologic, and health services research. Adulthood, especially later adulthood, is a time of increasing prevalence of chronic disease, and the influence of comorbidity may be quite consequential to health trajectories. Although morbidity has been examined as an independent variable in thousands of studies, relatively little systematic research has considered the consequences of various methods for measuring morbidity as an independent variable.
Virtually all surveys that measure morbidity have a series of questions that query whether a person has a given disease or condition. Most gerontologists and health scientists using health surveys or medical records begin with the binary (indicator) variables of disease, but may use the indicators in different ways. The decision of how to use the disease indicators is important, because it may lead to different conclusions about the effect of morbidity on a wide range of outcomes, from self-rated health (Idler 1993) to hospital mortality rates (Iezzoni 1997). It could also have profound policy implications. For instance, if a disease is not found to influence an outcome such as disability or mortality, there may be an inclination to treat the disease less aggressively.
A very common practice in the gerontological literature has been to sum the disease indicators to form an overall measure of comorbidity (Fillenbaum 1979; Gibson 1991; House et al. 1994; Levkoff, Cleary, and Wetle 1987; Liang, Bennett, Whitelaw, and Maeda 1991; Liang, Lawrence, Bennett, and Whitelaw 1990; McBroom 1970; Whitelaw and Liang 1991). A chief advantage of this procedure is parsimony, especially when a long battery of diseases is presented to participants or when panel data are used to assess change in comorbidity and other phenomena (e.g., social support). A second advantage is that this approach uses all the disease information, even that from the rare conditions. The resulting composite expresses comorbidity in an additive form, and it conveniently differentiates people at each level of overall morbidity.
The other major approach has been to use dummy variables for each illness to avoid simple sums of very different conditions (e.g., Crimmins and Saito 1993; Idler and Kasl 1991; Wolinsky and Johnson 1992). Separate variables to distinguish the various illnesses are less parsimonious but keep the unique contribution of the conditions salient. In addition, they permit investigators to examine how special combinations of diseases may interact in affecting health-related outcomes. Fried, Bandeen-Roche, Kasper, and Guralnik (1999) recently found such interactive effects of diseases on disability in the Women's Health and Aging Study. Reports of such analyses are rare, perhaps in part because of the complexity involved, but the binary variable approach is the only way in which the unique contribution of a disease (or, ultimately, interactions of diseases) may be ascertained. Proponents of the binary variable approach further argue that because the correlation among the various disease indicators is typically modest, summing disease indicators is akin to adding the proverbial apples and oranges (Schmitt and Colligan 1984).
One concern when using the binary variable approach is parsimony. Although parsimony is a concern for many surveys that ask respondents questions about a "short list" of diseases (fewer than 15 conditions), it is even more of a concern when the surveys have a larger battery of questions on diseases. Indeed, some investigators have used factor analysis of the binary disease variables to reduce their number (e.g., Wolinsky and Johnson 1992 reduced their list from 13 to 7). The issue of parsimony is even more important when disease is treated as a dependent variable.
Closely related to, but distinct from, the issue of parsimony is statistical power (Cohen 1988). Although power is an elementary statistical issue, it is not apparent that researchers are systematically taking it into account when using the binary variable approach. The usefulness of the binary variable approach in population samples is highly dependent on the distribution of the conditions or diseases under investigation, which is often referred to as the base-rate problem in studies of accidents or rare diseases (Schmitt and Colligan 1984). Statistical power for testing the effect of a condition on a quantitative variable is a function of sample size, the distribution of the condition in the sample, and the difference in means on the quantitative variable between those who do and do not have the condition. Thus, if conditions are rare (e.g., stroke, hip fracture), then it is less likely that there will be adequate statistical power for testing their effects with the binary variable approach. In such cases, the possibility of "false negatives" is great: investigators conclude that disease x does not influence an outcome when, in fact, the statistical chance for it to do so is low. If, instead, the investigator used the count variable approach, at least the contribution of disease x, expressed in the comorbidity term, could be estimated. It is clear that each approach has its advantages and disadvantages; each may be appropriate for given research questions.
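The dependence of power on how a condition is distributed in the sample can be made concrete with a short derivation. The sketch below assumes a simple two-group comparison of means with a common within-group standard deviation; the symbols $n$, $p$, and $\sigma$ are introduced here only for illustration, and the numbers in the worked example are hypothetical rather than taken from either survey.

```latex
% n = total sample size, p = share of the sample with the condition,
% sigma = common within-group standard deviation (illustrative symbols).
\begin{align*}
\operatorname{Var}(\bar{Y}_1 - \bar{Y}_2)
  &= \sigma^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right),
  \qquad n_1 = pn,\quad n_2 = (1-p)\,n \\
  &= \frac{\sigma^2}{n\,p\,(1-p)} .
\end{align*}
% Worked example with n = 1000:
%   p = 0.50 gives n p (1-p) = 250
%   p = 0.05 gives n p (1-p) = 47.5
% so the standard error at a 5/95 split is sqrt(250 / 47.5), about 2.29
% times the standard error at a 50/50 split of the same total sample.
```

Under these assumptions, a rare condition behaves as if the sample were much smaller than its nominal size, which is one way to read the base-rate problem described above.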
Although there is no routine approach for all research problems, there is value in systematically examining the consequences of using these two approaches on the relevant statistical criteria. The purpose of this article is to compare these two approaches, along with two closely related approaches, to better understand the advantages and disadvantages of each. The present research compares these procedures for measuring morbidity in both cross-sectional and longitudinal analyses, with explicit attention given to statistical power, parsimony, and model fit. We use self-rated health as the outcome to illustrate how the different procedures influence findings from two large national data sets. The results have direct implications for the large and growing literature on self-rated health as a global perception of overall health status (e.g., Idler 1993; Jylha, Guralnik, Ferrucci, Jokela, and Heikkinen 1998), but the aim of this analysis is to clarify the relative merits of the alternative measures of disease information. Therefore, special attention is given to the issue of statistical power.
| Methods |
|---|
The NHEFS collected medical examination and survey interview data in 1971–1975 using a multistage, stratified, probability sample of noninstitutionalized persons ages 25–74. The analyses will be completed on the baseline NHEFS subsample, designed as a nationally representative sample, which was administered the "detailed component," including an extensive medical examination (N = 6,913; National Center for Health Statistics 1979). The baseline for the NHEFS was administered in modules. Although 14,407 respondents 25–74 years of age were given selected modules, respondents in the detailed component received all modules including the health care needs and general well-being questions. Sampling and operational details are provided elsewhere (Cornoni-Huntley, Huntley, and Feldman 1990; National Center for Health Statistics 1987). The survey is longitudinal, and the present analysis makes use of the first follow-up, approximately 10 years later (due to the extensive field operations of the baseline). This analysis is based on the 6,833 non-Hispanic White and Black respondents at the baseline and 4,986 reinterviewed respondents at the first follow-up.
The HRS is a nationally representative survey of 7,706 households with at least one person age 51 to 61. The data contain a total of 12,652 respondents (Juster and Suzman 1995). The baseline interview was conducted in 1992 and follow-up interviews have occurred every 2 years thereafter. The spouses of selected respondents were interviewed for HRS, but this analysis excludes the spouses who are secondary respondents. The analysis is based on the 5,637 non-Hispanic White and Black, age-eligible, primary respondents (i.e., ages 51 to 61) who were interviewed at the baseline in 1992. Of these respondents, 5,154 (91%) were successfully reinterviewed in 1994. The sample size for the longitudinal analysis is 5,151 because 3 age-eligible cases, who did not answer the self-reported health measure in 1994, were excluded.
The surveys have different lag times between waves: 10 years for NHEFS and 2 years for HRS. Although there would be value in testing models across surveys with the same time lags, effect sizes can vary as a function of the length of the time lag. Thus, as recommended by Gollob and Reichardt (1987), there is value in using different lags to study the same relationship to obtain a more "complete understanding of a variable's effect" (p. 82).
Measurement
The measurement of self-reported morbidity in NHEFS was derived from a checklist question designed to identify which illnesses respondents had. Respondents were asked, "Has a doctor ever told you that you have...hypertension or high blood pressure?" Overall, 36 conditions are presented. The answer to such a question is not a report about how one feels about a specific condition, but rather a report of a condition based on a medical encounter. Each condition was coded as a binary variable.
Morbidity in the HRS was measured by focusing on eight target conditions that are prevalent among adults who are middle aged or older and can lead to work disability (Wallace and Herzog 1995). Those conditions are hypertension, diabetes, cancer, chronic lung disease, heart problems, psychiatric problems, arthritis, and stroke. Each respondent was asked if he or she had ever been told by a doctor that he or she had one of these conditions. Disease-specific questions were then asked concerning medical treatment and functional impairment. Seven additional diseases and conditions were measured through a series of questions in which the respondent was asked, "Do you have any of the following health problems?" The list of conditions included asthma, back problems, foot and leg problems, kidney or bladder problems, stomach or intestinal ulcers, and high cholesterol. A separate question measures whether the respondent had fractured a bone since he or she was 45. Although the prevalence estimates derived from the physician-referenced questions may be different from those without a physician referent, the focus of the present analysis is not estimating prevalence. Because the aim of this research is to address the consequences of using different forms of the disease indicators as independent variables, all 15 conditions queried are used in the analyses that follow.
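Morbidity in both surveys will be treated several ways. The focus of our research is comparing the two approaches outlined earlier: a simple count of conditions and a separate binary variable for each condition.
In addition, two further procedures for measuring morbidity will be examined (the latter only with the NHEFS): separate counts of serious and nonserious (chronic) conditions, and counts of conditions within ICD categories.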
First, for these additional analyses, separate counts of serious conditions and nonserious (chronic) conditions can be developed from both surveys (Ferraro and Farmer 1996, 1999; Mutran and Ferraro 1988). This approach, developed in consultation with a panel of physicians and nurses, is recommended for avoiding the possible power problem of the binary variable approach while offering more specific and descriptive measures than the simple count of conditions. The serious illness conditions include cancer, diabetes, heart failure (attack or trouble), hypertension, and stroke. The remaining conditions, hereafter referred to as chronic illness, are identified in Table 1; examples include arthritis, asthma, bone fracture, cataracts, gout, psoriasis, and ulcer.
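The relationship between the binary indicators and the count variables can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the respondent record is invented, the variable names are hypothetical, and only the split into serious and chronic conditions follows the text above.

```python
# Serious conditions as listed in the text; all other conditions count as chronic.
# The key names are hypothetical labels chosen for this sketch.
SERIOUS = {"cancer", "diabetes", "heart_failure", "hypertension", "stroke"}


def morbidity_counts(indicators):
    """Return (simple count, serious count, chronic count) from 0/1 indicators."""
    total = sum(indicators.values())
    serious = sum(value for name, value in indicators.items() if name in SERIOUS)
    return total, serious, total - serious


# Invented respondent: 1 = condition reported, 0 = not reported.
respondent = {
    "arthritis": 1, "asthma": 0, "cancer": 0, "diabetes": 1,
    "gout": 0, "hypertension": 1, "stroke": 0, "ulcer": 1,
}

total, serious, chronic = morbidity_counts(respondent)
print(total, serious, chronic)  # 4 2 2
```

The binary variable approach would enter each key of `respondent` as its own regressor, whereas the count approaches enter only `total`, or `serious` and `chronic`.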
Second, given the large number of specific conditions in NHEFS, it is possible to estimate parallel models with diseases categorized according to the ICD, Eighth Revision (ICD-8; World Health Organization 1967). This approach amounts to aggregating the original disease indicators into a widely recognized system of disease classification (i.e., a specialized count variable approach). Fourteen ICD-8 codes are also used to simplify the list of 36 original variables (ICD uses 17, but there are no conditions in 3 categories for this sample). The ICD-8 has been revised, but the 8th revision was the most current at the time of the survey.
Table 1 displays the list of morbidity variables available in the surveys and provides descriptive statistics for each. The specific conditions common to both surveys are listed first, in alphabetical order, followed by those that are unique to NHEFS and HRS. It is clear that NHEFS queried respondents about several conditions that appeared in very small proportions (e.g., prevalence
1% for stroke, cataracts, detached or other retina condition, glaucoma). On the other hand, the more limited list employed by HRS focused on conditions more prevalent in the age group studied (i.e., stroke was the least prevalent condition, 3%).
Self-rated health was measured identically in both surveys with the question, "Would you say that your health in general is excellent, very good, good, fair, or poor?" (In this analysis, higher scores indicate better health.) The independent variables span a broad range of factors known to be related to morbidity and self-rated health, either directly or indirectly (e.g., Ferraro 1980; Idler 1993; Krause 1987). Although the list of covariates is not exhaustive, it includes variables that are widely acknowledged in the literature: age, gender, race, living alone, education, income, present smoker, and past smoker.
Analysis Plan
The analysis proceeds in two major stages: a comparison of alternative explanatory models and a statistical power analysis. First, analyses comparing the alternative measures of morbidity in cross-sectional and longitudinal models of self-rated health were estimated with both ordinary least squares (OLS) and logit models. The longitudinal models are based on residualized change procedures in which Wave 2 self-rated health is regressed on Wave 1 self-rated health, the morbidity indicators, and covariates. The longitudinal analysis also accounts for selection bias due to attrition (Heckman 1979; Winship and Mare 1992). Second, statistical power is calculated for each disease indicator and all count variables. Power is also calculated for each disease variable on subsamples of the NHEFS and HRS to demonstrate the importance of power calculations in both the design and analysis of health data related to morbidity.
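The residualized change specification described above can be written as a single regression equation. The notation below is an assumption made for illustration (the article does not give its own symbols), and the sketch leaves out the attrition correction.

```latex
% Illustrative notation, not taken from the article:
%   SRH_{i1}, SRH_{i2} = self-rated health of respondent i at Waves 1 and 2
%   M_{ik}             = k-th morbidity measure at Wave 1 (one count, two counts,
%                        ICD category counts, or one binary indicator per disease)
%   X_{ij}             = j-th covariate (age, gender, race, and so on)
\begin{equation*}
SRH_{i2} = \beta_0 + \beta_1\, SRH_{i1}
         + \sum_{k=1}^{K} \theta_k\, M_{ik}
         + \sum_{j=1}^{J} \gamma_j\, X_{ij}
         + \varepsilon_i .
\end{equation*}
% Because Wave 1 health is held constant, each \theta_k describes the change in
% health ratings associated with the k-th morbidity measure.
```

In this form the alternative approaches differ only in $K$: one term for the simple count, two for the serious and chronic counts, and one term per disease for the binary variable approach.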
| Findings |
|---|
In this instance, Long suggests two alternatives: (a) estimate a multinomial logit model in which no ordering is implied across the five categories (in essence, treat self-rated health as a nominal variable) and (b) collapse the five categories of self-rated health into two, and then use a binomial logit model. We followed his recommendations and completed the analyses in three basic ways: (a) OLS, (b) multinomial logit, and (c) binomial logit with two different categorization schemes. (The five original categories of self-rated health were grouped in two different ways: [a] excellent, very good, and good vs. fair and poor, yielding roughly a 50/50 split, and [b] excellent, very good, good, and fair vs. poor, yielding a split of approximately 78/22 in each data set.) Because all three strategies yielded similar results, we present the OLS results for the most parsimonious and straightforward presentation. (The lengthy results from the binomial and multinomial logit models are available from the authors upon request; the latter entail four coefficients for each variable in each model.)
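The two categorization schemes used for the binomial logit models can be sketched as follows. This is an illustrative recoding only: the function name, the 0/1 direction of the recoded outcome, and the sample ratings are assumptions, not details reported by the authors.

```python
CATEGORIES = ["excellent", "very good", "good", "fair", "poor"]

# Scheme [a]: excellent, very good, and good vs. fair and poor.
SCHEME_A = {"fair", "poor"}
# Scheme [b]: excellent, very good, good, and fair vs. poor.
SCHEME_B = {"poor"}


def dichotomize(rating, lower_group):
    """Return 1 if the rating belongs to the lower-health group, else 0."""
    if rating not in CATEGORIES:
        raise ValueError(f"unknown rating: {rating!r}")
    return int(rating in lower_group)


# Invented ratings for five hypothetical respondents.
ratings = ["excellent", "good", "fair", "poor", "very good"]
print([dichotomize(r, SCHEME_A) for r in ratings])  # [0, 0, 1, 1, 0]
print([dichotomize(r, SCHEME_B) for r in ratings])  # [0, 0, 0, 1, 0]
```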
Table 2 presents OLS estimates for the cross-sectional models with the two data sets. Model 1 for each data set specifies the simple count of conditions as the morbidity indicator along with the covariates. Model 2 specifies the separate summed variables for chronic (nonserious) and serious conditions for the morbidity indicators. The fit of Model 2 is superior to Model 1 in both data sets. The pattern of statistical significance for the covariates is similar in Models 1 and 2 for both data sets. As expected, each of the morbidity measures in Model 2 is negatively related to self-rated health, but the stronger effect in both samples is for serious conditions. A test of the slope difference reveals that it is statistically significant in both samples. Chronic conditions are associated with poorer health ratings, but serious or life-threatening conditions are more consequential in shaping health ratings. This conclusion was evident from both the OLS analyses presented and from the multinomial logistic regression analyses.
Generally, the effects due to the covariates are similar across the models. A comparison of the covariates shows that the values of the coefficients tend to decline across the models; however, most coefficients maintain their level of significance. There is one exception that should be noted. In the cross-sectional models for the NHEFS, gender is significant in Model 1 but not in Models 2–4.
Table 3 presents the parallel analyses for the longitudinal models of self-rated health across both surveys. For NHEFS, this is an approximate 10-year lag, whereas for HRS the lag period is just 2 years. The simple count of conditions in Model 1 is again quite significant in both samples. Model 2, substituting the counts for chronic and serious conditions, provides a better fit than Model 1 in both samples. As in the cross-sectional models, serious illness is the stronger predictor of self-rated health. Chronic illness leads to a decline in health ratings, but the decline is steeper for those afflicted with serious illness. Model 3 for NHEFS specifies the ICD codes as the morbidity measures. This model provides a better fit than Model 1 but not Model 2. Although 11 of the 14 ICD codes were significant in the cross-sectional equations, only three of the 14 codes are significantly associated with change in health ratings: respiratory system, digestive system, and symptoms and ill-defined conditions.
Three supplementary analyses were performed to examine more closely the relationships between the morbidity indicators and self-rated health: a nominalization test for thresholds, a reduced morbidity inventory with conditions parallel across surveys, and tests for interactions of specific diseases. These analyses are not presented in tables, but are briefly summarized below. First, a nominalization test was performed to identify possible thresholds or tip-points of the effects of comorbidity on self-rated health (Cohen and Cohen 1983). Beyond the effects due to the unique conditions tested in the binary variable approach, separate dummy variables representing each level of the simple count were created to test the effects of comorbidity. In models for both the NHEFS and HRS, with 0 serving as the reference group, each higher level was significantly and negatively associated with self-rated health. The effects were fairly linear, however. Although there does not appear to be any threshold effect in these data, investigators who favor the binary variable approach may find value in estimating threshold effects in models of other outcomes.
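The nominalization step, in which each level of the simple count becomes its own dummy variable with 0 as the reference group, might look like the sketch below. The function name, column labels, and counts are hypothetical and serve only to show the recoding.

```python
def count_dummies(counts, reference=0):
    """Build one 0/1 column per observed count level, omitting the reference level."""
    levels = sorted(set(counts) - {reference})
    return {f"count_{level}": [int(c == level) for c in counts] for level in levels}


# Invented simple counts of conditions for six hypothetical respondents.
counts = [0, 1, 3, 1, 2, 0]
dummies = count_dummies(counts)
for name, column in dummies.items():
    print(name, column)
# count_1 [0, 1, 0, 1, 0, 0]
# count_2 [0, 0, 0, 0, 1, 0]
# count_3 [0, 0, 1, 0, 0, 0]
```

If the coefficients on these columns rise in roughly equal steps, the effect of the count is approximately linear; a sharp jump at one level would suggest a threshold.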
Second, because the number of conditions was not equivalent across the surveys, the equations were re-estimated with a reduced morbidity inventory that considered only the conditions common to both surveys. The logic of such an analysis is to make the analyses across the two surveys more similar. Moreover, the results shown in Table 2 and Table 3 indicate that many of the chronic (nonserious) conditions, especially in the NHEFS, are not consequential to self-rated health. Although deciding which conditions to include and which to ignore is somewhat arbitrary, omitting disease information in the morbidity count variables would lead to an underestimate of comorbidity. Thus, we compared the fit statistics and R² values from the results in Table 2 and Table 3 to parallel models that remove the conditions not found in both surveys. In both the cross-sectional and longitudinal analyses, the models omitting the conditions unique to each survey provided significantly poorer fit and lower R² values. For the HRS, the model with all of the conditions included added about 4% to the explained variance; the difference was about 2% for the NHEFS. In general, these results show the importance of including all of the available morbidity information. We conclude that longer morbidity inventories are worth the interview time to collect them and should be used in multivariate models.
Third, given the recent work of Fried and colleagues (1999), several disease interaction terms were tested in the binary variable approach. The following product terms were tested for both data sets: Heart Attack × Diabetes, Diabetes × Kidney Disorder, Cancer × Emphysema, Arthritis × Hypertension, and Heart Attack × Cancer. The first one was significant in both data sets, and Arthritis × Hypertension was also significant in the HRS. Thus, investigators may want to consider testing for interaction terms that are theoretically meaningful to the analysis, especially when using the binary variable approach.
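A product term of two binary disease indicators equals 1 only for respondents who report both conditions. The short sketch below, with invented data and a hypothetical helper name, shows the construction and why the number of jointly affected respondents matters.

```python
def product_term(first, second):
    """Elementwise product of two equal-length 0/1 indicator lists."""
    if len(first) != len(second):
        raise ValueError("indicator lists must have the same length")
    return [a * b for a, b in zip(first, second)]


# Invented indicators for six hypothetical respondents.
heart_attack = [1, 0, 1, 0, 0, 1]
diabetes = [1, 1, 0, 0, 0, 1]

interaction = product_term(heart_attack, diabetes)
print(interaction)       # [1, 0, 0, 0, 0, 1]
print(sum(interaction))  # 2 respondents report both conditions
```

Because the product term is nonzero only for the jointly affected group, its distribution is at least as skewed as that of the rarer of the two conditions.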
Statistical Power
The analyses reported thus far for the binary variable approach indicate which diseases are significant in predicting self-rated health. The presumption of such analyses is that there is adequate statistical power to test such effects. That is, it is presumed that if a binary disease variable is nonsignificant, it is because it is inconsequential to the self-rated health of respondents in the two surveys. This presumption, however, may be inappropriate because of inadequate statistical power for selected conditions, especially the relatively rare ones. In this next section, we examine statistical power in these surveys to further consider how it may influence the selection of the most appropriate method for measuring morbidity.
In Fig. 1 we graphed hypothetical power calculations for detecting differences in self-rated health among persons with and without diabetes and hip fracture, respectively. These conditions were chosen for this simulation because they have different prevalence rates in the population while providing different levels of self-rated health among those who do and those who do not have the condition. Using the mean differences from the NHEFS as the population parameters, in the figure we plotted three hypothetical power curves for samples of 100, 500, and 1,000 across various distributions of the disease. (For instance, the 5/95 split assumes 5% of the sample has the condition and 95% does not.) In Fig. 1 we demonstrate that statistical power is a function of (a) the difference between means, (b) sample size, and (c) the distribution of respondents in the sample (i.e., percentage split) for those who do and do not have a specific condition.
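Power curves of this kind can be approximated with a few lines of Python. The sketch below is not the authors' procedure: it uses a normal approximation to a two-sided two-group comparison of means rather than a noncentral distribution, and the mean difference (0.5) and standard deviation (1.0) are invented values, not the NHEFS estimates.

```python
from math import sqrt
from statistics import NormalDist


def two_group_power(mean_diff, sd, n, share_with_condition, alpha=0.05):
    """Approximate two-sided power for a difference in means (normal approximation)."""
    if not 0 < share_with_condition < 1:
        raise ValueError("share_with_condition must be strictly between 0 and 1")
    n1 = n * share_with_condition          # respondents with the condition
    n2 = n - n1                            # respondents without the condition
    delta = abs(mean_diff) / (sd * sqrt(1 / n1 + 1 / n2))
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Probability that the test statistic falls in either rejection region.
    return z.cdf(delta - z_crit) + z.cdf(-delta - z_crit)


# Hypothetical inputs: mean difference 0.5, common SD 1.0.
for n in (100, 500, 1000):
    row = [round(two_group_power(0.5, 1.0, n, p), 2) for p in (0.05, 0.10, 0.25, 0.50)]
    print(n, row)
```

For a fixed mean difference, the printed rows rise with sample size and as the split moves from 5/95 toward 50/50, which is the pattern the figure is described as showing.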
Returning to the comparison of morbidity measures in the analyses of NHEFS and HRS presented above, it is now appropriate to ask another question: Was there sufficient statistical power to detect differences for each of the conditions considered in the models of self-rated health? Although NHEFS and HRS are very large samples (over 6,800 and 5,637 cases used in these analyses, respectively), might statistical power be a problem in these or other large data sets when using the binary variable approach?
In Table 4 we provide power calculations for all individual diseases as well as for the simple count of conditions and the separate counts for serious and chronic conditions. For the binary variables, power was calculated based on the following formula for the noncentrality parameter from Neter, Wasserman, and Kutner (1990):

$$\delta = \frac{|\mu_1 - \mu_2|}{\sigma\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \qquad (1)$$

where $\mu_1 - \mu_2$ is the difference in mean self-rated health between those with and without the condition, $\sigma$ is the standard deviation for self-rated health (assuming the same within-group $\sigma$), and $n_1$ and $n_2$ are the number of people with and without the condition.
![]() | (2) |
The first column of Table 4 presents power for all respondents in NHEFS and HRS. Using .80 as a criterion, it may be seen that power is sufficient for the summed morbidity variables. However, there is insufficient power for 13 of the 35 individual conditions that are represented in this NHEFS sample. Over one-third of the binary variables do not have sufficient power in the NHEFS data. By contrast, there is adequate power for all of the individual conditions measured in the shorter disease inventory used in the HRS.
In the last four columns of Table 4 we present power calculations for subsamples defined by gender and race. Researchers often use subsamples of public data sets to focus on better understanding health processes within selected groups. Or, in the case of testing for statistical interactions, researchers rely de facto on power in the subsample. Thus, the data shown in the last four columns of Table 4 reveal the impact on power when analyses are based on subsamples or when interactions are tested by gender or race. Because the HRS contains a Black oversample, power calculations for HRS were completed on both the weighted and unweighted data; the latter, reflecting the Black oversample, are reported in Table 4.
Statistical power is excellent for all of the summed morbidity variables in NHEFS and HRS subsamples. On the other hand, statistical power is insufficient in a larger number of diseases for subsample analyses in NHEFS. For example, among the subsample of men, power is insufficient for 20 of the 35 conditions (57%), whereas for the female, non-Black, and Black subsamples the number of conditions with inadequate power is 12 (34%), 13 (37%), and 29 (83%), respectively. For the HRS sample of 5,637 cases, all of the specific conditions have sufficient power, except the fracture since age 45 condition in the Black subsample analyses. This was also the case with the weighted data. For all other conditions, results from the weighted and unweighted data were very similar and showed that statistical power is adequate.
Returning to our earlier question of sufficient statistical power for all conditions in these data sets, one may answer affirmatively for HRS but not for NHEFS. In retrospect, the absence of a statistically significant effect in regression models of self-rated health for many of the individual conditions in NHEFS may have been due to inadequate power. If a researcher were to use the binary variable approach for NHEFS without calculating power, he or she would conclude (perhaps inappropriately) that many diseases are not related to self-rated health in cross-sectional and longitudinal analyses. In fact, it is quite possible that some of those diseases influence self-rated health, but it is impossible to detect such effects with the binary variable approach in NHEFS. The problem is worse for subsample analyses or when testing for interactions. For example, in the NHEFS, it would be impossible to demonstrate that cancer affects health ratings among men.
| Discussion |
|---|
Beyond the demand of handling a large number of independent variables, statistical power may be the more serious problem with the binary variable approach. Power is critical to the conclusions derived from any sampling-based study. It is clear from the above power analyses that the binary variable approach may lead to the conclusion that a given disease is not consequential to an outcome when, in fact, the statistical chance for it to do so is small. Indeed, there was insufficient statistical power for more than one-third of the 35 diseases available in the full NHEFS, and even more of the diseases lacked sufficient statistical power when subsample analyses by gender and race were considered. However, statistical power was not a problem for any of the variables based on the count of conditions.
Insufficient power may also be a concern in studies that examine how special combinations of diseases may interact in affecting health-related outcomes (i.e., interaction of diseases). Fried and colleagues (1999) recently found such interactive effects of diseases on disability in the Women's Health and Aging Study. Given this, we performed statistical power calculations for several disease interactions for the NHEFS and HRS data. For both data sets, power was calculated for interactions of (a) heart disease and diabetes and (b) cancer and chronic bronchitis or emphysema. Power was also estimated for the interaction of (a) diabetes and kidney disorder in NHEFS and (b) diabetes and kidney or bladder problem in HRS. There was sufficient power for all interactions examined with the full sample. The only instance in which power was insufficient was for cancer and chronic bronchitis or emphysema in the Black subsample of NHEFS. In short, investigators may find it valuable to calculate power prior to specifying disease interactions, but these analyses indicate that power is adequate in these data for several common disease interactions.
It is plausible that there is no association between self-rated health and food allergies, hay fever, and hives, but it is impossible to determine if this is so from NHEFS. It may be more difficult to presume that other conditions, which did not manifest an association with self-rated health (e.g., tuberculosis, hepatitis, and psoriasis), are not consequential in shaping health ratings. Or, would one genuinely believe that cancer is not related to health ratings in a sample of men? Whether any of these conditions influence health ratings is an empirical question, but it cannot be answered in NHEFS and probably not in many other data sets because of inadequate statistical power. The results of the analyses presented here clearly show that investigators need to exercise caution with the binary variable approach to be sure that there is adequate statistical power for the proposed tests.
The conditions lacking adequate power in the NHEFS full sample were not measured in HRS. Instead, HRS focused on those conditions for which the age group studied is known to be at risk. Thus, the list itself is an important clue to the appropriateness of the different approaches. For short lists with prevalent conditions, the binary variable approach may be preferred in large samples. For longer lists including relatively rare conditions, caution should be exercised with the binary variable approach even with large samples. Stated differently, the greater the specificity in the morbidity listing, the greater the likelihood of a base-rate problem for some conditions, resulting in insufficient power (Schmitt and Colligan 1984).
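The base-rate problem can be made concrete with a small calculation. The sketch below approximates power for a single binary disease indicator as prevalence falls while sample size stays fixed. The standardized difference `d`, the sample size, and the prevalence values are hypothetical, and the two-sided normal approximation is an assumption of the example rather than the method used in these analyses.

```python
from statistics import NormalDist


def indicator_power(n, prevalence, d, alpha=0.05):
    """Approximate two-sided power for one binary disease indicator.

    n          : total sample size
    prevalence : share of respondents reporting the condition
    d          : assumed standardized mean difference on the outcome
                 between respondents with and without the condition
    """
    n_cases = n * prevalence
    n_noncases = n * (1 - prevalence)

    # Standard error of a standardized difference in two group means.
    se = (1.0 / n_cases + 1.0 / n_noncases) ** 0.5

    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = abs(d) / se
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


if __name__ == "__main__":
    # Hypothetical scenario: same sample and effect, rarer and rarer condition.
    for prev in (0.20, 0.05, 0.01, 0.002):
        power = indicator_power(n=5000, prevalence=prev, d=0.2)
        print(f"prevalence={prev:.3f}  approximate power={power:.2f}")
```

With the effect size and sample held constant, power in this sketch depends almost entirely on the number of cases, so adding respondents helps a rare condition far less than the total sample size might suggest.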
In models of self-rated health with the HRS, investigators may proceed with the binary variable approach, but should still exercise caution if testing interactions or estimating effects on other subsamples. It remains to be seen if the binary variable approach is equally appropriate with other outcomes in HRS or in other data sets.
ICD codes were considered as one alternative to the long list of NHEFS conditions. Models based on ICD codes possessed better fit than those with the simple count of conditions in both cross-sectional and longitudinal models. This fact and the widespread use of ICD codes in health services research make them an attractive option. The raw ICD codes are very specific nominal variables that may be useful for some investigations, but most researchers aggregate within bodily systems. As used here, ICD codes were treated as counts that measure the number of conditions within each ICD category. This approach classifies conditions of related pathology and preserves variation in morbidity within bodily systems, but alternative uses of ICD codes may be appropriate in some investigations (Schwartz, Iezzoni, Moskowitz, Ash, and Sawitz 1996).
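One way to picture the count-within-category coding is the sketch below. The condition names and the mapping to body-system categories are illustrative assumptions, not the coding scheme actually applied to NHEFS, and the function name is hypothetical.

```python
from collections import Counter

# Hypothetical mapping from survey condition names to broad ICD-style
# body-system categories (illustrative only).
ICD_CATEGORY = {
    "heart_disease": "circulatory",
    "hypertension": "circulatory",
    "stroke": "circulatory",
    "chronic_bronchitis": "respiratory",
    "asthma": "respiratory",
    "diabetes": "endocrine",
    "arthritis": "musculoskeletal",
}


def icd_category_counts(reported):
    """Count a respondent's reported conditions within each category.

    Every category appears in the result, with 0 when no condition in
    that category was reported. Raises KeyError for unmapped conditions.
    """
    counts = Counter({category: 0 for category in set(ICD_CATEGORY.values())})
    for condition in set(reported):  # ignore duplicate reports
        counts[ICD_CATEGORY[condition]] += 1
    return dict(counts)
```

A respondent reporting heart disease, hypertension, and asthma would receive a count of two for the circulatory category and one for the respiratory category, which keeps within-system variation that a single overall count would discard.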
The other procedure examined for measuring morbidity was based on separate counts of serious and chronic (nonserious) conditions. In cross-sectional and longitudinal analyses for both data sets, models using these variables provided a better fit than did those with the simple count of conditions, and power was always sufficient.
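The two-count coding can be sketched as follows. Which conditions belong on the serious list and which on the chronic (nonserious) list is an assumption made here for illustration. The sets and the function name are hypothetical and should not be read as the classification used in these analyses.

```python
# Hypothetical classification, for illustration only.
SERIOUS = frozenset({"heart_disease", "cancer", "stroke", "diabetes"})
CHRONIC = frozenset({"arthritis", "hay_fever", "psoriasis", "hives"})


def morbidity_counts(reported):
    """Return the simple count plus separate serious and chronic counts."""
    reported = set(reported)
    return {
        "serious": len(reported & SERIOUS),
        "chronic": len(reported & CHRONIC),
        # The simple count treats every listed condition as equivalent.
        "simple": len(reported & (SERIOUS | CHRONIC)),
    }
```

Two respondents with the same simple count can differ on the serious count, and that distinction is what the separate-count models are able to use.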
It is possible that one might use elements of more than one of the approaches articulated above. For instance, on some outcomes, it may be useful to consider a count of chronic conditions, but separate binary variables for serious illnesses (Ferraro and Kelley-Moore in press). This approach would be less parsimonious than the two variables for serious and chronic illness, but more parsimonious than the binary variable approach. One would still need to be attentive to the adequacy of statistical power for the binary variables considered and to possible threshold effects for the count variable, but this strategy may be optimal for certain research questions. Indeed, a number of studies follow such a strategy by focusing on the contribution of a single disease to an outcome while using the other diseases as adjustments (e.g., Parmelee et al. 1995). The conclusions from the present analyses extend to that case as well: Test for the adequacy of statistical power on the binary variables. If there is insufficient statistical power for a single disease indicator, the use of count variables preserves the comorbidity information of the respondent for the analysis. It may be anticipated that the binary variable approach will yield more detailed and slightly better fitting models, but the need for parsimony may make count variables more appropriate for certain research questions.
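A mixed coding of this kind might be assembled as in the sketch below. The function name, the variable prefix, and the example condition names are hypothetical. The choice of which conditions keep their own indicator is left to the analyst, for example after checking power for each one.

```python
def hybrid_predictors(reported, all_conditions, keep_binary):
    """Build one respondent's predictors under a mixed coding.

    reported       : conditions the respondent reports
    all_conditions : full list of conditions asked in the survey
    keep_binary    : conditions retained as separate 0/1 indicators
                     (e.g., those judged to have adequate power)

    Conditions not in keep_binary are folded into a single count, so
    their comorbidity information is retained rather than dropped.
    """
    reported = set(reported)
    unknown = reported - set(all_conditions)
    if unknown:
        raise ValueError(f"conditions not in survey list: {sorted(unknown)}")

    row = {f"has_{c}": int(c in reported) for c in sorted(keep_binary)}
    row["other_count"] = sum(
        1 for c in set(all_conditions) if c in reported and c not in keep_binary
    )
    return row
```

For a survey list of five conditions with two kept as indicators, each respondent is described by three predictors instead of five, which is the gain in parsimony relative to the full binary variable approach.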
Measures of severity of illness represent another promising approach to assess the impact of morbidity on health while distinguishing the more serious illnesses (Gonnella et al. 1984; Knaus, Wagner, Zimmerman, and Draper 1993; Mulrow et al. 1994; Parmelee et al. 1995; Steen, Brewster, Bradbury, Estabrook, and Young 1993). Although different severity of illness measures have been advanced, most rely on hospital discharge abstracts or information from medical records. An alternative is to ask respondents to distinguish the most serious conditions. Regardless of the procedures involved, most severity of illness measures would avoid the problem of statistical power and, we suspect, provide better fitting and insightful models than simple counts of disease.
Accurate estimates of the prevalence of morbidity across the life course and assessments of how morbidity shapes quality of life, health service use, and mortality are dependent on how morbidity is measured. Although the differences in explained variance across the approaches are not large in these analyses, the lowest R² values (and poorest fit) were routinely observed for the simple count of conditions. The simple count may be useful in some research applications (Liang et al. 1990), but moving to separate counts for serious and nonserious (chronic) conditions should be more useful in many research projects while avoiding the risk of inadequate statistical power associated with the binary variable approach. If parsimony is not a major concern, then the binary variable approach may be preferred. We urge investigators, however, to test for statistical power when using the binary variable approach before concluding that rare conditions do not affect the outcome. Failure to perform such power tests may lead to an underestimate of the role of selected diseases on health and longevity. It could be very misleading for gerontologists to conclude that selected diseases do not influence health-related quality of life, especially among frail elders, when inadequate statistical power is responsible for the false-negative result. Finally, we urge clinicians and policy officials to temper conclusions about the lack of effects due to selected diseases in reported studies unless tests for statistical power have been performed.
| Acknowledgments |
|---|
Received for publication May 13, 1999. Accepted for publication December 7, 1999.