RESEARCH ARTICLE
a Department of Psychology, Washington University, St. Louis, Missouri
Maura Pilotti, Department of Psychology, Washington University, One Brookings Drive, Campus Box 1125, St. Louis, MO 63130-4899 E-mail: mpilotti{at}eudoramail.com.
Decision Editor: Margie E. Lachman, PhD
| Abstract |
|---|
The principal challenge confronting listeners in spoken word identification is that talkers' voices substantially modify the spectral and temporal properties of speech signals (Ladefoged 1980; Peterson and Barney 1952), so that a word spoken by two different talkers results in slightly different acoustic patterns. Several studies (Church and Schacter 1994; Goldinger 1996, Goldinger 1998; Nygaard and Pisoni 1998; Nygaard, Sommers, and Pisoni 1994; Pilotti, Bergman, Gallo, Sommers, and Roediger 2000; Sommers 1999) have demonstrated that young listeners confront this challenge by encoding in long-term memory the unique characteristics of a talker's voice (e.g., pitch, melodic contours, and speech rate), which are then used to identify words spoken by that talker. The goals of this investigation were (a) to examine the conditions that promote the encoding of voice information in memory, and (b) to assess whether age-related declines exist in this form of encoding.
| Young Adults: Research Findings |
|---|
In studies using this paradigm (Church and Schacter 1994; Goldinger 1996, Goldinger 1998; Pilotti et al. 2000; Sheffert 1998), priming has been found to be greater for words repeated at test in the same voice as at study than for words repeated in a different voice, even though the encoding instructions focused listeners' attention on linguistic information (either phonetic or semantic). These findings indicate that, even with a limited exposure to a talker's voice and without instructions to remember either the words produced by that talker or his/her voice, young adults incidentally encode both forms of information in long-term memory. These findings, however, also suggest that young adults develop memory representations of spoken words in which each word is encoded with the voice characteristics of the talker who said that word (i.e., specific voices saying specific words), implying that any beneficial effect of perceptual information on word identification is dependent on previously encoded phonetic/lexical information.
Interestingly, the voice priming paradigm devised by Nygaard and associates (see Nygaard and Pisoni 1998; Nygaard et al. 1994) has provided evidence suggesting that voice characteristics encoded in long-term memory can also affect word identification independently of previously encoded phonetic/lexical information. In this paradigm, however, subjects are first exposed to a lengthy voice-encoding phase in which they are explicitly required to become familiar with the voices of a set of talkers (explicit or intentional encoding). Specifically, subjects are asked to learn the association between a set of visually presented talkers' names and their voices by listening to a large number of words spoken by these talkers. At test, subjects identify degraded words, half of which are spoken by the familiar talkers and half by novel talkers. In contrast to the word-priming paradigm, all words used in the identification test are different from those of the voice familiarization set. In this experimental context, novel words spoken by familiar talkers are generally identified more accurately than those spoken by unfamiliar talkers (Nygaard et al. 1994).
At first sight, these findings suggest that the encoding of voice information in long-term memory can be empirically separated into two qualitatively different stages. With a limited exposure to a talker's voice, young adults appear to develop incidental records of spoken words, in which the phonetic/lexical content and perceptual details of each word, although stored separately, remain linked together via associative connections (Schacter and Church 1992) or constitute a single memory unit stored in a context-sensitive word-recognition system (Goldinger 1996). Consequently, young subjects display higher identification for words repeated at test in the same voice as at study than for words repeated in a different voice (see word priming results). With considerable exposure to a talker's voice, young adults seem to be able to dissociate these two sources of information so as to show a benefit in the identification of novel words that match the encoded perceptual details (as demonstrated by the voice priming results).
The name-voice association task used by Nygaard and colleagues (1994), however, required subjects to explicitly extract voice characteristics from the speech signals of the encoding phase. Therefore, the possibility exists that an encoding task focusing subjects' attention on voice information could lead subjects to attend to voice characteristics in the subsequent identification test, making the beneficial effect of voice familiarity on word identification observed by Nygaard and coworkers a carryover effect of the encoding task. A similar argument applies to the findings of a recent study conducted by Yonan and Sommers (2000). In this study, a lengthy voice-encoding phase with sentences spoken by several talkers was followed by an identification test with novel sentences masked by noise. During the voice-encoding phase, subjects were instructed to either become familiar with the voices of a set of talkers (explicit encoding) or focus on semantic information (i.e., the meaning of the last word; incidental encoding). At test, subjects were required to identify the final word of novel sentences, half of which were spoken by the familiar talkers, and half by novel talkers. Yonan and Sommers reported that explicit encoding yielded voice effects similar to those observed following incidental encoding. However, prior to the identification test, subjects in both encoding conditions were given a voice discrimination test with 160 sentences, half of which were spoken by the familiar talkers of the encoding phase and half by novel talkers. Interestingly, incidental encoding yielded considerably lower voice-discrimination scores than explicit encoding, even though it produced equivalent voice effects in the word identification test. Therefore, the possibility exists that the incidental encoding instructions might have yielded little retention of voice characteristics, with the mere length of the discrimination test giving subjects the opportunity to encode these characteristics in long-term memory. Of course, it might have also led subjects to attend to voice characteristics in the subsequent identification test, nullifying any effect that the encoding manipulation might have had on word identification.
| Older Adults: Research Findings |
|---|
How can these apparently contradictory findings be explained? Clearly, the uptake of sensory information is reduced and/or disrupted in old age as a result of hearing loss and/or other peripheral impairments (Florentine et al. 1993; Konig 1957; Moore, Peters, and Glasberg 1992; Schneider 1997; Schneider and Pichora-Fuller 2000). Consequently, the possibility exists that phonetic/lexical coding and voice processing, which are closely tied together in young adults, might be altered by age-related changes in the uptake of sensory information. Of course, a compromised sensory uptake provides the word recognition system with a degraded input, making word identification more difficult for elderly subjects (see Schneider and Pichora-Fuller 2000).
Schneider (1997) has suggested that a degraded input to the word recognition system leads elderly adults to devote cognitive resources to the recovery of the phonetic information lost at the periphery (top-down processing). Therefore, in encoding tasks promoting linguistic processing, the possibility exists that the diversion of resources to the recovery process, although necessary for word identification, might weaken the processes involved in encoding the attributes of a novel voice in long-term memory. Consequently, these encoding tasks should yield either no voice effects or voice effects smaller than those explicitly promoting voice processing (e.g., name-voice association and voice discrimination tasks) in elderly adults. The finding of Yonan and Sommers (2000), who reported voice effects in elderly subjects in the voice-priming paradigm, is consistent with this hypothesis if we assume that the effects of the incidental encoding condition were driven by the voice-discrimination test. Of course, the finding of Schacter and associates (1994), who reported that elderly adults did not yield voice effects in the word-priming paradigm with encoding instructions focusing subjects' attention on linguistic information, clearly supports this hypothesis. However, the finding of voice effects in elderly adults reported by Sommers (1999) in the same paradigm is difficult to interpret with respect to this hypothesis because there were no instructions explicitly focusing elderly adults' attention on voice processing. Therefore, it is unclear whether the size of these effects would fluctuate with different encoding instructions as predicted by the assumption of age-related declines in voice processing.
In light of these unresolved issues, the first goal of the present study was to reexamine the effects of encoding instructions (intentional/explicit vs incidental) on the identification of novel words displaying either familiar or unfamiliar voice patterns. To this end, we exposed young and older subjects to an extensive voice familiarization (encoding) session to give both age groups the opportunity to develop a memory of each voice. In contrast to the Yonan and Sommers (2000) voice familiarization session, which involved talkers of different gender, our participants heard only two male voices. This permitted us to avoid confounding memory of gender with memory of voice characteristics per se, and limit the cognitive load involved in processing several voices (Mullennix and Pisoni 1990).
In Experiment 1, the voice familiarization phase was administered under two instructional conditions (explicit and incidental) to assess whether the encoding of voice information is modulated by the type of instructions. In the explicit (E) encoding condition, subjects were to learn the association between a name and a voice as in Nygaard and colleagues' study (1994). Therefore, this condition was intended to promote the processing of voice information irrespective of the specific words spoken by each talker. In the incidental (I) encoding condition, subjects were asked to judge the clarity of enunciation of spoken words. Therefore, this condition, which required subjects to judge the quality of the phonetic content of each word with no explicit reference to voice characteristics, was assumed to focus subjects' attention on linguistic information. Voice characteristics, however, modified the acoustic patterns that listeners were to judge for clarity (Ladefoged 1980; Peterson and Barney 1952). Consequently, the processing of voice information was an integral aspect of the clarity-of-enunciation task, although it was unclear whether this processing would promote the same encoding and use of voice information as the name-voice association task.
After each encoding condition, listeners identified words masked by noise. Because we were interested in the long-term encoding of voice characteristics irrespective of word information, the words used in the identification test were all novel. Although novel, half of the words were spoken by one of the familiar talkers and half by a novel talker (both men). In this context, novel words exhibiting unfamiliar voice patterns served as a control condition. The findings of Nygaard and coworkers (1994) and Yonan and Sommers (2000) led us to expect that with explicit encoding instructions both young and older adults would identify with higher accuracy words spoken by the familiar talker (voice effects). Whether the effect of voice familiarity on word identification would be modulated by the encoding manipulation in either age group was a matter of empirical investigation. Of course, the question of interest here was not whether subjects develop a memory of a talker's voice after extensive exposure to that voice. It is obvious that subjects have knowledge about familiar voices in long-term memory, as illustrated by their ability to recognize the voice of a friend over the telephone. Rather, the question of interest was whether without prompting (i.e., attention to voice characteristics during encoding), subjects would develop a sufficiently detailed knowledge of a talker's voice that could be used to identify novel words spoken in that voice. We hypothesized that if mere exposure is sufficient to induce this form of processing in either age group, encoding instructions focusing subjects' attention on either voice or word information should yield equivalent voice effects at test. We also hypothesized that, if age-related declines in sensory uptake weaken older adults' ability to process voice information, age differences in voice effects should be observed at test primarily following encoding instructions promoting attention to linguistic information.
| Experiment 1 |
|---|
Because a major concern in studies comparing performance of young and older adults in auditory tests is the decreased uptake of sensory information in the aged, we collected pure-tone air-conduction thresholds for octave frequencies from 250 to 4000 Hz from both young and older adults. Average pure-tone air-conduction thresholds, which provide a measure of participants' hearing acuity, and standard errors are reported in Fig. 1. These data were submitted to a mixed factorial analysis of variance (ANOVA) with frequency and age as factors. This analysis produced main effects of age, F (1,94) = 274.22, MSe = 174.50, and frequency, F (4,376) = 53.91, MSe = 43.40, and a reliable interaction, F (4,376) = 87.01, MSe = 43.40. Although young and older adults differed in auditory acuity at all the selected frequencies (p < .05), high-frequency information yielded the largest group differences (see Appendix, Note 1).
All the stimuli, recorded by six male talkers in a sound-attenuating booth, were digitized at a sampling rate of 20 kHz on an IBM-compatible computer using a 16-bit analog-to-digital converter equipped with anti-aliasing filters. The amplitude levels of all the stimuli were digitally equated to the same root mean square (RMS) using a software package specifically designed to modify speech waveforms. The stimuli were presented at 80 dB sound pressure level. Prior to experimental implementation, each word token, presented in the clear, was checked independently for mispronunciations and misarticulations by two raters. Tokens that were judged as containing any of these production errors were re-recorded. Approximately 10% of the word tokens of each talker needed to be replaced.
The 300 stimulus words were organized in three lists of 100 words each. Because the familiarization sessions of the explicit (E) and incidental (I) phases involved different tasks and talkers, the same list of 100 words was used for both familiarization sessions. The remaining two lists of 100 words were used in the test sessions. Lists were matched for frequency and familiarity. Both the familiarization and test lists included filler words, placed at the beginning of each list for subjects to practice. The word tokens of the familiarization sessions were presented in the clear, whereas the words of the test lists were masked by white noise. The noise was 5 dB louder than the signal (signal-to-noise ratio, S/N = -5 dB).
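The two signal-processing steps described above (equating tokens to a common RMS level and masking test words with white noise 5 dB above the signal) can be sketched as follows. This is only an illustration under stated assumptions: the article does not name its software, so the function names, the sine-wave stand-in for a word token, the target RMS of 0.1, and the Gaussian noise generator are all hypothetical.

```python
import math
import random

def rms(samples):
    """Root mean square amplitude of a waveform."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def scale_to_rms(samples, target_rms):
    """Rescale a waveform so that its RMS equals target_rms."""
    gain = target_rms / rms(samples)
    return [s * gain for s in samples]

def mask_with_noise(signal, snr_db, rng):
    """Add white (Gaussian) noise at the requested signal-to-noise ratio in dB."""
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    # SNR(dB) = 20 * log10(rms_signal / rms_noise); solve for the noise RMS.
    noise_rms = rms(signal) / (10 ** (snr_db / 20))
    noise = scale_to_rms(noise, noise_rms)
    return [s + n for s, n in zip(signal, noise)], noise

# Hypothetical stand-in for one word token: a 100 Hz sine sampled at 20 kHz.
rate = 20000
token = [0.3 * math.sin(2 * math.pi * 100 * t / rate) for t in range(rate)]
token = scale_to_rms(token, 0.1)  # assumed common RMS level

# Noise 5 dB louder than the signal corresponds to an S/N of -5 dB.
masked, noise = mask_with_noise(token, snr_db=-5.0, rng=random.Random(0))
snr_check = 20 * math.log10(rms(token) / rms(noise))
```

Because the noise is rescaled to an exact RMS after it is drawn, the realized S/N matches the requested value regardless of the random seed.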
The six talkers were first randomly assigned to two sets of three so as to assure that the talkers heard in the E phase were never heard in the I phase. A Latin square design was then used to assign talkers to the voice familiarization and test sessions so that in either the I or E phase, a subject would be first familiarized with two voices, one of which would be subsequently used at test along with a novel voice. This procedure produced three unique combinations of talkers (2 familiar voices and 1 unfamiliar voice) for each experimental phase. Therefore, because the voices used in one phase were never used in the other phase, each participant at test always heard one familiar voice and one unfamiliar voice. Forty-eight unique combinations of talkers, lists, and order were obtained by counterbalancing the test lists assigned to each phase, the familiar talkers selected for a given test list, the words of the test lists assigned to familiar and unfamiliar talkers, and the order in which each phase was administered.
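The talker assignment can be illustrated with a cyclic Latin square over one set of three talkers, in which each row supplies two familiar voices and one unfamiliar voice. The talker labels below are hypothetical. The article reports only the total of 48 counterbalancing combinations, so the factorization used here (3 talker combinations x 2 list assignments x 2 familiar test talkers x 2 word assignments x 2 phase orders) is an assumption that happens to reproduce that total, not the authors' documented scheme.

```python
from itertools import product

def latin_square_rows(talkers):
    """Cyclic rotations of a talker set: two familiar voices, one unfamiliar voice per row."""
    rows = []
    for i in range(len(talkers)):
        rotated = talkers[i:] + talkers[:i]
        rows.append({"familiar": tuple(rotated[:2]), "unfamiliar": rotated[2]})
    return rows

# Hypothetical labels for one set of three talkers.
rows = latin_square_rows(["T1", "T2", "T3"])

# Assumed factorization of the counterbalancing cells.
conditions = list(product(
    range(len(rows)),                                  # talker combination (3)
    ("list A in E phase", "list B in E phase"),        # test list assigned to each phase (2)
    (0, 1),                                            # which familiar talker is heard at test (2)
    ("first half familiar", "second half familiar"),   # words assigned to familiar vs unfamiliar talker (2)
    ("E first", "I first"),                            # order of the two phases (2)
))
```

Under this reading, each talker in a set serves as the unfamiliar voice in exactly one row, and `len(conditions)` equals 48.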
Procedure and design.
The experiment was presented as an investigation of auditory perception consisting of a series of independent tasks. Participants were exposed to two experimental phases, explicit (E) and incidental (I), counterbalanced across subjects. Each phase involved a voice familiarization (encoding) session followed by a test session in which the task was to identify words masked by noise. The main difference between these two phases was the task of the familiarization session, which required either the explicit or the incidental encoding of voice information.
In the E phase, subjects were first familiarized with two voices by learning to associate a person's name with a voice (explicit encoding). Subjects were given 20 practice trials and 400 randomly presented trials, each involving a word spoken by one of two male talkers. To assure maximum exposure to the two voices, each talker spoke the same 100 words twice. On any given trial of this session, two names (John and Paul) appeared on the screen before subjects heard a word spoken by one of these two talkers. Subjects were asked to identify the talker who spoke that word by pressing one of two keys on the computer keyboard, which were labeled "Paul" and "John." Participants started the task by guessing, as no preexisting association existed between names and voices. Feedback on each trial provided the opportunity for learning the correct name-voice pairings, which was the prerequisite for entering the next phase of the experiment. Performance in the last 50 trials of the name-voice association task was near ceiling for both age groups (young adults: M = 97%, SD = 3; older adults: M = 96%, SD = 4; t (94) = 1.62, NS), indicating that subjects satisfied this prerequisite. After the explicit voice-familiarization session, subjects were given the identification test, including 5 words for practice and 100 novel words, all masked by noise. Subjects were asked to identify each word and report their answers in a booklet containing 105 numbered blanks (test session). Half of the words were spoken by one of the familiar male talkers and the other half by a novel male talker.
In the I phase, subjects first became familiar with two male voices by performing a clarity rating task (incidental encoding). As in the other phase, subjects were given 20 practice trials at the beginning of the familiarization session and 400 randomly presented trials, each involving a word spoken by one of two male talkers (the same 100 words were spoken by both talkers twice). On any given trial, subjects, who heard a word spoken by one of these talkers, were asked to rate the clarity of enunciation of each word on a 7-point scale (from 1 = poorly enunciated to 7 = very well enunciated) by pressing the key on the computer keyboard that corresponded to their answer. As in the E phase, at test participants heard 5 words for practice and 100 novel words, all masked by noise. Subjects were asked to identify each word and report their answers in a booklet containing 105 numbered blanks. Half of the words were spoken by one of the familiar male talkers and half by a novel male talker.
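The familiarization sessions of both phases share the same trial structure: 2 talkers x 100 words x 2 repetitions = 400 randomly ordered trials. A minimal sketch of building such a trial list is shown below. The placeholder word labels, the function name, and the use of a seeded shuffle are assumptions for illustration; the article does not describe its presentation software.

```python
import random

def build_familiarization_trials(words, talkers, repetitions=2, seed=None):
    """Return a shuffled list of (talker, word) trials with every pairing repeated."""
    trials = [
        (talker, word)
        for talker in talkers
        for word in words
        for _ in range(repetitions)
    ]
    random.Random(seed).shuffle(trials)
    return trials

# Placeholder word labels; the actual stimulus words are not listed in this section.
words = [f"word{i:03d}" for i in range(100)]
trials = build_familiarization_trials(words, ("John", "Paul"), seed=1)
```

Each talker contributes 200 of the 400 trials, and every talker-word pairing occurs exactly twice.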
The experiment lasted approximately 2.5 hours. To minimize fatigue effects, there was a 10-minute break between experimental phases and a 5-minute break between familiarization and test sessions within each phase. Subjects were tested individually in a sound-deadened testing room. The experiment involved a mixed factorial design with encoding task (explicit vs incidental) and test voice (familiar vs unfamiliar) as within-subjects factors. Age was the only between-subjects factor.
| Results and Discussion |
|---|
Interestingly, the explicit encoding of voice information yielded a similar pattern of facilitation for young and older adults (words spoken by a familiar talker minus words spoken by an unfamiliar talker: young: 7%; older: 6%, t (94) < 1). Of course, age differences emerged in the incidental encoding condition (7% vs 0; t (94) = 5.20). These findings indicate that although young adults can encode in long-term memory voice information and use it to promote on-line speech perception irrespective of the encoding task, older adults' ability to encode and use voice information is task dependent. Specifically, older adults appear to process the characteristics of a talker's voice only when their attention is explicitly directed to these characteristics.
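The voice effect reported above is a simple difference score: percentage of words identified when spoken by the familiar talker minus percentage identified when spoken by the unfamiliar talker, computed separately for each encoding condition. The sketch below shows that computation for a single listener. The scores are made up for illustration and are not data from the study.

```python
def voice_effect(scores):
    """Familiar-minus-unfamiliar percent correct for each encoding condition.

    scores maps (encoding, test_voice) to percent correct for one listener.
    """
    return {
        encoding: scores[(encoding, "familiar")] - scores[(encoding, "unfamiliar")]
        for encoding in ("explicit", "incidental")
    }

# Hypothetical listener (not data from the article).
listener = {
    ("explicit", "familiar"): 48.0,
    ("explicit", "unfamiliar"): 40.0,
    ("incidental", "familiar"): 42.0,
    ("incidental", "unfamiliar"): 42.0,
}
effects = voice_effect(listener)
```

A positive value indicates a benefit of voice familiarity; a value near zero, as in the hypothetical incidental condition here, indicates none.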
In this experiment, there were age differences not only in voice effects, but also in overall identification performance. Specifically, older adults were less accurate in identifying words masked by noise across all the encoding conditions and test voices, F (1,94) = 61.89, MSe = 41.49, Eta-squared = .40. Because age differences in hearing acuity existed in our sample of participants, we examined whether hearing loss could account for these patterns of results. Hearing loss was defined as the average decrement in pure-tone sensitivity across all the selected frequencies relative to the normative value of 25 dB, which is the value that defines normal hearing in young adults. As seen in Fig. 1, which displays average pure-tone thresholds for both young and older adults, the elderly adults of our sample exhibited hearing losses primarily in the high-frequency range. A 2 (familiar vs unfamiliar test voice) x 2 (explicit vs incidental encoding session) x 2 (young vs old) analysis of covariance was then conducted on the percentage correct identification scores with hearing loss as the covariate. In this analysis, there were no age differences in performance, F = 2.83, p = .1, indicating that age-related declines in pure-tone sensitivity accounted for the overall lower word-identification rate of the aged (see also Yonan and Sommers 2000). There was, however, an effect of test voice, F (1,93) = 62.06, MSe = 14.02, Eta-squared = .40, indicating that word identification benefited from voice familiarity. Test voice also interacted with encoding task, F (1,93) = 11.95, MSe = 14.95, Eta-squared = .11. Although this effect was primarily driven by the elderly adults' data, the interactions involving age and the other factors were either quite small or did not reach significance (age and test voice: F (1,93) = 4.15, MSe = 14.02, Eta-squared = .04, other Fs < 1). These findings indicate that hearing loss accounted for most of the age differences in voice effects uncovered in this experiment (see Appendix, Note 3).
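The hearing-loss covariate defined above can be sketched as the mean difference between a listener's pure-tone thresholds and the 25 dB normative value across the five octave frequencies from 250 to 4000 Hz. This is one reading of the definition: it assumes that thresholds better than 25 dB contribute negative values rather than being clipped at zero, which the article does not specify. The thresholds in the example are made up.

```python
FREQUENCIES_HZ = (250, 500, 1000, 2000, 4000)  # octave frequencies tested
NORMATIVE_DB = 25.0                            # normative value cited in the text

def hearing_loss(thresholds_db):
    """Mean of (threshold - 25 dB) over the test frequencies.

    thresholds_db maps frequency in Hz to the pure-tone threshold in dB.
    Negative results mean hearing better than the normative value (assumed, not clipped).
    """
    missing = [f for f in FREQUENCIES_HZ if f not in thresholds_db]
    if missing:
        raise ValueError(f"missing thresholds for: {missing}")
    total = sum(thresholds_db[f] - NORMATIVE_DB for f in FREQUENCIES_HZ)
    return total / len(FREQUENCIES_HZ)

# Hypothetical listener with elevated high-frequency thresholds (not study data).
listener = {250: 20, 500: 20, 1000: 25, 2000: 35, 4000: 50}
loss = hearing_loss(listener)
```

For this hypothetical listener the differences are -5, -5, 0, 10, and 25 dB, giving a mean of 5 dB.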
How can hearing loss explain elderly adults' impaired ability to encode voice information incidentally and subsequently use it in on-line speech perception? Hearing loss, as indexed by pure-tone sensitivity, is a gross measure of the reduced uptake of sensory information in the aged. We have argued earlier that age-related declines in the uptake of sensory information reduce and/or disrupt the input to the word recognition system. We have also suggested that a reduced and/or disrupted input to this system may lead elderly adults to devote cognitive resources to the recovery of the phonetic information lost at the periphery (top-down processing). On the basis of these assumptions, we have proposed that the recovery process, although necessary for the activation of the appropriate lexical units in the word recognition system, may weaken the processing of voice information. Obviously, the clarity-of-enunciation task of the incidental condition requires that subjects evaluate the quality of the phonetic content of the stimulus material, whereas the name-voice association task of the explicit encoding condition focuses subjects' attention on voice characteristics. These different task requirements make word identification essential to the clarity-of-enunciation task, but irrelevant to the name-voice association task. Consequently, in the incidental encoding condition (clarity-of-enunciation task), where cognitive resources are devoted to word identification, it is reasonable to expect the recovery process to weaken the encoding of voice characteristics. The absence of voice effects in elderly adults following incidental encoding instructions supports this hypothesis. In contrast, in the explicit encoding condition, cognitive resources are specifically devoted to extracting voice information from the stimulus material and developing a distinct memory of each voice. Consequently, it is reasonable to expect voice information stored in long-term memory to facilitate the processes involved in extracting phonemic information from the degraded items of the word identification test. The voice effects observed in this condition support this hypothesis.
The overall lower word-identification rate of elderly adults in the explicit encoding condition also indicates that voice familiarity can weaken, but not eliminate, the effects of age-related declines in sensory uptake on word identification. Therefore, it is reasonable to assume that although voice information encoded in long-term memory helps older adults to disambiguate the degraded items of the identification test, it does not entirely compensate for age-related declines in sensory uptake.
| Experiment 2 |
|---|
If the encoding of voice information in older adults is dependent upon the type of analyses conducted on speech signals, an incidental encoding task that promotes a fine-grained analysis of speech signals should produce voice effects in older adults. However, if older adults encode perceptual information only under explicit encoding instructions, even a fine-grained analysis of the stimulus material would not produce voice effects in this subject group.
We tested these contrasting hypotheses in Experiment 2 by exposing older adults to two incidental encoding conditions. In one encoding condition (incidental-fine grained, IFG), a word spoken by two different talkers was presented on any given trial. The participants' task was to decide which of the two instances of the word was spoken more clearly (perceptual comparison task). It was thought that this comparison would promote a fine-grained analysis of the stimulus material (attention to voice information), even though the encoding task did not involve instructions to explicitly remember or attend to voice characteristics. In the other condition (incidental-phonetic, IP), subjects performed the clarity-of-enunciation task of Experiment 1 in which only one word was presented on any given trial. This condition was used to assess whether the findings of the incidental encoding condition of Experiment 1 could be replicated with another sample of older adults.
| Methods |
|---|
Stimuli and procedure.
The stimuli of this experiment were the words used in Experiment 1. There were two experimental phases: incidental-fine grained (IFG) and incidental-phonetic (IP). As in the earlier experiment, each phase, counterbalanced across subjects, involved an encoding task and a perceptual identification test with words masked by noise. In the IP phase, subjects were exposed to the voices used in the explicit condition of Experiment 1, whereas in the IFG phase, subjects were exposed to the voices of the incidental encoding condition of Experiment 1. This was done to assure that the specific voice patterns heard in the explicit encoding condition of Experiment 1 could not be held responsible for the voice effects observed in this experiment.
In the IFG encoding session, subjects were presented with 10 practice trials and 200 randomly presented trials, each including two instances of the same word, each spoken by a different talker. The subjects' task was to judge the clarity of enunciation of the two instances of a word and select the one that they judged to be spoken more clearly by pressing the key on the computer keyboard that corresponded to their answer. In the IP encoding session, subjects were presented with 20 practice trials and 400 randomly presented trials, each including a word spoken by one of two talkers. The participants' task was to rate each spoken word on a 7-point scale for clarity of enunciation. Both encoding tasks were followed by the identification test used in Experiment 1, in which half of the words were spoken by one of the familiar talkers and half by an unfamiliar talker.
| Results and Discussion |
|---|
| General Discussion |
|---|
Our findings are consistent with the results of word-priming studies (Church and Schacter 1994; Goldinger 1996; Sheffert 1998) reporting that young adults identify words spoken at test in the same voice as at study more accurately than words spoken in a different voice. They are also consistent with the results of voice-priming studies (see Nygaard et al. 1994) reporting that young adults identify novel words spoken by familiar talkers more accurately than words spoken by unknown talkers. Therefore, our findings corroborate the notion that young listeners encode talker-specific characteristics in long-term memory, and then incidentally retrieve these characteristics to identify novel or repeated phonetic patterns that match these characteristics. Interestingly, in our study, the beneficial effects of voice information on young adults' spoken word identification were independent of the encoding task. These findings support the notion that the encoding of voice characteristics is an automatic byproduct of speech perception in young adults.
Our findings also support the notion that aging involves a change not in the ability of older adults to encode voice characteristics, but in their ability to do so spontaneously (i.e., independently of the encoding task). Interestingly, Schacter and coworkers (1994) found in the word-priming paradigm that older adults were insensitive to voice characteristics when the encoding task focused their attention on linguistic information. Similarly, we observed no beneficial effect of voice familiarity on word identification in the voice-priming paradigm when the encoding task did not focus older adults' attention, either directly (e.g., name-voice association task) or incidentally (e.g., perceptual comparison task), on perceptual information. These findings suggest that attention to voice characteristics fostered by the encoding task can promote the long-term encoding of voice information in the aged. Why would attention to voice characteristics be important for the encoding of this information in old age? We have proposed that age-related declines in the uptake of sensory information lead older adults to shift cognitive resources to the recovery of phonetic information, reducing their opportunities for encoding voice characteristics. Consequently, the encoding of voice information in older adults becomes dependent on tasks that promote either explicitly or indirectly the processing of voice characteristics. This account is consistent with the notion that the encoding of perceptual details becomes a cognitively effortful activity in old age (Kausler and Puckett 1981), and provides a reasonable explanation for our finding of task-dependent voice effects in elderly adults.
Interestingly,
Yonan and Sommers 2000
found that voice familiarity aided the identification of novel words displaying familiar voice patterns in both young and older adults, with explicit and incidental encoding conditions yielding equivalent voice effects. Their incidental encoding condition, however, was not truly incidental. Indeed, prior to the identification test, subjects were given a voice discrimination test with 160 sentences, half of which were spoken by the familiar talkers of the encoding session. We have proposed earlier that the discrimination task of this study may have given older adults the opportunity to encode voice characteristics in long-term memory, nullifying the effect of the encoding task manipulation. Our finding of task-dependent voice effects in elderly adults supports this account. However, the voice effects reported by
Sommers 1999
in the word-priming paradigm are difficult to interpret with respect to this account because there were no encoding conditions promoting voice processing in that study. Interestingly, in the same paradigm,
Pilotti and Beyer 2000
found that older adults' voice effects were dependent on attention to voice characteristics. Specifically, older adults exhibited voice effects only when they were familiarized with the talkers' voices, via a name-voice association task, prior to the encoding session, which involved the clarity-of-enunciation task of Experiments 1 and 2. This finding leads us to predict that the voice effects observed by
Sommers 1999
would fluctuate with encoding tasks promoting either voice or linguistic processing.
Although our findings indicate that older adults' compromised sensory uptake may affect the encoding of voice information, there was no evidence in our study that it also affected older adults' reliance on this information at test. If the effect of a compromised uptake of sensory information is to lead elderly adults to focus cognitive resources on phonetic/lexical processing, as we have hypothesized, why did elderly adults have no difficulty in processing voice patterns at test? There are two feasible explanations for this finding. First, voice information encoded in long-term memory facilitates phonetic/lexical coding; thus, it becomes quite useful to older adults, for whom word identification under difficult listening conditions is quite problematic (as demonstrated by the lower identification rate of our elderly subjects; see
Findlay and Denenberg 1977
;
Townsend and Bess 1980
). Therefore, it is reasonable to expect older adults to rely on voice information encoded in long-term memory for the identification of the degraded signals of the test session. Second, in older adults, attention to voice information via a fine-grained analysis of the stimulus material promotes the processing of this information (as demonstrated by the task-dependent voice effects observed in the elderly subjects). Therefore, it is reasonable to expect the difficulty of the identification test to enhance older adults' attention to the test items, promoting a fine-grained analysis of the stimulus material, and thus the processing of voice information at test.
It should be noted here that the notion of a compromised uptake of sensory information in the aged, which we have proposed to account for the task-dependent voice effects and the lower word-identification rate of the elderly subjects, was based on the finding of age-related hearing loss. Hearing loss, as measured by a pure-tone audiometric examination, however, is simply a gross measure of elderly adults' reduced and/or disrupted uptake of sensory information. Clearly, age-related slowing of processing (see
Stine, Wingfield, and Poon 1986
;
Wingfield, Poon, Lombardi, and Lowe 1985
), and declines in frequency, intensity, and temporal discrimination (
Florentine et al. 1993
;
Konig 1957
;
Moore et al. 1992
;
Schneider 1997
;
Schneider and Pichora-Fuller 2000
), albeit not measured here, may also compromise the uptake of sensory information in old age. Therefore, it is reasonable to assume that the age-related declines in the encoding of voice information and word identification reported here have multiple sources, and hearing loss is simply one indicator of elderly adults' compromised uptake of sensory information. Moreover, it should be noted that the age differences in overall test performance observed in our investigation were present despite the generally superior vocabulary scores of the older participants, and were attenuated, but not eliminated, by voice familiarity. This finding indicates that the superior verbal abilities of older subjects cannot serve as a compensatory mechanism for age-related declines in spoken word identification.
Interestingly, the findings of this study are consistent with models of implicit memory processes in which both phonetic information and nonlinguistic details are encoded and stored in long-term memory. The pre-semantic perceptual representation system (PRS) proposed by
Schacter and Church 1992
is one of these models. The PRS is assumed to be composed of cortically based subsystems devoted to the encoding and storage of the superficial properties of words such as their phonetic and perceptual context. With respect to spoken words, the PRS model postulates that phonetic information and voice characteristics are represented in separate subsystems. Associative connections between these subsystems represent the co-occurrences of phonetic patterns and voice characteristics in the stimulus material. The PRS model accounts for the voice effects on word identification observed in our study by assuming that the voice familiarization session produces a record of each talker's voice in the voice subsystem. At test, novel words spoken by one of the familiar talkers activate voice patterns preserved in memory, facilitating word identification under difficult listening conditions. The PRS model can also account for the finding that older adults' voice effects are task dependent by postulating age-related changes in the operations that govern the voice and the word subsystems. However, because in this model separate operations govern the encoding of voice and phonetic/lexical information, to account for our findings these operations must be assumed to depend on a common pool of cognitive resources. Given this assumption, the PRS is compatible with our proposal that age-related changes in the uptake of sensory information increase the resources that older adults devote to phonetic/lexical processing, weakening the encoding of voice information in long-term memory.
Our findings are also compatible with episodic memory models (
Goldinger 1996
,
Goldinger 1998
;
Hintzman 1986
; see also
Tenpenny 1995
), which postulate that each encounter with a spoken word creates a memory record including phonetic/lexical information and idiosyncratic perceptual attributes (e.g., voice information). Therefore, in these models, word identification is assumed to depend on a collection of perceptually specific lexical records stored in the word recognition system. Accordingly, the voice familiarization (encoding) session of our study would produce a large number of voice-specific records in which a talker's voice is represented by the collection of records that share the same voice patterns. These models account for the voice effects observed in this study by postulating that, at test, novel words displaying familiar voice patterns activate the voice-specific records of the encoding session, facilitating word identification. To be able to account for the task-dependent voice effects observed in this study's elderly adults, however, these models must assume that memory records are not mere analogues of incoming stimuli, but complex entities defined by both the physical forms of the stimuli and the operations that subjects perform on them (
Van Orden and Goldinger 1994
). Given this assumption, episodic models can account for elderly adults' task-dependent voice effects by postulating that age-related declines in the uptake of sensory information compromise the input to the word recognition system of elderly adults. We have proposed earlier that a weakened input to this system is likely to engage a recovery process. Therefore, it is reasonable to assume that this recovery process may lead older adults to discard idiosyncratic information in the speech signals. As a result, the memory records generated under encoding instructions that do not focus elderly adults' attention on voice characteristics may be less voice-specific than those of young adults. This account also entertains the notion that, prior to the experiment, the perceptually specific lexical records that constitute the word recognition system of elderly adults may already be less voice-specific, further biasing elderly adults to discard idiosyncratic information in encoding sessions that promote linguistic processing.
In conclusion, our findings encourage researchers not only to refine existing models of implicit memory phenomena to account for age-related changes in perceptual processing, but also to study the specific environmental conditions (instructions) and peripheral factors that produce these changes. Our findings also alert researchers to the role that both peripheral and cognitive factors may play in determining the age-related declines in spoken word identification documented here and in numerous other studies of aging (see
Grady et al. 1984
;
Willott 1991
;
Working Group on Speech Understanding and Aging 1988
).
| Acknowledgments |
|---|
Received for publication February 7, 2000. Accepted for publication October 7, 2000.
| Appendix |
|---|
| References |
|---|