Pupil Size Tracks the Effects of Global Context and Semantic Ambiguity on Word-Meaning Processing

Julieta Laurino; Laura Kaczer

doi:10.5334/joc.454

Introduction

Reading or listening to a word often involves a seemingly effortless access to its meaning. Yet, the vast majority of the words we use are in some way ambiguous (Rodd et al., 2002). Thus, the ability to select a contextually appropriate meaning is a critical component of language comprehension. In such cases, contextual cues —both local and global— play a crucial role in resolving ambiguity and promoting access to the intended meaning. Most research has focused on the function of the local context, which encompasses sentence-level information, grammatical structures, and immediate semantic relationships surrounding the ambiguous word. Studies have consistently demonstrated the facilitatory effects of local context on lexical processing through behavioral measures, such as reduced reading times (Brothers & Kuperberg, 2021; Schustack et al., 1987) and naming latencies (Hess et al., 1995), as well as faster reaction times and increased accuracy in lexical decision tasks (e.g., De Groot, 1985). In addition, electrophysiological markers such as the N400 show a reduced amplitude for contextually facilitated words indicating smoother integration of meaning (Kutas & Hillyard, 1980; Szewczyk & Federmeier, 2022).

Global context, on the other hand, refers to broader thematic or situational frameworks across extended discourse. For instance, the word “bat” might evoke a baseball bat in the context of a sports article or a flying mammal in the context of a wildlife feature. It has been shown that the presence of a global discourse context facilitates processing for globally related concepts (Albrecht & O’Brien, 1993; Carter & Hoffman, 2024; Hess et al., 1995; Schwanenflugel & White, 1991; Vu et al., 2000). In a key study by Hess et al. (1995), participants heard short passages consisting of three to four sentences that served as context, followed by a visually presented target word that they were asked to name. The researchers manipulated the relationship between the target word and two types of context independently: the local context, defined as the final sentence fragment (e.g., “…the English/computer science major wrote the POEM”), and the global context, which referred to the broader situation or scenario described in the preceding sentences (e.g., “The English major met a woman who he was very fond of. He had admired her for a while but wasn’t sure how to express himself. He always got nervous when trying to express himself verbally so …”). Their findings revealed across several experiments that global context facilitated naming times of the target word (e.g., ‘POEM’), whereas local context had no facilitative effect unless it was consistent with the global context.

While it is clear that global context shapes word-meaning access, the precise neurocognitive mechanisms by which it guides lexical processing remain less well understood. At the neural level, a large body of electrophysiological research has focused on the processing of local context in single sentences, whereas fewer studies have used contexts that extend beyond a single sentence (Berkum et al., 1999; Boudewyn et al., 2015; Camblin et al., 2007; Federmeier & Kutas, 1999; Helder et al., 2020). These studies primarily demonstrate global context effects on the N400 (Kutas & Hillyard, 1980). For instance, in Camblin et al. (2007), participants read sentences (e.g., “Lynn couldn’t stop scratching her arms and LEGS”) where the critical word (‘LEGS’) was either congruent with a preceding passage (e.g., “Lynn had gotten a sunburn at the beach. Nothing she tried would help her dry and irritated skin.”) or incongruent (e.g., “Lynn’s wool sweater was uncomfortable and itchy. She fidgeted as the rough material irritated her skin.”). They found that the congruence of discourse contexts had early and lingering effects on ERPs, showing a reduction in N400 amplitudes to target words in congruent conditions.

However, in most of the studies discussed above, it is difficult to completely disentangle local from global influences. This is because the target word is typically presented within a sentence, and although this local sentence is often a controlled or manipulated factor, it remains difficult to fully eliminate the influence of local grammatical and lexical constraints. Additionally, the distinction between what constitutes a local and global context varies across studies, adding further complexity. To address these issues, the present study focuses on analyzing isolated ambiguous words, which lack local contextual information beyond their inherent lexical properties. In addition, we introduced the global context as a separate linguistic component in a different task, preventing it from serving as a local influence. This experimental approach allows us to specifically examine the effects of semantic context on word processing. In particular, we propose an approach that remains relatively underexplored: investigating whether the facilitation effects of global context are linked to a reduction in the neurocognitive demands associated with word processing.

Finally, a number of studies have explored the interaction between contextual constraints and lexical properties (e.g., Rayner et al., 2004; Schwanenflugel & Stowe, 1989). Some of these studies suggest an increased top-down semantic influence, often referred to as semantic control, for less informative word inputs (Hoffman, 2016). Notably, it is proposed that semantic cognition relies on the interaction between the semantic control system and a semantic representation system, which encodes multimodal representations of concepts (Ralph et al., 2017). In this framework, our study looks into the interaction between these two systems, investigating whether global context effects on pupil size are greater for more ambiguous inputs.

Pupillometry in language research

Pupillometry, which measures the dilation of an individual’s pupil, has long been used as an indicator of brain activity in response to shifts in cognitive load or attentional demands (Beatty, 1982; Kahneman & Beatty, 1966; Sirois & Brisson, 2014). Pupil size has been found to be more sensitive and more specific to cognitive effort than behavioral measures (Geller et al., 2016; Kuchinsky et al., 2013; Papesh & Goldinger, 2012; Zekveld et al., 2010). Importantly, the pupillary response is recorded throughout the trial, thereby allowing a more fine-grained, temporal analysis of task-evoked effort that is thought to be a purer index of cognitive effort (Geller et al., 2019). Evidence from electrophysiological and neuroimaging studies suggests that pupil dilation in response to cognitive activity is under the physiological control of the locus coeruleus, the primary source of norepinephrine to the neocortex (e.g., Gilzenrat et al., 2010; Joshi et al., 2016; Murphy et al., 2014). Notably, norepinephrine has been implicated in the modulation of many cognitive functions, including memory consolidation and retrieval, working memory, attention, and decision-making (see Sara, 2009, for a review).

Pupillometry is increasingly used in cognitive language research (Rojas et al., 2024), providing objective and real-time information about the cognitive effort involved in recognizing, inhibiting, and selecting words (Chapman & Hallowell, 2015; Geller et al., 2016; Guasch et al., 2017; Haro, Guasch, et al., 2017; Haro et al., 2023; Kuchinke et al., 2007; McLaughlin et al., 2022). For example, Kuchinke et al. (2007) recorded pupillary responses while participants performed a lexical decision task (LDT) and found that low-frequency words elicited greater peak pupil dilations and longer reaction times. Interestingly, a few studies have examined the role of context on pupil size during speech comprehension (e.g., Engelhardt et al., 2010; Winn et al., 2015). For instance, in a study by Winn (2016), participants listened to single sentences with varying degrees of context congruence while their pupillary response was recorded. Semantic context led to a rapid reduction of listening effort, suggesting an increased efficiency of ongoing predictive language comprehension.

Word ambiguity to explore context-to-word interactions

The representation of word meaning in psycholinguistic literature traditionally categorizes words into discrete groups (Cevoli et al., 2021). In this sense, words are deemed unambiguous when there is a one-to-one relationship between the word form and meaning (Lyons, 1981). In contrast, ambiguous words, defined by a one-to-many relationship between form and meaning, are further classified based on the degree of relatedness between their meanings: polysemous words have highly related meanings, while homonymous words have meanings with little to no relation. For example, the word bark can refer to the sound of a dog or the outer layer of a tree, making it a homonym. Importantly, these meanings can either be balanced in frequency or biased, where one meaning is much more common (i.e., dominant) than the other. Also, many accounts propose that word ambiguity lies in a continuum, and that these discrete categories are a useful simplification (Klepousniotou & Baum, 2007). Some even propose that word meanings themselves are extremely context-dependent and continuous, lacking discrete boundaries between them (Trott & Bergen, 2023).

Semantically ambiguous words challenge language comprehension, particularly when listeners must select a less frequent (subordinate) meaning at disambiguation. For biased ambiguous words, dominant meanings are typically accessed more quickly than subordinate meanings (e.g., Rayner & Duffy, 1986; Simpson & Burgess, 1985). Recent studies have shown that preferences for different meanings of ambiguous words are not fixed but rather adaptable based on previous usage information. For instance, Rodd et al. (2013) demonstrated that exposure to sentences that include ambiguous words in a biasing context toward one of their meanings (e.g., “The bark was very loud”) led participants to produce more associated words aligned with that specific meaning. The authors explain these results in terms of an episodic context account where the biasing sentences generate a contextually bound memory representation that aids future comprehension. They further suggest that these episodic representations contribute to the maintenance of situation models (Kintsch, 1988) which facilitate discourse comprehension over longer periods (Curtis et al., 2022; Gaskell et al., 2019).

Challenges in language comprehension arising from semantic ambiguity have also been explored using pupillometry. In particular, Haro et al. (2023) found that ambiguous words led to greater pupillary responses in a number-of-meanings task compared to unambiguous words. In addition, recent results show that pupils dilate more when listening to sentences that include ambiguous words (Kadem et al., 2020).

Taken together, these findings suggest that ambiguous words offer a valuable opportunity to explore context-to-word interactions: the effect of the context can be observed through shifts in meaning access, highlighting how semantic context guides word comprehension.

This study

In our study, we will address whether semantic context reduces the cognitive demands of word-meaning processing. Specifically, we will explore whether and how a global context (i.e., the thematic content of a visually presented short text) can influence the subsequent meaning processing of single words. By using behavioral and pupillometry measurements, our goal is to discern the contributions of word-context congruency and semantic ambiguity to the cognitive load during word-meaning access. To this end, we will employ two complementary tasks with distinct semantic engagement: a word-association task (Experiment 1) and a semantic relatedness task (Experiment 2). Our specific hypotheses are the following: i) the pupillary response is modulated by context-to-word matching, such that a smaller pupil size is expected when the context matches the word’s meaning, indicating reduced cognitive load; in addition, this effect will be more pronounced in tasks that require greater semantic engagement; ii) the effect of contextual modulation on pupil dilation differs according to the target word’s ambiguity, with more ambiguous words showing a larger effect of context than non-ambiguous words. This work makes a novel contribution to the literature by examining the specific effect of global semantic context on cognitive demands, as measured by pupillometry, which provides a fine-grained measure of the temporal dynamics of cognitive effort during lexical processing.

Experiment 1

Methods

Participants

Participants were recruited via mail and the laboratory’s social media pages (Twitter, Facebook, and Instagram) and signed an informed consent form approved by the Ethical Committee of the ‘Instituto de Investigaciones Médicas Alfredo Lanari’. They were all native Spanish speakers with normal or corrected-to-normal vision. Participants received financial compensation for their participation.

Sample size was based on previous literature using the same methodology for the study of word-meaning processing (Guasch et al., 2017; Haro et al., 2023; Kuchinke et al., 2007).

Twenty-five participants took part in the experiment, and two were excluded due to not following the instructions of the experimental task (they wrote the same word presented in the word-association task as response). The final sample consisted of 23 participants (9 females, 13 males and 1 unspecified gender; M ± SD age in years = 26.1 ± 5.96).

Materials

Target words

We used 64 Spanish ambiguous words as targets. Ambiguous target words were nouns that belonged to 16 predefined semantic categories (e.g., music, economics, soccer, and animals; see Appendix A for a full stimuli list and Appendix B for the lexical properties in each experiment). The selection of these target words followed a two-step process. First, we identified words that were either part of the available semantic ambiguity corpora (Fraga et al., 2017; Haro, Ferré, et al., 2017; López Cortés & Horno Chéliz, 2021; Monzó, 1991) or met the criteria of being “potentially ambiguous” defined by Haro et al. (2017): having more than one dictionary entry or more than five dictionary senses in the Spanish Language Dictionary of the Real Academia Española (RAE, 2014).

After identifying a pool of candidate ambiguous words, our second selection criterion was to include ambiguous words that were not strongly biased toward a single dominant meaning, thereby allowing the global context to play a more influential role in meaning selection. To estimate meaning frequency, we followed established methodologies that rely on word-association data (Gilbert & Rodd, 2022; Rice et al., 2019; Twilley et al., 1994) and used data from the Small World of Words for Rioplatense Spanish (SWOW-RP) database (Cabana et al., 2024). This choice ensured that our estimates were derived from a population of language users closely matching that of our experimental participants. For each ambiguous target word, we manually coded the word-association responses to identify those consistent with the category-relevant meaning. For example, the target word ‘note’ belonged to the music category, and therefore responses such as ‘song’ or ‘listen’ were coded as ‘music-related’, whereas responses such as ‘pen’ or ‘write’ were coded as ‘music-unrelated’. Meaning frequency was thus operationalized as the proportion of responses in SWOW-RP aligned with the semantic category of interest. Across our stimuli, the relative frequency of the category-related meaning averaged 0.35 (SD = 0.17), with values ranging from 0.10 to 0.70. This range ensured that both meanings were sufficiently accessible, allowing for meaningful effects of contextual disambiguation.
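The operationalization of meaning frequency described above can be illustrated with a minimal Python sketch. The function name, the label strings, and the example responses are hypothetical and are not taken from SWOW-RP or from the authors' coding scheme; the sketch only shows the proportion being computed.

```python
from collections import Counter


def relative_meaning_frequency(coded_responses, target_label):
    """Proportion of coded association responses that carry `target_label`."""
    if not coded_responses:
        raise ValueError("no responses to score")
    counts = Counter(coded_responses)
    return counts[target_label] / len(coded_responses)


# Hypothetical manual codes for five association responses to the cue 'note'.
note_codes = [
    "music-related",    # e.g., 'song'
    "music-unrelated",  # e.g., 'pen'
    "music-unrelated",  # e.g., 'write'
    "music-related",    # e.g., 'listen'
    "music-unrelated",
]

freq = relative_meaning_frequency(note_codes, "music-related")
print(f"relative frequency of the music-related meaning: {freq:.2f}")
```

In this toy case, two of the five responses are music-related, so the value falls inside the 0.10 to 0.70 range reported for the stimuli.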

Additionally, 32 filler words were selected so that the rationale of the experiment would not be evident to participants. Consequently, these words did not fall into any of the previously defined 16 semantic categories, were not ambiguous and their lexical properties were comparable to those of the target words.

Semantic contexts

Sixty-four contexts were used, which consisted of a text paired with an image. Texts and images for these contexts were extracted from Wikipedia articles corresponding to each category (freely available on the OSF). These contexts were representative of the same 16 semantic categories as target words, with four contexts per category. Length ranged from 30 to 35 words (M = 33.41, SD = 1.60). Crucially, neither the text nor the image contained the target word itself.

Sixteen additional contexts were selected to act as fillers. These were paired with filler words and were representative of another eight categories. Length also ranged from 30 to 35 words (M = 33.35, SD = 1.66).

Procedure

Participants were tested in a moderately lit, quiet room. They sat with their head on a chinrest with forehead support, at a distance of 50 cm between their eyes and a 23.8″ computer screen with a resolution of 1920 × 1080 pixels. Participants wore headphones during the whole experiment. Pupil area and eye movements were recorded continuously from both eyes during the task using an eye tracking device, EyeLink 1K (SR Research Ltd.), with a sampling rate of 1000 Hz. PsychoPy software (Peirce et al., 2019) was used to present the stimuli for the experiment. Prior to the experimental trials, a nine-point eye-tracker calibration was conducted.

Experimental trials consisted of the presentation of the target word in a word-association task that was preceded by a global context. The trials were grouped in blocks, such that two word-association tasks shared the same global context. This way, the duration of the experiment was reduced, controlling for the fatigue that could affect pupil size measures. Each block thus included the presentation of a global context, followed by two word-association tasks (see Figure 1 for an illustrative example).

**Figure 1. Experiment 1 example block.** Each global context is followed by two word-associations: one corresponding to the matched condition (e.g., ‘note’) and one to the unmatched condition (e.g., ‘bank’). All words are presented in pseudorandom order across blocks.

The global context consisted of the presentation of a short text paired with an image. In order to maintain participants’ attention, the screen had a gamified design, and the text had a missing word that participants needed to complete, earning 1 point for each correct response. Participants had 30 seconds to select the correct option among four, and were given feedback with a tick or a cross appearing on the screen immediately after their response.

In the word-association task, a target or filler word was presented and participants were asked to type the first word that came to mind upon hearing it. They were instructed to avoid giving phrases or sentences as responses. Words were presented in auditory format while a fixation cross remained on screen during the baseline and word presentation periods (see Blott et al., 2022, for an auditory word-association task). This presentation format was chosen for two reasons. First, using auditory rather than written stimuli kept the baseline isoluminant with the stimulus period, which controlled for pupil changes driven by luminance (Hutton, 2019). Second, small and central visual stimuli (i.e., a fixation cross) are recommended to avoid the pupil foreshortening effect, an apparent change in pupil area that occurs when the eyes rotate (Hayes & Petrov, 2016). For the auditory stimuli, we used a male Argentinian Spanish neural voice created with Texvoz (www.texvoz.com), a human-like text-to-speech voice generator. The audio of the ambiguous words had a mean duration of 431 ms (SD = 100 ms; range = 253–675 ms). A fixation cross (black on a gray [#808080 Hex] background) appeared on screen 3000 ms before the target or filler word was aurally presented (baseline period) and remained for 4000 ms after word onset. After the fixation cross disappeared, participants were asked to type their response. The mean duration of the experiment was 32 min.

The experiment began with a single practice block, after which we checked that participants could correctly hear the auditory stimuli. Upon concluding the experiment, participants were required to fill out a final questionnaire. In this questionnaire, they were asked to report their levels of concentration, any inconveniences they may have experienced during the experiment, whether they had noticed any relationship between the two tasks (contexts and word association task), and their attraction/interest towards each of the presented categories. Note that results for the individual category interest are not presented, as they were not the main focus of the present study.

Design: The experiment used a 1 × 2 within-subjects design and included 96 trials grouped into 48 blocks. In 32 of the blocks, the context was followed by two ambiguous words presented sequentially: one in the matched context condition and the other in the unmatched context condition (in pseudorandom order, so that the matched condition appeared first in half of the blocks and second in the other half). In the matched context condition, the semantic category of the context matched the semantic category of the word (for example, the ambiguous word ‘note’ was preceded by a music-related context). In the unmatched context condition, the category of the context did not match the semantic category of the word. To counterbalance these conditions, the categories were randomly divided into two lists: half of the categories (32 contexts and 32 ambiguous words) appeared in the matched context condition in one list, while the other half appeared in the matched context condition in the other list. Thus, for each participant, a given ambiguous word fell into only one of the two context conditions (matched or unmatched). In the 16 remaining blocks, the context was followed by two filler words. The same 32 filler words appeared in both lists and were always preceded by contexts that did not match their semantic category.
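The counterbalancing scheme and the block and trial counts given above can be checked with a short Python sketch. The category names, the seed, and the function name are placeholders; the source does not describe how the random split was actually implemented.

```python
import random


def build_lists(categories, seed=0):
    """Randomly split the categories into two halves.

    Each half serves as the 'matched' set in one counterbalancing list.
    The seed is arbitrary and only makes the illustration reproducible.
    """
    rng = random.Random(seed)
    shuffled = list(categories)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"list_A": shuffled[:half], "list_B": shuffled[half:]}


# Sixteen placeholder category names (the real ones include music, soccer, ...).
categories = [f"category_{i:02d}" for i in range(1, 17)]
lists = build_lists(categories)

CONTEXTS_PER_CATEGORY = 4   # stated in the Semantic contexts section
WORDS_PER_BLOCK = 2         # each context is followed by two words
FILLER_BLOCKS = 16          # stated in the Design paragraph

target_blocks = len(lists["list_A"]) * CONTEXTS_PER_CATEGORY
total_blocks = target_blocks + FILLER_BLOCKS
total_trials = total_blocks * WORDS_PER_BLOCK

print(f"target blocks: {target_blocks}")
print(f"total blocks:  {total_blocks}")
print(f"total trials:  {total_trials}")
```

Eight categories with four contexts each give the 32 target blocks per list; adding the 16 filler blocks and counting two words per block reproduces the 48 blocks and 96 trials reported.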

Online Validation Experiment

An online validation experiment (details and data available in the “Validation experiment” folder on the OSF) was performed to evaluate whether our protocol and stimuli could reliably evidence global context effects on a behavioral task with a large sample of participants (n = 78). The results of this experiment show that the global context—comprising a short text and an image with specific thematic content—preceding an ambiguous word influences meaning preferences in a word-association task. Specifically, the global context biased participants’ associations toward the context-related meaning of the ambiguous word (e.g., following a music-related context, the ambiguous word ‘note’ elicited more associations consistent with its musical meaning). Based on these results, we proceeded to investigate our research questions using pupillometry with the same stimuli and a similar in-lab procedure.

Data processing and statistical analysis

Data were analyzed in R (v4.4.0; R Core Team, 2024). All models included participants and words as random effects. The structure of random effects was determined by comparing models on Akaike’s Information Criterion and with a Likelihood-Ratio Test, using a forward best-path approach (Barr et al., 2013). The significance of each fixed effect was also determined with a Likelihood-Ratio Test comparing the model that includes the factor with a model that drops it. The unmatched condition was treated as the reference level and parameters were estimated for the matched condition.
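As an illustration of the Likelihood-Ratio Test logic, the following Python sketch compares two nested models that differ by a single parameter. The log-likelihood values are hypothetical and chosen only for illustration; the authors ran their comparisons in R, and this is not their code. For one degree of freedom, the upper tail of the chi-square distribution can be written with the complementary error function, which keeps the sketch within the standard library.

```python
import math


def chi2_sf_1df(stat):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(max(stat, 0.0) / 2.0))


def likelihood_ratio_test(loglik_reduced, loglik_full):
    """Return (statistic, p) for nested models differing by one parameter."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    return stat, chi2_sf_1df(stat)


# Hypothetical log-likelihoods of a model without and with the factor of interest.
stat, p = likelihood_ratio_test(loglik_reduced=-900.00, loglik_full=-885.26)
print(f"chi2(1) = {stat:.2f}, p = {p:.2g}")
```

With the same function, a statistic of 0.08 on one degree of freedom corresponds to p ≈ .78, so small statistics map onto clearly non-significant p values.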

Behavior

In the word-association task, responses for target words in each context condition were coded as 1 if they were consistent with the meaning related to the ambiguous word category and 0 if they were not. For example, the target word ‘note’ was classified under the music category, so the response ‘sing’ was coded as 1, while ‘paper’ was coded as 0. Notably, we did not further categorize inconsistent responses to distinguish whether they reflected an alternative meaning or were unrelated, as our main focus was on the modulation of context-matching associations. The proportion of consistent responses in each context condition will be referred to as consistency score.

Consistency data were analyzed with a generalized linear mixed model (GLMM), fitted with the glmer function from the lme4 package (v1.1.35.3; Bates et al., 2015) using a binomial family and the logit link, with context as a fixed effect (within-participants levels: unmatched and matched). A second model was fitted to examine whether the effect of context depended on the relative meaning frequency. This second model included context, relative meaning frequency (between-words quantitative factor), and their interaction as fixed effects. Full specifications of the random effects structures for behavioral models are provided in Appendix C.

Pupillometry

For pupil data, the pre-processing guidelines described in Mathôt & Vilotijević (2022) were followed. Data for the left eye are shown here because they better fit the polynomial shape used in the Growth Curve Analysis (explained below), but the same pattern of results was observed for both eyes. First, blinks were detected and interpolated using the ‘advanced’ blink-reconstruction mode as implemented in the DataMatrix Python library (v1.0.13; Mathôt, 2024). This algorithm first attempts a cubic-spline interpolation and, if that is not possible, falls back on a linear interpolation. Next, data were downsampled by a factor of 10, from 1000 Hz to 100 Hz. After downsampling, each trial time course was baseline-corrected by subtracting the mean pupil size in the 500 ms before the presentation of the word. Two trial exclusion criteria were applied for the pupil size analysis (note that these trials were not excluded from the word-association analysis). First, a trial was excluded if it contained 40% or more NaN entries from 500 ms prior to word presentation to 4000 ms after word onset (when the fixation cross disappeared). In total, 19 trials were excluded based on this criterion (1.35% of the trials). Second, we excluded trials with extreme baseline pupil sizes. For each participant separately, baseline pupil sizes were converted to z-scores, and trials where the z-scored baseline pupil size was larger than 2 or smaller than –2 were excluded. This second criterion excluded 63 more trials (4.47% of the total trials). The final number of trials was 1326, with 661 trials in the matched condition and 665 trials in the unmatched condition.
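The downsampling, baseline-correction, and exclusion steps described above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the authors' DataMatrix pipeline: blink reconstruction is omitted, downsampling is done by averaging non-overlapping bins, the z-scores use the sample standard deviation, and all function names and data are hypothetical.

```python
import math
from statistics import mean, stdev


def nanmean(values):
    """Mean of the non-NaN entries; NaN if none are valid."""
    valid = [v for v in values if not math.isnan(v)]
    return sum(valid) / len(valid) if valid else math.nan


def downsample(samples, factor=10):
    """Average non-overlapping bins of `factor` samples (1000 Hz -> 100 Hz)."""
    stop = len(samples) - factor + 1
    return [nanmean(samples[i:i + factor]) for i in range(0, stop, factor)]


def baseline_correct(trace, onset, n_baseline=50):
    """Subtract the mean of the `n_baseline` samples before word onset.

    At 100 Hz, 50 samples correspond to the 500 ms baseline window.
    """
    baseline = nanmean(trace[onset - n_baseline:onset])
    return baseline, [v - baseline for v in trace]


def too_many_missing(trace, limit=0.40):
    """First criterion: exclude a trial with 40% or more NaN entries."""
    return sum(math.isnan(v) for v in trace) / len(trace) >= limit


def keep_by_baseline(baselines, z_limit=2.0):
    """Second criterion, applied per participant: keep trials with |z| <= 2."""
    m, sd = mean(baselines), stdev(baselines)
    return [abs((b - m) / sd) <= z_limit for b in baselines]


# Synthetic trial at 1000 Hz: 500 ms of baseline, then 4000 ms after onset.
raw = [1000.0] * 500 + [1050.0] * 4000
trace = downsample(raw)
baseline, corrected = baseline_correct(trace, onset=50)

# Synthetic per-trial baselines for one participant; the last one is extreme.
baselines = [1000, 1010, 990, 1005, 995, 1002, 998, 1003, 997, 1200]
keep = keep_by_baseline(baselines)
print(len(trace), baseline, corrected[-1], keep)
```

Applying the z-score criterion separately for each participant, as the text specifies, means that a baseline counts as extreme only relative to that participant's own distribution.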

Pupil data were analyzed with the lme4 R package using growth curve analysis (GCA). Unlike traditional approaches that rely on single-value measures such as the mean or peak amplitude, GCA enables a temporal analysis of pupil dilation (Geller et al., 2016; Kuchinsky et al., 2013; McLaughlin et al., 2022, 2024; Winn et al., 2015). It is a mixed-effects modeling approach in which a functional form can be defined to describe non-linear effects of time. In pupillometry, higher-order polynomials are typically a good choice of functional form for the curve of a task-evoked pupil response (Mirman, 2014). We therefore used GCA because it allows time to be treated as a continuous variable instead of averaging over a time window, and because it allows variation across conditions, individuals, and stimuli to be modeled explicitly (Hershman et al., 2023; McLaughlin et al., 2023).

The time window selected for the GCA was from 500 ms to 3000 ms after word onset. This window was chosen to take into account the delay of the pupil response and to better fit a polynomial shape while preserving the pupil response curve. Importantly, the time window was decided before conducting the analysis of the effects of interest (Peelle & Van Engen, 2021). The GCA model included fourth-order orthogonal polynomials, context (within-participants levels: unmatched and matched), and their interactions as fixed effects. In this model, the fixed effect of context determines whether the conditions differ in overall magnitude (i.e., a vertical shift of the curve). The interactions between context and each polynomial term determine whether the shape of the pupil response differs by condition: the steepness of the slope for the linear component and the sharpness of the peaks for the quadratic, cubic, and quartic components.
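To make the notion of orthogonal polynomial time terms concrete, the sketch below builds linear to quartic terms over the 500 to 3000 ms window at 100 Hz using Gram-Schmidt orthogonalization. It is a minimal illustration, similar in spirit to what orthogonal-polynomial helpers in statistical software produce; the function names are hypothetical and the source does not say how the authors generated their terms.

```python
import math


def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def orthogonal_polynomials(times, degree=4):
    """Orthonormal polynomial time terms of order 1..degree.

    Powers of centred, rescaled time are orthogonalized against the constant
    term and against every lower-order term, then scaled to unit length.
    """
    n = len(times)
    centre = sum(times) / n
    half_range = (max(times) - min(times)) / 2.0
    x = [(t - centre) / half_range for t in times]

    basis = [[1.0] * n]  # constant term, used only for orthogonalization
    for power in range(1, degree + 1):
        column = [xi ** power for xi in x]
        for previous in basis:
            weight = dot(column, previous) / dot(previous, previous)
            column = [c - weight * p for c, p in zip(column, previous)]
        norm = math.sqrt(dot(column, column))
        basis.append([c / norm for c in column])
    return basis[1:]


# Analysis window from 500 ms to 3000 ms after word onset, sampled every 10 ms.
times_ms = list(range(500, 3001, 10))
polys = orthogonal_polynomials(times_ms, degree=4)
print(len(polys), len(polys[0]))
```

Because the terms are mutually uncorrelated, the estimate for one polynomial order does not change when higher orders are added, which is what allows the intercept, slope, and curvature effects of context to be interpreted separately.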

Results

In this first experiment, we asked if global contextual effects have an associated pupillary response, such that a smaller pupil size is expected when the context matches the word’s meaning, indicating reduced cognitive load. For this purpose, the influence of a global context on the subsequent word-meaning preference was evaluated using a word-association task while recording pupillary measures.

The behavioral results are depicted in Figure 2, showing the mean consistency score for each condition. The unmatched condition presented a consistency score of 0.39. As predicted, this score rose to 0.52 when a thematically congruent context was presented before the target word, that is, in the matched condition. Statistical analysis confirmed that the consistency score difference between the unmatched and the matched context conditions was significant [β = 0.65, SE = 0.12; χ²(1) = 29.48, p < .001] (see Appendix C for a summary of the behavioral statistical models and results).
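Because the model uses a logit link, the context coefficient is expressed in log-odds. The short Python sketch below converts the reported estimate into an odds ratio and, for comparison, computes the odds ratio implied by the two raw consistency scores. Only the three numbers taken from the text are from the source; the helper name is hypothetical.

```python
import math


def odds(p):
    """Odds corresponding to a probability p."""
    return p / (1.0 - p)


beta_context = 0.65        # reported log-odds estimate for the matched condition
model_odds_ratio = math.exp(beta_context)

raw_unmatched = 0.39       # reported mean consistency score, unmatched
raw_matched = 0.52         # reported mean consistency score, matched
raw_odds_ratio = odds(raw_matched) / odds(raw_unmatched)

print(f"odds ratio from the model estimate: {model_odds_ratio:.2f}")
print(f"odds ratio from the raw scores:     {raw_odds_ratio:.2f}")
```

The model estimate corresponds to an odds ratio of roughly 1.9, and the raw scores to one of about 1.7. The two need not coincide, since the mixed-model coefficient is conditional on the participant and word random effects, whereas the raw scores are averaged over them.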

**Figure 2. Experiment 1. Consistency scores of the unmatched and matched context conditions.** Each bar provides the mean (±*SEM*) proportion of associate responses consistent with the context-related meaning across each condition. Points represent the mean consistency for each cue. \*\*\*p < .0001.

In addition, we hypothesized that the effect of semantic context would depend on meaning frequency, such that more modulation would occur for less frequent meanings. However, we found no significant interaction between context and meaning frequency [β = –0.22, SE = 0.76; χ²(1) = 0.08, p = .78]. Thus, even for words with a skewed meaning frequency, the context was able to shift the associated responses, increasing their consistency with the context-related meaning.

As for the pupillometry analysis, the linear, quadratic, cubic, and quartic polynomials all significantly improved fit (all p < .001). As shown in Table 1, including the context condition term on the intercept, linear term, and cubic term improved model fit.

Table 1

Experiment 1. Log-likelihood model comparisons for growth curve analysis.

| Effect | χ² | df | p |
|---|---|---|---|
| Linear polynomial | 9673.6 | 1 | <.001*** |
| Quadratic polynomial | 19097 | 1 | <.001*** |
| Cubic polynomial | 397.32 | 1 | <.001*** |
| Quartic polynomial | 313.74 | 1 | <.001*** |
| Context (levels: unmatched, matched) | 65.27 | 1 | <.001*** |
| Context × Linear polynomial | 34.99 | 1 | <.001*** |
| Context × Quadratic polynomial | 1.10 | 1 | .29 |
| Context × Cubic polynomial | 30.44 | 1 | <.001*** |
| Context × Quartic polynomial | 0.86 | 1 | .35 |

Note. All models included random intercepts for participants and words. Random slopes for the linear, quadratic, and cubic terms were included by participants, and random slopes for the linear, quadratic, cubic, and quartic terms were included by words.

Examination of the dummy-coded effect of context condition (reference level: unmatched) indicated that, compared to the unmatched condition, the matched condition presented a smaller mean pupil size (intercept term: β = –1.55, SE = 0.19, p < .001), a more positive slope (linear term: β = 17.92, SE = 3.03, p < .001), and a sharper inflection (cubic term: β = 16.73, SE = 3.03, p < .001) (Figure 3).

**Experiment 1. Pupil diameter (EyeLink arbitrary units) over time for the unmatched (red) and matched (green) context conditions.** Points represent the raw data means with standard errors, and GCA model fit is overlaid with solid lines. Auditory word onset time is represented as a solid vertical line.

Thus, pupil diameter increases after the auditory presentation of an isolated ambiguous word, following a polynomial trajectory and reaching a peak at around 1800 ms. Notably, this increase is modulated by the semantic congruence between the preceding context and the target word’s meaning: the matched condition results in a smaller overall peak and a different shape of the pupil response curve compared to the unmatched condition. These results suggest that global context reduces the neurocognitive demands associated with processing the meaning of an ambiguous word.

Supplemental check measures

Three additional models were fitted to check whether the context effect depended on: (1) the accuracy of the responses during the global context presentation (a gamified format in which participants were required to select a missing word); (2) participants’ awareness of the relationship between the two tasks (as reported in the final questionnaire); (3) the position in which the target word appeared (first or second). The first model included context, missing word accuracy (within-participants levels: 0 or 1), and their interaction as fixed factors. The second model included context, awareness (between-participants levels: 0 or 1), and their interaction as fixed factors. The third model included context, target word position (within-words levels: first or second), and their interaction as fixed factors.

For the missing word, 59% of the responses were accurate (above the chance level of 25%). The analysis of this factor revealed no significant interaction between context and missing word accuracy [𝛽 = –0.05, SE = 0.27; χ²(1) = 0.04, p = .84]. Regarding task awareness, 74% of the participants reported that they had noticed the relationship between the two tasks. However, we again found no significant interaction between context and awareness [𝛽 = 0.35, SE = 0.33; χ²(1) = 1.07, p = .30]. Finally, the interaction between context and target word position was also not significant [𝛽 = –0.33, SE = 0.24; χ²(1) = 1.89, p = .17], indicating that the effect of context did not vary with the position in which the target word was presented.

These results indicate that responding incorrectly to the missing word, being unaware of the relationship between the tasks, or encountering the target word in the second position does not eliminate or diminish the contextual modulation of meaning selection.

Discussion

In Experiment 1, we demonstrated effects of global context (i.e., a short text and image with specific thematic content) on a word-association task. When the theme of the global context matched one of the meanings of the biased target word (e.g., reading a text related to music and then being presented with the target word ‘note’), we obtained 13% more responses consistent with the context-related meaning (Figure 2) than when they did not match. A similar result was obtained in the validation experiment, which was performed online and in the visual modality. These results suggest that word associations can reliably capture meaning shifts induced by a preceding global semantic context. Importantly, the consistency scores obtained in the unmatched condition are similar to those from SWOW-RP, a large-scale word-association corpus in Rioplatense Spanish (Cabana et al., 2024), which does not include any contextual information prior to the cue word. This argues against nonspecific effects of the global context on the word-association task and suggests that the unmatched condition serves as a reliable baseline for our experiments.

Our approach aligns with studies showing that the interpretation of an isolated ambiguous word in a word-association task can be biased by a single encounter with a sentence that includes the ambiguous word in a disambiguating context (Curtis et al., 2022; Rodd et al., 2013). These studies also observed a biasing effect when the sentences did not contain the ambiguous word itself but rather a semantically similar word. However, there is an important methodological difference with our experiments. Whereas the previously mentioned studies used sentences that favor a certain meaning and include the target word itself or a semantically similar word, our experiments use contexts that are only broadly thematically related to the target words. We believe that this better captures the effect of the general theme of the context, namely the global context, as opposed to the specific biasing of a particular meaning.

Importantly, we combined the word-association task with pupillometry. Pupillometric measures, unlike behavioral measures, are time-series data that can track changes in effort during a language processing event, such as a word presentation and the subsequent verbal response. Our results show that the matched condition presents an overall decreased peak of the pupil response curve compared to the unmatched condition (Figure 3). This suggests that global context mitigates the cognitive effort required to process ambiguous words. We consider this result particularly striking given the open-ended nature of the task and the absence of explicit instructions to engage in semantic processing.

Regarding the dynamics of the pupil response (Figure 3), we observe an early divergence between conditions starting around 500 ms and a later re-emergence of the effect around 1500 ms. This pattern likely reflects distinct components of word processing, such as early word recognition and later semantic activation. Therefore, the early divergence may indicate facilitated recognition when the global context supports the intended meaning, while the later effect may correspond to more efficient meaning access or semantic integration for context-matching interpretations. We will expand on this interpretation in the General discussion.

To address whether the effect of context differs according to the target word’s ambiguity, the next experiment will also include a set of non-ambiguous words. Moreover, we will use a semantic relatedness task to assess word processing. The rationale for this decision is that the word-association task imposes some practical limitations, as its relatively slow nature hampers the correlation between pupillometric and behavioral measurements. In addition, this task does not force participants to choose a specific word meaning, and it is unsuitable for addressing changes in meaning access when words are not ambiguous. We suggest that a task involving deeper semantic processing would elicit larger semantic context effects in the pupillary response (see Haro et al., 2023 for a comparison of different tasks regarding semantic engagement). Thus, in the following experiment, we use a semantic relatedness task that encourages participants to give a fast and correct response, increasing the sensitivity of the experimental measurement and reducing response bias.