Research
The invisible and implicit nature of bias calls for training in new methods.
This project focuses on the kind of bias that creates unequal impacts on human beings: the kind that creates systematic advantages and disadvantages for individuals or groups of people. It is important to note that bias is not inherent in a data set itself. Bias depends on context and direction, and can emerge in how people use data processing/decision-making tools.
Trainees will be encouraged to join existing research efforts with core and participating faculty, or they may choose to develop their own project collaborating with a partner from a different field.
Ongoing Research Topics
(See References Cited at the end of this list)
Co-PI Wei Zhu, statistician and data scientist, has spent 25+ years developing new statistical procedures and machine learning methods to help reduce bias in data and analysis. More recently, she has developed realistic and robust methods for regression analysis with errors in regressors [14], random forests for classifying unbalanced data [27] [18], optimal clustering algorithms [35] and deep reinforcement learning with multiple objectives [10]. Zhu is also an expert in stochastic carcinogenesis modeling [33] [34] [36]. She has discovered that opposite models and conclusions can be drawn based on the same data sets, and furthermore, papers from the same research group can feature contradictory results without the authors realizing such internal conflict.
Trainees will explore how the analyses applied to data sets can be misused, and how such analyses can be audited and corrected. As a tool, AI has enormous transformative power, as demonstrated by AlphaFold [25], which solved a 50-year-old grand challenge in biology concerning protein folding. Foundational training in how to avoid, detect, and correct bias is essential for wielding such powerful tools responsibly.
Co-PI Bonita London’s research examines processes associated with social identity threat and the consequences of this threat for academic engagement, performance, and well-being, particularly among members of historically marginalized groups. Her research shows that social identity threats contribute to gender and racial disparities in education and career advancement and success, career decision-making, professional networking and relationships, and mental health and well-being [1]. In studies ranging from experimental lab studies that test specific coping strategies in response to social identity threat, to experience-sampling methods (ESM) studies exploring the day-to-day lived experiences of students navigating institutions where biases are experienced, London has identified core factors that promote versus undermine engagement and success among traditionally underrepresented students [5], including those with intersectional identities (e.g., for a person of color who is also a woman or a gender minority).
Trainees, including those who already identify as data scientists, will learn about social identity threat and how it affects who is welcomed and who thrives (whether in academia or in the STEM workforce), and will incorporate relevant variables into research designs about bias.
Evaluations affect individuals’ access to college, graduate school, scholarships, employment, and other resources. They are by definition subjective: the goal is to express an assessment, on which well-intentioned people can differ. As with all subjective human assessments, bias can affect decisions. Bias in evaluations can involve the usual demographic categories of age, gender, ethnicity, and national origin (e.g., [19]), but may also involve the perceived value of the work, the perceived communication ability of the person, their perceived ability to get along with others, and so on. The latter criteria are troublesome to address, as they can in fact represent valid bases for a negative evaluation, depending on the context. On the other hand, if such assessments are correlated with demographics, that should be addressed.
Trainees, with participating faculty (Brennan, Rambow, others), will examine bias in existing corpora (e.g., letters of recommendation; reviews of conference papers) that have been annotated by trained annotators for “doubt-raisers” [16] and other potential issues, or flagged in their natural life cycle. Machine learning methods will be applied, taking account of the evaluation's language and, critically, its context (see [4]). The study will yield qualitative and quantitative findings, as well as tools to help detect potential bias within specific contexts.
The idea that simpler explanations of observations should be preferred to more complex ones is a pillar of scientific thought. It is also a hypothesis about human perception and cognition: there is a cognitive bias towards simple explanations [9]. What counts as simple and what counts as complex, however, is the subject of much work in mathematics [15], philosophy [26], and information theory [5]. Research in computational and mathematical linguistics by Co-PI/Linguist Jeffrey Heinz and colleagues explores these ideas in the domain of language and language acquisition.
The extraordinary ability of children to learn a first language has been famously theorized to be due to innate parameters; however, the extent to which the grammatical generalizations made during language learning are simple, and what simple means in this context, provide a strong rationale for both “big data”- and “small data”-centric analyses of language-based data using textual corpora. Simplicity in the context of learning systems is relevant to neural systems as well. Neural systems can be described as having optimal bias under a cost function; Associate Professor Memming Park studies the theoretical properties of neural systems [23].
“Science is a highly stratified social system” ([20], p. 1). Persistent and pernicious biases emerge in scientific institutions and from traditional practices that affect interaction at seminars [6], who gets published, who gets cited [7] [24], and who gets funded [12]. This bias can arise quite unintentionally within systems that pride themselves on being “objective”, advantaging certain authors, groups, and laboratories. Such advantages, deserved or not, may amplify exposure to certain work through conference presentations, prestigious journals, social media attention, and accelerating citations. There is obviously a social aspect to this cycle of reinforcing bias within the scientific community [7]. Although science is often assumed to be self-correcting, biased practices can lead to forgotten studies and ignored authors.
Solutions have been proposed at the institutional level to level the playing field, including double-blind reviewing, randomized exposure at conferences, increasing awareness of biases that arise when researchers build their own knowledge networks with attached names, and avoiding “manels” (panels with only male participants; [8]). Blind review and other blinded practices are not always the answer [28]; investigation is needed to determine which interventions are effective in which contexts.
Gender-disaggregated data are often used by policy makers to address institutional barriers to women’s equal participation in the labor marketplace. However, such data often conceal important differences. In Africa (and elsewhere), such barriers can be due to political, economic, social, cultural and religious institutional factors [1].
Adryan Wallace (Africana Studies, Women's Studies, Political Science) conducts fieldwork across Africa that blends qualitative data (interviews, surveys, and ethnographic observations) with quantitative data collected from governmental and policy organizations and labor treaties, in order to address differences in women’s and men’s labor force participation, along with other disparities, including in human rights [29] [30] [31]. An intersectional analysis (rather than disaggregating data simply by gender) reveals a dynamic picture of how underlying political and social factors impact women’s economic experiences, factors that are invisible within the current conceptual framework for gender labor statistics. NRT trainees who are interested in institutional bias and/or in working with multiple data types may collaborate on analyses.
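The point that disaggregating by gender alone can conceal important differences can be illustrated with a minimal sketch. This is not the method used in the fieldwork described above. It is a toy calculation on invented records, and the second grouping axis (an urban/rural "region" label), the counts, and the function name `participation_rate` are all assumptions made purely for illustration.

```python
from collections import defaultdict

# Hypothetical toy records: (gender, region, in_labor_force).
# These counts are invented for illustration and are NOT real data.
records = (
    [("woman", "urban", True)] * 45 + [("woman", "urban", False)] * 5
    + [("woman", "rural", True)] * 15 + [("woman", "rural", False)] * 35
    + [("man", "urban", True)] * 40 + [("man", "urban", False)] * 10
    + [("man", "rural", True)] * 35 + [("man", "rural", False)] * 15
)


def participation_rate(rows, key):
    """Return {group: share of rows in the labor force}, grouping by key(row)."""
    counts = defaultdict(lambda: [0, 0])  # group -> [participating, total]
    for row in rows:
        group = key(row)
        counts[group][0] += row[2]  # True counts as 1, False as 0
        counts[group][1] += 1
    return {group: part / total for group, (part, total) in counts.items()}


# Gender-only disaggregation: one rate per gender.
by_gender = participation_rate(records, key=lambda r: r[0])

# Intersectional disaggregation: one rate per (gender, region) pair.
by_gender_region = participation_rate(records, key=lambda r: (r[0], r[1]))

for group, rate in sorted(by_gender.items()):
    print(group, f"{rate:.0%}")
for group, rate in sorted(by_gender_region.items()):
    print(group, f"{rate:.0%}")
```

In this invented data, the gender-only view shows women at 60% and men at 75% participation. Splitting by the second axis shows 90% for urban women and 30% for rural women, a difference the single gender-level figure hides entirely.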