Master's Thesis

“The Role of Trust in Predicting Attitudes Towards AI-Generated Digital Human Avatars”

Abstract

This thesis explores how trust influences consumer perceptions of AI-generated digital human avatars. Drawing on survey data and quantitative analysis, it examines the social and technological trust factors that shape user attitudes and investigates the role of generalized trust in predicting preference for, or aversion to, digital human avatars. Findings highlight the influence of both social trust, rooted in interpersonal relationships, and technological trust, linked to the perceived capabilities of digital technology and AI, and identify the specific trust dimensions that drive users to trust or distrust an avatar. The study offers practical implications for designers and policymakers seeking to build the user trust and acceptance that the successful integration of AI-driven avatar technologies requires.

Methods

This study focuses on perceptions of AI-generated avatars and on the role of trust in predicting whether users will trust or distrust such an avatar. The avatars tested here were created in HeyGen, a leading software suite for designing digital human avatars. Given current technical restrictions on real-time avatar generation and real-time user interaction, pre-recorded videos made the most sense for this study.

As a fundamentally comparative study, this design requires a reference point against which differences in the tested variables and outcomes can be evaluated. A "human-made" video would provide the natural reference, allowing case-by-case testing of how users respond to a human-made versus an AI-generated video. However, because this study concerns perceptions of the technology rather than its capabilities, it made sense to level the playing field and make both videos either human-made or AI-generated. Making both videos AI-generated best served the underlying purpose of the study as well as its potential applications.

Using two AI-generated videos not only accounts for potential limitations of the technology, such as the empirically evident but hard-to-quantify "uncanny valley"; it also refocuses the findings on the application of this technology in business and communications settings. Ultimately, interest in this research question stems from the potential real-world application of the technology, so it seems fair to put it to the test. Furthermore, concealing the true authorship of both videos simplified condition sorting: Group A was told Video A was AI-generated, while Group B was told the same of Video B. This random group allocation, sketched below, controls for any bias inherent to the videos themselves, whether specific speech patterns, facial expressions, or shirt color.
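A minimal sketch of this counterbalancing, assuming participants arrive in arbitrary order; the function and label names are hypothetical, not the study's actual instruments:

```python
import random

def assign_condition(participant_id: str) -> dict:
    """Randomly assign a participant to Group A or Group B.

    Group A is told Video A is AI-generated; Group B is told the same
    of Video B. In reality, both videos are AI-generated.
    """
    group = random.choice(["A", "B"])
    return {
        "participant": participant_id,
        "group": group,
        "labeled_ai": "Video A" if group == "A" else "Video B",
    }

if __name__ == "__main__":
    for pid in ["p001", "p002", "p003"]:
        print(assign_condition(pid))
```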

Picking a video topic proved more challenging than expected. It had to be a realistic, real-world application of the technology in a predominantly people-facing industry where matters of trust are routine. After researching the job titles most likely to be replaced by AI and the industries that regularly operate under high or low levels of trust, real estate was chosen as the backdrop for exploring the study's research questions.

Two HeyGen avatars were created using the likeness of the author. They were given different fictitious names and presented participants with information about two fictional listings from two fictional companies: "Harmony" and "Melody" real estate. In each video, the realtor regretfully informs the participant that an apartment they were interested in has sold, but that a new, similar listing is available in the same neighborhood. The participant is then invited to an open-house viewing and urged to place a down payment on the property, as it is likely to sell quickly.

The videos had to be very similar, though not identical, so participants would not guess their true AI authorship. This was achieved by training the avatars on two videos in which the original human model wore the same jacket and tie but different neutral-colored shirts. These videos were recorded against a white wall under similar lighting conditions, and the model repeated the same script in both to keep the trained mannerisms and facial expressions consistent. The script fed to the AI avatars was likewise nearly identical, with only details such as the property name, real-estate company name, and street address changed to match the respective video. Finally, the stock photos representing the "available apartments" were style-matched, both depicting modern, empty, white-walled apartments, to account for any preferential bias.

An additional tool, ElevenLabs, was used to generate a cloned voice shared by both avatars, again ensuring consistency between the videos. The ElevenLabs API is integrated into HeyGen, making its application seamless. The cloned voice was trained on four minutes of speech from the author and adjusted to sound more natural in its pauses and tempo. Finally, after the videos were rendered, minor post-effects, including subtle room noise, equalization, and a slight echo, were added to the audio to lend it a sense of authenticity. While a minor detail, making the audio sound as though it comes from the room and the speaker on screen is essential to the illusion of authenticity and "human" authorship.
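The audio treatment described above could be approximated with a short script. The sketch below uses the pydub library; the filenames, gain values, cutoff frequency, and delay time are illustrative assumptions, not the exact settings used in production:

```python
from pydub import AudioSegment
from pydub.generators import WhiteNoise

# Illustrative post-processing sketch (settings are assumptions).
voice = AudioSegment.from_file("avatar_render.wav")

# Low-level white noise stands in for subtle room tone.
room_tone = WhiteNoise().to_audio_segment(duration=len(voice)).apply_gain(-45)

# A gentle low-pass filter approximates simple equalization.
treated = voice.low_pass_filter(9000)

# A quiet copy delayed by ~120 ms acts as a crude echo / room reflection.
echo = treated - 18
treated = treated.overlay(echo, position=120).overlay(room_tone)

treated.export("avatar_render_treated.wav", format="wav")
```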

Participants, regardless of their assigned group, first completed a demographic survey. This information on age, gender, education, and occupation is useful for segmenting the data and identifying the strongest explanatory variables behind video preference and the decision to trust or distrust the video labeled as AI-generated. Participants were also asked about their use of technology and social media and any previous experience with generative AI tools, providing further data points for analyzing the results of the experiment (see the segmentation sketch below).
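A minimal example of how such segmentation could look in practice, assuming responses are exported to a flat file; the column names here are hypothetical placeholders, not the study's actual variable names:

```python
import pandas as pd

# Hypothetical segmentation sketch with assumed column names.
df = pd.read_csv("responses.csv")

# Mean preference score and share trusting the "AI-labeled" video,
# broken out by demographic segment.
summary = (
    df.groupby(["age_bracket", "prior_genai_use"])
      .agg(mean_preference=("preference_score", "mean"),
           trust_rate=("trusted_ai_labeled", "mean"),
           n=("participant_id", "count"))
)
print(summary)
```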

Next, participants completed a self-reported panel of trust questions adapted from Yamagishi, T., & Yamagishi, M. (1994), "Trust and commitment in the United States and Japan," Motivation and Emotion, 18, 129-166. This six-question panel was chosen for its simple scoring, its long track record in similar studies, and its short completion time, so as not to exhaust participants. The subsequent questions applied the same underlying ideas specifically to technology, gauging participants' technological trust and providing another predictor for analyzing the outcomes of this experiment.
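The simplicity of scoring can be made concrete with a short sketch: each participant's six Likert responses are averaged into a single general-trust index. The 1-5 response scale and the absence of reverse-coded items are assumptions made for illustration, since the item wording is not reproduced here:

```python
def general_trust_score(responses: list[int], scale_max: int = 5) -> float:
    """Average six Likert items into one general-trust index.

    Assumes six responses on a 1..scale_max scale and no
    reverse-coded items (an illustrative simplification).
    """
    assert len(responses) == 6, "the panel has six items"
    assert all(1 <= r <= scale_max for r in responses)
    return sum(responses) / len(responses)

# Example: a participant answering mostly "agree" on a 5-point scale.
print(general_trust_score([4, 5, 4, 4, 3, 4]))  # 4.0
```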

Participants then watched both videos, confirmed that they had done so, and evaluated the trustworthiness of each video, its information, and its speaker. At this point, they were also asked whether they could tell that the video labeled AI-generated was in fact AI, and if so, how. Finally, participants rated their preference between the videos on a 1-10 scale and, if they had a preference, could explain why. The last page then revealed that both videos were AI-generated, and participants were given the option to withdraw from the study or enter their email for a chance to win a gift card.
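One way the trust measures could feed into prediction of these outcomes is a simple logistic regression of the binary trust decision on the two trust scores, sketched below with scikit-learn; the file and column names are hypothetical, and this stands in for, rather than reproduces, the study's actual analysis:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical analysis sketch: predict whether a participant trusted
# the video labeled as AI-generated from their general and
# technological trust scores. Column names are assumptions.
df = pd.read_csv("responses.csv")
X = df[["general_trust", "tech_trust"]]
y = df["trusted_ai_labeled"]  # 1 = trusted, 0 = distrusted

model = LogisticRegression().fit(X, y)
for name, coef in zip(X.columns, model.coef_[0]):
    print(f"{name}: {coef:+.3f}")
```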