VOICE
ASSISTANT
Experimental Research on Student Satisfaction with the Intelligent Voice Assistants Siri and Google Assistant

OVERVIEW
Voice assistants (VAs) have iterated through many improvements, delivering increasingly valuable and accurate performance for consumers. They have developed into incredibly intelligent and useful tools for many everyday users, handling general search, scheduling, calculations, checking messages and emails, and more.

However...
Are users satisfied with current
voice assistants?
Our mission is to conduct experimental research to understand user satisfaction with intelligent voice assistants and to compare satisfaction between the two most popular voice assistants on smartphones, Siri and Google Assistant.

MY ROLE
UX Researcher

METHODS
Literature Review
Experimental Design
Questionnaire
Qualitative Interview

DURATION
Two Months
Oct - Dec, 2017

• Project Timeline •
RESEARCH DESIGN
Design Process of the Experimental Research from Preparation to Data Collection
Siri and Google Assistant are the native VAs of iOS and Android respectively, so users of each platform may have different experiences with voice assistants, which may result in different levels of satisfaction. In our experimental research, we therefore used a between-subjects design to explore whether users of Google Assistant report higher overall satisfaction than users of Siri.
• Preparation •

DEVICES
Public iPhone with Siri
Public Android Phone with Google Assistant

PARTICIPANTS
12 participants in total
Including 3 RIT staff members who handle student affairs and 9 RIT students

CONTENTS
Semi-structured interview
10 prepared questions for students
4 prepared questions for staff

RECORD
iOS Screen Recorder: QuickTime Player
Android Screen Recorder: AZ Screen Recorder
Google Form: Answers to the Questionnaire
Paper: Results of Observation and Follow-up Questions

DURATION
20 minutes per participant
• Experiment Process •

• Tasks •

• Questionnaire & Follow-up Questions •
The questionnaire included three parts: the consent form, basic demographics, and Likert questions for each task. The Likert scales were separated into four aspects, as follows. For the structured dialogue, the questions were prescribed for each sub-task.

Before each task, there was a detailed description of the task and its requirements. After each section, there was an informal interview with several follow-up questions that depended on the participant's performance on the two tasks.
What problems did you come across when you were doing these tasks?
FINDINGS
Data Analysis with ANOVA Tests
We collected all the answers to the Likert questions, which we then used to run ANOVA tests.
We also broke down each question in our task groups by user goal (Completion, Effort, Recognition, and User Satisfaction) to determine whether any aspect of a tested voice assistant was viewed in a significantly different way from its counterpart.
According to the analysis, all p-values are above 0.05, meaning there is no statistically significant difference between Google Assistant and Siri in user satisfaction, task completion, speech recognition, or effort taken, nor in any of the sub-goals. We cannot reject our null hypothesis: there is no significant difference in user satisfaction between Google Assistant and Siri among student users.

DISCUSSION
Discuss the Possible Reasons for the Results and Get Insights from the Results.
Why did our results not match our initial expectations?
We think THE MOST IMPORTANT reason is that our experimental design tested Siri only with iOS users and Google Assistant only with Android users. Participants experienced only the system they were already most comfortable with. Without providing our participants with a comparison between the two VAs, it is highly likely that they were bound to feel satisfied with the VA they tested.
There are also some other possible reasons, as follows:
- At present, Siri and Google Assistant are established intelligent systems capable of handling users' basic needs and expectations. This translated into consistently high scores across the majority of our experimental tests.
- Given the age and technical environment our participants came from, our test subjects were nearly all advanced VA users with several years of experience interacting with either Siri or Google Assistant. Our basic tasks, like setting a reminder and performing a general search, required little effort on their part and generated high marks in completion, satisfaction, and recognition.
- Our use of the Likert scale, which collected subjective responses from the participants, may have generated a positive response bias. Although we took several steps to reduce biased responses, such as using neutral wording, posing questions instead of statements, providing response labels, and administering the questionnaires on a computer, we cannot rule out the potential for bias. For instance, the participants we recruited from the class were prone to a 'can do' attitude toward the tasks because they were volunteering for the research and possibly had more patience and confidence to accomplish the tasks.
Although we could not demonstrate a statistically significant difference in user satisfaction between Siri and Google Assistant based on the Likert-scale survey results above, we did find some differences and problems through our observations and informal interviews during the experiment.
Task 6 - Structured Dialogue: After searching the flight to NYC, ask “What is the weather like there?”

This may suggest that Google Assistant is better at interpreting contextual dialogue than Siri, though future work is needed to explore this hypothesis.
What insights did we get from the results?
Speech Recognition
Task 3 - Web Search: Explain ‘experimental research’

Task 5 - Structured Dialogue: Ask for movie times for Thor and navigate to the theater

Google Assistant may do better than Siri at responding with systematically formatted answers, asking follow-up questions, and giving one-step answers that solve problems directly. However, further work is needed to establish whether users actually prefer Google Assistant's response style and, if such a preference exists, for which tasks.
Response Dialogue
CONCLUSION & FINAL DOCS
From our experimental research, although we did not find a statistically significant difference in satisfaction between Google Assistant and Siri, we summarized four possible reasons for the results: 1) the established technology of the two voice assistants; 2) our participants' familiarity with the voice assistant they tested; 3) the lack of comparison between the two voice assistants; 4) positive response bias in the Likert scale.
Based on our observations and informal interviews, we also proposed plausible differences and problems between Siri and Google Assistant in two aspects:
1) Speech recognition.
2) Response dialogue.
If you want more details about the project, see Final Paper.