Investigating the Impacts of Voice-based Student-Chatbot Interactions in the Classroom on EFL Learners' Oral Fluency and Foreign Language Speaking Anxiety

Document Type : Original Article

Author

TEFL, Razi University, Kermanshah, Iran

Abstract

This study reports the findings of a quasi-experimental investigation into the impact of in-class, chatbot-supported speaking activities on EFL learners’ speaking fluency and Foreign Language Speaking Anxiety (FLSA). Sixty Iranian upper-intermediate EFL learners were divided into experimental and control groups that practiced speaking with and without a voice-based chatterbot throughout an 8-week general English course. Data were collected through participants’ pre- and post-test speaking fluency scores (measured via speech rate and the number of pauses in their speech) and the FLSA scale. Gain-score comparisons (an independent-samples t-test and a Mann-Whitney U test) were used to investigate whether EFL learners’ participation in chatterbot-supported in-class speaking activities using Replika made any significant difference in their oral fluency compared with a conventional class, and a one-way ANCOVA, with the pre-test scores as the covariate, was run to investigate FLSA. Results revealed that the experimental group significantly outperformed the control group in oral fluency; this group was also less anxious when speaking in the target language at the end of the experiment.

Keywords


Introduction

There is a growing interest in using chatbots as learning assistants in language instruction (Huang, Hew, & Fryer, 2021) because of their ability to converse with learners in natural language. Chatbot-supported language learning is defined as the use of a chatbot to communicate with learners in natural language for everyday language practice, such as conversation practice (Fryer et al., 2017), responding to language-learning questions and storybook reading (Xu et al., 2021), and undertaking assessment and providing feedback, for example on a vocabulary test (Jia et al., 2012). Research on chatbots has indicated that the more interactive and authentic language environment created by chatbot-supported activities can enhance students’ language learning outcomes (Lu et al., 2006; Wang et al., 2017). Educational chatbots offer language learners three benefits. First, students can practice their language skills whenever they want with chatbots, which is difficult to arrange with a human companion (Haristiani, 2019; Winkler & Soellner, 2018). Second, chatbots can supply students with thorough language knowledge that human language partners cannot provide (Fryer et al., 2019). Third, chatbots can serve as tireless assistants, relieving humans of repetitive tasks such as responding to frequent inquiries and maintaining language practice (Fryer et al., 2019; Kim et al., 2019). As learning partners, chatbots are willing to communicate indefinitely with students, allowing them to keep practicing the new language.

Voice-Based Chatterbots

Various speech recognition and analysis tools have been incorporated into chatbot-supported language learning with the advancement of speech technology (Kim, 2017). Automatic Speech Recognition (ASR) can interpret the meaning of speakers’ utterances (Kim, 2017); more broadly, it can be used to analyze learner-generated speech and to create dialogues between learners and computers. In a conversational environment, ASR improves speech interactions with voice-based chatterbots (Chiu, Liou, & Yeh, 2007). Through voice recognition technology, voice-based chatterbots can provide suitable responses and communicate with learners (House et al., 2009); moreover, one of their primary roles in such interactions is to actively keep the conversation with the human user going (Chang, Lee, Chao, Wang, & Chen, 2010). This synchronous communication offered by artificial intelligence (AI) chatterbots can provide language learners with genuine speaking practice.

Oral Fluency and Speaking Anxiety in an Iranian Context

Some studies have already examined the impact of voice-based chatterbots on speaking proficiency (El Shazly, 2021; Coniam, 2008; Goda et al., 2014; Tian & Wang, 2010; Tai & Chen, 2022). However, oral fluency, as an underlying component of speaking skill in the EFL context, requires additional research because expressing one's thoughts as naturally as possible is essential for effective communication (Alrayah, 2018). In a broad sense, speaking fluency refers to the ability of language learners to generate speech that is both swift and understandable (Brand & Götz, 2011). Specifically, it is measured by components such as speech rate, number of errors, and use of formulaic language (Housen & Kuiken, 2009). Similarly, in the context of EFL learning, speaking is widely recognized as a highly anxiety-inducing language task, which is attributed to several factors, including limited exposure to authentic language use, apprehension about committing errors, and the potential for receiving unfavorable feedback from peers (Tai & Chen, 2022). Furthermore, the added stress of speaking in front of classmates and participating in group discussions exacerbates the anxiety EFL learners experience when speaking in the target language (Karatas et al., 2016). MacIntyre (1999, p. 27) defines language anxiety as "the worry and negative emotional response induced when learning or using a second language," which is exacerbated in English as a Foreign Language (EFL) instruction, where English is not spoken in everyday social life (Chen & Hwang, 2020). Thus, Foreign Language Speaking Anxiety (FLSA) impairs students' ability to speak fluently, limits interaction and communication in language-learning settings, and damages the entire learning process (Hanafiah et al., 2022).

This discussion highlights the importance of incorporating effective tools into speaking courses in the EFL context to alleviate learners’ speaking anxiety and improve their oral fluency. Therefore, this study addresses the challenges associated with FLSA among Iranian EFL learners and investigates the effectiveness of in-class, voice-based chatterbot activities delivered through a mobile application in reducing this anxiety and improving their oral fluency.

Literature Review

Chatterbot-Human Interactions

Chatbots are software applications designed to interact with humans (Fryer et al., 2019). They emerged in the 1960s with the invention of ELIZA by Weizenbaum at MIT. ELIZA was the prototype (Weizenbaum, 1966) and could interact with individuals using limited language tools, such as a few question-formatted structures, so the interaction was inevitably restricted and resulted in limited discussion. Since their inception, chatbots have grown significantly more sophisticated. Over time, other chatbots such as PARRY, Jabberwacky, and ALICE were enhanced to generate responses based on contextual patterns and to convey a restricted range of emotions. Developing AI systems and chatbots that interact with humans in natural conversational language has been a significant research focus because it offers language learners an English-"speaking" partner with whom to practice the target language whenever they want (Coniam, 2008). It is now possible to encourage user participation and interaction via voice-based conversation thanks to the development of comprehensive speech software (Tian & Wang, 2010). Professionals in the field of language learning have predicted that chatbots will create fresh possibilities for the teaching and learning of languages by providing interactional opportunities and fostering an anxiety-free environment (Jeon, 2022).

In this vein, in a study by Goda et al. (2014), experimental-group participants conversed with the chatbot ELIZA for 10 minutes before engaging in a group discussion with their peers, whereas control-group participants searched the internet for pertinent information. After ten minutes of preparation, students in both groups joined the discussion under the same conditions. The results indicated that the experimental group engaged in more conversational actions than the control group. It should be noted, however, that conversation frequency is not synonymous with communication quality, which the authors did not discuss.

Likewise, based on an English-speaking test, Tai and Chen (2022) found that interacting with intelligent personal assistants such as Google Assistant through Google Home Hub significantly enhanced the speaking proficiency of adolescent EFL learners. They found that interacting with Google Assistant added variety and delight to EFL speaking, increased exposure to English and learner-centered speaking practice with immediate feedback, offered greater authenticity and flexibility in interactions, and promoted peer collaboration.

 

While the preceding accounts focus on the effectiveness of human-chatbot interaction on the speaking skills of EFL learners, El Shazly (2021) was chiefly concerned with the influence voice-based chatterbots may have on EFL learners' FLSA. Using an oral interactive chatterbot named Mondly for role-play speaking activities with EFL students, the researcher found that although the chatterbot-based speaking activities increased the learners' speaking proficiency, their speech-related anxieties did not decrease after interacting with the chatbot. In that study, speaking proficiency was evaluated with an interaction-enhanced public version of the IELTS speaking rubric, and students' anxiety levels were measured with questionnaires.

Similarly, Çakmak (2022) conducted a study in which EFL students practiced speaking outside class using the chatterbot Replika. The researcher indicated that although chatbot interaction is a novel way to provide speaking practice for EFL learners, it may not be a reliable way to reduce their anxiety when speaking in the target language, as measured by a related questionnaire, because the learners had difficulty comprehending Replika's words during interactions. However, the researcher could not monitor the EFL students' interaction with Replika outside the classroom; the results would have been considerably more reliable if these interactions had been recorded on the learners' mobile devices.

In light of the findings and limitations of the studies mentioned above, it becomes clear that most significant proposals for making learners less anxious when speaking in the target language in the chatbot-supported language learning field have emphasized using chatterbots outside the class as part of individual learning, and these have mostly failed to decrease FLSA even while improving the EFL learners' speaking skill. Designing lesson plans that bring chatterbots into the classroom under the teacher's supervision therefore remains a conspicuous rarity in the AI-in-education literature, even though chatbots strongly encourage students to engage in conversation, something that rarely occurs in general EFL classes (Yang et al., 2022). By bridging this gap, EFL learners stand a chance of receiving feedback on communicating effectively with their chatterbots, as opposed to individual learning with AI.

Moreover, despite a relatively high proportion of studies on the impact of voice-based chatterbots on speaking skills, most of them share a major shortcoming: they treat speaking as a holistic skill and neglect underlying components such as fluency. In addition, few of them attend to FLSA measures while addressing speaking-skill development. The previous literature therefore lacks research on how to use voice-based chatterbots to support a pedagogical approach that encourages EFL students to practice speaking with AI and overcome their FLSA. The following questions guided this study to bridge this gap in the literature:

  1. Do student-chatterbot voice-based speaking activities significantly affect Iranian EFL learners' oral fluency?
  2. Do student-chatterbot voice-based speaking activities significantly affect Iranian EFL learners' FLSA level?

Method

Design

This study adopted a quasi-experimental design to investigate whether the treatment affects EFL learners' speaking fluency and their FLSA. Such a design involves the manipulation of a variable and the comparison of a control group with an experimental group that receives the treatment (i.e., the manipulated variable), but, unlike a true experiment, it does not rely on fully random selection of participants.

 

Participants

Sixty students, selected through purposive sampling in line with the researcher's intentions (Creswell, 2014), participated in this study. Participants were required to be at the B2 level of the Common European Framework of Reference for Languages (CEFR). The study was based on a speaking course emphasizing in-class discussions on different topics. The participants were randomly divided into two equal groups, control and experimental, each consisting of thirty EFL students (n = 30); the overall sample comprised 21 male and 39 female native Persian speakers aged 20 to 30. In addition, the researcher ensured that each participant in the experimental group understood how to interact with the chatterbot used in this study, Replika. The researcher, who had three years of experience teaching adult EFL students and extensive familiarity with the Replika application, was the instructor of both the traditional and the Robot-Assisted classes.

Ethical considerations were addressed at the outset: participants were given an overview of the procedure prior to data collection, and their written consent was obtained. In addition, they were assured that all data would be kept confidential and used exclusively for this research.

Instrumentation

Two different types of instruments were used to collect data.

  1. Audio-recorded oral fluency evaluation

Participants’ speech was elicited via a personal narrative task designed to encourage students to focus on meaning, as they were required to express their communicative intentions under time pressure. This approach was adapted from Hanzawa (2021). The participants were asked to talk about different aspects of their lives and were instructed to begin their narrative with the prompt sentences on the instruction sheet: “What’s your name?” and “Where are you from?”. The instruction sheet also included guiding questions, such as “What’s your talent?”, “Have you ever missed any chances?”, and “What are your future plans?”, provided so that participants would speak at a certain length without excessive hesitation or dysfluency.

While taking the tests, participants’ voices were recorded, and the audio recordings were then analyzed for speech fluency with the PRAAT software. PRAAT is a free computer software package for speech analysis in phonetics, designed and still developed by Paul Boersma and David Weenink at the University of Amsterdam. The software analyzes the recorded files on fluency measures such as speech rate (SR), pause-time ratio, i.e., the percentage of time spent articulating as opposed to pausing (PTR), and the mean length of fluent runs between pauses (MLR). The lower cutoff point for pauses was set at 0.3 s: anything shorter can be confused with the closure of a plosive sound, whereas a longer cutoff can omit significant pause phenomena. More specifically, the software evaluates speech fluency as syllables uttered per second or minute; the more fluent a speaker is, the higher the number of syllables spoken per second or minute. It outputs parameters such as the number of pauses, the number of syllables, and the duration of the spoken material. In this way, participants’ fluency development was evaluated on the speaking pre- and post-tests.
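To make these measures concrete, the following minimal Python sketch (not the authors' actual script) computes SR, PTR, the pause count, and MLR from values that Praat can export for a single recording; the function name, inputs, and example numbers are illustrative assumptions only.

# Hypothetical sketch: fluency measures from Praat-style outputs for one recording.
# Assumed inputs: total syllable count, total duration (s), and the detected
# silence intervals (s). None of these names come from the authors' materials.

PAUSE_THRESHOLD = 0.3  # lower cutoff for counting a silence as a pause (seconds)

def fluency_measures(n_syllables: int,
                     total_duration: float,
                     silences: list[float]) -> dict:
    """Return speech rate (SR), pause-time ratio (PTR), pause count, and MLR."""
    pauses = [s for s in silences if s >= PAUSE_THRESHOLD]
    pause_time = sum(pauses)
    phonation_time = total_duration - pause_time

    speech_rate = n_syllables / total_duration           # syllables per second
    pause_time_ratio = phonation_time / total_duration   # share of time spent articulating
    # n pauses split the speech into n + 1 fluent runs
    mean_length_of_run = n_syllables / (len(pauses) + 1)

    return {"SR": speech_rate,
            "PTR": pause_time_ratio,
            "n_pauses": len(pauses),
            "MLR": mean_length_of_run}

# Example with made-up numbers: 420 syllables in a 150 s narrative.
print(fluency_measures(420, 150.0, [0.2, 0.5, 1.1, 0.35, 0.25]))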

  2. Foreign Language Speaking Anxiety scale

The FLSA questionnaire adapted from Horwitz et al. (1986), presented in Appendix I, was used to evaluate participants' FLSA. It assesses three components of foreign language speaking anxiety: communication anxiety, test anxiety, and fear of negative evaluation. The communication anxiety measured by this questionnaire refers to the anxiety-inducing nature of classroom communication. The study used a 5-point Likert scale (1 = strongly disagree, 2 = disagree, 3 = neutral, 4 = agree, 5 = strongly agree), and the score was obtained by summing the responses to all 16 items. The questionnaire's reliability was 0.85, which is within an acceptable range. Participants were provided with translated versions of the questionnaire so that they could fully comprehend each item.
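As an illustration of how such a 16-item Likert instrument can be scored and its internal consistency estimated, here is a hedged Python sketch with simulated responses; the column names and data are assumptions, and the simulated alpha will not reproduce the reported 0.85.

# Hypothetical sketch: summing the 16 Likert items (1-5) into an FLSA total score
# and estimating Cronbach's alpha with the standard formula.
import numpy as np
import pandas as pd

def flsa_total(responses: pd.DataFrame) -> pd.Series:
    """Each learner's FLSA score = sum of the 16 item responses."""
    return responses.sum(axis=1)

def cronbach_alpha(responses: pd.DataFrame) -> float:
    """alpha = k/(k-1) * (1 - sum of item variances / variance of the total score)."""
    k = responses.shape[1]
    item_vars = responses.var(axis=0, ddof=1)
    total_var = responses.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Simulated answers for 60 learners and 16 items (illustrative only).
rng = np.random.default_rng(0)
items = pd.DataFrame(rng.integers(1, 6, size=(60, 16)),
                     columns=[f"item{i}" for i in range(1, 17)])
print(flsa_total(items).head())
print(round(cronbach_alpha(items), 2))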

Data Collection Procedure

The FLSA questionnaire was administered to both the control and experimental groups in the classroom during the first session, before the instructional course, to evaluate them at the outset. Then a speaking pre-test was administered to both groups to assess their fluency. The learners in each class received General English instruction based on the Interchange 2 coursebook (Richards, Hull, & Proctor, 2019) and Tactics for Listening: Expanding (Richards & Trew, 2019), because in many Iranian language institutes these two books are regarded as suitable coursebooks for upper-intermediate (B2) EFL learners. Moreover, their units concentrate on the same topics and grammatical usage, allowing the instructor to provide students with vocabulary and grammatical structures that correspond to the context. The experimental group, however, had 20-minute pre-discussion speaking practice with a chatterbot on Replika, whereas the control group practiced with peers. Thus, the researcher ensured that both groups had equal opportunities to improve their oral fluency; the only difference was that the experimental group engaged in Robot-Assisted activities. Replika comprehends spoken commands and responds with a human voice. This chatbot enabled learners to interact in a meaning-focused system with varying degrees of freedom; for instance, participants could determine their relationship status with the robot (all were required to choose teacher-student status in order to receive feedback on their words) and could modify the robot's voice and appearance. The practice was centered on dialogues, rather than single utterances, as the unit of interaction. The following is a transcription of an oral dialogue on the topic of tourism between a participant from the experimental group and Replika.

The Student: “Have you ever been a tourist?”

Replika: “Yes, I have been to many places. The most memorable is Paris. Have you visited any place outside your hometown?”

The Student: “Not that much. However, I have been to Shiraz several times. Do you know this place?”

Replika: “Yes, I have heard of it. It’s a beautiful city with a rich history and culture. Have you visited any historical sites in Shiraz?”

The Student: “Yes, I always visit Persepolis whenever I go to Shiraz. Have you heard of it?”

Replika: “Of course, it was a magnificent structure built by the Achaemenid King, Dariush the Great. It was the center of religious and political activity in ancient Persia. I can see why it left an impression on you!”

The Student: “Yes, this place is amazing. Wanna know about the souvenirs of Shiraz?”

Replika: “Sure! What kind of souvenirs can you get in Shiraz?”

The Student: “Shiraz is known for its handicrafts, especially rugs. They are made of natural materials, like wool or silk; their patterns are real masterpieces. You can also find souvenirs, such as spices, Saffron and pastries. What are the souvenirs of Paris?”

Replika: “Paris is a city famous for its fashion, so you can find a lot of designer clothes and accessories there. You can also buy traditional items like hand-painted tiles and decorative items, like the Eiffel Tower miniature.”

The Student: “Really? That’s amazing!”

Replika: “Yes, Paris is full of culture and history. You can find many things to bring back home. Do you like shopping?” ……..

Figure 1 depicts a schematic representation of the courses taken by the control and experimental groups.


Figure 1. Schematic representation of the course taken by the control and experimental groups

Data Analysis

Both descriptive and inferential statistics were used to address the study's research questions. Descriptive statistics (means, standard deviations, and variances) were used to summarize the data. For the inferential analyses, a one-way ANCOVA was planned for each outcome to explore whether EFL learners' participation in student-chatterbot voice-based speaking activities using Replika significantly affects their oral fluency and FLSA, with the pre-test fluency and FLSA scores as covariates; where the ANCOVA assumptions were not met, gain-score comparisons (an independent-samples t-test or a Mann-Whitney U test) were used instead, as reported below.
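A minimal sketch of such a one-way ANCOVA, expressed as a linear model of the post-test on the group factor plus the pre-test covariate, is shown below; the data are simulated and the variable names are assumptions, not the study's actual files or the authors' SPSS syntax.

# Hypothetical one-way ANCOVA sketch: FLSA post-test as the dependent variable,
# group as the between-subjects factor, and FLSA pre-test as the covariate.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
df = pd.DataFrame({
    "group": ["exp"] * 30 + ["ctrl"] * 30,
    "FLSA_pre": rng.normal(56, 8, 60).round(),
})
# Simulated post-test scores: both groups improve, the experimental group more so.
df["FLSA_post"] = (df["FLSA_pre"] - np.where(df["group"] == "exp", 14, 7)
                   + rng.normal(0, 3, 60)).round()

model = smf.ols("FLSA_post ~ FLSA_pre + C(group)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))  # F and p for the group effect, adjusted for the covariate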

 

Findings and Discussion

Chatterbot voice-based speaking activities’ effect on EFL learners’ oral fluency:

The first research question sought to explore whether student-chatterbot voice-based speaking activities significantly affect Iranian EFL learners’ oral fluency. To investigate it, two sub-measures of fluency, the number of pauses and speech rate, were calculated and analyzed separately. To examine the effect of the treatment on the number of pauses, it was initially decided to run a one-way ANCOVA on the pre-test and post-test pause numbers of the experimental and control groups. The first assumption of ANCOVA is the normal distribution of the scores. Table 1 displays the descriptive statistics for the pre-test and post-test pause numbers.

 

Table 1

Descriptive Statistics, Skewness and Kurtosis for Pretest and Posttest Pause Numbers

 

                     N    Minimum   Maximum   Mean     SD      Skewness (Std. Error)   Kurtosis (Std. Error)
PN Pretest           60   27.00     228.00    174.36   43.18   -2.70 (.31)             2.10 (.61)
PN Posttest          60   .00       190.00    69.46    71.70   2.34 (.31)              -3.14 (.61)
Valid N (listwise)   60

As seen in Table 1, the skewness and kurtosis ratios for the pre-test and post-test pause numbers fell outside the range of +/- 1.96. Accordingly, because the normality assumption was violated, gain scores were computed for the two groups (Pallant, 2010) by subtracting the pre-test scores from the post-test scores. Table 2 shows the descriptive statistics for the gain pause-number scores of the two groups.

 

Table 2

Descriptive Statistics, Skewness and Kurtosis for the Gain Pause Number Scores of Two Groups

 

                     N    Minimum   Maximum   Mean      SD      Skewness (Std. Error)   Kurtosis (Std. Error)
Gain NP exp          30   -224.00   -27.00    -152.16   50.90   .44 (.42)               -.45 (.83)
Gain NP cont         30   -102.00   -10.00    -57.63    31.12   -.04 (.42)              -1.40 (.83)
Valid N (listwise)   30

As indicated in Table 2, the skewness and kurtosis ratios for the gain pause-number scores fell within the range of +/- 1.96. Thus, since the two data sets met the normality assumption, an independent-samples t-test was run on the two sets of scores. As illustrated in Table 2, the mean gain scores for the experimental and control groups were -152.16 and -57.63, respectively, indicating a larger reduction in the number of pauses and thus a greater improvement in this fluency measure for the experimental group. Table 3 presents the results of the independent-samples t-test on the two groups’ gain pause-number scores.

 

Table 3

Independent Samples T-test for the Gain Pause Number Scores of Two Groups

 

Gain NP Both — Levene's Test for Equality of Variances: F = 10.43, Sig. = .002

                              t       df      Sig. (2-tailed)   Mean Difference   Std. Error Difference   95% CI of the Difference (Lower, Upper)
Equal variances assumed       -8.67   58      .000              -94.53            10.89                   (-116.33, -72.72)
Equal variances not assumed   -8.67   48.02   .000              -94.53            10.89                   (-116.43, -72.63)

As shown in Table 3, there is a significant difference between the two groups in the number of pauses (t = -8.67, two-tailed, p = .00 < .05). Thus, it can be inferred that student-chatterbot voice-based speaking activities significantly reduced the Iranian EFL learners’ number of pauses and thereby positively affected their oral fluency.
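For readers who wish to reproduce this kind of gain-score analysis, the following Python sketch (with simulated, illustrative values rather than the study's data) screens normality through skewness/kurtosis ratios and then runs Levene's test and the independent-samples t-test.

# Hypothetical sketch of the gain-score analysis for pause numbers.
import numpy as np
from scipy.stats import skew, kurtosis, levene, ttest_ind

def skew_kurt_ratios(x):
    """SPSS-style statistic / standard-error ratios; |ratio| <= 1.96 suggests normality."""
    n = len(x)
    se_s = np.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    se_k = 2 * se_s * np.sqrt((n ** 2 - 1) / ((n - 3) * (n + 5)))
    return skew(x, bias=False) / se_s, kurtosis(x, bias=False) / se_k

rng = np.random.default_rng(2)
gain_exp = rng.normal(-152, 50, 30)   # experimental group: large drop in pauses
gain_ctrl = rng.normal(-58, 31, 30)   # control group: smaller drop

print(skew_kurt_ratios(gain_exp), skew_kurt_ratios(gain_ctrl))
print(levene(gain_exp, gain_ctrl))                      # homogeneity of variances
print(ttest_ind(gain_exp, gain_ctrl, equal_var=True))   # cf. Table 3, first row
print(ttest_ind(gain_exp, gain_ctrl, equal_var=False))  # Welch alternative if variances differ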

To examine the effect of the treatment on speech rate, it was initially decided to conduct a one-way ANCOVA. Table 4 displays the descriptive statistics for the pre-test and post-test speech rates.

 

Table 4

Descriptive Statistics, Skewness and Kurtosis for Pre-test and Post-test Speech Rate 

 

                     N    Minimum   Maximum   Mean   SD    Skewness (Std. Error)   Kurtosis (Std. Error)
SR Pre Both          60   2.94      3.78      3.18   .20   2.44 (.39)              1.43 (.61)
SR Post Both         60   3.14      5.44      4.12   .74   3.20 (.48)              -2.25 (.61)
Valid N (listwise)   60

As presented in Table 4, the skewness and kurtosis ratios for the pre-test and post-test speech rates fell outside the range of +/- 1.96. Accordingly, the normality assumption was violated, and the speech-rate gain scores were computed for the two groups (Pallant, 2010). Table 5 presents the descriptive statistics for the gain speech-rate scores of the two groups.

 

Table 5

Descriptive Statistics, Skewness and Kurtosis for the Gain Speech Rate Scores of Two Groups

 

                     N    Minimum   Maximum   Mean   SD    Skewness (Std. Error)   Kurtosis (Std. Error)
Gain SR exp          30   1.01      2.02      1.54   .29   -2.23 (.42)             -4.99 (.83)
Gain SR cont         30   .10       .61       .33    .11   3.53 (.42)              3.48 (.83)
Valid N (listwise)   30

Based on the information in Table 5, the skewness and kurtosis ratios for the gain speech-rate scores fell outside the range of +/- 1.96. Accordingly, as the two data sets did not meet the normality assumption, a Mann-Whitney U test was run. Table 6 shows the mean rank of the gain speech-rate scores for the two groups.

 

Table 6

Mean Rank of Gain Scores for the Speech Rate of the Two Groups

 

Gain SR Both   Group          N    Mean Rank   Sum of Ranks
               Experimental   30   45.50       1365.00
               Control        30   15.50       465.00
               Total          60

As seen in Table 6, the mean ranks for the experimental and control groups were 45.50 and 15.50, respectively, indicating higher scores for the experimental group than for the control group. Table 7 presents the results of the Mann-Whitney U test for the speech-rate gains of the two groups.

 

Table 7

Mann-Whitney U Test Results for the Speech Rate of the Two Groups

 

                         Gain SR Both
Mann-Whitney U           .000
Wilcoxon W               465.00
Z                        -6.65
Asymp. Sig. (2-tailed)   .000

a. Grouping Variable: Group

 

As shown in Table 7, there is a significant difference between the two groups (z = -6.65, p = .00 < .05). Therefore, it can be inferred that the treatment increased the speech rate of the experimental group; accordingly, it can be concluded that student-chatterbot voice-based speaking activities significantly improved Iranian EFL learners’ speech rate and, thus, their oral fluency.
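The analogous non-parametric comparison can be sketched as follows; again, the values are simulated and merely illustrate the Mann-Whitney U test and the mean ranks reported in Table 6.

# Hypothetical sketch of the Mann-Whitney U comparison of speech-rate gains.
import numpy as np
from scipy.stats import mannwhitneyu, rankdata

rng = np.random.default_rng(3)
gain_sr_exp = rng.normal(1.5, 0.3, 30)    # experimental group's speech-rate gains
gain_sr_ctrl = rng.normal(0.3, 0.1, 30)   # control group's gains

u, p = mannwhitneyu(gain_sr_exp, gain_sr_ctrl, alternative="two-sided")
ranks = rankdata(np.concatenate([gain_sr_exp, gain_sr_ctrl]))
print("U =", u, "p =", p)
print("mean rank (exp) =", ranks[:30].mean(), "mean rank (ctrl) =", ranks[30:].mean())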

 

Chatterbot voice-based speaking activities’ effect on EFL learners’ FLSA:

The second research question set out to examine whether student-chatterbot voice-based speaking activities significantly affect Iranian EFL learners’ FLSA level. Table 8 displays the descriptive statistics for the pre-test and post-test FLSA scores.

 

Table 8

Descriptive Statistics, Skewness and Kurtosis for Pre-test and Post-test FLSA Scores 

 

                     N    Minimum   Maximum   Mean    SD     Skewness (Std. Error)   Kurtosis (Std. Error)
FLSA Pre             60   39.00     72.00     55.78   8.37   .29 (.30)               -.35 (.60)
FLSA Post            60   30.00     60.00     45.41   7.48   -.17 (.30)              -.66 (.60)
Valid N (listwise)   60

As revealed in Table 8, the skewness and kurtosis ratios for the pre-test and post-test FLSA scores fell within the range of +/- 1.96, so the first assumption of ANCOVA was met. The second assumption, the reliability of the covariate, was ensured by selecting a well-constructed and reliable instrument (Pallant, 2010) for measuring FLSA. The multicollinearity assumption was met by default because there was only one covariate (Tabachnick & Fidell, 2007). As for the linearity assumption, the scatterplot of the variables was checked (Figure 2).


Figure 2. Scatterplot of pre-test and post-test FLSA scores

 

As seen in Figure 2, the relationship between the dependent variable (FLSA post-test) and the covariate (FLSA pre-test) followed a straight diagonal line, indicating that the relationship was linear; hence, the linearity assumption was met. To check the next assumption, homogeneity of regression slopes, the tests of between-subjects effects were consulted. The results are illustrated in Table 9.

Table 9

Tests of Between-Subjects Effects for FLSA Pre-test and Post-test Scores

Dependent Variable: FLSA Post

Source            Type III Sum of Squares   df   Mean Square   F        Sig.   Partial Eta Squared
Corrected Model   2704.61a                  3    901.53        83.59    .00    .81
Intercept         56.45                     1    56.45         5.23     .02    .08
Group             6.35                      1    6.35          .58      .44    .01
FLSApre           1916.76                   1    1916.76       177.72   .00    .76
Group * FLSApre   40.90                     1    40.90         3.79     .15    .16
Error             603.97                    56   10.78
Total             127069.00                 60
Corrected Total   3308.58                   59

a. R Squared = .817 (Adjusted R Squared = .808)

 

As seen in Table 9, the significance value for the Group * FLSApre interaction was .15, which is greater than .05, indicating that the assumption of homogeneity of regression slopes was not violated. The last assumption of ANCOVA, homogeneity of variances, was checked using Levene's test (Table 10).

 

Table 10

Levene's Test of Equality of Error Variances for FLSA Scores

F      df1   df2   Sig.
.245   1     58    .62

Tests the null hypothesis that the error variance of the dependent variable is equal across groups.
a. Design: Intercept + Group + FLSApre + Group * FLSApre

 

As seen in Table 10, the error variance of the dependent variable was equal across groups; hence, the assumption of homogeneity of variances was met (F = .245, p = .62 > .05). Table 11 presents the results of the ANCOVA.

 

Table 11

Results of ANCOVA for the FLSA Pretest and Posttest Scores

Dependent Variable: FLSA Post

Source            Type III Sum of Squares   df   Mean Square   F        Sig.   Partial Eta Squared
Corrected Model   2663.70a                  2    1331.85       117.72   .00    .80
Intercept         78.84                     1    78.84         6.96     .01    .10
FLSApre           1878.88                   1    1878.88       166.07   .00    .74
Group             693.84                    1    693.84        61.32    .00    .51
Error             644.87                    57   11.31
Total             127069.00                 60
Corrected Total   3308.58                   59

a. R Squared = .805 (Adjusted R Squared = .798)

 

As seen in Table 11, the significance value for Group is smaller than the critical value (p = .000 < .05), indicating a significant difference between the two groups' FLSA scores after adjusting for the pre-test. The partial eta squared for the group effect was .51, which indicates a large effect size (Cohen, 1988). Table 12 shows the estimated marginal means for the two groups.

 

Table 12

Estimated Marginal Means for FLSA Scores

Dependent Variable: FLSA Post

Group          Mean     Std. Error   95% CI Lower Bound   95% CI Upper Bound
Experimental   41.98a   .60          40.78                43.18
Control        48.78a   .60          47.58                49.98

a. Covariates appearing in the model are evaluated at the following value: FLSA Pre = 55.7833.

 

Table 13 shows the pairwise comparison between the two groups’ FLSA scores.

 

Table 13

Pairwise Comparison Between the FLSA Groups’ Scores

Dependent Variable: FLSA Post

(I) Group      (J) Group      Mean Difference (I-J)   Std. Error   Sig.b   95% CI Lower Bound   95% CI Upper Bound
Experimental   Control        -6.80*                  .86          .00     -8.54                -5.06
Control        Experimental   6.80*                   .86          .00     5.06                 8.54

Based on estimated marginal means
*. The mean difference is significant at the .05 level.
b. Adjustment for multiple comparisons: Least Significant Difference (equivalent to no adjustments).

 

As shown in Table 13, the significance value equals .00, which is lower than .05. Furthermore, as presented in Table 12, the estimated marginal means for the experimental and control groups were 41.98 and 48.78, respectively, indicating a significantly lower FLSA for the experimental group than for the control group. Therefore, it can be inferred that student-chatterbot voice-based speaking activities significantly reduced Iranian EFL learners’ FLSA level.
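The FLSA ANCOVA reported in Tables 9-12, together with its assumption checks and covariate-adjusted means, can be approximated with the following Python sketch; the data are simulated and the variable names are assumptions, so the numbers will differ from those reported above.

# Hypothetical sketch: homogeneity of regression slopes, Levene's test, ANCOVA,
# and adjusted (estimated marginal) means for the FLSA scores.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
from scipy.stats import levene

rng = np.random.default_rng(4)
df = pd.DataFrame({"group": ["exp"] * 30 + ["ctrl"] * 30,
                   "FLSA_pre": rng.normal(56, 8, 60)})
df["FLSA_post"] = df["FLSA_pre"] - np.where(df["group"] == "exp", 14, 7) + rng.normal(0, 3, 60)

# Homogeneity of regression slopes: the pre-test x group interaction should be non-significant.
slopes = smf.ols("FLSA_post ~ FLSA_pre * C(group)", data=df).fit()
print(sm.stats.anova_lm(slopes, typ=2))

# Homogeneity of variances across groups (cf. Table 10).
print(levene(df.loc[df.group == "exp", "FLSA_post"],
             df.loc[df.group == "ctrl", "FLSA_post"]))

# Final ANCOVA and adjusted group means at the covariate's grand mean (cf. Tables 11-12).
ancova = smf.ols("FLSA_post ~ FLSA_pre + C(group)", data=df).fit()
grid = pd.DataFrame({"group": ["exp", "ctrl"], "FLSA_pre": df["FLSA_pre"].mean()})
print(ancova.predict(grid))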

 

The findings showed that the oral fluency and FLSA levels of the Iranian EFL learners in the experimental group differed significantly from those of the control group on the post-test measures. Compared with the speaking pre-test, the experimental group's oral fluency improved, with an increased speech rate and a reduced number of pauses on the second administration of the speaking test. Meanwhile, according to the FLSA questionnaire responses before and after the treatment, the FLSA levels of these learners also decreased significantly. The reduction might be ascribed to the in-class robot-human speaking practice with the Replika application, which enabled the experimental group's learners to acquire strategic and interactive knowledge, strengthening their oral fluency and making them feel less anxious when speaking the target language. In other words, a voice-based chatbot makes Iranian EFL students more confident in using the target language fluently because it allows them to negotiate with the robot, receive feedback, and holistically process form, meaning, and function. These findings are supported by evidence that intelligent chatterbots motivate students to have successful interactions (Lee et al., 2011), participate in meaningful conversations (Chang et al., 2010), and, consequently, improve their oral output (Holland et al., 1999).

The findings also echo those of Tai and Chen (2022), who hypothesized that participation in Robot-Assisted speaking activities could enrich EFL learners' exposure to an authentic environment in which they receive feedback and increase their collaboration. Similarly, the results are consistent with those of Goda et al. (2014), who trained EFL learners to use the ELIZA chatterbot for speaking activities, allowing them to enhance their speaking skills. Interestingly, this study adds new evidence that a voice-based chatterbot can significantly enhance target-language oral fluency in EFL contexts where there are not enough opportunities for speaking practice. A possible reason for the significant improvement in the focal group's speaking fluency is that voice-based Robot-Assisted interaction provides substantial opportunities to negotiate meaning instead of emphasizing language forms and structures, an emphasis that may hinder the production of fluent speech. More importantly, when a teacher incorporates authentic negotiation into a well-designed speaking course, EFL students are able to apply what they have learned in class to their spoken language. Another possible explanation is that the conversational flow possible with machines is not always achievable in student-student interactions; hence, EFL learners have a greater chance of negotiating with a robot than with peers of an equivalent level of English proficiency.

Similarly, based on the responses to the FLSA questionnaire before and after the study, it can be inferred that the experimental group of Iranian EFL learners had much lower FLSA levels than the control group. In other words, teacher-supervised, in-class robot-human interaction on the Replika app encouraged the Iranian EFL students to improve their oral fluency and feel less anxious when speaking the target language. This reduction in FLSA could be attributed to the audio-based speaking activities on Replika, which provide a comfortable environment because learners feel free to make errors when speaking with a robot. In addition, the speaking activities on Replika allowed students to shift their focus from structures to communicating messages and meanings, resulting in less anxiety when speaking in the target language.

This finding, however, contradicts that of El Shazly (2021), who concluded that while synchronous robot-mediated communication using the Mondly application improved FL learners' speaking, participants' speech-related anxieties did not decrease. The disagreement may stem from the fact that, in the present study, the participants were provided with vocabulary and structures prior to interacting with the chatterbot; in addition, after gaining insight into their weaknesses, they could practice with the robot and draw on video and audio files to scaffold their spoken output and become more confident in their ability to continue the conversation. The FLSA results are also inconsistent with Çakmak (2022), who described robot-human interaction as an unreliable way to lessen EFL students' FLSA. This inconsistency may be rooted in the lack of precisely defined tasks and of pedagogical and instructional objectives in that intervention, as the EFL students engaged in robot-mediated interactions outside the classroom and the instructor did not actively monitor these collaborative speaking activities.

 

Conclusion

Drawing on chatbot-supported language learning and applying a quantitative approach, the study's results suggest that voice-based human-robot interaction on the Replika application in the classroom, as part of the speaking activities, improved the EFL learners' oral fluency and decreased their anxiety when speaking in the target language. This may be because engaging in meaningful conversation with the robot, which sustains a conversational flow, enables EFL students to overcome their oral fluency obstacles in a stress-free environment. The application can be viewed as an effective supplementary educational resource for enhancing oral fluency and reducing FLSA in EFL contexts where learners are unable to find a suitable speaking partner.

In EFL contexts, the pedagogical implication of this study is that EFL teachers should incorporate advanced technology, particularly applications such as Replika, into their teaching practices to improve learners' speaking fluency and reduce their FLSA. Given the scarcity of opportunities to practice this skill in EFL settings, it seems indisputable that EFL students need to participate in Robot-Assisted communication in the classroom to develop the foundational components of their speaking skills. Nonetheless, two prerequisites appear necessary for achieving this objective. First, the interaction should align with the coursebook activities and be monitored by the teacher to ensure that the students' speaking practice stays on track. Second, after the human-robot interactions, EFL learners should be guided into various speaking tasks so that they can apply what they have learned in their Robot-Assisted meaningful interactions.

It is also recommended that EFL instructors implement Robot-Assisted pre-discussion speaking exercises in their lessons to reduce FLSA among EFL students. Because the students will already have made, and learned from, errors while practicing with the chatterbot, they are more likely to be self-assured during class discussion time, leading to increased participation in class activities. The greater their confidence and fluency, the better the feedback they will receive, resulting in greater engagement and motivation in their language learning.

This study had some limitations that may affect the generalizability of the results, including the limited sample size of sixty EFL students and the relatively short, eight-week treatment. Further research is recommended with a longer treatment and a larger sample, focusing on other psychological variables of EFL learners, such as motivation and willingness to communicate (WTC), over Clubhouse or similar applications. Furthermore, since chatbot-human interaction might increase speaking fluency at the expense of accuracy, replicating this study with another underlying component of speaking skill, such as accuracy, is highly recommended to provide more comprehensive results.

References

Alrayah, H. (2018). The effectiveness of cooperative learning activities in enhancing EFL learners’ fluency. English Language Teaching, 11(4), 21–31. doi:10.5539/elt.v11n4p21
Azlan, N., Zakaria, S., & Melor, M. (2019). Integrative task-based learning: Developing speaking skill and increase motivation via Instagram. International Journal of Academic Research in Business and Social Sciences, 9(1), 620–636. doi:10.6007/IJARBSS/v9-i1/5463
Brand, C., & Götz, S. (2011). Fluency versus accuracy in advanced spoken learner language: A multi-method approach. International Journal of Corpus Linguistics, 16, 255–275. doi:10.1075/ijcl.16.2.05bra
Çakmak, F. (2022). Chatbot-human interaction and its effects on EFL students’ L2 speaking performance and anxiety. Novitas-ROYAL, 113–131.
Chang, C., Lee, J., Chao, P., Wang, C., & Chen, G. (2010). Exploring the possibility of using humanoid robots as instructional tools for teaching a second language in primary school. Journal of Educational Technology & Society, 13(2), 13–24. http://www.jstor.org/stable/jeductechsoci.13.2.13
Chen, M.-R., & Hwang, G.-J. (2020). Effects of a concept mapping-based flipped learning approach on EFL students’ English speaking performance, critical thinking awareness and speaking anxiety. British Journal of Educational Technology, 51(3), 817–834. doi:10.1111/bjet.12887
Chiu, T., Liou, H., & Yeh, Y. (2007). A study of web-based oral activities enhanced by automatic speech recognition for EFL college learning. Computer Assisted Language Learning, 20(3), 209–234. doi:10.1080/09588220701489374
Cohen, J. W. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Lawrence Erlbaum Associates.
Coniam, D. (2008). Evaluating the language resources of chatbots for their potential in English as a second language. ReCALL, 20(1), 98–116. doi:10.1017/S0958344008000815
Creswell, J. (2014). Educational research: Planning, conducting and evaluating quantitative and qualitative research (4th ed.). Boston: Pearson.
El Shazly, R. (2021). Effects of artificial intelligence on English speaking anxiety and speaking performance: A case study. Expert Systems, 38(3), 1–15. doi:10.1111/exsy.12667
Fryer, L., Ainley, M., Thompson, A., Gibson, A., & Sherlock, Z. (2017). Stimulating and sustaining interest in a language course: An experimental comparison of chatbot and human task partners. Computers in Human Behavior, 75, 461–468. doi:10.1016/j.chb.2017.05.045
Fryer, L., Nakao, K., & Thompson, A. (2019). Chatbot learning partners: Connecting learning experiences, interest and competence. Computers in Human Behavior, 93, 279–289. doi:10.1016/j.chb.2018.12.023
Goda, Y., Yamada, M., Matsukawa, H., Hata, K., & Yasunami, S. (2014). Conversation with a chatbot before an online EFL group discussion and the effects on critical thinking. The Journal of Information and Systems in Education, 13(1), 1–7. doi:10.12937/ejsise.13.1
Hafner, C. (2015). Remix culture and English language teaching: The expression of learner voice in digital multimodal compositions. TESOL Quarterly, 49(3), 486–509. doi:10.1002/tesq.238
Hanafiah, W., Aswad, M., Sahib, H., Yassi, A., & Mousavi, M. (2022). The impact of CALL on vocabulary learning, speaking skill, and foreign language speaking anxiety: The case study of Indonesian EFL learners. Education Research International, 2022, Article 5500077. doi:10.1155/2022/5500077
Hanzawa, K. (2021). Development of second language speech fluency in foreign language classrooms: A longitudinal study. Language Teaching Research. doi:10.1177/13621688211008693
Haristiani, N. (2019). Artificial Intelligence (AI) chatbot as language learning medium: An inquiry. Journal of Physics: Conference Series, 1387(1), 012020.
Holland, M., Kaplan, J., & Sabol, M. (1999). Preliminary tests of language learning in a speech-interactive graphics microworld. CALICO Journal, 16(3), 339–359. https://www.jstor.org/stable/24147847
Horwitz, E., Horwitz, M., & Cope, J. (1986). Foreign language classroom anxiety. The Modern Language Journal, 70(2), 125–132. doi:10.2307/327317
House, B., Malkin, J., & Bilmes, J. (2009). The VoiceBot: A voice controlled robot arm. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (pp. 183–192). Boston, MA: ACM. doi:10.1145/1518701.1518731
Housen, A., & Kuiken, F. (2009). Complexity, accuracy, and fluency in second language acquisition. Applied Linguistics, 30, 461–473. doi:10.1093/applin/amp048
Huang, W., Hew, K., & Fryer, L. (2021). Chatbots for language learning—Are they really useful? A systematic review of chatbot-supported language learning. Journal of Computer Assisted Learning, 38(1), 237–257. doi:10.1111/jcal.12610
Jeon, J. (2022). Exploring AI chatbot affordances in the EFL classroom: Young learners’ experiences and perspectives. Computer Assisted Language Learning, 1–26. doi:10.1080/09588221.2021.2021241
Karatas, H., Alci, B., Bademcioglu, M., & Ergin, A. (2016). An investigation into university students’ foreign language speaking anxiety. Procedia - Social and Behavioral Sciences, 232, 382–388. doi:10.1016/j.sbspro.2016.10.053
Kim, N. (2017). Effects of types of voice-based chat on EFL students’ negotiation of meaning according to proficiency levels. English Teaching, 72(1), 159–181. doi:10.15858/engtea.72.1.201703.159
Kim, N.-Y., Cha, Y., & Kim, H.-S. (2019). Future English learning: Chatbots and artificial intelligence. Multimedia-Assisted Language Learning, 22(3), 32–53.
Lee, S., Noh, H., Lee, J., Lee, K., Lee, G., Sagong, S., & Kim, M. (2011). On the effectiveness of robot-assisted language learning. ReCALL, 23(1), 25–58. doi:10.1017/S0958344010000273
Lu, C., Chiou, G., Day, M., Ong, C., & Hsu, W. (2006). Using instant messaging to provide an intelligent learning environment. Intelligent Tutoring Systems: 8th International Conference, ITS 2006 (pp. 575–583). Berlin, Heidelberg: Springer.
MacIntyre, P. (1999). Language anxiety: A review of the literature for language teachers. In D. Young (Ed.), Affect in foreign language and second language learning: A practical guide to creating a low-anxiety classroom atmosphere (pp. 24–45). Boston: McGraw-Hill.
Pallant, J. (2010). SPSS survival manual: A step by step guide to data analysis using SPSS for Windows. Crows Nest: Allen & Unwin.
Richards, J., Hull, J., & Proctor, S. (2019). Interchange: Student’s book (5th ed.). Cambridge: Cambridge University Press.
Richards, J., & Trew, G. (2019). Tactics for listening: Expanding (3rd ed.). Oxford: Oxford University Press.
Tabachnick, B. G., & Fidell, L. S. (2007). Using multivariate statistics. Pearson Education.
Tai, T.-Y., & Chen, H.-J. (2022). The impact of intelligent personal assistants on adolescent EFL learners’ speaking proficiency. Computer Assisted Language Learning, 1–28. doi:10.1080/09588221.2022.2070219
Tian, J., & Wang, Y. (2010). Taking language learning outside the classroom: Learners’ perspectives of eTandem learning via Skype. Innovation in Language Learning and Teaching, 4(3), 181–197. doi:10.1080/17501229.2010.513443
Wang, Y., Petrina, S., & Feng, F. (2017). VILLAGE—Virtual immersive language and presence. British Journal of Educational Technology, 48(2), 431–450. doi:10.1111/bjet.12388
Weizenbaum, J. (1966). ELIZA—A computer program for the study of natural language communication between man and machine. Communications of the ACM, 9(1), 36–45. doi:10.1145/365153.365168
Winkler, R., & Soellner, M. (2018). Unleashing the potential of chatbots in education: A state-of-the-art analysis. In Academy of Management Annual Meeting Proceedings. Academy of Management.
Xu, Y., Wang, D., Collins, P., Lee, H., & Warschauer, M. (2021). Same benefits, different communication patterns: Comparing children’s reading with a conversational agent vs. a human partner. Computers & Education, 161, Article 104059. doi:10.1016/j.compedu.2020.104059
Yang, H., Kim, H., Lee, J., & Shin, D. (2022). Implementation of an AI chatbot as an English conversation partner in EFL speaking classes. ReCALL, 34(3). doi:10.1017/S0958344022000039
Yen-Chen, Y., Huei-Tse, H., & Chang, K. (2015). Applying role-playing strategy to enhance learners’ writing and speaking skills in EFL courses using Facebook and Skype as learning tools: A case study in Taiwan. Computer Assisted Language Learning, 28(5), 383–406. doi:10.1080/09588221.2013.839568