
Dataset
As shown in Table 1, Mission Rehearsal Exercise [11] and M4 [23] are the oldest AD datasets. Their public releases have been discontinued; hence, we did not review them.
Tsai et al. [24] explored the benefits of using multiple modalities over a single modality for AD by extending data from a multiparty dialogue setup by Bohus and Horvitz [25]. The dataset covers a scenario in which groups of two and three people converse with the agent by answering questions posed by the agent. It comprises audio, video, beamforming output, the system state, and automatic speech recognition results. A speech activity detector automatically segmented the beamformed audio, and the resulting utterances were annotated with speech, speaker, and addressee information. The dataset contains 2,001 training and 1,952 testing utterances. The conversations had a limited number of participants, were recorded in a human-to-robot setting, and do not contain spatiotemporal annotations.
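To make the structure of such a dataset concrete, the following minimal Python sketch shows one way a single segmented utterance and its annotations could be represented; the class and field names are hypothetical and do not correspond to the dataset's actual schema.

from dataclasses import dataclass, field

# Hypothetical record layout for one segmented utterance in a multiparty
# AD dataset of the kind described above; field names are illustrative only.
@dataclass
class ADUtterance:
    utterance_id: str          # identifier of the segmented audio span
    transcript: str            # automatic speech recognition output
    speaker: str               # annotated speaker label
    addressee: str             # annotated addressee (e.g., "agent" or a participant id)
    audio_path: str            # path to the beamformed audio segment
    video_path: str            # path to the corresponding video clip
    system_state: dict = field(default_factory=dict)  # dialogue-system state at utterance time
    split: str = "train"       # "train" (2,001 utterances) or "test" (1,952 utterances)

# Example usage with made-up values:
u = ADUtterance(
    utterance_id="u0001",
    transcript="yes, the second option",
    speaker="participant_2",
    addressee="agent",
    audio_path="audio/u0001.wav",
    video_path="video/u0001.mp4",
)
print(u.addressee)  # -> "agent"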
The AMI corpus [26] was recorded in a confined meeting room and consists of 100 h of two types of recorded meetings: task oriented and open discussion. Four participants took part in each meeting. The AMI corpus is a multimodal interaction corpus that provides several annotations for each meeting, including dialogue acts (DAs), speaker and listener information, the focus of attention, adjacency pairs, and addressee information.
Table 1. A comparison with previous works.

Frameworks | Modalities | Environment | Approach | Dataset | Performance (%)
Traum et al. [11] | Current and previous utterances and speakers as well as current and previous DA | Human to human | Rule based | Mission Rehearsal Exercise [11] | 65-100 and 36 on AMI
Jovanovic [12] | Current and previous speakers and utterances, topic, gaze, etc. | Human to human | Bayesian network | M4 [23] | 81
op den Akker and Traum [13] | Gaze, current and previous speakers, utterance, and addressee | Human to human | Rule based | AMI [26] | 65
op den Akker and op den Akker [29] | Current utterance, previous utterance, speaker, the topic of discussion, gaze, and several meta features | Human to human | Logistic model trees | AMI [26] | 92
Baba et al. [30] | Head orientation, acoustic features, and text as input features | Mixed human to human and human to agent | SVM | Custom using Wizard of Oz | 80.28
Minh et al. [2] | Gaze and utterance | Human to human | CNN and LSTM | GazeFollow dataset | 62.5
Malik et al. [31] | Textual, contextual, and gaze features | Human to human | 6 ML algorithms | AMI [26] | 74.26
Malik et al. [32] | Textual, contextual, and gaze features | Human to human | XGBoost | AMI [26] | 78.77
Ours | Facial and audio | Mixed human to human and human to robot | CNN, CA, and SA | E-MuMMER | 71

CNN: convolutional neural network; LSTM: long short-term memory; ML: machine learning; SVM: support vector machine.
The corpus contains more than 117,000 utterances annotated with DAs, of which 9,071 utterances have been annotated with the speaker focus and 8,874 utterances with addressee information. However, of these 9,071 utterances, only 5,628 contain all three annotations: speaker focus, addressee, and DA. The scenario was recorded as confined meetings in a human-to-human setting and is not spatiotemporally annotated.
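Because only a subset of the utterances carries all three labels, experiments on this corpus would typically start by filtering for fully annotated utterances. The following minimal pandas sketch illustrates that filtering step; the column names (dialogue_act, speaker_focus, addressee) and the toy rows are hypothetical and do not reflect the corpus's actual file format.

import pandas as pd

# Hypothetical flat export of AMI utterance-level annotations;
# the column names and rows are illustrative only.
utterances = pd.DataFrame(
    [
        {"utt_id": "u1", "dialogue_act": "inform", "speaker_focus": "B", "addressee": "B"},
        {"utt_id": "u2", "dialogue_act": "elicit", "speaker_focus": "C", "addressee": None},
        {"utt_id": "u3", "dialogue_act": None, "speaker_focus": "A", "addressee": "group"},
    ]
)

# Keep only utterances that carry all three labels needed for addressee
# detection: DA, speaker focus, and addressee. On the real corpus, this
# filter would retain the 5,628 fully annotated utterances mentioned above.
complete = utterances.dropna(subset=["dialogue_act", "speaker_focus", "addressee"])
print(len(utterances), "->", len(complete))  # 3 -> 1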
The MULTISIMO corpus was proposed by Koutsombogera and Vogel [27]. It consists of 4 h of recordings spanning 23 meetings, each lasting 10-16 min and involving only three participants. In the meeting scenario, one participant acts as the facilitator and asks a question that has three best answers; the other participants then find the answers and rank them. The corpus provides different annotations for every meeting, such as speech, acoustic, visual, lexical, perceptual, and demographic information. Among these meetings, two were annotated with gaze information. Like the AMI corpus, this dataset was recorded inside a confined environment in a human-to-human setting and does not contain dense spatiotemporal annotations.
The Vernissage corpus [17] consists of 13 sessions in which a Nao robot [47] interacts with two people in an office setting, each session lasting around 11 min. The Wizard of Oz (WoZ) method was used to manage the dialogue, the robot's gaze, and nodding. The limitation of this dataset is the limited variation in the situations in which the