In recent years, research on collaborative interaction has relied on manual coding of rich audio/video recordings. The fine-grained analysis of such material is extremely time-consuming and labor-intensive. This makes it difficult to scale and can also limit the quality and completeness of coding due to fatigue, inherent human biases (accidental or intentional), and inter-rater inconsistencies. In this paper, we explore how recent advances in machine learning may reduce manual effort and loss of information while retaining the value of human intelligence in the coding process. We present ACACIA (AI Chain for Augmented Collaborative Interaction Analysis), an AI video data analysis application that combines a range of advances in machine perception of video material for the analysis of collaborative interaction. We evaluate ACACIA's abilities and show how far we can already get and which challenges remain. Our contribution lies in establishing a combined machine and human analysis pipeline that may be generalized to different collaborative settings and guide future research.
Thomas Neumayr (University of Applied Sciences Upper Austria; JKU Linz), Mirjam Augstein (University of Applied Sciences Upper Austria), Johannes Schönböck (University of Applied Sciences Upper Austria), Sean Rintel (Microsoft Research, Cambridge), Helmut Leeb (University of Applied Sciences Upper Austria), Thomas Teichmeister (University of Applied Sciences Upper Austria)
Qin Wu (The University of Auckland), Rao Xu (Chengdu University of Information Technology), Yuantong Liu (Chengdu University of Information Technology), Danielle Lottridge (University of Auckland), Suranga Nanayakkara (Auckland Bioengineering Institute, The University of Auckland)