Affiliation:
1. School of Computing and Engineering, University of Gloucestershire, The Park, Cheltenham GL50 2RH, UK
Abstract
Annotation tools are an essential component in the creation of datasets for machine learning. They have evolved greatly since the turn of the century and now commonly include collaborative features to divide labor efficiently, as well as automation to amplify human effort. Recent developments in machine learning models, such as Transformers, allow training on very large and sophisticated multimodal datasets and enable generalization across domains of knowledge. These models also herald an increasing emphasis on prompt engineering to qualitatively fine-tune the model itself, adding a novel, emerging layer of direct machine learning annotation. These capabilities enable machine intelligence to recognize, predict, and emulate human behavior with much greater accuracy and nuance; shortfalls in these respects have contributed to algorithmic injustice in previous techniques. However, the scale and complexity of the training data required for multimodal models present engineering challenges. Best practices for conducting annotation for large multimodal models in the safest and most ethical, yet efficient, manner have not been established. This paper presents a systematic literature review of crowd-augmented and machine-learning-augmented behavioral annotation methods, cross-correlated across disciplines, to distill practices that may have value in multimodal implementations. Research questions were defined to provide an overview of how augmented behavioral annotation tools have evolved, in relation to the present state of the art. (Contains five figures and four tables.)
Subject
Industrial and Manufacturing Engineering