Benchmarking Open Source and Paid Services for Speech to Text: An Analysis of Quality and Input Variety
Department of Electrical Engineering and Information Technology, Polytechnic and Basic Sciences School, University of Naples Federico II, Italy
Speech to text (STT) technology has become increasingly popular in recent years due to the growing demand for automated transcription of spoken language. With both open source and paid STT services available, it is important to evaluate their performance and quality in order to select the most appropriate tool for a given task. In this paper, we present a benchmarking study of open source and paid STT services, with a focus on evaluating their performance in relation to the variety of the input. We considered six datasets collected from different sources, including interviews, lectures, and speeches, and used them as input for the STT tools. We evaluated the tools using one of the standard metrics for STT systems: the Word Error Rate (WER). Our analysis of the results revealed that the performance of the STT tools varied significantly depending on the input, with some tools performing better than others on certain types of audio samples. Our study provides insights into the performance of STT tools when processing large amounts of data, as well as the challenges and opportunities presented by the multimedia nature of the data. We found that paid services generally outperformed open source alternatives in terms of accuracy and speed, but their performance varied depending on the input.
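For readers unfamiliar with the metric, WER is conventionally defined as the word-level edit distance between a reference transcript and the STT output (substitutions plus deletions plus insertions), divided by the number of words in the reference. The following is a minimal sketch of that standard definition, not the evaluation code used in this study; the function name `word_error_rate` and the simple lowercase-and-split normalization are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Normalization here (lowercasing, whitespace split) is an illustrative
    assumption; real benchmarks usually also strip punctuation.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the"): 2 errors / 6 words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference, so it is an error rate rather than a bounded percentage.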
Keywords: ASR, Speech to Text, speech recognition, Benchmark, Multimedia
Received: 22 Apr 2023;
Accepted: 19 Aug 2023.
Copyright: © 2023 Ferraro, Galli, Valerio and Postiglione. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence: Mx. Antonino Ferraro, Department of Electrical Engineering and Information Technology, Polytechnic and Basic Sciences School, University of Naples Federico II, Naples, Italy