Benchmarking Open Source and Paid Services for Speech to Text: An Analysis of Quality and Input Variety

Ferraro, Antonino; Galli, Antonio; Valerio, La Gatta; Postiglione, Marco

doi:10.3389/fdata.2023.1210559

ORIGINAL RESEARCH article

Front. Big Data
Sec. Data Mining and Management
Volume 6 - 2023 | doi: 10.3389/fdata.2023.1210559

This article is part of the Research Topic

Big Multimedia Data and Applications

Antonino Ferraro¹*

Antonio Galli¹

La Gatta Valerio¹

Marco Postiglione¹

¹Department of Electrical Engineering and Information Technology, Polytechnic and Basic Sciences School, University of Naples Federico II, Italy

The final, formatted version of the article will be published soon.

Speech to text (STT) technology has become increasingly popular in recent years due to the growing demand for automated transcription of spoken language. With both open source and paid STT services available, it is important to evaluate their performance and quality in order to select the most appropriate tool for a given task. In this paper, we present a benchmarking study of open source and paid STT services, with a focus on how their performance relates to the variety of the input. We considered six datasets collected from different sources, including interviews, lectures, and speeches, and used them as input for the STT tools. We evaluated the tools using one of the standard metrics for STT evaluation: the Word Error Rate (WER). Our analysis revealed that the performance of the STT tools varied significantly depending on the input, with some tools performing better than others on certain types of audio samples. Our study provides insights into the performance of STT tools when processing large amounts of data, as well as the challenges and opportunities presented by the multimedia nature of the data. We found that paid services generally outperformed open source alternatives in terms of accuracy and speed, but their performance varied depending on the input.
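
For readers unfamiliar with the metric, WER is conventionally defined as (S + D + I) / N, where S, D, and I are the numbers of substituted, deleted, and inserted words needed to turn the reference transcript into the tool's output, and N is the number of words in the reference. The following is a minimal sketch of that standard definition, not the evaluation code used in this study; the function name `word_error_rate` and the normalization (lowercasing and whitespace splitting) are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length.

    Assumed normalization: lowercase and split on whitespace. Real benchmarks
    usually also strip punctuation and normalize numbers.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the word-level edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion (reference word missing)
                curr[j - 1] + 1,                  # insertion (extra hypothesis word)
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion (second "the")
    # against a six-word reference.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference, so values are not strictly bounded percentages.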

Keywords: ASR, Speech to Text, speech recognition, Benchmark, Multimedia

Received: 22 Apr 2023; Accepted: 19 Aug 2023.

Copyright: © 2023 Ferraro, Galli, Valerio and Postiglione. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Mx. Antonino Ferraro, Department of Electrical Engineering and Information Technology, Polytechnic and Basic Sciences School, University of Naples Federico II, Naples, Italy
