5-Year Impact Factor: 1.53
Pages: 34-40
Abstract:
Understanding human emotions from speech signals has applications in virtual assistants, mental health analysis, and human-computer interaction. This paper presents a Long Short-Term Memory (LSTM) network-based approach to speech emotion recognition using Mel-frequency cepstral coefficients (MFCCs) as audio features. We preprocess audio recordings from the RAVDESS and EMO-DB datasets by extracting 13-dimensional MFCC vectors, energy coefficients, and delta features. LSTM models, which can capture temporal dependencies, are trained to classify utterances into six emotion categories: happiness, sadness, anger, fear, disgust, and neutral. We compare our LSTM model with traditional classifiers such as SVMs and random forests, observing a 7–10% improvement in accuracy across datasets. On RAVDESS, our best model achieves 81.4% accuracy, outperforming CNN- and GRU-based baselines. We conduct ablation studies on input window size and recurrent layer depth to analyze their influence on performance.
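The delta features mentioned in the abstract can be illustrated with a short sketch. The abstract does not specify the exact implementation, so the code below is a minimal, pure-Python version of the standard regression formula for first-order deltas over a sequence of MFCC frames; the frame values and the window size `N=2` are illustrative assumptions, and a real pipeline would first extract 13-dimensional MFCCs from audio (e.g. with a library such as librosa).

```python
# Sketch of first-order delta-feature computation over MFCC frames.
# Hypothetical example values; not the authors' actual pipeline.

def delta(frames, N=2):
    """Delta features via the standard regression formula:
    d_t = sum_{n=1..N} n * (c[t+n] - c[t-n]) / (2 * sum_{n=1..N} n^2).
    Indices past the sequence edges are clamped to the first/last frame."""
    T = len(frames)          # number of time frames
    D = len(frames[0])       # feature dimension (13 for MFCCs in the paper)
    denom = 2 * sum(n * n for n in range(1, N + 1))
    out = []
    for t in range(T):
        row = []
        for d in range(D):
            num = 0.0
            for n in range(1, N + 1):
                c_next = frames[min(t + n, T - 1)][d]
                c_prev = frames[max(t - n, 0)][d]
                num += n * (c_next - c_prev)
            row.append(num / denom)
        out.append(row)
    return out

# Toy example: 4 frames of 2-dimensional "MFCCs"; the first coefficient
# rises linearly, so its deltas are positive; the second is constant,
# so its deltas are zero.
mfcc = [[0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [3.0, 1.0]]
deltas = delta(mfcc, N=2)
```

In practice the delta (and delta-delta) vectors are concatenated with the static MFCCs frame by frame, giving the time-ordered feature sequence that the LSTM consumes.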