Welcome to SPNLP 2023

International Conference on
Speech and NLP (SPNLP 2023)

May 13-14, 2023, Virtual Conference



Accepted Papers

Improved Speech Enhancement by Using Both Clean Speech and ‘clean’ Noise

Jianqiao Cui and Stefan Bleeck, Institute of Sound and Vibration Research, University of Southampton, Southampton, UK

ABSTRACT

Speech enhancement (SE) models based on supervised deep learning generally use input features from both noisy and clean speech, but not from the noise itself. We suggest here that the ‘clean’ background noise, i.e., the noise before it is mixed with speech, can also help SE; to our knowledge, this has not been described before. In our proposed model, not only the speech but also the noise is enhanced initially, and the two are later combined for improved intelligibility and quality. We also present a second innovation to better capture contextual information, which traditional networks often handle poorly. To leverage both speech and background-noise information as well as long-term context, this paper describes a sequence-to-sequence (S2S) mapping structure using a novel two-path speech enhancement system consisting of two parallel paths: a Noise Enhancement Path (NEP) and a Speech Enhancement Path (SEP). In the NEP, an encoder-decoder structure is used to enhance only the ‘clean’ noise, while the SEP is used to suppress the background noise in the noisy speech. In the SEP, a Hierarchical Attention (HA) mechanism is adopted to capture long-range dependencies. In the NEP, we use the gated control mechanism from Conv-TasNet [2], improved by adding dilated convolutions to enlarge the receptive field. Experiments are conducted on the LibriSpeech dataset, and the results show that the proposed model outperforms recent models on various measures, including ESTOI and PESQ scores. We conclude that the simple speech-plus-noise paradigm often adopted for training such models is not optimal.
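
To make the two-path idea concrete, here is a minimal, hypothetical PyTorch sketch of an architecture in the spirit the abstract describes. Everything in it is an illustrative assumption rather than the authors' published design: the layer sizes, the feature dimension, the fusion step, and the plain self-attention standing in for the paper's Hierarchical Attention mechanism.

```python
# Hypothetical sketch of a two-path SE model: a Noise Enhancement Path (NEP)
# with gated dilated convolutions and a Speech Enhancement Path (SEP) with
# self-attention, fused into one output. Sizes and fusion are assumptions.
import torch
import torch.nn as nn

class GatedDilatedBlock(nn.Module):
    """Gated 1-D conv block with dilation to widen the receptive field
    (Conv-TasNet-style gating, as the abstract describes for the NEP)."""
    def __init__(self, channels: int, dilation: int):
        super().__init__()
        pad = dilation  # keeps sequence length with kernel size 3
        self.filter = nn.Conv1d(channels, channels, 3, padding=pad, dilation=dilation)
        self.gate = nn.Conv1d(channels, channels, 3, padding=pad, dilation=dilation)

    def forward(self, x):
        return torch.tanh(self.filter(x)) * torch.sigmoid(self.gate(x))

class TwoPathSE(nn.Module):
    """Parallel NEP and SEP over magnitude-spectrogram-like features."""
    def __init__(self, feat_dim: int = 257, channels: int = 64, n_heads: int = 4):
        super().__init__()
        self.nep_in = nn.Conv1d(feat_dim, channels, 1)
        self.nep_blocks = nn.Sequential(
            *[GatedDilatedBlock(channels, d) for d in (1, 2, 4, 8)]
        )
        self.nep_out = nn.Conv1d(channels, feat_dim, 1)
        self.sep_in = nn.Conv1d(feat_dim, channels, 1)
        # Stand-in for the paper's Hierarchical Attention: plain self-attention.
        self.attn = nn.MultiheadAttention(channels, n_heads, batch_first=True)
        self.sep_out = nn.Conv1d(channels, feat_dim, 1)
        self.fuse = nn.Conv1d(2 * feat_dim, feat_dim, 1)

    def forward(self, noisy):  # noisy: (batch, feat_dim, frames)
        noise_est = self.nep_out(self.nep_blocks(self.nep_in(noisy)))
        h = self.sep_in(noisy).transpose(1, 2)  # (batch, frames, channels)
        h, _ = self.attn(h, h, h)
        speech_est = self.sep_out(h.transpose(1, 2))
        # Fuse the two path outputs into the final enhanced magnitude.
        return self.fuse(torch.cat([speech_est, noise_est], dim=1))

model = TwoPathSE()
enhanced = model(torch.randn(2, 257, 100))  # two utterances, 100 frames each
print(enhanced.shape)  # torch.Size([2, 257, 100])
```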

KEYWORDS

Supervised speech enhancement, separate paths, hierarchical attention mechanism, gated control, magnitude.



GPT-4: A Review on Advancements and Opportunities in Natural Language Processing

Jawid Ahmad Baktash, Mursal Dawodi, and ChatGPT

ABSTRACT

Generative Pre-trained Transformer 4 (GPT-4) is the fourth-generation language model in the GPT series, developed by OpenAI, and promises significant advancements in the field of natural language processing (NLP). In this research article, we discuss the features of GPT-4, its potential applications, and the challenges it might face. We also compare GPT-4 with its predecessor, GPT-3. GPT-4 has a larger model size (reportedly more than one trillion parameters), better multilingual capabilities, improved contextual understanding, and stronger reasoning capabilities than GPT-3. Potential applications of GPT-4 include chatbots, personal assistants, language translation, text summarization, and question answering. However, GPT-4 also poses several challenges and limitations, such as computational requirements, data requirements, and ethical concerns.

KEYWORDS

Large language models, unsupervised learning, GPT-4, GPT-3.