Speech Hub | Sabbir Hossain Ujjal

Keywords: Automatic Speech Recognition (ASR), Summarizer, Keyword Extraction, Promotional Content Extraction, LLM

Brief Description

SpeechHub is a productivity tool powered by Automatic Speech Recognition (ASR) technology, supporting both Bengali and English. Users can upload a recorded audio conversation or capture one live, and SpeechHub transcribes it into a conversational format. It also summarizes the conversation, detects keywords, and identifies speakers.

This application efficiently summarizes business meetings, identifies speakers, and organizes dialogues into a clear conversational format. It also pinpoints important business keywords, making information easy to find.

Key features of SpeechHub:

Low-latency audio transcription in Bengali and English (approximately 50 s to process 1 hour of audio, roughly 72 times faster than real time)
Speaker Diarization and dialogue-style conversation generation from audio data
Summarization of meetings and conversations
Detection of mentioned keywords, with a frequency count for each
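
The project code is not public, so the following is only a minimal sketch of how a keyword frequency count over a transcript could work. The function name count_keywords, the sample transcript, and the keyword list are all hypothetical. It assumes English text and case-insensitive whole-word matching; Bengali text would likely need a different tokenization step, because regex word boundaries do not handle Bengali vowel signs reliably.

```python
import re
from collections import Counter


def count_keywords(transcript: str, keywords: list[str]) -> dict[str, int]:
    """Count case-insensitive whole-word occurrences of each keyword.

    Hypothetical helper for illustration; not the SpeechHub implementation.
    Multi-word keywords (e.g. "action item") are matched as a phrase.
    """
    counts = Counter()
    for keyword in keywords:
        # \b anchors keep "budget" from matching inside "budgeting".
        pattern = r"\b" + re.escape(keyword) + r"\b"
        counts[keyword] = len(re.findall(pattern, transcript, flags=re.IGNORECASE))
    return dict(counts)


# Made-up transcript used only to exercise the function.
transcript = (
    "The budget review is due Friday. Budget owners should send the invoice; "
    "the second invoice follows the budget."
)
keyword_counts = count_keywords(transcript, ["budget", "invoice", "deadline"])
print(keyword_counts)
```

Keywords that never occur are kept with a count of 0, which makes it easy to show a complete table for a fixed keyword list.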

Key Technologies used in SpeechHub:

Whisper-based Automatic Speech Recognition (ASR) model for both Bengali and English audio transcription
Integration with PyAnnote-based speaker diarization for dialogue-style conversation generation
BERT-based dialogue summarization pipeline
Integration of a FastAPI-based DevOps system and an SQL-based database system for seamless usage
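
To illustrate how transcription and speaker diarization can be combined into a dialogue, here is a minimal sketch that assigns each transcribed segment to the speaker turn it overlaps most in time, then merges consecutive segments from the same speaker. This is an assumed approach, not the project's confirmed method. The Segment and Turn classes, the build_dialogue function, and the sample timestamps are hypothetical stand-ins for the outputs of the ASR and diarization models.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """A transcribed span of audio (hypothetical stand-in for ASR output)."""
    start: float
    end: float
    text: str


@dataclass
class Turn:
    """A speaker turn (hypothetical stand-in for diarization output)."""
    start: float
    end: float
    speaker: str


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def build_dialogue(segments: list[Segment], turns: list[Turn]) -> list[tuple[str, str]]:
    """Label each segment with its best-overlapping speaker and merge runs."""
    dialogue: list[tuple[str, str]] = []
    for seg in segments:
        best_speaker, best_overlap = "UNKNOWN", 0.0
        for turn in turns:
            shared = overlap(seg.start, seg.end, turn.start, turn.end)
            if shared > best_overlap:
                best_speaker, best_overlap = turn.speaker, shared
        text = seg.text.strip()
        if dialogue and dialogue[-1][0] == best_speaker:
            # Same speaker as the previous line: extend that line.
            dialogue[-1] = (best_speaker, dialogue[-1][1] + " " + text)
        else:
            dialogue.append((best_speaker, text))
    return dialogue


# Made-up timestamps and text used only to exercise the function.
segments = [
    Segment(0.0, 2.0, "Hello everyone."),
    Segment(2.1, 4.0, "Let's begin."),
    Segment(4.2, 6.0, "Sure, go ahead."),
]
turns = [Turn(0.0, 4.1, "SPEAKER_00"), Turn(4.1, 6.5, "SPEAKER_01")]
dialogue = build_dialogue(segments, turns)
for speaker, line in dialogue:
    print(f"{speaker}: {line}")
```

Segments that overlap no speaker turn are labeled UNKNOWN rather than being forced onto the nearest speaker, which keeps diarization gaps visible in the output.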

Language/Framework: Python 3.9, PyTorch

Simple illustration of the project

N.B.: The code for this project can’t be made public for proprietary reasons.

Collaborators:

1. A F M Mahfuzul Kabir
2. Sawradip Saha