SpeechHub

Audio-centric business management platform

Keywords: Automatic Speech Recognition (ASR), Summarizer, Keyword Extraction, Promotional Content Extraction, LLM


Brief Description

SpeechHub is a productivity tool powered by Automatic Speech Recognition (ASR) technology, supporting both Bengali and English. It allows users to upload any audio conversation, whether recorded or live, and transcribes it into a conversational format. It also summarizes the conversation and detects keywords and speakers.

This application efficiently summarizes business meetings, identifies speakers, and organizes dialogues into a clear conversational format. It also pinpoints important business keywords, making information easy to find.

Key features of SpeechHub:

  • Fast audio transcription in Bengali and English (approximately 50 s for 1 hour of audio, roughly 72x faster than real time)
  • Speaker Diarization and dialogue-style conversation generation from audio data
  • Summarization of meetings and conversations
  • Detection of mentioned keywords with frequency counts
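
The keyword detection and frequency count feature can be illustrated with a minimal sketch. Since the project code is not public, this is not SpeechHub's implementation: the function name `count_keywords`, the sample transcript, and the keyword list are all hypothetical, and the matching is plain case-insensitive whole-word search over an already transcribed English text. Bengali text would likely need script-aware tokenization instead of the simple boundary check used here.

```python
import re
from typing import Dict, Iterable


def count_keywords(transcript: str, keywords: Iterable[str]) -> Dict[str, int]:
    """Count case-insensitive whole-word occurrences of each keyword.

    Hypothetical helper for illustration only; not taken from SpeechHub.
    """
    counts: Dict[str, int] = {}
    for keyword in keywords:
        # Require that the match is not surrounded by word characters,
        # so "launch" does not match inside "launched".
        pattern = r"(?<!\w)" + re.escape(keyword) + r"(?!\w)"
        counts[keyword] = len(re.findall(pattern, transcript, flags=re.IGNORECASE))
    return counts


if __name__ == "__main__":
    # Made-up transcript, standing in for ASR output.
    sample = (
        "The budget review moved the launch. "
        "Budget approval is pending; launch date TBD."
    )
    for word, n in count_keywords(sample, ["budget", "launch", "invoice"]).items():
        print(f"{word}: {n}")
```

Keywords that never occur are kept with a count of zero, which makes it easy to show a fixed keyword list alongside each meeting.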

Key Technologies used in SpeechHub:

  • Whisper-based Automatic Speech Recognition (ASR) model for both Bengali and English audio transcription
  • Integration with PyAnnote-based speaker diarization for dialogue-style conversation generation
  • BERT-based dialogue summarization pipeline
  • Integration of a FastAPI-based DevOps system and an SQL-based database system for seamless usage
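
One plausible way to combine ASR output with speaker diarization into a dialogue-style transcript is to give each transcribed segment the speaker whose turn overlaps it the most, then merge consecutive segments from the same speaker. The sketch below shows that idea in plain Python. It is an assumption about how such a step could work, not SpeechHub's actual pipeline: the `Turn` and `Segment` types, the function names, and the sample timestamps are hypothetical, and no Whisper or PyAnnote calls are made.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Turn:
    """A diarization result: who spoke between start and end (seconds)."""
    speaker: str
    start: float
    end: float


@dataclass
class Segment:
    """An ASR result: what was said between start and end (seconds)."""
    text: str
    start: float
    end: float


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(
    segments: List[Segment], turns: List[Turn], unknown: str = "UNKNOWN"
) -> List[Tuple[str, str]]:
    """Label each segment with the speaker whose turn overlaps it most."""
    labelled = []
    for seg in segments:
        best_speaker, best_overlap = unknown, 0.0
        for turn in turns:
            shared = overlap(seg.start, seg.end, turn.start, turn.end)
            if shared > best_overlap:
                best_speaker, best_overlap = turn.speaker, shared
        labelled.append((best_speaker, seg.text))
    return labelled


def to_dialogue(labelled: List[Tuple[str, str]]) -> str:
    """Merge consecutive lines from the same speaker into one dialogue line."""
    merged: List[Tuple[str, str]] = []
    for speaker, text in labelled:
        if merged and merged[-1][0] == speaker:
            merged[-1] = (speaker, merged[-1][1] + " " + text)
        else:
            merged.append((speaker, text))
    return "\n".join(f"{speaker}: {text}" for speaker, text in merged)


if __name__ == "__main__":
    # Made-up timestamps standing in for diarization and ASR output.
    turns = [Turn("SPEAKER_00", 0.0, 4.0), Turn("SPEAKER_01", 4.0, 9.0)]
    segments = [
        Segment("Good morning everyone.", 0.2, 2.0),
        Segment("Let's start.", 2.1, 3.8),
        Segment("Sure, I have the numbers.", 4.1, 6.5),
    ]
    print(to_dialogue(assign_speakers(segments, turns)))
```

Segments that overlap no speaker turn are labelled `UNKNOWN` rather than dropped, so no transcribed text is lost when the two models disagree about timing.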

Language/Framework: Python 3.9, PyTorch



Figure: Simple illustration of the SpeechHub project

N.B.: The code for this project can’t be made public for proprietary reasons.

Collaborators:

1. A F M Mahfuzul Kabir
2. Sawradip Saha