Bangla NLP Toolkit

BanglaNLPToolkit is a package that provides classic text preprocessing and augmentation utilities for Bangla NLP tasks.

keywords: NLP, Deep Learning, PyPI


Brief Description

The toolkit covers four areas of Bangla text processing: normalization, punctuation restoration, augmentation, and tokenization.

Key features:

  • Bangla Text Normalization
    • Unicode normalization of Bangla text for preprocessing, using bnunicodenormalizer and csebuetnlp/normalizer.
    • Removal of punctuation, or replacement of punctuation with a symbol of the user's choice.
  • Bangla Punctuation
    • Adds punctuation to unpunctuated Bangla text, using deep-learning-based Named Entity Recognition models to predict where punctuation marks belong.
  • Bangla Text Augmentation
    • Generates similar but distinct variants of a text for augmenting Bangla datasets.
    • Uses paraphrasing, cross translation, and masked word prediction to generate the augmented text.
  • Simple Bangla Tokenizer
    • A simple, robust word-level and sentence-level tokenizer for Bangla text.
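
To make the preprocessing and tokenization features above more concrete, here is a minimal sketch using only the Python standard library. It is not the toolkit's API: the function names, the `PUNCTUATION` set, and the splitting rules are all hypothetical, and plain NFC normalization stands in for the more thorough rules applied by bnunicodenormalizer and csebuetnlp/normalizer. It assumes Python 3.7 or later.

```python
import re
import unicodedata
from typing import List

# Assumption: a small illustrative punctuation set that includes the
# Bangla danda (।). This is not the toolkit's own list.
PUNCTUATION = "।,;:!?\"'()[]{}-"
_PUNCT_CLASS = "[" + re.escape(PUNCTUATION) + "]"


def normalize_unicode(text: str) -> str:
    """Apply standard NFC normalization (a stand-in for dedicated normalizers)."""
    return unicodedata.normalize("NFC", text)


def replace_punctuation(text: str, replacement: str = "") -> str:
    """Replace every punctuation mark with `replacement`; "" removes it."""
    # A callable replacement keeps backslashes in `replacement` literal.
    replaced = re.sub(_PUNCT_CLASS, lambda _match: replacement, text)
    # Collapse any whitespace runs left behind by the substitution.
    return re.sub(r"\s+", " ", replaced).strip()


def sentence_tokenize(text: str) -> List[str]:
    """Split the text after each sentence-ending mark (।, ? or !)."""
    parts = re.split(r"(?<=[।?!])\s*", text)
    return [part.strip() for part in parts if part.strip()]


def word_tokenize(text: str) -> List[str]:
    """Return words and punctuation marks as separate tokens."""
    word = "[^\\s" + re.escape(PUNCTUATION) + "]+"
    return re.findall(_PUNCT_CLASS + "|" + word, text)


if __name__ == "__main__":
    sample = normalize_unicode("আমি ভাত খাই। তুমি কি খাও?")
    print(replace_punctuation(sample))
    print(sentence_tokenize(sample))
    print(word_tokenize(sample))
```

Keeping punctuation marks as separate tokens in `word_tokenize` is a design choice made for this sketch; a pipeline that has already called `replace_punctuation` could simply split on whitespace instead.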

Project Link: BanglaNLPToolkit