AI Generated Text Detection
An AI-versus-human text differentiation system that fuses feature engineering with transformer-based NLP techniques
keywords: Deep Learning, NLP, AI Text Detection, Classification, Feature Engineering, Machine Learning, DeBERTa, BERT
Brief Description
In recent years, large language models (LLMs) have become increasingly sophisticated and can now generate text that is difficult to distinguish from human writing. Students can use LLMs to produce essays that are not their own work, skipping key learning milestones, and this is forcing significant changes in the education system.
In this project I developed a deep-learning-based model that detects whether an essay was written by a student or by an LLM, which can help evaluators take appropriate action.
Dataset
For this task we needed two types of data:
- Human-written text:
  - PERSUADE corpus 2.0
    - This dataset comprises over 25,000 argumentative essays written by 6th-12th grade students in the United States on 15 topics.
- AI-generated text:
  - We generated essays on the same topics as the human-written text using several available LLMs (ChatGPT (GPT-3.5), Llama 2, Mistral, Gemini).
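A minimal sketch of how the two sources could be combined into one labeled dataset for binary classification. The function name `build_labeled_dataset`, the label convention (0 = human, 1 = AI), and the fixed shuffle seed are assumptions for illustration; the project does not state how the data was actually assembled.

```python
import random


def build_labeled_dataset(human_essays, ai_essays, seed=42):
    """Merge human and AI essays into one shuffled list of labeled rows.

    Label convention (an assumption): 0 = human-written, 1 = AI-generated.
    """
    rows = [{"text": text, "label": 0} for text in human_essays]
    rows += [{"text": text, "label": 1} for text in ai_essays]
    # Shuffle with a seeded generator so the row order is reproducible.
    random.Random(seed).shuffle(rows)
    return rows
```

Keeping both classes in a single shuffled list avoids feeding the model long runs of one class during training.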
Modeling Approach
- The task is binary classification.
- We used two types of modeling approaches:
  - Feature-based ML model
  - Deep-learning-based model
ML Modeling:
For the conventional ML model we extracted features from the dataset at several levels:
- Paragraph level features
- Sentence level features
- Word level features
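The three feature levels above can be sketched with the standard library alone. The specific features below (counts, mean lengths, type-token ratio) and the function name `extract_features` are illustrative assumptions; the project does not list which features were actually used. Sentence splitting here is a simple punctuation-based heuristic, not a full tokenizer.

```python
import re


def _safe_div(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def extract_features(essay: str) -> dict:
    """Compute a few paragraph-, sentence-, and word-level features."""
    # Paragraphs are separated by one or more blank lines.
    paragraphs = [p for p in re.split(r"\n\s*\n", essay) if p.strip()]
    # Sentences end at '.', '!' or '?' followed by whitespace (heuristic).
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", essay.strip()) if s]
    # Words are runs of letters and apostrophes.
    words = re.findall(r"[A-Za-z']+", essay)
    unique_words = {w.lower() for w in words}

    return {
        # Paragraph level
        "paragraph_count": len(paragraphs),
        "mean_sentences_per_paragraph": _safe_div(len(sentences), len(paragraphs)),
        # Sentence level
        "sentence_count": len(sentences),
        "mean_words_per_sentence": _safe_div(len(words), len(sentences)),
        # Word level
        "word_count": len(words),
        "mean_word_length": _safe_div(sum(len(w) for w in words), len(words)),
        "type_token_ratio": _safe_div(len(unique_words), len(words)),
    }
```

A dictionary per essay like this can be stacked into a feature matrix and passed to any conventional classifier.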
DL Modeling:
For the deep learning approach we leveraged several transformer-based models:
- BERT-base-cased
- BERT-small
- DeBERTa-v3-small
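As a rough sketch, a transformer from the list above could be set up for binary classification with the Hugging Face `transformers` library. This assumes `transformers` and `torch` are installed; the Hub identifiers, the label names, and the helper functions `load_classifier` and `predict_ai_probability` are assumptions for illustration, not the project's confirmed setup. The source does not say which BERT-small checkpoint was used, so it is left out here.

```python
# Label names are an assumption; the project only states the task is binary.
ID2LABEL = {0: "human", 1: "ai"}
LABEL2ID = {name: idx for idx, name in ID2LABEL.items()}

# Hugging Face Hub identifiers assumed for two of the listed models.
CHECKPOINTS = {
    "bert-base-cased": "bert-base-cased",
    "deberta-v3-small": "microsoft/deberta-v3-small",
}


def load_classifier(checkpoint: str):
    """Load a tokenizer and a model with a two-class classification head."""
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSequenceClassification.from_pretrained(
        checkpoint,
        num_labels=len(ID2LABEL),
        id2label=ID2LABEL,
        label2id=LABEL2ID,
    )
    return tokenizer, model


def predict_ai_probability(texts, tokenizer, model, max_length=512):
    """Return the predicted probability of the 'ai' class for each text."""
    import torch

    # Truncate long essays to the model's input limit and pad the batch.
    batch = tokenizer(
        texts,
        truncation=True,
        padding=True,
        max_length=max_length,
        return_tensors="pt",
    )
    model.eval()
    with torch.no_grad():
        logits = model(**batch).logits
    return torch.softmax(logits, dim=-1)[:, LABEL2ID["ai"]].tolist()
```

The classification head is randomly initialized when loaded this way, so the probabilities are only meaningful after fine-tuning on the labeled essays.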
Simple illustration of the project