Luca Engel

Optimizing LLMs for Education: GPT-2 with Fine-Tuning, DPO, RAG & Quantization

Apr 20, 2024

Cover image generated with OpenAI's GPT-5 model.

TL;DR

We turned GPT-2 into a compact AI tutor for scientific multiple-choice questions by combining supervised fine-tuning, Direct Preference Optimization (DPO), Retrieval-Augmented Generation (RAG), and GPTQ quantization. Fine-tuning plus RAG gave the best SciQ accuracy (0.334 vs. 0.291 for baseline GPT-2), while RAG added about 16% to per-token latency.

Problem

Students often lack immediate personalized help outside class. Traditional tutoring and office hours do not scale to diverse needs. Can a compact, optimized LLM act as a reliable AI tutor, answering scientific multiple-choice questions with accuracy, efficiency, and explainability?

Solution overview

We optimized GPT-2 to answer scientific questions by combining:

  1. Fine-tuning on domain-specific datasets.
  2. Direct Preference Optimization (DPO) to align responses with user preferences.
  3. Retrieval-Augmented Generation (RAG) to incorporate external context.
  4. Quantization (GPTQ) for efficient deployment.

Architecture

The system extends GPT-2 with four modular enhancements: supervised fine-tuning on domain data, DPO-based preference alignment, a retrieval layer for RAG, and GPTQ quantization for deployment.
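
The implementation itself is not reproduced here; the sketch below only illustrates, under stated assumptions, how the four modules could be wired together at inference time. Every name in it (answer_mcq, retrieve, score_choice, the checkpoint path) is a placeholder, and the stub retriever is fleshed out in the RAG section.

```python
# Illustrative wiring of the four modules; names, prompts, and checkpoints
# are placeholders, not the project's actual code.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
# Swap "gpt2" for the fine-tuned / DPO-aligned / GPTQ-quantized checkpoint.
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def retrieve(question: str, k: int = 3) -> list[str]:
    """Placeholder RAG retriever; a dense-retrieval version is sketched in the RAG section."""
    return []

@torch.no_grad()
def score_choice(prompt: str, choice: str) -> float:
    """Log-likelihood of a candidate answer appended to the prompt."""
    ids = tokenizer(prompt + " " + choice, return_tensors="pt").input_ids
    mean_nll = model(ids, labels=ids).loss        # mean NLL over shifted tokens
    return -mean_nll.item() * (ids.shape[1] - 1)  # total log-likelihood; higher = better

def answer_mcq(question: str, choices: list[str]) -> str:
    """Retrieve context, build a prompt, and pick the most likely choice."""
    context = "\n".join(retrieve(question))
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    return max(choices, key=lambda c: score_choice(prompt, c))
```

Scoring each option by its likelihood under the model sidesteps free-form generation, which also simplifies the single-choice postprocessing described in the evaluation protocol below.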

Data

Method

1. Fine-Tuning
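
The post names fine-tuning on domain-specific datasets but not the exact recipe. Below is a minimal sketch of one plausible setup, assuming SciQ as the training corpus and the Hugging Face Trainer; the hyperparameters, prompt template, and output path gpt2-sciq-sft are all illustrative choices, not the project's.

```python
# A plausible supervised fine-tuning setup (not the project's exact recipe):
# causal-LM training of GPT-2 on SciQ support/question/answer text.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token          # GPT-2 ships without a pad token
model = AutoModelForCausalLM.from_pretrained("gpt2")

def to_text(ex):
    # Fold the support passage, question, and answer into one training string.
    return {"text": f"{ex['support']}\nQuestion: {ex['question']}\nAnswer: {ex['correct_answer']}"}

def tokenize(ex):
    return tokenizer(ex["text"], truncation=True, max_length=512)

raw = load_dataset("allenai/sciq", split="train").map(to_text)
train_set = raw.map(tokenize, remove_columns=raw.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments("gpt2-sciq-sft",        # hypothetical output path
                           per_device_train_batch_size=8,
                           num_train_epochs=3,
                           learning_rate=5e-5,
                           fp16=True),
    train_dataset=train_set,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```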

2. Direct Preference Optimization (DPO)
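
DPO aligns the fine-tuned model with pairs of preferred and rejected answers, without training a separate reward model. The training code is not shown in the post; the sketch below captures the core DPO loss (Rafailov et al., 2023) given summed completion log-probabilities under the policy and a frozen reference model, with beta = 0.1 as an illustrative value.

```python
# Sketch of the DPO objective (Rafailov et al., 2023). The inputs are summed
# log-probabilities of whole completions; beta = 0.1 is an illustrative value.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,    # log pi(y_chosen | x), shape (batch,)
             policy_rejected_logps: torch.Tensor,  # log pi(y_rejected | x)
             ref_chosen_logps: torch.Tensor,       # same quantities under the frozen reference model
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Push chosen completions up and rejected ones down, relative to the reference model."""
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```

In practice a library such as TRL's DPOTrainer implements this objective given (prompt, chosen, rejected) preference triples.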

3. Retrieval-Augmented Generation (RAG)
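
RAG prepends retrieved passages to the prompt so GPT-2 can condition on external context. The post does not specify the retriever or corpus; the sketch below assumes a dense retriever built from SciQ support passages with sentence-transformers embeddings and a FAISS inner-product index.

```python
# Illustrative dense retriever over SciQ support passages; the embedding model,
# index type, and corpus choice are assumptions about the setup.
import faiss
import numpy as np
from datasets import load_dataset
from sentence_transformers import SentenceTransformer

passages = sorted({p for p in load_dataset("allenai/sciq", split="train")["support"] if p})
encoder = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = encoder.encode(passages, normalize_embeddings=True, convert_to_numpy=True)

index = faiss.IndexFlatIP(embeddings.shape[1])     # inner product == cosine on normalized vectors
index.add(embeddings.astype(np.float32))

def retrieve(question: str, k: int = 3) -> list[str]:
    """Return the k most similar support passages for a question."""
    q = encoder.encode([question], normalize_embeddings=True, convert_to_numpy=True)
    _, idx = index.search(q.astype(np.float32), k)
    return [passages[i] for i in idx[0]]

# The retrieved passages are prepended to the prompt, e.g.
# "Context:\n<passages>\n\nQuestion: <question>\nAnswer:".
```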

4. Quantization (GPTQ)
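
GPTQ is post-training weight quantization, which shrinks the fine-tuned model for cheaper inference. The bit width and calibration data are not stated in the post; the sketch below uses the transformers GPTQConfig integration (which requires the optimum and auto-gptq packages) with 4-bit weights and the C4 calibration set as assumptions, applied to the hypothetical checkpoint from the fine-tuning sketch.

```python
# Minimal GPTQ post-training quantization via the transformers/optimum integration
# (requires the optimum and auto-gptq packages). The 4-bit width, group size, and
# C4 calibration set are assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

model_id = "gpt2-sciq-sft"                 # hypothetical fine-tuned checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)

gptq_config = GPTQConfig(bits=4, group_size=128, dataset="c4", tokenizer=tokenizer)
quantized = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=gptq_config,       # quantizes the weights at load time
    device_map="auto",
)
quantized.save_pretrained("gpt2-sciq-sft-gptq")
```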

Experiments & Results

Benchmarks

| Model Variant | Accuracy (SciQ MCQA) |
| --- | --- |
| Baseline GPT-2 | 0.291 |
| Baseline + RAG | 0.319 |
| Fine-tuned | 0.227 |
| Fine-tuned + RAG | 0.334 |
| Fine-tuned + DPO | 0.319 |
| Fine-tuned + DPO + RAG | 0.311 |
| Fine-tuned + Quantized | 0.227 |
| Fine-tuned + Quantized + RAG | 0.329 |

Evaluation protocol. Accuracy was compared on the SciQ test split (n = 1,000). A postprocessing step normalized each model's output to a single answer choice.
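
For reference, a minimal version of such an evaluation loop is sketched below, reusing the hypothetical answer_mcq helper from the Architecture sketch; shuffling the options guards against position bias.

```python
# Hypothetical accuracy loop over the SciQ test split (n = 1,000), reusing
# the answer_mcq() helper from the Architecture sketch above.
import random
from datasets import load_dataset

sciq_test = load_dataset("allenai/sciq", split="test")

correct = 0
for ex in sciq_test:
    choices = [ex["correct_answer"], ex["distractor1"],
               ex["distractor2"], ex["distractor3"]]
    random.shuffle(choices)                        # guard against position bias
    prediction = answer_mcq(ex["question"], choices)
    correct += int(prediction == ex["correct_answer"])

print(f"SciQ MCQA accuracy: {correct / len(sciq_test):.3f}")
```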

Speed Tradeoff

The following table summarizes inference time per generated token (ms/token) on a Google Colab T4 GPU:

| Model | Mean (ms/token) | Std | Δ latency vs. baseline |
| --- | --- | --- | --- |
| GPT-2 Baseline | 12.41 | 1.65 | |
| Fine-tuned | 12.68 | 1.77 | +2% |
| Fine-tuned + RAG | 14.53 | 1.79 | +16% |
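
The post does not describe the timing procedure; a simple way to produce ms/token figures like these is to time greedy generation of a fixed number of new tokens, as in the sketch below (the prompt, token count, and unquantized checkpoint are arbitrary choices).

```python
# One way to obtain ms/token numbers like those above: time greedy generation
# of a fixed number of new tokens on the GPU. Prompt and token count are arbitrary.
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").to("cuda").eval()

prompt = "Question: What planet is known as the red planet?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

@torch.no_grad()
def ms_per_token(new_tokens: int = 128) -> float:
    torch.cuda.synchronize()
    start = time.perf_counter()
    model.generate(**inputs,
                   max_new_tokens=new_tokens,
                   min_new_tokens=new_tokens,      # force a fixed-length generation
                   do_sample=False,
                   pad_token_id=tokenizer.eos_token_id)
    torch.cuda.synchronize()
    return (time.perf_counter() - start) * 1000 / new_tokens

print(f"{ms_per_token():.2f} ms/token")
```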

Impact

What I learned

Future Work

References