🚀 GPT-2 Pseudo-Code to Python Code Generator
Transform natural language descriptions into executable Python code using fine-tuned GPT-2!
This model is trained on the SPoC (Search-based Pseudocode to Code) dataset and generates Python code from pseudo-code descriptions.
*App interface: a model status panel; a pseudo-code input box with a "Load Example" dropdown; generation parameter sliders (max length 50-500, temperature 0.1-1.5, top-k 10-100, top-p 0.5-1, number of sequences 1-5); and an output panel for the generated Python code.*
📖 How to Use
1️⃣ Load Your Model
- Upload the `best_model.pkl` file (trained GPT-2 model)
- Click "Load Model" and wait for confirmation
- You'll see the model configuration and training metrics (a loading sketch follows below)
2️⃣ Generate Code
- Quick Start: Select an example from the dropdown
- Custom Input: Type your own pseudo-code description
- Optional: Add reference code to calculate BLEU scores
- Adjust generation parameters for different outputs
- Click "Generate Code"
3️⃣ Understand the Metrics
🎯 BLEU Score (Bilingual Evaluation Understudy)
- Measures n-gram overlap between generated and reference code (see the sketch below)
- BLEU-1: Word-level similarity (unigrams)
- BLEU-2: 2-word phrase similarity (bigrams)
- BLEU-3: 3-word phrase similarity (trigrams)
- BLEU-4: 4-word phrase similarity (most comprehensive)
Score Interpretation:
- 🟢 > 0.4: Excellent match - Generated code is very similar to reference
- 🟡 0.3-0.4: Good match - Code captures most key elements
- 🟠 0.2-0.3: Fair match - Some similarity exists
- 🔴 < 0.2: Poor match - Significant differences
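Scores like these can be computed with NLTK's `sentence_bleu`; whether the app uses NLTK or its own implementation is an assumption. A minimal sketch:

```python
from nltk.translate.bleu_score import SmoothingFunction, sentence_bleu

reference = "def add(a, b): return a + b".split()
candidate = "def add(x, y): return x + y".split()
smooth = SmoothingFunction().method1  # avoids zero scores on short snippets

# BLEU-1 uses only unigrams; BLEU-4 averages 1- to 4-gram precisions.
bleu1 = sentence_bleu([reference], candidate, weights=(1, 0, 0, 0),
                      smoothing_function=smooth)
bleu4 = sentence_bleu([reference], candidate, weights=(0.25, 0.25, 0.25, 0.25),
                      smoothing_function=smooth)
print(f"BLEU-1: {bleu1:.3f}, BLEU-4: {bleu4:.3f}")
```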
📊 Additional Metrics
- Precision: fraction of generated tokens that appear in the reference
- Recall: fraction of reference tokens that appear in the generated code
- F1-Score: harmonic mean of precision and recall
- Length Ratio: length of the generated code relative to the reference
- Character Overlap: character-level similarity (see the sketch below)
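A plausible implementation of these token- and character-level metrics is sketched below; the app's exact tokenization and overlap definitions are assumptions:

```python
def overlap_metrics(generated: str, reference: str) -> dict:
    """Token-level precision/recall/F1 plus length and character-overlap ratios."""
    gen_tokens, ref_tokens = set(generated.split()), set(reference.split())
    common = gen_tokens & ref_tokens
    precision = len(common) / len(gen_tokens) if gen_tokens else 0.0
    recall = len(common) / len(ref_tokens) if ref_tokens else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    char_union = set(generated) | set(reference)
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "length_ratio": len(generated) / len(reference) if reference else 0.0,
        "char_overlap": (len(set(generated) & set(reference)) / len(char_union)
                         if char_union else 0.0),
    }

print(overlap_metrics("def add(x, y): return x + y",
                      "def add(a, b): return a + b"))
```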
🎛️ Generation Parameters (see the sketch after the table)
| Parameter | Low Value | High Value | Use Case |
|---|---|---|---|
| Temperature | 0.1-0.3 | 0.8-1.2 | Low: deterministic, focused; High: creative, diverse |
| Top-K | 10-30 | 60-100 | Low: conservative choices; High: more variety |
| Top-P | 0.5-0.8 | 0.9-1.0 | Low: safe predictions; High: exploratory |
| Max Length | 50-100 | 200-500 | Short: simple code; Long: complex implementations |
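To see these trade-offs in practice, the sketch below (continuing the generation example above) contrasts a conservative sampling profile with an exploratory one; the specific values are illustrative, not the app's defaults:

```python
profiles = {
    # Low temperature, narrow sampling: focused, repeatable output
    "conservative": dict(temperature=0.2, top_k=20, top_p=0.7),
    # High temperature, broad sampling: diverse, exploratory output
    "creative": dict(temperature=1.0, top_k=80, top_p=0.95),
}

for name, params in profiles.items():
    out = model.generate(**inputs, do_sample=True, max_length=150,
                         pad_token_id=tokenizer.eos_token_id, **params)
    print(f"--- {name} ---")
    print(tokenizer.decode(out[0], skip_special_tokens=True))
```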
💡 Example Pseudo-Code Prompts
Basic Operations
- create a list of numbers from 1 to 10
- define a function to calculate the sum of two numbers
- iterate through a list and print each element
Conditionals & Logic
- check if a number is even or odd
- find the maximum of three numbers
- validate if a string is empty
Data Structures
- sort a list in descending order
- remove duplicates from a list
- merge two dictionaries
Algorithms
- implement binary search algorithm
- create a recursive function to calculate factorial
- generate fibonacci sequence up to n terms
- check if a string is a palindrome
Advanced
- create a class to represent a student with name and grades
- implement a function to read a CSV file and return a dataframe
- create a decorator to measure function execution time
📚 About the Model
This model is fine-tuned on the SPoC (Search-based Pseudocode to Code) dataset:
- 📄 Paper: SPoC: Search-based Pseudocode to Code
- 🏛️ Source: Stanford University
- 🤖 Base Model: GPT-2 (Decoder-Only Transformer)
- 📊 Training: 10,000+ pseudo-code to code pairs
- 🎯 Task: Causal Language Modeling
⚠️ Limitations
- The model may not handle very complex algorithms reliably
- Generated code should be tested before production use
- Best results come from clear, specific pseudo-code descriptions
- The SPoC dataset contains C++ code; the model was adapted for Python generation
🤔 Tips for Best Results
- ✅ Be Specific: "create a function to sort list in ascending order" vs "sort list"
- ✅ Use Action Words: "create", "define", "implement", "calculate"
- ✅ Mention Data Types: "list", "string", "dictionary", "integer"
- ✅ Include Details: "recursive function" vs just "function"
- ✅ Try Variations: Generate multiple times with different temperatures
🌟 Features
- ✅ Upload and use custom trained models
- ✅ BLEU score calculation for quality assessment
- ✅ Multiple evaluation metrics (Precision, Recall, F1)
- ✅ Generate multiple code variations
- ✅ Real-time performance tracking
- ✅ Example prompts library
- ✅ Generation history
📝 Citation
If you use this model, please cite:
@article{kulal2019spoc,
  title={SPoC: Search-based Pseudocode to Code},
  author={Kulal, Sumith and Pasupat, Panupong and Chandra, Kartik and Lee, Mina and Padon, Oded and Aiken, Alex and Liang, Percy},
  journal={arXiv preprint arXiv:1906.04908},
  year={2019}
}
Built with ❤️ using HuggingFace Transformers & Gradio