🚀 GPT-2 Pseudo-Code to Python Code Generator
Transform natural language descriptions into executable Python code using fine-tuned GPT-2!
This model is trained on the SPoC (Search-based Pseudocode to Code) dataset and generates Python code from pseudo-code descriptions.
*App interface: a model status panel; a pseudo-code input box with a "Load Example" dropdown; generation parameter sliders (max length 50-500, temperature 0.1-1.5, top-k 10-100, top-p 0.5-1, number of sequences 1-5); and an output panel for the generated Python code.*
📖 How to Use
1️⃣ Load Your Model
- Upload the `best_model.pkl` file (trained GPT-2 model)
- Click "Load Model" and wait for confirmation
- You'll see the model configuration and training metrics (a loading sketch follows below)
2️⃣ Generate Code
- Quick Start: Select an example from the dropdown
- Custom Input: Type your own pseudo-code description
- Optional: Add reference code to calculate BLEU scores
- Adjust generation parameters for different outputs
- Click "Generate Code"
3️⃣ Understand the Metrics
🎯 BLEU Score (Bilingual Evaluation Understudy)
- Measures n-gram overlap between generated and reference code (see the sketch below)
- BLEU-1: Word-level similarity (unigrams)
- BLEU-2: 2-word phrase similarity (bigrams)
- BLEU-3: 3-word phrase similarity (trigrams)
- BLEU-4: 4-word phrase similarity (most comprehensive)
Score Interpretation:
- 🟢 > 0.4: Excellent match - Generated code is very similar to reference
- 🟡 0.3-0.4: Good match - Code captures most key elements
- 🟠 0.2-0.3: Fair match - Some similarity exists
- 🔴 < 0.2: Poor match - Significant differences
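Scores like these can be computed with NLTK's `sentence_bleu`; whether the app uses NLTK or its own implementation is an assumption. A minimal sketch:

```python
from nltk.translate.bleu_score import SmoothingFunction, sentence_bleu

reference = "def add(a, b): return a + b".split()
candidate = "def add(x, y): return x + y".split()
smooth = SmoothingFunction().method1  # avoids zero scores on short snippets

# BLEU-1 uses only unigrams; BLEU-4 averages 1- to 4-gram precisions.
bleu1 = sentence_bleu([reference], candidate, weights=(1, 0, 0, 0),
                      smoothing_function=smooth)
bleu4 = sentence_bleu([reference], candidate, weights=(0.25, 0.25, 0.25, 0.25),
                      smoothing_function=smooth)
print(f"BLEU-1: {bleu1:.3f}, BLEU-4: {bleu4:.3f}")
```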
📊 Additional Metrics
- Precision: fraction of generated tokens that appear in the reference
- Recall: fraction of reference tokens that appear in the generated code
- F1-Score: harmonic mean of precision and recall
- Length Ratio: length of the generated code relative to the reference
- Character Overlap: character-level similarity (see the sketch below)
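A plausible implementation of these token- and character-level metrics is sketched below; the app's exact tokenization and overlap definitions are assumptions:

```python
def overlap_metrics(generated: str, reference: str) -> dict:
    """Token-level precision/recall/F1 plus length and character-overlap ratios."""
    gen_tokens, ref_tokens = set(generated.split()), set(reference.split())
    common = gen_tokens & ref_tokens
    precision = len(common) / len(gen_tokens) if gen_tokens else 0.0
    recall = len(common) / len(ref_tokens) if ref_tokens else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    char_union = set(generated) | set(reference)
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "length_ratio": len(generated) / len(reference) if reference else 0.0,
        "char_overlap": (len(set(generated) & set(reference)) / len(char_union)
                         if char_union else 0.0),
    }

print(overlap_metrics("def add(x, y): return x + y",
                      "def add(a, b): return a + b"))
```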
🎛️ Generation Parameters (see the sketch after the table)
| Parameter | Low Value | High Value | Use Case |
|---|---|---|---|
| Temperature | 0.1-0.3 | 0.8-1.2 | Low: deterministic, focused; High: creative, diverse |
| Top-K | 10-30 | 60-100 | Low: conservative choices; High: more variety |
| Top-P | 0.5-0.8 | 0.9-1.0 | Low: safe predictions; High: exploratory |
| Max Length | 50-100 | 200-500 | Short: simple code; Long: complex implementations |
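To see these trade-offs in practice, the sketch below (continuing the generation example above) contrasts a conservative sampling profile with an exploratory one; the specific values are illustrative, not the app's defaults:

```python
profiles = {
    # Low temperature, narrow sampling: focused, repeatable output
    "conservative": dict(temperature=0.2, top_k=20, top_p=0.7),
    # High temperature, broad sampling: diverse, exploratory output
    "creative": dict(temperature=1.0, top_k=80, top_p=0.95),
}

for name, params in profiles.items():
    out = model.generate(**inputs, do_sample=True, max_length=150,
                         pad_token_id=tokenizer.eos_token_id, **params)
    print(f"--- {name} ---")
    print(tokenizer.decode(out[0], skip_special_tokens=True))
```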
💡 Example Pseudo-Code Prompts
Basic Operations
- create a list of numbers from 1 to 10
- define a function to calculate the sum of two numbers
- iterate through a list and print each element
Conditionals & Logic
- check if a number is even or odd
- find the maximum of three numbers
- validate if a string is empty
Data Structures
- sort a list in descending order
- remove duplicates from a list
- merge two dictionaries
Algorithms
- implement binary search algorithm
- create a recursive function to calculate factorial
- generate fibonacci sequence up to n terms
- check if a string is a palindrome
Advanced
- create a class to represent a student with name and grades
- implement a function to read a CSV file and return a dataframe
- create a decorator to measure function execution time
📚 About the Model
This model is fine-tuned on the SPoC (Search-based Pseudocode to Code) dataset:
- 📄 Paper: SPoC: Search-based Pseudocode to Code
- 🏛️ Source: Stanford University
- 🤖 Base Model: GPT-2 (Decoder-Only Transformer)
- 📊 Training: 10,000+ pseudo-code to code pairs
- 🎯 Task: Causal Language Modeling
⚠️ Limitations
- The model may not handle very complex algorithms reliably
- Generated code should be tested before production use
- Best results come from clear, specific pseudo-code descriptions
- The SPoC dataset contains C++ code; the model was adapted for Python generation
🤔 Tips for Best Results
- ✅ Be Specific: "create a function to sort list in ascending order" vs "sort list"
- ✅ Use Action Words: "create", "define", "implement", "calculate"
- ✅ Mention Data Types: "list", "string", "dictionary", "integer"
- ✅ Include Details: "recursive function" vs just "function"
- ✅ Try Variations: Generate multiple times with different temperatures
🌟 Features
- ✅ Upload and use custom trained models
- ✅ BLEU score calculation for quality assessment
- ✅ Multiple evaluation metrics (Precision, Recall, F1)
- ✅ Generate multiple code variations
- ✅ Real-time performance tracking
- ✅ Example prompts library
- ✅ Generation history
📝 Citation
If you use this model, please cite:
@article{kulal2019spoc,
  title={SPoC: Search-based Pseudocode to Code},
  author={Kulal, Sumith and Pasupat, Panupong and Chandra, Kartik and Lee, Mina and Padon, Oded and Aiken, Alex and Liang, Percy},
  journal={arXiv preprint arXiv:1906.04908},
  year={2019}
}
Built with ❤️ using HuggingFace Transformers & Gradio