1st Demo GPT Based Architecture Model

Model Description

This is a GPT-style transformer language model trained from scratch on Lewis Carroll's "Alice's Adventures in Wonderland". It demonstrates a custom implementation of the GPT architecture for text generation, trained end-to-end on a single work of classic literature.

Model Details

  • Model Type: GPT (Generative Pre-trained Transformer)
  • Architecture: Custom transformer-based language model (see the sketch after this list)
  • Training Data: Alice's Adventures in Wonderland by Lewis Carroll
  • Language: English
  • Library: PyTorch
  • Model Size: ~1M parameters (estimated from the 4.20MB complete_gpt_model.pth, assuming float32 weights)
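
The exact architecture lives in Notebook1.ipynb. For orientation, a decoder-only transformer in this parameter range could be configured roughly as follows; all hyperparameters here are illustrative guesses, not the notebook's actual values:

import torch
import torch.nn as nn

class MiniGPT(nn.Module):
    """Decoder-only transformer; hyperparameters are illustrative guesses
    sized to land near ~1M parameters with a character-level vocabulary."""
    def __init__(self, vocab_size, d_model=128, n_heads=4, n_layers=4, max_len=256):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, dim_feedforward=4 * d_model,
            batch_first=True, activation='gelu')
        self.blocks = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.ln_f = nn.LayerNorm(d_model)
        self.head = nn.Linear(d_model, vocab_size, bias=False)

    def forward(self, idx):
        # idx: (batch, seq_len) token ids
        t = idx.size(1)
        pos = torch.arange(t, device=idx.device)
        x = self.tok_emb(idx) + self.pos_emb(pos)
        # Causal mask: True marks positions each token may NOT attend to
        mask = torch.triu(torch.ones(t, t, device=idx.device, dtype=torch.bool), 1)
        x = self.blocks(x, mask=mask)
        return self.head(self.ln_f(x))  # (batch, seq_len, vocab_size)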

Training Details

Dataset

  • Source: Alice's Adventures in Wonderland (complete text)
  • Size: 1,033 lines of text
  • Preprocessing: Custom tokenizer, serialized in tokenizer.pkl (character-level or subword; a reference sketch follows this list)
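
Since tokenizer.pkl is a pickled custom object, the exact scheme is defined in the notebook. As a reference point, a minimal character-level tokenizer of the kind commonly used for single-book GPT demos looks like this (a sketch, not necessarily the repository's implementation):

class CharTokenizer:
    """Maps each unique character in the corpus to an integer id."""
    def __init__(self, text):
        chars = sorted(set(text))
        self.stoi = {ch: i for i, ch in enumerate(chars)}
        self.itos = {i: ch for ch, i in self.stoi.items()}
        self.vocab_size = len(chars)

    def encode(self, s):
        return [self.stoi[ch] for ch in s]

    def decode(self, ids):
        return ''.join(self.itos[i] for i in ids)

# Build the vocabulary from the training text
with open('dataset.txt', 'r', encoding='utf-8') as f:
    tokenizer = CharTokenizer(f.read())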

Training Configuration

  • Epochs: 3 (checkpoint files available for each epoch)
  • Optimizer: Likely AdamW (standard for transformer models; the 12.2MB per-epoch checkpoints are roughly 3× the 4.20MB model file, consistent with Adam-style first- and second-moment state being saved alongside the weights)
  • Training Files:
    • checkpoint_epoch_1.pth (12.2MB)
    • checkpoint_epoch_2.pth (12.2MB)
    • checkpoint_epoch_3.pth (12.2MB)
    • best_model.pth (4.14MB) - Best performing checkpoint
    • complete_gpt_model.pth (4.20MB) - Final trained model

Files in this Repository

File                     Size     Description
complete_gpt_model.pth   4.20MB   Final trained model weights
best_model.pth           4.14MB   Best performing model checkpoint
checkpoint_epoch_1.pth   12.2MB   Training checkpoint after epoch 1
checkpoint_epoch_2.pth   12.2MB   Training checkpoint after epoch 2
checkpoint_epoch_3.pth   12.2MB   Training checkpoint after epoch 3
tokenizer.pkl            37.3KB   Custom tokenizer for the model
dataset.txt              51KB     Training dataset (Alice in Wonderland)
Notebook1.ipynb          4.1MB    Training notebook with implementation

Usage

Loading the Model

import torch
import pickle

# Load the tokenizer (a pickled custom object, so its defining class
# must be importable when unpickling)
with open('tokenizer.pkl', 'rb') as f:
    tokenizer = pickle.load(f)

# complete_gpt_model.pth stores the full model object, so the GPT class
# definition from the training notebook must be on the import path.
# On PyTorch >= 2.6, torch.load defaults to weights_only=True and will
# refuse arbitrary pickled objects; pass weights_only=False for a file
# you trust.
model = torch.load('complete_gpt_model.pth', map_location='cpu', weights_only=False)
model.eval()
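
If best_model.pth holds only a state_dict rather than a whole pickled model (its smaller size is consistent with weights-only saving), the usual pattern is to instantiate the architecture first and then load the weights. GPTModel and the vocab_size attribute below are placeholders for whatever Notebook1.ipynb defines:

# Alternative: rebuild the architecture, then load weights into it.
# `GPTModel` is hypothetical; substitute the class from Notebook1.ipynb.
model = GPTModel(vocab_size=tokenizer.vocab_size)
model.load_state_dict(torch.load('best_model.pth', map_location='cpu'))
model.eval()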

Text Generation

A minimal greedy-decoding loop, assuming the model's forward pass returns logits of shape (batch, seq_len, vocab_size):

def generate_text(model, tokenizer, prompt, max_length=100):
    model.eval()
    input_ids = torch.tensor([tokenizer.encode(prompt)], dtype=torch.long)
    with torch.no_grad():
        for _ in range(max_length):
            # Forward pass; crop input_ids to the model's context window
            # here if your prompt can exceed it
            logits = model(input_ids)
            # Greedy decoding: take the most likely next token
            next_token = logits[:, -1, :].argmax(dim=-1, keepdim=True)
            input_ids = torch.cat([input_ids, next_token], dim=1)
    return tokenizer.decode(input_ids[0].tolist())

# Example usage
prompt = "Alice was beginning to get very tired"
generated = generate_text(model, tokenizer, prompt)
print(generated)
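
Greedy decoding tends to loop on a corpus this small. A common variant is temperature sampling, replacing the argmax line above with a draw from the softmax distribution:

# Drop-in replacement for the argmax line: sample from the softmax
# at a chosen temperature (lower = more conservative)
probs = torch.softmax(logits[:, -1, :] / 0.8, dim=-1)  # 0.8 = temperature
next_token = torch.multinomial(probs, num_samples=1)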

Model Performance

The model has been trained for 3 epochs on the Alice in Wonderland dataset. Performance metrics and loss curves can be found in the training notebook (Notebook1.ipynb).
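
The notebook is the source of truth for metrics. To evaluate a checkpoint yourself, perplexity over held-out text, i.e. exp of the mean next-token cross-entropy, is the standard measure; this sketch assumes the forward-pass shape described above:

import torch
import torch.nn.functional as F

def perplexity(model, token_ids, block_size=256):
    """exp(mean cross-entropy) of next-token prediction over token_ids."""
    model.eval()
    total_loss, total_tokens = 0.0, 0
    with torch.no_grad():
        for i in range(0, len(token_ids) - 1, block_size):
            chunk = torch.tensor([token_ids[i:i + block_size + 1]])
            if chunk.size(1) < 2:
                continue
            x, y = chunk[:, :-1], chunk[:, 1:]
            logits = model(x)  # (1, seq_len, vocab_size)
            loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                                   y.reshape(-1), reduction='sum')
            total_loss += loss.item()
            total_tokens += y.numel()
    return torch.exp(torch.tensor(total_loss / total_tokens)).item()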

Expected Outputs

Given its training on Alice in Wonderland, the model should generate text in a style resembling Lewis Carroll's writing, featuring:

  • Victorian-era English vocabulary and sentence structure
  • Whimsical and fantastical content
  • Character references from the original story
  • Descriptive and narrative prose style

Training Process

The training was conducted using:

  1. Data Preprocessing: Text cleaning and tokenization
  2. Model Architecture: Custom GPT implementation
  3. Training Loop: 3 epochs with checkpoint saving
  4. Validation: Best model selection based on validation metrics (a schematic version of steps 3 and 4 is sketched below)
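
The exact loop lives in Notebook1.ipynb. A schematic version consistent with the files above (per-epoch checkpoints that bundle optimizer state, plus a separate best-model file) might look like this; the data loaders and the evaluate helper are assumptions:

import torch

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
criterion = torch.nn.CrossEntropyLoss()
best_val_loss = float('inf')

for epoch in range(1, 4):  # 3 epochs
    model.train()
    for x, y in train_loader:  # assumed (input, next-token target) batches
        logits = model(x)
        loss = criterion(logits.reshape(-1, logits.size(-1)), y.reshape(-1))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    # Per-epoch checkpoint bundles optimizer state with the weights,
    # which is why these files are ~3x the weights-only model file
    torch.save({'epoch': epoch,
                'model_state_dict': model.state_dict(),
                'optimizer_state_dict': optimizer.state_dict()},
               f'checkpoint_epoch_{epoch}.pth')

    val_loss = evaluate(model, val_loader)  # assumed validation helper
    if val_loss < best_val_loss:
        best_val_loss = val_loss
        torch.save(model.state_dict(), 'best_model.pth')

# Final model saved whole (architecture + weights together)
torch.save(model, 'complete_gpt_model.pth')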

Limitations

  • Dataset Size: Trained on a single book, limiting vocabulary and style diversity
  • Domain Specificity: Optimized for Lewis Carroll's writing style
  • Scale: Relatively small model compared to modern large language models
  • Context Length: Limited context window typical of smaller transformer models

Ethical Considerations

  • This model is trained on public domain literature (Alice in Wonderland)
  • The training data is from 1865 and may contain outdated language or concepts
  • The model is intended for educational and demonstration purposes

Citation

If you use this model, please cite:

@misc{karthik2024alice_gpt,
  title={1st Demo GPT Based Architecture Model},
  author={Karthik},
  year={2024},
  howpublished={Hugging Face Model Hub},
  url={https://huggingface.co/karthik-2905/1st_Demo_GPT_Based_Architecture_Model}
}

License

This model is released under the MIT License. The training data (Alice's Adventures in Wonderland) is in the public domain.

Contact

For questions or issues, please open an issue in this repository or contact the model author.


This model was created as a learning exercise to demonstrate GPT architecture implementation and training on classic literature.
