1st Demo GPT Based Architecture Model

Model Description

This is a GPT-style transformer language model trained from scratch on Lewis Carroll's "Alice's Adventures in Wonderland". It demonstrates a custom implementation of the GPT architecture for text generation, trained end-to-end on a single work of classic literature.

Model Details

  • Model Type: GPT (Generative Pre-trained Transformer)
  • Architecture: Custom transformer-based language model (see the sketch after this list)
  • Training Data: Alice's Adventures in Wonderland by Lewis Carroll
  • Language: English
  • Library: PyTorch
  • Model Size: ~1M parameters (estimated from the 4.20MB complete_gpt_model.pth, assuming float32 weights)
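
The exact architecture lives in Notebook1.ipynb. For orientation, a decoder-only transformer in this parameter range could be configured roughly as follows; all hyperparameters here are illustrative guesses, not the notebook's actual values:

import torch
import torch.nn as nn

class MiniGPT(nn.Module):
    """Decoder-only transformer; hyperparameters are illustrative guesses
    sized to land near ~1M parameters with a character-level vocabulary."""
    def __init__(self, vocab_size, d_model=128, n_heads=4, n_layers=4, max_len=256):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, dim_feedforward=4 * d_model,
            batch_first=True, activation='gelu')
        self.blocks = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.ln_f = nn.LayerNorm(d_model)
        self.head = nn.Linear(d_model, vocab_size, bias=False)

    def forward(self, idx):
        # idx: (batch, seq_len) token ids
        t = idx.size(1)
        pos = torch.arange(t, device=idx.device)
        x = self.tok_emb(idx) + self.pos_emb(pos)
        # Causal mask: True marks positions each token may NOT attend to
        mask = torch.triu(torch.ones(t, t, device=idx.device, dtype=torch.bool), 1)
        x = self.blocks(x, mask=mask)
        return self.head(self.ln_f(x))  # (batch, seq_len, vocab_size)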

Training Details

Dataset

  • Source: Alice's Adventures in Wonderland (complete text)
  • Size: 1,033 lines of text
  • Preprocessing: Custom tokenizer, serialized in tokenizer.pkl (character-level or subword; a reference sketch follows this list)
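
Since tokenizer.pkl is a pickled custom object, the exact scheme is defined in the notebook. As a reference point, a minimal character-level tokenizer of the kind commonly used for single-book GPT demos looks like this (a sketch, not necessarily the repository's implementation):

class CharTokenizer:
    """Maps each unique character in the corpus to an integer id."""
    def __init__(self, text):
        chars = sorted(set(text))
        self.stoi = {ch: i for i, ch in enumerate(chars)}
        self.itos = {i: ch for ch, i in self.stoi.items()}
        self.vocab_size = len(chars)

    def encode(self, s):
        return [self.stoi[ch] for ch in s]

    def decode(self, ids):
        return ''.join(self.itos[i] for i in ids)

# Build the vocabulary from the training text
with open('dataset.txt', 'r', encoding='utf-8') as f:
    tokenizer = CharTokenizer(f.read())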

Training Configuration

  • Epochs: 3 (checkpoint files available for each epoch)
  • Optimizer: Likely AdamW (standard for transformer models; the 12.2MB per-epoch checkpoints are roughly 3× the 4.20MB model file, consistent with Adam-style first- and second-moment state being saved alongside the weights)
  • Training Files:
    • checkpoint_epoch_1.pth (12.2MB)
    • checkpoint_epoch_2.pth (12.2MB)
    • checkpoint_epoch_3.pth (12.2MB)
    • best_model.pth (4.14MB) - Best performing checkpoint
    • complete_gpt_model.pth (4.20MB) - Final trained model

Files in this Repository

File                     Size     Description
complete_gpt_model.pth   4.20MB   Final trained model weights
best_model.pth           4.14MB   Best performing model checkpoint
checkpoint_epoch_1.pth   12.2MB   Training checkpoint after epoch 1
checkpoint_epoch_2.pth   12.2MB   Training checkpoint after epoch 2
checkpoint_epoch_3.pth   12.2MB   Training checkpoint after epoch 3
tokenizer.pkl            37.3KB   Custom tokenizer for the model
dataset.txt              51KB     Training dataset (Alice in Wonderland)
Notebook1.ipynb          4.1MB    Training notebook with implementation

Usage

Loading the Model

import torch
import pickle

# Load the tokenizer (a pickled custom object, so its defining class
# must be importable when unpickling)
with open('tokenizer.pkl', 'rb') as f:
    tokenizer = pickle.load(f)

# complete_gpt_model.pth stores the full model object, so the GPT class
# definition from the training notebook must be on the import path.
# On PyTorch >= 2.6, torch.load defaults to weights_only=True and will
# refuse arbitrary pickled objects; pass weights_only=False for a file
# you trust.
model = torch.load('complete_gpt_model.pth', map_location='cpu', weights_only=False)
model.eval()
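
If best_model.pth holds only a state_dict rather than a whole pickled model (its smaller size is consistent with weights-only saving), the usual pattern is to instantiate the architecture first and then load the weights. GPTModel and the vocab_size attribute below are placeholders for whatever Notebook1.ipynb defines:

# Alternative: rebuild the architecture, then load weights into it.
# `GPTModel` is hypothetical; substitute the class from Notebook1.ipynb.
model = GPTModel(vocab_size=tokenizer.vocab_size)
model.load_state_dict(torch.load('best_model.pth', map_location='cpu'))
model.eval()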

Text Generation

A minimal greedy-decoding loop, assuming the model's forward pass returns logits of shape (batch, seq_len, vocab_size):

def generate_text(model, tokenizer, prompt, max_length=100):
    model.eval()
    input_ids = torch.tensor([tokenizer.encode(prompt)], dtype=torch.long)
    with torch.no_grad():
        for _ in range(max_length):
            # Forward pass; crop input_ids to the model's context window
            # here if your prompt can exceed it
            logits = model(input_ids)
            # Greedy decoding: take the most likely next token
            next_token = logits[:, -1, :].argmax(dim=-1, keepdim=True)
            input_ids = torch.cat([input_ids, next_token], dim=1)
    return tokenizer.decode(input_ids[0].tolist())

# Example usage
prompt = "Alice was beginning to get very tired"
generated = generate_text(model, tokenizer, prompt)
print(generated)
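
Greedy decoding tends to loop on a corpus this small. A common variant is temperature sampling, replacing the argmax line above with a draw from the softmax distribution:

# Drop-in replacement for the argmax line: sample from the softmax
# at a chosen temperature (lower = more conservative)
probs = torch.softmax(logits[:, -1, :] / 0.8, dim=-1)  # 0.8 = temperature
next_token = torch.multinomial(probs, num_samples=1)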

Model Performance

The model has been trained for 3 epochs on the Alice in Wonderland dataset. Performance metrics and loss curves can be found in the training notebook (Notebook1.ipynb).
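
The notebook is the source of truth for metrics. To evaluate a checkpoint yourself, perplexity over held-out text, i.e. exp of the mean next-token cross-entropy, is the standard measure; this sketch assumes the forward-pass shape described above:

import torch
import torch.nn.functional as F

def perplexity(model, token_ids, block_size=256):
    """exp(mean cross-entropy) of next-token prediction over token_ids."""
    model.eval()
    total_loss, total_tokens = 0.0, 0
    with torch.no_grad():
        for i in range(0, len(token_ids) - 1, block_size):
            chunk = torch.tensor([token_ids[i:i + block_size + 1]])
            if chunk.size(1) < 2:
                continue
            x, y = chunk[:, :-1], chunk[:, 1:]
            logits = model(x)  # (1, seq_len, vocab_size)
            loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                                   y.reshape(-1), reduction='sum')
            total_loss += loss.item()
            total_tokens += y.numel()
    return torch.exp(torch.tensor(total_loss / total_tokens)).item()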

Expected Outputs

Given its training on Alice in Wonderland, the model should generate text in a style resembling Lewis Carroll's writing, featuring:

  • Victorian-era English vocabulary and sentence structure
  • Whimsical and fantastical content
  • Character references from the original story
  • Descriptive and narrative prose style

Training Process

The training was conducted using:

  1. Data Preprocessing: Text cleaning and tokenization
  2. Model Architecture: Custom GPT implementation
  3. Training Loop: 3 epochs with checkpoint saving
  4. Validation: Best model selection based on validation metrics (a schematic version of steps 3 and 4 is sketched below)
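
The exact loop lives in Notebook1.ipynb. A schematic version consistent with the files above (per-epoch checkpoints that bundle optimizer state, plus a separate best-model file) might look like this; the data loaders and the evaluate helper are assumptions:

import torch

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
criterion = torch.nn.CrossEntropyLoss()
best_val_loss = float('inf')

for epoch in range(1, 4):  # 3 epochs
    model.train()
    for x, y in train_loader:  # assumed (input, next-token target) batches
        logits = model(x)
        loss = criterion(logits.reshape(-1, logits.size(-1)), y.reshape(-1))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    # Per-epoch checkpoint bundles optimizer state with the weights,
    # which is why these files are ~3x the weights-only model file
    torch.save({'epoch': epoch,
                'model_state_dict': model.state_dict(),
                'optimizer_state_dict': optimizer.state_dict()},
               f'checkpoint_epoch_{epoch}.pth')

    val_loss = evaluate(model, val_loader)  # assumed validation helper
    if val_loss < best_val_loss:
        best_val_loss = val_loss
        torch.save(model.state_dict(), 'best_model.pth')

# Final model saved whole (architecture + weights together)
torch.save(model, 'complete_gpt_model.pth')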

Limitations

  • Dataset Size: Trained on a single book, limiting vocabulary and style diversity
  • Domain Specificity: Optimized for Lewis Carroll's writing style
  • Scale: Relatively small model compared to modern large language models
  • Context Length: Limited context window typical of smaller transformer models

Ethical Considerations

  • This model is trained on public domain literature (Alice in Wonderland)
  • The training data is from 1865 and may contain outdated language or concepts
  • The model is intended for educational and demonstration purposes

Citation

If you use this model, please cite:

@misc{karthik2024alice_gpt,
  title={1st Demo GPT Based Architecture Model},
  author={Karthik},
  year={2024},
  howpublished={Hugging Face Model Hub},
  url={https://huggingface.co/karthik-2905/1st_Demo_GPT_Based_Architecture_Model}
}

License

This model is released under the MIT License. The training data (Alice's Adventures in Wonderland) is in the public domain.

Contact

For questions or issues, please open an issue in this repository or contact the model author.


This model was created as a learning exercise to demonstrate GPT architecture implementation and training on classic literature.
