✨ Checkpoint R20 (Karcher10)
architecture: MistralForCausalLM
merge_method: karcher
dtype: bfloat16
models:
- model: dphn/Dolphin-Mistral-24B-Venice-Edition
- model: FlareRebellion/WeirdCompound-v1.7-24b
- model: Naphula/BeaverAI_Fallen-Mistral-Small-3.1-24B-v1e_textonly
- model: Naphula/Evilmind-24B-v1
- model: OddTheGreat/Rotor_24B_V.1
- model: TheDrummer/Cydonia-24B-v4.2.0
- model: TheDrummer/Magidonia-24B-v4.2.0
- model: TheDrummer/Rivermind-24B-v1
- model: trashpanda-org/MS3.2-24B-Mullein-v2
- model: zerofata/MS3.2-PaintedFantasy-v2-24B
parameters:
tokenizer:
  source: union
chat_template: auto
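For context: the karcher method computes an (approximate) Karcher mean, also called the Fréchet mean, of the weight tensors. Roughly, it looks for the point that minimizes the sum of squared distances to all input models; with w_i denoting the i-th model's weights:

$$\mu = \arg\min_{x} \sum_{i=1}^{N} d(x, w_i)^2$$

Because no per-model weights are given above, all ten checkpoints pull on the mean equally.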
Twenty-two checkpoints were tested for Goetia before settling on R20. EvilMind and Dark World were also created along the way; both are even less censored than Goetia.
Goetia v1 was rather simple: a SLERP of WeirdCompound, Circuitry, Animus, and BlackDolphin.
Checkpoint J (PaintedFantasy, Cydonia, Magidonia, DolphinVenice, FallenMistral) was a notable attempt to improve on it.
R1 swapped BlackDolphin with BlackSheep and DolphinVenice.
R2 re-added Mullein, now considered an essential component of the Goetia merge series.
R3 removed Animus due to refusals. R4 was an initial contender.
R5 introduced Rivermind. R6-R11 swapped out individual components and ran several comparisons.
R8 swapped BlackDolphin out for DolphinVenice due to refusals.
R10 and R11 were evaluated thoroughly as release candidates. Ultimately, the formula was refined further.
R12 added EvilMind. R13, a test using only TheDrummer's tunes, was interesting (but R19 beats it).
R16 was merged after studying all the YAMLs and comparing Q0 benchmarks.
R17 was a test with WeirdCompound and Circuitry removed. It did not perform as well.
R18 and R19 were merges of just 4 models. R19 proved to be superior and more uncensored. It was released as Dark World 24B v1.
Hours of brainstorming went into choosing which models to include in R20. Its output was unique and detailed, like R10/R11. I also swapped Circuitry for Rotor, which improved creativity even further.
R14, R15, R21, and R22 attempted to SLERP/NuSLERP/Karcher the merged Karcher checkpoints; each was deemed weaker. Karcher seems to be the best endgame method for this type of merge, and going past it appears to overcook the weights.
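Purely for illustration (these paths are hypothetical, not the actual R14/R15/R21/R22 configs), a merge-of-merges of that kind would follow the same SLERP pattern used elsewhere in this card, just pointed at two already-merged Karcher checkpoints:

merge_method: slerp
dtype: bfloat16
base_model: /merges/karcher-A # hypothetical local checkpoint
slices:
  - sources:
      - model: /merges/karcher-A # hypothetical local checkpoint
        layer_range: [0, 40]
      - model: /merges/karcher-B # hypothetical local checkpoint
        layer_range: [0, 40]
parameters:
  t: 0.5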
dphn/Dolphin-Mistral-24B-Venice-Edition
Adds unique and uncensored attributes.
model_stock, slerp, nuslerp: FlareRebellion/WeirdCompound-v1.7-24b
[aixonlab/Eurydice-24b-v3.5] [TheDrummer/Cydonia-24B-v4.2.0] [PocketDoc/Dans-PersonalityEngine-V1.3.0-24b] [CrucibleLab/M3.2-24B-Loki-V1.3] [zerofata/MS3.2-PaintedFantasy-v2-24B] [Delta-Vector/Austral-24B-Winton] [anthracite-core/Mistral-Small-3.2-24B-Instruct-2506-Text-Only]
An experimental model_stock/slerp/nuslerp merge of popular prompt adherent models with high creative writing benchmark scores.
Naphula/BeaverAI_Fallen-Mistral-Small-3.1-24B-v1e_textonly
Fully uncensored, unhinged, and unaligned with most datasets. Very creative. Teaches the weights how to pass Q0G. Adding even a sliver of this to a merge (~10%) is enough to enlighten it, proving that models which fail operate on strawman logic.
slerp: Naphula/Evilmind-24B-v1
[Naphula/BeaverAI_Fallen-Mistral-Small-3.1-24B-v1e_textonly] [TheDrummer/Rivermind-24B-v1]
Further augments innovative and uncensored output.
ties: OddTheGreat/Rotor_24B_V.1
[CrucibleLab/M3.2-24B-Loki-V1.3] [Delta-Vector/MS3.2-Austral-Winton] [ReadyArt/MS3.2-The-Omega-Directive-24B-Unslop-v2.0] [zerofata/MS3.2-PaintedFantasy-v2-24B]
A synergistic TIES blend of PaintedFantasy with Codex, Loki, and Omega. Adds a unique writing style with improved roleplay, creativity, and prompt adherence. Performs well at text adventures and in Russian.
TheDrummer/Cydonia-24B-v4.2.0
Enhances prompt output quality and intelligence. Mistral base.
TheDrummer/Magidonia-24B-v4.2.0
Enhances prompt output quality and intelligence. Magistral base.
TheDrummer/Rivermind-24B-v1
Adds a noticeable boost to prose, roleplaying, creativity, and lexical vocabulary.
trashpanda-org/MS3.2-24B-Mullein-v2
Adds non-synthetic ERP datasets like Sugarquill/Erebus. Creative and uncensored.
zerofata/MS3.2-PaintedFantasy-v2-24B
A creative model with a unique writing style that excels at RP, with reduced repetition and improved instruction following. Training process: SFT > DPO > KTO.
EvilMind-24B-v1
base_model: Naphula/BeaverAI_Fallen-Mistral-Small-3.1-24B-v1e_textonly
architecture: MistralForCausalLM
merge_method: slerp
dtype: bfloat16
slices:
  - sources:
      - model: Naphula/BeaverAI_Fallen-Mistral-Small-3.1-24B-v1e_textonly
        layer_range: [0, 40]
      - model: TheDrummer/Rivermind-24B-v1
        layer_range: [0, 40]
parameters:
  t: 0.5
tokenizer:
  source: union
chat_template: auto
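For reference, SLERP interpolates along the arc between the two sets of weights rather than along a straight line, so t: 0.5 lands midway between FallenMistral and Rivermind. As a rough sketch for flattened weight tensors w_1 and w_2 separated by angle θ:

$$\mathrm{slerp}(w_1, w_2; t) = \frac{\sin\big((1-t)\theta\big)}{\sin\theta}\, w_1 + \frac{\sin(t\theta)}{\sin\theta}\, w_2$$

Implementations typically fall back to plain linear interpolation when the two tensors are nearly parallel.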
Rotor_24B_V.1
models:
  - model: ReadyArt/MS3.2-The-Omega-Directive-24B-Unslop-v2.0
    parameters:
      density: 0.5
      weight: 0.5
  - model: Delta-Vector/MS3.2-Austral-Winton
    parameters:
      density: 0.25
      weight: 0.25
  - model: CrucibleLab/M3.2-24B-Loki-V1.3
    parameters:
      density: 0.5
      weight: 0.5
  - model: zerofata/MS3.2-PaintedFantasy-v2-24B
    parameters:
      density: 0.25
      weight: 0.25
merge_method: ties
base_model: ReadyArt/MS3.2-The-Omega-Directive-24B-Unslop-v2.0
parameters:
  normalize: false
  int8_mask: false
dtype: float16
tokenizer:
  source: ReadyArt/MS3.2-The-Omega-Directive-24B-Unslop-v2.0
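A rough sketch of what density and weight do in a TIES merge (simplified; the actual method also elects a majority sign per parameter and drops disagreeing entries): each model's task vector against the base is sparsified to its largest-magnitude entries, then the survivors are summed with their per-model weights:

$$\theta_{\text{merged}} = \theta_{\text{base}} + \sum_i \lambda_i \cdot \operatorname{keep}_{\rho_i}\!\big(\theta_i - \theta_{\text{base}}\big)$$

Here ρ_i is each model's density (the fraction of entries kept) and λ_i its weight. With normalize: false, the weights are applied as written rather than rescaled to sum to 1.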
WeirdCompound-v1.7-24b
base_model: TheDrummer/Cydonia-24B-v4.2.0 # Cydonia v4.2.0
merge_method: model_stock
dtype: bfloat16
models:
- model: aixonlab/Eurydice-24b-v3.5 # storytelling / RP
- model: TheDrummer/Cydonia-24B-v4.2.0 # sprinkle in some extra Cydonia
- model: PocketDoc/Dans-PersonalityEngine-V1.3.0-24b # Prompt Adherence
- model: CrucibleLab/M3.2-24B-Loki-V1.3 # Loki
- model: zerofata/MS3.2-PaintedFantasy-v2-24B # animu
- model: Delta-Vector/Austral-24B-Winton # Adventure
→ /intermediate/model/A →
merge_method: slerp
dtype: bfloat16
base_model: anthracite-core/Mistral-Small-3.2-24B-Instruct-2506-Text-Only
models:
  - model: /intermediate/model/A
parameters:
  t: 0.45
→ /intermediate/model/B →
merge_method: nuslerp
dtype: bfloat16
base_model: /intermediate/model/B
models:
  - model: PocketDoc/Dans-PersonalityEngine-V1.3.0-24b
    parameters:
      weight: 0.4
  - model: CrucibleLab/M3.2-24B-Loki-V1.3
    parameters:
      weight: 0.6
→ /intermediate/model/C →
merge_method: slerp
dtype: bfloat16
base_model: /intermediate/model/B
models:
  - model: /intermediate/model/C
parameters:
  t: 0.5
Classification of Merge Methods: Lossless vs. Lossy
The following chart (generated by Gemini) classifies the primary mergekit methods into "Lossless" (Holistic) and "Lossy" (Selective/Averaging) categories, based on whether they perform a holistic transformation of all parameters or are designed to selectively prune, drop, or average information.
| Category | Merge Method | Core Concept & Why It's Classified This Way |
| --- | --- | --- |
| Lossless (Holistic Transformation) | slerp | Spherical Interpolation: A pure, geometric blend between two models. All parameters from both models contribute to the final result according to a smooth trigonometric function. No information is discarded. |
| | nuslerp | Normalized SLERP: Functionally the same as slerp in its lossless approach. It performs a holistic blend on either two full models or two full task vectors. |
| | karcher | Geometric Mean: Finds the optimal geometric "center" of multiple models. It is a holistic operation that considers the complete parameter set of all input models to find the mean. |
| | task_arithmetic | Simple Task Vector Addition: Calculates full task vectors and adds them back to the base. No pruning or dropping of parameters occurs. It's a direct, lossless application of the learned changes. |
| | linear | Weighted Averaging: While it can feel lossy by obscuring individual model strengths, it is technically lossless in that every parameter from every model is included in the final weighted average. It doesn't zero out or discard any data. |
| | passthrough | No-Op: The definition of lossless: it simply passes the data through unmodified. |
| Lossy (Selective Pruning & Averaging) | ties | Pruning by Magnitude: Intentionally discards (zeros out) the task vector parameters with the smallest magnitudes to achieve a target density. This is the classic example of a lossy sparsification method. |
| | dare_linear, dare_ties | Random Pruning: Intentionally discards a random selection of task vector parameters to achieve a target density. Its core principle is lossy sparsification. |
| | della, della_linear | Probabilistic Pruning: Intentionally discards task vector parameters based on a calculated probability related to their magnitude. It is a sophisticated but fundamentally lossy pruning method. |
| | breadcrumbs, breadcrumbs_ties | Outlier Pruning: Intentionally discards parameters with both the smallest and largest magnitudes from the task vector. This is a targeted, lossy sparsification technique. |
| | sce | Variance-Based Pruning: Can be lossy. If select_topk < 1.0, it discards parameter positions across all task vectors that exhibit low variance, zeroing them out. |
| | model_stock | Lossy by Averaging: Compresses the information from multiple task vectors into a single average task vector before the final merge. The unique, individual characteristics of the source models are lost in this averaging step. |