DistilSN97

How It Works

Distil (Subnet 97) — competitive model distillation on Bittensor.


Score to Beat

Your model's KL divergence must be at least 1% lower than the current king's to claim emissions. Models with KL > 2.0 are disqualified.

< 0.0681 (king 0.0688 − 1%)

The Goal

Distill Qwen/Qwen3.5-35B-A3B (35.0B total parameters, 3.0B active, MoE) into a smaller model. The miner whose model most closely matches the teacher's output distribution wins 100% of the subnet's emissions.

Scoring

The validator generates continuations with the teacher model, then scores your model's predicted distribution against the teacher's using KL divergence across the full 248,044-token vocabulary. Lower KL = closer to teacher = better.
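
The per-token comparison can be sketched in plain Python on a toy vocabulary (`mean_token_kl` is a hypothetical helper; the validator's exact batching and numerics are assumptions here):

```python
import math

def mean_token_kl(teacher_logits, student_logits):
    """Mean per-position KL(teacher || student); each row is logits over the vocab."""
    def log_softmax(logits):
        m = max(logits)
        lse = m + math.log(sum(math.exp(x - m) for x in logits))
        return [x - lse for x in logits]

    total = 0.0
    for t_row, s_row in zip(teacher_logits, student_logits):
        t_lp, s_lp = log_softmax(t_row), log_softmax(s_row)
        # KL(p || q) = sum_v p(v) * (log p(v) - log q(v))
        total += sum(math.exp(tp) * (tp - sp) for tp, sp in zip(t_lp, s_lp))
    return total / len(teacher_logits)

# Identical distributions give KL = 0; any mismatch gives KL > 0.
```

In production this sum runs over the full 248,044-token vocabulary at every scored position, and the lower mean wins.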

Evaluations use block-seeded random prompts from the FineWeb pretraining corpus. The teacher generates up to 512 tokens of continuation. Your model must predict the same token distributions as closely as possible.

Winner-take-all: The miner with the lowest KL divergence gets weight 1.0. Everyone else gets 0. Raw scores each epoch — no smoothing. Models are permanent, so scores converge naturally.
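
The weighting rule itself is trivial; a sketch (hypothetical `assign_weights` helper mapping miner IDs to their KL scores):

```python
def assign_weights(kl_by_miner):
    """Winner-take-all: weight 1.0 to the lowest KL, 0.0 to everyone else."""
    best = min(kl_by_miner, key=kl_by_miner.get)  # key with the smallest KL
    return {uid: 1.0 if uid == best else 0.0 for uid in kl_by_miner}
```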

King-of-the-Hill

The validator uses an efficient king-of-the-hill architecture. The current best model (the "king") holds the crown until a challenger proves it's meaningfully better.

  • Pre-checks first — every epoch, all models are verified (architecture, hash, integrity) before any GPU time is spent
  • Only new challengers evaluated — models that already lost keep their scores. GPU time goes to models that could actually win.
  • 40 prompts per evaluation — double the standard sweep, giving tighter confidence intervals and more reliable scores
  • Same prompts, fair comparison — king and challenger are scored on identical teacher continuations in the same run. No GPU is used when there are no challengers.

Epsilon Threshold (1%)

To prevent noisy near-ties from flipping the winner every epoch, challengers must beat the king by a 1% relative margin.

If the king has KL = 0.0972, a challenger needs KL below 0.0972 × 0.99 ≈ 0.0962 to dethrone it. A score of 0.0965 is better than the king's, but not by enough, so the king holds.
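
The rule can be sketched as (hypothetical `dethrones` helper):

```python
def dethrones(challenger_kl, king_kl, epsilon=0.01):
    """True only if the challenger is at least `epsilon` relatively better."""
    return challenger_kl < king_kl * (1.0 - epsilon)

# With king_kl = 0.0972 the bar is 0.0972 * 0.99 = 0.096228:
# 0.0965 is better than the king but does not dethrone; 0.0961 does.
```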

This rewards meaningful improvements over sampling noise, and creates a stable incentive for miners to find genuinely better distillation techniques rather than submitting marginal variants.

Confidence & Variance

Each model is evaluated on multiple prompts. The dashboard shows a 95% confidence interval computed from the per-prompt KL scores. This tells you how stable the score is.

Standard Error (SE) measures uncertainty in the mean KL estimate. With more evaluation samples, SE shrinks and the CI narrows. A tight CI means the score is reliable; a wide CI means the model's performance varies significantly across prompts.

Per-prompt breakdown on each miner's detail page shows KL mean ± std for each evaluation prompt, plus the number of tokens scored. This helps identify if a model struggles on specific types of content.
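
A normal-approximation 95% CI from the per-prompt KLs can be sketched as follows (the dashboard's exact CI method is an assumption; `kl_confidence_interval` is a hypothetical helper):

```python
import math

def kl_confidence_interval(per_prompt_kls):
    """Mean KL with a normal-approximation 95% CI (mean +/- 1.96 * SE)."""
    n = len(per_prompt_kls)
    mean = sum(per_prompt_kls) / n
    var = sum((x - mean) ** 2 for x in per_prompt_kls) / (n - 1)  # sample variance
    se = math.sqrt(var / n)  # standard error of the mean shrinks as n grows
    return mean, mean - 1.96 * se, mean + 1.96 * se
```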

Model Requirements

  • Max 5.3B total parameters — verified from safetensors metadata (not config estimates)
  • Same tokenizer as the teacher — verified by encoding match on reference text, not just vocab size
  • No quantization — GPTQ, AWQ, FP8 models are rejected. The subnet rewards architecture distillation, not compression
  • No duplicate models — SHA256 of safetensors weights must be unique. Re-uploading the winner under a different name is detected and blacklisted.
  • One commit per hotkey, forever — once you commit a model, it cannot be changed. Choose carefully.
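
The duplicate check can be illustrated with a weight fingerprint. A hedged sketch (the validator's exact hashing scheme, e.g. per-shard vs. combined digests, is an assumption here):

```python
import hashlib

def shard_fingerprint(shard_paths):
    """SHA256 over safetensors shard bytes, read in sorted filename order."""
    h = hashlib.sha256()
    for path in sorted(shard_paths):
        with open(path, "rb") as f:
            # Stream in 1 MiB chunks so large shards don't load into memory.
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
    return h.hexdigest()
```

Because the digest depends only on the weight bytes, re-uploading identical weights under a new repo name produces the same fingerprint.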

How to Mine

1. Train a distilled model from Qwen/Qwen3.5-35B-A3B. Use knowledge distillation, pruning, architecture search — anything that produces a smaller model with the same tokenizer.

2. Upload to HuggingFace — your model must be a public HuggingFace repo with safetensors weights.

3. Register a hotkey on Bittensor subnet 97 and commit your model using the miner script.

Quick start:

# Clone the subnet repo (docs + miner script)
git clone https://github.com/unarbos/distil.git
cd distil

# Register on subnet 97
btcli s register --netuid 97 --network finney \
  --wallet.name my_wallet --wallet.hotkey my_hotkey

# Commit your model (PERMANENT — cannot undo)
python miner.py \
  --wallet-name my_wallet \
  --hotkey-name my_hotkey \
  --model-repo your-username/your-distilled-model \
  --netuid 97 --network finney

Anti-Gaming

  • Copy detection — SHA256 of safetensors shards prevents re-uploading the winner under a different name. First committer owns the hash.
  • Block-seeded prompts — evaluation prompts are deterministic from the block number, making them unpredictable
  • Full-distribution KL — scored on all 248,044 tokens, not top-k. No shortcuts.
  • Integrity verification — models must stay public and unchanged on HuggingFace. Modified or deleted models are disqualified.
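
Block-seeded selection can be sketched as (hypothetical `prompts_for_block` helper; the subnet's actual seeding scheme is an assumption):

```python
import random

def prompts_for_block(block_number, corpus_size, n_prompts=40):
    """Deterministically sample prompt indices from the block number."""
    rng = random.Random(block_number)  # same block -> same prompts for every validator
    return rng.sample(range(corpus_size), n_prompts)
```

Prompts are reproducible once the block exists, yet unpredictable beforehand, so miners cannot overfit to a fixed evaluation set.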

Teacher Model

  • Model: Qwen/Qwen3.5-35B-A3B
  • Total Params: 35.0B
  • Active Params: 3.0B
  • Architecture: qwen3_5_moe
  • Vocab Size: 248,044
  • Max Student: 5.3B