{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# 🧪 WFGY λ_diverse Demo (v0.1)\n", "Estimate how diverse a set of answers to the same prompt is." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Formula\n", "Let **A = {a₁ … aₙ}** be *n* ≥ 2 answers to the same prompt, each encoded as a sentence embedding.  \n", "λ_diverse = 1 − mean_pairwise_cosine(aᵢ, aⱼ), averaged over all pairs i < j.  \n", "Lower average similarity ⇒ higher diversity." ] }, { "cell_type": "code", "metadata": { "id": "install" }, "source": [ "!pip -q install sentence-transformers --upgrade" ], "execution_count": null, "outputs": [] }, { "cell_type": "code", "metadata": { "id": "imports" }, "source": [ "import itertools\n", "\n", "import numpy as np\n", "from sentence_transformers import SentenceTransformer, util" ], "execution_count": null, "outputs": [] }, { "cell_type": "code", "metadata": { "id": "model" }, "source": [ "model = SentenceTransformer('all-MiniLM-L6-v2')" ], "execution_count": null, "outputs": [] }, { "cell_type": "code", "metadata": { "id": "helper" }, "source": [ "def lambda_diverse(sent_list):\n", "    \"\"\"Return 1 - mean pairwise cosine similarity of the answer embeddings.\"\"\"\n", "    if len(sent_list) < 2:\n", "        raise ValueError(\"lambda_diverse needs at least two answers\")\n", "    vecs = model.encode(sent_list, convert_to_tensor=True)\n", "    sims = []\n", "    # Visit every unordered pair (i < j) exactly once\n", "    for i, j in itertools.combinations(range(len(vecs)), 2):\n", "        sims.append(util.cos_sim(vecs[i], vecs[j]).item())\n", "    return float(1 - np.mean(sims))" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## ✏️ Edit & run\n", "Replace the prompt and the `answers` list, then press ▶️. Only `answers` is scored; `prompt` is kept for reference." 
] }, { "cell_type": "code", "metadata": { "id": "user" }, "source": [ "# The prompt is recorded for reference only; the score uses the answers.\n", "prompt = \"Give me a one-sentence summary of photosynthesis.\"\n", "answers = [\n", "    \"Plants convert sunlight into chemical energy stored as sugar.\",\n", "    \"Using light, plants turn water and CO₂ into glucose and oxygen.\",\n", "    \"Through photosynthesis, green leaves make food from sunlight.\",\n", "    \"Plants harness solar energy to synthesize sugars from carbon dioxide.\",\n", "    \"Light drives the production of glucose in plants, releasing oxygen.\"\n", "]\n", "\n", "ld = lambda_diverse(answers)\n", "\n", "print(f\"λ_diverse : {ld:.3f}\\n\")\n", "# Bucket the score using this demo's cutoffs (0.70 and 0.40)\n", "if ld > 0.70:\n", "    label = \"High diversity ✅\"\n", "elif ld > 0.40:\n", "    label = \"Medium diversity ⚠️\"\n", "else:\n", "    label = \"Low diversity 🚨\"\n", "print(label)" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "### Next Steps\n", "* Compare the diversity of answers from *top-k* sampling vs. nucleus sampling.\n", "* Combine with **e_resonance** to pick “diverse *and* on-topic” answers.\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "name": "python" } }, "nbformat": 4, "nbformat_minor": 5 }