Bradley Axen
38f5f338cb
fix: improve smoke test prompt for reliable tool calling ( #6281 )
Co-authored-by: Michael Neale <michael.neale@gmail.com>
2025-12-31 15:52:37 -08:00
Michael Neale
8ec6332738
fix: adding more open models ( #6300 )
2025-12-31 09:48:05 +11:00
Michael Neale
5ca7eb2305
chore: Update gemini versions in test_providers.sh ( #6246 )
2025-12-23 11:12:19 +11:00
Alex Hancock
7134e89c4b
feat: improved UX for tool calls via execute_code ( #6205 )
2025-12-22 10:42:20 -05:00
Michael Neale
d4814042e6
chore: cover code mode with end to end provider tests ( #6183 )
2025-12-19 12:02:06 +08:00
Jack Amadeo
7ff3adcc5f
Clean PR preview sites from gh-pages branch history ( #6161 )
2025-12-18 16:22:57 -05:00
Jack Amadeo
9fdb0356f0
Disallow subagents with no extensions ( #5825 )
2025-12-15 12:45:42 -05:00
tlongwell-block
a131b08817
refactor: unify subagent and subrecipe tools into single tool ( #5893 )
2025-12-13 13:50:20 -05:00
Michael Neale
7dd244eff6
chore: avoid accidentally using native tls again ( #6086 )
2025-12-12 11:35:52 +11:00
Douwe Osinga
5f50198318
feat: @goose in terminal (native terminal support) ( #5887 )
Co-authored-by: Bradley Axen <baxen@squareup.com>
Co-authored-by: Michael Neale <michael.neale@gmail.com>
Co-authored-by: Douwe Osinga <douwe@squareup.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-12-01 17:40:17 +11:00
David Katz
c1c772b267
Add out of context compaction test via error proxy ( #5805 )
2025-11-21 14:51:01 -05:00
Douwe Osinga
f4724cbf23
Comment out the flaky mcp callers ( #5827 )
Co-authored-by: Douwe Osinga <douwe@squareup.com>
2025-11-20 21:21:38 +01:00
Salvatore Testa
cfdf01567d
fix: support Gemini 3's thought signatures ( #5806 )
Signed-off-by: Salvatore Testa <sal@withpersona.com>
2025-11-20 16:28:27 +11:00
David Katz
1d8d6a1788
Provider error proxy for simulating various types of errors ( #5091 )
2025-11-18 17:28:07 -05:00
Michael Neale
2bef034303
feat: trying grok for live test ( #5732 )
2025-11-17 09:37:43 +11:00
Jack Amadeo
d4f66f4855
faster, cheaper (pick two): improve CI workflow and switch to free github runner ( #5702 )
Co-authored-by: Douwe Osinga <douwe@block.xyz>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-11-14 12:58:57 -05:00
Jack Amadeo
5110d32142
bump openapi version directly ( #5674 )
2025-11-11 10:15:42 -05:00
Alex Hancock
7ec3b84ad7
fix: gemini flash -> pro for mcp smoke tests ( #5574 )
2025-11-06 10:05:18 -05:00
David Katz
eb29083a52
Manual compaction test and fix ( #5568 )
2025-11-06 10:03:48 -05:00
Zane
89f7384d57
add clippy warning for string_slice ( #5422 )
Co-authored-by: Douwe Osinga <douwe@squareup.com>
2025-11-04 17:46:25 -05:00
Michael Neale
7511a533d6
we should run this on main and also test open models at least via ope… ( #5556 )
adds qwen3-code and GLM 4.6 to test_providers for open model coverage
2025-11-04 09:06:23 +11:00
Alex Hancock
38e7dc8f30
fix: remove qwen3-coder from provider/mcp smoke tests ( #5551 )
2025-11-03 14:33:49 -05:00
Alex Hancock
c1c13716e0
chore(tests/mcp): testing for MCP sampling ( #5456 )
2025-11-03 12:23:11 -05:00
Amed Rodriguez
d9633ff1d9
Change Recipes Test Script ( #5457 )
2025-10-30 16:00:25 -07:00
Michael Neale
b94535b679
testing tetrate with sonnet ( #5428 )
2025-10-29 11:40:02 +11:00
Amed Rodriguez
4687656487
Add Recipes Test Script ( #5420 )
2025-10-28 17:17:51 -07:00
Douwe Osinga
6b6c50976c
Gemini again ( #5390 )
Co-authored-by: Douwe Osinga <douwe@squareup.com>
2025-10-27 16:41:00 -04:00
Will Pfleger
044b227fdb
(re)Standardize Session Name Attribute ( #5279 )
2025-10-24 13:34:08 -04:00
Michael Neale
3c975bb358
live testing script ( #5263 )
Co-authored-by: Jack Amadeo <jackamadeo@squareup.com>
2025-10-21 16:39:58 +11:00
Douwe Osinga
64b37339e0
Skip subagents for gemini ( #5257 )
Co-authored-by: Douwe Osinga <douwe@squareup.com>
2025-10-18 17:35:29 -04:00
Michael Neale
890393bb68
Revert "Standardize Session Name Attribute" ( #5250 )
2025-10-18 12:44:30 -04:00
Will Pfleger
b8c3508178
Standardize Session Name Attribute ( #5085 )
2025-10-17 17:05:41 -04:00
Jack Amadeo
757ceb6109
chore: turn clippy on for test code ( #4817 )
2025-09-26 00:06:07 -04:00
Angie Jones
63f3669cf7
Remove deprecated Claude 3.5 models ( #4590 )
2025-09-10 14:41:02 -05:00
Jack Amadeo
7c2b40cc21
Clean up langfuse docs and scripts ( #4220 )
2025-08-20 10:46:31 -04:00
Jack Amadeo
dd504741a3
Remove cognitive complexity clippy lint ( #4010 )
2025-08-11 20:24:37 -04:00
Michael Neale
8f54fa84a5
fix: optimise reading large file content ( #3767 )
2025-08-06 09:38:52 +10:00
Lifei Zhou
48a38dc034
Chore: apply more clippy rules to prevent from code complexity ( #3813 )
2025-08-03 20:03:08 +10:00
Prem Pillai
f21b9017b8
fix: ensure retry-config and success-criteria are populated in openapi spec ( #3575 )
2025-07-22 19:39:35 +10:00
Alice Hau
be09849128
[feat] goosebenchv2 additions for eval post-processing ( #2619 )
Co-authored-by: Alice Hau <ahau@squareup.com>
2025-05-21 15:00:13 -04:00
marcelle
8fbd9eb327
feat: efficient benching ( #1921 )
Co-authored-by: Tyler Rockwood <rockwotj@gmail.com>
Co-authored-by: Kalvin C <kalvinnchau@users.noreply.github.com>
Co-authored-by: Alice Hau <110418948+ahau-square@users.noreply.github.com>
2025-04-08 14:43:43 -04:00
Alice Hau
bb4feacf03
feat: add additional goosebench evals ( #1571 )
Co-authored-by: Alice Hau <alice.a.hau@gmail.com>
2025-03-10 15:11:44 -04:00
marcelle
49dee048e4
feat: goose bench framework for functional and regression testing
Co-authored-by: Zaki Ali <zaki@squareup.com>
2025-03-05 21:23:00 -05:00
Bradley Axen
1c9a7c0b05
feat: V1.0 ( #734 )
Co-authored-by: Michael Neale <michael.neale@gmail.com>
Co-authored-by: Wendy Tang <wendytang@squareup.com>
Co-authored-by: Jarrod Sibbison <72240382+jsibbison-square@users.noreply.github.com>
Co-authored-by: Alex Hancock <alex.hancock@example.com>
Co-authored-by: Alex Hancock <alexhancock@block.xyz>
Co-authored-by: Lifei Zhou <lifei@squareup.com>
Co-authored-by: Wes <141185334+wesrblock@users.noreply.github.com>
Co-authored-by: Max Novich <maksymstepanenko1990@gmail.com>
Co-authored-by: Zaki Ali <zaki@squareup.com>
Co-authored-by: Salman Mohammed <smohammed@squareup.com>
Co-authored-by: Kalvin C <kalvinnchau@users.noreply.github.com>
Co-authored-by: Alec Thomas <alec@swapoff.org>
Co-authored-by: lily-de <119957291+lily-de@users.noreply.github.com>
Co-authored-by: kalvinnchau <kalvin@block.xyz>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Rizel Scarlett <rizel@squareup.com>
Co-authored-by: bwrage <bwrage@squareup.com>
Co-authored-by: Kalvin Chau <kalvin@squareup.com>
Co-authored-by: Alice Hau <110418948+ahau-square@users.noreply.github.com>
Co-authored-by: Alistair Gray <ajgray@stripe.com>
Co-authored-by: Nahiyan Khan <nahiyan.khan@gmail.com>
Co-authored-by: Alex Hancock <alexhancock@squareup.com>
Co-authored-by: Nahiyan Khan <nahiyan@squareup.com>
Co-authored-by: marcelle <1852848+laanak08@users.noreply.github.com>
Co-authored-by: Yingjie He <yingjiehe@block.xyz>
Co-authored-by: Yingjie He <yingjiehe@squareup.com>
Co-authored-by: Lily Delalande <ldelalande@block.xyz>
Co-authored-by: Adewale Abati <acekyd01@gmail.com>
Co-authored-by: Ebony Louis <ebony774@gmail.com>
Co-authored-by: Angie Jones <jones.angie@gmail.com>
Co-authored-by: Ebony Louis <55366651+EbonyLouis@users.noreply.github.com>
2025-01-24 13:04:43 -08:00
Salman Mohammed
8cf7b9f26c
refactor: move langfuse wrapper to a module in exchange instead of a package ( #138 )
Co-authored-by: Alice Hau <ahau@squareup.com>
2024-10-16 09:30:13 -04:00