Commit graph

45 commits

Author SHA1 Message Date
Bradley Axen
38f5f338cb
fix: improve smoke test prompt for reliable tool calling (#6281)
Co-authored-by: Michael Neale <michael.neale@gmail.com>
2025-12-31 15:52:37 -08:00
Michael Neale
8ec6332738
fix: adding more open models (#6300) 2025-12-31 09:48:05 +11:00
Michael Neale
5ca7eb2305
chore: Update gemini versions in test_providers.sh (#6246) 2025-12-23 11:12:19 +11:00
Alex Hancock
7134e89c4b
feat: improved UX for tool calls via execute_code (#6205) 2025-12-22 10:42:20 -05:00
Michael Neale
d4814042e6
chore: cover code mode with end to end provider tests (#6183) 2025-12-19 12:02:06 +08:00
Jack Amadeo
7ff3adcc5f
Clean PR preview sites from gh-pages branch history (#6161) 2025-12-18 16:22:57 -05:00
Jack Amadeo
9fdb0356f0
Disallow subagents with no extensions (#5825) 2025-12-15 12:45:42 -05:00
tlongwell-block
a131b08817
refactor: unify subagent and subrecipe tools into single tool (#5893)
2025-12-13 13:50:20 -05:00
Michael Neale
7dd244eff6
chore: avoid accidentally using native tls again (#6086) 2025-12-12 11:35:52 +11:00
Douwe Osinga
5f50198318
feat: @goose in terminal (native terminal support) (#5887)
Co-authored-by: Bradley Axen <baxen@squareup.com>
Co-authored-by: Michael Neale <michael.neale@gmail.com>
Co-authored-by: Douwe Osinga <douwe@squareup.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-12-01 17:40:17 +11:00
David Katz
c1c772b267
Add out of context compaction test via error proxy (#5805) 2025-11-21 14:51:01 -05:00
Douwe Osinga
f4724cbf23
Comment out the flaky mcp callers (#5827)
Co-authored-by: Douwe Osinga <douwe@squareup.com>
2025-11-20 21:21:38 +01:00
Salvatore Testa
cfdf01567d
fix: support Gemini 3's thought signatures (#5806)
Signed-off-by: Salvatore Testa <sal@withpersona.com>
2025-11-20 16:28:27 +11:00
David Katz
1d8d6a1788
Provider error proxy for simulating various types of errors (#5091) 2025-11-18 17:28:07 -05:00
Michael Neale
2bef034303
feat: trying grok for live test (#5732)
2025-11-17 09:37:43 +11:00
Jack Amadeo
d4f66f4855
faster, cheaper (pick two): improve CI workflow and switch to free github runner (#5702)
Co-authored-by: Douwe Osinga <douwe@block.xyz>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-11-14 12:58:57 -05:00
Jack Amadeo
5110d32142
bump openapi version directly (#5674) 2025-11-11 10:15:42 -05:00
Alex Hancock
7ec3b84ad7
fix: gemini flash -> pro for mcp smoke tests (#5574) 2025-11-06 10:05:18 -05:00
David Katz
eb29083a52
Manual compaction test and fix (#5568) 2025-11-06 10:03:48 -05:00
Zane
89f7384d57
add clippy warning for string_slice (#5422)
Co-authored-by: Douwe Osinga <douwe@squareup.com>
2025-11-04 17:46:25 -05:00
Michael Neale
7511a533d6
we should run this on main and also test open models at least via ope… (#5556)
adds qwen3-code and GLM 4.6 to test_providers for open model coverage
2025-11-04 09:06:23 +11:00
Alex Hancock
38e7dc8f30
fix: remove qwen3-coder from provider/mcp smoke tests (#5551) 2025-11-03 14:33:49 -05:00
Alex Hancock
c1c13716e0
chore(tests/mcp): testing for MCP sampling (#5456) 2025-11-03 12:23:11 -05:00
Amed Rodriguez
d9633ff1d9
Change Recipes Test Script (#5457)
2025-10-30 16:00:25 -07:00
Michael Neale
b94535b679
testing tetrate with sonnet (#5428) 2025-10-29 11:40:02 +11:00
Amed Rodriguez
4687656487
Add Recipes Test Script (#5420) 2025-10-28 17:17:51 -07:00
Douwe Osinga
6b6c50976c
Gemini again (#5390)
Co-authored-by: Douwe Osinga <douwe@squareup.com>
2025-10-27 16:41:00 -04:00
Will Pfleger
044b227fdb
(re)Standardize Session Name Attribute (#5279)
2025-10-24 13:34:08 -04:00
Michael Neale
3c975bb358
live testing script (#5263)
Co-authored-by: Jack Amadeo <jackamadeo@squareup.com>
2025-10-21 16:39:58 +11:00
Douwe Osinga
64b37339e0
Skip subagents for gemini (#5257)
Co-authored-by: Douwe Osinga <douwe@squareup.com>
2025-10-18 17:35:29 -04:00
Michael Neale
890393bb68
Revert "Standardize Session Name Attribute" (#5250)
2025-10-18 12:44:30 -04:00
Will Pfleger
b8c3508178
Standardize Session Name Attribute (#5085) 2025-10-17 17:05:41 -04:00
Jack Amadeo
757ceb6109
chore: turn clippy on for test code (#4817)
2025-09-26 00:06:07 -04:00
Angie Jones
63f3669cf7
Remove deprecated Claude 3.5 models (#4590) 2025-09-10 14:41:02 -05:00
Jack Amadeo
7c2b40cc21
Clean up langfuse docs and scripts (#4220) 2025-08-20 10:46:31 -04:00
Jack Amadeo
dd504741a3
Remove cognitive complexity clippy lint (#4010) 2025-08-11 20:24:37 -04:00
Michael Neale
8f54fa84a5
fix: optimise reading large file content (#3767) 2025-08-06 09:38:52 +10:00
Lifei Zhou
48a38dc034
Chore: apply more clippy rules to prevent from code complexity (#3813) 2025-08-03 20:03:08 +10:00
Prem Pillai
f21b9017b8
fix: ensure retry-config and success-criteria are populated in openapi spec (#3575) 2025-07-22 19:39:35 +10:00
Alice Hau
be09849128
[feat] goosebenchv2 additions for eval post-processing (#2619)
Co-authored-by: Alice Hau <ahau@squareup.com>
2025-05-21 15:00:13 -04:00
marcelle
8fbd9eb327
feat: efficient benching (#1921)
Co-authored-by: Tyler Rockwood <rockwotj@gmail.com>
Co-authored-by: Kalvin C <kalvinnchau@users.noreply.github.com>
Co-authored-by: Alice Hau <110418948+ahau-square@users.noreply.github.com>
2025-04-08 14:43:43 -04:00
Alice Hau
bb4feacf03
feat: add additional goosebench evals (#1571)
Co-authored-by: Alice Hau <alice.a.hau@gmail.com>
2025-03-10 15:11:44 -04:00
marcelle
49dee048e4
feat: goose bench framework for functional and regression testing
Co-authored-by: Zaki Ali <zaki@squareup.com>
2025-03-05 21:23:00 -05:00
Bradley Axen
1c9a7c0b05
feat: V1.0 (#734)
Co-authored-by: Michael Neale <michael.neale@gmail.com>
Co-authored-by: Wendy Tang <wendytang@squareup.com>
Co-authored-by: Jarrod Sibbison <72240382+jsibbison-square@users.noreply.github.com>
Co-authored-by: Alex Hancock <alex.hancock@example.com>
Co-authored-by: Alex Hancock <alexhancock@block.xyz>
Co-authored-by: Lifei Zhou <lifei@squareup.com>
Co-authored-by: Wes <141185334+wesrblock@users.noreply.github.com>
Co-authored-by: Max Novich <maksymstepanenko1990@gmail.com>
Co-authored-by: Zaki Ali <zaki@squareup.com>
Co-authored-by: Salman Mohammed <smohammed@squareup.com>
Co-authored-by: Kalvin C <kalvinnchau@users.noreply.github.com>
Co-authored-by: Alec Thomas <alec@swapoff.org>
Co-authored-by: lily-de <119957291+lily-de@users.noreply.github.com>
Co-authored-by: kalvinnchau <kalvin@block.xyz>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Rizel Scarlett <rizel@squareup.com>
Co-authored-by: bwrage <bwrage@squareup.com>
Co-authored-by: Kalvin Chau <kalvin@squareup.com>
Co-authored-by: Alice Hau <110418948+ahau-square@users.noreply.github.com>
Co-authored-by: Alistair Gray <ajgray@stripe.com>
Co-authored-by: Nahiyan Khan <nahiyan.khan@gmail.com>
Co-authored-by: Alex Hancock <alexhancock@squareup.com>
Co-authored-by: Nahiyan Khan <nahiyan@squareup.com>
Co-authored-by: marcelle <1852848+laanak08@users.noreply.github.com>
Co-authored-by: Yingjie He <yingjiehe@block.xyz>
Co-authored-by: Yingjie He <yingjiehe@squareup.com>
Co-authored-by: Lily Delalande <ldelalande@block.xyz>
Co-authored-by: Adewale Abati <acekyd01@gmail.com>
Co-authored-by: Ebony Louis <ebony774@gmail.com>
Co-authored-by: Angie Jones <jones.angie@gmail.com>
Co-authored-by: Ebony Louis <55366651+EbonyLouis@users.noreply.github.com>
2025-01-24 13:04:43 -08:00
Salman Mohammed
8cf7b9f26c
refactor: move langfuse wrapper to a module in exchange instead of a package (#138)
Co-authored-by: Alice Hau <ahau@squareup.com>
2024-10-16 09:30:13 -04:00