BenchFlow is excited to announce SkillsBench 1.0, containing more than 100 expert-curated tasks that measure how well agents use skills across diverse and complex domains.
In collaboration with Kaggle, we're launching an afterparty for the ACM CAIS Agent Skills'26 workshop, featuring researchers and practitioners working on Skills design, benchmarking, optimization, security, and ecosystem infrastructure.
We'll feature live demos and talks:
How to create new benchmarks and RL environments using the BenchFlow SDK
Takeaways from building Kaggle's new Agent Benchmarks for open model evaluations
and more!
Spaces are limited, so sign up today! We're excited to see everyone at the venue :)
Hosted in partnership with Kernel Labs!