Beyond Episodes: Infrastructure, Evaluation, & Benchmarking for Long-Running Agents
With Jay Ram (Founder/CEO, Hud), Andy Lyu (Founder/CTO, Osmosi), Zhengyang Qi (Research Scientist, Snorkel AI), Vivek Pandit (Frontier AI Lead, Turing), Pramanya Guda (Pacer, Daytona), Muhammad Hashmi (DevRel, Daytona).
Venue, To Be Announced, San Francisco
Apr 29 (Wed), 2026 @ 05:30 PM
FREE
 
Register
 
 

 
DETAILS

Beyond Episodes: Infrastructure, Evaluation, & Benchmarking for Long-Running Agents

For thirty years, RL has been built on a simple premise: episodes are brief, state is cheap, & you can always start over. Today's long-running agents violate all three - they run for days, accumulate irreplaceable environment state, & branch across speculative decision trees. The tooling we inherited wasn't built for this.

On Wednesday, April 29, Daytona & FounderCoHo are again co-hosting an exclusive, high-signal evening, dedicated to researchers at Stanford University, to explore what happens when we take long-horizon, stateful agents seriously - from the infrastructure that makes them possible to the evaluation frameworks that make them trustworthy.

Agenda
5:30 pm - 5:35 pm
Welcome & Opening Remarks
Pramanya Guda, Community Ambassador - Pacer at Daytona

5:35 pm - 5:50 pm
Talk "Today's Agents Don't Live In Episodes"
Muhammad Annas Hashmi, DevRel at Daytona

Outline:
The 'episode' (short, stateless, resettable) has been RL's foundational abstraction since Atari. It underpins the Gym API, GRPO, PPO, & the conventional sandbox lifecycle. Today's agents no longer fit it. Tasks span days; the env state at hour 18 of an agent session (warm caches, installed deps, live processes, open sockets, a dirty git tree) is worth hours of wall clock to reproduce.
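
The episode contract that paragraph refers to can be sketched in a few lines. The env below is a toy stand-in (not any speaker's code) but follows the Gym-style reset/step shape: reset() rebuilds all state from scratch, step() advances one tick, & nothing survives between episodes.

```python
# Toy Gym-style env: short, stateless, resettable.
class CountdownEnv:
    """Start at 3; each step decrements; episode ends at 0."""

    def reset(self):
        self.state = 3          # all env state rebuilt from scratch
        return self.state

    def step(self, action):
        self.state -= 1
        terminated = self.state == 0
        reward = 1.0 if terminated else 0.0
        return self.state, reward, terminated

env = CountdownEnv()
obs = env.reset()
total = 0.0
terminated = False
while not terminated:
    obs, reward, terminated = env.step(action=None)
    total += reward
# One short, self-contained episode; reset() discards everything.
```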

Three things are scaling simultaneously. Rollout horizon: seconds -> days. Env state: disposable between episodes -> first-class learning substrate. Branching: absent in modern LLM-RL -> speculative fork trees. Each stresses the inherited toolkit in a different way, & all three have been gated on the same missing primitives: VMs you can fork cheaply, pause without killing processes, snapshot mid-run, & resume hours later.
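
As an illustrative sketch only (a toy in-memory model, not Daytona's actual API), the four missing primitives the outline names - cheap fork, pause without killing processes, mid-run snapshot, later resume - might look like:

```python
import copy

class Sandbox:
    """Hypothetical forkable sandbox; state is a plain dict here."""

    def __init__(self, state=None):
        self.state = state if state is not None else {"deps": [], "cache": {}}
        self.paused = False

    def fork(self):
        # Copy-on-write in a real system; deep copy in this toy.
        return Sandbox(copy.deepcopy(self.state))

    def pause(self):
        self.paused = True              # processes survive, clock stops

    def snapshot(self):
        return copy.deepcopy(self.state)

    @classmethod
    def resume(cls, snap):
        return cls(copy.deepcopy(snap))

# Speculative branching: fork mid-trajectory, try two plans in parallel.
base = Sandbox()
base.state["deps"].append("torch")      # hours of warm-up, done once
branch_a, branch_b = base.fork(), base.fork()
branch_a.state["cache"]["plan"] = "A"
branch_b.state["cache"]["plan"] = "B"   # base is untouched by either branch

snap = base.snapshot()                  # persist the hour-18 state
later = Sandbox.resume(snap)            # pick up hours later
```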

This talk walks through what opens up when those primitives become available. Live demo of long-horizon sessionful rollouts, mid-trajectory forking, & cross-calendar-time training. The research questions that follow (long-horizon benchmarks, speculative RL algorithms, event-driven training, to name a few) are where the next wave of agent RL gets built.

5:50 pm - 6:05 pm
Talk "Closing the Visibility Gap: Lessons from Safety Critical Agentic Systems"
Vivek Pandit, Frontier AI Lead at Turing

Outline:
AI agents are moving from demos to production, but their success depends on how well we can evaluate, benchmark, & trust them in high-stakes workflows. This talk explores why traditional software metrics & static benchmarks fall short for agentic systems, especially when agents must reason, plan, call tools, recover from failure, & operate over long horizons. I'll argue for evaluation frameworks that treat execution traces, reasoning trajectories, & tool interactions as first-class signals, alongside outcome-based metrics such as task success, pass rates, coverage, & behavioral robustness.

To ground these ideas, the talk draws from chip design verification, where over 60% of development time is spent validating design intent against complex specifications. Verification is not just a tooling problem but a reasoning problem, making it a strong testbed for agent evaluation. I'll share lessons from building agents that interoperate with EDA toolchains, coordinate across stages like mental model formation, test planning, testbench generation, & run & debug, & use auto-correction loops to safely adapt from tool feedback. The broader lesson is that better observability & domain-aware benchmarking are essential for deploying reliable agents in production.
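
A minimal sketch of what scoring traces as first-class signals could mean in practice, using a hypothetical trace schema (not Turing's framework): pass rate is computed alongside tool-call error rate & recovery-after-failure, all read from the same execution traces.

```python
# Hypothetical execution traces: outcome plus per-step tool calls.
traces = [
    {"task": "t1", "passed": True,
     "steps": [{"tool": "sim", "ok": True}, {"tool": "lint", "ok": True}]},
    {"task": "t2", "passed": False,
     "steps": [{"tool": "sim", "ok": False}, {"tool": "sim", "ok": True}]},
]

# Outcome-based metric: fraction of tasks that succeeded.
pass_rate = sum(t["passed"] for t in traces) / len(traces)

# Trace-level signal: how often tool calls failed.
calls = [s for t in traces for s in t["steps"]]
tool_error_rate = sum(not s["ok"] for s in calls) / len(calls)

# Trace-level signal: failed tool call followed by a successful retry.
recoveries = sum(
    1
    for t in traces
    for a, b in zip(t["steps"], t["steps"][1:])
    if not a["ok"] and b["ok"] and a["tool"] == b["tool"]
)
```

Note how the second trace fails on the outcome metric yet shows a recovery in its trace, exactly the kind of behavior a pass rate alone cannot see.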

6:05 pm - 6:20 pm
Talk "TBA"
Jay Ram, Founder & CEO at hud (YC W25)

Outline:
TBA

6:20 pm - 6:35 pm
Talk "Building Production RL Training Pipelines with Scalable Sandboxes for Agent Execution"
Andy Lyu, Co-Founder & CTO at Osmosis (YC W25)

Outline:
Osmosis is building an RL platform that enables developers to easily fine-tune open source models that can outperform foundation models. A core RL infrastructure challenge is container orchestration (i.e. spinning up & terminating thousands of containers rapidly). We discuss how we designed a production-grade RL pipeline to train Qwen3.5 MoE models, specifically covering our use of FP8 quantization, LoRA RL, & Daytona sandboxes.
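
The orchestration pattern the outline describes - spin up many sandboxes concurrently, run one rollout in each, terminate promptly, bound concurrency - can be sketched with a thread pool. The sandbox & reward below are stand-ins, not Osmosis's pipeline:

```python
from concurrent.futures import ThreadPoolExecutor

def run_rollout(task_id):
    sandbox = {"id": task_id, "running": True}  # stand-in for a container
    reward = task_id % 2                        # stand-in for env reward
    sandbox["running"] = False                  # terminate promptly
    return reward

# Bounded concurrency: at most 64 sandboxes alive at once for 1000 rollouts.
with ThreadPoolExecutor(max_workers=64) as pool:
    rewards = list(pool.map(run_rollout, range(1000)))

mean_reward = sum(rewards) / len(rewards)       # feeds the RL update step
```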

6:35 pm - 6:50 pm
Talk "Automating Benchmark Design"
Zhengyang (Jason) Qi, Research Scientist at Snorkel AI

Outline:
The talk will cover how we actively design evaluations through iterative rollouts, & how Daytona integrates with this workflow & helps us with Terminal Bench & Harbor rollouts.

6:50 pm - 8:30 pm
Networking
With food & beverages

________________________

About event
An engaging meetup designed for AI researchers to connect, share ideas, & explore the latest advancements in artificial intelligence. The event features informal networking, short talks, & discussions on current research trends, fostering collaboration & knowledge exchange within the AI community.
 
 
 
 
© 2026 GarysGuide