10 deep, unsolved problems inside the niha AI platform. Pick one — or bring your own. Ship it in a week. Change the game.
When a conversation exceeds 80% of the token budget, the compaction engine fires. It is supposed to summarize the conversation so the agent can continue with context intact. Here is what it actually does:
# compaction.py — _build_summary() — THIS IS THE ENTIRE IMPLEMENTATION
# Extracts: first 100 chars of each user message
# Extracts: tool call names (no arguments, no results)
# Extracts: last 200 chars of assistant messages
# Everything else? Destroyed.
The model continues responding but it's flying blind. It re-reads files it already analyzed. It forgets constraints. It contradicts decisions it already made.
Additionally, the token estimator uses len(text) // 4 — a heuristic that underestimates code tokens by 25-40%, causing compaction to trigger later than it should.
Replace len(text) // 4 with accurate token counting, wired into both agent_loop() and agent_loop_stream().

A summary produced today looks like this:

Topics discussed: authentication, middleware.
Tools used: read_file, edit_file, run_command.
Files: src/auth.py, src/middleware.py.
Last conclusion: "...the fix has been applied to the middleware chain."

A summary that actually preserved context would look more like:
{
  "decisions": [
    "Switched from session-based to JWT auth because Azure AD requires stateless tokens",
    "Added middleware ordering fix: auth must run before rate-limit (was reversed)"
  ],
  "code_changes": [
    {"file": "src/auth.py", "change": "Replaced session lookup with JWT decode"},
    {"file": "src/middleware.py", "change": "Reordered: auth → rate_limit → budget"}
  ],
  "constraints_discovered": [
    "Azure AD JWKS endpoint rate-limits at 100 req/min — must cache"
  ],
  "open_questions": [
    "Should token refresh happen client-side or via platform API proxy?"
  ]
}
Fast (~500ms) and cheap. The compaction prompt design is the real challenge.
The summary isn't for a human. It's for the next LLM turn. Decisions, constraints, file states.
The hybrid approach: summarize old turns, keep recent turns verbatim. Full detail for the last 5 turns, distilled decisions from turns 1-40.
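A sketch of that split, assuming turns are a simple list (the function name is hypothetical):

```python
def split_for_compaction(turns: list, keep_verbatim: int = 5):
    """Old turns go to the summarizer; recent turns survive verbatim."""
    if len(turns) <= keep_verbatim:
        return [], turns
    return turns[:-keep_verbatim], turns[-keep_verbatim:]

# Example: 40 turns in, turns 1-35 get summarized, turns 36-40 stay as-is
old, recent = split_for_compaction(list(range(1, 41)))
```

The summarizer then only ever sees `old`, so a bad summary can never clobber the turns the model is actively working from.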
The SDK has client.count_tokens(); using it eliminates the "context too large" 400 errors.
Extraction is regex: 18 hardcoded patterns decide what's "worth remembering." When a pattern fires, the entire raw message (500+ words) is saved — no extraction of the actual insight.
Retrieval is keyword Jaccard: "prod database connection" won't find "production postgresql" because exact match fails.
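That failure is easy to reproduce with a plain keyword Jaccard. This is a sketch of the described retrieval, not the actual code:

```python
def keyword_jaccard(a: str, b: str) -> float:
    """Similarity on exact word overlap: no stemming, no synonyms."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

# "prod" != "production", "database" != "postgresql": zero overlap
keyword_jaccard("prod database connection", "production postgresql")  # → 0.0
```

Embedding-based retrieval, or even simple stemming, would catch both near-misses.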
Eviction is blind LRU: Hard cap of 500 memories. Critical architectural decisions evicted because nobody accessed them recently.
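For contrast, a sketch of importance-aware eviction (the fields importance and last_access are assumptions about the memory record, not the platform's schema):

```python
import heapq

def evict_to_cap(memories: list[dict], cap: int = 500) -> list[dict]:
    """Keep the `cap` highest-scoring memories instead of pure LRU."""
    def score(m):
        # Importance dominates; recency only breaks ties among equals
        return m["importance"] * 1000 + m["last_access"]
    if len(memories) <= cap:
        return memories
    return heapq.nlargest(cap, memories, key=score)
```

Under this scoring, a critical architectural decision survives even if nothing has touched it in weeks.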
A /memories page to see, edit, pin, and delete memories; it renders with search, importance scores, and pin/delete controls.

agent_loop.py is 1,200+ lines with three implementations: sync (~358 lines), streaming (~373 lines), and architect (sync only). They share zero code.
# BUG 1: Hooks don't run on parallel tool calls in streaming mode
# BUG 2: Checkpoints miss edit summaries in streaming
# BUG 3: Architect mode only wraps sync — no streaming architect
The AG-UI event system (13 event types, SSE, EventBus) exists but the agent loop emits zero events.
The fix: one AgentLoop class with a single pipeline and a streaming/non-streaming output adapter.

9 agents exist as YAML files. Creating a new one means: open a text editor, write valid YAML, understand MCP extensions, know model tiers, define output schemas. A consultant can't do this.
# router.py — THE ACTUAL LOGIC
def route(message: str, tool_hints=None) -> str:
    msg_len = len(message)
    if msg_len < 200 and not tool_hints:
        return "fast"       # → Haiku
    elif msg_len > 2000 and any(k in message for k in ("architect", "design", "refactor")):
        return "powerful"   # → Opus
    else:
        return "standard"   # → Sonnet
"why is auth broken?" (22 chars) → routes to Haiku. Needs deep reasoning. Routing logic is duplicated between platform API and CLI with diverging keywords.
PII detection is detection-only (no redaction). Only 4 PII patterns. No prompt injection detection. No jailbreak detection. Regex-only.
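Detection without redaction is half the job. A minimal redaction pass might look like this sketch (two illustrative patterns only, nowhere near full PII coverage):

```python
import re

PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def redact(text: str) -> str:
    """Replace matched PII with a typed placeholder instead of just flagging it."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text
```

Typed placeholders ([EMAIL], [SSN]) keep the redacted text useful to the model downstream.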
Emit guardrail.violation webhooks.

WorkflowRunner → Executes sequentially. No persistence. No events.
WorkflowStore → Persists to SQLite. Nobody calls save_run() during execution.
EventBus → 13 AG-UI events. Runner never emits events.
WorkflowConfig → Condition engine handles only 3 operators.
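Extending the condition engine is mostly a lookup table. A sketch (the operator names here are assumptions; the source only says three exist):

```python
import operator
import re

OPS = {
    "eq": operator.eq,
    "ne": operator.ne,
    "gt": operator.gt,
    "lt": operator.lt,
    "gte": operator.ge,
    "lte": operator.le,
    "contains": lambda a, b: b in a,
    "matches": lambda a, b: re.search(b, a) is not None,
}

def evaluate(op: str, left, right) -> bool:
    # Unknown operator names fail loudly instead of silently evaluating False
    if op not in OPS:
        raise ValueError(f"unknown operator: {op}")
    return bool(OPS[op](left, right))
```

New operators become one-line additions to the table rather than new branches in the engine.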
Run independent steps in parallel with asyncio.gather().

# Issue 1: Auth disabled by default — any JWT accepted without validation
# Issue 2: API key → user=None → FULL ACCESS, no RBAC/rate-limit/budget
# Issue 3: cache_key = hash(model + message) — no user scoping
# Issue 4: Rate limiting in-memory + disabled by default
# Issue 5: Virtual keys not wired into main auth flow
# Issue 6: Agent cards exposed to anonymous callers
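For Issue 3, scoping the cache key by user is the core of the fix. A sketch (the real key derivation in the platform is an assumption):

```python
import hashlib

def cache_key(user_id: str, model: str, message: str) -> str:
    """User-scoped cache key: identical prompts no longer collide across users."""
    raw = f"{user_id}\x00{model}\x00{message}"  # NUL separators prevent field ambiguity
    return hashlib.sha256(raw.encode()).hexdigest()
```

Without the user_id component, one user's cached response can leak to another user asking the same question.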
search_by_entity("PostgreSQL") finds psycopg2/pg_connection chunks.

modified_args is built but never consumed — 90% built, 100% broken. Pass modified_args to tool execution.

Don't see your problem in the 10 listed challenges? No worries — you can bring your own. The same judging criteria, prizes, and timeline apply. Your project just needs to solve a real problem with AI.
Identify a real pain point — something your team or org deals with regularly. Not a toy demo.
AI must be core — the solution should use AI meaningfully, not as a bolt-on feature.
Ship something working — prototype is fine, but it must demo live on Demo Day.
Describe the problem clearly in your registration — judges need to understand the "why" as much as the "what."
AI-powered meeting summarizer, automated onboarding assistant, smart document search for your team.
Code review bot, automated test generator, deployment health checker, incident root-cause analyzer.
Proposal generator, intelligent project scoping, automated status reports, client communication assistant.
HR policy Q&A bot, expense anomaly detector, resource allocation optimizer, knowledge base builder.
Sign up today — takes 2 minutes. Come back during the week for the remaining steps.
We've logged your interest. Come back anytime this week to fill in your idea brief, solution plan, and final submission.
Great solutions start with a clearly understood problem. Research it well.
You have 1 week. Think MVP. Think AI-first. Think shipped.
You built it. Now show it. Submit before the deadline.
Your project is in the running. Judges will review after the deadline — shortlisted teams will be invited to Demo Day.