t.a.c Internal Hackathon · Q1 2026

t.a.c AI Challenge

10 deep, unsolved problems inside the niha AI platform. Pick one — or bring your own. Ship it in a week. Change the game.

10+
Challenges
7
Days
₹85K
Prize Pool
Impact
📣

Registrations Open This Friday!

Friday, 13 March 2026 at 10:00 AM IST — registrations go live and the challenge problems are unlocked. Use Friday to explore the challenges, form your team, and plan your approach. The build week runs Saturday 14 March – Friday 20 March. Demo Day follows.

Registrations Open
Fri, 13 Mar 2026 · 10:00 AM IST
Submissions Close
Fri, 20 Mar 2026 · 11:59 PM IST
Prizes
Top 3 teams win cash prizes. 1st place ships to production.
🏆
1st Place
₹50,000
Ships to production. Changes the platform.
🥈
2nd Place
₹25,000
Merges to main. Ready for polish.
🥉
3rd Place
₹10,000
Proves the concept. Path to ship is clear.
How It Works
📣
Thu, 12 Mar
Announcement
Challenge announced. Explore the 10 problems. Start thinking.
📝
Fri, 13 Mar
Registrations Open
Register your team. Pick a challenge (or bring your own). Plan your approach.
🔨
14 — 20 Mar
Build Week
7 days to build. Use any AI tool. Full codebase access. Ship something real.
🎧
23 — 27 Mar
Demo Day & Winners
Shortlisted teams present live. Judges review. Winners announced early next week.
Judging Criteria
Projects are scored out of 100 across six dimensions. Scores are transparent — no surprises.
🛠
Code Quality 20 pts
Clean, well-structured, maintainable code. Tests, documentation, and good engineering practices.
🌍
Production Ready 20 pts
The product must be live-ready — not a prototype that "works on my machine." Deployed or deployable.
🎯
Solves a Real Pain 20 pts
A key pain point must be clearly identified and demonstrably solved. Evidence matters.
🎯
Accuracy 15 pts
The solution must work correctly. Edge cases handled. Results are reliable and trustworthy.
🚀
Impact 15 pts
Does this meaningfully change how teams work? Scale of impact matters — from one team to the whole org.
💪
Hard to Replace 10 pts
Is this a moat, not just a feature? Does it create lasting value that's difficult to replicate?
View full scoring rubric, prize thresholds & score calculator →

Rules & Guidelines

1
Pick any challenge from the 10 listed — or bring your own idea
2
Work solo or in a team (max 3 people)
3
Full access to the niha codebase for listed challenges
4
Use any AI tool — Claude, ChatGPT, Copilot, Cursor, etc.
5
Submit a working project with a GitHub repo and README
6
Demo on Demo Day — live, working code only
The Challenges
10 real, unsolved problems — or bring your own. Read carefully. Go deep.
Challenge #1

Perfect Recall

LLM-Powered Context Compaction That Preserves Meaning
▲ Hard Agent Loop Core Prompt Engineering Python

🔴 The Problem

Every long conversation with niha silently degrades — and the user never knows.

When a conversation exceeds 80% of the token budget, the compaction engine fires. It is supposed to summarize the conversation so the agent can continue with context intact. Here is what it actually does:

# compaction.py — _build_summary() — THIS IS THE ENTIRE IMPLEMENTATION
# Extracts: first 100 chars of each user message
# Extracts: tool call names (no arguments, no results)
# Extracts: last 200 chars of assistant messages
# Everything else? Destroyed.

The model continues responding but it's flying blind. It re-reads files it already analyzed. It forgets constraints. It contradicts decisions it already made.

Additionally, the token estimator uses len(text) // 4 — a heuristic that underestimates code tokens by 25-40%, causing compaction to trigger later than it should.

🎯 What You Must Solve

  1. Replace regex-based compaction with LLM-powered semantic summarization. Use an async Haiku call to produce a structured summary preserving: decisions, code changes, files modified, constraints discovered, open questions, and reasoning chains.
  2. Design the summary schema. The compacted output must be structured (not free-form prose) so the agent loop can inject it reliably.
  3. Fix the token estimator. Replace len(text) // 4 with accurate token counting.
  4. Build a quality benchmark. 10+ real multi-turn sessions, compare old vs new, measure accuracy post-compaction.
  5. Wire it in. Integrate into both agent_loop() and agent_loop_stream().

🔄 Input / Output Examples

▶ Before (Current Regex Compaction)

Topics discussed: authentication, middleware.
Tools used: read_file, edit_file, run_command.
Files: src/auth.py, src/middleware.py.
Last conclusion: "...the fix has been applied
to the middleware chain."

▶ After (Your LLM Compaction)

{
  "decisions": [
    "Switched from session-based to JWT auth
     because Azure AD requires stateless tokens",
    "Added middleware ordering fix: auth must
     run before rate-limit (was reversed)"
  ],
  "code_changes": [
    {"file": "src/auth.py", "change": "Replaced
      session lookup with JWT decode"},
    {"file": "src/middleware.py", "change":
      "Reordered: auth → rate_limit → budget"}
  ],
  "constraints_discovered": [
    "Azure AD JWKS endpoint rate-limits at
     100 req/min — must cache"
  ],
  "open_questions": [
    "Should token refresh happen client-side
     or via platform API proxy?"
  ]
}

Success Criteria

  • Semantic fidelity: Post-compaction, the agent answers 5 questions about the session with ≥90% accuracy. The current implementation scores <20%.
  • Compression: Token count still reduced by ≥60% — no regression versus the current compactor.
  • Token counting accuracy: Within ±10% of actual.
  • Latency: Haiku summarization adds <2 seconds.
  • Tests: Unit tests, integration tests, benchmark suite.

📁 Key Files to Study

  • src/cli/src/tac_cli/core/compaction.py — The 150-line file to replace
  • src/cli/src/tac_cli/core/agent_loop.py:622-677 — Compaction trigger in sync loop
  • src/cli/src/tac_cli/core/agent_loop.py:1050-1090 — Compaction trigger in streaming loop
  • src/cli/src/tac_cli/context/context_engine.py:320 — The len(text)//4 token estimator

💡 Hints

Use Haiku for Summarization

Fast (~500ms) and cheap. The compaction prompt design is the real challenge.

Think About What the Agent Needs

The summary isn't for a human. It's for the next LLM turn. Decisions, constraints, file states.

Progressive Summarization

Summarize old turns, keep recent turns verbatim. Detail for last 5 turns, decisions from turn 1-40.
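A minimal sketch of that split (function name and shapes are illustrative, not from the codebase):

```python
def split_for_compaction(turns, keep_recent=5):
    """Partition a conversation for progressive summarization.

    Old turns go to the summarizer; the most recent turns are kept
    verbatim so the agent retains fine-grained detail where it matters.
    """
    if keep_recent <= 0:
        return list(turns), []
    return turns[:-keep_recent], turns[-keep_recent:]
```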

Anthropic Token Counter

The Anthropic SDK exposes token counting via client.messages.count_tokens(). Using it eliminates the "context too large" 400 errors.
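A hedged sketch of an estimator that prefers the API count and falls back to a denser heuristic for code. The model name and the 3-chars-per-token ratio are assumptions, not calibrated constants:

```python
def count_tokens(text, client=None, model="claude-3-5-haiku-latest"):
    """Count tokens via the Anthropic API when a client is available.

    Offline, fall back to a character heuristic. The len//4 rule
    undercounts code, so code-looking text uses a denser ratio
    (an assumption — calibrate against real API counts).
    """
    if client is not None:
        resp = client.messages.count_tokens(
            model=model, messages=[{"role": "user", "content": text}]
        )
        return resp.input_tokens
    looks_like_code = any(s in text for s in ("def ", "{", ";", "=>"))
    ratio = 3.0 if looks_like_code else 4.0
    return max(1, int(len(text) / ratio))
```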

Challenge #2

Team Brain

Semantic Organizational Memory That Actually Remembers
▲ Hard Full Stack NLP / Embeddings Python + React

🔴 The Problem

The "organizational memory" is 18 regex patterns: it can't understand what you said, it can't find what it stored, and it silently deletes your most important decisions.

Extraction is regex: 18 hardcoded patterns decide what's "worth remembering." When a pattern fires, the entire raw message (500+ words) is saved — no extraction of the actual insight.

Retrieval is keyword Jaccard: "prod database connection" won't find "production postgresql" because exact match fails.

Eviction is blind LRU: Hard cap of 500 memories. Critical architectural decisions evicted because nobody accessed them recently.
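A toy reproduction of why keyword overlap fails on exactly this query (simplified from the real store):

```python
def jaccard(a, b):
    """Keyword-overlap similarity — roughly how current retrieval scores."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

# "prod database connection" vs "production postgresql setup" share
# zero exact tokens, so keyword retrieval scores the pair 0.0 even
# though the meaning is nearly identical.
```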

🎯 What You Must Solve

  1. LLM-powered memory extraction: Replace regex with Haiku calls that extract clean, atomic memory statements.
  2. Vector-based semantic retrieval: Replace Jaccard with ChromaDB embedding search.
  3. Importance-weighted eviction: Architectural decisions > preferences > casual mentions.
  4. Team memory scope: Project-scoped memories visible to all team members.
  5. Memory curation UI: /memories page — see, edit, pin, delete memories.
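Importance-weighted eviction (item 3) can be sketched like this. The dict shape ('importance', 'last_access') is a hypothetical schema, not the real store's:

```python
import time

def eviction_candidates(memories, capacity=500, pin_threshold=0.8, now=None):
    """Rank memories for eviction by importance, with recency as tiebreak.

    Memories at or above the pin threshold are never evicted,
    no matter how stale they are.
    """
    now = now or time.time()
    if len(memories) <= capacity:
        return []
    evictable = [m for m in memories if m["importance"] < pin_threshold]

    def score(m):
        # Lowest score goes first: importance dominates, age nudges.
        age_days = (now - m["last_access"]) / 86400
        return m["importance"] - 0.01 * age_days

    evictable.sort(key=score)
    return evictable[: len(memories) - capacity]
```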

Success Criteria

  • Extraction precision: ≥80% of memory-worthy statements correctly identified from 50 messages.
  • Retrieval recall: Correct memory in top-3 results ≥85% of the time (current: <30%).
  • Importance eviction: Memories with importance ≥0.8 preserved regardless of recency.
  • Team visibility: Two users on same project see each other's memories.
  • Web UI: /memories page renders with search, importance scores, pin/delete.

📁 Key Files

  • src/cli/src/tac_cli/context/memory_extractor.py — 18 regex patterns to replace
  • src/cli/src/tac_cli/context/memory_store.py — SQLite store with Jaccard retrieval
  • src/extensions/tac-context/ — ChromaDB — reuse for memory embeddings
Challenge #3

One Loop to Rule Them All

Unified Agent Loop + Real-Time Observability
▲▲ Very Hard Deep Refactor Agent Core Full Stack

🔴 The Problem

The agent loop is duplicated three times. The copies are already diverging — one divergence is a security bug.

agent_loop.py is 1,200+ lines with three implementations: sync (~358 lines), streaming (~373 lines), and architect (sync only). They share zero code.

# BUG 1: Hooks don't run on parallel tool calls in streaming mode
# BUG 2: Checkpoints miss edit summaries in streaming
# BUG 3: Architect mode only wraps sync — no streaming architect

The AG-UI event system (13 event types, SSE, EventBus) exists but the agent loop emits zero events.

🎯 What You Must Solve

  1. Unify into a single AgentLoop class with one pipeline and streaming/non-streaming output adapter.
  2. Fix the three divergence bugs.
  3. Wire AG-UI events — emit tool_call.start/end, text.delta, step events.
  4. Build a live agent run viewer in the web console.
  5. Zero regressions on existing tests.

Success Criteria

  • Single implementation: No duplicated logic.
  • Bugs fixed: Hooks in streaming, checkpoints with summaries, architect in both modes.
  • Events emitted: Every tool call and text generation emits AG-UI events.
  • Web viewer: Real-time tool calls and streaming text visible in console.
  • Zero regressions: All existing tests pass.

📁 Key Files

  • src/cli/src/tac_cli/core/agent_loop.py — 1,200-line file to unify
  • src/cli/src/tac_cli/workflow/workflow_events.py — AG-UI events + EventBus
  • src/web/src/hooks/useSSE.ts — SSE client — extend for agent events
Challenge #4

Agent Forge

No-Code Agent Builder + Marketplace for Everyone
▲ Hard Full Stack React + Python UX Design

🔴 The Problem

"Non-technical users can build agents." Today, building an agent requires writing YAML by hand. The web builder is a skeleton with no functionality.

9 agents exist as YAML files. Creating a new one means: open a text editor, write valid YAML, understand MCP extensions, know model tiers, define output schemas. A consultant can't do this.

🎯 What You Must Solve

  1. Fully functional Agent Builder UI — guided form with live YAML preview and test sandbox.
  2. Agent Marketplace page — browse all agents, fork & customize, categories, star/favorite.
  3. Wire the Virtual Keys page for API access.

Success Criteria

  • End-to-end: Non-technical user creates, tests, publishes an agent in <10 minutes.
  • Valid output: Generated YAML passes the validator.
  • Test sandbox: "Try this agent" sends a real request and shows output.
  • Marketplace: All 9 built-in agents browsable with Fork button.

📁 Key Files

  • src/agents/*.yaml — 9 existing agents — the schema to produce
  • src/web/src/pages/admin/AgentBuilder.tsx — Skeleton to complete
  • src/platform-api/src/platform_api/agents_api.py — Agent CRUD endpoints
Challenge #5

Smart Router

Semantic Model Routing with Feedback Learning
▲ Hard Platform API ML / Classification Analytics

🔴 The Problem

Every request passes through a model router that decides: Haiku, Sonnet, or Opus. It uses message length and 12 keywords. It's wrong constantly.
# router.py — THE ACTUAL LOGIC
if msg_len < 200 and not tool_hints:
    return "fast"          # → Haiku
elif msg_len > 2000 and keyword in ["architect", "design", "refactor"]:
    return "powerful"      # → Opus
else:
    return "standard"      # → Sonnet

"why is auth broken?" (19 chars) → routes to Haiku, even though it needs deep reasoning. The routing logic is also duplicated between the platform API and the CLI, with diverging keyword lists.

🎯 What You Must Solve

  1. Semantic routing classifier using a Haiku meta-call.
  2. Feedback loop from implicit user signals.
  3. Unify routing logic — single implementation.
  4. Routing analytics dashboard.
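The semantic classifier (item 1) can be shaped so the meta-call is injectable — a Haiku call in production, a stub in tests. The prompt text and tier names here are illustrative assumptions:

```python
ROUTING_PROMPT = """Classify the request below into exactly one tier:
fast (lookups, small edits), standard (typical coding tasks),
powerful (architecture, deep debugging, multi-file refactors).
Answer with the tier name only.

Request: {message}"""

TIER_TO_MODEL = {"fast": "haiku", "standard": "sonnet", "powerful": "opus"}

def route(message, classify):
    """Route via a semantic meta-call.

    `classify` is any callable taking a prompt and returning a label.
    Unknown labels fall back to the middle tier rather than failing
    the request.
    """
    label = classify(ROUTING_PROMPT.format(message=message)).strip().lower()
    return TIER_TO_MODEL.get(label, "sonnet")
```

Because the classifier is a plain callable, the 50-query human-label benchmark in the success criteria can run against recorded outputs without touching the API.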

Success Criteria

  • Accuracy: ≥80% agreement with human labels (current: ~50%).
  • Cost: ≥20% reduction from Opus misroutes.
  • Latency: Pre-flight adds <800ms.
  • Unified: CLI and API use same routing logic.

📁 Key Files

  • src/platform-api/src/platform_api/router.py:773-893 — Keyword classifier to replace
  • src/cli/src/tac_cli/context/context_engine.py:292-316 — CLI-side mirror to eliminate
Challenge #6

Guardian

Production-Grade Guardrails That Actually Guard
▲▲ Very Hard Security Platform API NLP

🔴 The Problem

The guardrail system has a config page, an engine, and webhook events — but is never called during request processing. Zero runtime enforcement. A locked door with no wall.

PII detection is detection-only (no redaction). Only 4 PII patterns. No prompt injection detection. No jailbreak detection. Regex-only.

🎯 What You Must Solve

  1. Wire guardrails into the request pipeline — input & output checking.
  2. PII redaction with typed placeholders before LLM call.
  3. Prompt injection detection.
  4. Fire guardrail.violation webhooks.
  5. Violation dashboard in web console.
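Typed-placeholder redaction (item 2) with restore-on-response can be sketched as below. The two patterns are illustrative only — production needs a far wider set:

```python
import re

# Illustrative patterns — a real deployment needs many more.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def redact(text):
    """Replace PII with typed placeholders before the LLM call.

    Returns the redacted text plus a mapping so the response can
    optionally be restored before it reaches the user.
    """
    mapping = {}
    for kind, pattern in PII_PATTERNS.items():
        for i, match in enumerate(pattern.findall(text)):
            placeholder = f"[{kind}_{i}]"
            text = text.replace(match, placeholder, 1)
            mapping[placeholder] = match
    return text, mapping

def restore(text, mapping):
    """Swap placeholders back for the original values."""
    for placeholder, original in mapping.items():
        text = text.replace(placeholder, original)
    return text
```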

Success Criteria

  • Runtime enforcement: 100% of requests checked. PII never reaches the LLM.
  • PII redaction: Typed placeholders. Optional restore-on-response.
  • Injection blocked: ≥90% detection on 20 known patterns, <5% false positives.
  • Webhooks fire: Every violation fires the event.

📁 Key Files

  • src/platform-api/src/platform_api/guardrails.py — Engine to wire in and extend
  • src/platform-api/src/platform_api/app.py — Wire guardrails here
  • src/platform-api/src/platform_api/webhooks.py — Fire events here
Challenge #7

Flow Engine

Durable, Parallel, Observable Workflow Execution
▲▲ Very Hard Workflow Core Distributed Systems Python Async

🔴 The Problem

The workflow engine has a runner, persistence store, and event system — but they aren't connected. Workflows can't survive a crash, can't resume after human approval, can't run steps in parallel.
WorkflowRunner   → Executes sequentially. No persistence. No events.
WorkflowStore    → Persists to SQLite. Nobody calls save_run() during execution.
EventBus         → 13 AG-UI events. Runner never emits events.
WorkflowConfig   → Condition engine handles only 3 operators.

🎯 What You Must Solve

  1. Connect Runner to Store — save state after every step.
  2. Resume-after-interrupt — skip completed steps, continue from gate.
  3. Parallel step execution — fan-out with asyncio.gather().
  4. Emit AG-UI events at every lifecycle point.
  5. Extended conditions — numeric, boolean, list membership, AND/OR.
  6. Step-level retry and timeout.
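Items 3 and 6 combine naturally: fan-out with asyncio.gather, where each branch carries its own timeout and retry. The shapes below are illustrative, not the real runner's API:

```python
import asyncio

async def run_fanout(steps, timeout_s=30.0, retries=1):
    """Run independent workflow steps concurrently.

    Each step is an async callable. A per-step timeout and a simple
    retry wrap each branch, so one failing step can't cancel its
    siblings — failures come back as exception objects in the results.
    """
    async def guarded(step):
        last_err = None
        for _ in range(retries + 1):
            try:
                return await asyncio.wait_for(step(), timeout_s)
            except Exception as err:  # includes TimeoutError
                last_err = err
        return last_err

    return await asyncio.gather(*(guarded(s) for s in steps))
```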

Success Criteria

  • Crash recovery: Kill mid-workflow. Restart. Resumes from last completed step.
  • Resume-after-interrupt: Paused workflow continues with human input.
  • Parallel execution: Fan-out runs concurrently (timing proves it).
  • Events: step.start, step.end, state.delta, interrupt, run.end in correct order.

📁 Key Files

  • src/cli/src/tac_cli/workflow/workflow_runner.py — Sequential runner to upgrade
  • src/cli/src/tac_cli/workflow/workflow_store.py — SQLite persistence — connect to runner
  • src/cli/src/tac_cli/workflow/workflow_events.py — EventBus + 13 event types — wire in
Challenge #8

Fortress

Zero-Trust Security & Multi-Tenant Isolation
▲▲ Very Hard Security Architecture Platform API

🔴 The Problem

Auth is disabled by default. API key users bypass ALL security. The response cache serves User A's data to User B. No tenant isolation.
# Issue 1: Auth disabled by default — any JWT accepted without validation
# Issue 2: API key → user=None → FULL ACCESS, no RBAC/rate-limit/budget
# Issue 3: cache_key = hash(model + message) — no user scoping
# Issue 4: Rate limiting in-memory + disabled by default
# Issue 5: Virtual keys not wired into main auth flow
# Issue 6: Agent cards exposed to anonymous callers

🎯 What You Must Solve

  1. Fix the auth bypass chain — auth enabled by default, API key users get identity.
  2. Fix response cache — user_id + system_prompt_hash in cache key.
  3. Add tenant isolation — org_id throughout the pipeline.
  4. Distributed rate limiting — Redis for production.
  5. Tamper-evident audit logging — hash-chain rows.
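The last item is small enough to sketch whole: each audit row hashes the previous row's hash into its own, so deleting or editing any row breaks every link after it. Field shapes are illustrative:

```python
import hashlib
import json

def append_audit(log, entry):
    """Append an entry whose hash chains to the previous row."""
    prev_hash = log[-1]["hash"] if log else "0" * 64
    payload = prev_hash + json.dumps(entry, sort_keys=True)
    log.append({"entry": entry,
                "hash": hashlib.sha256(payload.encode()).hexdigest()})

def verify_chain(log):
    """Recompute every link; a deleted or edited row breaks the chain."""
    prev_hash = "0" * 64
    for row in log:
        payload = prev_hash + json.dumps(row["entry"], sort_keys=True)
        if hashlib.sha256(payload.encode()).hexdigest() != row["hash"]:
            return False
        prev_hash = row["hash"]
    return True
```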

Success Criteria

  • No anonymous access: All callers get identity with RBAC + rate limits.
  • Cache isolation: User A's response never served to User B.
  • Tenant isolation: Org A admin can't see Org B's data.
  • Tamper evidence: Deleted audit row breaks the hash chain.
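The cache-isolation criterion comes down to scoping the key. A hedged sketch — the field names are assumptions about the pipeline's identity object, not the real schema:

```python
import hashlib

def cache_key(user_id, org_id, model, system_prompt, message):
    """Scope the response cache per user and org.

    Hashing model + message alone (the current scheme) lets one
    user's cached answer be served to another. Including identity
    and the system prompt hash makes cross-user hits impossible.
    """
    parts = [
        user_id,
        org_id,
        model,
        hashlib.sha256(system_prompt.encode()).hexdigest(),
        message,
    ]
    return hashlib.sha256("\x1f".join(parts).encode()).hexdigest()
```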

📁 Key Files

  • src/platform-api/src/platform_api/app.py — Main request pipeline
  • src/platform-api/src/platform_api/auth.py — JWT + API key auth — fix bypass
  • src/platform-api/src/platform_api/cache.py — Response cache — fix cross-user pollution
  • src/platform-api/src/platform_api/audit.py — Audit trail — add hash chain
Challenge #9

Deep Search

Next-Generation RAG with Reranking & Entity Intelligence
▲ Hard RAG / Search NLP Python

🔴 The Problem

The RAG system has dual backends and RRF fusion — but no reranking, no entity linking, no recency signal, and ACL filtering that runs only after retrieval (a security risk).
  • No reranking pass after RRF fusion.
  • No entity-to-chunk linking — can't query "show all code using PostgreSQL."
  • ACL filtering post-retrieval — unauthorized chunks briefly in memory.
  • No recency signal — 2-year-old file ranks same as yesterday's.
  • No query expansion or HyDE — vocabulary gap between natural language and code.

🎯 What You Must Solve

  1. Add reranking — cross-encoder or Haiku reranking call.
  2. Wire entity extraction into ingestion.
  3. Push ACL filtering into SQL.
  4. Recency-weighted ranking.
  5. Query expansion + optional HyDE.
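Recency-weighted ranking (item 4) can be a blend rather than a hard boost, so a stale but highly relevant chunk still outranks a fresh irrelevant one. The 90-day half-life and 70/30 blend are tunable assumptions:

```python
import time

def recency_weighted(score, modified_ts, now=None,
                     half_life_days=90.0, recency_weight=0.3):
    """Blend a retrieval score with an exponential-decay recency boost.

    Relevance stays dominant (70/30 by default); recency only
    breaks ties between comparably relevant chunks.
    """
    now = time.time() if now is None else now
    age_days = max(0.0, (now - modified_ts) / 86400.0)
    decay = 0.5 ** (age_days / half_life_days)
    return score * ((1.0 - recency_weight) + recency_weight * decay)
```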

Success Criteria

  • Reranking: ≥15% higher NDCG@5 than RRF-only on 50-query benchmark.
  • Entity queries: search_by_entity("PostgreSQL") finds psycopg2/pg_connection chunks.
  • ACL in SQL: Unauthorized chunks never appear in raw query results.
  • Recency: Recently modified chunk ranks higher when equally relevant.

📁 Key Files

  • src/extensions/tac-context/src/tac_context/hybrid_search.py — ChromaDB + BM25 local search
  • src/extensions/tac-context/src/tac_context/pg_hybrid_search.py — PostgreSQL hybrid search — ACL fix
  • src/extensions/tac-context/src/tac_context/ingest.py — Wire entity extraction here
  • src/extensions/tac-context/src/tac_context/entity_graph.py — Entity extraction + graph
Challenge #10

Self-Healer

Self-Healing Agent Loop with Retry, Recovery & Hook Transforms
▲ Hard Agent Loop Core Resilience Python Async

🔴 The Problem

When the API returns a 429 or 503, the agent loop crashes. Hook transforms are silently ignored. Background task failures go unnoticed. No circuit breaker.
  1. No retry on transient errors — 429/503 terminates the session.
  2. Hook modified_args built but never consumed — 90% built, 100% broken.
  3. Background task failures swallowed — user never knows.
  4. Empty LLM responses are terminal — no retry.
  5. No circuit breaker — 5-tool run waits 10 minutes before failing.

🎯 What You Must Solve

  1. Retry with exponential backoff — 429, 502, 503, 504. Max 3 retries. Respect Retry-After.
  2. Connect hook modified_args to tool execution.
  3. Surface background task failures in the REPL.
  4. Retry empty responses.
  5. Circuit breaker — fast-fail after N consecutive failures.
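The retry policy in item 1 can be sketched like this. The `(status, retry_after, body)` tuple is a stand-in for the gateway client's real response shape:

```python
import random
import time

RETRYABLE = {429, 502, 503, 504}

def call_with_retry(call, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Retry transient HTTP failures with exponential backoff + jitter.

    A Retry-After value from the server, when present, wins over the
    computed backoff. Non-retryable statuses fail immediately.
    """
    for attempt in range(max_retries + 1):
        status, retry_after, body = call()
        if status < 400:
            return body
        if status not in RETRYABLE or attempt == max_retries:
            raise RuntimeError(
                f"request failed with {status} after {attempt + 1} attempt(s)"
            )
        delay = retry_after if retry_after is not None else base_delay * (2 ** attempt)
        sleep(delay + random.uniform(0, 0.1 * delay))
```

Injecting `sleep` keeps the mock 503→503→200 success criterion testable without real waits.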

Success Criteria

  • Retry works: Mock 503→503→200 — agent succeeds on 3rd attempt.
  • Hook transforms: modified_args applied to tool execution.
  • Failure notification: Background task failure surfaces within 5 seconds.
  • Circuit breaker: 4th request fast-fails after 3 consecutive failures.

📁 Key Files

  • src/cli/src/tac_cli/core/agent_loop.py — Add retry + modified_args in both loops
  • src/cli/src/tac_cli/core/hooks.py — get_modified_args() already exists
  • src/cli/src/tac_cli/core/gateway_client.py — Add retry + circuit breaker

Bring Your Own Challenge

Got a problem that keeps you up at night? Solve it here.

💡 How It Works

Don't see your problem in the 10 listed challenges? No worries — you can bring your own. The same judging criteria, prizes, and timeline apply. Your project just needs to solve a real problem with AI.

1

Identify a real pain point — something your team or org deals with regularly. Not a toy demo.

2

AI must be core — the solution should use AI meaningfully, not as a bolt-on feature.

3

Ship something working — prototype is fine, but it must demo live on Demo Day.

4

Describe the problem clearly in your registration — judges need to understand the "why" as much as the "what."

🎯 Ideas to Get You Started

Internal Tools

AI-powered meeting summarizer, automated onboarding assistant, smart document search for your team.

Developer Productivity

Code review bot, automated test generator, deployment health checker, incident root-cause analyzer.

Client-Facing

Proposal generator, intelligent project scoping, automated status reports, client communication assistant.

Operations

HR policy Q&A bot, expense anomaly detector, resource allocation optimizer, knowledge base builder.

Participate
Register today. Fill in your idea and solution plan during the week. Submit before the deadline.
1
Register
2
Idea Brief
3
Solution Plan
4
Submit
Step 01

Register Your Interest

Sign up today — takes 2 minutes. Come back during the week for the remaining steps.

I'm interested in the t.a.c AI Challenge. I understand the build week runs 14–20 March 2026 and I'll submit my project before the deadline. I commit to presenting on Demo Day if shortlisted.
🎉

You're in!

We've logged your interest. Come back anytime this week to fill in your idea brief, solution plan, and final submission.

📋
Day 1
Fill in your Idea Brief
🔨
Day 2–6
Build with AI tools
🚀
Day 7
Submit project + slides
Step 02

Define Your Problem

Great solutions start with a clearly understood problem. Research it well.

A concise name for the problem (max 80 chars)
0/80
What is happening? When? What are the pain points?
Which people, teams, or roles experience this?
Data or interviews validating this is a real problem.
How is this handled today?
How significant would solving this be?
Minor irritation Transformative
Step 03

Design Your Solution

You have 1 week. Think MVP. Think AI-first. Think shipped.

How does it work? How is AI involved?
Select all that apply
Be concrete
3–5 core features, one per line
Languages, frameworks, platforms
What are you NOT building this week?
Step 04

Submit Your Project

You built it. Now show it. Submit before the deadline.

What to submit
✶ Working prototype (deployed or runnable locally)
✶ GitHub repo with a README
✶ Presentation slides (5–8 slides max)
✶ 3-minute demo video (optional but recommended)
Must be accessible — public or add judges as collaborators
Google Slides, Canva, PowerPoint Online, or PDF
Loom, YouTube (unlisted) — 3 mins max
If deployed — Vercel, Render, Azure, etc.
Walk judges through what the app does
What was hardest? How did AI help?
What % was AI-generated or AI-assisted?
I confirm this is my original work built during the challenge week. I'm available for Demo Day and consent to the project being showcased internally.
🏆

Submission Received!

Your project is in the running. Judges will review after the deadline — shortlisted teams will be invited to Demo Day.

What Happens Next

Judges review all submissions after the deadline
Shortlisted teams invited to live Demo Day
Winners announced early next week (23–27 Mar)