Compass

Implementation deck / May 26, 2026

Compass answer experience

Make Compass sound like help.

The goal is not a louder personality. It is a public-ready Compass response: accurate data, plain-language meaning, a light NCTQ connection when it fits, and a useful next step.

Target answer shape

Here is the answer, and here is how to use it.

For a first pass, Compass can compare first-year BA salaries across high-FRPL districts it covers. Districts without current reviewed values stay visible as coverage gaps, not zeros.

Compare similar districts | See NCTQ salary context | Narrow to one state

The current itch

What Natalie is reacting to

The issue is not accuracy. It is that the trace leaks into the answer.

Compass can return a valid table and still feel too robotic for a public-facing NCTQ product.

Visible symptom

"cells without current data stay visible in the page"

Visible symptom

"for 61 of 133 requested data points"

Visible symptom

"I could not resolve a numeric metric"

Thesis

Plain mental model

Compass gets more conversational by becoming more legible.

The deterministic engine should keep doing what it does well: decide what data exists, what counts, and what can be cited. The answer layer should translate the validated answer packet into clear language, sensible caveats, and next moves.

Architecture frame

Justin's useful frame

Think Model, Controller, View.

That framing keeps the conversation practical. We are not asking the model to become the source of truth. We are building a better view over the truth Compass already validates.

| Layer | Compass meaning | Owns | Must not own |
| --- | --- | --- | --- |
| Model | Catalog, data, source records, coverage states. | Facts and evidence. | Warmth or narrative. |
| Controller | Planner, resolver, executor, validators. | Route, query shape, artifact validity. | Public answer copy. |
| View | Renderer, answer brief, follow-up chips. | Language, layout, next step. | New facts. |

Keep the good part

Why the machinery matters

The answer layer should not loosen the evidence chain.

Compass is valuable because it uses approved functions, typed plans, deterministic execution, coverage states, citations, and traces. Voice work should sit after validation and use only admitted inputs.

LLM / Planner: Typed route, operation, metrics, selection.

Catalog / Resolver: Approved IDs and candidate-only adjudication.

Python / Executor: Rows, years, coverage states, citations.

Python / Validator: Checks row, metric, denominator, source integrity.

View / Answer layer: Plain language, NCTQ context, structured follow-ups.
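
As a rough illustration of "voice work sits after validation", the sketch below lets the view accept only a validated object. The names `ValidatedResult`, `validate`, and `render_answer` are hypothetical, the checks are placeholders, and the district values are made up for the example; none of this is taken from the Compass codebase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValidatedResult:
    """Hypothetical container for what the validator admits."""
    rows: tuple[tuple[str, int], ...]
    citations: tuple[str, ...]


class ValidationError(Exception):
    """Raised when executor output fails an integrity check."""


def validate(rows: list[tuple[str, int]], citations: list[str]) -> ValidatedResult:
    # Placeholder checks (assumptions): rows exist, values are non-negative,
    # and at least one admitted citation is present.
    if not rows:
        raise ValidationError("no rows to validate")
    if not citations:
        raise ValidationError("no admitted citations")
    if any(value < 0 for _, value in rows):
        raise ValidationError("negative value in rows")
    return ValidatedResult(rows=tuple(rows), citations=tuple(citations))


def render_answer(result: ValidatedResult) -> str:
    # The view only reads validated rows; it never queries data on its own.
    top_name, top_value = max(result.rows, key=lambda row: row[1])
    return f"{top_name} has the highest value at {top_value:,} [{result.citations[0]}]."


# Illustrative, made-up rows.
example = validate([("District A", 52000), ("District B", 48000)], ["S1"])
print(render_answer(example))
```

Because `render_answer` takes a `ValidatedResult` rather than raw rows, the type signature itself documents that the answer layer has no path around the validator.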

Product contract

What good sounds like

Every strong Compass answer should do four jobs.

1. Answer

Lead with the result.

Start with the substantive answer, not the retrieval action or system process.

2. Explain

Translate caveats.

Say what missing, older, or non-numeric data means in normal language.

3. Connect

Add NCTQ context.

When relevant, connect the result to a policy area, rationale, or publication.

4. Continue

Offer next moves.

Use structured follow-up options that help users go deeper or pivot.

Answer inputs

Validated answer packet

The answer layer gets a packet, not freedom.

That packet is the difference between better writing and ungoverned generation. It should include the validated facts plus enough context to write like a knowledgeable NCTQ product.

Result facts: Rows, values, ranks, years, denominators, filters, and coverage states.

Source facts: Citation markers, source titles, source years, and admitted source IDs.

Answer shape: Ranking, lookup, peer comparison, sparse table, no stance, or policy guidance.

NCTQ context: Approved policy area, rationale, publication, and exemplar references.

Voice rules: Plain-language copy guidance loaded from reviewable assets.

Next actions: Typed follow-up candidates with route hints and source constraints.

Future brief vocabulary: EvidencePack, ClaimLedger, and AdvisorBrief remain useful names for a later sealed-brief writer, not the first implementation slice.
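
One way to picture the packet is as an immutable typed record. The sketch below is an assumption about shape only: `AnswerPacket`, `AnswerShape`, and every field name are hypothetical and simply mirror the six input groups listed above.

```python
from dataclasses import dataclass
from enum import Enum


class AnswerShape(Enum):
    RANKING = "ranking"
    LOOKUP = "lookup"
    PEER_COMPARISON = "peer_comparison"
    SPARSE_TABLE = "sparse_table"
    NO_STANCE = "no_stance"
    POLICY_GUIDANCE = "policy_guidance"


@dataclass(frozen=True)
class AnswerPacket:
    """Hypothetical sealed input for the answer layer; field names are illustrative."""
    result_rows: tuple[dict, ...]        # result facts: rows, values, ranks, years
    coverage_states: dict[str, str]      # result facts: coverage state per row key
    citation_ids: tuple[str, ...]        # source facts: admitted source IDs
    answer_shape: AnswerShape            # answer shape
    nctq_context_ids: tuple[str, ...] = ()  # approved NCTQ context references
    voice_rules: tuple[str, ...] = ()       # copy guidance loaded from assets
    next_actions: tuple[str, ...] = ()      # typed follow-up candidates


# Illustrative, made-up values.
packet = AnswerPacket(
    result_rows=({"district": "Example District", "value": 1},),
    coverage_states={"Example District": "current"},
    citation_ids=("S1",),
    answer_shape=AnswerShape.RANKING,
)
```

Freezing the dataclass means the writer cannot reassign fields such as `citation_ids` after the packet is sealed, which matches the "packet, not freedom" framing.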

Renderer voice

Immediate visible change

Translate system facts into user language.

Current

"for 61 of 133 requested data points"

Target

"About half of the requested district-metric pairs have current reviewed data, so the table separates available values from gaps."

Current

"I could not resolve a numeric metric."

Target

"Compass needs a specific measurable policy item for this comparison. I can compare starting salaries, maximum salaries, observation counts, or leave days."

Analysis without vibes

Copy-paste useful

Better answers need typed insight primitives, not ad hoc flourish.

Users liked answer text that could move into a memo. We can support that without letting the model invent conclusions by asking the system to compute and expose a few bounded observations.

Range: Highest, lowest, middle, and spread for numeric result sets.

Distribution: How many districts fall above or below a validated threshold.

Contrast: Anchor district versus peer group or state universe.

Coverage: What portion of the request has current reviewed data.

Policy context: Which NCTQ policy idea is relevant, if any.

Limit: What Compass cannot conclude from the available data.
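
Three of these primitives (range, distribution, coverage) are plain arithmetic, so they can be computed deterministically before any model sees them. The sketch below is a minimal version under stated assumptions: the function names are hypothetical, "above" is treated as at-or-above the threshold, and the wording cutoffs in `coverage_phrase` are illustrative choices rather than agreed product rules.

```python
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class RangeInsight:
    lowest: float
    highest: float
    middle: float
    spread: float


def range_insight(values: list[float]) -> RangeInsight:
    """Range primitive: lowest, highest, median, and spread of a result set."""
    if not values:
        raise ValueError("range_insight needs at least one value")
    low, high = min(values), max(values)
    return RangeInsight(lowest=low, highest=high, middle=median(values), spread=high - low)


def distribution_insight(values: list[float], threshold: float) -> tuple[int, int]:
    """Distribution primitive: (count at or above threshold, count below it)."""
    at_or_above = sum(1 for value in values if value >= threshold)
    return at_or_above, len(values) - at_or_above


def coverage_phrase(available: int, requested: int) -> str:
    """Coverage primitive: plain-language share. Cutoffs are assumptions."""
    if requested <= 0:
        raise ValueError("requested must be positive")
    share = available / requested
    if share >= 1.0:
        return "All"
    if share >= 0.9:
        return "Nearly all"
    if share >= 0.6:
        return "Most"
    if share >= 0.4:
        return "About half"
    if share > 0.0:
        return "A minority"
    return "None"


# The deck's own example: 61 of 133 requested pairs have current data.
print(coverage_phrase(61, 133))  # About half
```

Since 61 / 133 is roughly 0.46, the phrase lands in the "About half" band, which matches the target renderer copy in this deck.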

NCTQ fit

Not generic warmth

Compass should sound like it understands NCTQ's policy world.

For teacher compensation, NCTQ frames pay as a lever for recruitment, retention, and staffing challenges. The Teacher Contract Database tracks salaries across careers, additional pay, evaluations, leave, benefits, class size, and more. Compass can use that context when it is clearly tied to the user's question.

The voice is knowledgeable, not pushy.

A salary answer can mention strategic pay or Smart Money when relevant. A formal observation answer can point to evaluation context. A leave answer can point to NCTQ leave research. It should not turn every answer into a policy speech.

Engagement

Bring back the helpful next move

Suggested follow-ups should be structured options below the answer.

Conversation platforms treat suggestion chips as a way to continue or pivot the conversation. For Compass, the chips should be typed metadata, not loose prose at the end of the response.

Go deeper

Stay in the same metric or result set.

Show only current data | Compare top five

Connect to NCTQ

Bring in a relevant publication, rationale, or policy area.

See salary research | Show strategic pay context

Branch smartly

Move to a related district, peer group, or policy area.

Compare similar districts | Check leave policies
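
To make "typed metadata, not loose prose" concrete, the sketch below models a chip as a small record and picks at most one chip per group. `SuggestedFollowup`, `FollowupKind`, `pick_chips`, and the `route_hint` values are hypothetical; only the chip labels come from this deck.

```python
from dataclasses import dataclass
from enum import Enum


class FollowupKind(Enum):
    GO_DEEPER = "go_deeper"
    CONNECT_TO_NCTQ = "connect_to_nctq"
    BRANCH = "branch"


@dataclass(frozen=True)
class SuggestedFollowup:
    label: str          # text shown on the chip
    kind: FollowupKind  # which of the three groups it belongs to
    route_hint: str     # hypothetical hint for the planner, not a real route name


def pick_chips(candidates: list[SuggestedFollowup], limit: int = 3) -> list[SuggestedFollowup]:
    """Keep the first candidate of each kind, in order, up to `limit` chips."""
    if limit <= 0:
        return []
    chosen: list[SuggestedFollowup] = []
    seen: set[FollowupKind] = set()
    for chip in candidates:
        if chip.kind in seen:
            continue
        chosen.append(chip)
        seen.add(chip.kind)
        if len(chosen) == limit:
            break
    return chosen


CANDIDATES = [
    SuggestedFollowup("Show only current data", FollowupKind.GO_DEEPER, "filter"),
    SuggestedFollowup("Compare top five", FollowupKind.GO_DEEPER, "ranking"),
    SuggestedFollowup("See salary research", FollowupKind.CONNECT_TO_NCTQ, "publication"),
    SuggestedFollowup("Compare similar districts", FollowupKind.BRANCH, "peer_comparison"),
]

print([chip.label for chip in pick_chips(CANDIDATES)])
```

Picking one chip per kind is a design choice for the sketch: it keeps the row short and guarantees the user sees a deepen, a connect, and a branch option when all three exist.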

MVP scenario

User
How do starting teacher salaries compare across the districts Compass covers?

Compass target
Among districts with current reviewed salary data, Compass can rank starting pay and show which districts have gaps. Starting salary is only one piece of the compensation story, so the next useful step is to compare career earnings, strategic pay, or similar districts.

Rank by career earnings | Open Smart Money context | Compare similar districts

Why this scenario works

Teacher compensation shows the full Compass value.

It exercises numeric data, coverage caveats, district comparisons, NCTQ publication context, and natural follow-ups. It also lets us show the difference between public-ready analysis and raw mechanics.

1. Deterministic ranking protects the data.
2. Renderer wording explains gaps like a person.
3. Follow-ups create a guided policy conversation.

System design

Implementation shape

Build the answer experience in three layers.

1. Governed guidance

Move prompts and prose guidance into reviewable assets. Keep dynamic runtime context in Python.

This is issue #901.

2. Deterministic view

Rewrite renderer copy for methodology, coverage, no-content, denominator, and table-cell language.

This is the first user-visible M4 work.

3. Guarded answer layer

Let a stronger model improve prose over a sealed answer brief, then validate it before the user sees it.

Start in shadow mode, then move to gated mode for selected answer types.

Issue #901

Where instructions live

Prompts become assets people can review.

Pydantic AI's instructions model maps well to Compass: stable instructions can live in markdown, dynamic instructions can still be assembled from typed dependencies, and Pydantic field descriptions should stay focused on schema semantics.

src/compass_backend/prompts/
  README.md
  loader.py
  model_instructions/
    planner.md
    catalog_adjudicator.md
    criterion_classifier.md
  planner_guidance/
    ranking-and-sorting.md
    teacher-compensation-salary.md
    policy-guidance-followups.md
  answer_style_guides/
    default.md

Prompt assets make future voice changes reviewable. The answer-layer prompt is a product asset, not hidden implementation prose.
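
The tree above lists a `loader.py` without showing its contents. The following is only a guess at what a minimal loader could look like: `load_prompt_asset` and its parameters are hypothetical, and the real module may differ.

```python
from pathlib import Path


def load_prompt_asset(root: Path, group: str, name: str) -> str:
    """Read one markdown asset, e.g. group="model_instructions", name="planner".

    Hypothetical helper: the signature is an assumption, not the actual
    contents of loader.py.
    """
    resolved_root = root.resolve()
    path = (resolved_root / group / f"{name}.md").resolve()
    # Refuse names that escape the prompts directory (for example "../secrets").
    if resolved_root not in path.parents:
        raise ValueError(f"asset path escapes prompt root: {path}")
    text = path.read_text(encoding="utf-8").strip()
    if not text:
        raise ValueError(f"prompt asset is empty: {path}")
    return text
```

A call such as `load_prompt_asset(Path("src/compass_backend/prompts"), "answer_style_guides", "default")` would return the style guide text, so a voice change shows up as a markdown diff rather than a Python edit.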

Model routing

Fast / lower cost
  catalog adjudication
  criterion classification
  follow-up shape classification

Balanced
  typed planning
  answer-shape selection
  policy-context matching

Highest capability
  answer-layer sealed-brief synthesis
  thorny refusal or ambiguity repair
  evaluation judge samples

Use intelligence where it yields value

Do not pay Opus-class prices for bounded classification.

Anthropic's current model docs frame Haiku as fastest, Sonnet as the speed/intelligence balance, and Opus as the most capable for complex reasoning. Compass should route by task risk: cheap and fast for constrained decisions, stronger models for typed planning, and the best writer only over a sealed validated brief.
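
The routing table above can be expressed as a small lookup that is easy to review and test. In this sketch the tier names and task keys are hypothetical identifiers derived from the three lists; no specific model IDs are assumed.

```python
from enum import Enum


class ModelTier(Enum):
    FAST = "fast"
    BALANCED = "balanced"
    HIGHEST = "highest"


# Task keys are illustrative slugs for the tasks listed in the routing table.
TASK_TIERS: dict[str, ModelTier] = {
    "catalog_adjudication": ModelTier.FAST,
    "criterion_classification": ModelTier.FAST,
    "followup_shape_classification": ModelTier.FAST,
    "typed_planning": ModelTier.BALANCED,
    "answer_shape_selection": ModelTier.BALANCED,
    "policy_context_matching": ModelTier.BALANCED,
    "sealed_brief_synthesis": ModelTier.HIGHEST,
    "ambiguity_repair": ModelTier.HIGHEST,
    "evaluation_judge_samples": ModelTier.HIGHEST,
}


def tier_for(task: str) -> ModelTier:
    """Look up the tier for a task; unknown tasks fail loudly instead of defaulting."""
    try:
        return TASK_TIERS[task]
    except KeyError:
        raise ValueError(f"no model tier configured for task {task!r}") from None


print(tier_for("catalog_adjudication").value)
```

Raising on an unknown task, rather than silently defaulting to the most capable tier, is an assumption in this sketch; it keeps a new task from picking up the highest cost without a deliberate routing decision.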

Implementation path

Sprint sequence

Build from reviewability to visible quality.

Now

Fix renderer voice hygiene and add answer-layer shadow mode over a sealed brief.

Then

Add `manifest.suggested_followups`, frontend chips, and follow-up shape evaluation.

Always

Fall back to the deterministic answer if the model changes facts, drops caveats, mutates tables, or adds unsupported NCTQ context.
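
The fallback rule can be sketched as a guard that compares a model draft with the deterministic answer. This version is deliberately narrow and covers only two of the four conditions: it rejects drafts that introduce numbers absent from the deterministic text or that drop a required caveat. Table mutation and unsupported NCTQ context would need separate checks. The function names and the example sentences are hypothetical.

```python
import re

# Matches integers and decimals, with optional thousands separators.
NUMBER = re.compile(r"\d[\d,]*(?:\.\d+)?")


def numbers_in(text: str) -> set[str]:
    """Collect numeric tokens, normalized by stripping commas."""
    return {match.replace(",", "") for match in NUMBER.findall(text)}


def choose_answer(deterministic: str, draft: str, required_caveats: list[str]) -> str:
    """Return the draft only if it adds no numbers and keeps every caveat."""
    if not numbers_in(draft) <= numbers_in(deterministic):
        return deterministic  # the draft introduced a number the pipeline never produced
    lowered = draft.lower()
    if any(caveat.lower() not in lowered for caveat in required_caveats):
        return deterministic  # the draft dropped a required caveat
    return draft


# Illustrative, made-up sentences.
baseline = "District A pays 52,000. 3 districts have no current reviewed data."
good = "District A leads at 52,000, though 3 districts have no current reviewed data."
bad = "District A leads at 53,000."

print(choose_answer(baseline, good, ["no current reviewed data"]) == good)
print(choose_answer(baseline, bad, ["no current reviewed data"]) == baseline)
```

Exact substring matching for caveats is strict by design in this sketch; a real check would probably compare against typed caveat IDs from the answer packet rather than raw text.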

Validation

How we prove it

Measure the data machine and the answer view separately.

The scorecard can keep measuring data fidelity, selection, citations, and denominator correctness. Voice needs its own checks: banned internal phrases, plain-language caveats, NCTQ relevance, useful follow-ups, and public-ready professionalism.

| Gate | Question | Tool | Pass signal |
| --- | --- | --- | --- |
| Data fidelity | Did the answer preserve validated facts? | Scenario gate / scorecard. | No regression. |
| Prose hygiene | Did internal terms leak? | Banned-phrase lint. | No trace language. |
| Coverage clarity | Can a user understand gaps? | Renderer tests. | Short cells, clear note. |
| NCTQ fit | Was context relevant and sourced? | Policy-content checks. | No generic claims. |
| Follow-ups | Do next steps deepen, connect, or branch? | Manifest contract. | 3 useful typed options. |
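
The prose hygiene gate is the simplest to automate. Below is a minimal banned-phrase lint, assuming a case-insensitive substring match; `find_trace_language` is a hypothetical name, and the phrase list is seeded only with the three visible symptoms quoted earlier in this deck.

```python
BANNED_PHRASES: tuple[str, ...] = (
    "requested data points",
    "could not resolve a numeric metric",
    "stay visible in the page",
)


def find_trace_language(text: str, banned: tuple[str, ...] = BANNED_PHRASES) -> list[str]:
    """Return every banned phrase that appears in the text, ignoring case."""
    lowered = text.lower()
    return [phrase for phrase in banned if phrase in lowered]


current = "Values are shown for 61 of 133 requested data points."
target = (
    "About half of the requested district-metric pairs have current reviewed "
    "data, so the table separates available values from gaps."
)

print(find_trace_language(current))  # ['requested data points']
print(find_trace_language(target))   # []
```

A test suite could assert that `find_trace_language` returns an empty list for every rendered answer, which gives the "No trace language" pass signal a concrete check.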

Guardrails

What we should not do

A charming answer is not worth a weaker answer.

A. No LLM layer can add facts, rows, citations, or counts.
B. No policy stance without an admitted NCTQ source or configured content surface.
C. No generic cheerleading or filler follow-up questions.
D. No prompt edits hidden inside Python when product review is needed.
E. No model upgrade without a measured quality, speed, and cost reason.

The bar is simple: public-ready, grounded, useful.

Compass should feel more human because it explains itself better, not because it guesses more.

Takeaway

Final frame

Compass gets its voice by respecting its boundaries.

The plan is not to make Compass perform personality. It is to give the product a better answer view: facts from the validated pipeline, language from governed guidance, NCTQ context from approved sources, and follow-ups as structured product affordances.

Source notes

Evidence used

Compass issue: #901 Organize Compass prompt and prose guidance into governed, reviewable assets

Canonical plan: docs/plans/2026-05-26-compass-answer-experience-plan.md

Design review: May 26 feedback folded into the canonical plan: legible first, M4 renderer hygiene first, deterministic follow-ups next, sealed-brief synthesis later.

Guidance ownership: docs/architecture/compass-prompt-and-prose-guidance.md and src/compass_backend/prompts/README.md

Current runtime surfaces: src/compass_backend/planning/, catalog/, execution/, rendering/, and agents/model_settings.py

Pydantic AI: Agents and instructions and structured output validation

Model routing: Claude model overview and Claude pricing

Suggested actions: Microsoft suggested actions and Google suggestion chips

NCTQ context: Teacher compensation, TCD research rationale, and state compensation levers

Superseded PR reviewed: #876 Revise Compass advisor answer deck; salvaged the high-FRPL salary scenario and sealed-brief vocabulary before closing.