Milestone Review Program Refresh (v4): What Improves, and What’s Still Missing

Context and Initial Assessment

The Milestone Review Program is the mechanism used in Project Catalyst to assess Statements of Milestones (SoMs) and Proofs of Achievement (PoAs) before funded projects can continue progressing or receive payment. In practice, it acts as one of the ecosystem’s main accountability layers for treasury-funded work, making its design, reviewer quality, incentives, and oversight standards critical to the credibility of Catalyst itself.

After reviewing the draft program update recently shared by Danny (Catalyst Team), it is clear that the intent is to improve procedural clarity around how the Milestone Review process should operate. Compared to older guidance, there are real improvements: clearer templates, more explicit verification steps, stronger emphasis on evidence, less ambiguity in reviewer responsibilities, and more attention to plagiarism and factual accuracy. The move toward blind review and the proposed strike mechanism also signal an effort to increase accountability and reduce low-effort reviewing. These are meaningful operational steps and should be acknowledged as such.

However, the scope of this draft is noticeably narrower than the direction communicated weeks earlier in Danny’s article, “Scaling Accountability and Compliance in Catalyst”.[1] That article outlined a trajectory that included (1) rejuvenating the reviewer talent pool through training and recruitment of verifiable technical expertise, (2) introducing mandatory independent QA for high-value projects, and (3) explicitly balancing higher standards with capacity and cost constraints.

The current draft mainly improves how Milestone reviews (PoAs and SoMs) are written and structured, but it does not yet describe the structural mechanisms implied by that earlier vision, nor does it specify what must change to meet the ecosystem’s accountability expectations. In other words, it is a meaningful operational improvement, but it does not yet amount to the “systemic accountability” shift described in the earlier framing.

The Milestone Review program began as a pilot around Fund 10, nearly three years ago. Since then, a significant backlog of known issues and community feedback has accumulated, including long delays in implementing even basic operational improvements. This context matters because the community’s concern is no longer limited to workflow clarity. The core question is whether the Milestone Review program has the structural design, incentives, and quality controls required for the scale of treasury funding it is responsible for safeguarding.

That question ultimately depends on whether the program addresses Milestone Reviewer vetting and selection transparency, qualification pathways, tiered oversight (including independent QA for high-value projects), incentive alignment, and an effort-based compensation model.


What’s Still Missing (Structural Gaps vs the Stated Direction)

The draft improves methodology execution, but the recent community debate and the earlier “systemic accountability” framing point to structural governance gaps that remain unaddressed.

Below are the key gaps.


1) Reviewer Vetting: No Clear System for “Verifiable Talent”

Why it matters

In the earlier direction (“Scaling Accountability and Compliance in Catalyst”), the program signaled a move toward verifiable technical talent, plus targeted training and recruitment. That implies a measurable, auditable process for confirming reviewer competence, especially for complex and high-risk milestones.

Today, the Milestone Reviewer (MR) pool is still formed through a process that is only partially disclosed. Reviewers are selected by the Catalyst team, but the program does not clearly explain how “verifiable talent” is established, how selection decisions are made, or how reviewer capability is validated beyond self-reported information.

What the current system looks like (in practice)

  1. Primary eligibility filter (Community Review performance)
    Eligibility is largely based on Community Reviewer Level 2 status. Based on the program’s described eligibility expectations, this implies a large review sample across multiple recent funds (e.g., 100 reviews written between Fund11 and Fund14) with a ~90% approval rate.[2] This is a participation and acceptance metric; it does not, by itself, verify analytical depth, rigor, or domain competence.
  2. Expression of interest (EOI) form + self-reported profile
    Eligible candidates are typically invited to complete a simple onboarding form (EOI). The form collects:
  • basic identity/contact details (email, name, social handles),
  • confirmation of Terms/Fund Rules/Privacy Policy,
  • whether the person reviewed SoM/PoA in prior funds,
  • whether they submitted proposals (potential conflict exposure),
  • self-declared programming/markup languages (e.g., Python, JS, SQL),
  • self-declared Cardano tech stack familiarity (e.g., Plutus, Marlowe),
  • whether they are a developer (yes/no),
  • and acknowledgement that the role requires formal identity verification.

This suggests some level of matching and risk management, but it is still mostly self-attestation, not a competency check.

  3. Cardano Ambassadors as an alternate entry path (bypassing CR progression)
    In parallel with the CR-based eligibility path, Cardano Ambassadors have been invited to participate as Milestone Reviewers via internal Catalyst communications (e.g., Telegram groups and related channels). In practice, this functions as an alternate entry route that can bypass the CR Level 2 progression requirements (i.e., the multi-fund review sample and approval-rate thresholds).

The structural issue is that there is no public documentation describing:

  • what vetting criteria apply to Ambassadors for MR eligibility,
  • how their competence is assessed beyond ambassador status,
  • whether and how performance is monitored,
  • what conditions trigger removal from the MR cohort,
  • whether participation is time-bound, performance-bound, or effectively persistent while ambassador status remains.

This creates an additional, opaque eligibility channel where “fitness to review” is assumed by role affiliation rather than explained as a verifiable competency standard.

  4. Selection remains opaque after the form (private notifications, no public auditability)
    After the EOI form (and regardless of entry path), candidates are notified by email whether they were selected or not. However, this communication is handled privately, and the broader process remains non-auditable from the outside.

There is no public-facing place (dashboard, website page, or published list) that shows:

  • the full set of eligible applicants for that fund cycle,
  • who was selected vs not selected,
  • the criteria used to make those decisions,
  • any ranking or skill-matching rationale,
  • or individualized reasons for acceptance or rejection.

As a result, the outcome may be communicated to each applicant, but the selection process is not transparent, explainable, or independently verifiable by the community.


What’s missing

  • Objective entry assessment (not just CR participation metrics or role affiliation)
  • Practical review simulation before onboarding (e.g. a short test review with expected outputs)
  • Competency requirements tied to project types (technical vs non-technical deliverables)
  • Periodic re-certification / revalidation (to prevent drift over time)
  • Transparent selection logic (published criteria + auditable rationale for acceptance/rejection)
  • Publicly defined standards for Ambassador-based MR eligibility and ongoing performance management

Risk if unchanged

If the gate remains “CR performance thresholds or Ambassador affiliation + a lightweight self-reported form + opaque selection,” the MR cohort can inherit systemic weaknesses and continue to produce inconsistent outcomes. Templates and checklists improve how reviews are written, but they do not solve the core trust problem if the talent filter and selection process remain weak, unverifiable, and hard to audit.


2) Selection Transparency: Who Gets In, and Why?

Why it matters

Even among eligible candidates, only some are selected. The draft does not clarify how selection happens, how performance is measured, or how skill matching is decided.

What’s missing

  • published selection criteria
  • transparent selection workflow
  • auditable rationale for acceptance/rejection
  • clarity on how “skill matching” is done in practice

Risk if unchanged

Opaque selection undermines legitimacy. Even a good process looks arbitrary if the entry gate is not explainable.


3) Training and Qualification: Higher Standards Without Skill Building

Why it matters

The earlier direction explicitly pointed toward training and stronger talent sourcing. The draft raises expectations for reviewer rigor, but it does not define a practical pathway for reviewers to learn the standard, prove they can apply it, and be assigned work that matches their competence.

What’s missing (and practical suggestions to close the gap)

  • Mandatory onboarding with a pass/fail check (not just “read the docs”)
    A short onboarding package that every new reviewer must complete before receiving assignments, including:
    • a 30–60 minute briefing on the actual approval logic (A1/B1/C1), PTG rules, timelines, and evidence requirements,
    • a checklist of “common failure modes” (missing access, weak metrics, unverifiable claims, plagiarism indicators),
    • and a simple pass/fail quiz to confirm understanding (e.g., 10–20 questions with scenario-based prompts).
  • Practice review (simulation) before real assignments
    One small “training PoA” (mock or anonymized past PoA) where the reviewer must:
    • write a review using the required template,
    • identify whether evidence actually proves criteria,
    • and produce a valid PTG list if needed.
      This should be reviewed by QA/admin once, as a gate before the reviewer is allowed to review live milestones (a minimal sketch of such a gate follows this list).
  • Standardized training modules that match real workload
    Short modules that can be taken independently, for example:
    • evidence verification basics (what counts as independently verifiable),
    • GitHub/repo access and commit-history checking,
    • metric validation (how to verify “users”, “transactions”, “KPIs”),
    • privacy and redaction basics,
    • how to request a Specialist Technical Review instead of guessing.
  • Domain tracks + specialization rules
    A structured way to classify reviewers and route work, e.g.:
    • Technical / Smart contracts / Infrastructure
    • Research / Data / Analytics
    • Community / Education / Events
      Reviewers should only be assigned to domains they are qualified for, and technical milestones should trigger specialist review when needed.
  • Clear competency signals (beyond self-reporting)
    Examples:
    • completion badges for the modules above,
    • periodic short re-tests (e.g., quarterly),
    • optional certifications recognized by Catalyst,
    • minimum demonstrated capability in simulation reviews.
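
To illustrate what such an onboarding gate could look like in practice, here is a minimal sketch in Python. The pass mark, the single QA-graded simulation requirement, and the field names are all assumptions for discussion, not defined Catalyst requirements.

```python
from dataclasses import dataclass

QUIZ_PASS_SCORE = 0.8        # assumed pass mark for the scenario-based quiz
SIMULATION_REQUIRED = True   # one mock/anonymized PoA review graded by QA

@dataclass
class Candidate:
    quiz_score: float          # 0.0-1.0 result on the onboarding quiz
    simulation_passed: bool    # QA verdict on the training PoA review
    declared_domains: set      # e.g. {"research", "smart_contracts"}

def may_receive_assignments(c: Candidate) -> bool:
    """Gate live milestone assignments on completed onboarding checks, not self-reporting alone."""
    if c.quiz_score < QUIZ_PASS_SCORE:
        return False
    if SIMULATION_REQUIRED and not c.simulation_passed:
        return False
    return len(c.declared_domains) > 0   # needs at least one domain to route work to

print(may_receive_assignments(Candidate(0.9, True, {"research"})))    # True
print(may_receive_assignments(Candidate(0.9, False, {"research"})))   # False: simulation gate
```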

Risk if unchanged

Raising standards without training and competency gating pushes reviewers to either guess, rely on shallow heuristics, or avoid deeper verification altogether. The result is predictable: inconsistent reviews, higher false-approvals, and growing distrust, even if templates and checklists improve.


4) Risk-Tiered Oversight: Accountability Should Scale With Funding (and Independent QA for High-Value Projects)

Why it matters

Danny’s earlier direction explicitly raised the idea of mandatory independent QA for high-value projects. More broadly, a credible accountability model must recognize a basic reality: not all projects carry the same financial risk. Oversight intensity should scale with the amount of treasury funds at stake.

If the program applies roughly the same review structure across low- and high-budget proposals, it creates two predictable failures:

  • high-budget proposals may receive insufficient scrutiny relative to the risk,
  • low-budget proposals may face disproportionate process burden, where the cost of oversight can approach or exceed the value of the project itself.

A tiered approach is not bureaucracy for its own sake. It is how you avoid spending “audit-level effort” on small grants while still ensuring that large grants cannot pass with shallow verification.

What the current system does (and what the draft does not yet define)

Today, the main visible mechanism that differentiates higher-risk proposals is reviewer count: higher-budget projects are typically assigned four milestone reviewers instead of two. Beyond that, there may be internal skill-matching and allocation decisions by the Catalyst team, but those criteria are not publicly disclosed and therefore cannot be evaluated externally.

The current draft improves review formatting and procedural clarity, but it still does not define a risk-tiered oversight model. It does not specify:

  • funding thresholds tied to budget size (e.g., “above X ADA”),
  • how rigor requirements change by tier,
  • when independent QA becomes mandatory,
  • or how external QA should interact with milestone approvals.

What’s missing (and a practical structure that could close the gap)

  • Explicit funding tiers (low / mid / high) tied to budget size
    Define three tiers based on requested funding (numbers are illustrative and should be debated through a transparent process):
    • Low budget: lightweight oversight
    • Mid budget: standard oversight + higher verification expectations
    • High budget: enhanced oversight + stricter evidence requirements + independent QA triggers
    The key point is not the exact numbers; the key point is that oversight must scale with funding and risk (a minimal illustrative sketch follows this list).
  • Tier-specific review expectations (what changes per tier)
    For example:
    • minimum evidence standards increase by tier,
    • stricter rules on metric verification for higher tiers,
    • mandatory reproducibility checks for technical deliverables in the high tier,
    • more formal timeline deviation handling for higher tiers.
  • Independent QA for the high tier (beyond “more reviewers”)
    Assigning four reviewers instead of two can help with redundancy, but it is not the same as independent QA. For high-budget projects, the program should define when additional assurance becomes mandatory, such as:
    • third-party technical audits or security reviews (when applicable),
    • independent validation of claimed metrics and datasets,
    • formal external attestations for partnerships or operational claims.
    The threshold that triggers this (e.g., projects above a certain ADA amount) should be established through a transparent method (workshop/process), not assumed.
  • Definition of “independent QA” (what it means in Catalyst terms)
    The program should clearly state:
    • what qualifies as independent (no conflict, no proposer control),
    • what QA types are acceptable (audit, metric validation, security review, etc.),
    • what minimum evidence format is required.
  • Who provides QA (and how conflicts are managed)
    The program should specify whether QA is provided by:
    • an approved pool of external auditors/validators,
    • a rotating specialist cohort,
    • institutional partners,
    • or proposer-procured auditors under strict disclosure and approval rules.
  • How QA integrates with milestone approval decisions
    The process should define:
    • whether QA is advisory or binding,
    • what happens when QA contradicts milestone reviewer conclusions,
    • escalation paths (e.g., specialist review, admin decision),
    • and how disputes are documented.
  • Avoiding a “cost > grant” trap for low-budget projects
    A tiered model should explicitly prevent heavy oversight requirements for small budgets, because it is irrational to impose audit-level friction when the grant itself is small. This is exactly why tiering is necessary.
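
As a rough illustration of the tiering idea above, the sketch below maps a requested budget to an oversight tier. The ADA thresholds, reviewer counts, and QA triggers are placeholders only; as the list stresses, the real numbers would need to be set through a transparent process, not assumed.

```python
from dataclasses import dataclass

@dataclass
class OversightTier:
    name: str
    min_reviewers: int
    independent_qa: bool

# Hypothetical ADA thresholds; these are assumptions for discussion,
# not values defined by the Catalyst program.
TIERS = [
    (75_000,        OversightTier("low",  2, False)),
    (300_000,       OversightTier("mid",  3, False)),
    (float("inf"),  OversightTier("high", 4, True)),
]

def classify(requested_ada: int) -> OversightTier:
    """Map a requested budget to an oversight tier (illustrative only)."""
    for upper_bound, tier in TIERS:
        if requested_ada < upper_bound:
            return tier

print(classify(50_000))      # low tier: lightweight oversight, 2 reviewers
print(classify(1_500_000))   # high tier: 4 reviewers + mandatory independent QA
```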

Risk if unchanged

Without a tiered model, the program effectively treats vastly different risk profiles as equivalent, with the main adjustment being “2 reviewers vs 4 reviewers.” That is not a complete risk-based accountability design. The result is predictable: inconsistent scrutiny, unclear expectations, and repeated public backlash when high-budget projects appear under-checked or low-budget projects feel over-policed. Templates and procedural refinements do not solve this. Accountability must be designed as a function of budget size and risk, not only as a function of reviewer formatting and workflow.


5) Incentives: The System Still Rewards Approval, Not Scrutiny

Why it matters

The draft focuses heavily on reviewer behavior (templates, evidence standards, strike systems). That is useful, but it does not address the incentive mechanics that shape how reviewers behave in practice.

For someone outside the program, here is the core issue in plain terms:

In practice, Milestone Reviewers are compensated per PoA review unit (₳150 per PoA), and payment is typically triggered only once the PoA reaches resolution through the Milestone Module workflow (i.e., after the review process converges and the Catalyst team completes sign-off). If a PoA is returned for revision, reviewers may need to re-check updated evidence across multiple rounds. That rework takes real time, but compensation remains anchored to the same fixed reward, which structurally encourages converging to resolution quickly rather than spending more time on due diligence in each iteration.

This creates a built-in bias:

  • Deep verification increases workload and delays approval.
  • Returning a PoA for revision creates extra rounds of unpaid effort until the process reaches approval.
  • Approving quickly minimizes time and maximizes the probability of getting paid promptly.

Even if most reviewers act in good faith, a system like this predictably encourages:

  • shallow checks,
  • “benefit of the doubt” decisions,
  • avoidance of hard PTG calls,
  • and an approval bias that looks like leniency from the outside.

There is also an institutional conflict that is rarely stated openly: Catalyst itself is evaluated externally through program success signals (proposal approval rates). A Milestone Review compensation model that nudges reviewers toward approvals risks distorting those signals.

If the system design financially nudges reviewers toward approval, it can inflate perceived program performance. A more neutral design (where reviewers are paid for verified work regardless of outcome) would likely reduce approval rates and increase friction, but it would also increase credibility.
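
To make the payment mechanics described above concrete, here is a minimal back-of-the-envelope sketch. The ₳150 per-PoA figure comes from the published rewards page; the hours assumed for an initial review and for each revision round are illustrative assumptions chosen only to show the shape of the incentive.

```python
POA_REWARD_ADA = 150      # fixed per-PoA reward (per the published rewards page)
FIRST_PASS_HOURS = 2.0    # assumed time for an initial thorough review
RE_REVIEW_HOURS = 1.0     # assumed time to re-check updated evidence per round

def effective_hourly_rate(ptg_rounds: int) -> float:
    """ADA earned per hour when one fixed reward covers every review round."""
    total_hours = FIRST_PASS_HOURS + ptg_rounds * RE_REVIEW_HOURS
    return POA_REWARD_ADA / total_hours

for rounds in range(4):
    print(f"{rounds} revision round(s): ~{effective_hourly_rate(rounds):.0f} ADA/hour")
# 0 rounds: ~75 ADA/hour ... 3 rounds: ~30 ADA/hour.
# Every additional round of scrutiny lowers the reviewer's effective rate,
# while approving on the first pass maximizes it.
```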

What’s missing (and practical fixes that would align incentives)

  • Payment neutrality between approval and rejection (pay for the work, not the outcome)
    Reviewers should be compensated for completing a rigorous review, whether the result is:
    • Approve, or
    • Return for Revision, or
    • Escalate for admin/specialist review.
      This removes the financial pressure to “approve to get paid.”
  • A defined model for re-review effort (PTG rounds are real work)
    If PTG loops require reviewers to re-check evidence, the system should explicitly recognize that effort (e.g., a small fixed re-review fee per round, or a capped structure).
  • Incentives that reward verification effort (not throughput)
    The program should explicitly reward behaviors that increase accountability:
    • verifying metrics with primary sources,
    • reproducing claims (where applicable),
    • identifying missing access/evidence early,
    • escalating correctly when technical depth is needed.
  • Quality enforcement that doesn’t rely mainly on punishment after the fact
    Strike systems are reactive. They punish mistakes after outcomes are already at risk. Incentive alignment should reduce low-effort behavior upfront, instead of trying to correct it later through subjective enforcement.

Risk if unchanged

If reviewers are paid mainly when approvals go through, the system will keep nudging behavior toward faster approvals and fewer hard calls. That makes “leniency” a predictable output of the mechanism, not just an individual reviewer problem.


6) Compensation: Same Pay, Wildly Different Work (No Effort Model)

Why it matters

The earlier direction explicitly acknowledged capacity and cost trade-offs: higher standards and stronger assurance inevitably require more time, more skill, and more operational cost. If the program is serious about raising review quality, it must also be serious about the economics of review work.

Right now, the draft tightens expectations, but it does not provide an effort-based compensation model. Without one, the program is effectively asking reviewers to do more work under the same pay structure, which is not sustainable and does not produce the desired behavior.

The problem in plain terms

Milestone Review rewards are allocated per unit of work: ₳75 per milestone review within an SoM, and ₳150 per Proof of Achievement (PoA) review, regardless of the project’s budget size or technical complexity.[3]

That would only make sense if all reviews required roughly the same effort.

They don’t.

Bigger budgets usually mean bigger reviews.
A small project might request 50k ADA and have 3 milestones.
A larger project might request 1–2M ADA and have 6 milestones.

So the budget can be 20–40x larger, while the number of review checkpoints is often only 2x larger.

In practice, higher-budget projects also tend to include:

  • more deliverables,
  • more links and artifacts,
  • more technical material,
  • more dependencies and moving parts,
  • more metrics and KPIs,
  • more surface area to hide weak evidence behind volume.

So even if milestone count only doubles, the actual review effort per milestone can increase significantly.
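
Using the example figures above, a quick sketch shows how the flat per-PoA reward fails to scale with the budget being overseen. All numbers are illustrative, not measured data.

```python
POA_REWARD_ADA = 150  # flat per-PoA reward, regardless of budget or complexity

projects = [
    {"label": "small", "budget_ada": 50_000,    "milestones": 3},
    {"label": "large", "budget_ada": 1_500_000, "milestones": 6},
]

for p in projects:
    total_pay = p["milestones"] * POA_REWARD_ADA
    per_1k = 1000 * total_pay / p["budget_ada"]
    print(f"{p['label']}: budget {p['budget_ada']:>9,} ADA, "
          f"reviewer pay {total_pay} ADA, {per_1k:.2f} ADA per 1k ADA overseen")

# small: budget    50,000 ADA, reviewer pay 450 ADA, 9.00 ADA per 1k ADA overseen
# large: budget 1,500,000 ADA, reviewer pay 900 ADA, 0.60 ADA per 1k ADA overseen
# The budget grows ~30x while total review compensation grows only ~2x.
```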

Same pay creates shortcuts

A simple milestone can sometimes be checked in 20–30 minutes.

A high-budget, complex milestone cannot be checked well in 20–30 minutes unless the reviewer is:

  • guessing,
  • skimming,
  • trusting claims without proof,
  • or writing a “looks good” review.

That is not scrutiny. That is rubber-stamping.

When the system pays the same amount for:

  • a quick review of a simple milestone, and
  • a deep review of a complex milestone,

the system pushes behavior toward the fastest option: shallow work. Not because reviewers are inherently bad, but because the incentives are built that way.

This leads to predictable outcomes

If compensation remains flat, you should expect:

  • rushed reviews on high-budget projects,
  • weak evidence passing,
  • approval bias (because it is faster and less stressful),
  • low-quality “audit trails” written to satisfy the template minimums,
  • and repeated public backlash when poor approvals surface.

This is not a mystery. It is basic incentive design.

“More reviewers” doesn’t fix it

Yes, some higher-budget projects are assigned four reviewers instead of two.

That can help with redundancy, but it does not solve the core issue:

  • the work per reviewer can still be huge,
  • the pay is still flat,
  • time pressure still exists,
  • shallow reviews still get rewarded.

Four people doing shallow checks is not the same as one person doing a deep check.

What’s missing (and a practical direction that avoids overengineering)

  • Review tiers (low / mid / high complexity)
    A simple classification system so the program stops treating all milestones as equal.
  • Expected time ranges per tier
    Clear expectations like: “low complexity reviews usually take X–Y minutes; high complexity reviews usually take X–Y hours.”
  • Compensation tied to complexity and risk
    Higher-complexity milestones should pay more, because they require more time and expertise.
  • A public rationale for rates (the effort model)
    The program should publish a basic explanation such as:
    • how many hours a good review is expected to take by tier,
    • what hourly rate assumptions are being used (or what reference is being used),
    • what data supports those assumptions,
    • whether a pilot measured real average review time.

Without an effort model, compensation looks arbitrary. And when pay looks arbitrary, reviewers can treat the work like a gig task.
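
For clarity, this is roughly what a published effort model could look like: expected hours per complexity tier plus an explicit hourly-rate assumption. Every figure below is a placeholder that the program would need to justify with real data (for example, from a timing pilot).

```python
# tier: (expected_hours_min, expected_hours_max, assumed_hourly_rate_ada)
EFFORT_MODEL = {
    "low":  (0.5, 1.0, 60),
    "mid":  (1.5, 3.0, 60),
    "high": (4.0, 8.0, 75),   # higher rate reflects the specialist skills required
}

def suggested_reward_range(tier: str):
    """Derive a reward range from expected effort and an explicit hourly rate."""
    hours_min, hours_max, rate = EFFORT_MODEL[tier]
    return hours_min * rate, hours_max * rate

for tier in EFFORT_MODEL:
    low, high = suggested_reward_range(tier)
    print(f"{tier:>4} complexity: ~{low:.0f}-{high:.0f} ADA per review")
```

Publishing even a simple table like this would let the community debate the assumptions instead of guessing at how the flat rates were chosen.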

What this signals to the community

When the system pays the same for high-risk and low-risk reviews, it sends a clear message:

“We are not serious about scrutiny.”

Even if that is not the intention, that is the effect.

If the goal is accountability and careful treasury stewardship, the pay model must match the level of effort required.

Risk if unchanged

Flat pay encourages shortcuts. Complex milestones get skimmed, weak evidence passes, and audit trails become box-checking. The program can tighten templates forever, but the incentive will still push reviewers toward speed over verification.


7) Skill Matching: High-Risk Reviews Can Be Assigned to Non-Specialists

Why it matters

The earlier direction highlighted the need for technical expertise on nuanced, high-complexity projects. While earlier sections discuss accountability tiers and effort scaling, the current framework still lacks a mechanism to ensure that the right reviewers evaluate the right types of work.

Today, Milestone Review assignments largely assume that any approved reviewer can evaluate any milestone category.

However, reviewing:

  • a community event,
  • an educational initiative,
  • research deliverables,
  • infrastructure software,
  • smart contracts or protocol tooling

requires fundamentally different skills.

Process improvements cannot substitute for domain expertise.

What’s missing (and practical improvement paths)

  • explicit reviewer competency profiles
  • formal skill-matching between reviewers and project domains (a minimal routing sketch follows this list)
  • specialist review triggers for technically complex milestones
  • escalation paths when reviewers identify domain limitations
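
Here is a minimal sketch of what competency-based routing could look like, assuming declared (or certified) domain profiles and a specialist trigger for technical domains. The domain names, reviewer roster, and two-reviewer minimum are all hypothetical, not program policy.

```python
# Illustrative routing logic: assign only reviewers whose competency profile
# covers the milestone's domain, and flag technical domains for specialist review.
SPECIALIST_DOMAINS = {"infrastructure", "smart_contracts"}

reviewers = {
    "alice": {"research", "education"},
    "bob":   {"smart_contracts", "infrastructure"},
    "carol": {"community", "education"},
}

def route(milestone_domain: str, needed: int = 2):
    """Return qualified reviewers and whether specialist review / escalation is triggered."""
    qualified = [name for name, skills in reviewers.items() if milestone_domain in skills]
    if len(qualified) < needed:
        # escalation path: not enough qualified reviewers in the pool
        return qualified, True
    return qualified[:needed], milestone_domain in SPECIALIST_DOMAINS

print(route("smart_contracts"))  # (['bob'], True): only one qualified reviewer -> escalate
print(route("education"))        # (['alice', 'carol'], False)
```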

Risk if unchanged

Without competency matching, reviewers will continue evaluating work outside their expertise domain. This leads to:

  • inconsistent approval standards,
  • higher false approvals,
  • overreliance on trust instead of verification,
  • and erosion of confidence in the Milestone Review process even if procedures improve.

8) Revalidation of the Existing Reviewer Pool: Legacy Risk Not Addressed

Why it matters

Recent criticism of the Milestone Review Program is largely about current reviewer performance, not hypothetical future cohorts. If standards are being raised, the program needs a way to confirm that the existing reviewer pool still meets those standards.

What’s missing (minimal, practical)

  • a light re-certification for active reviewers (short test + one simulated review)
  • periodic performance reassessment based on objective signals (QA flags / returned reviews due to low quality, escalation rate, obvious misses)
  • a controlled renewal plan (rotate out consistently low-quality reviewers, recruit replacements)

Risk if unchanged

A stronger checklist applied to the same pool does not restore confidence. The reform becomes “new rules on paper” while the same reviewer performance patterns continue in practice.


Conclusion

The draft improves consistency, but without structural upgrades in vetting, oversight tiers, incentives, and effort modeling, it is unlikely to resolve the current trust deficit.


References

[1] https://x.com/dannyribar/status/2014783410162618460 (archived: https://archive.is/izb27)

[2] https://docs.projectcatalyst.io/current-fund/community-review/community-reviewer-levels

[3] https://docs.projectcatalyst.io/previous-funds/fund14-docs/project-onboarding/milestone-reviewers-guides/milestone-reviewer-rewards
