CA Justice Watch tracks prosecutorial injustice across all 58 California counties. Every fact sourced from public records.

Ranking Methodology — Mathematical Formulation

A complete mathematical specification of the misconduct-severity ranking formula used across cajusticewatch.com. Written in standard notation for academic review.

Status: locked 2026-05-16. This methodology is frozen. Any future revision requires a new published methodology page, with the prior version archived and a published changelog. Wayback Machine archive integrity is the canonical record of the locked specification.
Canonical source: spreadsheet-data/EQUATION.json · Implementation: cajusticewatch/mcp-server · Live spreadsheet: /spreadsheet

TL;DR

Each documented misconduct incident is scored by combining (1) a single analyst-assigned severity tier on a 1–10 California Penal Code-calibrated scale, (2) a confidence weight derived from documentary evidence and source integrity, (3) a pattern-recognition boost when an actor commits similar incidents repeatedly, and (4) an exponential severity weighting so categorically worse conduct cannot be drowned out by volume of lesser conduct. Per-person scores are sums of per-incident scores; rankings are dimensionless ratios to the population mean.

1. Scope & non-goals

Let \(\mathcal P\) denote the set of in-scope California criminal-justice actors (district attorneys, public defenders, currently sitting judges, law-enforcement officers, and trial witnesses). For each \(p \in \mathcal P\), the formula produces a single non-negative real number \(R(p) \in \mathbb R_{\ge 0}\) that summarizes the actor's documented misconduct severity.

The formula is intended as a documentation summary, not as an adjudication. It does not produce a verdict, a probability of future misconduct, or a normative judgment. It is a deterministic function of public-record inputs: identical inputs produce identical outputs.

Non-goals are stated explicitly in §17 to avoid misreading.

2. Notation

Definition 2.1 (Actors). \(\mathcal P\) is the set of in-scope actors. A specific actor is denoted \(p \in \mathcal P\).
Definition 2.2 (Incidents). \(\mathcal I\) is the set of all documented incidents across the database. \(\mathcal I(p) \subseteq \mathcal I\) is the subset attributed to actor \(p\) before any filtering. Each incident \(i \in \mathcal I(p)\) carries the metadata fields enumerated in §3–§6.
Definition 2.3 (Scoreable set). The scoreable subset of incidents for actor \(p\) is \[ I_{\mathrm{sc}}(p) \;:=\; \{\, i \in \mathcal I(p) \;:\; \mathrm{status}(i) \neq \texttt{ruled\_false} \,\}. \] Incidents with status ruled_false are excluded from \(R(p)\) entirely (not merely down-weighted). Throughout the rest of this document, all summations and counts run over \(I_{\mathrm{sc}}(p)\) (the filtered set), not over \(\mathcal I(p)\) (the unfiltered set).
Definition 2.4 (Count). \(n(p) := |I_{\mathrm{sc}}(p)|\) is the number of scoreable incidents attributable to \(p\).

3. Severity tiers & the per-incident severity \(S(i)\)

Every incident is anchored to a single severity tier on a 1–10 ordinal scale calibrated to California Penal Code (PC) tiers (Table 3.1). The tier is assigned by the documenting analyst as the worst applicable dimension — charge severity, adjudicated outcome severity, or harm caused — whichever yields the highest defensible PC anchor for the incident as a whole.

Table 3.1 — Severity tier scale \(\mathcal S = \{1,\dots,10\}\) with PC §1170(b) sentencing anchors

TierPC AnchorLow (yrs)Mid (yrs)Upper (yrs)Custody
10LWOP / death penalty (PC §190.2)505050State prison (life)
9Murder PC §190(a): 15-to-life (2°) / 25-to-life (1°)152525State prison (life-tail)
8PC 211 (2d robbery) × strike doubling (PC §667(e)(1))4610State prison
7PC 461(a) (1st-degree burglary) × strike doubling4812State prison
6PC 461(a) (1st-degree burglary), no strike246State prison
5PC 245(a)(1) (assault w/ deadly weapon) felony234State prison
4Common PC §1170(h) realignment triad (16mo / 2yr / 3yr)1.3323County jail
3Misdemeanor with jail eligibility (PC §18.5 caps at 364 days)00.51County jail
2Low-custody misdemeanor (cited/released or short jail)00.0830.25County jail (if any)
1Infraction (PC §19.6) — fine only000None

The low / mid / upper columns are policy-anchored statutory triads — they reflect the actual California Penal Code §1170(b) sentencing ranges for each tier's representative anchor offense (cited inline), used by the sentence-equivalent metric \(Y(p)\) defined in §10. The columns are not claims about typical sentences for arbitrary offenses in that tier class — they are explicit calibration anchors per Bishop-Tars 2026-05-17 review. The Custody column distinguishes state-prison (CDCR) tiers from county-jail tiers so \(Y(p)\) can report prison-years and jail-years separately rather than collapsing them.

Calibration caveats (binding on Y(p)):

Each incident is assigned a single severity tier:

\[ S(i) \in \mathcal S = \{1, 2, \dots, 10\}, \tag{3.1} \]

assigned as described above. If an analyst wishes to record their reasoning for the chosen tier (e.g. "tier 7 because the original charge dominated even though the plea was reduced"), that reasoning goes in the incident's free-text reasoning field; the numeric column carries the single tier \(S(i)\).

4. Confidence base \(\alpha(i)\) from incident status

Every incident has a documentary status reflecting whether an oversight body has adjudicated it. The status determines a base confidence multiplier \(\alpha(i) \in [0,1]\), calibrated from California civilian-oversight sustained-vs-unfounded rates (see ALPHA sheet for source data).

Table 4.1 — Status → base confidence

status\((i)\)\(\alpha(i)\)Treatment
ruled_confirmed\(1.0000\)Sustained by an oversight body or court
ruled_insufficient\(0.4348\)Investigated, evidence insufficient
not_ruled\(0.2590\)Documented but never formally adjudicated
ruled_false(excluded)Excluded from \(I_{\mathrm{sc}}(p)\) per Definition 2.3
Justification. The non-trivial values \(0.4348\) and \(0.2590\) are not arbitrary: they correspond to the empirically observed sustained rates among civilian-oversight findings of "insufficient" and "not ruled" complaints, respectively, computed across multiple California oversight agencies (San Francisco DPA, San Diego CLERB, Los Angeles Civilian Oversight Commission, et al.). Full source data is on the ALPHA sheet. This is one of the highest-leverage parameter choices and a natural focus for peer review.

5. Per-incident base confidence \(\mathrm{base}_{\mathrm C}(i,p)\)

Three additional quality indicators are scored per incident, each in \([0,1]\):

\(\mathrm{evidence}(i)\)
Documentary record strength. \(1\) = primary-source documents (court orders, oversight findings, signed declarations). \(0\) = pure rumor.
\(\mathrm{integrity}(i)\)
Source-tier reliability, assigned from the discrete source_class bands defined on the SCHEMA sheet: \(S_1\) court / Bar record \(\to 1.00\); \(S_2\) agency record (CJP, POST, DPA, CLERB) \(\to 0.75\); \(S_3\) named press \(\to 0.50\); \(S_4\) single-source allegation \(\to 0.25\). These four-band quarter-step values are a policy convention rather than an empirically calibrated curve; they are appropriate subjects for peer review and tracked as an open documentation gap in §16.
\(\rho(i)\)
Corroboration density, assigned from three reviewer bands: \(\rho = 1.00\) when multiple independent confirming sources exist, \(0.75\) when at least two sources concur, \(0.50\) when only a single source is on record. Like integrity, the band thresholds reflect current reviewer practice and are tracked as a documentation gap in §16 rather than as a fully-specified continuous function.

The base confidence for incident \(i\) attributed to actor \(p\) is the product:

\[ \mathrm{base}_{\mathrm C}(i,p) \;:=\; \mathrm{evidence}(i)\;\cdot\;\mathrm{integrity}(i)\;\cdot\;\rho(i)\;\cdot\;\alpha(i). \tag{5.1} \]

By construction, \(\mathrm{base}_{\mathrm C}(i,p) \in [0,1]\): the product of four values in \([0,1]\) is in \([0,1]\). The product form encodes the requirement that a deficiency in any one dimension is sufficient to dampen confidence — strong documentary evidence does not rescue a low-integrity source, and vice versa.

6. Pattern boost \(\mathcal B(i,p)\)

Repeated misconduct of the same type by the same actor is documented as a pattern. The boost \(\mathcal B(i,p)\) increases confidence for incident \(i\) when actor \(p\) has additional incidents sharing the same axiom. Up to four axioms (from the 7-axiom TAXONOMY) may be tagged per incident: VIOLENCE, COERCION, COLLUSION, ABUSE, DECEIT, DISCRIMINATION, EVASION.

Let \(\mathcal A(i) \subseteq \{1,\dots,7\}\) denote the set of axioms tagged on incident \(i\) (with \(1 \le |\mathcal A(i)| \le 4\); incidents always have at least one axiom). For axiom \(a\) and the specific incident \(i\) being scored, let

\[ \mathcal I_a(i, p) \;:=\; \{\, j \in I_{\mathrm{sc}}(p) \;:\; a \in \mathcal A(j),\; j \neq i \,\}, \]

be the other scoreable incidents of \(p\) that share axiom \(a\) with \(i\). The boost uses the 4-slot additive rule with linearly-decreasing weights specified in the BOOST sheet:

\[ w \;=\; (w_1, w_2, w_3, w_4) \;=\; (1.00,\; 0.75,\; 0.50,\; 0.25), \]

so that the \(k\)-th additional matching incident (in rank order) contributes weight \(w_k\), and any \(k > 4\) contributes zero. Ordering rule: for each axiom \(a\), enumerate \(\mathcal I_a(i,p)\) as \(j_1, j_2, \dots, j_{m_a}\) where \(m_a := \min(4, |\mathcal I_a(i,p)|)\), sorted by descending \(\mathrm{base}_{\mathrm C}(j,p)\). That is, \(j_1\) is the highest-base-confidence other matching incident, \(j_2\) the next-highest, and so on. Ties between equal-\(\mathrm{base}_{\mathrm C}\) candidates are score-neutral (the weighted sum \(\sum w_k \cdot b_k\) is invariant under any permutation of equal-valued entries) and are therefore broken in whatever order the implementation's sort is stable on — typically input order. The per-axiom boost is the weighted sum of those corroborating incidents' base confidences:

\[ \mathcal B_a(i,p) \;:=\; \sum_{k=1}^{m_a}\, w_k \cdot \mathrm{base}_{\mathrm C}(j_k,\,p). \tag{6.1} \]

If \(\mathcal I_a(i,p) = \varnothing\) (no other incidents share this axiom), the sum is empty and \(\mathcal B_a(i,p) = 0\). The combined pattern boost is the max across axioms tagged on \(i\):

\[ \mathcal B(i,p) \;:=\; \max_{a \in \mathcal A(i)}\, \mathcal B_a(i,p). \tag{6.2} \]

Since \(|\mathcal A(i)| \ge 1\) by construction, (6.2) is always well-defined. Note that \(\mathcal B(i,p) \ge 0\) with equality iff no other scoreable incident shares any axiom with \(i\).

Design rationale. The weighted-sum structure ensures that each additional matching incident contributes a strictly positive (but diminishing) increment to the boost, while the max-across-axioms aggregation prevents a single very-broad axiom (e.g., ABUSE) from dominating the boost when a narrower axiom (e.g., COERCION) is more specific. This is consistent with information-retrieval literature on diversity-aware pattern weighting.

7. Capped action confidence and incident score \(s(i,p)\)

The action confidence applies the boost and caps the result strictly below \(1\). Define the tier-cap function

\[ \kappa(s) \;:=\; \frac{s}{s+1}, \qquad s \in \mathbb{N}, \;\; s \ge 1, \tag{7.0} \]

which is monotone increasing on its domain, bounded above by \(1\), with supremum \(1\) not attained (on the practical PC scale \(s \in \{1,\dots,10\}\), \(\kappa(s) \le \frac{10}{11} \approx 0.909\)). Then:

\[ C_{\mathrm{action}}(i,p) \;:=\; \min\!\bigl(\,\kappa(S(i)),\;\;\mathrm{base}_{\mathrm C}(i,p)\,\cdot\,\bigl(1 + \mathcal B(i,p)\bigr)\,\bigr). \tag{7.1} \]

The choice of cap \(\kappa(s) = s/(s+1)\) takes the same functional form as a Bayesian posterior mean under sequential evidence accumulation (the form \(N/(N+1)\) is the maximum-likelihood estimate of a binary success rate after \(N\) trials under an improper limit prior — sometimes called a Haldane-style prior, the limiting case of \(\mathrm{Beta}(\varepsilon, 1)\) as \(\varepsilon \to 0\)). Equivalently, this is the conservative dual of Laplace's rule of succession (Laplace 1814, which uses a \(\mathrm{Beta}(1,1)\) uniform prior giving \((N+1)/(N+2)\)): same posterior shape, deliberately shifted prior. We adopt the conservative variant because the accountability domain calls for confidence to be earned from evidence rather than granted by a uniform prior.

\(S(i)\)\(\kappa(S(i))\)Interpretation
1 (infraction)0.500For tier-1 incidents the cap binds at 0.5 — minor incidents can never exceed coin-flip confidence under this formula. This is intentional: the cost of falsely amplifying minor allegations is judged higher than the cost of capping them.
3 (misd. w/ jail)0.750
5 (state prison)0.833
7 (serious felony)0.875
9 (life-eligible)0.900
10 (death-eligible)0.909Asymptote — no incident is ever treated as 100% certain.

Severity weighting is exponential:

\[ S_{\mathrm{weight}}(i) \;:=\; 2^{\,S(i)}. \tag{7.2} \]

The per-incident score is the product:

\[ \boxed{\;\;s(i,p) \;:=\; S_{\mathrm{weight}}(i) \cdot C_{\mathrm{action}}(i,p) \;=\; 2^{\,S(i)} \cdot C_{\mathrm{action}}(i,p).\;\;} \tag{7.3} \]

Equation (7.3) is the atomic unit of the system. All higher-level metrics aggregate it.

8. Per-person rollup \(R(p)\)

\[ \boxed{\;\;R(p) \;:=\; \sum_{i \,\in\, I_{\mathrm{sc}}(p)} s(i,\,p) \;=\; \sum_{i \,\in\, I_{\mathrm{sc}}(p)} 2^{\,S(i)}\cdot C_{\mathrm{action}}(i,\,p).\;\;} \tag{8.1} \]

\(R(p)\) is the canonical per-actor severity score. There is no per-person aggregate cap — by methodological choice, repeated misconduct should aggregate without ceiling. See §12 for bounds and properties.

9. Display multipliers \(\delta_R\) and \(\delta_P\)

Two dimensionless display ratios are computed for presentation only. They do not feed back into \(R(p)\).

Let \(\mathcal P_{\text{ranked}} := \{\, p \in \mathcal P : R(p) > 0 \,\}\) be the set of actors with strictly positive score (i.e., at least one scoreable incident that produced a positive per-incident score under (7.3); note that an actor can have \(n(p) > 0\) tier-scored incidents and still satisfy \(R(p) = 0\) if every such incident has \(\mathrm{base}_{\mathrm C} = 0\)). Define

\[ \mu_R \;:=\; \frac{1}{|\mathcal P_{\text{ranked}}|} \sum_{p \in \mathcal P_{\text{ranked}}} R(p), \qquad \mu_P \;:=\; \frac{1}{|\mathcal P|} \sum_{p \in \mathcal P} R(p), \]

with the convention that if \(\mathcal P_{\text{ranked}} = \varnothing\) then \(\mu_R\) is undefined (and \(\delta_R\) is reported as "—" by the display layer); similarly \(\mu_P\) requires \(|\mathcal P| > 0\), which is always satisfied in production. Then for any \(p \in \mathcal P\),

\[ \delta_R(p) \;:=\; \frac{R(p)}{\mu_R}, \qquad \delta_P(p) \;:=\; \frac{R(p)}{\mu_P}. \tag{9.1} \]

The two ratios answer different questions:

Asymmetric null handling is intentional. Including \(R(p)=0\) actors in \(\mu_P\) but not in \(\mu_R\) is the mathematical engine that lets the two display ratios carry different information. This is the same trick used in (e.g.) sports statistics to distinguish "all-star batting average" from "league-wide batting average." If every \(p \in \mathcal P\) satisfies \(R(p) > 0\), then \(\mu_P = \mu_R\) and \(\delta_P \equiv \delta_R\) by construction; the display layer detects this collapse and substitutes "—" rather than misrepresent the comparison.

10. Sentence-equivalent metric \(Y(p)\)

\(R(p)\) is the canonical mathematical ranking score, but its units (a dimensionless severity-weighted sum) are not graspable by non-technical readers. To translate documented misconduct into a unit the general public actually feels — years of imprisonment — we define a parallel sentence-equivalent metric \(Y(p)\), with units of years.

For each scoreable incident \(i \in I_{\mathrm{sc}}(p)\), let \(T_{\mathrm{mid}}(S(i))\), \(T_{\mathrm{low}}(S(i))\), \(T_{\mathrm{up}}(S(i))\) denote the middle, low, and upper sentence anchors (in years) from Table 3.1. The per-incident sentence contribution is:

\[ y_{\mathrm{mid}}(i,p) \;:=\; T_{\mathrm{mid}}(S(i)) \cdot C_{\mathrm{action}}(i,p), \tag{10.1} \]

with \(y_{\mathrm{low}}\) and \(y_{\mathrm{up}}\) defined analogously. The actor's total sentence exposure is the consecutive sum (PC §669, "no plea, no concurrent discount" framing):

\[ \boxed{\;\;Y_{\mathrm{mid}}(p) \;:=\; \sum_{i \,\in\, I_{\mathrm{sc}}(p)}\, T_{\mathrm{mid}}(S(i)) \cdot C_{\mathrm{action}}(i,p)\;\;} \tag{10.2} \]

\(Y_{\mathrm{low}}(p)\) and \(Y_{\mathrm{up}}(p)\) are computed identically using the low and upper triad columns. The published value is the canonical \(Y_{\mathrm{mid}}(p)\); the band \([Y_{\mathrm{low}}(p),\,Y_{\mathrm{up}}(p)]\) is displayed alongside to make the sentencing range transparent rather than hidden.

Prison vs jail decomposition

Because California's two custodial systems are not morally equivalent — state prison (CDCR) is materially harder time than county jail — \(Y(p)\) is decomposed by custody type per Table 3.1's Custody column:

\[ Y(p) \;=\; Y_{\mathrm{prison}}(p) \;+\; Y_{\mathrm{jail}}(p), \]

where \(Y_{\mathrm{prison}}\) sums per-incident \(y(i,p)\) over scoreable incidents with \(S(i) \in \{5, 6, 7, 8, 9, 10\}\) (state-prison tiers) and \(Y_{\mathrm{jail}}\) sums over \(S(i) \in \{2, 3, 4\}\) (county-jail tiers including PC §1170(h) realignment felonies). Tier-1 incidents contribute zero years to either component. The display layer reports the prison and jail figures separately as well as the total.

Stacking convention

The summation in (10.2) implicitly stacks all per-incident sentences consecutively, not concurrently. This is a deliberate cumulative-harm convention chosen for the accountability framing of \(Y(p)\), not a prediction of actual California sentencing outcomes. Real-world California sentencing applies a battery of subordinate-term rules, PC §654 multiple-punishment stays, and judicial discretion under PC §669 that frequently produce concurrent or partially-concurrent terms. Y(p) does not model any of these — it asks only "what would each documented misconduct incident produce under PC §1170(b), summed independently, with the standard C_action confidence discount applied."

Why the same \(C_{\mathrm{action}}\) factor. A documented incident with low confidence contributes proportionally less sentence-equivalent, just as it contributes proportionally less R(p). The same conservatism applies: capped at \(\kappa(S(i)) = S(i)/(S(i)+1)\), so a fully-saturated tier-10 incident contributes at most \(50 \cdot 10/11 \approx 45.5\) years (mid term), not the full 50. This makes \(Y(p)\) consistent with the rest of the methodology — uncertain misconduct produces an uncertain sentence.
What \(Y(p)\) is, and what it is not. \(Y(p)\) is a translation of documented misconduct into the unit of years, applied symmetrically — the law the actor is documented as having violated, applied back to them at the same statutory triads they impose on defendants. It is not a verdict, a sentence recommendation, an indictment, or a call for prosecution. It is a public-comprehension metric. Like \(R(p)\), it operates only on documented incidents in \(I_{\mathrm{sc}}(p)\) and produces a deterministic output. The conservative assumptions baked in (statutory triads only, no enhancements beyond what's encoded in tier 8/7's strike-doubling, consecutive stacking by default, ruled_false excluded entirely) make it a floor rather than a ceiling on the true sentence exposure a real prosecution would carry.

Worked example for a tier-9 sustained incident

Consider an incident with \(S(i) = 9\), \(\alpha = 1\) (sustained), \(\mathrm{evidence} = \mathrm{integrity} = \rho = 1\), no boost. Then \(\mathrm{base}_{\mathrm C}(i,p) = 1\), \(C_{\mathrm{action}}(i,p) = \min(\kappa(9),\,1) = 0.9\). Per Table 3.1, tier-9 triad is (15, 25, 30) years. The sentence contribution:

\[ y_{\mathrm{low}} = 15 \cdot 0.9 = 13.5,\quad y_{\mathrm{mid}} = 25 \cdot 0.9 = 22.5,\quad y_{\mathrm{up}} = 30 \cdot 0.9 = 27\ \text{years.} \]

An actor with three such incidents would have \(Y_{\mathrm{mid}}(p) = 67.5\) years (band: 40.5 to 81), summed consecutively.

11. Three-state status classification

Each actor is assigned a categorical ranking_status based on whether incidents and scores exist:

\[ \mathrm{status}(p) \;:=\; \begin{cases} \texttt{ranked} & \text{if } n(p) > 0 \text{ and } R(p) > 0, \\[2pt] \texttt{documented\_unscored} & \text{if } n(p) > 0 \text{ and } R(p) = 0, \\[2pt] \texttt{null} & \text{if } n(p) = 0. \end{cases} \tag{10.1} \]

The documented_unscored state captures actors with documented incidents that have not yet been tier-scored. The formula's policy is conservative: no tier classification means no score, never an estimated score.

12. Bounds, properties, and proofs

Proposition 11.1 (Non-negativity). For every \(p \in \mathcal P\), \(R(p) \ge 0\).
By (5.1), \(\mathrm{base}_{\mathrm C}(i,p) \in [0,1]\) as a product of four \([0,1]\) values. \(\mathcal B(i,p) \ge 0\) by (6.1)–(6.2). Therefore \(1 + \mathcal B(i,p) \ge 1 > 0\), and the min in (7.1) is bounded below by \(0\), so \(C_{\mathrm{action}}(i,p) \in \bigl[0,\;\kappa(S(i))\bigr] \subset [0, 1)\). Since \(2^{S(i)} > 0\) for all \(S(i) \in \mathcal{S} = \{1,\dots,10\}\), each \(s(i,p) \ge 0\). \(R(p)\) is a finite sum of non-negative terms.
Proposition 11.2 (Confidence cap, severity-dependent). For every \(i,p\), \(C_{\mathrm{action}}(i,p) \le \dfrac{S(i)}{S(i)+1}\), which is strictly less than \(1\) for all \(S(i) \in \mathbb N\) and asymptotes to \(1\) only as \(S(i) \to \infty\). In particular \(C_{\mathrm{action}}(i,p) \le \tfrac{10}{11} \approx 0.909\) on the practical PC tier scale \(S(i) \in \{1,\dots,10\}\).
Immediate from the \(\min\bigl(\tfrac{S(i)}{S(i)+1},\,\cdot\bigr)\) in (7.1). The function \(S \mapsto S/(S+1) = 1 - 1/(S+1)\) is monotone increasing on \([1, \infty)\) and bounded above by \(1\), with supremum \(1\) not attained.
Proposition 11.3 (Monotonicity in severity, conditional). Fix an incident \(i\) attributed to actor \(p\), and assume \(\mathrm{base}_{\mathrm C}(i,p) > 0\) (i.e., all four factors \(\mathrm{evidence}, \mathrm{integrity}, \rho, \alpha\) are strictly positive). Then for any two severity tiers \(s, s' \in \mathcal{S}\) with \(s < s'\), the per-incident score \(s(i,p)\) evaluated at \(S(i) = s\) is strictly less than that evaluated at \(S(i) = s'\). In words: holding all other inputs fixed, increasing the severity tier strictly increases the per-incident score.
Recall \(s(i,p) = 2^{S(i)} \cdot C_{\mathrm{action}}(i,p)\) where \(C_{\mathrm{action}}(i,p) = \min(\kappa(S(i)),\; U)\) with \(U := \mathrm{base}_{\mathrm C}(i,p)(1+\mathcal B(i,p))\). The quantity \(U\) is constant in \(S(i)\) (it depends on \(i,p\) but not on the tier label). Let \(s, s' \in \mathcal S\) with \(s < s'\). We split on which branch of the \(\min\) is active:
  • Case 1: \(U \ge \kappa(s')\). Then both \(\min\) calls reduce to the cap, giving \(C_{\mathrm{action}}|_{S=s} = \kappa(s) < \kappa(s') = C_{\mathrm{action}}|_{S=s'}\) (Prop. 11.2). Since \(2^{s} < 2^{s'}\) and both \(C_{\mathrm{action}}\) values are strictly positive, the product \(s(i,p)\) is strictly larger at \(s'\).
  • Case 2: \(U \le \kappa(s)\). Then both calls reduce to \(U > 0\) (since \(\mathrm{base}_{\mathrm C}>0\) and \(1+\mathcal B \ge 1\)). \(C_{\mathrm{action}}\) is the same constant \(U\) at both tiers, but \(2^{s} < 2^{s'}\) so the product is strictly larger at \(s'\).
  • Case 3 (crossing): \(\kappa(s) < U < \kappa(s')\). Then \(C_{\mathrm{action}}|_{S=s} = \kappa(s)\) and \(C_{\mathrm{action}}|_{S=s'} = U > \kappa(s)\). So both factors strictly increase, hence so does the product.
In all three cases, \(s(i,p)|_{S=s} < s(i,p)|_{S=s'}\). The non-degeneracy hypothesis \(\mathrm{base}_{\mathrm C}(i,p) > 0\) is required only to rule out the trivial case \(U = 0\), in which all values are zero. \(\square\)

Note that the \(\mathrm{base}_{\mathrm C}(i,p) > 0\) condition is necessary: if any of \(\mathrm{evidence}, \mathrm{integrity}, \rho, \alpha\) is zero, the product (5.1) collapses to zero and \(s(i,p) = 0\) regardless of tier.

Proposition 11.4 (Aggregation monotonicity). Adding a scoreable incident \(i^*\) to actor \(p\)'s dossier with \(s(i^*,p) > 0\) strictly increases \(R(p)\).
Let \(I_{\mathrm{sc}}(p)\) denote the pre-addition scoreable set and \(I'_{\mathrm{sc}}(p) := I_{\mathrm{sc}}(p) \cup \{i^*\}\) the post-addition set. We must show that for every previously scored incident \(j \in I_{\mathrm{sc}}(p)\), the per-incident score \(s(j,p)\) cannot decrease under the addition, and that the new term \(s(i^*,p)\) is strictly positive.

Step 1: existing per-incident scores are non-decreasing. Fix \(j \in I_{\mathrm{sc}}(p)\). The only quantity in \(s(j,p) = 2^{S(j)} \cdot C_{\mathrm{action}}(j,p)\) that can change is \(\mathcal B(j,p)\) (since \(S(j)\) and \(\mathrm{base}_{\mathrm C}(j,p)\) depend only on \(j\) and \(p\), not on the rest of the dossier). For each axiom \(a \in \mathcal A(j)\), the set \(\mathcal I_a(j,p)\) grows from \(\mathcal I_a^{\,\text{old}}(j,p)\) to \(\mathcal I_a^{\,\text{old}}(j,p) \cup \{i^*\}\) when \(a \in \mathcal A(i^*)\), and is unchanged otherwise. In the latter case \(\mathcal B_a(j,p)\) is unchanged. In the former case, the per-axiom weighted top-\(4\) sum (6.1) accepts an additional non-negative candidate \(\mathrm{base}_{\mathrm C}(i^*,p) \ge 0\); since all weights \(w_k\) are non-negative and the ranking is "descending \(\mathrm{base}_{\mathrm C}\)" with empty-slot weight \(0\), inserting a new non-negative candidate cannot strictly decrease the sum of the top four (it replaces only a smaller or equal candidate). Hence \(\mathcal B_a(j,p)\) is non-decreasing for every axiom \(a\); the \(\max\) in (6.2) of a vector of non-decreasing entries is non-decreasing; therefore \(\mathcal B(j,p)\) is non-decreasing; therefore \(\mathrm{base}_{\mathrm C}(j,p)(1+\mathcal B(j,p))\) is non-decreasing; therefore \(C_{\mathrm{action}}(j,p) = \min(\kappa(S(j)),\,\cdot)\) is non-decreasing (the cap is fixed and the second argument grows); and finally \(s(j,p) = 2^{S(j)} \cdot C_{\mathrm{action}}(j,p)\) is non-decreasing.

Step 2: strict increase. The new term \(s(i^*,p) > 0\) by hypothesis. Each existing term is non-decreasing by Step 1. Therefore \[ R'(p) = s(i^*,p) + \sum_{j \in I_{\mathrm{sc}}(p)} s'(j,p) \;\ge\; s(i^*,p) + \sum_{j \in I_{\mathrm{sc}}(p)} s(j,p) \;>\; R(p). \qquad\square \]

An upper bound on \(R(p)\) does not exist: human harm is unbounded in principle, and the formula refuses to impose an artificial ceiling. See §17 for the principled reason this is a feature, not a defect.

13. Why exponential severity? (theorem)

The choice of \(S_{\mathrm{weight}}(i) = 2^{S(i)}\) is justified by the following observation, which we state formally:

Theorem 12.1 (Severity dominance, worst-case bound). Let \(s_{\max}, s_{\min} \in \mathcal{S}\) with \(s_{\max} > s_{\min}\). Suppose we compare a single tier-\(s_{\max}\) incident with action confidence \(C_{\max} \in (0, \kappa(s_{\max})]\) against \(N\) tier-\(s_{\min}\) incidents, each at the worst-case (i.e., highest possible) action confidence \(\kappa(s_{\min})\). Then the single tier-\(s_{\max}\) incident contributes strictly more to \(R(p)\) than the bundle of \(N\) tier-\(s_{\min}\) incidents whenever \[ 2^{s_{\max} - s_{\min}} \;>\; N \cdot \frac{\kappa(s_{\min})}{C_{\max}}. \tag{12.1} \] In the saturating case where the high-tier incident is also at its cap (\(C_{\max} = \kappa(s_{\max})\)), this reduces to \[ 2^{s_{\max} - s_{\min}} \;>\; N \cdot \frac{\kappa(s_{\min})}{\kappa(s_{\max})} \;=\; N \cdot \frac{s_{\min}(s_{\max}+1)}{s_{\max}(s_{\min}+1)}. \tag{12.2} \] Concretely, evaluating (12.2) at \(s_{\max}=9, s_{\min}=3\): the coefficient is \(\kappa(3)/\kappa(9) = 0.75/0.9 = 5/6 \approx 0.8333\), so the dominance condition reduces to \(2^6 = 64 > 0.8333\,N\), giving \(N < 76.8\) — a single saturating tier-9 incident strictly outweighs up to 76 tier-3 incidents at their respective max confidence.
The bundle of \(N\) tier-\(s_{\min}\) incidents contributes at most \(N \cdot 2^{s_{\min}} \cdot \kappa(s_{\min})\) (this is the supremum over their possible action confidences, taken to bound the comparison from above). The single tier-\(s_{\max}\) incident contributes \(2^{s_{\max}} \cdot C_{\max}\). Strict dominance holds iff \[ 2^{s_{\max}} \cdot C_{\max} \;>\; N \cdot 2^{s_{\min}} \cdot \kappa(s_{\min}) \;\Longleftrightarrow\; 2^{s_{\max} - s_{\min}} \;>\; N \cdot \frac{\kappa(s_{\min})}{C_{\max}}. \qquad\Box \]

This is not merely a quantitative preference; it encodes the normative claim that lethal harm is categorically not commensurable with minor misconduct. A linear weighting would imply substitutability that the methodology explicitly rejects.

Other weighting choices (e.g., \(S^k\) for polynomial \(k\), or arbitrary monotone tables) were considered. Exponential weighting with base 2 was selected for three reasons: (a) for the tier gaps that matter most in practice (e.g., comparing tier-9 conduct against bundles of tier-3 conduct), base 2 produces a dominance bound that comfortably exceeds realistic incident counts per actor — for \(s_{\max}-s_{\min} \ge 6\) (tier-9 vs tier-3 or wider), (12.2) yields \(N\) bounds in the dozens, exceeding typical empirical dossier sizes; (b) base 2 respects the ordinal structure of the underlying severity tier scale, where each step represents a categorical increase in severity (consistent with the PC-anchored Table 3.1 calibration even though \(S(i)\) is now analyst-assigned across charge / outcome / harm dimensions per §3); and (c) it produces interpretable "doublings" between adjacent tiers, a phrasing that translates cleanly to non-technical audiences. We do not claim base 2 dominates arbitrary incident counts — for small tier gaps (e.g., tier 4 vs tier 3) a large \(N\) of lower-tier incidents will outweigh a single higher-tier one, by design.

14. The strict 12-month window

Refresh-sensitive sources (web pages, press archives, oversight databases that delete old records) are evaluated against a rolling 12-month window with strict cutoff dates. Sources outside this window are not eligible to seed new incidents, though previously-ingested incidents are not retroactively removed.

Current window: (loading from the live spreadsheet…)

This is an engineering decision (not a mathematical one) intended to prevent silent corruption when an underlying source rotates or deletes old content. It does not affect equations (3.1)–(9.1). The current cutoff dates are recorded on the live EQUATION sheet and update automatically when the database refreshes.

Interactive companion
A live simulator implementing equations (3.1)–(9.1) above lives at /simulation. Build a fictitious actor, add dated incidents, and watch them slot into the live rankings against real Californian DAs, judges, officers, defenders, and police departments. The simulator is a companion to this specification, not part of it.
Try the formula →

15. Reproducibility

Every quantity above is intended to be reproducible from public artifacts. The one currently-acknowledged gap is that the explicit numeric mapping of evidence / integrity / ρ bands (the S1–S4 source_class table and the ρ density bands described in §5) is presently codified in reviewer practice rather than as an explicit SCHEMA cell; this is tracked as open item §16(g) and will be folded into the SCHEMA sheet before the next published lock. With that single exception, the equation, constants, and per-actor inputs are all served by public sources:

Any per-actor score \(R(p)\) can be verified by (a) reading the relevant _Math.json row, (b) reading the matching _Incidents.json rows for the inputs, and (c) re-running equations (3.1)–(8.1) by hand or in any computer-algebra system. Display ratios \(\delta_R\) and \(\delta_P\) additionally require evaluating (9.1) against the full population sums.

16. Limitations & open questions for review

This methodology has known limitations that are appropriate subjects for peer scrutiny. We list them rather than concealing them.

(a) Population completeness for \(\delta_P\)

The \(\delta_P\) ratio is only meaningful when the underlying sheet contains both ranked (\(R(p) > 0\)) actors and zero-score rows — that is, either documented_unscored actors (\(n(p)>0, R(p)=0\)) or null actors (\(n(p)=0\)) per the §10 status classification — covering the in-scope population. When a sheet contains only ranked rows, \(\mu_P\) trivially equals \(\mu_R\) and \(\delta_P \equiv \delta_R\) — a degenerate state that the display layer detects and surfaces with a yellow warning banner above the affected table, substituting "—" for \(\delta_P\) values. The mathematical guard is in place; per-sheet coverage status is visible on each worst-* page directly. See EQUATION sheet sections 6–7 and the population-collapse check at tools/check_pop_avg_collapse.py.

(b) Calibration of \(\alpha\)

The non-trivial values \(\alpha = 0.4348\) and \(\alpha = 0.259\) are derived from California civilian-oversight sustained rates. The choice of which agencies and time windows to include in the calibration set is itself a methodological choice and an appropriate subject for peer review. See ALPHA sheet for the underlying source data.

(c) Tier assignment for non-criminal misconduct

The PC-tier scale (Table 3.1) is a natural anchor for misconduct that maps cleanly onto a charge or sentence. Per §3, however, \(S(i)\) is assigned by the analyst as the "worst applicable dimension" — charge severity, adjudicated outcome severity, or harm caused — which means for non-criminal misconduct (e.g., ethics findings, harm-without-charge incidents) the tier assignment rests on the analyst's judgment about the "if charged criminally, what tier would this be" question. This is documented per-incident in the free-text reasoning field but introduces a degree of human judgment that should be acknowledged.

(d) Boost saturation

The 4-slot boost saturates at four matching incidents per axiom. A serial offender with 50 incidents of the same axiom receives no additional boost beyond the fourth match (though \(R(p)\) continues to grow linearly via (8.1)). The saturation choice is intentional but worth justifying formally.

(e) Independence assumption

Equation (5.1) treats the four confidence factors as multiplicatively independent. In practice they may be correlated (e.g., a primary-source document tends to be from a higher-integrity outlet). The product form is conservative — it under-weights when correlation is positive — but a covariance-adjusted form may be preferable for some use cases.

(f) Strict 12-month window edge effects

Sources just outside the window are excluded from new incident creation but contribute to existing incidents already in the dossier. This avoids retroactive deletion but creates a soft edge in coverage that may bias \(R(p)\) toward more-recently-covered actors.

(g) SCHEMA documentation gap for evidence / integrity / ρ bands

The discrete bands used by reviewers for the three confidence factors are listed in §5, but are not yet published as explicit rows on the SCHEMA sheet. This is a documentation cleanup, not a formula change; it does not affect any of the equations in §3–§12.

Open invitation. If you are a mathematician, statistician, or social scientist with relevant expertise, formal peer review of this methodology — including critique, alternative formulations, or empirical validation — is explicitly welcomed. Contact: cajusticewatch.com/contact. The full source spreadsheet, ingestion data, and code are public.

17. What this formula is NOT

18. References & related work

The formula's structure draws on standard techniques from the following literatures. This list is not exhaustive; reviewers are invited to suggest additions.

Composite indicators. OECD & JRC, Handbook on Constructing Composite Indicators: Methodology and User Guide, 2008. Foundational treatment of the design choices involved in summarizing multidimensional quality data into a single ranking.

Severity-weighted aggregation. Hanley & Lippman-Hand, "If nothing goes wrong, is everything all right?" JAMA 249(13), 1983 — on conservative aggregation under uncertainty.

Confidence as a product of independent factors. Pearl, Probabilistic Reasoning in Intelligent Systems, Morgan Kaufmann, 1988. Justifies the \(\mathrm{evidence}\cdot\mathrm{integrity}\cdot\rho\cdot\alpha\) factorization as a degenerate case of a noisy-AND model.

Exponential weighting for rare-but-severe events. Taleb, Statistical Consequences of Fat Tails, 2020. Argues that linear or polynomial weighting systematically under-weights tail events; exponential weighting partially corrects this.

Pattern-saturation boost. The 4-slot product-with-max-combine rule is structurally analogous to the diversification-weighted IR scoring schemes in Carbonell & Goldstein, "The use of MMR, diversity-based reranking for reordering documents and producing summaries," SIGIR 1998.

Civilian-oversight calibration. ALPHA values are calibrated from the published sustained/unfounded rates of multiple California civilian-oversight bodies (San Francisco Department of Police Accountability, San Diego Citizens' Law Enforcement Review Board, Los Angeles Civilian Oversight Commission). Source data: ALPHA sheet.


Citation suggestion: California Justice Watch (2026). Ranking Methodology — Mathematical Formulation. https://cajusticewatch.com/methodology. Accessed [date].

Licence. This document is released under CC BY 4.0. The underlying spreadsheet, data, and source code are similarly open.

Errata + corrections welcomed. File issues at github.com/cajusticewatch/mcp-server or write to cajusticewatch.com/contact.