Ranking Methodology — Mathematical Formulation
A complete mathematical specification of the misconduct-severity ranking formula used across cajusticewatch.com. Written in standard notation for academic review.
TL;DR
Each documented misconduct incident is scored by combining (1) a single analyst-assigned severity tier on a 1–10 California Penal Code-calibrated scale, (2) a confidence weight derived from documentary evidence and source integrity, (3) a pattern-recognition boost when an actor commits similar incidents repeatedly, and (4) an exponential severity weighting so categorically worse conduct cannot be drowned out by volume of lesser conduct. Per-person scores are sums of per-incident scores; rankings are dimensionless ratios to the population mean.
1. Scope & non-goals
Let \(\mathcal P\) denote the set of in-scope California criminal-justice actors (district attorneys, public defenders, currently sitting judges, law-enforcement officers, and trial witnesses). For each \(p \in \mathcal P\), the formula produces a single non-negative real number \(R(p) \in \mathbb R_{\ge 0}\) that summarizes the actor's documented misconduct severity.
The formula is intended as a documentation summary, not as an adjudication. It does not produce a verdict, a probability of future misconduct, or a normative judgment. It is a deterministic function of public-record inputs: identical inputs produce identical outputs.
Non-goals are stated explicitly in §17 to avoid misreading.
2. Notation
ruled_false are excluded from \(R(p)\) entirely (not merely down-weighted). Throughout the rest of this document, all summations and counts run over \(I_{\mathrm{sc}}(p)\) (the filtered set), not over \(\mathcal I(p)\) (the unfiltered set).
3. Severity tiers & the per-incident severity \(S(i)\)
Every incident is anchored to a single severity tier on a 1–10 ordinal scale calibrated to California Penal Code (PC) tiers (Table 3.1). The tier is assigned by the documenting analyst as the worst applicable dimension — charge severity, adjudicated outcome severity, or harm caused — whichever yields the highest defensible PC anchor for the incident as a whole.
Table 3.1 — Severity tier scale \(\mathcal S = \{1,\dots,10\}\) with PC §1170(b) sentencing anchors
| Tier | PC Anchor | Low (yrs) | Mid (yrs) | Upper (yrs) | Custody |
|---|---|---|---|---|---|
| 10 | LWOP / death penalty (PC §190.2) | 50 | 50 | 50 | State prison (life) |
| 9 | Murder PC §190(a): 15-to-life (2°) / 25-to-life (1°) | 15 | 25 | 25 | State prison (life-tail) |
| 8 | PC 211 (2d robbery) × strike doubling (PC §667(e)(1)) | 4 | 6 | 10 | State prison |
| 7 | PC 461(a) (1st-degree burglary) × strike doubling | 4 | 8 | 12 | State prison |
| 6 | PC 461(a) (1st-degree burglary), no strike | 2 | 4 | 6 | State prison |
| 5 | PC 245(a)(1) (assault w/ deadly weapon) felony | 2 | 3 | 4 | State prison |
| 4 | Common PC §1170(h) realignment triad (16mo / 2yr / 3yr) | 1.33 | 2 | 3 | County jail |
| 3 | Misdemeanor with jail eligibility (PC §18.5 caps at 364 days) | 0 | 0.5 | 1 | County jail |
| 2 | Low-custody misdemeanor (cited/released or short jail) | 0 | 0.083 | 0.25 | County jail (if any) |
| 1 | Infraction (PC §19.6) — fine only | 0 | 0 | 0 | None |
The low / mid / upper columns are policy-anchored statutory triads — they reflect the actual California Penal Code §1170(b) sentencing ranges for each tier's representative anchor offense (cited inline), used by the sentence-equivalent metric \(Y(p)\) defined in §10. The columns are not claims about typical sentences for arbitrary offenses in that tier class — they are explicit calibration anchors per Bishop-Tars 2026-05-17 review. The Custody column distinguishes state-prison (CDCR) tiers from county-jail tiers so \(Y(p)\) can report prison-years and jail-years separately rather than collapsing them.
- Tier 10 is a normalization constant. The 50-year figure is an incapacitation proxy for LWOP / death (assumed CDCR intake age 30 against ~80-year life expectancy), not a predicted served-time estimate from custody/mortality data.
- Tier 8/7 bake in PC §667(e)(1) strike doubling because Table 3.1 explicitly defines those tiers as "with strike enhancement". The anchor offenses (PC 211, PC 461(a)) are policy anchors, not claims that every tier-8 or tier-7 offense doubles to those exact triads.
- The triad does not include firearm enhancements (PC §12022.5/.53), great-bodily-injury enhancements (PC §12022.7), hate-crime enhancements (PC §422.75), or out-on-bail enhancements (PC §12022.1) — any of which would materially raise the upper figure in a real prosecution.
- \(Y(p)\) is a sentence-equivalent proxy, not a sentencing calculator. It does not predict actual California sentencing outcomes (which depend on plea bargaining, PC §654 stays, subordinate-term rules, enhancement pleading/proof, custody credits PC §4019, and judicial discretion). It is a symmetric-application metric: the same statutory triad the law applies to defendants, applied back to the actor's documented misconduct.
Each incident is assigned a single severity tier:
\[ S(i) \in \mathcal S = \{1, 2, \dots, 10\}, \tag{3.1} \]assigned as described above. If an analyst wishes to record their reasoning for the chosen tier (e.g. "tier 7 because the original charge dominated even though the plea was reduced"), that reasoning goes in the incident's free-text reasoning field; the numeric column carries the single tier \(S(i)\).
4. Confidence base \(\alpha(i)\) from incident status
Every incident has a documentary status reflecting whether an oversight body has adjudicated it. The status determines a base confidence multiplier \(\alpha(i) \in [0,1]\), calibrated from California civilian-oversight sustained-vs-unfounded rates (see ALPHA sheet for source data).
Table 4.1 — Status → base confidence
| status\((i)\) | \(\alpha(i)\) | Treatment |
|---|---|---|
ruled_confirmed | \(1.0000\) | Sustained by an oversight body or court |
ruled_insufficient | \(0.4348\) | Investigated, evidence insufficient |
not_ruled | \(0.2590\) | Documented but never formally adjudicated |
ruled_false | (excluded) | Excluded from \(I_{\mathrm{sc}}(p)\) per Definition 2.3 |
5. Per-incident base confidence \(\mathrm{base}_{\mathrm C}(i,p)\)
Three additional quality indicators are scored per incident, each in \([0,1]\):
- \(\mathrm{evidence}(i)\)
- Documentary record strength. \(1\) = primary-source documents (court orders, oversight findings, signed declarations). \(0\) = pure rumor.
- \(\mathrm{integrity}(i)\)
- Source-tier reliability, assigned from the discrete
source_classbands defined on the SCHEMA sheet: \(S_1\) court / Bar record \(\to 1.00\); \(S_2\) agency record (CJP, POST, DPA, CLERB) \(\to 0.75\); \(S_3\) named press \(\to 0.50\); \(S_4\) single-source allegation \(\to 0.25\). These four-band quarter-step values are a policy convention rather than an empirically calibrated curve; they are appropriate subjects for peer review and tracked as an open documentation gap in §16. - \(\rho(i)\)
- Corroboration density, assigned from three reviewer bands: \(\rho = 1.00\) when multiple independent confirming sources exist, \(0.75\) when at least two sources concur, \(0.50\) when only a single source is on record. Like integrity, the band thresholds reflect current reviewer practice and are tracked as a documentation gap in §16 rather than as a fully-specified continuous function.
The base confidence for incident \(i\) attributed to actor \(p\) is the product:
\[ \mathrm{base}_{\mathrm C}(i,p) \;:=\; \mathrm{evidence}(i)\;\cdot\;\mathrm{integrity}(i)\;\cdot\;\rho(i)\;\cdot\;\alpha(i). \tag{5.1} \]By construction, \(\mathrm{base}_{\mathrm C}(i,p) \in [0,1]\): the product of four values in \([0,1]\) is in \([0,1]\). The product form encodes the requirement that a deficiency in any one dimension is sufficient to dampen confidence — strong documentary evidence does not rescue a low-integrity source, and vice versa.
6. Pattern boost \(\mathcal B(i,p)\)
Repeated misconduct of the same type by the same actor is documented as a pattern. The boost \(\mathcal B(i,p)\) increases confidence for incident \(i\) when actor \(p\) has additional incidents sharing the same axiom. Up to four axioms (from the 7-axiom TAXONOMY) may be tagged per incident: VIOLENCE, COERCION, COLLUSION, ABUSE, DECEIT, DISCRIMINATION, EVASION.
Let \(\mathcal A(i) \subseteq \{1,\dots,7\}\) denote the set of axioms tagged on incident \(i\) (with \(1 \le |\mathcal A(i)| \le 4\); incidents always have at least one axiom). For axiom \(a\) and the specific incident \(i\) being scored, let
\[ \mathcal I_a(i, p) \;:=\; \{\, j \in I_{\mathrm{sc}}(p) \;:\; a \in \mathcal A(j),\; j \neq i \,\}, \]be the other scoreable incidents of \(p\) that share axiom \(a\) with \(i\). The boost uses the 4-slot additive rule with linearly-decreasing weights specified in the BOOST sheet:
\[ w \;=\; (w_1, w_2, w_3, w_4) \;=\; (1.00,\; 0.75,\; 0.50,\; 0.25), \]so that the \(k\)-th additional matching incident (in rank order) contributes weight \(w_k\), and any \(k > 4\) contributes zero. Ordering rule: for each axiom \(a\), enumerate \(\mathcal I_a(i,p)\) as \(j_1, j_2, \dots, j_{m_a}\) where \(m_a := \min(4, |\mathcal I_a(i,p)|)\), sorted by descending \(\mathrm{base}_{\mathrm C}(j,p)\). That is, \(j_1\) is the highest-base-confidence other matching incident, \(j_2\) the next-highest, and so on. Ties between equal-\(\mathrm{base}_{\mathrm C}\) candidates are score-neutral (the weighted sum \(\sum w_k \cdot b_k\) is invariant under any permutation of equal-valued entries) and are therefore broken in whatever order the implementation's sort is stable on — typically input order. The per-axiom boost is the weighted sum of those corroborating incidents' base confidences:
\[ \mathcal B_a(i,p) \;:=\; \sum_{k=1}^{m_a}\, w_k \cdot \mathrm{base}_{\mathrm C}(j_k,\,p). \tag{6.1} \]If \(\mathcal I_a(i,p) = \varnothing\) (no other incidents share this axiom), the sum is empty and \(\mathcal B_a(i,p) = 0\). The combined pattern boost is the max across axioms tagged on \(i\):
\[ \mathcal B(i,p) \;:=\; \max_{a \in \mathcal A(i)}\, \mathcal B_a(i,p). \tag{6.2} \]Since \(|\mathcal A(i)| \ge 1\) by construction, (6.2) is always well-defined. Note that \(\mathcal B(i,p) \ge 0\) with equality iff no other scoreable incident shares any axiom with \(i\).
7. Capped action confidence and incident score \(s(i,p)\)
The action confidence applies the boost and caps the result strictly below \(1\). Define the tier-cap function
\[ \kappa(s) \;:=\; \frac{s}{s+1}, \qquad s \in \mathbb{N}, \;\; s \ge 1, \tag{7.0} \]which is monotone increasing on its domain, bounded above by \(1\), with supremum \(1\) not attained (on the practical PC scale \(s \in \{1,\dots,10\}\), \(\kappa(s) \le \frac{10}{11} \approx 0.909\)). Then:
\[ C_{\mathrm{action}}(i,p) \;:=\; \min\!\bigl(\,\kappa(S(i)),\;\;\mathrm{base}_{\mathrm C}(i,p)\,\cdot\,\bigl(1 + \mathcal B(i,p)\bigr)\,\bigr). \tag{7.1} \]The choice of cap \(\kappa(s) = s/(s+1)\) takes the same functional form as a Bayesian posterior mean under sequential evidence accumulation (the form \(N/(N+1)\) is the maximum-likelihood estimate of a binary success rate after \(N\) trials under an improper limit prior — sometimes called a Haldane-style prior, the limiting case of \(\mathrm{Beta}(\varepsilon, 1)\) as \(\varepsilon \to 0\)). Equivalently, this is the conservative dual of Laplace's rule of succession (Laplace 1814, which uses a \(\mathrm{Beta}(1,1)\) uniform prior giving \((N+1)/(N+2)\)): same posterior shape, deliberately shifted prior. We adopt the conservative variant because the accountability domain calls for confidence to be earned from evidence rather than granted by a uniform prior.
| \(S(i)\) | \(\kappa(S(i))\) | Interpretation |
|---|---|---|
| 1 (infraction) | 0.500 | For tier-1 incidents the cap binds at 0.5 — minor incidents can never exceed coin-flip confidence under this formula. This is intentional: the cost of falsely amplifying minor allegations is judged higher than the cost of capping them. |
| 3 (misd. w/ jail) | 0.750 | |
| 5 (state prison) | 0.833 | |
| 7 (serious felony) | 0.875 | |
| 9 (life-eligible) | 0.900 | |
| 10 (death-eligible) | 0.909 | Asymptote — no incident is ever treated as 100% certain. |
Severity weighting is exponential:
\[ S_{\mathrm{weight}}(i) \;:=\; 2^{\,S(i)}. \tag{7.2} \]The per-incident score is the product:
\[ \boxed{\;\;s(i,p) \;:=\; S_{\mathrm{weight}}(i) \cdot C_{\mathrm{action}}(i,p) \;=\; 2^{\,S(i)} \cdot C_{\mathrm{action}}(i,p).\;\;} \tag{7.3} \]Equation (7.3) is the atomic unit of the system. All higher-level metrics aggregate it.
8. Per-person rollup \(R(p)\)
\[ \boxed{\;\;R(p) \;:=\; \sum_{i \,\in\, I_{\mathrm{sc}}(p)} s(i,\,p) \;=\; \sum_{i \,\in\, I_{\mathrm{sc}}(p)} 2^{\,S(i)}\cdot C_{\mathrm{action}}(i,\,p).\;\;} \tag{8.1} \]\(R(p)\) is the canonical per-actor severity score. There is no per-person aggregate cap — by methodological choice, repeated misconduct should aggregate without ceiling. See §12 for bounds and properties.
9. Display multipliers \(\delta_R\) and \(\delta_P\)
Two dimensionless display ratios are computed for presentation only. They do not feed back into \(R(p)\).
Let \(\mathcal P_{\text{ranked}} := \{\, p \in \mathcal P : R(p) > 0 \,\}\) be the set of actors with strictly positive score (i.e., at least one scoreable incident that produced a positive per-incident score under (7.3); note that an actor can have \(n(p) > 0\) tier-scored incidents and still satisfy \(R(p) = 0\) if every such incident has \(\mathrm{base}_{\mathrm C} = 0\)). Define
\[ \mu_R \;:=\; \frac{1}{|\mathcal P_{\text{ranked}}|} \sum_{p \in \mathcal P_{\text{ranked}}} R(p), \qquad \mu_P \;:=\; \frac{1}{|\mathcal P|} \sum_{p \in \mathcal P} R(p), \]with the convention that if \(\mathcal P_{\text{ranked}} = \varnothing\) then \(\mu_R\) is undefined (and \(\delta_R\) is reported as "—" by the display layer); similarly \(\mu_P\) requires \(|\mathcal P| > 0\), which is always satisfied in production. Then for any \(p \in \mathcal P\),
\[ \delta_R(p) \;:=\; \frac{R(p)}{\mu_R}, \qquad \delta_P(p) \;:=\; \frac{R(p)}{\mu_P}. \tag{9.1} \]The two ratios answer different questions:
- \(\delta_R(p)\) — vs ranked peers. "Among actors with \(R(p) > 0\), how does \(p\) compare?" The denominator excludes both null actors (\(n(p) = 0\)) and documented-but-unscored actors (\(n(p) > 0,\;R(p) = 0\)).
- \(\delta_P(p)\) — vs population. "Among all in-scope actors \(p \in \mathcal P\), how does \(p\) compare?" Both null and documented-but-unscored actors appear in the denominator and pull \(\mu_P\) toward zero.
10. Sentence-equivalent metric \(Y(p)\)
\(R(p)\) is the canonical mathematical ranking score, but its units (a dimensionless severity-weighted sum) are not graspable by non-technical readers. To translate documented misconduct into a unit the general public actually feels — years of imprisonment — we define a parallel sentence-equivalent metric \(Y(p)\), with units of years.
For each scoreable incident \(i \in I_{\mathrm{sc}}(p)\), let \(T_{\mathrm{mid}}(S(i))\), \(T_{\mathrm{low}}(S(i))\), \(T_{\mathrm{up}}(S(i))\) denote the middle, low, and upper sentence anchors (in years) from Table 3.1. The per-incident sentence contribution is:
\[ y_{\mathrm{mid}}(i,p) \;:=\; T_{\mathrm{mid}}(S(i)) \cdot C_{\mathrm{action}}(i,p), \tag{10.1} \]with \(y_{\mathrm{low}}\) and \(y_{\mathrm{up}}\) defined analogously. The actor's total sentence exposure is the consecutive sum (PC §669, "no plea, no concurrent discount" framing):
\[ \boxed{\;\;Y_{\mathrm{mid}}(p) \;:=\; \sum_{i \,\in\, I_{\mathrm{sc}}(p)}\, T_{\mathrm{mid}}(S(i)) \cdot C_{\mathrm{action}}(i,p)\;\;} \tag{10.2} \]\(Y_{\mathrm{low}}(p)\) and \(Y_{\mathrm{up}}(p)\) are computed identically using the low and upper triad columns. The published value is the canonical \(Y_{\mathrm{mid}}(p)\); the band \([Y_{\mathrm{low}}(p),\,Y_{\mathrm{up}}(p)]\) is displayed alongside to make the sentencing range transparent rather than hidden.
Prison vs jail decomposition
Because California's two custodial systems are not morally equivalent — state prison (CDCR) is materially harder time than county jail — \(Y(p)\) is decomposed by custody type per Table 3.1's Custody column:
\[ Y(p) \;=\; Y_{\mathrm{prison}}(p) \;+\; Y_{\mathrm{jail}}(p), \]where \(Y_{\mathrm{prison}}\) sums per-incident \(y(i,p)\) over scoreable incidents with \(S(i) \in \{5, 6, 7, 8, 9, 10\}\) (state-prison tiers) and \(Y_{\mathrm{jail}}\) sums over \(S(i) \in \{2, 3, 4\}\) (county-jail tiers including PC §1170(h) realignment felonies). Tier-1 incidents contribute zero years to either component. The display layer reports the prison and jail figures separately as well as the total.
Stacking convention
The summation in (10.2) implicitly stacks all per-incident sentences consecutively, not concurrently. This is a deliberate cumulative-harm convention chosen for the accountability framing of \(Y(p)\), not a prediction of actual California sentencing outcomes. Real-world California sentencing applies a battery of subordinate-term rules, PC §654 multiple-punishment stays, and judicial discretion under PC §669 that frequently produce concurrent or partially-concurrent terms. Y(p) does not model any of these — it asks only "what would each documented misconduct incident produce under PC §1170(b), summed independently, with the standard C_action confidence discount applied."
Worked example for a tier-9 sustained incident
Consider an incident with \(S(i) = 9\), \(\alpha = 1\) (sustained), \(\mathrm{evidence} = \mathrm{integrity} = \rho = 1\), no boost. Then \(\mathrm{base}_{\mathrm C}(i,p) = 1\), \(C_{\mathrm{action}}(i,p) = \min(\kappa(9),\,1) = 0.9\). Per Table 3.1, tier-9 triad is (15, 25, 30) years. The sentence contribution:
\[ y_{\mathrm{low}} = 15 \cdot 0.9 = 13.5,\quad y_{\mathrm{mid}} = 25 \cdot 0.9 = 22.5,\quad y_{\mathrm{up}} = 30 \cdot 0.9 = 27\ \text{years.} \]An actor with three such incidents would have \(Y_{\mathrm{mid}}(p) = 67.5\) years (band: 40.5 to 81), summed consecutively.
11. Three-state status classification
Each actor is assigned a categorical ranking_status based on whether incidents and scores exist:
The documented_unscored state captures actors with documented incidents that have not yet been tier-scored. The formula's policy is conservative: no tier classification means no score, never an estimated score.
12. Bounds, properties, and proofs
- Case 1: \(U \ge \kappa(s')\). Then both \(\min\) calls reduce to the cap, giving \(C_{\mathrm{action}}|_{S=s} = \kappa(s) < \kappa(s') = C_{\mathrm{action}}|_{S=s'}\) (Prop. 11.2). Since \(2^{s} < 2^{s'}\) and both \(C_{\mathrm{action}}\) values are strictly positive, the product \(s(i,p)\) is strictly larger at \(s'\).
- Case 2: \(U \le \kappa(s)\). Then both calls reduce to \(U > 0\) (since \(\mathrm{base}_{\mathrm C}>0\) and \(1+\mathcal B \ge 1\)). \(C_{\mathrm{action}}\) is the same constant \(U\) at both tiers, but \(2^{s} < 2^{s'}\) so the product is strictly larger at \(s'\).
- Case 3 (crossing): \(\kappa(s) < U < \kappa(s')\). Then \(C_{\mathrm{action}}|_{S=s} = \kappa(s)\) and \(C_{\mathrm{action}}|_{S=s'} = U > \kappa(s)\). So both factors strictly increase, hence so does the product.
Note that the \(\mathrm{base}_{\mathrm C}(i,p) > 0\) condition is necessary: if any of \(\mathrm{evidence}, \mathrm{integrity}, \rho, \alpha\) is zero, the product (5.1) collapses to zero and \(s(i,p) = 0\) regardless of tier.
Step 1: existing per-incident scores are non-decreasing. Fix \(j \in I_{\mathrm{sc}}(p)\). The only quantity in \(s(j,p) = 2^{S(j)} \cdot C_{\mathrm{action}}(j,p)\) that can change is \(\mathcal B(j,p)\) (since \(S(j)\) and \(\mathrm{base}_{\mathrm C}(j,p)\) depend only on \(j\) and \(p\), not on the rest of the dossier). For each axiom \(a \in \mathcal A(j)\), the set \(\mathcal I_a(j,p)\) grows from \(\mathcal I_a^{\,\text{old}}(j,p)\) to \(\mathcal I_a^{\,\text{old}}(j,p) \cup \{i^*\}\) when \(a \in \mathcal A(i^*)\), and is unchanged otherwise. In the latter case \(\mathcal B_a(j,p)\) is unchanged. In the former case, the per-axiom weighted top-\(4\) sum (6.1) accepts an additional non-negative candidate \(\mathrm{base}_{\mathrm C}(i^*,p) \ge 0\); since all weights \(w_k\) are non-negative and the ranking is "descending \(\mathrm{base}_{\mathrm C}\)" with empty-slot weight \(0\), inserting a new non-negative candidate cannot strictly decrease the sum of the top four (it replaces only a smaller or equal candidate). Hence \(\mathcal B_a(j,p)\) is non-decreasing for every axiom \(a\); the \(\max\) in (6.2) of a vector of non-decreasing entries is non-decreasing; therefore \(\mathcal B(j,p)\) is non-decreasing; therefore \(\mathrm{base}_{\mathrm C}(j,p)(1+\mathcal B(j,p))\) is non-decreasing; therefore \(C_{\mathrm{action}}(j,p) = \min(\kappa(S(j)),\,\cdot)\) is non-decreasing (the cap is fixed and the second argument grows); and finally \(s(j,p) = 2^{S(j)} \cdot C_{\mathrm{action}}(j,p)\) is non-decreasing.
Step 2: strict increase. The new term \(s(i^*,p) > 0\) by hypothesis. Each existing term is non-decreasing by Step 1. Therefore \[ R'(p) = s(i^*,p) + \sum_{j \in I_{\mathrm{sc}}(p)} s'(j,p) \;\ge\; s(i^*,p) + \sum_{j \in I_{\mathrm{sc}}(p)} s(j,p) \;>\; R(p). \qquad\square \]
An upper bound on \(R(p)\) does not exist: human harm is unbounded in principle, and the formula refuses to impose an artificial ceiling. See §17 for the principled reason this is a feature, not a defect.
13. Why exponential severity? (theorem)
The choice of \(S_{\mathrm{weight}}(i) = 2^{S(i)}\) is justified by the following observation, which we state formally:
This is not merely a quantitative preference; it encodes the normative claim that lethal harm is categorically not commensurable with minor misconduct. A linear weighting would imply substitutability that the methodology explicitly rejects.
Other weighting choices (e.g., \(S^k\) for polynomial \(k\), or arbitrary monotone tables) were considered. Exponential weighting with base 2 was selected for three reasons: (a) for the tier gaps that matter most in practice (e.g., comparing tier-9 conduct against bundles of tier-3 conduct), base 2 produces a dominance bound that comfortably exceeds realistic incident counts per actor — for \(s_{\max}-s_{\min} \ge 6\) (tier-9 vs tier-3 or wider), (12.2) yields \(N\) bounds in the dozens, exceeding typical empirical dossier sizes; (b) base 2 respects the ordinal structure of the underlying severity tier scale, where each step represents a categorical increase in severity (consistent with the PC-anchored Table 3.1 calibration even though \(S(i)\) is now analyst-assigned across charge / outcome / harm dimensions per §3); and (c) it produces interpretable "doublings" between adjacent tiers, a phrasing that translates cleanly to non-technical audiences. We do not claim base 2 dominates arbitrary incident counts — for small tier gaps (e.g., tier 4 vs tier 3) a large \(N\) of lower-tier incidents will outweigh a single higher-tier one, by design.
14. The strict 12-month window
Refresh-sensitive sources (web pages, press archives, oversight databases that delete old records) are evaluated against a rolling 12-month window with strict cutoff dates. Sources outside this window are not eligible to seed new incidents, though previously-ingested incidents are not retroactively removed.
Current window: (loading from the live spreadsheet…)
This is an engineering decision (not a mathematical one) intended to prevent silent corruption when an underlying source rotates or deletes old content. It does not affect equations (3.1)–(9.1). The current cutoff dates are recorded on the live EQUATION sheet and update automatically when the database refreshes.
15. Reproducibility
Every quantity above is intended to be reproducible from public artifacts. The one currently-acknowledged gap is that the explicit numeric mapping of evidence / integrity / ρ bands (the S1–S4 source_class table and the ρ density bands described in §5) is presently codified in reviewer practice rather than as an explicit SCHEMA cell; this is tracked as open item §16(g) and will be folded into the SCHEMA sheet before the next published lock. With that single exception, the equation, constants, and per-actor inputs are all served by public sources:
- Canonical spec:
EQUATIONsheet on the live spreadsheet (also exported asspreadsheet-data/EQUATION.json). - Severity tier scale:
SEVERITYsheet. - Confidence base values:
ALPHAsheet, with civilian-oversight source data. - Pattern boost weights:
BOOSTsheet. - 7-axiom taxonomy:
TAXONOMYsheet. - Action-rule mapping:
ACTION_RULESsheet. - Schema:
SCHEMAsheet. - Implementation:
tools/severity_scoring.pyin the project repository. - Per-actor data:
spreadsheet-data/{DAs,Judges,Officers,Defenders,Departments}_Math.json, also queryable via the public MCP server or REST API. - Master OpenDocument spreadsheet: the canonical
.odsfile used by the export pipeline (linked from the /spreadsheet page).
Any per-actor score \(R(p)\) can be verified by (a) reading the relevant _Math.json row, (b) reading the matching _Incidents.json rows for the inputs, and (c) re-running equations (3.1)–(8.1) by hand or in any computer-algebra system. Display ratios \(\delta_R\) and \(\delta_P\) additionally require evaluating (9.1) against the full population sums.
16. Limitations & open questions for review
This methodology has known limitations that are appropriate subjects for peer scrutiny. We list them rather than concealing them.
(a) Population completeness for \(\delta_P\)
The \(\delta_P\) ratio is only meaningful when the underlying sheet contains both ranked (\(R(p) > 0\)) actors and zero-score rows — that is, either documented_unscored actors (\(n(p)>0, R(p)=0\)) or null actors (\(n(p)=0\)) per the §10 status classification — covering the in-scope population. When a sheet contains only ranked rows, \(\mu_P\) trivially equals \(\mu_R\) and \(\delta_P \equiv \delta_R\) — a degenerate state that the display layer detects and surfaces with a yellow warning banner above the affected table, substituting "—" for \(\delta_P\) values. The mathematical guard is in place; per-sheet coverage status is visible on each worst-* page directly. See EQUATION sheet sections 6–7 and the population-collapse check at tools/check_pop_avg_collapse.py.
(b) Calibration of \(\alpha\)
The non-trivial values \(\alpha = 0.4348\) and \(\alpha = 0.259\) are derived from California civilian-oversight sustained rates. The choice of which agencies and time windows to include in the calibration set is itself a methodological choice and an appropriate subject for peer review. See ALPHA sheet for the underlying source data.
(c) Tier assignment for non-criminal misconduct
The PC-tier scale (Table 3.1) is a natural anchor for misconduct that maps cleanly onto a charge or sentence. Per §3, however, \(S(i)\) is assigned by the analyst as the "worst applicable dimension" — charge severity, adjudicated outcome severity, or harm caused — which means for non-criminal misconduct (e.g., ethics findings, harm-without-charge incidents) the tier assignment rests on the analyst's judgment about the "if charged criminally, what tier would this be" question. This is documented per-incident in the free-text reasoning field but introduces a degree of human judgment that should be acknowledged.
(d) Boost saturation
The 4-slot boost saturates at four matching incidents per axiom. A serial offender with 50 incidents of the same axiom receives no additional boost beyond the fourth match (though \(R(p)\) continues to grow linearly via (8.1)). The saturation choice is intentional but worth justifying formally.
(e) Independence assumption
Equation (5.1) treats the four confidence factors as multiplicatively independent. In practice they may be correlated (e.g., a primary-source document tends to be from a higher-integrity outlet). The product form is conservative — it under-weights when correlation is positive — but a covariance-adjusted form may be preferable for some use cases.
(f) Strict 12-month window edge effects
Sources just outside the window are excluded from new incident creation but contribute to existing incidents already in the dossier. This avoids retroactive deletion but creates a soft edge in coverage that may bias \(R(p)\) toward more-recently-covered actors.
(g) SCHEMA documentation gap for evidence / integrity / ρ bands
The discrete bands used by reviewers for the three confidence factors are listed in §5, but are not yet published as explicit rows on the SCHEMA sheet. This is a documentation cleanup, not a formula change; it does not affect any of the equations in §3–§12.
17. What this formula is NOT
- Not a verdict. A high \(R(p)\) is documentation, not adjudication.
- Not predictive. It scores past documented conduct only.
- Not curved. Identical conduct produces identical scores. There is no normalization to a target distribution.
- Not a totality measure. Only DOCUMENTED misconduct contributes; absence of \(R(p)\) is not evidence of clean record, only of absence of documentation.
- Not capped per person. Repeat conduct aggregates without ceiling — the methodology's position is that human harm is unbounded.
- Not anonymous. All inputs are public-record and citable; the formula carries no private or undisclosed signals.
18. References & related work
The formula's structure draws on standard techniques from the following literatures. This list is not exhaustive; reviewers are invited to suggest additions.
Composite indicators. OECD & JRC, Handbook on Constructing Composite Indicators: Methodology and User Guide, 2008. Foundational treatment of the design choices involved in summarizing multidimensional quality data into a single ranking.
Severity-weighted aggregation. Hanley & Lippman-Hand, "If nothing goes wrong, is everything all right?" JAMA 249(13), 1983 — on conservative aggregation under uncertainty.
Confidence as a product of independent factors. Pearl, Probabilistic Reasoning in Intelligent Systems, Morgan Kaufmann, 1988. Justifies the \(\mathrm{evidence}\cdot\mathrm{integrity}\cdot\rho\cdot\alpha\) factorization as a degenerate case of a noisy-AND model.
Exponential weighting for rare-but-severe events. Taleb, Statistical Consequences of Fat Tails, 2020. Argues that linear or polynomial weighting systematically under-weights tail events; exponential weighting partially corrects this.
Pattern-saturation boost. The 4-slot product-with-max-combine rule is structurally analogous to the diversification-weighted IR scoring schemes in Carbonell & Goldstein, "The use of MMR, diversity-based reranking for reordering documents and producing summaries," SIGIR 1998.
Civilian-oversight calibration. ALPHA values are calibrated from the published sustained/unfounded rates of multiple California civilian-oversight bodies (San Francisco Department of Police Accountability, San Diego Citizens' Law Enforcement Review Board, Los Angeles Civilian Oversight Commission). Source data: ALPHA sheet.
Citation suggestion: California Justice Watch (2026). Ranking Methodology — Mathematical Formulation. https://cajusticewatch.com/methodology. Accessed [date].
Licence. This document is released under CC BY 4.0. The underlying spreadsheet, data, and source code are similarly open.
Errata + corrections welcomed. File issues at github.com/cajusticewatch/mcp-server or write to cajusticewatch.com/contact.