Hospital readmission rates and CMS penalty analysis across US states
Data Source: CMS Hospital Readmissions Reduction Program (HRRP) public datasets, available at data.cms.gov and the Hospital Compare portal. The data cover hospital-level quality metrics and Medicare penalty assessments for 30-day unplanned readmissions.
The geographic analysis component of ReadmitRisk uses publicly available CMS data to provide state-level and hospital-level readmission benchmarking. Unlike the MIMIC-IV and UCI datasets, which enable patient-level risk prediction, the geographic data enables:
| Attribute | Value |
|---|---|
| Source | Centers for Medicare & Medicaid Services (CMS) |
| Portal | data.cms.gov and Hospital Compare |
| Hospitals Included | 4,692 acute care hospitals |
| Geographic Coverage | 50 US states + District of Columbia |
| Update Frequency | Annual (published each summer) |
| File Format | CSV (downloadable) |
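HRRP tracks 30-day readmission rates for six conditions and procedures: acute myocardial infarction (AMI), heart failure (HF), pneumonia (PN), chronic obstructive pulmonary disease (COPD), elective total hip or knee arthroplasty (THA/TKA), and coronary artery bypass graft surgery (CABG).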
Risk Adjustment: CMS applies hierarchical logistic regression models to risk-adjust readmission rates for patient demographics, comorbidities, and hospital characteristics. This enables fair comparison across hospitals with different patient populations.
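As a rough sketch of the kind of model described above, a hierarchical logistic regression with a hospital-specific intercept can be written as follows. The notation is illustrative and not copied from the CMS specification: $Y_{ij}$ indicates a 30-day readmission for patient $j$ at hospital $i$, $Z_{ij}$ is the vector of risk-adjustment covariates, and $\alpha_i$ is the hospital-level intercept.

```latex
\begin{aligned}
\operatorname{logit}\,\Pr(Y_{ij} = 1) &= \alpha_i + \boldsymbol{\beta}^{\top} Z_{ij}, \\
\alpha_i &= \mu + \omega_i, \qquad \omega_i \sim \mathcal{N}(0, \tau^2).
\end{aligned}
```

In this formulation, a hospital's predicted readmissions use its own intercept $\alpha_i$, while its expected readmissions use the average intercept $\mu$. Comparing the two is what allows hospitals with different patient populations to be placed on a common scale.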
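import pandas as pd
import requests
# CMS dataset page; the CSV export is downloaded from here
cms_url = "https://data.cms.gov/provider-data/dataset/9n3s-kdb3"
response = requests.get(cms_url, timeout=30)
response.raise_for_status()  # Fail early if the dataset page is unreachable
# Parse the CSV exported from that page and saved locally
df_hospitals = pd.read_csv("hospital_readmissions.csv")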
# Inspect columns
print(df_hospitals.columns)
# Key fields:
# - facility_id (CMS Certification Number)
# - facility_name
# - state
# - measure_name (AMI, HF, PN, COPD, THA/TKA, CABG)
# - score (readmission rate %)
# - compared_to_national (Better/Same/Worse)
# - penalty_pct (HRRP penalty percentage)
# Filter to acute care hospitals only (copy to avoid chained-assignment warnings)
df_hospitals = df_hospitals[df_hospitals['hospital_type'] == 'Acute Care Hospitals'].copy()
# Convert scores to numeric; suppressed low-volume values become NaN and are dropped
df_hospitals['score'] = pd.to_numeric(df_hospitals['score'], errors='coerce')
df_hospitals = df_hospitals.dropna(subset=['score'])
# Filter to the hospital-wide (all-condition) readmission measure
df_all_cause = df_hospitals[df_hospitals['measure_name'] == 'Hospital-Wide Readmission'].copy()
# Calculate state-level aggregates with named output columns
state_summary = df_all_cause.groupby('state').agg(
    avg_readmission_rate=('score', 'mean'),   # Average readmission rate
    avg_penalty_pct=('penalty_pct', 'mean'),  # Average penalty %
    hospital_count=('facility_id', 'count'),  # Hospital count
).reset_index()
CMS penalty percentages represent the reduction applied to a hospital's Medicare inpatient payments (base operating DRG payments), capped at 3%. To estimate dollar impact:
# Estimate total penalty dollars per state
# Assumes average hospital receives $50M in Medicare payments annually
state_summary['total_penalty_estimate'] = (
    state_summary['avg_penalty_pct'] / 100 *  # Convert to decimal
    state_summary['hospital_count'] *         # Number of hospitals
    50_000_000                                # Avg Medicare payments per hospital
)
# Format for display
state_summary['total_penalty_estimate_formatted'] = (
    state_summary['total_penalty_estimate'].apply(
        lambda x: f"${x/1_000_000:.1f}M"
    )
)
Estimation Caveat: Total penalty estimates assume a flat national-average Medicare payment volume ($50M per hospital). Actual penalties vary by hospital size, case mix, and Medicare patient share, so these figures are rough approximations for illustrative purposes.
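To make the arithmetic concrete, here is a minimal standalone sketch of the same estimate for a single state. The function names `estimate_state_penalty` and `format_millions` are hypothetical helpers introduced for illustration, and the 40-hospital, 0.5% state is made-up input rather than CMS data.

```python
def estimate_state_penalty(avg_penalty_pct, hospital_count, avg_payments=50_000_000):
    """Estimated total penalty in dollars for one state.

    avg_penalty_pct is a percentage (0.5 means 0.5%); avg_payments is the
    assumed average annual Medicare payment per hospital.
    """
    return avg_penalty_pct / 100 * hospital_count * avg_payments


def format_millions(dollars):
    """Format a dollar amount in millions with one decimal place."""
    return f"${dollars / 1_000_000:.1f}M"


# Hypothetical state: 40 hospitals with a 0.5% average penalty
example = estimate_state_penalty(avg_penalty_pct=0.5, hospital_count=40)
print(format_millions(example))
```

Because the estimate is linear in `avg_payments`, changing the $50M assumption scales every state's figure by the same factor and leaves the ranking of states unchanged.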
State aggregations are weighted by hospital discharge volumes to avoid skew from small rural facilities:
import numpy as np
# Weighted state averages
weighted_state_avg = df_all_cause.groupby('state').apply(
    lambda x: np.average(
        x['score'],
        weights=x['num_discharges']  # Weight by hospital volume
    )
).reset_index(name='weighted_avg_readmission_rate')
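The effect of weighting can be seen with a small worked example. The sketch below uses plain Python and invented numbers (one large and one small hospital); `weighted_mean` is a hypothetical helper that performs the same calculation as a weighted `np.average`.

```python
def weighted_mean(values, weights):
    """Weighted arithmetic mean; raises if the weights sum to zero."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Hypothetical state with one large and one small hospital
scores = [14.0, 20.0]        # readmission rates (%)
discharges = [9_000, 1_000]  # discharge volumes used as weights

unweighted = sum(scores) / len(scores)
weighted = weighted_mean(scores, discharges)
print(f"unweighted={unweighted:.1f}%, weighted={weighted:.1f}%")
```

Here the small hospital pulls the unweighted mean up to 17.0%, while the discharge-weighted mean stays at 14.6%, much closer to the rate most patients in the state actually experience.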
States are grouped into four broad regions, adapted from the Census regions, for pattern analysis:
| Region | States | Avg Readmission Rate |
|---|---|---|
| Southeast | AL, AR, FL, GA, KY, LA, MS, NC, SC, TN, VA, WV | 16.2% |
| Midwest | IL, IN, IA, KS, MI, MN, MO, NE, ND, OH, SD, WI | 14.8% |
| Northeast | CT, DE, ME, MD, MA, NH, NJ, NY, PA, RI, VT | 14.3% |
| West | AK, AZ, CA, CO, HI, ID, MT, NV, NM, OR, UT, WA, WY | 13.1% |
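One way to apply this grouping in code is a simple lookup built from the table above. This is a sketch only: `REGIONS`, `STATE_TO_REGION`, and `region_for` are hypothetical names, and any code the table does not list falls back to a default label instead of being guessed.

```python
# Region membership copied from the table above
REGIONS = {
    "Southeast": ["AL", "AR", "FL", "GA", "KY", "LA", "MS", "NC", "SC", "TN", "VA", "WV"],
    "Midwest": ["IL", "IN", "IA", "KS", "MI", "MN", "MO", "NE", "ND", "OH", "SD", "WI"],
    "Northeast": ["CT", "DE", "ME", "MD", "MA", "NH", "NJ", "NY", "PA", "RI", "VT"],
    "West": ["AK", "AZ", "CA", "CO", "HI", "ID", "MT", "NV", "NM", "OR", "UT", "WA", "WY"],
}

# Invert to a state -> region lookup
STATE_TO_REGION = {
    state: region for region, states in REGIONS.items() for state in states
}


def region_for(state, default="Unassigned"):
    """Return the region for a two-letter state code, or `default` if unlisted."""
    return STATE_TO_REGION.get(state.upper(), default)


print(region_for("CA"))
```

With a pandas DataFrame, the same dictionary can be passed to `Series.map`, for example `state_summary['state'].map(STATE_TO_REGION)`, to add a region column before aggregating.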
Hospitals are classified into penalty tiers based on HRRP assessment:
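def classify_penalty_tier(penalty_pct):
    if pd.isna(penalty_pct):
        return 'Unknown'  # Missing penalty data should not fall through to 'High'
    if penalty_pct == 0:
        return 'No Penalty'
    elif penalty_pct < 1.0:
        return 'Low (0.01-1.0%)'
    elif penalty_pct < 2.0:
        return 'Moderate (1.0-2.0%)'
    else:
        return 'High (2.0-3.0%)'
# assign() returns a new frame and avoids chained-assignment warnings
df_all_cause = df_all_cause.assign(
    penalty_tier=df_all_cause['penalty_pct'].apply(classify_penalty_tier)
)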
# Count hospitals by tier
tier_distribution = df_all_cause['penalty_tier'].value_counts()
# Top 10 worst performers (highest readmission rates)
worst_hospitals = df_all_cause.nlargest(10, 'score')[
    ['facility_name', 'state', 'score', 'penalty_pct']
]
# Top 10 best performers (lowest readmission rates)
best_hospitals = df_all_cause.nsmallest(10, 'score')[
    ['facility_name', 'state', 'score', 'penalty_pct']
]
State-level choropleth maps visualize readmission rates:
import plotly.express as px
# Create choropleth; 'USA-states' matches two-letter state codes against
# Plotly's built-in state geometries, so no external GeoJSON is needed
fig = px.choropleth(
    state_summary,
    locations='state',
    locationmode='USA-states',
    color='avg_readmission_rate',
    color_continuous_scale='RdYlGn_r',  # Red for high, green for low
    scope='usa',
    hover_data=['hospital_count', 'avg_penalty_pct']
)
fig.write_html('state_readmission_heatmap.html')
import json
# Export state summary
state_summary_json = state_summary.to_dict(orient='records')
with open('dashboard/lib/state_summary.json', 'w') as f:
    json.dump(state_summary_json, f, indent=2)
# Export hospital metrics
hospital_metrics = df_all_cause[[
    'facility_name', 'state', 'city', 'score', 'penalty_pct'
]].rename(columns={
    'facility_name': 'name',
    'score': 'readmission_rate'
}).to_dict(orient='records')
with open('dashboard/lib/hospital_metrics.json', 'w') as f:
    json.dump(hospital_metrics, f, indent=2)
The Next.js dashboard consumes these JSON files for interactive visualization:
CMS suppresses hospital readmission rates when:
Impact: ~15-20% of hospital-measure combinations are suppressed. State aggregates are based only on reported values.
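Before dropping suppressed rows, it can be useful to measure how many there are. The following is a minimal sketch in plain Python; `is_reported` and `suppression_share` are hypothetical helpers, and the placeholder strings in `sample` (such as "Not Available") are assumed examples of how a suppressed value might appear in the raw file.

```python
import math


def is_reported(raw_score):
    """True if the raw value parses as a finite-or-infinite number, not NaN."""
    try:
        return not math.isnan(float(raw_score))
    except (TypeError, ValueError):
        return False  # Non-numeric placeholder, empty string, or None


def suppression_share(raw_scores):
    """Fraction of raw score values that are suppressed (not reported)."""
    raw_scores = list(raw_scores)
    if not raw_scores:
        return 0.0
    suppressed = sum(1 for s in raw_scores if not is_reported(s))
    return suppressed / len(raw_scores)


# Assumed raw values for five hospital-measure combinations
sample = ["15.2", "Not Available", "16.8", "", "14.1"]
print(f"{suppression_share(sample):.0%} suppressed")
```

Computing this share per state, rather than nationally, would show which state aggregates rest on the fewest reported values.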
CMS risk adjustment models account for:
But do NOT account for:
This may disadvantage safety-net hospitals serving vulnerable populations.
CMS data reflects 3-year rolling averages published 12-18 months after the measurement period ends. Current data therefore typically represents performance from 2-5 years prior.
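The lag can be illustrated with simple date arithmetic. This sketch works at month granularity and uses a made-up publication date; `shift_months` and `measurement_window` are hypothetical helpers, and the 36-month window and 12-18 month lag come from the description above, not from a CMS schedule.

```python
from datetime import date


def shift_months(d, months):
    """First day of the month that is `months` away from `d` (negative = earlier)."""
    index = d.year * 12 + (d.month - 1) + months
    return date(index // 12, index % 12 + 1, 1)


def measurement_window(published, lag_months, window_months=36):
    """Approximate (start, end) of the measurement period behind a publication."""
    end = shift_months(published, -lag_months)
    start = shift_months(end, -window_months)
    return start, end


# Hypothetical publication in August 2024, with the shortest and longest lag
for lag in (12, 18):
    start, end = measurement_window(date(2024, 8, 1), lag_months=lag)
    print(f"lag={lag} months: {start:%Y-%m} to {end:%Y-%m}")
```

Under these assumptions the oldest discharges in a release are already about four to four and a half years old on publication day, and older still by the time the next annual update arrives.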