Data Analysis Methodology - Geographic/CMS Data

Hospital readmission rates and CMS penalty analysis across US states

Data Source: CMS Hospital Readmissions Reduction Program (HRRP) public datasets, available at data.cms.gov and Hospital Compare portal. Data represents hospital-level quality metrics and Medicare penalty assessments for 30-day unplanned readmissions.

4,692 acute care hospitals
51 jurisdictions (50 states + DC)
6 condition measures

1. Overview

The geographic analysis component of ReadmitRisk uses publicly available CMS data to provide state-level and hospital-level readmission benchmarking. Unlike the MIMIC-IV and UCI datasets, which enable patient-level risk prediction, the geographic data enables:

2. Data Sources

2.1 CMS Hospital Readmissions Reduction Program (HRRP)

Source: Centers for Medicare & Medicaid Services (CMS)
Portal: data.cms.gov and Hospital Compare
Hospitals included: 4,692 acute care hospitals
Geographic coverage: 50 US states + District of Columbia
Update frequency: Annual (published each summer)
File format: CSV (downloadable)

2.2 Condition-Specific Measures

HRRP tracks 30-day readmission rates for six conditions:

  1. Acute Myocardial Infarction (AMI): Heart attack
  2. Heart Failure (HF): Congestive heart failure
  3. Pneumonia (PN): Community-acquired pneumonia
  4. COPD: Chronic obstructive pulmonary disease
  5. Hip/Knee Replacement (THA/TKA): Elective joint replacement
  6. Coronary Artery Bypass Graft (CABG): Open-heart surgery

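Risk Adjustment: CMS uses hierarchical logistic regression models that adjust readmission rates for patient age, sex, and comorbidities, with a hospital-specific random effect capturing each hospital's own contribution. The models deliberately do not adjust for hospital characteristics, since doing so could mask real differences in quality. This enables fairer comparison across hospitals with different patient populations.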
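The arithmetic that turns the model output into a reported rate can be sketched as follows. In CMS's readmission measures, a hospital's predicted readmission rate (using its own hospital effect) is divided by its expected rate (using an average hospital effect) to give the excess readmission ratio (ERR); multiplying the ERR by the national observed rate gives a risk-standardized readmission rate. This sketch covers only that final step with made-up inputs; fitting the hierarchical model itself is not reproduced, and the function names are illustrative.

```python
def excess_readmission_ratio(predicted_rate: float, expected_rate: float) -> float:
    """Predicted / expected readmission rate; values above 1.0 mean worse than expected."""
    if expected_rate <= 0:
        raise ValueError("expected_rate must be positive")
    return predicted_rate / expected_rate


def risk_standardized_rate(predicted_rate: float, expected_rate: float,
                           national_observed_rate: float) -> float:
    """Scale the ERR by the national observed rate to express it as a percentage."""
    return excess_readmission_ratio(predicted_rate, expected_rate) * national_observed_rate


# Hypothetical hospital: predicted 17.6%, expected 16.0%, national observed rate 15.5%
err = excess_readmission_ratio(17.6, 16.0)
rsrr = risk_standardized_rate(17.6, 16.0, 15.5)
print(f"ERR = {err:.3f}, RSRR = {rsrr:.2f}%")
```

An ERR above 1.0 indicates more readmissions than expected for that hospital's case mix, which is the quantity HRRP penalties are based on.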

3. Data Extraction Process

3.1 Download and Parse CMS Files

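import pandas as pd

# CMS Hospital Readmissions Reduction Program dataset (Provider Data Catalog).
# This URL is the dataset's landing page, not a CSV file: export the CSV from
# that page and save it locally as hospital_readmissions.csv before running this.
cms_url = "https://data.cms.gov/provider-data/dataset/9n3s-kdb3"

# Parse the downloaded CSV
df_hospitals = pd.read_csv("hospital_readmissions.csv")

# Normalize headers (e.g., "Facility ID" -> "facility_id") so they match the
# snake_case names used below; exact CMS headers vary by dataset and release
df_hospitals.columns = (
    df_hospitals.columns.str.strip().str.lower().str.replace(r"[^a-z0-9]+", "_", regex=True)
)

# Inspect columns
print(df_hospitals.columns.tolist())
# Key fields used downstream:
# - facility_id (CMS Certification Number)
# - facility_name
# - state
# - measure_name (AMI, HF, PN, COPD, THA/TKA, CABG)
# - score (readmission rate %)
# - compared_to_national (Better/Same/Worse)
# - penalty_pct (HRRP penalty percentage; not a readmission-rate field, so it is
#   assumed to be merged in from CMS's HRRP payment adjustment data)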

3.2 Data Cleaning and Filtering

# Filter to acute care hospitals only (hospital_type comes from CMS Hospital
# General Information; merge it on facility_id if your file lacks it).
# .copy() avoids pandas SettingWithCopyWarning on the assignments below.
df_hospitals = df_hospitals[df_hospitals['hospital_type'] == 'Acute Care Hospitals'].copy()

# Convert scores to numbers; suppressed (low-volume) values become NaN and are dropped
df_hospitals['score'] = pd.to_numeric(df_hospitals['score'], errors='coerce')
df_hospitals = df_hospitals.dropna(subset=['score'])

# Filter to the hospital-wide (all-cause) readmission measure. This is a separate
# CMS measure, not one of the six HRRP conditions; match the name used in your export.
df_all_cause = df_hospitals[df_hospitals['measure_name'] == 'Hospital-Wide Readmission'].copy()

# Calculate state-level aggregates with named aggregation
state_summary = (
    df_all_cause.groupby('state')
    .agg(
        avg_readmission_rate=('score', 'mean'),     # Average readmission rate
        avg_penalty_pct=('penalty_pct', 'mean'),    # Average penalty %
        hospital_count=('facility_id', 'nunique'),  # Distinct hospitals
    )
    .reset_index()
)

3.3 Penalty Estimation

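HRRP penalty percentages are reductions applied to a hospital's Medicare base operating DRG payments for inpatient stays (capped at 3%), not to its total Medicare reimbursements. To estimate the dollar impact: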

# Estimate total penalty dollars per state
# Assumes average hospital receives $50M in Medicare payments annually
state_summary['total_penalty_estimate'] = (
    state_summary['avg_penalty_pct'] / 100 *  # Convert to decimal
    state_summary['hospital_count'] *          # Number of hospitals
    50_000_000                                 # Avg Medicare payments per hospital
)

# Format for display
state_summary['total_penalty_estimate_formatted'] = (
    state_summary['total_penalty_estimate'].apply(
        lambda x: f"${x/1_000_000:.1f}M"
    )
)

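Estimation Caveat: Total penalty estimates apply a single assumed payment volume ($50M per hospital) to every hospital. Actual penalties vary by hospital size, case mix, and Medicare patient share, so these figures are rough approximations for illustrative purposes only; depending on a state's hospital mix they may overstate or understate actual penalties.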
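A per-hospital variant avoids applying one flat payment figure to every hospital. The sketch below assumes a hypothetical base_drg_payments column (each hospital's annual Medicare base operating DRG payments in dollars, which would have to be sourced separately), clips penalties to HRRP's 3% cap, treats missing penalties as zero, and sums to the state level.

```python
import pandas as pd

HRRP_MAX_PENALTY_PCT = 3.0  # statutory cap on the HRRP payment reduction


def estimate_state_penalties(hospitals: pd.DataFrame) -> pd.DataFrame:
    """Sum per-hospital penalty dollars by state.

    Expects columns: state, penalty_pct, and base_drg_payments
    (hypothetical: annual Medicare base operating DRG payments in dollars).
    """
    df = hospitals.copy()
    # Clip to the 0-3% range HRRP allows; missing penalties are treated as 0 here
    pct = df['penalty_pct'].fillna(0).clip(lower=0, upper=HRRP_MAX_PENALTY_PCT)
    df['penalty_dollars'] = pct / 100 * df['base_drg_payments']
    return (
        df.groupby('state', as_index=False)['penalty_dollars']
        .sum()
        .rename(columns={'penalty_dollars': 'total_penalty_estimate'})
    )


# Toy example with made-up payment volumes
toy = pd.DataFrame({
    'state': ['CA', 'CA', 'NV'],
    'penalty_pct': [0.5, 1.2, None],
    'base_drg_payments': [80_000_000, 20_000_000, 30_000_000],
})
print(estimate_state_penalties(toy))
```

Summing hospital-level dollars, rather than multiplying a state mean by a count, lets large hospitals contribute proportionally more, which is where most penalty dollars actually accrue.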

4. Geographic Aggregation

4.1 State-Level Metrics

State aggregations are weighted by hospital discharge volumes to avoid skew from small rural facilities:

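import numpy as np

# Weighted state averages: discharge-weighted mean of hospital scores per state.
# num_discharges must be numeric; rows without a volume are excluded from the weighting.
df_weighted = df_all_cause.assign(
    num_discharges=pd.to_numeric(df_all_cause['num_discharges'], errors='coerce')
).dropna(subset=['num_discharges'])

weighted_state_avg = df_weighted.groupby('state')[['score', 'num_discharges']].apply(
    lambda x: np.average(
        x['score'],
        weights=x['num_discharges']  # Weight by hospital volume
    )
).reset_index(name='weighted_avg_readmission_rate')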
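To see why the weighting matters, consider a toy state (hypothetical code "XX") with one large hospital and two small ones. The simple mean treats each hospital equally, while the discharge-weighted mean reflects where most patients were actually treated. All numbers are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical state: one large hospital and two small rural hospitals
toy = pd.DataFrame({
    'state': ['XX', 'XX', 'XX'],
    'score': [14.0, 19.0, 21.0],          # readmission rate, %
    'num_discharges': [9000, 300, 200],   # hospital volume
})

# Unweighted: each hospital counts once
unweighted = toy.groupby('state')['score'].mean()

# Weighted: each hospital counts in proportion to its discharges
weighted = toy.groupby('state')[['score', 'num_discharges']].apply(
    lambda g: np.average(g['score'], weights=g['num_discharges'])
)

print(f"unweighted = {unweighted['XX']:.2f}%, weighted = {weighted['XX']:.2f}%")
```

Here the two small hospitals pull the unweighted mean up by several points even though they treat about 5% of the state's patients.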

4.2 Regional Classifications

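States are grouped into four regions, adapted from the U.S. Census Bureau regions, for pattern analysis. The groupings differ from the official Census definitions (DE and MD are placed in the Northeast), and TX, OK, and DC are not assigned to any region: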

Region: average readmission rate (states included)
Southeast: 16.2% (AL, AR, FL, GA, KY, LA, MS, NC, SC, TN, VA, WV)
Midwest: 14.8% (IL, IN, IA, KS, MI, MN, MO, NE, ND, OH, SD, WI)
Northeast: 14.3% (CT, DE, ME, MD, MA, NH, NJ, NY, PA, RI, VT)
West: 13.1% (AK, AZ, CA, CO, HI, ID, MT, NV, NM, OR, UT, WA, WY)
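Regional figures like these can be derived from state_summary by mapping each state to its region. The sketch copies the groupings from the region table; how the published averages were weighted is not stated, so this version simply takes an unweighted mean of the state averages and labels states outside the mapping "Unassigned" rather than dropping them.

```python
import pandas as pd

# Region groupings copied from the region table (TX, OK, and DC are not listed there)
REGIONS = {
    'Southeast': ['AL', 'AR', 'FL', 'GA', 'KY', 'LA', 'MS', 'NC', 'SC', 'TN', 'VA', 'WV'],
    'Midwest': ['IL', 'IN', 'IA', 'KS', 'MI', 'MN', 'MO', 'NE', 'ND', 'OH', 'SD', 'WI'],
    'Northeast': ['CT', 'DE', 'ME', 'MD', 'MA', 'NH', 'NJ', 'NY', 'PA', 'RI', 'VT'],
    'West': ['AK', 'AZ', 'CA', 'CO', 'HI', 'ID', 'MT', 'NV', 'NM', 'OR', 'UT', 'WA', 'WY'],
}
# Invert to a state -> region lookup
STATE_TO_REGION = {state: region for region, states in REGIONS.items() for state in states}


def regional_averages(state_summary: pd.DataFrame) -> pd.DataFrame:
    """Unweighted mean of state-level avg_readmission_rate within each region."""
    region = state_summary['state'].map(STATE_TO_REGION).fillna('Unassigned')
    df = state_summary.assign(region=region)
    return (
        df.groupby('region', as_index=False)['avg_readmission_rate']
        .mean()
        .sort_values('avg_readmission_rate', ascending=False)
    )
```

Keeping an explicit "Unassigned" bucket makes it visible when states such as TX fall outside the grouping instead of silently disappearing from the comparison.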

5. Hospital-Level Analysis

5.1 Penalty Tier Classification

Hospitals are classified into penalty tiers based on HRRP assessment:

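import pandas as pd

def classify_penalty_tier(penalty_pct):
    # Missing penalties would otherwise fail every comparison and fall through to 'High'
    if pd.isna(penalty_pct):
        return 'Unknown'
    if penalty_pct == 0:
        return 'No Penalty'
    elif penalty_pct < 1.0:
        return 'Low (0.01-1.0%)'
    elif penalty_pct < 2.0:
        return 'Moderate (1.0-2.0%)'
    else:
        return 'High (2.0-3.0%)'  # HRRP caps the reduction at 3%

df_all_cause['penalty_tier'] = df_all_cause['penalty_pct'].apply(classify_penalty_tier)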

# Count hospitals by tier
tier_distribution = df_all_cause['penalty_tier'].value_counts()
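Because classify_penalty_tier makes one Python call per row, a vectorized version using numpy.select may be preferable on the full dataset. The sketch below applies the same thresholds and labels, and additionally labels missing penalties "Unknown" so they are not counted as "High". The function name penalty_tiers is illustrative.

```python
import numpy as np
import pandas as pd


def penalty_tiers(penalty_pct: pd.Series) -> pd.Series:
    """Vectorized penalty tiering; the first matching condition wins."""
    conditions = [
        penalty_pct.isna(),   # missing data
        penalty_pct == 0,
        penalty_pct < 1.0,
        penalty_pct < 2.0,
    ]
    choices = ['Unknown', 'No Penalty', 'Low (0.01-1.0%)', 'Moderate (1.0-2.0%)']
    # Anything not matched above (2.0% and higher) is 'High'
    return pd.Series(
        np.select(conditions, choices, default='High (2.0-3.0%)'),
        index=penalty_pct.index,
    )


# Usage: df_all_cause['penalty_tier'] = penalty_tiers(df_all_cause['penalty_pct'])
```

numpy.select evaluates the conditions in order, so the ordering of the list reproduces the if/elif chain.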

5.2 Top/Bottom Performers

# 10 worst performers (highest readmission rates)
worst_hospitals = df_all_cause.nlargest(10, 'score')[
    ['facility_name', 'state', 'score', 'penalty_pct']
]

# 10 best performers (lowest readmission rates)
best_hospitals = df_all_cause.nsmallest(10, 'score')[
    ['facility_name', 'state', 'score', 'penalty_pct']
]

6. Geographic Visualization

6.1 State Heatmap Generation

State-level choropleth maps visualize readmission rates:

import plotly.express as px

# locationmode='USA-states' matches two-letter state codes (e.g., "CA") to Plotly's
# built-in US state shapes, so no external GeoJSON is needed. A custom file such as
# https://raw.githubusercontent.com/PublicaMundi/MappingAPI/master/data/geojson/us-states.json
# would instead require geojson=... and featureidkey=... matching its feature properties.

# Create choropleth
fig = px.choropleth(
    state_summary,
    locations='state',
    locationmode='USA-states',
    color='avg_readmission_rate',
    color_continuous_scale='RdYlGn_r',  # Reversed scale: red for high, green for low
    scope='usa',
    hover_data=['hospital_count', 'avg_penalty_pct']
)

fig.write_html('state_readmission_heatmap.html')

7. Data Integration with Dashboard

7.1 Export to JSON

import os

# Create the output directory if it does not exist yet
os.makedirs('dashboard/lib', exist_ok=True)

# Export state summary. DataFrame.to_json writes missing values as null,
# whereas json.dump would emit bare NaN, which JSON.parse in the dashboard rejects.
state_summary.to_json('dashboard/lib/state_summary.json', orient='records', indent=2)

# Export hospital metrics
hospital_metrics = df_all_cause[[
    'facility_name', 'state', 'city', 'score', 'penalty_pct'
]].rename(columns={
    'facility_name': 'name',
    'score': 'readmission_rate'
})

hospital_metrics.to_json('dashboard/lib/hospital_metrics.json', orient='records', indent=2)

7.2 Dashboard Rendering

The Next.js dashboard consumes these JSON files for interactive visualization:

8. Data Quality Considerations

8.1 Suppression Rules

CMS suppresses hospital readmission rates when:

Impact: ~15-20% of hospital-measure combinations are suppressed. State aggregates are based only on reported values.
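The suppression share can be checked for a given release by measuring, per measure, how many rows fail numeric conversion before they are dropped during cleaning. A minimal sketch, assuming the raw frame still has the measure_name and score columns; the function name is illustrative.

```python
import pandas as pd


def suppression_rates(raw: pd.DataFrame) -> pd.Series:
    """Share of rows per measure whose score is non-numeric (suppressed or unavailable).

    Run this on the raw export, before rows with missing scores are dropped.
    """
    suppressed = pd.to_numeric(raw['score'], errors='coerce').isna()
    # Mean of a boolean Series is the fraction of True values
    return suppressed.groupby(raw['measure_name']).mean().sort_values(ascending=False)
```

Reporting this per measure shows whether suppression is concentrated in low-volume conditions, which affects how representative each state aggregate is.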

8.2 Risk Adjustment Limitations

CMS risk adjustment models account for:

But do NOT account for:

This may disadvantage safety-net hospitals serving vulnerable populations.

8.3 Measurement Period Lag

CMS data reflects 3-year rolling averages published 12-18 months after measurement period ends. Current data typically represents performance from 2-5 years prior.
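The lag arithmetic can be made concrete: at publication, the newest data is as old as the gap since the period ended, and the oldest is that gap plus the window length. The dates below are illustrative only; substitute the start and end dates reported with the release.

```python
from datetime import date


def data_age_years(period_start: date, period_end: date, published: date) -> tuple[float, float]:
    """Return (newest, oldest) data age in years at the publication date."""
    days_per_year = 365.25
    newest = (published - period_end).days / days_per_year
    oldest = (published - period_start).days / days_per_year
    return newest, oldest


# Illustrative: a 3-year window published about 13 months after it closes
newest, oldest = data_age_years(date(2019, 7, 1), date(2022, 6, 30), date(2023, 8, 1))
print(f"newest data: {newest:.1f} years old, oldest: {oldest:.1f} years old")
```

Both ages keep growing by up to a year while a release remains the most recent one available.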

9. Applications

9.1 Health Plan Use Cases

  1. Network Adequacy: Identify service areas with high-penalty hospital concentration
  2. Provider Engagement: Target quality improvement partnerships with hospitals in high-penalty tiers
  3. Value-Based Contracting: Structure incentives based on CMS benchmarks
  4. Member Attribution: Identify members discharged from high-readmission facilities for proactive outreach

9.2 Hospital System Use Cases

  1. Competitive Benchmarking: Compare performance against regional peers
  2. Quality Improvement Targeting: Focus on conditions with worst performance vs. national
  3. Penalty Mitigation: Model intervention impact on CMS penalty reduction
  4. Service Line Planning: Assess readmission risk before launching new programs

10. Limitations and Caveats

11. Future Enhancements

  1. Time Series Analysis: Track state/hospital trends over 5+ years to identify improving/deteriorating markets
  2. SDOH Integration: Overlay social determinants data (AHRQ, Census) to explain geographic variation
  3. Condition-Specific Heatmaps: Separate maps for HF, AMI, COPD to identify specialty gaps
  4. Hospital Clustering: Group hospitals by performance patterns for targeted interventions
  5. Predictive Modeling: Forecast hospital penalty risk based on historical trajectories

12. References