SCC Public Health & AWS: Collaborative public health data platform (PRO208)

Implementing a Collaborative Public Health Data Platform for Santa Clara County

Introduction

  • The presentation was delivered by Rajiv Shupi and TJ, senior consultants at AWS Public Sector ProServe, and covered the implementation of a comprehensive Public Health Data platform for the County of Santa Clara in California.

Business Impact Story

  • The mission of the Santa Clara County Public Health Department is to improve and protect the health and wellbeing of its constituents, serving over 2 million people.
  • The department works with 70 partner agencies, primarily the California Department of Public Health and system integrators like AWS.
  • The implementation centered on a flagship Public Health Data platform with 15 data sources, over 50 million records ingested weekly, 150 ingestion views, 300+ curation and distribution tables, 150 reports, and 60+ dashboards.

Business Challenges

  • The public health department's data was siloed and fragmented, so epidemiologists and data scientists spent more time wrangling data than analyzing it and reporting meaningful insights.

Personas and Their Problems

  • Tom, a new epidemiologist, had to deal with large and messy data sets and wanted automated reports with validated data sets.
  • Dr. Santa Clara, a busy healthcare officer, needed timely data for decision-making and reporting to leadership to better serve the constituents.

Data Sources

  • The key data sources included COVID-19 data from CalREDIE, immunization data, contact tracing data, syndromic surveillance, and vital statistics.

Solution Architecture and Data Flow

  • The solution used a multi-account AWS architecture for higher security and granular control.
  • The data flow has three layers: an ingestion layer that validates and standardizes incoming data, a curation layer that applies business rules and stores the data in an optimized model, and a distribution layer that serves transformed, query-optimized data.
  • Key use cases included identity management using the AWS Glue FindMatches transform and text extraction from handwritten forms using Amazon Textract.
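
The three-layer flow described above can be illustrated with a minimal, pure-Python sketch. It is not the platform's actual implementation, which the talk does not show. The field names (`record_id`, `source`, `event_date`), the "keep the latest row per record" business rule, and the per-source daily count are all hypothetical stand-ins chosen only to make each layer's responsibility concrete.

```python
from collections import Counter
from datetime import date

# Hypothetical required fields; the real platform's schema is not given in the talk.
REQUIRED_FIELDS = ("record_id", "source", "event_date")


def ingest(raw_rows):
    """Ingestion layer: standardize field names and validate each row."""
    valid, rejected = [], []
    for row in raw_rows:
        # Standardize: lower-case keys, trim whitespace from string values.
        cleaned = {
            key.strip().lower(): (value.strip() if isinstance(value, str) else value)
            for key, value in row.items()
        }
        # Validate: every required field must be present and non-empty.
        if any(not cleaned.get(field) for field in REQUIRED_FIELDS):
            rejected.append(row)
            continue
        # Validate: the event date must be an ISO date (YYYY-MM-DD).
        try:
            cleaned["event_date"] = date.fromisoformat(cleaned["event_date"])
        except ValueError:
            rejected.append(row)
            continue
        valid.append(cleaned)
    return valid, rejected


def curate(rows):
    """Curation layer: apply a business rule (assumed here: keep only the
    most recent row for each record_id)."""
    latest = {}
    for row in rows:
        key = row["record_id"]
        if key not in latest or row["event_date"] > latest[key]["event_date"]:
            latest[key] = row
    return list(latest.values())


def distribute(rows):
    """Distribution layer: build a query-friendly aggregate
    (assumed here: record counts per source and date)."""
    return dict(Counter((row["source"], row["event_date"]) for row in rows))


def run_pipeline(raw_rows):
    """Run all three layers and return the aggregate plus rejected rows."""
    valid, rejected = ingest(raw_rows)
    return distribute(curate(valid)), rejected
```

Keeping rejected rows alongside the result, rather than silently dropping them, reflects the validation role the talk assigns to the ingestion layer; how the real platform handles rejects is not stated in the source.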

Business and Technical Outcomes

  • 13 out of 15 data sources were deployed, with 150+ ingestion views, 300+ curation and distribution tables, and 50+ million records ingested weekly in under 50 minutes on average.
  • The identity management use case was implemented for 6 data sources, and PDF form ingestion using Amazon Textract was also successful.

Lessons Learned

  • Start small, think big, and avoid over-complicating the solution.
  • Include data modeling efforts as part of the implementation to provide production-grade data in lower environments.
  • Focus on finishing what you start, working backwards from the business outcome.
