A Leading Therapeutics Company from Boston Transforms Its Framework, Processes, and Templates via AntStack for Better Speed, Reliability, and Efficiency





Problem Statement

A pioneering Therapeutics company is focused on delivering life-changing brain health medicines and therapies through drug and compound research and development. It uses translational data to drive efficiency in drug development, explore the impact of its proprietary compounds, and understand their potential in treating disorders of the brain. With AntStack, the company has designed a portal that offers accurate, balanced, and current scientific information to support medical professionals.

About Data Engineering On Databricks

A leading Therapeutics company is committed to developing novel therapies with the potential to transform the lives of people with debilitating disorders of the brain. It is pursuing new pathways to improve brain health and runs depression, neurology, and neuropsychiatry franchise programs that aim to change how brain disorders are perceived and treated. The company aims to transform the practice of neuroscience research and rethink how central nervous system (CNS) disorders are understood and treated. Its mission is to make medicines that matter so people can get better, sooner, and to pioneer solutions that deliver life-changing brain health medicines so every person can thrive.

Goals and Expectations

Their initial expectations involved building pipelines on the Databricks platform to load data from various sources, perform the required transformations, and make the results available to business users and analysts. The data to be loaded ranged from research and development data on tests, drugs, and compounds to commercial and customer data collected both internally and from external vendors, and the sources ranged from SFTP servers and external RDBMS databases to text and CSV files made available via AWS S3. The requirement also included the eventual development of a framework and process that could be adapted to any use case and handle any type and scale of data.

Technology Advancement with New Serverless Platform


The Therapeutics company was facing challenges with its existing system and was delighted with the following outcomes:

Speed and Reliability Goals

The company already used Healthchecks.io (https://healthchecks.io/) for selected use cases and followed a general practice of maintaining checklists for quality and sanity checks. It now wanted to improve speed and reliability, with one primary metric: the ability to apply the framework and process described above to non-generic use cases and ad hoc requirements. AntStack provided resilient solutions to hurdles in data loading and transformation within a relatively short period while maintaining data quality.
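Many items on a quality and sanity checklist can also be automated as a post-load step. The sketch below is a minimal, stdlib-only illustration, assuming rows arrive as dictionaries; the helper name `run_sanity_checks` and the three checks it runs (non-empty load, required columns non-null, unique business key) are hypothetical examples, not the company's actual checklist.

```python
from collections import Counter
from typing import Iterable, Mapping, Sequence


def run_sanity_checks(
    rows: Sequence[Mapping[str, object]],
    key: str,
    required: Iterable[str],
) -> list[str]:
    """Return human-readable failures; an empty list means every check passed.

    Hypothetical helper: the checks below are typical examples only.
    """
    # Check 1: the load produced at least one row.
    if not rows:
        return ["dataset is empty"]

    failures: list[str] = []

    # Check 2: required columns are present and non-null in every row.
    for col in required:
        missing = sum(1 for r in rows if r.get(col) is None)
        if missing:
            failures.append(f"column {col!r} is null or missing in {missing} row(s)")

    # Check 3: the business key is unique across the dataset.
    counts = Counter(r.get(key) for r in rows)
    dupes = sorted(str(k) for k, n in counts.items() if n > 1)
    if dupes:
        failures.append(f"duplicate {key!r} values: {', '.join(dupes)}")

    return failures
```

A pipeline step could raise or send an alert whenever the returned list is non-empty, so a bad load is stopped before it reaches the silver or gold layers.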

Simple and Effective Cron Job Monitoring

They were looking for a notification system for nightly backups, weekly reports, cron jobs, and other scheduled tasks, as most of these jobs were not running on time. AntStack addressed this with a process flow built on Healthchecks.io: the user generates a unique ping URL for each background job and updates the job to send an HTTP request to that URL every time it runs. If the job does not ping Healthchecks.io on time, Healthchecks.io alerts the user. This simple yet effective solution helped them deliver on time.
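The ping-URL flow above can be wrapped around any scheduled job. This is a minimal sketch using only Python's standard library. Healthchecks.io treats a plain GET to a check's ping URL as a success signal and also accepts `/start` and `/fail` suffixes to signal a run starting or failing; the function names here are hypothetical, and the check UUID comes from your own Healthchecks.io project.

```python
import urllib.request
from typing import Callable


def send_ping(url: str, timeout: float = 10.0) -> None:
    """Send an HTTP GET to a Healthchecks.io ping URL."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            pass
    except OSError:
        # Don't let a monitoring outage break the job itself. A success ping
        # that never arrives will still show up as a missed check.
        pass


def run_with_healthcheck(
    job: Callable[[], None],
    ping_url: str,
    send: Callable[[str], None] = send_ping,
) -> None:
    """Run `job` and report its start, success, or failure to Healthchecks.io.

    `send` is injectable so the wrapper can be exercised without network access.
    """
    send(ping_url + "/start")  # lets Healthchecks.io measure the run duration
    try:
        job()
    except Exception:
        send(ping_url + "/fail")  # explicit failure signal: alert immediately
        raise
    send(ping_url)  # plain ping = success; no ping within the grace time = alert
```

For example, `run_with_healthcheck(nightly_backup, "https://hc-ping.com/<your-check-uuid>")` would report on a hypothetical `nightly_backup` function. Re-raising after the `/fail` ping keeps the scheduler's own error handling intact.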

Seamless Integration with External Storage Services

The Therapeutics company chose not to go serverless, but it did not want to manage clusters itself either. Instead, it wanted Databricks to take care of spinning up, managing, and orchestrating the compute clusters used for the ETL process, as well as the SQL endpoints used for querying and analytics. AntStack used Databricks to integrate seamlessly with external storage services and to take advantage of its job orchestration and workflow capabilities, Git integration for source control, and preconfigured Spark environment. The platform's rich notebooks, which support multiple languages including SQL, Python, and shell, made the trade-off of managed clusters over serverless computing worth it.

Data Loading and Transformation

The work involved loading data and applying the required transformations, which ranged from adding new columns and performing aggregations to joining and combining data from multiple bronze tables into one or more target tables in the refined/silver and trusted/gold layers. The implementation used the Databricks platform to load data from the various sources with the Spark methods available through PySpark and Spark SQL. Source data is first loaded into a raw/bronze layer using whichever Spark reader suits each source. The cleaned and transformed data is then made available to business users and analysts via SQL warehouses (endpoints) provided by Databricks, with granular permissions. Databricks notebooks and workflows handle the bulk of the loading and transformation, the AWS Glue Data Catalog serves as the Hive metastore, and other AWS services such as SES are used for reporting.
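The bronze, silver, and gold layering described above can be illustrated without a cluster. The sketch below uses Python's built-in `sqlite3` purely as a stand-in for Spark SQL, to show what each layer does: raw text lands in bronze, silver normalizes keys, casts types, and drops bad rows and duplicates, and gold aggregates for analysts. The table and column names are hypothetical; on Databricks, each layer would typically be a table written by a notebook or workflow task.

```python
import sqlite3


def build_layers(conn: sqlite3.Connection, raw_rows: list[tuple]) -> None:
    """Land raw rows in bronze, then derive silver and gold tables (hypothetical schema)."""
    cur = conn.cursor()

    # Bronze: store the data as received, every column as text, no cleaning.
    cur.execute("CREATE TABLE bronze_assays (compound_id TEXT, assay_date TEXT, score TEXT)")
    cur.executemany("INSERT INTO bronze_assays VALUES (?, ?, ?)", raw_rows)

    # Silver (refined): normalize keys, cast types, drop unusable rows and duplicates.
    cur.execute(
        """
        CREATE TABLE silver_assays AS
        SELECT DISTINCT
            UPPER(TRIM(compound_id)) AS compound_id,
            assay_date,
            CAST(score AS REAL) AS score
        FROM bronze_assays
        WHERE TRIM(COALESCE(compound_id, '')) <> ''
          AND TRIM(COALESCE(score, '')) <> ''
        """
    )

    # Gold (trusted): business-level aggregate for analysts.
    cur.execute(
        """
        CREATE TABLE gold_compound_summary AS
        SELECT compound_id, COUNT(*) AS n_assays, AVG(score) AS avg_score
        FROM silver_assays
        GROUP BY compound_id
        """
    )
    conn.commit()
```

Keeping bronze untouched means any silver or gold table can be rebuilt from the raw data if a transformation rule changes.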

Faster and More Reliable Framework

The Therapeutics company lacked a fast and reliable framework to load and transform large volumes of data spread across multiple sources, systems, and teams. AntStack set up processes and templates for common, generic data loading scenarios, shortening the time between collecting the data and being able to explore it. Because these generic cases could now be handled faster, the approach also helped the team identify, understand, and focus on the remaining pain points.
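One common way to turn generic loading scenarios into reusable templates is to drive them from configuration: each source is described by a small config entry, and a single generic function dispatches on its format. The sketch below is a stdlib-only illustration of that idea, not AntStack's actual template; the `SourceConfig` fields, the supported formats, and the `_source` lineage column are hypothetical, and on Databricks the reader functions would typically call `spark.read` instead of parsing text in Python.

```python
import csv
import io
import json
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SourceConfig:
    """Hypothetical description of one source feeding a bronze table."""
    name: str
    fmt: str  # "csv" or "jsonl" in this sketch
    target_table: str
    options: dict = field(default_factory=dict)


def read_csv(text: str, options: dict) -> list[dict]:
    delimiter = options.get("delimiter", ",")
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


def read_jsonl(text: str, options: dict) -> list[dict]:
    return [json.loads(line) for line in text.splitlines() if line.strip()]


# Registry of readers: supporting a new format means adding one entry here.
READERS: dict[str, Callable[[str, dict], list[dict]]] = {
    "csv": read_csv,
    "jsonl": read_jsonl,
}


def load_source(cfg: SourceConfig, raw_text: str) -> tuple[str, list[dict]]:
    """Parse raw text according to the config and return (target_table, rows)."""
    try:
        reader = READERS[cfg.fmt]
    except KeyError:
        raise ValueError(f"unsupported format {cfg.fmt!r} for source {cfg.name!r}") from None
    rows = reader(raw_text, cfg.options)
    # Tag every row with its source so lineage survives into the bronze layer.
    for row in rows:
        row["_source"] = cfg.name
    return cfg.target_table, rows
```

With this shape, onboarding a new vendor file is usually a new config entry rather than a new pipeline, which is where the time savings on generic cases come from.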

Overall Impact

  • The new ETL framework reduced developers’ manual work, shortening overall development time.
  • New quality and code review checklists helped developers deliver quality work that satisfies the business requirements.
  • Using AWS SES lets them send emails from within notebooks for reporting beyond what Databricks offers.
  • A specific use case involved source data delivered as nested levels of zipped files, which had to be unpacked before loading.
  • Data is loaded from RDBMS databases such as Oracle using Spark JDBC connectors, optimized for the best load time.
  • Unconventional data sources are handled, including ones without APIs to trigger data generation or download.
  • The necessary datasets and data pipelines were moved into production and made available to stakeholders, business users, and analysts in good time.
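Sending report emails from a notebook through AWS SES comes down to a single boto3 `send_email` call. The sketch below separates building the request from sending it, so the message can be checked without AWS access. The sender and recipient addresses, region, and summary format are hypothetical; SES requires the sender identity to be verified, and the cluster needs AWS credentials that allow `ses:SendEmail`.

```python
from typing import Sequence


def format_load_summary(counts: dict[str, int]) -> str:
    """Render per-table row counts as a plain-text report body (hypothetical format)."""
    lines = [f"{table}: {n} rows" for table, n in sorted(counts.items())]
    return "Nightly load summary\n" + "\n".join(lines)


def build_ses_email(sender: str, recipients: Sequence[str], subject: str, body_text: str) -> dict:
    """Build keyword arguments for boto3's SES `send_email` call."""
    return {
        "Source": sender,
        "Destination": {"ToAddresses": list(recipients)},
        "Message": {
            "Subject": {"Data": subject, "Charset": "UTF-8"},
            "Body": {"Text": {"Data": body_text, "Charset": "UTF-8"}},
        },
    }


def send_report(kwargs: dict, region: str = "us-east-1") -> str:
    """Send the email via SES and return its MessageId (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builders above work without boto3 installed

    ses = boto3.client("ses", region_name=region)
    response = ses.send_email(**kwargs)
    return response["MessageId"]
```

A final notebook task could call `send_report(build_ses_email(...))` after a load, giving recipients a summary that does not depend on Databricks' own notification options.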
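Source data delivered as nested levels of zipped files can be handled by unpacking archives recursively in memory. This is a minimal sketch with Python's `zipfile` module, assuming inner archives are recognizable by a `.zip` extension and small enough to hold in memory; the `max_depth` guard is an illustrative safety limit, not part of the described solution. The resulting files could then be written to storage such as S3 for Spark to read into the bronze layer.

```python
import io
import zipfile


def extract_nested_zip(data: bytes, prefix: str = "", max_depth: int = 10) -> dict[str, bytes]:
    """Recursively unpack an in-memory zip archive.

    Returns a mapping of "inner.zip/file.csv"-style paths to file contents.
    Inner archives are detected by their .zip extension (an assumption).
    """
    if max_depth < 0:
        raise ValueError("zip nesting exceeds max_depth")
    files: dict[str, bytes] = {}
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            path = prefix + info.filename
            content = zf.read(info)
            if info.filename.lower().endswith(".zip"):
                # Descend into the inner archive, keeping its name in the path.
                files.update(extract_nested_zip(content, path + "/", max_depth - 1))
            else:
                files[path] = content
    return files
```

Keeping the archive path in each key preserves where every file came from, which helps when tracing a bad record back to a specific delivery.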
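For Oracle loads through Spark JDBC, much of the load-time tuning comes from a few standard reader options: `partitionColumn`, `lowerBound`, `upperBound`, and `numPartitions` split the read into parallel queries, and `fetchsize` controls how many rows come back per round trip. The sketch below only assembles those options; the host, service name, table, and column names are hypothetical, the partition column must be numeric, date, or timestamp, and the Oracle JDBC driver jar is assumed to be installed on the cluster. In practice the password would come from a secret store such as Databricks secrets rather than a literal.

```python
def oracle_jdbc_options(
    host: str,
    service_name: str,
    table: str,
    user: str,
    password: str,
    partition_column: str,
    lower_bound: int,
    upper_bound: int,
    num_partitions: int = 8,
    fetch_size: int = 10_000,
    port: int = 1521,
) -> dict[str, str]:
    """Build Spark JDBC reader options for a parallel read from Oracle."""
    if num_partitions < 1:
        raise ValueError("num_partitions must be >= 1")
    if lower_bound >= upper_bound:
        raise ValueError("lower_bound must be below upper_bound")
    return {
        "url": f"jdbc:oracle:thin:@//{host}:{port}/{service_name}",
        "driver": "oracle.jdbc.OracleDriver",
        "dbtable": table,
        "user": user,
        "password": password,
        # Spark splits the bound range into num_partitions strides on this column
        # and runs one query per partition in parallel. The bounds set the stride
        # only; rows outside them are still read by the first and last partitions.
        "partitionColumn": partition_column,
        "lowerBound": str(lower_bound),
        "upperBound": str(upper_bound),
        "numPartitions": str(num_partitions),
        # Oracle's driver fetches 10 rows per round trip by default; a larger
        # fetch size cuts network round trips on big tables.
        "fetchsize": str(fetch_size),
    }


def read_oracle_table(spark, options: dict[str, str]):
    """Read the table with Spark (requires an active SparkSession and the Oracle JDBC jar)."""
    return spark.read.format("jdbc").options(**options).load()
```

Choosing bounds close to the real minimum and maximum of the partition column keeps partitions evenly sized; `numPartitions` also caps concurrent connections, so it should stay within what the source database can tolerate.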

Way Forward

The Therapeutics company was delighted by AntStack’s efforts, as it now has an effective framework, processes, and templates powerful enough to handle various recurring and even ad hoc use cases.

They plan to continue leveraging AntStack services for further technological transformation as more business analysts, teams, and use cases are brought in with data to be loaded, transformed, and made production-ready. They also aim to reduce, and eventually eliminate, their reliance on specific individuals and older, isolated systems.

For application-related queries and upgrades to your existing system to grow your business, contact AntStack. We are your holistic solutions partner to design, develop and deploy effectively.

