Torch Insight is built on an integrated data warehouse that combines many data sources and links them in a way that supports exploration, analysis, benchmarking, and insight into how different players in healthcare markets are connected. A key part of working with any data platform is understanding where its underlying data comes from. In this post I’ll talk about where the data behind Torch Insight comes from and how it’s integrated into the platform.
There are three major underlying sources for the data that we use: (1) publicly available data, (2) licensed data, and (3) collected data. Additionally, there is a wealth of derived data that we provide.
Publicly Available Data
There are thousands of publicly available datasets that are sponsored and maintained by a variety of entities. The majority are managed by government entities and, because public funding sponsors their creation, they are disseminated publicly. Others are sponsored by not-for-profit organizations, and a handful are released by for-profit entities. For Torch Insight, we work with dozens of publicly funded data sources, including those sponsored by the Centers for Medicare & Medicaid Services (CMS), the United States Census Bureau, the Centers for Disease Control and Prevention (CDC), and others.
Many of these are direct sources: they are the result of government-backed studies or data collection efforts. Census Bureau data is a good example, since collecting demographic information directly was the intent of the programs that created it. Others, such as many of the datasets from CMS, are indirect sources because they are by-products of other programs. Medicare beneficiary demographics, for example, are derived from data that must be captured in order to run the Medicare program.
The second source is proprietary data that we license from other companies. These vendors collect data for different reasons. Some are in the business of collecting and selling data, while others are in a separate business, have accumulated large amounts of data as a result of that business, and are looking for opportunities to share it with business partners. An example is health insurer financial and enrollment data, which is reported quarterly to the National Association of Insurance Commissioners. Data that we license from others includes redistribution permissions, so we can resell that data, and the derivative work we generate from it, to our clients.
A subset of licensed data is data that we are licensed to use but may only resell as derivative work. For example, we do not sell claims data, but we do license claims data from which we extract information, including standardized reports and relationships, and we then share only the extracted or generated data.
The third source is data that we collect manually. For example, our ACO Database was started in 2010, and we have a team focused on maintaining a record of all payment contracts in which healthcare providers accept risk for the patients they care for. Other manually collected data is validated data: a team manually ensures that our other data sources are correct, for example by validating addresses and phone numbers, or by taking known healthcare providers from other sources and adding information such as websites and the health system they are affiliated with. We have manually validated hundreds of thousands of variables and fields and continue to do so on an ongoing basis.
Similar to the standardized reports we generate from claims data, we also create other derivative work from our base datasets. This includes basic calculations, such as standardized financial ratios, as well as more sophisticated models that draw on dozens of variables to make customized estimates.
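To make the idea of a basic derived calculation concrete, here is a minimal sketch in Python. It is an illustration only and not Torch Insight's actual code: the choice of ratio (a medical loss ratio, claims divided by premiums), the function names, and the input figures are all assumptions made for this example.

```python
from typing import Optional


def safe_ratio(numerator: Optional[float], denominator: Optional[float]) -> Optional[float]:
    """Divide two reported values, returning None when the ratio is undefined."""
    if numerator is None or denominator is None or denominator == 0:
        return None
    return numerator / denominator


def medical_loss_ratio(claims_incurred: Optional[float],
                       premiums_earned: Optional[float]) -> Optional[float]:
    """Hypothetical standardized ratio: share of premium revenue spent on claims."""
    return safe_ratio(claims_incurred, premiums_earned)


# Made-up figures (in millions of dollars) for a fictional insurer.
example = medical_loss_ratio(claims_incurred=850.0, premiums_earned=1000.0)
missing = medical_loss_ratio(claims_incurred=850.0, premiums_earned=None)
```

Returning `None` rather than raising an error is one possible design choice: it lets a missing or zero denominator flow through as a missing derived value instead of halting a batch calculation.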
Knowing Where the Data Comes From
Understanding the underlying data sources is essential for analysts and researchers as they try to make sense of what they discover through Torch Insight. Our data is tracked with a custom-built metadata management system that records the underlying data source, the time frame the data covers, when it was released, and any calculations applied to it. This metadata can be accessed from within many of the tools or through a separate, searchable resource.
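As a rough illustration of what such a metadata record might hold, here is a minimal Python sketch. It is not the actual Torch Insight system: the class name `FieldProvenance`, its attribute names, the `search` helper, and every value in the sample records are hypothetical placeholders chosen to mirror the four items listed above (source, time frame, release time, calculations).

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Tuple


@dataclass(frozen=True)
class FieldProvenance:
    """Hypothetical provenance record for a single data field."""
    field_name: str
    source: str                          # underlying data source
    source_type: str                     # "public", "licensed", "collected", or "derived"
    period_start: date                   # start of the time frame the data covers
    period_end: date                     # end of the time frame the data covers
    released: date                       # when the source released the data
    calculations: Tuple[str, ...] = ()   # calculations applied, in order


def search(records: List[FieldProvenance], term: str) -> List[FieldProvenance]:
    """Return records whose field name or source contains the term (case-insensitive)."""
    needle = term.lower()
    return [
        r for r in records
        if needle in r.field_name.lower() or needle in r.source.lower()
    ]


# Placeholder records; names and dates are invented for illustration.
records = [
    FieldProvenance(
        field_name="population_total",
        source="United States Census Bureau",
        source_type="public",
        period_start=date(2020, 1, 1),
        period_end=date(2020, 12, 31),
        released=date(2021, 6, 1),
    ),
    FieldProvenance(
        field_name="medical_loss_ratio",
        source="Licensed insurer financial filings",
        source_type="derived",
        period_start=date(2021, 1, 1),
        period_end=date(2021, 3, 31),
        released=date(2021, 5, 15),
        calculations=("claims_incurred / premiums_earned",),
    ),
]

census_hits = search(records, "census")
```

Keeping the calculations as an ordered tuple of descriptions is one simple way to let an analyst trace a derived value back to its inputs.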