Researchers Pragmatically Tapping Secondary Data Sources
By Deborah Borfitz
June 9, 2021 | The benefits and challenges of leveraging secondary data for pragmatic research were the focus of conversation as well as a keynote address at the recent Colorado Pragmatic Research in Health Conference (COPRH Con), a global meeting focused on studies conducted in a real-world context. Growing interest in pragmatic studies is tied to the reality that it takes 17 years for only 14% of research to translate into practice, according to opening remarks by Allison Kempe, M.D., MPH, founding director of the Adult & Child Consortium for Health Outcomes Research & Delivery Science (ACCORDS) at the University of Colorado.
How these two worlds, secondary data and pragmatic research, come together was discussed by David Vock, Ph.D., a biostatistician at the University of Minnesota (UMN), using a case study that is evaluating the effect of living kidney donation on long-term outcomes. Use of retrospective data sources such as medical records, insurance claims, and registries of completed studies remains a relatively novel concept in pragmatic research, he says.
COPRH Con was co-sponsored by the ACCORDS education program at the University of Colorado School of Medicine and the Colorado Clinical and Translational Sciences Institute. ACCORDS is the recipient of a three-year conference grant from the Agency for Healthcare Research and Quality.
Among the many strengths of such secondary data sources are that they are contemporaneous, capture information from the diversity of people seeking care, are relevant to clinicians and patients, and are inexpensive to access, says Vock. On the other hand, the data are irregularly and inconsistently collected.
The data “can and should” be used at many points in pragmatic research—and not just as a substitute for data collection. “Perceived weaknesses can and should be reframed as strengths for pragmatic research,” Vock says.
Living-donor kidney transplantation is the definitive treatment for end-stage renal disease (ESRD), owing to better immunological matching and organ function. The recipient can also identify a donor sooner, in some cases before the start of dialysis, rather than facing the typically prolonged wait for a deceased donor, he explains.
For living kidney donors, the perioperative mortality rate is 3 per 10,000 and major complications occur in 3% to 6% of cases, Vock says. While donors lose half of their kidney function, their remaining kidney increases in capacity to compensate, so that their total kidney function returns to 70% of pre-donation levels within one year.
The longer-term health implications of living with that 30% reduction in kidney function were the question addressed by the pragmatic clinical trial, he continues. Prior research found that living donors were themselves at heightened risk of ESRD in the first 15 years after donation (just over 30 versus 5 cases per 10,000), a small risk overall but a dramatic increase in relative terms.
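To make the absolute-versus-relative distinction concrete, a minimal Python sketch follows. It assumes the "just over 30" figure can be rounded to 30 per 10,000 for illustration; the helper name `risk_comparison` is hypothetical and is not part of the study's analysis.

```python
def risk_comparison(exposed_per_10k: float, unexposed_per_10k: float) -> dict:
    """Compare two incidence rates that are both expressed per 10,000 people."""
    relative_risk = exposed_per_10k / unexposed_per_10k
    absolute_increase_per_10k = exposed_per_10k - unexposed_per_10k
    return {
        "relative_risk": relative_risk,
        "absolute_increase_per_10k": absolute_increase_per_10k,
        # 1 case per 10,000 equals 0.01 percentage points, so divide by 100.
        "absolute_increase_pct_points": absolute_increase_per_10k / 100,
    }


# Illustrative inputs only: "just over 30" is rounded to 30 (an assumption).
result = risk_comparison(exposed_per_10k=30, unexposed_per_10k=5)
print(result)
# {'relative_risk': 6.0, 'absolute_increase_per_10k': 25, 'absolute_increase_pct_points': 0.25}
```

A roughly sixfold relative increase works out to about 25 additional cases per 10,000 donors, or a quarter of a percentage point, which is why the risk reads as small overall yet dramatic in relative terms.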
All-cause and cardiovascular mortality were also higher, he adds. Among living donor transplants done at UMN, a dramatic increase in ESRD was seen from 15 years up to 40 years post-donation.
More than 4,500 living donor transplants have been done at UMN over the past 60 years and, since 2000, surveys have gone out to the donors every three years, says Vock. The pragmatic trial used those data, supplemented by data from the Rochester Epidemiology Project (REP), a population-based research project in Minnesota and Wisconsin that has been ongoing since the 1960s. Potential controls were identified in the REP database during the study’s design phase based on demographics (race, gender, age, geography) and on being contemporaneous with the donors.
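The article does not describe the matching algorithm itself. As a hedged illustration of what identifying contemporaneous, demographically similar controls might look like, the sketch below does exact matching on race, gender, and county plus tolerance (caliper) matching on age and calendar year. The record fields, the two-year tolerances, and all data are hypothetical assumptions, not the study's actual criteria.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    # Hypothetical record layout; the study's actual REP fields are not described.
    person_id: str
    race: str
    gender: str
    county: str
    birth_year: int
    index_year: int  # donation year for donors; reference year for candidate controls


def candidate_controls(donor, pool, age_tol=2, year_tol=2):
    """Return IDs of pool members that match the donor exactly on race, gender,
    and county, and within tolerance on age at index and on calendar index year."""
    donor_age = donor.index_year - donor.birth_year
    matches = []
    for person in pool:
        if (person.race, person.gender, person.county) != (donor.race, donor.gender, donor.county):
            continue
        # Contemporaneousness: index years must be close to the donor's.
        if abs(person.index_year - donor.index_year) > year_tol:
            continue
        # Age at index must be close to the donor's age at donation.
        if abs((person.index_year - person.birth_year) - donor_age) > age_tol:
            continue
        matches.append(person.person_id)
    return matches


donor = Person("D1", "White", "F", "Olmsted", 1960, 2000)
pool = [
    Person("C1", "White", "F", "Olmsted", 1961, 2001),  # age 40, one year apart: match
    Person("C2", "White", "M", "Olmsted", 1960, 2000),  # different gender
    Person("C3", "White", "F", "Olmsted", 1950, 2000),  # age 50 vs. 40
    Person("C4", "White", "F", "Olmsted", 1960, 2005),  # index year too far apart
]
print(candidate_controls(donor, pool))  # ['C1']
```

Widening the tolerances trades closer comparability for a larger control pool, a design choice a real study would have to justify.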
Limitations included the possibility of missing data: documentation of care (e.g., for a heart attack) provided outside of Olmsted Medical Center and the Mayo Clinic; lab values (e.g., serum creatinine and urinalysis results) that could influence donor qualification; and patient surveys that are never completed and returned. How these limitations might be ameliorated in the design and analytics phase remains an open question.
The trial was relatively labor-intensive, as it involved interrogating existing data to identify donors and controls and abstracting charts to generate follow-up data from the REP. The United States Renal Data System and the National Death Index had to be queried to ascertain the timing of diagnosis and death. “It is not a cheap option,” says Vock, noting the study was made possible by grant funding.
Transferring and integrating all those data into a central database for analysis will be another challenge, as each source is governed by its own data use agreement, says Vock. Identifying the best statistical methods for combining imperfect data is an “active ongoing goal,” he adds, noting that some of the information coming from self-reports, medical records, and national registries will likely conflict.
While managing data in the trial is challenging, the information is clinically meaningful because it represents issues for which care was sought and care decisions were made, he says. Integrating the multiple sources of data provides a more comprehensive understanding of patients while eliminating study selection bias. “All [living kidney] donors at the UMN [can participate], not just those who agree to be part of an onerous randomized clinical trial.”
The topic of assessing context and fit in usual care settings was highlighted in a session on Pragmatic Explanatory Continuum Indicator Summary-2 Provider Strategies (PRECIS-2-PS) led by Wynne Norton, Ph.D., program director for implementation science in the Division of Cancer Control and Population Sciences at the National Cancer Institute. PRECIS-2-PS is a tool developed to help plan trials along the explanatory-pragmatic continuum, a term coined in 1967.
At one end of the spectrum are trials focused on demonstrating efficacy under controlled conditions and maximizing internal validity; at the other end are those centered on effectiveness in real-world conditions and on guiding decision-making, Norton explains. PRECIS-2, first described in 2015 (DOI: 10.1136/bmj.h2147), is the most-used tool for designing trials that are “fit for purpose.” Each trial domain is scored on a scale of 1 (purely explanatory) to 5 (purely pragmatic), and together the domain scores show where a trial sits overall.
PRECIS-2-PS is an adaptation of that tool for trials in which practitioners are the target of an intervention, Norton says. It retains the same nine domains (eligibility, recruitment, setting, implementation resources, flexibility of provider-focused strategies, flexibility of interventions, data collection, primary outcomes, and primary analysis) as well as the scoring system. As with the original PRECIS-2, trials are deemed more pragmatic the more closely they resemble usual (not standard) care. Examples would be trials in which healthcare professionals are encouraged to employ an intervention flexibly based on patients’ needs, or can opt in or out of using it at all.
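A hedged sketch of recording domain scores for a provider-focused trial follows. PRECIS-2 itself presents domain scores visually on a wheel; summarizing them with an unweighted mean, and flagging domains scored 1 or 2 as leaning explanatory, are assumptions made here for illustration only. The domain names come from the list above, and the example scores are hypothetical.

```python
from statistics import mean

# The nine PRECIS-2-PS domains as listed in the session summary.
DOMAINS = (
    "eligibility",
    "recruitment",
    "setting",
    "implementation resources",
    "flexibility of provider-focused strategies",
    "flexibility of interventions",
    "data collection",
    "primary outcomes",
    "primary analysis",
)


def summarize_precis(scores: dict) -> dict:
    """Validate 1-5 domain scores and return a simple, illustrative summary."""
    if set(scores) != set(DOMAINS):
        raise ValueError("scores must cover exactly the nine domains")
    for domain, score in scores.items():
        if not (isinstance(score, int) and 1 <= score <= 5):
            raise ValueError(f"{domain}: score must be an integer from 1 to 5")
    return {
        # Assumption: an unweighted mean as a rough overall position on the continuum.
        "mean_score": mean(scores[d] for d in DOMAINS),
        # Assumption: scores of 1 or 2 are treated as leaning explanatory.
        "explanatory_domains": [d for d in DOMAINS if scores[d] <= 2],
    }


# Hypothetical scores: most domains pragmatic, data collection more explanatory.
example = dict.fromkeys(DOMAINS, 4)
example.update({"setting": 5, "implementation resources": 3,
                "data collection": 2, "primary analysis": 5})
summary = summarize_precis(example)
print(round(summary["mean_score"], 2), summary["explanatory_domains"])
# 3.89 ['data collection']
```

Listing the explanatory-leaning domains alongside any overall figure keeps the domain-level detail visible, which is the point of the wheel in the first place.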
The major difference in the PRECIS-2-PS version is that the implementation resources domain gauges how much the resources the trial needs to deliver the provider-focused strategy differ from those readily available in usual care, says Norton. Stakeholder groups are the emphasis, including healthcare professionals, health system leaders, IT experts, and anyone else to whom study results would apply. These individuals help select outcomes and assess whether an intervention is feasible outside of a trial, including its cost to implement.
Moving forward, the reliability and validity of PRECIS-2-PS will be tested by applying it to both prospective and retrospective data, she continues. The tool will also address changes to trial domains over time, and a repository of case studies will be built for training purposes.
PRECIS-2 has been used for over 5,000 registered clinical trials and not only in primary care, Norton shares. There is also now some discussion about developing a PRECIS-3 tool that might address interventions more broadly at the community or policy level.
Clinical workflows are not set up for the collection of patient-reported outcomes (PROs) in pragmatic research, according to conference speaker Rodger Kessler, Ph.D., associate clinical professor at the University of Colorado as well as senior scientist for dissemination and implementation for the AAFP National Research Network. PRO surveys need to be acceptable to patients and care settings and to support research that is meaningful to organizations, he says.
In a COVID-19-related project currently underway with three research teams at Arizona State University and a family medicine practice in Colorado, the electronic health record (EHR) and a short-form survey measuring quality of life are being used for risk stratification and triage to a clinical pathway, Kessler says.
The project aims both to improve quality and to assess the state of EHR sharing capacity.
The literature suggests that the use of PRO measurement tools is acceptable to patients and providers, but relatively few studies evaluate implementation and effectiveness issues, he says. “A lot of information is collected but a much smaller percentage is useful in care delivery” to support decision-making, goal setting, risk stratification, communication between providers and patients, and interaction among care team members as it relates to clinical actions taken.
Quality-of-life surveys are the best predictor of the clinical, economic, and social consequences of health and disease, says Kessler. A generic, short-form questionnaire takes about one minute to complete, while the less common disease-specific ones take a median of two to three minutes.
But response rates from patients have been mixed, and clinicians struggle with the policies and procedures needed to collect and use the data. Sustaining this kind of activity may require giving practices $2,000 annual stipends, he says.
According to various published accounts, quality-of-life measures can predict treatment response, future health, cost of healthcare, work productivity, return to work, and mortality. Over time, he has come to think less about the survey type than about the stakeholders involved in care planning and delivery, administration, and policymaking, Kessler shares.
The 10-item QGEN, designed as a one-minute alternative to the common SF-36 survey, has been validated in multiple settings and subpopulations. Together with the seven-item QDIS-7, which measures how much disease limits function, it provides a generic physical and mental health score, he says.
When it comes to PRO survey choice, “there is no magic bullet,” continues Kessler. In his work with internationally recognized PRO expert John Ware, Ph.D., he has used a 15-item list within the Annual Wellness Visit of the Centers for Medicare & Medicaid Services that asks patients about their ability to function.
In one clinic, Kessler and his colleagues demonstrated that the 15-question list could be surfaced in the EHR as a patient-specific assessment tool. The information has utility in improving patient functioning and clinical outcomes as well as in enabling earlier, pre-crisis access to care, he notes. It could also serve as a means of rapid risk stratification at the population level to improve efficiency and coordination and to reduce hospitalization and emergency room use.
For the current COVID-19 project, enrolled participants are “challenging populations” with social determinant risk factors and multiple comorbidities, Kessler says. The protocol calls for quality-of-life data to be collected on 250 Medicaid patients at each of the practices and uses a validated risk stratification tool, an index generated from EHR and submitted claims data, to identify the top 20% of patients at highest risk for poorer functioning.
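The vulnerability index itself is not described beyond being built from EHR and claims data. Assuming each patient already has an index score where a higher value means greater risk, selecting the top 20% could be sketched as below. The function name `top_percent`, the patient IDs, and the scores are all hypothetical.

```python
def top_percent(scores: dict, percent: int = 20) -> list:
    """Return IDs of the highest-scoring `percent` of patients (count rounded up).

    Integer arithmetic avoids floating-point surprises when computing the cutoff
    count. Ties at the cutoff are broken by input order, because sorted() is stable.
    """
    if not 0 < percent <= 100:
        raise ValueError("percent must be between 1 and 100")
    n_selected = (len(scores) * percent + 99) // 100  # ceiling division
    ranked = sorted(scores, key=lambda pid: scores[pid], reverse=True)
    return ranked[:n_selected]


# Hypothetical index scores for ten patients (higher = higher risk).
index_scores = {
    "P01": 0.42, "P02": 0.17, "P03": 0.85, "P04": 0.33, "P05": 0.58,
    "P06": 0.12, "P07": 0.91, "P08": 0.47, "P09": 0.26, "P10": 0.64,
}
print(top_percent(index_scores))  # ['P07', 'P03']
```

Applied to the protocol's 250 patients per practice, the same cutoff would flag 50 patients.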
The pandemic has made it difficult for primary care practices to set up systems to collect and use patient data, and the IT infrastructure at one site was too limited to allow its participation, he says. The other two Arizona practices have succeeded in generating the “vulnerability index,” but the response rate to the quality-of-life survey is only about 4%. At the Colorado practice, where there is more focused interest in the project, collection of the quality-of-life measures from the 250 patients is about to begin.
Social Media Data
Mining and analyzing data from social media for pragmatic research was the subject of a talk by Bethany Kwan, Ph.D., associate professor, and Jenna Reno, Ph.D., communication and dissemination scientist, both in the department of family medicine at University of Colorado School of Medicine. The session was designed to help attendees “know what to ask for” when approaching an analyst.
What chiefly distinguishes the many social media platforms is their key audience, and the platforms are no longer used only by younger people, says Reno. Collectively they have 4.2 billion users, with Facebook the most popular worldwide, and they are now a recognized way to reach research participants.
Social media has different research-related uses, including to implement and conduct studies, connect with stakeholders, engage patient communities, disseminate results, and (with some caveats) recruit people into trials, she says. It is also a source of secondary data for communication research, network analysis, ethnographic research, public health surveillance, and patient-generated health outcomes data.
Benefits of using social media include real-time data collection, reaching large numbers of people cost effectively, understanding the information people are seeking, identifying trending topics, engaging with a tool people already use daily, having a built-in network for sharing and resharing, and forgoing the need for transcription, says Reno. Natural language processing can perform content analysis on large datasets.
Challenges include inequitable access, selection bias, data accessibility, non-standard data (quality and formatting concerns), non-traditional sampling, and a long list of ethical considerations—e.g., Are data private or public? Should users be asked to participate or can consent be waived? Does anonymity need to be protected? When might an institutional review board need to be involved?
Researchers need to weigh potential harms against potential public benefits and to consider legal concerns and each site’s terms and conditions, continues Reno. Facebook is more protective of user data and may not make it readily accessible to researchers free of charge.
Kwan discussed a research project that used Twitter in a partnership with the social media groups #BTSM (Brain Tumor Social Media) and #HPM (hospice and palliative medicine) to learn the quality-of-life concerns of patients and assess how well they aligned with palliative care services. Tweet chats were held at a set time once a week and the 20-22 pages of the transcribed conversation were downloaded for content analysis by Symplur.
Several hundred unique tweets were analyzed, and the themes that emerged addressed quality of life in the context of healthcare and identified the need for better support for care partners, says Kwan. Patients and care partners also expressed their preference for quality-of-life discussions to happen early but not at initial diagnosis. All participants agreed up front to potentially being quoted in the intended research paper.
Knowing the audience, including influencers, is important when using social media, Kwan says. As data sources, social media can provide unstructured data (text and images stored in a native format), structured data (e.g., profile data), and metadata (e.g., user-generated hashtags or keywords).
The data can be mined manually, by transcribing posts by hand; through an API (free for 1% of tweets on Twitter); or through a third-party platform such as Symplur (a $5,000 expense for the University of Colorado project), she adds. Data processing options include named-entity recognition and normalization as well as text mining techniques that extract features from free text, among them N-grams, sentence dependency-based word embeddings, dictionary lookup, and an algorithm based on fuzzy adaptive resonance theory.
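Of the feature-extraction techniques listed, N-grams are the simplest to illustrate. The sketch below counts word bigrams across a few posts using only the Python standard library; the tokenization rule and the sample posts are assumptions for illustration, not the project's pipeline or data.

```python
import re
from collections import Counter


def ngrams(tokens: list, n: int) -> list:
    """Return consecutive n-token windows as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_ngrams(posts: list, n: int = 2) -> Counter:
    """Lowercase each post, split it into simple word tokens (keeping hashtags),
    and count n-grams across all posts. N-grams never span two posts."""
    counts = Counter()
    for post in posts:
        tokens = re.findall(r"[#a-z0-9']+", post.lower())
        counts.update(ngrams(tokens, n))
    return counts


# Hypothetical posts, not drawn from the #BTSM chats.
posts = [
    "Quality of life matters for care partners",
    "Care partners need better support",
    "Talk about quality of life early",
]
bigrams = count_ngrams(posts)
print(bigrams[("quality", "of")], bigrams[("care", "partners")])  # 2 2
```

Frequent N-grams like these can then serve as features for content analysis or as candidate terms for a dictionary lookup.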
Social listening tools used by companies to understand their customer base and what people are saying about their brand are also available to deploy for pragmatic research purposes, says Reno. The options here include CrowdTangle, Agorapulse, Hootsuite, Iconosquare, and Sprout Social—each specializing in a different area—but the outlay can get hefty. Facebook also has A/B testing to assess strategies for posts.
Who might be needed on the research team? A qualitative methodologist for content analysis and a biostatistician for data mining and network analysis, says Kwan. Additional team members would include social media engagement experts, and patient and community stakeholder representatives.
When the Symplur tool was used for network analysis of tweets referencing #BTSM over six years, Kwan says, researchers could see who was tweeting, how they were connected, and the major influencers (e.g., MD Anderson, National Brain Tumor Society, and the NCI Neuro-Oncology Branch at the National Institutes of Health). Knowing the influencers is useful when trying to share content and disseminate what has been learned.
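Symplur's own network methods are not described here. As a stand-in, one simple way to surface potential influencers is to count how many distinct accounts mention or retweet each account (its in-degree, ignoring repeat interactions). The sketch below does that with the standard library; the handles and interactions are hypothetical.

```python
from collections import defaultdict


def rank_by_in_degree(edges):
    """Rank accounts by how many distinct other accounts mention or retweet them.

    Each edge is (source, target), meaning `source` mentioned or retweeted `target`.
    Repeat interactions from the same source count once; self-mentions are ignored.
    """
    incoming = defaultdict(set)
    for source, target in edges:
        if source != target:
            incoming[target].add(source)
    # Sort by in-degree (descending), then by handle, for a deterministic order.
    return sorted(((handle, len(sources)) for handle, sources in incoming.items()),
                  key=lambda item: (-item[1], item[0]))


# Hypothetical interactions among hypothetical accounts.
edges = [
    ("@patient_a", "@cancer_center"),
    ("@patient_b", "@cancer_center"),
    ("@carepartner_c", "@cancer_center"),
    ("@patient_a", "@cancer_center"),  # repeat mention, counted once
    ("@patient_a", "@advocacy_org"),
    ("@patient_b", "@advocacy_org"),
    ("@patient_a", "@patient_b"),
]
print(rank_by_in_degree(edges))
# [('@cancer_center', 3), ('@advocacy_org', 2), ('@patient_b', 1)]
```

Accounts near the top of such a ranking are natural partners for disseminating findings, though real network analyses often use richer centrality measures.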
The speakers’ concluding advice about social media data mining and analysis was to consider the audience, platform, research type, needed partners, and potential availability of data types. Social media users can opt to give researchers access to their account to collect trace data on people in their network, Reno notes.