Bayesian Hierarchical Poisson Models for Multiple Grouped Outcomes and Clustering with Applications to Observational Health Data

Raymond Carragher; Tanja Mueller; Marion Bennie; Chris Robertson

doi:10.5334/jors.356

(1) Overview

Introduction

Sources of information beyond clinical trial data and spontaneous adverse event reporting databases are becoming increasingly available to healthcare researchers, with national healthcare providers recording information about patient demographics, prescriptions, treatments, hospitalizations, and other outcomes of different treatment pathways. These records provide a source of information regarding the relative performance of different treatments in the general population beyond that offered by trials, with data continuing to accumulate as new patients interact with health bodies.

The analysis of observational health data provides a number of logistical and statistical challenges. Patient confidentiality demands that the data be stored in a secure location, with limited secure access for relevant personnel, and limited verified software availability (essentially a closed system). Observational data is rarely free from bias [4] and, in particular, it cannot be assumed that treatment allocation is random meaning that care must be taken when drawing inferences from the data [16]. Methods for analysing this type of data typically take a causal approach [4] with clinical outcomes often modelled individually, excluding the possibility of using relationships that may exist between the outcomes in the analysis. One result of this is that there may be no standard way of assessing the impacts of the outcomes together in order to make a decision about which treatments are best for patients.

The approach implemented in the bhpm software extends methods designed for clinical trials to an observational setting [17, 18, 19], with lack of balance in the data accounted for by a clustered or stratified approach. The model is a conditional Poisson model where the treatment effects are defined as increases in risk relative to a baseline treatment. The related outcomes are modelled following a hierarchical structure with each outcome belonging to a particular grouping of related outcomes. The rationale and theoretical background for the software is described in [2], where a study on the use of direct oral anticoagulants (DOACs) among atrial fibrillation patients in Scotland [20] is used to motivate the development of a Bayesian hierarchical model. The model was applied to the DOAC study data, and a simulation study was used to assess its performance in terms of clinical outcome detection and error rates. The approach can be considered complementary to a targeted univariate approach and may be used to determine which clusters of patients are more likely to suffer outcomes, while adjusting for multiple comparisons. Other recent methods addressing these types of issues have been proposed, for example, the Bayesian self-controlled cases series model (BSCCS) [5] which uses a hierarchical regression model for health insurance claims data, and a Bayesian approach of Crooks et al. which uses information sharing rather than groupings [6]. These provide differing approaches to analysing healthcare data than that implemented in bhpm. An advantage of the bhpm approach is that it allows the analysis of large amounts of summary patient data to be scanned for unexpected outcomes. The bhpm software has been deployed to the Scottish National Safe Haven [7] and is being used for an ongoing research study of outcomes for direct oral anticoagulants (Dabigatran©, Rivaroxaban©, and Apixaban©) [2, 8, 13]. This study has found that Myocardial Infarction had an increased rate under Apixaban© compared to Rivaroxaban©, that Other bleed had decreased rates for both Apixaban© and Dabigatran© compared to Rivaroxaban, and that Pulmonary embolism had a decreased rate under Dabigatran© compared to Rivaroxaban© [2, 13]. A more detailed discussion of these results can be found in [2].

Model

Related outcomes are modelled using a hierarchical structure. Each outcome belongs to a grouping of related outcomes. If there are C treatments, H strata, B groupings of outcomes with k_b outcomes in group b, then the number of outcomes, $X_{b j, h}^{(c)}$ , for the jth outcome in the bth group for treatment c in stratum h is modelled by the following:

X_{b j, h}^{(c)} ~ P o i s s o n (λ_{b j, h}^{(c)}, T_{b j, h}^{(c)})

T_{b j, h}^{(c)} = \underset{i}{Σ} t_{i h}

log (λ_{b j, h}^{(c)}) = γ_{b j, h} + x_{(c)} θ_{b j, h}

where $h = 1, \dots, H, b = 1, \dots B, j = 1, \dots k b, c = 1, \dots, C . x (c)$ is an indicator variable for the treatment, $T_{b j, h}^{(c)}$ is the total time spent under treatment c for all subjects in stratum h, $λ_{b j, h}^{(c)}$ is the corresponding underlying rate parameter, γbj,h is the log rate of the baseline treatment, and θbj,h is the increase or decrease in risk relative to the baseline treatment.

This is a three-level hierarchical model. The priors for the model are as follows:

γ_{b j, h} ~ N (μ_{γ b}, σ_{γ b}^{2}), θ_{b j, h}^{(c)} ~ π_{b}^{(c)} I_{θ_{b j, h}^{(c)} = 0]} + (1 - π_{b}^{(c)}) N (μ_{θ b}^{(c)}, {(σ_{θ b}^{(c)})}^{2})

μ_{γ b} ~ N (μ_{γ 0}, τ_{γ 0}^{2}), μ_{θ b}^{(c)} ~ N (μ_{θ 0}^{(c)}, {(τ_{θ 0}^{(c)})}^{2})

σ_{γ b}^{2} ~ I G (α_{γ}, β_{γ}), {(σ_{θ b}^{(c)})}^{2} ~ I G (α_{θ}, β_{θ})

π_{b}^{(c)} ~ B e t a (α_{π}^{(c)}, β_{π}^{(c)})

μ_{γ 0} ~ N (μ_{γ 00}, τ_{γ 00}^{2}), μ_{θ 0}^{(c)} ~ N (μ_{θ 00}, τ_{θ 00}^{2})

τ_{γ 0}^{2} ~ I G (α_{γ 00}, β_{γ 00}), {(τ_{θ 0}^{(c)})}^{2} ~ I G (α_{θ 00}, β_{θ 00})

α_{π}^{(c)} ~ M (λ_{α}) I (α_{π}^{(c)} > 1), β_{π}^{(c)} ~ M (λ_{β}) I (β_{π}^{(c)} > 1)

where I(.) is the indicator function, IG(.,.) is the inverse gamma distribution, M(.) is the exponential distribution, and Beta(.,.) is the beta distribution.

The hyperparameters for the model are:

μ_{γ 00} = 0, τ_{γ 00}^{2} = 10, α_{γ} = 3, β_{γ} = 1, α_{γ 00} = 3, β_{γ 00} - 1, λ_{α} = 1

μ_{θ 00} = 0, τ_{θ 00}^{2} = 10, α_{θ} = 3, β_{θ} = 1, α_{θ 00} = 3, β_{θ 00} - 1, λ_{β} = 1

The hierarchy reflects the assumed structure in the data, and the choice of priors contains several conjugate pairs which makes the model more computationally tractable. The occurrence of outcomes in healthcare data can vary widely and the hierarchical structure allows the model to capture this variability [2].

In order to assess the association of an outcome with a treatment the posterior distribution of the model parameter theta (θ), the change in log rate of a treatment compared to a chosen baseline, is used. A posterior probability of an increase or decrease in θ exceeding some threshold value may be used to indicate differences in treatment rates.

The directed acyclic graph for the full model is shown in Figure 1. The model structure and parameters are fully described in [2]. The model fitting functions are shown in Table 1 and an example is given below.

The directed acyclic graph for the model.

Table 1

Model Fitting functions.

FUNCTION	DESCRIPTION
bhpm.pm	Bayesian hierarchical model with point-mass mixture for change in log odds of outcome rate.
bhpm.npm	Bayesian hierarchical model with for change in log odds of outcome rate (no mixture prior).

Example

This example shows the main steps in running the model and assessing which, if any, outcomes are associated with treatment. The data should be in a file or data frame with columns as indicated in Table 2. In this example there are three treatments (Trt.Grp): 1, 2, 3, and 4; with 1 taken as the baseline treatment for comparison purposes. A greater than 0.8 probability of an increase or decrease in log outcome rate will be used as the threshold to assess differences in treatment rate.

Table 2

Data format.

COLUMN	DESCRIPTION
Cluster	Patient clusters
Trt.Grp	Treatment identifier (integer)
Outcome	The clinical outcome of interest
Outcome.Grp	A group of related outcomes.
Exposure	Total time under treatment of all patients in Cluster
Counts	Number of Outcomes which have occurred under treatment Exposure

The steps to perform the analysis are as follows. Load the software and data to be analysed.

Fit the model:

Once the model has been fitted, a convergence check should be performed. The Gelman-Rubin statistic from the coda package may be used to give an overview of convergence [9]. These statistics as well as the range of acceptance rates for parameters fitted using the Metropolis-Hastings algorithm may be printed or examined. Values close to 1.0 for the Gelman-Rubin statistics are indicative of convergence:

As the sampled values are available from the model fit, alternative or more informative convergence diagnostics may be applied. For example, the rstan package provides more recent convergence and efficiency diagnostics for assessing model convergence [21]. The monitor function in particular reports the effective sample size and the $\hat{R}$ statistic of the samples being monitored [21, 22, 23]. An $\hat{R}$ value of less than 1.05 is recommended when assessing convergence:

Visual inspection of trace and rank plots is recommended, although for models with large numbers of parameters it is generally not feasible to assess all the parameters this way. These plots may be created using the mcmc_trace function from the bayesplot package [24] (Figure 2):

The implementation of the models uses a default set of global simulation parameters for fitting Metropolis-Hastings and Slice sampling steps. Lack of convergence in the model may be addressed by adjusting these simulation parameters, either at a global level, or for individual model parameters. The functions bhpm.global.sim.param.defaults, and bhpm.sim.control.params may be used to generate a data frame containing the default values. These may be updated as required and passed to bhpm.pm or bhpm.npm to adjust the model fit:

The posterior probabilities of increases or decreases in log rate of the outcomes compared to the baseline are assessed by using the θ model parameters. These posterior probabilities are found as follows:

Using a probability threshold of 0.8, outcomes considered possibly associated with treatment difference are:

For Trt.Grp 2, G00-99_Outcome1 has an increase in occurrence rate compared to Trt.Grp 1 for clusters M/0-64, M/65-84, and M/85+. For Trt.Grp 2, G00_00_Outcome2 has an increased rate compared to Trt.Grp 1 for M/0-64 only. No other outcome differences between Trt.Grp 2 and Trt.Grp 3 and Trt.Grp 1 are indicated at the 0.8 level.

Implementation and architecture

The models are fitted using a Gibbs sampling Markov chain Monte-Carlo method [10]. The joint distributions of the models’ parameters are complicated and difficult to sample directly. However, under assumptions of conditional independence for a model, the Gibbs approach generates samples from the complete conditional distributions of the individual parameters, generating a Markov chain whose stationary distribution is the model’s joint distribution. This approach is particularly useful for fitting hierarchical models such as those considered here. While additional approaches, such as Hamiltonian Monte Carlo methods [15], are possible and will be considered for implementation, dependent on future requirements, the Gibbs sampler has proven suitable over a wide range of data simulations. The joint posterior distributions and complete conditionals for the Bayesian models were derived from the model definitions [2]. Standard distributions are sampled from the corresponding R implementations, with non-standard distributions being sampled using either Metropolis-Hastings (MH) or Slice sampling [10]. Default sampling parameters, for example step-sizes and initial values, are supplied or generated by the package, but these may be overridden by the user if required. A number of helper functions exist to generate these values in the correct format for the models. A complete description of the models is given in [2] and the documentation contains a complete description of all the package functions.

The Bayesian methods are implemented in C++ for efficiency, with the model hierarchy (Figure 1) reflected in the C++ class structure. Data (as described in Table 2) is passed to a model fitting function (Table 1), either as a file or as a data frame. The R code routes the data to the appropriate C++ model fitting function, which performs the sampling. R functions for generating summary statistics, Highest Probability Intervals, and convergence statistics are provided using the services of the coda package [9].

The software is self-contained and designed for ease of use in a closed environment. The data to be analysed may be supplied in either a data frame or a file. Examples of the format are included with the package.

Quality control

All packages uploaded to CRAN must pass a set of standardised tests before being released. The package detailed in this paper was initially released to CRAN in 2020 and has been through further release cycles. Additionally the package formed the basis for peer-reviewed results published in [2] and presented in [13]. Before release a full set of unit and functional tests are performed on the package development system, including memory checks with valgrind [11] and Google address sanitizer [12]. The package contains tests, demonstration analyses, and examples.

(2) Availability

Operating system

Any operating system that supports R.

Programming language

R version 3.0 or higher.

Additional system requirements

None.

Dependencies

R, coda.

List of contributors

Raymond Carragher

Software location

Archive (e.g. institutional repository, general repository) (required – please see instructions on journal website for depositing archive copy of software in a suitable repository)

Name: CRAN
Persistent identifier: https://CRAN.R-project.org/package=bhpm
Licence: GPL-3
Publisher: Raymond Carragher
Version published: 1.8.1
Date published: 29/09/2025

Code repository

Name: GitHub
Identifier: https://github.com/rcarragh/bhpm
Licence: GPL-3
Date published: 29/09/2025

Language

English.

(3) Reuse potential

The R package bhpm provides a self-contained set of methods for epidemiological investigators, statisticians and researchers interested in identifying treatments that have different outcome profiles for different clusters of patients. It is designed to be easy to use with a simple data input format. While the primary use case for the software is in this area, the package will be useful to any project where two or more groups of comparative data may be modelled by hierarchical Poisson models. The methods are flexible within regard to choice of clusters and hierarchies. The clusters, interventions (treatments) and relationships between the outcomes are specified by the users in the input data (single cluster, single grouping or single outcome analyses are also possible).

Currently the package is being used to assess outcomes, defined by ICD.10 codes [14], for direct oral anticoagulant treatments [2, 13]. The use of ICD.10 codes for health related outcomes provides an interface to the assessment of potentially thousands of outcomes for multiple treatments. Internally the C++ hierarchy will allow relatively quick implementation of new models through inheritance as required.

Patient cluster membership is fixed but there is the possibility of moving to a more integrated Bayesian approach, with inclusion in the model of uncertainty regarding cluster membership, outcome groupings, and treatment allocation, allowing for a more extended probabilistic framework. The authors are interested in extending the software and welcome collaborations in this area. Any support issues or questions can be addressed directly to the corresponding author: raymond.carragher@uws.ac.uk, through the associated CRAN author email address, or through the GitHub repository.

Competing Interests

The authors have no competing interests to declare.