Skip to main content
Have a personal or library account? Click to login
Bayesian Hierarchical Poisson Models for Multiple Grouped Outcomes and Clustering with Applications to Observational Health Data Cover

Bayesian Hierarchical Poisson Models for Multiple Grouped Outcomes and Clustering with Applications to Observational Health Data

Open Access
|Nov 2025

Full Article

(1) Overview

Introduction

Sources of information beyond clinical trial data and spontaneous adverse event reporting databases are becoming increasingly available to healthcare researchers, with national healthcare providers recording information about patient demographics, prescriptions, treatments, hospitalizations, and other outcomes of different treatment pathways. These records provide a source of information regarding the relative performance of different treatments in the general population beyond that offered by trials, with data continuing to accumulate as new patients interact with health bodies.

The analysis of observational health data provides a number of logistical and statistical challenges. Patient confidentiality demands that the data be stored in a secure location, with limited secure access for relevant personnel, and limited verified software availability (essentially a closed system). Observational data is rarely free from bias [4] and, in particular, it cannot be assumed that treatment allocation is random meaning that care must be taken when drawing inferences from the data [16]. Methods for analysing this type of data typically take a causal approach [4] with clinical outcomes often modelled individually, excluding the possibility of using relationships that may exist between the outcomes in the analysis. One result of this is that there may be no standard way of assessing the impacts of the outcomes together in order to make a decision about which treatments are best for patients.

The approach implemented in the bhpm software extends methods designed for clinical trials to an observational setting [17, 18, 19], with lack of balance in the data accounted for by a clustered or stratified approach. The model is a conditional Poisson model where the treatment effects are defined as increases in risk relative to a baseline treatment. The related outcomes are modelled following a hierarchical structure with each outcome belonging to a particular grouping of related outcomes. The rationale and theoretical background for the software is described in [2], where a study on the use of direct oral anticoagulants (DOACs) among atrial fibrillation patients in Scotland [20] is used to motivate the development of a Bayesian hierarchical model. The model was applied to the DOAC study data, and a simulation study was used to assess its performance in terms of clinical outcome detection and error rates. The approach can be considered complementary to a targeted univariate approach and may be used to determine which clusters of patients are more likely to suffer outcomes, while adjusting for multiple comparisons. Other recent methods addressing these types of issues have been proposed, for example, the Bayesian self-controlled cases series model (BSCCS) [5] which uses a hierarchical regression model for health insurance claims data, and a Bayesian approach of Crooks et al. which uses information sharing rather than groupings [6]. These provide differing approaches to analysing healthcare data than that implemented in bhpm. An advantage of the bhpm approach is that it allows the analysis of large amounts of summary patient data to be scanned for unexpected outcomes. The bhpm software has been deployed to the Scottish National Safe Haven [7] and is being used for an ongoing research study of outcomes for direct oral anticoagulants (Dabigatran©, Rivaroxaban©, and Apixaban©) [2, 8, 13]. This study has found that Myocardial Infarction had an increased rate under Apixaban© compared to Rivaroxaban©, that Other bleed had decreased rates for both Apixaban© and Dabigatran© compared to Rivaroxaban, and that Pulmonary embolism had a decreased rate under Dabigatran© compared to Rivaroxaban© [2, 13]. A more detailed discussion of these results can be found in [2].

Model

Related outcomes are modelled using a hierarchical structure. Each outcome belongs to a grouping of related outcomes. If there are C treatments, H strata, B groupings of outcomes with kb outcomes in group b, then the number of outcomes, Xbj,hc, for the jth outcome in the bth group for treatment c in stratum h is modelled by the following:

Xbj,hc~Poissonλbj,hc,Tbj,hc 
Tbj,h(c)=Σitih
log λbj,hc=γbj,h+xcθbj,h

where h = 1,,H, b = 1,B, j = 1,kb, c = 1,,C.x(c) is an indicator variable for the treatment, Tbj,hc is the total time spent under treatment c for all subjects in stratum h, λbj,hc is the corresponding underlying rate parameter, γbj,h is the log rate of the baseline treatment, and θbj,h is the increase or decrease in risk relative to the baseline treatment.

This is a three-level hierarchical model. The priors for the model are as follows:

γbj,h~Nμγb, σγb2,  θbj,hc~πbcIθbj,hc=0+1πbcNμθbc, σθbc2 
μγb~Nμγ0,τγ02,  μθbc~Nμθ0c,τθ0c2
σγb2~ IGαγ, βγ,  σθbc2~IGαθ, βθ
πbc~Betaαπc, βπc
μγ0~Nμγ00, τγ002,  μθ0c~Nμθ00,τθ002
τγ02~IGαγ00, βγ00,    τθ0c2  ~IGαθ00, βθ00
απc~MλαIαπc>1,  βπc~MλβIβπc>1

where I(.) is the indicator function, IG(.,.) is the inverse gamma distribution, M(.) is the exponential distribution, and Beta(.,.) is the beta distribution.

The hyperparameters for the model are:

μγ00=0, τγ002=10, αγ=3, βγ=1, αγ00=3, βγ001, λα=1
μθ00=0, τθ002=10, αθ=3, βθ=1, αθ00=3, βθ001, λβ=1

The hierarchy reflects the assumed structure in the data, and the choice of priors contains several conjugate pairs which makes the model more computationally tractable. The occurrence of outcomes in healthcare data can vary widely and the hierarchical structure allows the model to capture this variability [2].

In order to assess the association of an outcome with a treatment the posterior distribution of the model parameter theta (θ), the change in log rate of a treatment compared to a chosen baseline, is used. A posterior probability of an increase or decrease in θ exceeding some threshold value may be used to indicate differences in treatment rates.

The directed acyclic graph for the full model is shown in Figure 1. The model structure and parameters are fully described in [2]. The model fitting functions are shown in Table 1 and an example is given below.

Figure 1

The directed acyclic graph for the model.

Table 1

Model Fitting functions.

FUNCTIONDESCRIPTION
bhpm.pmBayesian hierarchical model with point-mass mixture for change in log odds of outcome rate.
bhpm.npmBayesian hierarchical model with for change in log odds of outcome rate (no mixture prior).

Example

This example shows the main steps in running the model and assessing which, if any, outcomes are associated with treatment. The data should be in a file or data frame with columns as indicated in Table 2. In this example there are three treatments (Trt.Grp): 1, 2, 3, and 4; with 1 taken as the baseline treatment for comparison purposes. A greater than 0.8 probability of an increase or decrease in log outcome rate will be used as the threshold to assess differences in treatment rate.

Table 2

Data format.

COLUMNDESCRIPTION
ClusterPatient clusters
Trt.GrpTreatment identifier (integer)
OutcomeThe clinical outcome of interest
Outcome.GrpA group of related outcomes.
ExposureTotal time under treatment of all patients in Cluster
CountsNumber of Outcomes which have occurred under treatment Exposure

The steps to perform the analysis are as follows. Load the software and data to be analysed.

jors-13-356-g3.png

Fit the model:

jors-13-356-g4.png

Once the model has been fitted, a convergence check should be performed. The Gelman-Rubin statistic from the coda package may be used to give an overview of convergence [9]. These statistics as well as the range of acceptance rates for parameters fitted using the Metropolis-Hastings algorithm may be printed or examined. Values close to 1.0 for the Gelman-Rubin statistics are indicative of convergence:

jors-13-356-g5.png

As the sampled values are available from the model fit, alternative or more informative convergence diagnostics may be applied. For example, the rstan package provides more recent convergence and efficiency diagnostics for assessing model convergence [21]. The monitor function in particular reports the effective sample size and the R^ statistic of the samples being monitored [21, 22, 23]. An R^ value of less than 1.05 is recommended when assessing convergence:

jors-13-356-g6.png

Visual inspection of trace and rank plots is recommended, although for models with large numbers of parameters it is generally not feasible to assess all the parameters this way. These plots may be created using the mcmc_trace function from the bayesplot package [24] (Figure 2):

jors-13-356-g7.png

Figure 2

Plots for the mu.theta parameters.

The implementation of the models uses a default set of global simulation parameters for fitting Metropolis-Hastings and Slice sampling steps. Lack of convergence in the model may be addressed by adjusting these simulation parameters, either at a global level, or for individual model parameters. The functions bhpm.global.sim.param.defaults, and bhpm.sim.control.params may be used to generate a data frame containing the default values. These may be updated as required and passed to bhpm.pm or bhpm.npm to adjust the model fit:

jors-13-356-g8.png

The posterior probabilities of increases or decreases in log rate of the outcomes compared to the baseline are assessed by using the θ model parameters. These posterior probabilities are found as follows:

jors-13-356-g9.png

Using a probability threshold of 0.8, outcomes considered possibly associated with treatment difference are:

jors-13-356-g10.png

For Trt.Grp 2, G00-99_Outcome1 has an increase in occurrence rate compared to Trt.Grp 1 for clusters M/0-64, M/65-84, and M/85+. For Trt.Grp 2, G00_00_Outcome2 has an increased rate compared to Trt.Grp 1 for M/0-64 only. No other outcome differences between Trt.Grp 2 and Trt.Grp 3 and Trt.Grp 1 are indicated at the 0.8 level.

Implementation and architecture

The models are fitted using a Gibbs sampling Markov chain Monte-Carlo method [10]. The joint distributions of the models’ parameters are complicated and difficult to sample directly. However, under assumptions of conditional independence for a model, the Gibbs approach generates samples from the complete conditional distributions of the individual parameters, generating a Markov chain whose stationary distribution is the model’s joint distribution. This approach is particularly useful for fitting hierarchical models such as those considered here. While additional approaches, such as Hamiltonian Monte Carlo methods [15], are possible and will be considered for implementation, dependent on future requirements, the Gibbs sampler has proven suitable over a wide range of data simulations. The joint posterior distributions and complete conditionals for the Bayesian models were derived from the model definitions [2]. Standard distributions are sampled from the corresponding R implementations, with non-standard distributions being sampled using either Metropolis-Hastings (MH) or Slice sampling [10]. Default sampling parameters, for example step-sizes and initial values, are supplied or generated by the package, but these may be overridden by the user if required. A number of helper functions exist to generate these values in the correct format for the models. A complete description of the models is given in [2] and the documentation contains a complete description of all the package functions.

The Bayesian methods are implemented in C++ for efficiency, with the model hierarchy (Figure 1) reflected in the C++ class structure. Data (as described in Table 2) is passed to a model fitting function (Table 1), either as a file or as a data frame. The R code routes the data to the appropriate C++ model fitting function, which performs the sampling. R functions for generating summary statistics, Highest Probability Intervals, and convergence statistics are provided using the services of the coda package [9].

The software is self-contained and designed for ease of use in a closed environment. The data to be analysed may be supplied in either a data frame or a file. Examples of the format are included with the package.

Quality control

All packages uploaded to CRAN must pass a set of standardised tests before being released. The package detailed in this paper was initially released to CRAN in 2020 and has been through further release cycles. Additionally the package formed the basis for peer-reviewed results published in [2] and presented in [13]. Before release a full set of unit and functional tests are performed on the package development system, including memory checks with valgrind [11] and Google address sanitizer [12]. The package contains tests, demonstration analyses, and examples.

(2) Availability

Operating system

Any operating system that supports R.

Programming language

R version 3.0 or higher.

Additional system requirements

None.

Dependencies

R, coda.

List of contributors

Raymond Carragher

Software location

Archive (e.g. institutional repository, general repository) (required – please see instructions on journal website for depositing archive copy of software in a suitable repository)

Code repository

Language

English.

(3) Reuse potential

The R package bhpm provides a self-contained set of methods for epidemiological investigators, statisticians and researchers interested in identifying treatments that have different outcome profiles for different clusters of patients. It is designed to be easy to use with a simple data input format. While the primary use case for the software is in this area, the package will be useful to any project where two or more groups of comparative data may be modelled by hierarchical Poisson models. The methods are flexible within regard to choice of clusters and hierarchies. The clusters, interventions (treatments) and relationships between the outcomes are specified by the users in the input data (single cluster, single grouping or single outcome analyses are also possible).

Currently the package is being used to assess outcomes, defined by ICD.10 codes [14], for direct oral anticoagulant treatments [2, 13]. The use of ICD.10 codes for health related outcomes provides an interface to the assessment of potentially thousands of outcomes for multiple treatments. Internally the C++ hierarchy will allow relatively quick implementation of new models through inheritance as required.

Patient cluster membership is fixed but there is the possibility of moving to a more integrated Bayesian approach, with inclusion in the model of uncertainty regarding cluster membership, outcome groupings, and treatment allocation, allowing for a more extended probabilistic framework. The authors are interested in extending the software and welcome collaborations in this area. Any support issues or questions can be addressed directly to the corresponding author: raymond.carragher@uws.ac.uk, through the associated CRAN author email address, or through the GitHub repository.

Competing Interests

The authors have no competing interests to declare.

DOI: https://doi.org/10.5334/jors.356 | Journal eISSN: 2049-9647
Language: English
Submitted on: Oct 28, 2020
Accepted on: Nov 14, 2025
Published on: Nov 24, 2025
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2025 Raymond Carragher, Tanja Mueller, Marion Bennie, Chris Robertson, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.