Statistical Analysis and Application of Quasi Experiments to Antimicrobial Resistance Intervention Studies

Reprints or correspondence: Dr. Eli N. Perencevich, Dept. of Epidemiology and Preventive Medicine, University of Maryland School of Medicine, VA Maryland Health Care System, 100 N. Greene St., Lower Level, Baltimore, MD 21201 (eperence@epi.umaryland.edu).


Clinical Infectious Diseases, Volume 45, Issue 7, 1 October 2007, Pages 901–907, https://doi.org/10.1086/521255

01 October 2007 16 February 2007 18 June 2007 01 October 2007


George M. Eliopoulos, Michelle Shardell, Anthony D. Harris, Samer S. El-Kamary, Jon P. Furuno, Ram R. Miller, Eli N. Perencevich, Statistical Analysis and Application of Quasi Experiments to Antimicrobial Resistance Intervention Studies, Clinical Infectious Diseases, Volume 45, Issue 7, 1 October 2007, Pages 901–907, https://doi.org/10.1086/521255


Abstract

Quasi-experimental study designs are frequently used to assess interventions that aim to limit the emergence of antimicrobial-resistant pathogens. However, previous studies using these designs have often used suboptimal statistical methods, which may result in researchers making spurious conclusions. Methods used to analyze quasi-experimental data include 2-group tests, regression analysis, and time-series analysis, and they all have specific assumptions, data requirements, strengths, and limitations. An example of a hospital-based intervention to reduce methicillin-resistant Staphylococcus aureus infection rates and reduce overall length of stay is used to explore these methods.

Choosing the appropriate study design is critical when performing antimicrobial resistance intervention studies. When randomized studies in single hospitals or multihospital cluster-randomized trials are infeasible, investigators often choose before-and-after quasi-experimental designs [ 1, 2]. Quasi-experimental studies can assess interventions applied at the hospital or unit level (e.g., hygiene education program in the medical intensive care unit [MICU] [ 3]) or individual level (e.g., methicillin-resistant Staphylococcus aureus [MRSA] decolonization programs [ 4]), in which data are collected at equally spaced time intervals (e.g., monthly) before and after the intervention.

Nonrandomization and the resulting data structure of quasi experiments impart several methodological challenges for analysis. First, common statistical methods, including 2-group Student's t tests and linear regression, were developed to analyze independent, individual-level observations, whereas quasi-experimental data are typically correlated unit-level observations; for example, MRSA counts (defined as the number of MRSA infections at multiple time intervals) collected 1 month apart are likely more similar than MRSA counts collected 2 months apart. Second, nonrandom assignment of the intervention often necessitates analytical control for potential confounders.

Unfortunately, application of statistical techniques to quasi experiments is rarely described in introductory biostatistics texts and courses. We aim to provide a resource for bridging the gap between clinician researchers and biostatisticians by introducing clinicians to statistical analysis of quasi experiments while guiding biostatisticians regarding design-related challenges of intervention studies for controlling antimicrobial resistance, thereby improving conduct and reporting of these studies, as recently outlined [ 5, 6]. Strength of evidence from quasi-experimental data depends on the study design [ 1, 2, 7]. Studies with a concurrent nonequivalent control group provide stronger evidence about effectiveness of an intervention than do studies without a control group. Also, studies with several preintervention observations provide stronger evidence than do studies with few or no preintervention observations. As discussed below, the internal validity of quasi experiments is partially related to study design elements that affect researchers' ability to control for correlation, confounding, and time trends. Thus, before a study is initiated, hypotheses should be clearly stated, and design and analysis plans should be carefully developed.

We discuss several statistical techniques using the following example (motivated by a study by Pittet et al. [ 3]). After several months of abnormally high MRSA infection rates in the MICU, a hospital epidemiologist launches an education-based intervention to increase compliance with hand-disinfection procedures. The epidemiologist aims to compare rates of positivity for MRSA in clinical cultures before and after implementing the intervention. A secondary aim is to assess whether the intervention decreases overall length of stay (LOS) in the MICU. For both aims, data from 36 months before the intervention (2003–2005) are compared with data from 12 months after the intervention (2006). For ease of explanation, we first describe statistical methods for this example without a control group. We then discuss adaptations of methods for studies with a nonequivalent control group.

We discuss 2-group tests (e.g., Student's t test and χ² test), regression analysis (including segmented models), and time-series analysis in application to quasi-experimental studies of interventions to control antibiotic-resistant bacterial pathogens. We use simulated data for illustration and review data requirements, software, strengths, and limitations for each statistical method (tables 1 and 2). Persons seeking additional resources on statistics or quasi experiments are urged to consult a statistics primer [8] and literature regarding quasi-experimental studies [1, 2, 7], respectively.

Table 1. Statistical method and software commands by outcome type.

Table 2. Characteristics of each statistical method.

Two-Group Tests

Two-group (i.e., bivariate) tests make crude comparisons (i.e., unadjusted for confounders) of MRSA infection rates and mean LOS in pre- and postintervention periods. We specifically discuss Student's t tests for continuous outcomes (e.g., LOS) and 2-rate χ² tests for count outcomes (e.g., number of MRSA infections).

Continuous outcomes. For continuous outcomes, 2 mean values are compared using Student's t test. In our example, we test the equality of the mean LOS before and after the education-based hand disinfection intervention. When data from several preintervention and postintervention periods are collected, as in interrupted time-series study designs [ 1, 2, 7], data from multiple periods before and after implementation of the intervention are pooled to produce 2 grand mean values. For example, 2300 patients per year (6900 total) with a mean LOS of 3.0 days during 2003–2005 (preintervention period) and 2800 patients with a mean LOS of 2.5 days in 2006 (postintervention period) can be compared. However, Student's t tests are sensitive to outlying values. If some patients have atypically long LOS, the median value is the preferred measurement of central tendency. Transformation (e.g., natural logarithm) of individual patients' LOS or a nonparametric test to compare median values (e.g., Wilcoxon rank-sum test) can be used.

Count outcomes. Crude comparisons can be made for count outcomes (e.g., number of MRSA infections) by performing a 2-rate χ² test. In our example, because the number of hospital admissions varies over time, comparing numbers of pre- and postintervention MRSA infections may produce invalid results. Summarizing data as a proportion, with the number of MRSA infections divided by the number of hospital admissions (e.g., 150 infections/6900 hospital admissions [2.2%], compared with 40 infections/2800 hospital admissions [1.4%]; P = .009), is appropriate if all patients are observed for the same duration of follow-up, in which case the proportion is interpreted as the risk of infection for that particular follow-up period (e.g., 3-day risk of MRSA infection). However, observation of patients in infection-control studies is typically limited to hospital stays that vary in duration. The 2-rate χ² test accommodates this difference by comparing rates (number of infections per unit of person-time) between pre- and postintervention periods [6]. Given 150 and 40 infections before and after the intervention, respectively, if 6700 preintervention person-days per year (20,100 total) and 6600 postintervention person-days are observed, then the rates are 7.5 and 6.1 infections per 1000 person-days before and after the intervention, respectively (P = .21). Thus, correcting for person-time using rates may produce conclusions different from those using proportions.
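The contrast between proportions and rates can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: it assumes an unpooled Wald z test for both comparisons, which is one of several ways to carry out such tests and is not stated in the text, so the P values it prints may differ slightly from those quoted above. The function names are hypothetical.

```python
import math

def two_sided_p(z):
    """Two-sided P value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))

def compare_proportions(x1, n1, x2, n2):
    """Unpooled Wald z test for a difference between two proportions (assumed variant)."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return p1, p2, two_sided_p((p1 - p2) / se)

def compare_rates(x1, t1, x2, t2):
    """Unpooled Wald z test for a difference between two Poisson rates (assumed variant)."""
    r1, r2 = x1 / t1, x2 / t2
    # Under the Poisson assumption the variance of a count equals the count.
    se = math.sqrt(x1 / t1**2 + x2 / t2**2)
    return r1, r2, two_sided_p((r1 - r2) / se)

# Infections per admission: 150/6900 before vs. 40/2800 after.
prop_pre, prop_post, p_prop = compare_proportions(150, 6900, 40, 2800)
# Infections per person-day: 150/20,100 before vs. 40/6600 after.
rate_pre, rate_post, p_rate = compare_rates(150, 20100, 40, 6600)

print(f"proportions: {prop_pre:.3f} vs {prop_post:.3f}, P = {p_prop:.3f}")
print(f"rates per 1000 person-days: {1000 * rate_pre:.1f} vs {1000 * rate_post:.1f}, P = {p_rate:.2f}")
```

With the same infection counts, the comparison on admissions falls below the conventional .05 threshold whereas the comparison on person-days does not, which is the point made in the paragraph above.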

The 2-rate χ² test assumes that infection counts follow a Poisson distribution [9–11]. The Poisson assumption implies that the mean infection count per person-time equals the variance of the infection count for that person-time. If this assumption is violated, the SE estimates are incorrect, which in turn yields incorrect confidence intervals and P values.

In interrupted time-series study designs, rates are collected at several periods, allowing the variance of infection counts per person per unit of time to be empirically estimated and compared with the mean value. If the “mean equals variance” assumption is not valid, a test using “robust” SEs based on empirically estimated variances is recommended [12, 13]. Consider 12 months of data on MRSA infection rates with a mean rate of 2.8 cases per 1000 person-days and a variance of 2.2. Because the mean and variance are similar, the Poisson assumption appears valid. In contrast, consider MRSA infection rates with a mean rate of 4.4 cases per 1000 person-days and a variance of 6.6, for which the variance clearly exceeds the mean. This latter example is typical; the Poisson assumption is rarely satisfied in practical applications. Therefore, researchers should perform 2-rate χ² tests both with and without robust SEs to evaluate whether confidence intervals and P values vary across assumptions. If conclusions differ between the 2 methods, test results using the more conservative robust SEs should be reported.
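A quick empirical check of the “mean equals variance” assumption is to compare the sample variance of the periodic counts with their sample mean. The sketch below uses two hypothetical 12-month series of infection counts (not data from the article) and a hypothetical helper name; a ratio well above 1 suggests overdispersion, although no formal threshold is implied.

```python
import statistics

def dispersion_ratio(counts):
    """Sample variance divided by sample mean; about 1 if counts are Poisson."""
    return statistics.variance(counts) / statistics.mean(counts)

# Hypothetical monthly MRSA infection counts for two 12-month periods.
counts_a = [3, 2, 4, 1, 3, 5, 2, 3, 4, 2, 1, 3]
counts_b = [1, 9, 2, 8, 3, 10, 0, 7, 4, 2, 6, 1]

ratio_a = dispersion_ratio(counts_a)
ratio_b = dispersion_ratio(counts_b)

for label, ratio in (("series A", ratio_a), ("series B", ratio_b)):
    flag = "variance exceeds mean" if ratio > 1 else "variance does not exceed mean"
    print(f"{label}: variance/mean = {ratio:.2f} ({flag})")
```

This comparison is only meaningful for counts observed over roughly equal person-time in each period; with strongly varying person-time, a model-based dispersion estimate would be a better choice.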

Strengths and limitations. Strengths of 2-group tests include simplicity, interpretability of results, and minimal data requirements (2 observation periods) (table 2). These tests can accommodate >2 groups (e.g., before intervention, after intervention, and after intervention plus change in antimicrobial prescribing), using analysis of variance for continuous outcomes and χ² tests for count outcomes.

Two-group tests are limited by several assumptions. One assumption, independence between patients admitted to the hospital in the same period, is implausible because infectious organisms are transmissible. Independence of observations between periods is also implausible, because patients admitted to the hospital in different months may be exposed to constant antibiotic prescribing patterns. Also, without multiple levels of stratification, the ability to adjust for potential confounders (e.g., differences in severity of illness) is limited. Last, 2-group tests can detect changes in outcome levels but not changes in trends (e.g., monthly increases or decreases in the MRSA infection rate). If we use the 2-rate χ² test with data in figure 1, the MRSA infection rates for 36 months before and 12 months after an intervention are 6.8 and 6.6 cases per 1000 person-days, respectively (P = .87). However, figure 1 shows rates increasing by 0.25 cases per 1000 person-days per month until implementation of the intervention, then decreasing by 0.75 cases per 1000 person-days per month. By pooling counts into single pre- and postintervention rates, the 2-rate χ² test cannot detect this change in slope or trend, incorrectly finding no evidence of effectiveness of the intervention. To detect changes in slopes, a different statistical method, such as segmented regression, is needed.

Figure 1. Changes in rate of infection with methicillin-resistant Staphylococcus aureus (MRSA) over time before and after an intervention implemented at month 36, showing a change in slope that would not be detected by 2-group tests. Preintervention and postintervention rates are 6.8 and 6.6 infections per 1000 person-days, respectively (P = .87, by 2-rate χ² test). Preintervention and postintervention slopes are 0.25 and −0.75 infections per 1000 person-days per month, respectively.

Regression Analysis

Regression analysis quantifies the relationship between an outcome (e.g., LOS or MRSA infection) and an intervention, allowing for statistical control of known confounders. Linear regression is used for continuous normally distributed outcomes (e.g., average monthly LOS or log-transformed individual LOS). Other outcome types, including MRSA counts, require analysis using generalized linear models [ 14]. In our example, MRSA infections are considered as MRSA counts per time period with an assumed Poisson distribution; thus, the appropriate method is Poisson regression.

In clinical literature, unlike in statistical literature, “segmented regression” refers to regression analysis in which changes in mean outcome levels and trends before and after an intervention are estimated [15]. If changes in slopes are not estimated (e.g., when a nonsegmented regression model is fit), then estimates of the slopes may be biased, and changes in time trends attributable to the intervention would go undetected. Segmented regression models can be fit to estimate changes in both levels and trends. In our example below, we estimate pre- and postintervention changes in LOS and MRSA levels and trends.
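For orientation, one common way to write a segmented model is shown below. This is a sketch of a standard parameterization, not necessarily the exact model fit in this article, whose models may include additional terms such as confounders. The symbols are assumptions introduced here: t is the month, T_0 is the month of the intervention (36 in the example), X_t indicates the postintervention period, and N_t is the person-time observed in month t.

```latex
% Segmented linear regression for a continuous outcome (e.g., mean monthly LOS)
Y_t = \beta_0 + \beta_1 t + \beta_2 X_t + \beta_3 (t - T_0) X_t + \varepsilon_t,
\qquad
X_t = \begin{cases} 0, & t \le T_0 \\ 1, & t > T_0 \end{cases}

% Segmented Poisson regression for counts, with person-time N_t as an offset
\log \mathrm{E}[Y_t] = \log N_t + \beta_0 + \beta_1 t + \beta_2 X_t + \beta_3 (t - T_0) X_t
```

In this form, \beta_1 is the preintervention slope, \beta_2 is the change in level at the time of the intervention, and \beta_3 is the change in slope; a nonsegmented model corresponds to fixing \beta_3 = 0. In the Poisson version, \exp(\beta_2) is the rate ratio for the change in level.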

Continuous outcomes. Although individual LOS is usually skewed, mean monthly LOS is approximately normally distributed for large sample sizes (i.e., >30 patients per month). If LOS increases over time secondary to a steady increase in MRSA infection rates, regression analysis can model this pattern and estimate the effect of an intervention controlling for potential confounders (e.g., age and reasons for hospitalization). Given intervention status and potential confounders, the outcome variable (in this case, LOS) must satisfy the assumption of having constant variance.

Using the same data, we estimate changes in mean LOS, controlling for trends, using 2 different models ( figure 2). Figure 2A shows the results of nonsegmented linear regression, which cannot assess a change in time trend (i.e., slope). Figure 2B shows the results of segmented linear regression, which allows the slopes to differ before and after the intervention. Compared with the model in figure 2A, the estimated time trend using segmented linear regression in figure 2B is flatter after the intervention. Forcing equal slopes before and after the intervention when they are unequal can lead to spurious conclusions about an intervention's effectiveness.

Figure 2. Interrupted time-series data regarding length of hospital stay (LOS) simulated from a segmented linear regression model with a change in slope (before vs. after the intervention), fit with a nonsegmented linear regression model that cannot estimate a change in slope (A) and a segmented linear regression model that can estimate a change in slope (B). The intervention was implemented at month 36.

Count outcomes. Poisson regression is preferred over linear regression for estimating the association between the intervention and monthly MRSA infection rates, controlling for time trend, because counts are not normally distributed (figure 3). Differences estimated from this model are summarized as incident rate ratios of MRSA infections.

Figure 3. Interrupted time-series methicillin-resistant Staphylococcus aureus (MRSA) infection data simulated from a segmented Poisson regression model with a change in slope (before vs. after the intervention), fit with a nonsegmented Poisson regression model that cannot estimate a change in slope (A) and a segmented Poisson regression model that can estimate a change in slope (B). The intervention was implemented at month 36.

Using the same data, we estimate changes in MRSA infection rates, controlling for trends, using 2 models ( figure 3). Figure 3A shows the results of nonsegmented Poisson regression, which precludes estimation of changes in time trend (i.e., slope), whereas figure 3B shows the results of segmented Poisson regression, which allows different slopes before and after the intervention.

SE estimates of Poisson regression models are constrained by the “mean equals variance” assumption. This assumption is relaxed by fitting an overdispersed Poisson regression model [14, 16]. If the Poisson assumption is false, allowing overdispersion changes the SE estimates without changing the estimated regression parameters, producing more valid inferences. Poisson regression and overdispersed Poisson regression therefore yield equal incident rate ratio estimates but different confidence intervals.
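The effect of overdispersion on a confidence interval can be illustrated numerically. The sketch below assumes the common quasi-likelihood formulation, in which model-based SEs are multiplied by the square root of an estimated dispersion parameter; the rate ratio, SE, and dispersion value are hypothetical, and the function name is introduced here for illustration.

```python
import math

def irr_with_ci(log_irr, se, dispersion=1.0, z=1.96):
    """Incidence rate ratio with a Wald 95% CI.

    dispersion=1.0 corresponds to ordinary Poisson regression; values above 1
    widen the interval by sqrt(dispersion) (quasi-likelihood scaling, assumed).
    """
    scaled_se = se * math.sqrt(dispersion)
    return (
        math.exp(log_irr),
        math.exp(log_irr - z * scaled_se),
        math.exp(log_irr + z * scaled_se),
    )

# Hypothetical estimate: rate ratio 0.80 with model-based SE 0.10 on the log scale.
log_irr, se = math.log(0.80), 0.10

poisson = irr_with_ci(log_irr, se)
overdispersed = irr_with_ci(log_irr, se, dispersion=1.5)

print("Poisson:       IRR %.2f (95%% CI %.2f to %.2f)" % poisson)
print("Overdispersed: IRR %.2f (95%% CI %.2f to %.2f)" % overdispersed)
```

The point estimate is identical in both rows; only the interval widens when the dispersion parameter exceeds 1.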

Strengths and limitations. Regression allows estimation of associations between the intervention and outcome while controlling for potential confounders, which is particularly important in nonrandomized quasi-experimental studies ( table 2). Segmented regression models estimate changes in mean outcome levels (i.e., intercepts) and trends (i.e., slopes), unlike standard regression models. However, some limitations previously discussed with 2-group tests remain. Specifically, independence between individuals and time periods is assumed. Additionally, regression analysis, in contrast to 2-group tests, requires data from multiple pre- and postintervention time intervals to estimate the slope. General guidelines suggest the use of at least 10 observations per model parameter to avoid overfitting [ 17]. The models in figures 2B and 3B contained 5 parameters; thus, they should be used only for studies with at least 50 total observations (in our example, months). For intervention studies, data from at least 10 observations before and after the intervention should be used. However, using at least 24 observations (in our example, 12 months before and after the intervention) would capture potential seasonal changes. Data from shorter intervals can be used (e.g., biweekly); however, choice of time interval is a compromise between maximizing the number of observations and maintaining sufficient data within each interval to provide interpretable summary measures [ 15, 18]. In SAS, the command PROC GENMOD can estimate Poisson and linear regression models ( table 1) [ 19].

Time-Series Analysis

Time-series analysis consists of advanced statistical techniques that require understanding of regression and correlation. Whereas “interrupted time-series design” refers to studies consisting of equally spaced pre- and postintervention observations, “time-series analysis” refers to statistical methods for analyzing time-series design data. Two-group tests and regression analysis assume that monthly LOS and MRSA infection rates are independent over time. In contrast, time-series analysis estimates regression models while relaxing the independence assumption by estimating the autocorrelation between observations collected at different times (e.g., MRSA infection counts among MICU patients across different periods). To estimate autocorrelation, a correlation model is specified along with the regression model, resulting in more accurate SE estimates and improved statistical inference.

Continuous outcomes. Time-series analysis accommodates the previously discussed regression models; however, the challenge is how to correctly model correlation. In linear regression, monthly LOS measurements are assumed to be independent. However, autocorrelation may take one of several forms. For example, if correlation between 2 observations gradually decreases as time between them increases (e.g., correlation between months 1 and 2 is 0.5, correlation between months 1 and 3 is 0.25, and correlation between months 1 and 4 is 0.12), autocorrelation is likely autoregressive. However, if autocorrelation between 2 observations is initially strong but abruptly decreases to ∼0 (e.g., correlation between months 1 and 2 is 0.5 and correlation between months 1 and 3 is 0.05), a moving-average model is more appropriate. Occasionally, autocorrelation is strong for observations close in time and then sharply decreases to a nonzero level after some time threshold. In this case, autoregressive or moving-average models would be inadequate, and autoregressive moving-average (ARMA) models should be used. When correlation between observations does not decrease with duration of time, autoregressive, integrated, moving-average (ARIMA) models may be appropriate. In SAS, PROC AUTOREG estimates autoregressive models, and PROC ARIMA estimates autoregressive, moving-average, ARMA, and ARIMA models.
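The patterns described above can be made concrete by computing autocorrelations directly. The sketch below is illustrative: the first function estimates the sample autocorrelation of a series at a given lag, and the other two return the theoretical autocorrelations of first-order autoregressive and first-order moving-average processes. The function names and the LOS series are hypothetical.

```python
def sample_autocorrelation(series, lag):
    """Sample autocorrelation of a series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((y - mean) ** 2 for y in series)
    num = sum((series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag))
    return num / denom

def ar1_autocorrelation(rho, max_lag):
    """Theoretical AR(1) autocorrelations: geometric decay rho**k."""
    return [rho ** k for k in range(1, max_lag + 1)]

def ma1_autocorrelation(theta, max_lag):
    """Theoretical MA(1) autocorrelations: nonzero at lag 1, zero afterwards."""
    return [theta / (1 + theta ** 2)] + [0.0] * (max_lag - 1)

# Gradual decay, as in the autoregressive example in the text.
print(ar1_autocorrelation(0.5, 3))
# Abrupt cutoff after lag 1, as in the moving-average example.
print(ma1_autocorrelation(1.0, 3))

# Hypothetical mean monthly LOS (days) for 12 months.
monthly_los = [3.1, 3.0, 3.2, 3.4, 3.3, 3.5, 3.4, 3.6, 3.5, 3.7, 3.6, 3.8]
print([round(sample_autocorrelation(monthly_los, k), 2) for k in (1, 2, 3)])
```

Comparing the sample autocorrelations of residuals with these theoretical shapes is one informal way to choose among correlation models; formal model selection is covered in the technical resources cited in this section.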

Count outcomes. Although most time-series software assumes that outcomes are normally distributed, methods for Poisson counts are available [20–23]. One approach is to transform counts into monthly rates and use time-series methods for normal data (rates are approximately normally distributed if they are based on large numbers). In addition, autoregressive [22, 23], moving-average [21], and ARMA [20] models have been extended to generalized linear models (including Poisson models); these extensions are called generalized ARMA models. The “garma” command in the R software library VGAM estimates generalized ARMA models [24].

Strengths and limitations. Time-series methods estimate dependence (i.e., correlation) between observations over time, lessening a common threat to valid inferences. They also accommodate segmented models. Thus, time-series methods generalize regression by relaxing the assumption of independent observations. However, their large data requirements often preclude their use. A general guideline is having ∼50 time points (e.g., 3 years of monthly preintervention data and 1 year of monthly postintervention data) to estimate complex correlation structures [25]. If fewer observations are available, only simple correlation structures can be reliably estimated [15].

Another limitation of time-series analysis is difficulty in building and interpreting correlation models. Several technical resources are available to guide analysts [26–28]. Review articles [25, 29, 30] and biomedical examples are also available [18, 31, 32]. Bootstrapping circumvents the problem of specifying and estimating an autocorrelation model. Bootstrap SEs can be calculated by estimating regression parameters assuming independence (i.e., linear or Poisson regression). The resulting SEs account for autocorrelation by sampling the data multiple times (e.g., 1000 times) with replacement and estimating the parameters with each sample [33]. Thus, the bootstrap with regression is an alternative to time-series analysis when too few time intervals are observed.
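The resampling idea can be sketched for the simplest case, a pre/post rate ratio, which equals the exponentiated coefficient of a Poisson regression containing only an intervention indicator and a person-time offset. The example below is a minimal illustration under stated assumptions: the monthly data are hypothetical, months are resampled with replacement separately within each period, and the function names are introduced here. Block-resampling variants exist for strongly correlated series and are not shown.

```python
import math
import random

def log_rate_ratio(pre, post):
    """Log of (post rate / pre rate); each period is a list of (count, person_days)."""
    rate_pre = sum(c for c, _ in pre) / sum(d for _, d in pre)
    rate_post = sum(c for c, _ in post) / sum(d for _, d in post)
    return math.log(rate_post / rate_pre)

def bootstrap_se(pre, post, n_boot=1000, seed=0):
    """Bootstrap SE of the log rate ratio, resampling months with replacement."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        pre_sample = rng.choices(pre, k=len(pre))
        post_sample = rng.choices(post, k=len(post))
        estimates.append(log_rate_ratio(pre_sample, post_sample))
    mean = sum(estimates) / n_boot
    return math.sqrt(sum((e - mean) ** 2 for e in estimates) / (n_boot - 1))

# Hypothetical monthly (infection count, person-days) pairs; all counts are > 0.
pre_months = [(4, 560), (5, 540), (3, 555), (6, 570), (4, 550), (5, 565),
              (7, 548), (4, 552), (5, 561), (6, 558), (3, 545), (5, 556)]
post_months = [(4, 550), (3, 560), (3, 548), (2, 555), (3, 562), (2, 551),
               (2, 557), (1, 549), (2, 553), (1, 560), (2, 546), (1, 558)]

estimate = log_rate_ratio(pre_months, post_months)
se = bootstrap_se(pre_months, post_months)
print(f"rate ratio {math.exp(estimate):.2f}, bootstrap SE of log rate ratio {se:.3f}")
```

For a segmented model, the same loop would refit the regression on each resample and collect the level and slope coefficients instead of a single log rate ratio.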

Adding a Control Group

Each method can easily accommodate comparison with a nonequivalent control group, a preferred epidemiological quasi-experimental design, because regression to the mean and maturation effects are common threats in these studies [ 1, 7]. In our example, the intervention could be implemented in the MICU, and the nonequivalent control group could be the surgical intensive care unit. A 2-group t test would then compare changes in the mean LOS in the MICU and surgical intensive care unit (mean LOS after the intervention minus mean LOS before the intervention). Regression analysis (e.g., linear and Poisson) controlling for confounding variables can be performed by fitting separate trends for the MICU and surgical intensive care unit and comparing differences in changes in levels (i.e., intercepts) and trends (i.e., slopes) between the 2 units ( figure 4). In our example, the MRSA infection rate in the MICU decreases by 0.8 cases per 1000 person-days immediately on implementation of the intervention, suggesting a large impact of the intervention. However, the MRSA infection rate in the surgical intensive care unit decreases by 0.6 cases per 1000 person-days, suggesting that the decrease in the MRSA infection rate is partially attributable to nonintervention factors, which could not have been identified without a control group. Hence, including a control group is recommended to identify the true impact of an intervention.
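The comparison in the example amounts to a difference between the two units' changes in level. As a worked illustration using only the figures quoted above (the \Delta notation is introduced here, and in practice the regression model would supply this contrast together with its SE):

```latex
\Delta_{\mathrm{MICU}} - \Delta_{\mathrm{SICU}}
  = (-0.8) - (-0.6)
  = -0.2 \ \text{infections per 1000 person-days}
```

On this crude reading, only about 0.2 of the 0.8 infections per 1000 person-days reduction observed in the MICU exceeds what occurred in the control unit over the same period.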

Figure 4. Segmented Poisson regression analysis of interrupted time-series methicillin-resistant Staphylococcus aureus (MRSA) infection data, comparing infection rates in the medical intensive care unit (MICU; intervention group) and surgical intensive care unit (SICU; control group) before and after the intervention (implemented at month 36). The reduction of 0.6 infections per 1000 person-days in the SICU suggests that the reduction of 0.8 infections per 1000 person-days in the MICU was not solely due to the intervention.

Discussion

In summary, 2-group tests, regression analysis, and time-series analysis can accommodate interrupted time-series quasi-experimental data. However, statistical validity depends on using appropriate methods for the study question, meeting data requirements, and verifying modeling assumptions. This last step requires premodeling exploratory data analysis and postmodeling diagnostics not addressed here [ 14, 17, 26, 27].

Obtaining high-quality results depends on performing a well-designed study, because statistics cannot correct for a poor initial design [ 1, 7, 34], nor can they compensate for poor reporting of methods [ 5, 6]. Results from analyses can only provide valid inference on the level of intervention. We provide guidelines of minimal data requirements for using each statistical method ( table 2). However, larger sample sizes may be needed to obtain a desired precision for estimating measures of association (e.g., mean difference or rate ratio) or power for statistical tests. A simulation study can determine required sample size using model-generated data analyzed with an appropriate method [ 35]. Investigators are encouraged to report sample size calculations in addition to statistical analysis methods [ 5, 6]. Analyzing quasi-experimental data is challenging; therefore, we recommend collaboration between investigators, epidemiologists, and statisticians.
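A simulation-based power calculation of the kind mentioned above can be sketched as follows. This is a deliberately simplified illustration, not the procedure of reference 35: it assumes independent monthly Poisson counts, a constant rate within each period, equal person-days every month, and an unpooled Wald test comparing the two pooled rates. All rates, sample sizes, and function names are hypothetical.

```python
import math
import random

def poisson_draw(rng, mean):
    """Draw one Poisson variate with Knuth's method (adequate for small means)."""
    limit, k, p = math.exp(-mean), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def simulated_power(rate_pre, rate_post, days_per_month, months_pre, months_post,
                    n_sim=1000, z_crit=1.96, seed=1):
    """Fraction of simulated studies in which the two-rate Wald test rejects."""
    rng = random.Random(seed)
    t_pre, t_post = days_per_month * months_pre, days_per_month * months_post
    rejections = 0
    for _ in range(n_sim):
        x_pre = sum(poisson_draw(rng, rate_pre * days_per_month) for _ in range(months_pre))
        x_post = sum(poisson_draw(rng, rate_post * days_per_month) for _ in range(months_post))
        se = math.sqrt(x_pre / t_pre**2 + x_post / t_post**2)
        if se > 0 and abs(x_pre / t_pre - x_post / t_post) / se > z_crit:
            rejections += 1
    return rejections / n_sim

# Hypothetical design: 36 pre and 12 post months, 550 person-days per month.
power_effect = simulated_power(0.0075, 0.0040, 550, 36, 12)  # rate falls after intervention
power_null = simulated_power(0.0075, 0.0075, 550, 36, 12)    # no true change

print(f"estimated power with an effect: {power_effect:.2f}")
print(f"rejection rate with no effect:  {power_null:.2f}")
```

Repeating the calculation over a range of study lengths or effect sizes shows how many months of data a planned analysis would need; a realistic version would generate data from, and analyze them with, the segmented or time-series model intended for the study.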

Acknowledgments

Financial support. National Institutes of Health (grants R37 AG09901, 1 R01 AI6085901A1, and P30 AG028747-01 to M.S.; P60 AG12583 to R.R.M.; and institutional grant 1K12RR023250-01 to J.P.F.), Centers for Disease Control and Prevention (grant 1 R01 CI000369-01 to A.D.H. and E.N.P.), and Department of Veterans Affairs Health Services Research and Development Service (grants IIR 04-123-2 and Level 2 Advanced Career Development Award to E.N.P.).

Potential conflicts of interest. All authors: no conflicts.