categoricalCUSUM    package:surveillance    R Documentation(latin1)

_C_U_S_U_M _d_e_t_e_c_t_o_r _f_o_r _t_i_m_e-_v_a_r_y_i_n_g _c_a_t_e_g_o_r_i_c_a_l _t_i_m_e _s_e_r_i_e_s

_D_e_s_c_r_i_p_t_i_o_n:

     Function to process 'sts' object by binomial, beta-binomial or
     multinomial CUSUM. Logistic, multinomial logistic, proportional
     odds or Bradley-Terry regression models are used to specify
     in-control and out-of-control parameters.

_U_s_a_g_e:

     categoricalCUSUM(stsObj,control = list(range=NULL,h=5,pi0=NULL,
                      pi1=NULL, dfun=NULL, ret=c("cases","value")),...)

_A_r_g_u_m_e_n_t_s:

  stsObj: Object of class 'sts' containing the number of counts in each
          of the k categories of the response variable. Time varying
          number of counts n_t is found in slot 'populationFrac'. 

 control: Control object containing several items

        '_r_a_n_g_e' Vector of length t_{max} with indices of the 'observed'
             slot to monitor.

        '_h' Threshold to use for the monitoring. Once the CUSUM
             statistics is larger or equal to 'h' we have an alarm.

        '_p_i_0' (k-1) times t_{max} in-control probability vector for all
             categories except the reference category.

        '_m_u_1' (k-1) times t_{max} out-of-control probability vector for
             all categories except the reference category.

        '_d_f_u_n' The probability mass function or density used to compute
             the likelihood ratios of the CUSUM. In a negative binomial
             CUSUM this is 'dnbinom', in a binomial CUSUM 'dbinom' and
             in a multinomial CUSUM 'dmultinom'. The function must be
             able to handle the arguments 'y', 'size', 'mu' and 'log'.
             As a consequence, one in the case of the beta-binomial
             distribution has to write a small wrapper function.

        '_r_e_t' Return the necessary proportion to sound an alarm in the
             slot 'upperbound' or just the value of the CUSUM
             statistic. Thus, 'ret' is one of tha values in
             'c("cases","value")'.

     ...: Additional arguments to send to 'dfun'.

_D_e_t_a_i_l_s:

     The function allows the monitoring of categorical time series as
     described by regression models for binomial, beta-binomial or
     multinomial data. The later includes e.g. multinomial logistic
     regression models, proportional odds models or Bradley-Terry
     models for paired comparisons. See the Hoehle (2010) reference for
     further details about the methodology.

     Once an alarm is found the CUSUM scheme is resetted (to zero) and
     monitoring continues from there.

_V_a_l_u_e:

     An 'sts' object with 'observed', 'alarm', etc. slots trimmed to
     the 'control$range' indices.

_A_u_t_h_o_r(_s):

     M. Hoehle

_R_e_f_e_r_e_n_c_e_s:

     Hhle, M. (2010), Changepoint detection in categorical time
     series,  Book chapter to appear in T. Kneib and G. Tutz (Eds.),
     Statistical Modelling and Regression Structures, Springer.

_S_e_e _A_l_s_o:

     'categoricalCUSUM'

_E_x_a_m_p_l_e_s:

     ###########################################################################
     #Beta-binomial CUSUM for a small example containing the time-varying
     #number of positive test out of a time-varying number of total
     #test.
     #######################################

     #Load meat inspection data
     data("abattoir")

     #Use GAMLSS to fit beta-bin regression model
     require("gamlss")
     phase1 <- 1:(2*52)
     phase2  <- (max(phase1)+1) : nrow(abattoir)

     #Fit beta-binomial model using GAMLSS
     abattoir.df <- as.data.frame(abattoir)
     colnames(abattoir.df) <- c("y","t","state","alarm","n")
     m.bbin <- gamlss( cbind(y,n-y) ~ 1 + t + 
                       + sin(2*pi/52*t) + cos(2*pi/52*t) +
                       + sin(4*pi/52*t) + cos(4*pi/52*t), sigma.formula=~1,
                       family=BB(sigma.link="log"),
                       data=abattoir.df[phase1,c("n","y","t")])

     #CUSUM parameters
     R <- 2 #detect a doubling of the odds for a test being positive
     h <- 4 #threshold of the cusum

     #Compute in-control and out of control mean
     pi0 <- predict(m.bbin,newdata=abattoir.df[phase2,c("n","y","t")],type="response")
     pi1 <- plogis(qlogis(pi0)+log(R))
     #Create matrix with in control and out of control proportions.
     #Categories are D=1 and D=0, where the latter is the reference category
     pi0m <- rbind(pi0, 1-pi0)
     pi1m <- rbind(pi1, 1-pi1)

     ######################################################################
     # Use the multinomial surveillance function. To this end it is necessary
     # to create a new abattoir object containing counts and proportion for
     # each of the k=2 categories. For binomial data this appears a bit
     # redundant, but generalizes easier to k>2 categories.
     ######################################################################

     abattoir2 <- new("sts",epoch=1:nrow(abattoir), start=c(2006,1),freq=52,
       observed=cbind(abattoir@observed,abattoir@populationFrac -abattoir@observed),
       populationFrac=cbind(abattoir@populationFrac,abattoir@populationFrac),
       state=matrix(0,nrow=nrow(abattoir),ncol=2),
       multinomialTS=TRUE)

     ######################################################################
     #Function to use as dfun in the categoricalCUSUM
     #(just a wrapper to the dBB function). Note that from v 3.0-1 the
     #first argument of dBB changed its name from "y" to "x"!
     ######################################################################
     mydBB.cusum <- function(y, mu, sigma, size, log = FALSE) {
       return(dBB(y[1,], mu = mu[1,], sigma = sigma, bd = size, log = log))
     }

     #Create control object for multinom cusum and use the categoricalCUSUM
     #method
     control <- list(range=phase2,h=h,pi0=pi0m, pi1=pi1m, ret="cases",
                      dfun=mydBB.cusum)
     surv <- categoricalCUSUM(abattoir2, control=control,
                              sigma=exp(m.bbin$sigma.coef))

     #Show results
     plot(surv[,1],legend.opts=NULL,dx.upperbound=0)
     lines(pi0,col="green")
     lines(pi1,col="red")

     #Index of the alarm
     which.max(alarms(surv[,1]))

     #ToDo: Compute run length using LRCUSUM.runlength

