Skip to contents

Simulate surrogate response values for cumulative link regression models using the latent method described in Liu and Zhang (2017).

Usage

surrogate(
  object,
  method = c("latent", "uniform"),
  jitter.uniform.scale = c("probability", "response"),
  nsim = 1L,
  ...
)

Arguments

object

An object of class clm, glm lrm, orm, polr, or vglm.

method

Character string specifying which method to use to generate the surrogate response values. Current options are "latent" and "uniform". Default is "latent".

jitter.uniform.scale

Character string specifying the scale on which to perform the jittering whenever method = "uniform". Current options are "response" and "probability". Default is "response".

nsim

Integer specifying the number of bootstrap replicates to use. Default is 1L meaning no bootstrap samples.

...

Additional optional arguments. (Currently ignored.)

Value

A numeric vector of class c("numeric", "surrogate") containing the simulated surrogate response values. Additionally, if nsim > 1, then the result will contain the attributes:

boot_reps

A matrix with nsim columns, one for each bootstrap replicate of the surrogate values. Note, these are random and do not correspond to the original ordering of the data;

boot_id

A matrix with nsim columns. Each column contains the observation number each surrogate value corresponds to in boot_reps. (This is used for plotting purposes.)

Note

Surrogate response values require sampling from a continuous distribution; consequently, the result will be different with every call to surrogate. The internal functions used for sampling from truncated distributions are based on modified versions of truncdist:rtrunc and truncdist:qtrunc.

For "glm" objects, only the binomial() family is supported.

References

Liu, D., Li, S., Yu, Y., & Moustaki, I. (2020). Assessing partial association between ordinal variables: quantification, visualization, and hypothesis testing. Journal of the American Statistical Association, 1-14. doi:10.1080/01621459.2020.1796394

Liu, D., & Zhang, H. (2018). Residuals and diagnostics for ordinal regression models: A surrogate approach. Journal of the American Statistical Association, 113(522), 845-854. doi:10.1080/01621459.2017.1292915

Nadarajah, S., & Kotz, S. (2006). R Programs for Truncated Distributions. Journal of Statistical Software, 16(Code Snippet 2), 1 - 8. doi:10.18637/jss.v016.c02

Examples

# Generate data from a quadratic probit model
set.seed(101)
n <- 2000
x <- runif(n, min = -3, max = 6)
z <- 10 + 3*x - 1*x^2 + rnorm(n)
y <- ifelse(z <= 0, yes = 0, no = 1)

# Scatterplot matrix
pairs(~ x + y + z)


# Setup for side-by-side plots
par(mfrow = c(1, 2))

# Misspecified mean structure
fm1 <- glm(y ~ x, family = binomial(link = "probit"))
s1 <- surrogate(fm1)
scatter.smooth(x, s1 - fm1$linear.predictors,
               main = "Misspecified model",
               ylab = "Surrogate residual",
               lpars = list(lwd = 3, col = "red2"))
abline(h = 0, lty = 2, col = "blue2")

# Correctly specified mean structure
fm2 <- glm(y ~ x + I(x ^ 2), family = binomial(link = "probit"))
#> Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
s2 <- surrogate(fm2)
scatter.smooth(x, s2 - fm2$linear.predictors,
               main = "Correctly specified model",
               ylab = "Surrogate residual",
               lpars = list(lwd = 3, col = "red2"))
abline(h = 0, lty = 2, col = "blue2")



dev.off() # reset to defaults once finish
#> null device 
#>           1