Measurement Error and dWOLS

Overview

This app simulates multiple datasets of a user-specified sample size. Each dataset represents a binary treatment decision $A \in \{0, 1\}$ that depends on pre-treatment information $X$ and yields an outcome $Y$, for which larger values are preferred. All three variates may be generated with or without measurement error. The 'advanced options' give more control over these inputs, and all inputs have illustrative defaults.

For each dataset, the method of dynamic weighted ordinary least squares (dWOLS, Wallace and Moodie, 2015) is used to estimate a treatment threshold and a treatment decision rule under various modelling setups. See 'analysis', below.

Please note that not all inputs are user-specified. If you would like to specify an input that is not available, please contact me!

Inputs

Pre-treatment information $X$
Observed pre-treatment information $X^*$ is assumed to be equal to the true pre-treatment information $X$ plus some error.

Truth: $X \sim N(0, 1)$
Error-prone (Additive): $X^* = X + U_X, \text{ where } U_X \sim N(\mu_{UX}, \sigma_{UX})$
Error-prone (Multiplicative): $X^* = X \times \exp(U_X), \text{ where } U_X \sim N(\mu_{UX}, \sigma_{UX})$

The values of $\mu_{UX}$ and $\sigma_{UX}$ are, respectively, 0 and 1 by default, but may be directly specified using advanced options.
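The two error models above can be sketched in a few lines of Python. This is an illustrative sketch, not the app's own code: the function name `simulate_x` and its arguments are hypothetical, and $\sigma_{UX}$ is assumed to be a standard deviation.

```python
import math
import random


def simulate_x(n, error="additive", mu_ux=0.0, sigma_ux=1.0, rng=None):
    """Return (x, x_star): true and observed pre-treatment values.

    Hypothetical helper; sigma_ux is treated as a standard deviation.
    """
    rng = rng or random.Random()
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]            # X ~ N(0, 1)
    u = [rng.gauss(mu_ux, sigma_ux) for _ in range(n)]     # U_X ~ N(mu_UX, sigma_UX)
    if error == "additive":
        x_star = [xi + ui for xi, ui in zip(x, u)]          # X* = X + U_X
    elif error == "multiplicative":
        x_star = [xi * math.exp(ui) for xi, ui in zip(x, u)]  # X* = X * exp(U_X)
    elif error == "none":
        x_star = list(x)                                    # X* = X
    else:
        raise ValueError("error must be 'additive', 'multiplicative' or 'none'")
    return x, x_star
```

Note that multiplicative error rescales $X$ by a positive factor, so $X^*$ always keeps the sign of $X$, whereas additive error can change it.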

Treatment information $A$
It is assumed that the prescribed treatment is error-prone and denoted $A^*$. The relationship between true treatment and prescribed treatment is defined by positive and negative predictive values (PPV and NPV, respectively).

Prescribed treatment: $P(A^* = 1|X^*) = \left[1 + \exp(-(\alpha_0 + \alpha_1 X^* + \alpha_2 (X^*)^2))\right]^{-1}$
True treatment: $P(A = 1|A^* = 1) = \text{PPV}, P(A = 0|A^* = 0) = \text{NPV}$

Note that $A^*$ is assumed to depend on $X^*$, and if no measurement error is specified in $X$ then $X^* = X$. If the error in $A^*$ is set to depend on $X$, then the PPV and NPV are both scaled by a factor of $(\max(X) - X)/(\max(X) - \min(X))$.
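A minimal sketch of this two-stage treatment mechanism follows. The function name `simulate_treatment`, the `depends_on_x` flag, and the way the scaling factor is applied are assumptions made for illustration; the values of $\alpha$, PPV and NPV must be supplied by the caller.

```python
import math
import random


def expit(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def simulate_treatment(x, x_star, alpha, ppv, npv, depends_on_x=False, rng=None):
    """Return (a_star, a): prescribed and true binary treatments (hypothetical helper)."""
    rng = rng or random.Random()
    a0, a1, a2 = alpha
    x_min, x_max = min(x), max(x)
    a_star, a = [], []
    for xi, xs in zip(x, x_star):
        # Prescribed treatment depends on the observed X*.
        prescribed = int(rng.random() < expit(a0 + a1 * xs + a2 * xs ** 2))
        # Optional scaling of PPV and NPV by (max(X) - X) / (max(X) - min(X)).
        scale = (x_max - xi) / (x_max - x_min) if depends_on_x else 1.0
        if prescribed == 1:
            actual = int(rng.random() < ppv * scale)     # P(A = 1 | A* = 1) = PPV
        else:
            actual = int(rng.random() >= npv * scale)    # P(A = 0 | A* = 0) = NPV
        a_star.append(prescribed)
        a.append(actual)
    return a_star, a
```

With PPV = NPV = 1 and no dependence on $X$, the true treatment always equals the prescribed treatment, which corresponds to no measurement error in $A$.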

Outcome $Y$
The outcome is a function of the true values of the pre-treatment and treatment variates: $$Y = \beta_0 + \beta_1 X + \beta_2 X^2 + A(\psi_0 + \psi_1 X) + \epsilon, \text{ where } \epsilon \sim N(0, 1).$$ If $Y$ is subject to measurement error, it is generated as $$Y^* = Y + U_Y, \text{ where } U_Y \sim N(\mu_{UY}, \sigma_{UY}).$$ The mean and standard deviation of the measurement error are set to depend on $\text{sd}(Y)$, the sample standard deviation of $Y$, and are given by:

$\mu_{UY} = (\mu_Y X + 1_{\text{Adep}}A)\text{sd}(Y)$ where $\mu_Y$ is a user-specified value equal to 1 by default if the error is set to depend on $X$, and $1_{\text{Adep}} = 1$ if the error is set to depend on $A$ and 0 otherwise.
$\sigma_{UY} = \sigma_Y\text{sd}(Y)$ where $\sigma_Y$ is a user-specified value equal to 1 by default if any measurement error is specified.

The optimal treatment decision rule is defined based on the outcome model: $A^{opt} = 1$ if $\psi_0 + \psi_1 X > 0$ and 0 otherwise.
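The outcome model, its measurement error, and the optimal rule can be sketched as below. The function names and argument defaults are hypothetical, and $\sigma_{UY}$ is treated as a standard deviation; coefficient values are supplied by the caller rather than taken from the app.

```python
import random
import statistics


def simulate_outcome(x, a, beta, psi, rng=None):
    """Y = b0 + b1*X + b2*X^2 + A*(psi0 + psi1*X) + eps, with eps ~ N(0, 1)."""
    rng = rng or random.Random()
    b0, b1, b2 = beta
    p0, p1 = psi
    return [b0 + b1 * xi + b2 * xi ** 2 + ai * (p0 + p1 * xi) + rng.gauss(0.0, 1.0)
            for xi, ai in zip(x, a)]


def add_outcome_error(y, x, a, mu_y=0.0, a_dep=False, sigma_y=1.0, rng=None):
    """Y* = Y + U_Y, with mean and sd of U_Y scaled by the sample sd of Y."""
    rng = rng or random.Random()
    sd_y = statistics.stdev(y)                    # sample standard deviation of Y
    sigma_uy = sigma_y * sd_y
    y_star = []
    for yi, xi, ai in zip(y, x, a):
        mu_uy = (mu_y * xi + (ai if a_dep else 0)) * sd_y
        y_star.append(yi + rng.gauss(mu_uy, sigma_uy))
    return y_star


def optimal_rule(x, psi):
    """A_opt = 1 if psi0 + psi1*X > 0, and 0 otherwise."""
    p0, p1 = psi
    return [int(p0 + p1 * xi > 0) for xi in x]
```

For example, with $\psi = (1, -1)$ the rule recommends treatment whenever $X < 1$.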

By default, the app uses a random seed for each simulation. If desired, a seed can be set under 'advanced options'.

The trim boxplots option can be used to make plots more visually readable. This removes any results more than three standard deviations from the median estimate.
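As a rough illustration of the trimming rule, the sketch below drops estimates lying more than three standard deviations from the median. It assumes the standard deviation is the sample standard deviation of the untrimmed estimates; the function name `trim` is hypothetical.

```python
import statistics


def trim(estimates, k=3.0):
    """Keep estimates within k sample standard deviations of the median."""
    centre = statistics.median(estimates)
    spread = statistics.stdev(estimates)      # computed before trimming (assumption)
    return [e for e in estimates if abs(e - centre) <= k * spread]
```

Trimming of this kind only affects what is displayed; a single extreme estimate would otherwise stretch the plot axis and compress the boxes.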

Analysis

The app uses the method of dynamic weighted ordinary least squares to estimate the model parameters and, by extension, the treatment decision rule.

The app conducts four analyses on each simulated dataset, distinguished by whether the treatment-free and/or treatment models are correctly specified.

Based on the specified inputs, the correct models are:

Treatment-free: $\beta_0 + \beta_1 X^* + \beta_2 (X^*)^2$
Treatment: $P(A^* = 1|X^*) = \left[1 + \exp(-(\alpha_0 + \alpha_1 X^* + \alpha_2 (X^*)^2))\right]^{-1}$

The treatment model is fit via logistic regression. Models are mis-specified through omission of the quadratic terms. Note that analysis always uses error-prone observed values, but these will equal the true values if no measurement error is chosen for that variate.

Because dynamic weighted ordinary least squares is doubly robust, the parameter estimates should be consistent if at least one of the treatment-free or treatment models is correctly specified. However, this is not guaranteed if variates are measured with error.
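The estimation steps can be sketched as follows for a single dataset: fit the treatment model by logistic regression, form the weights $|A - P(A = 1|X)|$, then run a weighted least squares regression of the outcome on the treatment-free terms plus $A$ and $AX$. This is a minimal standard-library sketch under stated assumptions, not the app's implementation: the helper names are hypothetical, the logistic fit uses a fixed number of Newton (IRLS) steps, and the two `quad_*` flags mimic mis-specification by omitting the quadratic terms.

```python
import math


def solve(m, b):
    """Solve the linear system m z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [bi] for row, bi in zip(m, b)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    z = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][k] * z[k] for k in range(r + 1, n))
        z[r] = (aug[r][n] - tail) / aug[r][r]
    return z


def wls(rows, y, w):
    """Weighted least squares via the normal equations (X'WX) coef = X'Wy."""
    p = len(rows[0])
    xtwx = [[sum(wi * r[j] * r[k] for r, wi in zip(rows, w)) for k in range(p)]
            for j in range(p)]
    xtwy = [sum(wi * r[j] * yi for r, yi, wi in zip(rows, y, w)) for j in range(p)]
    return solve(xtwx, xtwy)


def expit(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def dot(coef, row):
    return sum(c * v for c, v in zip(coef, row))


def fit_logistic(rows, a, iters=25):
    """Logistic regression by iteratively reweighted least squares."""
    coef = [0.0] * len(rows[0])
    for _ in range(iters):
        eta = [dot(coef, r) for r in rows]
        p = [expit(e) for e in eta]
        w = [max(pi * (1.0 - pi), 1e-10) for pi in p]              # working weights
        z = [e + (ai - pi) / wi for e, ai, pi, wi in zip(eta, a, p, w)]  # working response
        coef = wls(rows, z, w)
    return coef


def dwols(x, a, y, quad_treatment_free=True, quad_treatment=True):
    """Return (psi0_hat, psi1_hat) for the term A * (psi0 + psi1 * X)."""
    trt_rows = [[1.0, xi, xi ** 2] if quad_treatment else [1.0, xi] for xi in x]
    alpha_hat = fit_logistic(trt_rows, a)
    pi_hat = [expit(dot(alpha_hat, r)) for r in trt_rows]
    weights = [abs(ai - pi) for ai, pi in zip(a, pi_hat)]   # |A - P(A = 1 | X)|
    out_rows = []
    for xi, ai in zip(x, a):
        tf = [1.0, xi, xi ** 2] if quad_treatment_free else [1.0, xi]
        out_rows.append(tf + [ai, ai * xi])
    coef = wls(out_rows, y, weights)
    return coef[-2], coef[-1]
```

Calling `dwols` with each combination of the two flags reproduces the four modelling setups. In practice the inputs would be the observed (possibly error-prone) values, in line with the note above.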

Outputs

The app summarizes outputs under four tabs: Summary, Plots, Table, and Weights.

The Summary tab provides a simplified overview of the simulation results, and is a good place to start exploring the impact of measurement error in these analyses. This tab compares the results when there is no error with the results when variates experience measurement error. After specifying the error structure (which is summarized in the tab), hitting the 'Simulate' button will return a 'treatment accuracy' statistic for both the error-free and error-prone analyses. This statistic is the percentage of individuals in the sample who would have received the correct treatment if the results of the analysis were applied. Note that when $X$ is error-prone, the treatment accuracy is based on the estimated rules being applied to $X$, not $X^*$. This can be changed under the advanced options when $X$ is measured with error.
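The treatment accuracy statistic can be sketched as below. The function name and signature are hypothetical: `psi_hat` holds the estimated $(\psi_0, \psi_1)$, the correct treatment is always defined on the true $X$, and `x_applied` lets the estimated rule be applied to $X^*$ instead of $X$.

```python
def treatment_accuracy(x_true, psi_true, psi_hat, x_applied=None):
    """Percentage of individuals whose recommended treatment equals the optimal one."""
    x_applied = x_true if x_applied is None else x_applied
    correct = 0
    for xt, xa in zip(x_true, x_applied):
        optimal = psi_true[0] + psi_true[1] * xt > 0       # rule from the true parameters
        recommended = psi_hat[0] + psi_hat[1] * xa > 0     # rule from the estimates
        correct += optimal == recommended
    return 100.0 * correct / len(x_true)
```

For instance, with `x_true = [-1, 0, 1, 2]`, true parameters `(1, -1)` and estimates `(-0.5, -1)`, only the individual with $X = 0$ is assigned the wrong treatment, giving an accuracy of 75%.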

The Table tab introduces a second statistic, the 'treatment threshold' $-\psi_0/\psi_1$. This table calculates the true value of this threshold and summarizes the median estimate of this threshold across all simulated datasets for the four modelling setups. The median treatment accuracy is also reported.
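The threshold follows from the decision rule: $\psi_0 + \psi_1 X > 0$ is equivalent to $X > -\psi_0/\psi_1$ when $\psi_1 > 0$, and to $X < -\psi_0/\psi_1$ when $\psi_1 < 0$. A small sketch of the threshold and its median across simulated datasets is given below; the function names are hypothetical.

```python
import statistics


def treatment_threshold(psi0, psi1):
    """Value of X at which the recommended treatment switches: -psi0 / psi1."""
    return -psi0 / psi1


def median_threshold(psi_estimates):
    """Median estimated threshold over a list of (psi0_hat, psi1_hat) pairs."""
    return statistics.median(treatment_threshold(p0, p1) for p0, p1 in psi_estimates)
```

The median is a natural summary here because the threshold is a ratio, so an estimate of $\psi_1$ close to zero in one dataset can produce an extreme value.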

The Plots tab summarizes the threshold and treatment accuracy estimates from all simulated datasets using boxplots.

The Weights tab plots $Y$ against $X$ and $X^*$ along with the weighting function. Weights are colour-coded by true treatment. The relationship between error-free and error-prone weights can be compared through the plots of $|A - P(A = 1|X)|$ vs. $|A - P(A = 1|X^*)|$ below.