The function `knockoff.filter` is a wrapper around several simpler functions that

- Construct knockoff variables (various functions with prefix `create`)
- Compute the test statistic \(W\) (various functions with prefix `stat`)
- Compute the threshold for variable selection (`knockoff.threshold`)

These functions may be called directly if desired. The purpose of this vignette is to illustrate the flexibility of this package with some examples.

```
set.seed(1234)
library(knockoff)
```

Let us begin by creating some synthetic data. For simplicity, we will use synthetic data constructed from a generalized linear model such that the response only depends on a small fraction of the variables.

```
# Problem parameters
n = 1000          # number of observations
p = 1000          # number of variables
k = 60            # number of variables with nonzero coefficients
amplitude = 7.5   # signal amplitude (for noise level = 1)

# Generate the variables from a multivariate normal distribution
mu = rep(0,p)
rho = 0.10
Sigma = toeplitz(rho^(0:(p-1)))
X = matrix(rnorm(n*p),n) %*% chol(Sigma)

# Generate the response from a logistic model and encode it as a factor.
nonzero = sample(p, k)
beta = amplitude * (1:p %in% nonzero) / sqrt(n)
invlogit = function(x) exp(x) / (1+exp(x))
y.sample = function(x) rbinom(n, prob=invlogit(x %*% beta), size=1)
y = factor(y.sample(X), levels=c(0,1), labels=c("A","B"))
```
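The line that generates `X` relies on a standard fact: if `Z` has independent standard normal entries and `R = chol(Sigma)` is the upper-triangular factor with `t(R) %*% R` equal to `Sigma`, then the rows of `Z %*% R` have covariance `Sigma`. The following is a minimal check on a small example; the names `p_small`, `n_small`, `Sigma_small`, `R` and `X_small` are introduced here for illustration only and are not part of the package.

```r
# Illustrative check with a small dimension; p_small and n_small are
# hypothetical values chosen only to keep the example fast.
set.seed(1)
p_small = 5
n_small = 20000
rho_small = 0.10
Sigma_small = toeplitz(rho_small^(0:(p_small-1)))

# chol() returns the upper-triangular factor R with t(R) %*% R == Sigma
R = chol(Sigma_small)
factor_err = max(abs(t(R) %*% R - Sigma_small))   # numerically zero

# Rows of Z %*% R are draws from N(0, Sigma)
X_small = matrix(rnorm(n_small*p_small), n_small) %*% R
sample_err = max(abs(cov(X_small) - Sigma_small)) # small sampling error
```

With enough observations, the empirical covariance of `X_small` should be close to `Sigma_small`, up to sampling noise.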

Instead of using `knockoff.filter` directly, we can run the filter manually by calling its main components one by one.

The first step is to generate the knockoff variables for the true Gaussian distribution of the variables.

`X_k = create.gaussian(X, mu, Sigma)`
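To give a sense of what sampling Gaussian knockoffs involves, here is a rough base-R sketch on a small problem. It assumes the simple equicorrelated choice \(s_j = \min(1, 2\lambda_{\min}(\Sigma))\) for a correlation matrix \(\Sigma\), which is not necessarily the choice made by `create.gaussian` (the simulation later in this vignette uses `create.solve_asdp` instead). All names ending in `_small`, as well as `D`, `cond_mean` and `cond_cov`, are hypothetical and used only for illustration.

```r
# Sketch of model-X Gaussian knockoff sampling in base R.
# Assumptions: small p, Sigma is a correlation matrix, and the simple
# equicorrelated choice of s is used (not the SDP-based choice).
set.seed(1)
p_small = 5
n_small = 100
mu_small = rep(0, p_small)
Sigma_small = toeplitz(0.10^(0:(p_small-1)))
X_small = matrix(rnorm(n_small*p_small), n_small) %*% chol(Sigma_small)

# Equicorrelated s: s_j = min(1, 2 * lambda_min(Sigma)) for every j
lambda_min = min(eigen(Sigma_small, symmetric=TRUE)$values)
D = diag(rep(min(1, 2*lambda_min), p_small))

# Conditional distribution of the knockoffs given X:
#   mean = X - (X - mu) Sigma^{-1} D,   covariance = 2D - D Sigma^{-1} D
Sigma_inv_D = solve(Sigma_small, D)
X_centered = sweep(X_small, 2, mu_small)
cond_mean = X_small - X_centered %*% Sigma_inv_D
cond_cov = 2*D - D %*% Sigma_inv_D

# Draw one knockoff copy per row (requires cond_cov to be positive definite)
Z = matrix(rnorm(n_small*p_small), n_small)
X_k_small = cond_mean + Z %*% chol(cond_cov)
dim(X_k_small)
```

The construction is valid when the joint covariance of the variables and their knockoffs, with blocks `Sigma_small` on the diagonal and `Sigma_small - D` off the diagonal, is positive semidefinite.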

Then, we compute the knockoff statistics using 10-fold cross-validated lasso

`W = stat.glmnet_coefdiff(X, X_k, y, nfolds=10, family="binomial")`
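A coefficient-difference statistic compares, for each variable, the magnitude of its fitted coefficient with that of its knockoff: \(W_j = |Z_j| - |\tilde{Z}_j|\). The sketch below shows only this last step on an invented coefficient vector; it does not fit a lasso model, and the names `Z` and `W_small` and the numbers are hypothetical.

```r
# Hypothetical fitted coefficients for p = 4 variables followed by their
# 4 knockoffs (values invented purely for illustration).
p_small = 4
Z = c(1.2, 0.0, -0.8, 0.1,    # original variables
      0.1, 0.3,  0.0, 0.1)    # knockoff copies

# W_j = |Z_j| - |Z_{j+p}|: large positive values favor the original variable
orig = 1:p_small
W_small = abs(Z[orig]) - abs(Z[orig + p_small])
print(W_small)
```

A large positive \(W_j\) suggests that variable \(j\) carries signal, while a negative or zero value means its knockoff was at least as useful to the model.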

Now we can compute the rejection threshold

`thres = knockoff.threshold(W, fdr=0.2, offset=1)`
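The threshold rule can be sketched in a few lines of base R: take the smallest \(t\) among the nonzero \(|W_j|\) for which \((\text{offset} + \#\{j : W_j \le -t\}) / \max(1, \#\{j : W_j \ge t\})\) is at most the target FDR. The helper `knockoff_threshold_sketch` and the vector `W_toy` below are hypothetical, written for illustration only; this is not the package's source code.

```r
# Sketch of the knockoff threshold rule (illustrative helper, not the
# package's source code): the smallest t among the nonzero |W_j| such that
#   (offset + #{j : W_j <= -t}) / max(1, #{j : W_j >= t}) <= fdr
knockoff_threshold_sketch = function(W, fdr=0.10, offset=1) {
  candidates = sort(unique(abs(W[W != 0])))
  ratio = sapply(candidates, function(t) {
    (offset + sum(W <= -t)) / max(1, sum(W >= t))
  })
  ok = which(ratio <= fdr)
  # If no candidate meets the bound, return Inf so nothing is selected
  if (length(ok) > 0) candidates[ok[1]] else Inf
}

# Toy statistics, invented for illustration
W_toy = c(5, 4, 3.5, 3, 2.5, 2, -1, 1.5, -0.5, 0.2)
thres_toy = knockoff_threshold_sketch(W_toy, fdr=0.2, offset=1)
thres_toy
```

When no candidate satisfies the bound, the sketch returns an infinite threshold and nothing is selected, which is one way an empty selection can arise.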

The final step is to select the variables

```
selected = which(W >= thres)
print(selected)
```

`## integer(0)`

The false discovery proportion is

```
fdp = function(selected) sum(beta[selected] == 0) / max(1, length(selected))
fdp(selected)
```

`## [1] 0`

We show how to manually run the knockoff filter multiple times and compute average quantities. This is particularly useful to estimate the FDR (or the power) for a particular configuration of the knockoff filter on artificial problems.

```
# Optimize the parameters needed for generating Gaussian knockoffs,
# by solving an SDP to minimize correlations with the original variables.
# This calculation requires only the model parameters mu and Sigma,
# not the observed variables X. Therefore, there is no reason to perform it
# more than once for our simulation.
diag_s = create.solve_asdp(Sigma)

# Compute the fdp over 20 iterations
nIterations = 20
fdp_list = sapply(1:nIterations, function(it) {
  # Run the knockoff filter manually, using the pre-computed value of diag_s
  X_k = create.gaussian(X, mu, Sigma, diag_s=diag_s)
  W = stat.glmnet_lambdasmax(X, X_k, y, family="binomial")
  t = knockoff.threshold(W, fdr=0.2, offset=1)
  selected = which(W >= t)
  # Compute and store the fdp
  fdp(selected)
})
# Estimate the FDR
mean(fdp_list)
```

`## [1] 0.09537065`
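Power can be tracked in the same way as the false discovery proportion. The following sketch computes both quantities on an invented example; `beta_toy`, `selected_toy`, `fdp_toy` and `power_toy` are hypothetical names used only for illustration, and power is taken here to mean the fraction of truly nonzero coefficients that were selected.

```r
# Toy setup, invented for illustration: 3 true signals among 10 variables
beta_toy = c(2, 0, 0, 1.5, 0, 0, 0, -1, 0, 0)
selected_toy = c(1, 4, 6)

# False discovery proportion, as defined earlier in this vignette
fdp_toy = sum(beta_toy[selected_toy] == 0) / max(1, length(selected_toy))

# Power: fraction of the truly nonzero coefficients that were selected
power_toy = sum(beta_toy[selected_toy] != 0) / max(1, sum(beta_toy != 0))

c(fdp=fdp_toy, power=power_toy)
```

Averaging such a power value over the iterations of a simulation, in the same way as the FDP, would give an estimate of the power for that configuration.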

For basic usage of the knockoff filter, see the introductory vignette. For knockoffs with Fixed-X variables, see the Fixed-X vignette.