We present a package for estimation of cis-eQTL effect sizes, using a new model called ACME which respects biological understanding of cis-eQTL action. The model, fully presented and validated in (Palowitch et al. 2017), involves an additive effect of allele count and multiplicative component random noise (hence “ACME”: Additive-Contribution, Multiplicative-Error), and is defined as
\[y_i = \log(\beta_0 + \beta_1 s_i) + Z_i^T \gamma + \epsilon_i\]
where
We estimate the model using a fast iterative algorithm.
The algorithm estimates the model which is nonlinear only with respect to \(\eta = \beta_1 / \beta_0\)
\[y_i = \log(1 + s_i \eta) + \log(\beta_0) + Z_i^T \gamma + \epsilon_i\]
ACMEeqtl can be installed with the following command.
install.packages("ACMEeqtl")ACMEeqtl package provides functions for analysis of a single gene-SNP pair as well as fast parallel testing of all local gene-SNP pairs.
library(ACMEeqtl)First we generate sample gene expression, SNP allele counts, and a set of covariates.
# Model parameters
beta0 = 10000
beta1 = 50000
# Data dimensions
nsample = 1000
ncvrt = 19
### Data generation
### Zero average covariates
cvrt = matrix(rnorm(nsample * ncvrt), nsample, ncvrt)
cvrt = t(t(cvrt) - colMeans(cvrt))
# Generate SNPs
s = rbinom(n = nsample, size = 2, prob = 0.2)
# Generate log-normalized expression
y = log(beta0 + beta1 * s) + 
    cvrt %*% rnorm(ncvrt) + 
    rnorm(nsample)We provide two equivalent functions for model estimation.
effectSizeEstimationR – fully coded in ReffectSizeEstimationC – faster version with core coded in C.z1 = effectSizeEstimationR(s, y, cvrt)
z2 = effectSizeEstimationC(s, y, cvrt)
pander(rbind(z1,z2))| beta0 | beta1 | nits | SSE | SST | Ftest | eta | SE_eta | |
|---|---|---|---|---|---|---|---|---|
| z1 | 9956 | 47308 | 6 | 963 | 1760 | 810 | 4.75 | 0.366 | 
| z2 | 9956 | 47308 | 6 | 963 | 1760 | 810 | 4.75 | 0.366 | 
Variables z1, z2 show that the estimation was done in 6 iterations, with estimated parameters
First we generate a eQTL dataset in filematrix format (see filematrix package).
tempdirectory = tempdir()
z = create_artificial_data(
    nsample = 100,
    ngene = 100,
    nsnp = 501,
    ncvrt = 1,
    minMAF = 0.2,
    saveDir = tempdirectory,
    returnData = FALSE,
    savefmat = TRUE,
    savetxt = FALSE,
    verbose = FALSE)## Loading required namespace: RSQLiteIn this example, we use 2 CPU cores (threads) for testing of all gene-SNP pairs within 100,000 bp.
multithreadACME(
    genefm = "gene",
    snpsfm = "snps",
    glocfm = "gene_loc",
    slocfm = "snps_loc",
    cvrtfm = "cvrt",
    acmefm = "ACME",
    cisdist = 1.5e+06,
    threads = 2,
    workdir = file.path(tempdirectory, "filematrices"),
    verbose = FALSE)Now the filematrix ACME holds estimations for all local gene-SNP pairs.
fm = fm.open(file.path(tempdirectory, "filematrices", "ACME"))
TenResults = fm[,1:10]
rownames(TenResults) = rownames(fm)
close(fm)
pander(t(TenResults))| geneid | snp_id | beta0 | beta1 | nits | SSE | SST | F | eta | SE | 
|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 125 | -43.5 | 6 | 88.5 | 104 | 16.8 | -0.347 | 0.0453 | 
| 1 | 2 | 84.1 | 9.72 | 6 | 103 | 104 | 0.508 | 0.115 | 0.176 | 
| 1 | 3 | 89.4 | 2.26 | 5 | 104 | 104 | 0.0264 | 0.0253 | 0.159 | 
| 1 | 4 | 78.7 | 20.6 | 5 | 102 | 104 | 2.15 | 0.262 | 0.211 | 
| 2 | 4 | 57.1 | -8.94 | 4 | 154 | 156 | 0.957 | -0.157 | 0.135 | 
| 2 | 5 | 52.9 | -3.1 | 4 | 156 | 156 | 0.115 | -0.0586 | 0.163 | 
| 2 | 6 | 81.9 | -36.4 | 10 | 112 | 156 | 37.5 | -0.444 | 0.0207 | 
| 2 | 7 | 46.8 | 8.59 | 4 | 155 | 156 | 0.722 | 0.184 | 0.244 | 
| 2 | 8 | 49.2 | 2.82 | 3 | 156 | 156 | 0.091 | 0.0573 | 0.199 | 
| 2 | 9 | 54.2 | -4.28 | 5 | 155 | 156 | 0.235 | -0.079 | 0.151 | 
Now we can estimate multi-SNP ACME models for each gene:
multisnpACME(
    genefm = "gene",
    snpsfm = "snps",
    glocfm = "gene_loc",
    slocfm = "snps_loc",
    cvrtfm = "cvrt",
    acmefm = "ACME",
    workdir = file.path(tempdirectory, "filematrices"),
    genecap = Inf,
    verbose = FALSE)Now the filematrix ACME_multiSNP holds estimations for all multi-SNP models.
fm = fm.open(file.path(tempdirectory, "filematrices", "ACME_multiSNP"))
TenResults = fm[,1:10]
rownames(TenResults) = rownames(fm)
close(fm)
pander(t(TenResults))| geneid | snp_id | beta0 | betas | forward_adjR2 | 
|---|---|---|---|---|
| 1 | 1 | 111 | -45.7 | 0.139 | 
| 1 | 4 | 111 | 14.5 | 0.149 | 
| 1 | 2 | 111 | 14 | 0.151 | 
| 2 | 6 | 91.4 | -36.7 | 0.272 | 
| 2 | 8 | 91.4 | 5.14 | 0.284 | 
| 2 | 9 | 91.4 | -8.36 | 0.285 | 
| 2 | 4 | 91.4 | -7.75 | 0.291 | 
| 3 | 11 | 121 | 162 | 0.218 | 
| 3 | 12 | 121 | -24.3 | 0.22 | 
| 4 | 17 | 57 | 41.4 | 0.0669 | 
Note that each multi-SNP model will contain at least one SNP, even if that initial SNP was not significant under the single-SNP models. This initial SNP will be the one with the highest adjusted-R\(^2\) value among the single-SNP models. However, after the initial SNP, further SNPs are added only if the combined model’s adjusted-R\(^2\) is greater than that from the previous combined model.
Palowitch, John, Andrey Shabalin, Yi-Hui Zhou, Andrew B Nobel, and Fred A Wright. 2017. “Estimation of Cis-eQTL Effect Sizes Using a Log of Linear Model.” Biometrics. Wiley Online Library.