Download


1. SNVs associated with allele-specific binding (ASB)
ASB v2.1 - added columns
ASB v2.0 - updated results using scripts v2.0 (May 2016)
ASB v1.0 - results from Supplementary Data 1 in the publication
- This tarball contains 2 files:
A) README
B) a tab-delimited file containing all the ASB SNVs (autosomal) annotated by the 40 TFs and individual samples used in our study.
- For each SNV, the raw counts, TF and sample annotation are provided.
- There is redundancy in the data due to SNVs occurring in multiple samples and TF binding sites.
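Because the same SNV can appear on several rows (once per sample and TF binding site), it is often useful to group rows by position before counting. The following is a minimal sketch, assuming hypothetical column names `chr`, `start`, `end`, `TF` and `sample`; the real header names are given in the README inside the tarball, and the rows below are made up.

```python
import csv
import io
from collections import defaultdict


def collapse_snvs(handle, key_cols=("chr", "start", "end")):
    """Group redundant rows of a tab-delimited file by SNV position.

    key_cols are hypothetical column names; replace them with the names
    listed in the README shipped with the tarball.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    grouped = defaultdict(list)
    for row in reader:
        grouped[tuple(row[c] for c in key_cols)].append(row)
    return dict(grouped)


# Made-up rows for illustration only (not taken from the real file).
example = (
    "chr\tstart\tend\tTF\tsample\n"
    "chr1\t100\t101\tTF_A\tsample1\n"
    "chr1\t100\t101\tTF_A\tsample2\n"
    "chr2\t500\t501\tTF_B\tsample1\n"
)
unique = collapse_snvs(io.StringIO(example))
print(len(unique), "distinct SNV positions")
```

For a real file, pass an open file handle instead of the `io.StringIO` object.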

2. SNVs associated with allele-specific expression (ASE)
ASE v2.1 - added columns
ASE v2.0 - updated results using scripts v2.0 (May 2016)
ASE v1.0 - results from Supplementary Data 2 in the publication
- This tarball contains 2 files:
A) README
B) a tab-delimited file containing all the ASE SNVs (autosomal) annotated by autosomal genes and samples.
- For each SNV, the raw counts, gene and sample annotation are provided. If a SNV is not in a gene, it is annotated as 'NA'.
- There is redundancy in the data due to SNVs occurring in multiple samples and genes.

3. SNVs that are accessible to allele-specific binding
accB v2.1 - added columns
accB v2.0 - updated results using scripts v2.0 (May 2016)
accB v1.0 - results used in publication
- This tarball contains 2 files:
A) README
B) a tab-delimited file containing all the SNVs that are accessible to allele-specific binding (autosomal) annotated by the 40 TFs and individual samples used in our study.
- For each SNV, the raw counts, TF and sample annotation are provided.
- Includes ASB SNVs.
- There is redundancy in the data due to SNVs occurring in multiple samples and TF binding sites.

4. SNVs that are accessible to allele-specific expression
accE v2.1 - added columns
accE v2.0 - updated results using scripts v2.0 (May 2016)
accE v1.0 - results used in publication
- This tarball contains 2 files:
A) README
B) a tab-delimited file containing all the SNVs that are accessible to allele-specific expression (autosomal) annotated by autosomal genes and samples.
- For each SNV, the raw counts, gene and sample annotation are provided. If a SNV is not in a gene, it is annotated as 'NA'.
- Includes ASE SNVs.
- There is redundancy in the data due to SNVs occurring in multiple samples and genes.

5. Full sample list
- This tab-delimited file contains information about the individuals used in the study.

6. 382 personal diploid genomes
- This links to 382 tarballs containing personal diploid genomes (autosomal FASTA files) and chain files (for mapping to the human reference genome, hg19, and back to the parental genomes). The personal genomes are based on the SNVs and indels from Phase 1 1000 Genomes Project (1000GP) and were built using vcf2diploid v0.2.6.
See README for details.

7. Uniform peaks recalled on personal genomes
- This is a tarball containing all the peaks recalled using PeakSeq on the diploid personal genomes with ChIP-seq data. The two haplotypes are annotated as "paternal" and "maternal".

8. Supplementary files 3-9 (all)
supp_file3_20kgenes.xlsx
This Excel file (Supplementary Data 3) contains results from our 'collapsed' and 'expanded' enrichment analyses for the 19,257 autosomal protein-coding genes (HGNC symbols) from GENCODE, including the Fisher's exact test odds ratios, p-values (original and Bonferroni-corrected), and the numbers of allele-specific SNVs and accessible non-allele-specific SNVs found in the gene region and the promoter region (2,500 bp upstream). The results for housekeeping genes and 4 monoallelically-expressed gene categories are also included. 'NA' is marked in categories where the odds ratio cannot be calculated due to insufficient numbers of non-allele-specific SNVs. These are tabulated for ASB, ASE and allele-specific (AS) SNVs; the latter is the combined number of ASB and ASE SNVs. Based on results in AS, we define enhancer regions that are "allele-specific" (Bonferroni p-value <= 0.05, odds ratio >= 1.5), "balanced" (Bonferroni p-value <= 0.05, odds ratio < 1.5) and otherwise, "indeterminate".

supp_file4_40TFs.xlsx
This Excel file (Supplementary Data 4) contains the ASB 'collapsed' and 'expanded' enrichment analyses in promoter regions for the 40 TFs used in our database, including the Fisher's exact test odds ratios, p-values (original and Bonferroni-corrected), and the numbers of ASB SNVs and accessible non-allele-specific SNVs, both found and not found in the gene region. ASB SNVs for each TF are contributed by different individuals. If either of the parents in the CEU trio is involved, ASB SNVs for NA12878 are not included. Those TFs with only ASB SNVs from NA12878 are annotated '1' under the column 'NA12878 only'. 'NA' is marked in categories where the odds ratio cannot be calculated due to insufficient numbers in any of the last three columns.
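
The enrichment columns described above come from a 2x2 table of counts. As a rough illustration only, the sketch below computes a two-sided Fisher's exact test p-value, the simple cross-product odds ratio and a Bonferroni correction in pure Python. The cell layout (allele-specific versus accessible non-allele-specific SNVs, inside versus outside the region), the use of `None` in place of 'NA', and the function names are assumptions; the odds ratio reported in the files may have been estimated differently by the authors' software.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Assumed layout (hypothetical):
        a = allele-specific SNVs inside the region
        b = allele-specific SNVs outside the region
        c = accessible non-allele-specific SNVs inside the region
        d = accessible non-allele-specific SNVs outside the region
    Returns (cross-product odds ratio or None, p-value).
    """
    # None stands in for 'NA' when the odds ratio is undefined.
    odds_ratio = (a * d) / (b * c) if b * c > 0 else None

    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    lo = max(0, col1 - (n - row1))
    hi = min(row1, col1)
    p_obs = prob(a)
    # Sum the probabilities of all tables at least as extreme as observed.
    p_value = sum(p for p in map(prob, range(lo, hi + 1))
                  if p <= p_obs * (1 + 1e-7))
    return odds_ratio, min(1.0, p_value)


def bonferroni(p_value, n_tests):
    """Bonferroni-corrected p-value, capped at 1."""
    return min(1.0, p_value * n_tests)


odds_ratio, p = fisher_exact_2x2(3, 1, 1, 3)
print(odds_ratio, round(p, 4), round(bonferroni(p, 40), 4))
```

This exhaustive sum is only practical for small counts; a statistics library is a better choice for genome-scale tables.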

supp_file5_enhancers.xlsx.gz
This gzip-compressed file contains an Excel file (Supplementary Data 5) that shows results from our 'expanded' enrichment analysis for 882 experimentally-determined VISTA70 enhancers and 410,486 enhancer regions from the union of lists by Ernst and Kellis (2012), Hoffman et al. (2013), and data from distal regulatory modules from Yip et al. (2012). The results include the number of allele-specific SNVs and accessible non-allele-specific SNVs. 'NA' is marked in categories where the odds ratio cannot be calculated due to insufficient numbers of non-allele-specific SNVs. These are tabulated for ASB, ASE and AS SNVs; the latter is the combined number of ASB and ASE SNVs. Based on results in AS, we define enhancer regions that are "allele-specific" (Bonferroni p-value <= 0.05, odds ratio >= 1.5), "balanced" (Bonferroni p-value <= 0.05, odds ratio < 1.5) and otherwise, "indeterminate".
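
The three-way labelling rule above can be written as a small function. This is a minimal sketch using the thresholds stated in the text; the function and parameter names are hypothetical, and treating an 'NA' value (passed here as `None`) as "indeterminate" is an assumption rather than something the file description states.

```python
def classify_region(bonferroni_p, odds_ratio, alpha=0.05, or_cutoff=1.5):
    """Label a region from its Bonferroni-corrected p-value and odds ratio.

    Thresholds follow the description above; None stands in for 'NA' and
    is assumed (not confirmed by the source) to mean "indeterminate".
    """
    if bonferroni_p is None or odds_ratio is None:
        return "indeterminate"
    if bonferroni_p <= alpha:
        return "allele-specific" if odds_ratio >= or_cutoff else "balanced"
    return "indeterminate"


# Made-up values for illustration.
labels = [
    classify_region(0.01, 2.3),
    classify_region(0.01, 0.8),
    classify_region(0.20, 2.3),
    classify_region(0.01, None),
]
print(labels)
```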

supp_file6_708cat.xlsx
This Excel file (Supplementary Data 6) contains results from our 'collapsed' and 'expanded' enrichment analyses for 708 categories from ENCODE, including the Fisher's exact test odds ratios, p-values (original and Bonferroni-corrected), and the numbers of allele-specific SNVs and accessible non-allele-specific SNVs found in each category. The results for five gene element categories from GENCODE and 16 enhancer categories are also included. 'NA' is marked in categories where the odds ratio cannot be calculated due to insufficient numbers of non-allele-specific SNVs. These are tabulated for ASB, ASE and allele-specific SNVs; the latter are the results for the combined ASB and ASE SNVs.

supp_file7_motifASB.xlsx
This Excel file (Supplementary Data 7) contains the ASB SNVs that reside in TF motifs described in Kheradpour and Kellis (2014). Under the column 'motif', the information is delimited by "#" in this order: motif identifier (as defined in Kheradpour and Kellis), start position of motif (0-based), end position of motif (1-based), strand, and position of SNV in motif. The allelic ratio at each SNV position is defined as the ratio of the number of reference reads to the number of alternate reads.
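
A 'motif' entry can be split into its five parts as follows. This is a sketch under assumptions: the example string is invented, the SNV position is assumed to be an integer, and the split is done from the right in case a motif identifier itself contains "#".

```python
from typing import NamedTuple


class MotifHit(NamedTuple):
    motif_id: str
    start: int    # 0-based start of the motif
    end: int      # 1-based end of the motif
    strand: str
    snv_pos: int  # position of the SNV in the motif (assumed integer)


def parse_motif_field(field):
    """Split a '#'-delimited 'motif' entry into its five components."""
    # rsplit keeps any '#' inside the identifier attached to motif_id.
    motif_id, start, end, strand, snv_pos = field.rsplit("#", 4)
    return MotifHit(motif_id, int(start), int(end), strand, int(snv_pos))


# Invented entry, not taken from the real file.
hit = parse_motif_field("EXAMPLE_motif_1#1000#1010#+#4")
# With a 0-based start and a 1-based end, the motif length is end - start.
print(hit.motif_id, hit.end - hit.start, hit.strand, hit.snv_pos)
```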

supp_file8_confidentSet.xlsx
This Excel file (Supplementary Data 8) contains sets of 'more confident' ASB and ASE SNVs. The columns are, respectively: chromosome (chr), start (0-based) and end (1-based) position of the SNV, the TF and individual identifier (TF_ind in ASB or ind in ASE), the number of individuals with this SNV (indCount), and the allele with more reads (dominantAllele). Each of the 194 ASB SNVs is found in at least 3 individuals (indCount >= 3), and the allele that has more reads (dominantAllele) has to be consistent for all TF_ind. Each of the 1,890 more confident ASE SNVs is found in at least 38 individuals (indCount >= 38), and for each SNV the allele that has more reads (dominantAllele) has to be consistent in all the individuals (ind).
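
The two criteria above (a minimum number of individuals and a consistent dominant allele) can be sketched as a filter. This is an illustration only, not the authors' script: the input is assumed to be already parsed into hypothetical (SNV, individual, dominant allele) tuples, and the records are made up.

```python
from collections import defaultdict


def confident_snvs(records, min_individuals):
    """Keep SNVs seen in enough individuals with one consistent dominant allele.

    records: iterable of (snv, individual, dominant_allele) tuples, a
    hypothetical pre-parsed form of the per-sample rows.
    """
    individuals = defaultdict(set)
    alleles = defaultdict(set)
    for snv, individual, allele in records:
        individuals[snv].add(individual)
        alleles[snv].add(allele)
    return sorted(
        snv for snv in individuals
        if len(individuals[snv]) >= min_individuals and len(alleles[snv]) == 1
    )


# Made-up records for illustration.
records = [
    (("chr1", 100), "ind1", "ref"),
    (("chr1", 100), "ind2", "ref"),
    (("chr1", 100), "ind3", "ref"),
    (("chr2", 500), "ind1", "ref"),
    (("chr2", 500), "ind2", "alt"),
    (("chr2", 500), "ind3", "ref"),
    (("chr3", 900), "ind1", "alt"),
]
kept = confident_snvs(records, min_individuals=3)
print(kept)
```

Using `min_individuals=3` mirrors the ASB threshold and `min_individuals=38` the ASE threshold given in the text.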

supp_file9_Rpseudocode.docx
This Word file (Supplementary Data 9) contains the R pseudocode for the bisection method that is used to estimate the overdispersion parameter.