Tables from the Webster et al. manuscript.

GKC CIS Selection. When the analysis is limited to late-stage clonal integrations, Gaussian kernel convolution (GKC; CIMPL/KCRBM) identifies 311 common insertion site (CIS) loci with a p-value < 0.05.
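The Gaussian kernel convolution step can be sketched in Python as follows. This is a minimal illustration, not the CIMPL/KCRBM implementation: it assumes insertions are integer coordinates on a single chromosome, smooths them with one Gaussian kernel of a hypothetical width (`width=10_000`), reports local maxima of the smoothed density as candidate peaks, and substitutes a simple permutation null (uniform random placement of the same number of insertions) for the packages' own significance procedure. All function names and parameter values are illustrative assumptions.

```python
import math
import random


def kernel_density(positions, grid, width):
    """Convolve insertion positions with a Gaussian kernel (sd = width), evaluated on grid."""
    return [sum(math.exp(-0.5 * ((g - x) / width) ** 2) for x in positions) for g in grid]


def find_peaks(density, grid):
    # Local maxima of the smoothed insertion density (first point of a flat top wins)
    return [(grid[i], density[i]) for i in range(1, len(density) - 1)
            if density[i] > density[i - 1] and density[i] >= density[i + 1]]


def cis_peaks(positions, length, width=10_000, step=1_000, n_perm=100, seed=0):
    """Return (position, height, p-value) for each density peak on one chromosome.

    Hypothetical null: the p-value is the fraction of random placements of the same
    number of insertions whose highest density reaches the observed peak height.
    """
    grid = list(range(0, length + 1, step))
    peaks = find_peaks(kernel_density(positions, grid, width), grid)
    rng = random.Random(seed)
    null_max = []
    for _ in range(n_perm):
        shuffled = [rng.uniform(0, length) for _ in positions]
        null_max.append(max(kernel_density(shuffled, grid, width)))
    # Add-one correction keeps permutation p-values strictly above zero
    return [(pos, h, (1 + sum(m >= h for m in null_max)) / (1 + n_perm))
            for pos, h in peaks]
```

Because the kernels are unnormalized, a peak's height is roughly the number of insertions under the kernel, which makes the comparison with the permutation maxima easy to read.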


Genome-wide scan for selection. Genome-wide scan of subclonal mutation selection. A 100 kb window is moved across the genome in 10 kb increments. For each window, the number of insertions in each class (early/late, forward/reverse strand, BCL2 transgenic/wild type, B cell/T cell) is counted, and Fisher's exact test is used to estimate the likelihood of the observed distribution between groups. Comparing neighboring windows identifies local p-value minima (i.e., windows whose p-value is higher on both sides). If two or more minima lie within 100,000 bp of each other, all but the lowest are discarded. To assign gene names to each remaining local minimum, the nearest peak identified by Gaussian kernel convolution (using the CIMPL/KCRBM packages) and the genes associated with that peak are identified. Each locus with a false discovery rate (FDR) < 0.05 is indicated.
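A minimal Python sketch of this scan for a single pair of groups (for example, early vs. late) on one chromosome is shown below. Several details are assumptions not stated in the table description: the 2x2 table compares each group's count inside the window with its count in the rest of the chromosome; minima closer than 100 kb are grouped as consecutive runs; and FDR values come from the Benjamini-Hochberg procedure applied over all windows. Gene assignment via the nearest GKC peak is omitted. All function names are hypothetical, and Fisher's exact test is written with the standard library so the sketch runs without SciPy.

```python
import math
from bisect import bisect_left


def log_comb(n, k):
    # log C(n, k) via lgamma, avoiding huge integers for genome-wide totals
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, all margins fixed
        return math.exp(log_comb(col1, x) + log_comb(n - col1, row1 - x)
                        - log_comb(n, row1))

    p_obs = prob(a)
    # Sum the probabilities of every table no more likely than the observed one
    probs = [prob(x) for x in range(lo, hi + 1)]
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-7)))


def scan_windows(pos_a, pos_b, length, window=100_000, step=10_000):
    """Fisher p-value for group A vs. group B counts in each sliding window."""
    pos_a, pos_b = sorted(pos_a), sorted(pos_b)
    tot_a, tot_b = len(pos_a), len(pos_b)
    results = []
    for start in range(0, max(length - window, 0) + 1, step):
        end = start + window
        # Insertions of each group in the half-open window [start, end)
        a = bisect_left(pos_a, end) - bisect_left(pos_a, start)
        b = bisect_left(pos_b, end) - bisect_left(pos_b, start)
        # Assumed table: inside-window vs. outside-window counts for each group
        results.append((start, fisher_exact_2x2(a, b, tot_a - a, tot_b - b)))
    return results


def local_minima(results):
    # Windows whose p-value is strictly lower than both neighbours
    return [results[i] for i in range(1, len(results) - 1)
            if results[i][1] < results[i - 1][1] and results[i][1] < results[i + 1][1]]


def merge_minima(minima, min_gap=100_000):
    # In each run of minima spaced < min_gap apart, keep only the lowest p-value
    kept, run = [], []
    for m in minima:
        if run and m[0] - run[-1][0] >= min_gap:
            kept.append(min(run, key=lambda r: r[1]))
            run = []
        run.append(m)
    if run:
        kept.append(min(run, key=lambda r: r[1]))
    return kept


def benjamini_hochberg(pvals):
    # Step-up Benjamini-Hochberg adjusted p-values (q-values), in input order
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    q, running = [0.0] * m, 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running = min(running, pvals[i] * m / rank)
        q[i] = running
    return q


def scan_for_selection(pos_a, pos_b, length, fdr=0.05):
    """Return (start, p, q, significant) for each retained local minimum."""
    results = scan_windows(pos_a, pos_b, length)
    qvals = benjamini_hochberg([p for _, p in results])
    q_by_start = {start: q for (start, _), q in zip(results, qvals)}
    loci = merge_minima(local_minima(results))
    return [(s, p, q_by_start[s], q_by_start[s] < fdr) for s, p in loci]
```

Flagging rather than filtering in `scan_for_selection` mirrors the table, where loci below the FDR threshold are indicated rather than removed.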


Strand bias in early- and late-stage samples. Comparison of strand bias between the early- and late-stage cohorts using equal numbers of integrations. The early-stage cohort contains 81,316 integrations in total, so a random subset of 81,316 late-stage integrations was selected, and regions with strand (orientation) bias were identified using 100 kb scanning windows across the entire genome. After correction for multiple testing, 16 late-stage loci were significant (FDR < 0.05), whereas no early-stage loci were significant.
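The equal-size comparison can be sketched as below, assuming each integration is a hypothetical `(position, strand)` tuple with strand `"+"` or `"-"` on a single chromosome. The table description does not say which test was used for strand bias, so this sketch assumes an exact two-sided binomial test against a 50:50 forward/reverse expectation in each window; the multiple-testing correction is left out. Function names and the fixed `seed` are illustrative.

```python
import math
import random
from bisect import bisect_left


def subsample(integrations, n, seed=0):
    """Randomly select n integrations without replacement (seeded for reproducibility)."""
    return random.Random(seed).sample(integrations, n)


def binom_two_sided(k, n):
    """Exact two-sided binomial test of k forward-strand insertions out of n, with p = 0.5."""
    if n == 0:
        return 1.0
    probs = [math.comb(n, i) / 2 ** n for i in range(n + 1)]
    # Sum the probabilities of all outcomes no more likely than the observed one
    return min(1.0, sum(p for p in probs if p <= probs[k] * (1 + 1e-7)))


def strand_bias_scan(integrations, length, window=100_000, step=10_000):
    """Per-window (start, forward, reverse, p-value) for (position, strand) integrations."""
    fwd = sorted(pos for pos, strand in integrations if strand == "+")
    rev = sorted(pos for pos, strand in integrations if strand == "-")
    out = []
    for start in range(0, max(length - window, 0) + 1, step):
        end = start + window
        # Forward- and reverse-strand insertions in the half-open window [start, end)
        f = bisect_left(fwd, end) - bisect_left(fwd, start)
        r = bisect_left(rev, end) - bisect_left(rev, start)
        out.append((start, f, r, binom_two_sided(f, f + r)))
    return out
```

In use, one would draw `subsample(late, len(early))` (81,316 integrations here, with `late` and `early` as hypothetical lists) and pass each cohort to `strand_bias_scan`, so that both cohorts are tested with the same statistical power.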


Candidate genes. Combined list of all candidate genes implicated by one or more criteria in the genome-wide scans of subclonal mutation selection.


Tab-delimited text files.