Unbiased identification of signal-activated transcription factors by barcoded synthetic tandem repeat promoter screening (BC-STAR-PROM)

Gosselin et al. designed a widely applicable method, dubbed BC-STAR-PROM, to identify signal-activated TFs without any prior knowledge of their properties. To establish proof of concept for BC-STAR-PROM, they applied it to the identification of TFs induced by drugs affecting actin and tubulin cytoskeleton dynamics.

Unbiased identification of signal-induced transcription factors by Barcoded Synthetic TAndem Repeat Promoter screening (BC-STAR-PROM)
Pauline Gosselin, Gianpaolo Rando, Fabienne Fleury-Olela, Ueli Schibler
Supplemental Figure S1: Estimation of the probability to find TF binding sites in random DNA ligated to both ends of the NheI-XbaI restriction fragments of 2.3 kb encompassing all six promoter repeats, the luciferase reporters, and the barcodes. These fragments were sequenced by the PacBio SMRT method.
Strategy 2: Linear promoter-luciferase-barcode restriction fragments (NheI, XbaI) were circularized by intermolecular ligation, and the barcode-promoter junctions were amplified by PCR. To associate each barcode with the corresponding promoter, a 150 bp region spanning the barcode and the first promoter repeat was sequenced on an Illumina MiSeq lane (5M reads).
(B) Read count distribution for the 3363 barcodes whose promoter associations were confirmed by both strategies. We counted 3,237 barcodes that were associated with 2,894 promoters at a coverage of at least 50 reads.
(C) BC-STAR-PROM quality control: For each promoter associated with 2 different barcodes (Supplemental Table S2), we plotted the expressed reads of one barcode against the expressed reads of the other barcode. The RNA reads have been divided by the DNA reads measured in the plasmid transfection mix. The scatterplots suggest high reproducibility (Pearson > 0.9).
(E) 41-3t3 cells transfected or not with siRNA against MRTF-A+B were seeded at near confluence in DMEM + 20% FBS, and luminescence was recorded in a lumicycler in order to determine the timespan of serum induction (SRF responds to serum stimulation).
(F) The fold changes of luminescence counts for each of the 100 traces shown in Figure 6E were calculated by dividing each count by the median of the entire trace recorded for an entire cell division cycle (cytokinesis to cytokinesis).
Table S2: Number of promoters associated with different barcodes and vice versa, in the 2631 promoters identified in the MiSeq data (displaying more than 100 reads).
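The two normalizations described in panels (C) and (F) can be illustrated with a minimal Python sketch. All function names and the toy numbers are hypothetical and chosen only for illustration. The sketch assumes per-barcode RNA and DNA read counts and a single luminescence trace given as plain lists. It is not the authors' analysis code.

```python
from math import sqrt
from statistics import median


def rna_over_dna(rna_reads, dna_reads):
    """Divide expressed (RNA) reads by the DNA reads of the same barcode."""
    return [r / d for r, d in zip(rna_reads, dna_reads)]


def pearson(x, y):
    """Pearson correlation coefficient between two equally long lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def fold_change_vs_median(trace):
    """Divide every count of a trace by the median of the whole trace."""
    m = median(trace)
    return [count / m for count in trace]


# Toy example: two barcodes attached to the same three promoters.
barcode_1 = rna_over_dna([10, 40, 90], [10, 20, 30])
barcode_2 = rna_over_dna([20, 80, 180], [10, 20, 30])
r = pearson(barcode_1, barcode_2)
```

In this toy case the second barcode is an exact multiple of the first, so the correlation is perfect. Real barcode pairs would scatter around the diagonal.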

Motif enrichment analysis
In our adaptation of the Gene Set Enrichment Analysis (GSEA) framework developed by Subramanian and colleagues (Subramanian et al. 2005), the "genes" are replaced by promoter sequences, and the "sets" are formed by grouping all promoters that contain a given motif. We built 2110 sets, using FIMO (Grant et al. 2011) to identify 105,791 binding motif occurrences (p < 10^-4), which covered 93% of the 2110 motifs described in the Transfac 2012 database. We then determined which motifs were primarily enriched at the top or the bottom of the drug response ranking by performing an enrichment analysis essentially as described (Subramanian et al. 2005): i) we ranked all promoters by fold change in a list that measures each promoter's correlation with a given drug treatment, ii) we identified the rank positions of all promoters in each promoter set, and iii) for each set, we calculated an enrichment score that reflects the degree to which the set is overrepresented at the extremes of the ranked list. Supplemental Fig. 8A shows, as an example, the distribution of one over-represented motif (SRF) and one under-represented motif (CLOCK-BMAL1) after Jasplakinolide treatment.
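Steps i) to iii) can be sketched as a weighted running-sum statistic in the style of Subramanian et al. (2005). The function name, the weighting exponent of 1, and the toy data are assumptions made for illustration. The exact scoring parameters used in the study are not stated here.

```python
def enrichment_score(ranked_scores, in_set, weight=1.0):
    """Signed maximum deviation of a GSEA-style running sum.

    ranked_scores: promoter scores (e.g. log fold changes), sorted descending.
    in_set: booleans, True where the promoter at that rank carries the motif.
    """
    n = len(ranked_scores)
    n_hit = sum(in_set)
    if n_hit == 0 or n_hit == n:
        raise ValueError("set must contain some, but not all, promoters")

    # Hits step up in proportion to |score|; misses step down uniformly.
    hit_w = [abs(s) ** weight if h else 0.0
             for s, h in zip(ranked_scores, in_set)]
    total = sum(hit_w)
    if total == 0:  # all member scores are zero: fall back to equal steps
        hit_w = [1.0 if h else 0.0 for h in in_set]
        total = float(n_hit)
    miss_step = 1.0 / (n - n_hit)

    running, best = 0.0, 0.0
    for w, h in zip(hit_w, in_set):
        running += w / total if h else -miss_step
        if abs(running) > abs(best):
            best = running
    return best


# Toy ranked list: six promoters ordered from most induced to most repressed.
scores = [3.0, 2.0, 1.0, -1.0, -2.0, -3.0]
es_top = enrichment_score(scores, [True, True, False, False, False, False])
es_bottom = enrichment_score(scores, [False, False, False, False, True, True])
```

A set concentrated at the top of the list yields a positive score, and a set concentrated at the bottom yields a negative one, which corresponds to the over- and under-represented cases described above.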
To establish significance and exclude promoter sets that could be enriched by chance, we randomized the drug/DMSO sample labels and retested for enrichment 1,000 times. Finally, we adjusted for multiple hypothesis testing using the standard GSEA false discovery rate calculation. Promoter sets whose enrichment significantly exceeded that obtained with the random class permutations were considered significant.
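The label-permutation step can be sketched as follows. This is a simplified illustration under stated assumptions: promoter activities are taken to be log-scale values per sample, the ranking metric is the difference of group means, and the p-value is a two-sided empirical estimate. The standard GSEA normalization and false discovery rate calculation are not reproduced. All names and the toy data are hypothetical.

```python
import random


def enrichment_score(ranked_scores, in_set):
    """Compact GSEA-style running sum (weight exponent assumed to be 1)."""
    n, n_hit = len(ranked_scores), sum(in_set)
    hit_w = [abs(s) if h else 0.0 for s, h in zip(ranked_scores, in_set)]
    total = sum(hit_w)
    if total == 0:  # all member scores are zero: fall back to equal steps
        hit_w, total = [1.0 if h else 0.0 for h in in_set], float(n_hit)
    running, best = 0.0, 0.0
    for w, h in zip(hit_w, in_set):
        running += w / total if h else -1.0 / (n - n_hit)
        if abs(running) > abs(best):
            best = running
    return best


def score_with_labels(values, labels, members):
    """Rank promoters by mean(drug) - mean(dmso) and score one motif set."""
    diffs = {}
    for promoter, vals in values.items():
        drug = [v for v, lab in zip(vals, labels) if lab == "drug"]
        ctrl = [v for v, lab in zip(vals, labels) if lab == "dmso"]
        diffs[promoter] = sum(drug) / len(drug) - sum(ctrl) / len(ctrl)
    order = sorted(diffs, key=diffs.get, reverse=True)
    return enrichment_score([diffs[p] for p in order],
                            [p in members for p in order])


def permutation_p_value(values, labels, members, n_perm=1000, seed=0):
    """Empirical p-value from shuffling the drug/DMSO sample labels."""
    rng = random.Random(seed)
    observed = score_with_labels(values, labels, members)
    shuffled = list(labels)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # group sizes stay fixed, only assignment changes
        if abs(score_with_labels(values, shuffled, members)) >= abs(observed):
            extreme += 1
    return observed, (extreme + 1) / (n_perm + 1)


# Toy data: six promoters, two drug and two DMSO samples (log-scale values).
labels = ["drug", "drug", "dmso", "dmso"]
values = {
    "p1": [5.0, 5.2, 1.0, 1.1], "p2": [4.0, 4.1, 1.0, 0.9],
    "p3": [1.0, 1.1, 1.0, 1.2], "p4": [2.0, 2.1, 2.0, 1.9],
    "p5": [1.0, 0.9, 2.0, 2.1], "p6": [0.5, 0.6, 3.0, 3.1],
}
observed, p_value = permutation_p_value(values, labels, {"p1", "p2"})
```

With only four samples there are few distinct label assignments, so the empirical p-value in this toy case cannot become small. A realistic analysis needs enough replicates for the permutation null to be informative.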