The degree of enhancer or promoter activity is reflected by the levels and directionality of eRNA transcription

Here, Mikhaylichenko et al. investigate the transcriptional properties of enhancers during Drosophila embryogenesis using characterized developmental enhancers. The authors demonstrate that while the timing of enhancer transcription is correlated with enhancer activity, the levels and directionality of transcription are highly varied among active enhancers and conclude that this is likely an inherent sequence property of the elements themselves.


Figure S6. Transcribing enhancers can act as promoters in vivo
Double in situ hybridization against the lacZ reporter gene, driven by the tested element (green) in the presence or absence of a minimal promoter (hsp70), and a mesoderm marker gene, Mef2 (red). Left panel: elements with "stable" endogenous eRNA (BN31.2, BN5-lf-p, BN5-lf-n (Zinzen et al. 2009) Figure S7. Enhancer-gene distance affects the levels of reporter expression but not its spatiotemporal pattern (A) Single assays, measuring enhancer and promoter activity separately, placing the enhancer directly upstream from the reporter, as typically done. In situ hybridization against the lacZ reporter gene driven by the BN5-lf-p element (green). BN5-lf-p acts as an enhancer (green), driving expression in the somatic muscle from stage 10, as well as in the lateral ectoderm (white arrow). It has weak promoter activity (magenta) restricted to the same pattern (white arrowhead). (B) Dual enhancer-promoter activity vector: Double in situ hybridization against gfp driven by the hsp70 minimal promoter under the control of BN5lf-p (green, enhancer assay) and lacZ driven by BN5-lf-p in the absence of a minimal promoter (magenta, promoter assay). BN5-lf-p placed in the dual E-P vector produces somatic muscle expression (white arrow) in an orientation-independent manner, as well as weak promoter activity in both orientations (white arrowhead). Stronger promoter expression is detected in the "plus" (sense) orientation (in-situ and imaging conditons are identical between sense and antisense). OI=Orientation Index, with 0.89 being largely unidirectional transcription.

Figure S8. Intergenic enhancers generally act as promoters in both orientations
Double in situ hybridization against the gfp reporter gene driven by a minimal promoter under the control of the tested elements (green, enhancer assay) and lacZ driven by the tested elements without a minimal promoter (magenta, promoter assay). Each element was assayed in both orientations; "plus" and "minus" denote the DNA strand. All elments function as enhancers (green), driving expression in different tissues (white arrows). Left panel: CRM1149 and rpr4s4 act as promoters, driving lacZ expression in both orientations, in part of the brain (CRM1149) and head (rpr4s4), white arrows, which overlaps part of the enhancers activity. CRM1594 and CRM3105 drive weak lacZ expression (promoter activity) in the brain (white arrowheads), in only one orientation. Right panel: CRM7821, CRM4307, CRM1202 have no detectable promoter activity, while acting as enhancers in the somatic muscle, visceral muscle, and anterior foregut, respectively (white arrows). Embryos are laterally (stage 11) or dorsally (stage 13) oriented with anterior to the left. OI=Orientation Index value, black arrow denotes the main orientation of endogenous transcription: upwards arrow = "plus" DNA strand, downwards arrow ="minus" DNA strand.  Three promoter elements, overlapping main TSSs (VT32050, VT27684, VT2594) produced similar gfp expression patterns in segmental ectodermal strips characteristic for the J27 line, and were therefore considered as background activity (asterisk). VT0249 did not produce any expression. VT32050 and VT27684 act as promoters in the central nervous system and visceral/somatic muscle, respectively (white arrowheads). (B) Three elements overlapping alternative TSSs (VT62448, VT4241, VT1617) have both enhancer (white arrows) and promoter (white arrow heads) activities. The ttk element (overlapping a unidirectional alternative TSS), only acts as a weak promoter in an orientationdependent manner. Embryos are laterally (stages 11 or 12) or ventrally (stage 13) oriented with anterior to the left. OI=Orientation Index value, black arrow denotes the main orientation of endogenous transcription: upwards arrow = "plus" DNA strand, downwards arrow ="minus" DNA strand.

Processing and classification of DNase l hypersensitivity sites (DHS)
Using stage-specific whole embryo (WE) DNase-seq data from (Thomas et al., 2010) we recalled peaks using hotspot (John et al., 2011) with an FDR of 0.05. Peak summits (150bp long) that intersect between biological replicates from the same developmental stage were merged and used for downstream analysis as DNase l hypersensitivity sites (DHS). DHS were classified as promoter regions (TSS-proximal), intragenic and intergenic as follows: To define promoter regions we used unambiguous TSS-to-gene assignments (n = 24,264) from Batut et al based on RAMPAGE (Batut, Dobin, Plessy, Carninci, & Gingeras, 2012).
Genomic locations of TSS (RAMPAGE peaks, (Batut et al., 2012)) were extracted from FlyBase (ID = FBlc0000537). In addition to annotated TSS, this dataset allowed us to account for alternative transcription initiation sites, which are commonly used during development, but may not be present in the current genome annotation (Batut et al., 2012).
For each gene, the RAMPAGE TSS peak with maximum CAGE signal (using our 6-8hr data), across all assigned TSS for that gene (Batut et al., 2012). was classified as 'main', while the remaining TSS were classified as 'alternative'. For intragenic elements, we required that a non TSS-proximal DHS is located entirely within a gene from FlyBase annotation (v5.57) or lncRNA (Young et al., 2012). Intergenic elements were defined as being located > 500bp and 1.5 kb away from the 5′ and 3′ ends of genes, respectively (FlyBase annotation v5.57 and lncRNAs (Young et al., 2012)). To obtain a unique master set of intergenic non-TSS DHS, we merged DHS from all stages (5,9,11,13), resulting in 4,562 non-TSS intergenic DHS regions. All DHS regions, with their classification (promoter, intraand inter-genic) and RNA signal (PRO-cap, CAGE, meso-CAGE), used for the analysis in  Table S5.

Compiling a compendium of uniformly sized characterised developmental enhancers
We compiled a collection of previously characterised enhancers from Kvon et al (mainly neuronal) (Kvon et al., 2014), REDfly (Gallo et al., 2011) and from our own lab (mainly mesodermal). Embryonic stages were matched to time points as follows: stage 4-6 to 3-4hr, stage 7-8 to 4-6hr, stage 9-10 and 11-12 to 6-8hr, stage 13-16 to 10-12hr (Zinzen, Girardot, Gagneur, Braun, & Furlong, 2009). As these enhancers were collated from different sources, they have very different lengths that do not necessarily represent the 'centre' (or 'peak') of biological activity such as the site of transcription factors (TF) binding. For example, all the Vienna Tiles are approximately 2kb, enhancers from REDfly can range from a few hundred base pairs to over 3kb and the Furlong lab mesodermal enhancers are 400bp on average. We therefore resized each enhancer region, based on its DHS peak (using the DNase-seq data described above), requiring that >90% of the DHS length is contained within it. All enhancer regions with more than one DHS were excluded, as they likely represent two enhancer elements, which might differ in their spatio-temporal activity. This resulted in a stringent set of 1,055 single DHS enhancers with in vivo characterised activity. The point of maximum DHS signal (summing DNase-seq coverage from all stages) was used as the enhancer centre.
500bp up-and downstream relative to this central point were used to define a uniformly sized (1kb) set of developmental enhancers whose spatio-temporal activity has been assessed at all stages of embryogenesis. From this set, we excluded 18 enhancers that overlapped by >50% of their length to avoid bias due to double counting of reads. The remaining 1,037 enhancers (Table S6) were used to select characterised enhancers that are active or inactive at different developmental stages ( Fig. 2A, B, Fig S5A) or tissues (Fig. 2C). As described above, these enhancers were classified as intergenic (n=220), intragenic (n=497), and TSS-proximal (n=198). 122 enhancers were not assigned to any classes according to our definitions.
Libraries were amplified using 16 PCR cycles, and the CAGE purification step was replaced by size selection with AMPure bead purification. To obtain mesoderm-specific CAGE libraries, mesodermal cells were FACS-purified from unfixed, 6-8hr staged embryos expressing an EGFP-tagged CBP20 protein under the control of an early mesodermal enhancer from the twist gene (Bonn et al., 2012). RNA was isolated and CAGE libraries prepared from 2.5 ug of total RNA as described in Schor et al. (Schor et al., 2017).
Biological replicates for both PRO-cap and CAGE came from independent embryo collections and RNA isolations, while technical replicates came from independent CAGE libraries made from the same RNA. Samples were multiplexed and sequenced by an Illumina HiSeq 2000 sequencer (50bp single-end reads).

Processing PRO-cap and CAGE data
Illumina sequence files were demultiplexed and converted to FASTQ. Sequencing quality was assessed using FastQC as provided in Galaxy (Blankenberg et al., 2010;Giardine et al., 2005;Goecks, Nekrutenko, Taylor, Galaxy Team, 2010), tool version 1.0.0. Reads were aligned against the D. melanogaster dm3 genome using bwa mem (Li et al., 2009) (version 0.7.12) with default parameters. Given the high reproducibility between replicates (Fig. S2), technical and biological replicates were merged resulting in 66,085,730 and 52,422,944 reads for 3-4hr and 6-8hr time points, respectively and 109,285,610 reads for mesodermal CAGE data. The whole embryo CAGE data processing is described in Schor et al (Schor et al., 2017). For this analysis, we combined CAGE reads from all samples at equivalent timepoints, which resulted in 1,045 million mapped reads. For data visualization, we extracted positions of 5′ ends of reads and converted them into BigWig file format with a custom R script (provided in Tables S1-4 and S7). The provided Bigwig files represent counts of these 5′ sites per base pair. Signal columns in tables (Tables S5, S6, S8) are derived from counts of read 5′ ends.

Comparing eRNA levels and directionality between Drosophila and human
To compare the relative levels of eRNA on regulatory elements between species, we first made the data comparable by ensuring that the level of RNA signal was similar on promoters, taking care of different sequencing depths between samples. We randomly sampled GROcap (K562) (Core et al., 2014), PRO-cap (S2 cells, GSM1032759) (Kwak et al., 2013), and our PRO-cap (whole embryo) data so that the read counts per promoter regions align between all datasets (Fig. S3). For both human and fly, only protein-coding genes were used: 5′ gene ends were extended to +/-0.5 kb regions, excluding intersections with other TSSs on the opposite strand and RAMPAGE peaks (N=16,685 for human and 8,346 for fly). Using the resampled data, we then compared read counts over DHS defined identically for both Drosophila S2 and human K562 cell lines, taking the DHS data from (Arnold et al., 2013) and ENCODE (ENCODE Project Consortium, 2012), respectively. Only intergenic regions (resized to 1kb prior to analysis), located at least 0.5 kb and 1.5 kb away of gene 5′ and 3′ ends, respectively, were used for the analysis (Fig. 1A, Fig. S3).
To compare directionality of eRNA transcripts, we calculated the orientation index (OI) (Core et al., 2012) as the fraction of total reads over a region found in the sense orientation.
Here, sense orientation for both promoters and enhancers was defined as the one with greatest read count (Fig. 1B).

Motifs enrichment analysis
Inr motif analysis was performed by centering on the maximal point of transcription.

eRNA levels on characterised enhancers in an active and inactive state
For PRO-cap and CAGE data, the summed signal was estimated over 20 bp bins resulting in 100 points per enhancer (from -1kb to +1kb of the enhancer centre). Points with the top and bottom 5% of signal intensity (estimated from all binned regions) were excluded to avoid the impact of outliers. Due to high resolution of the PRO-cap and CAGE data, 95% confidence intervals of the trimmed mean were calculated, assuming a normal distribution, followed by Lowess smoothing of the mean values. Scaled values (used in Fig. 2A-C, Fig   S5A) were obtained by scaling the minimum average intensity value and then dividing by the maximum average intensity for each separate data type, over the bins within the region from +/-1kb of the enhancer centre.

Transgenic reporter assays
To assay for enhancer and promoter activity using the single read-out vector (Fig. 3, Fig.   S6), corresponding genomic regions were amplified by PCR (primers listed in Table S9) and placed either upstream of a minimal hsp70 promoter driving a lacZ reporter gene (enhancer assay) or directly upstream of the lacZ reporter gene with no promoter (promoter assay) in a modified pH Pelican vector (Barolo, Carver, & Posakony, 2000). For the dual assay, genomic regions were amplified by PCR (Table S9), digested with Hind III restriction enzyme and ligated into the multiple cloning site of the dual assay vector. Derived bacterial colonies were screened by colony PCR to obtain clones with inserts in both orientations.