PR-DUB preserves Polycomb repression by preventing excessive accumulation of H2Aub1, an antagonist of chromatin compaction

Here, Bonnet et al. investigate how levels of monoubiquitination of histone H2A at lysine 118 (H2Aub1) must be balanced for Polycomb repression. They show that in early embryos H2Aub1 is enriched at Polycomb target genes, where it facilitates H3K27me3 deposition by PRC2 to mark genes for repression, and that PR-DUB acts as a rheostat that removes excessive H2Aub1, which, although deposited by PRC1, antagonizes PRC1-mediated chromatin compaction.


ChIP-seq analysis in Drosophila embryos and in larval tissues
Embryo collection, chromatin preparation, and ChIP
Wildtype, Sce I48A or caly C131S embryos aged 0-6 hrs, 13-17 hrs and 21-24 hrs, as well as 21-24 hrs old Asx 0 mutant embryos (see Table S1 for details about the genotypes), were dechorionated, quick-frozen in liquid N2 and stored at -80 °C. Chromatin was prepared as described in Finogenova et al. (Finogenova et al, 2020), and 500 ng of chromatin was used for each ChIP experiment. For subsequent normalization of the ChIP-seq datasets, 100 ng of an independently prepared batch of D. pseudoobscura chromatin was spiked into each ChIP experiment prior to the addition of the antibody. The ChIP protocol was then performed as described in Bonnet et al. (Bonnet et al, 2019). ChIP on hand-dissected wing and 3rd leg/haltere imaginal discs from 3rd instar Oregon R larvae was performed as described in Laprell et al. (Laprell et al, 2017).

Library preparation and sequencing
Libraries for sequencing were prepared with the Ovation® Ultralow System V2 (NuGEN, part no. 0344). Paired-end DNA sequencing was performed on Illumina NextSeq 500 systems. BCL raw data were converted to FASTQ files and demultiplexed with the bcl2fastq Conversion Software (Illumina). All reads were aligned with STAR (Dobin et al, 2013) to the D. melanogaster dm6 genome assembly (Dos Santos et al, 2015) and to the D. pseudoobscura dp3 genome assembly (Nov. 2004, FlyBase Release 1.03). Only reads that mapped uniquely to the genome with a maximum of two mismatches were considered for further analyses.
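As a minimal sketch of the read-filtering criterion above, the Python snippet below keeps only uniquely mapped reads with at most two mismatches and counts them per genome. It assumes that STAR was run against a combined dm6 + dp3 reference in which D. pseudoobscura contigs carry a prefix (here the hypothetical `dp3_`), and it relies on STAR's standard `NH` (number of loci) and `nM` (mismatches per read pair) SAM attributes. The `Alignment` record and the function names are illustrative, not part of the authors' pipeline.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    """Minimal stand-in for one aligned read pair (hypothetical record type)."""
    read_id: str
    chrom: str
    n_hits: int      # STAR NH attribute: number of loci the read maps to
    mismatches: int  # STAR nM attribute: number of mismatches per read pair


def keep_alignment(aln: Alignment, max_mismatches: int = 2) -> bool:
    """Keep only uniquely mapped reads with at most `max_mismatches` mismatches."""
    return aln.n_hits == 1 and aln.mismatches <= max_mismatches


def count_by_genome(alignments, dp_prefix: str = "dp3_") -> dict:
    """Count retained reads per genome; assumes dp3 contigs are prefixed with `dp_prefix`."""
    counts = {"dm6": 0, "dp3": 0}
    for aln in alignments:
        if keep_alignment(aln):
            genome = "dp3" if aln.chrom.startswith(dp_prefix) else "dm6"
            counts[genome] += 1
    return counts


if __name__ == "__main__":
    reads = [
        Alignment("r1", "chr2L", n_hits=1, mismatches=0),  # kept, dm6
        Alignment("r2", "dp3_2", n_hits=1, mismatches=2),  # kept, dp3
        Alignment("r3", "chr3R", n_hits=3, mismatches=0),  # multimapper, dropped
        Alignment("r4", "chrX", n_hits=1, mismatches=3),   # too many mismatches, dropped
    ]
    print(count_by_genome(reads))  # {'dm6': 1, 'dp3': 1}
```

The resulting per-genome read counts are the inputs needed for the spike-in normalization described in the next section.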

Normalization of ChIP-seq datasets
The proportion of D. pseudoobscura reads relative to D. melanogaster reads in the input and in the ChIP samples was used to normalize the ChIP-seq datasets of histone marks in embryos (see Supplementary file 2 of Finogenova et al. (Finogenova et al, 2020) for details). ChIP-seq datasets obtained with antibodies against Polycomb group proteins or RNA Pol II, as well as ChIP-seq datasets from larval tissues, were normalized to the total number of reads in each dataset.
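The exact spike-in procedure is given in Supplementary file 2 of Finogenova et al. (2020); the sketch below only illustrates one common way of turning D. pseudoobscura/D. melanogaster read proportions into a per-sample scale factor, and it should be read as an assumption rather than the authors' formula. Raw D. melanogaster ChIP counts are divided by the spike-in ChIP reads (correcting for immunoprecipitation yield and sequencing depth) and multiplied by the spike-in/sample ratio in the input (correcting for the actual amount of spike-in chromatin present). The total-read scaling used for the Polycomb-protein, RNA Pol II and larval datasets is shown for comparison.

```python
def spike_in_scale(dm_input: int, dp_input: int, dp_chip: int) -> float:
    """Factor to multiply raw D. melanogaster ChIP counts by (one common spike-in scheme).

    dp_input / dm_input measures how much spike-in chromatin was present relative to
    sample chromatin; dividing by dp_chip corrects for IP efficiency and depth.
    """
    return (dp_input / dm_input) / dp_chip


def total_read_scale(total_reads: int) -> float:
    """Reads-per-million factor used when no spike-in normalization is applied."""
    return 1e6 / total_reads


if __name__ == "__main__":
    # Two hypothetical samples with identical inputs: sample A recovers twice as many
    # D. melanogaster reads per spike-in read as sample B.
    a = 2_000_000 * spike_in_scale(dm_input=10_000_000, dp_input=1_000_000, dp_chip=100_000)
    b = 2_000_000 * spike_in_scale(dm_input=10_000_000, dp_input=1_000_000, dp_chip=200_000)
    print(round(a / b, 6))
```

With the same raw D. melanogaster read count, the sample with fewer spike-in reads receives the larger scale factor, which is what makes global changes in a histone mark (such as those expected in PR-DUB mutants) visible after normalization.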

Identification of H3K36me2- and H3K27me3-enriched regions
The Bioconductor STAN-package (Zacher et al, 2017)  H3K27me2 coverage and which overlap with one or more Pho peaks; see Figure S1 for details).

Calculation of read coverage on gene bodies
ChIP-seq read coverage across gene bodies was computed on genomic intervals starting 750 bp upstream of transcription start sites and ending 750 bp downstream of transcription termination sites. Read coverage is defined as the normalized number of mapped reads per million reads from a ChIP-seq dataset, divided by the number of mapped reads per million reads from the corresponding input dataset, across a genomic region. Among the D. melanogaster
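The gene-body interval and the coverage ratio defined above can be sketched as follows. This is a hedged illustration: it assumes 0-based coordinates, uses plain reads-per-million for both ChIP and input (for spike-in-normalized histone datasets, the ChIP count would first be scaled as in the previous section), and raises an error for intervals without input reads, since the source does not say how such intervals were handled.

```python
def gene_body_interval(tss: int, tes: int, strand: str, flank: int = 750) -> tuple:
    """Interval from `flank` bp upstream of the TSS to `flank` bp downstream of the TES.

    Returns 0-based (start, end) genomic coordinates with start <= end.
    """
    if strand == "+":
        return max(0, tss - flank), tes + flank
    if strand == "-":
        # On the minus strand the TSS has the larger coordinate.
        return max(0, tes - flank), tss + flank
    raise ValueError(f"strand must be '+' or '-', got {strand!r}")


def reads_per_million(reads: float, total_reads: float) -> float:
    return reads * 1e6 / total_reads


def read_coverage(chip_reads, chip_total, input_reads, input_total) -> float:
    """ChIP reads per million divided by input reads per million over the same interval."""
    input_rpm = reads_per_million(input_reads, input_total)
    if input_rpm == 0:
        raise ZeroDivisionError("no input reads in interval; add a pseudocount or skip it")
    return reads_per_million(chip_reads, chip_total) / input_rpm


if __name__ == "__main__":
    start, end = gene_body_interval(tss=5000, tes=1000, strand="-")
    print(start, end)  # 250 5750
    print(read_coverage(chip_reads=50, chip_total=10_000_000,
                        input_reads=10, input_total=10_000_000))  # 5.0
```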

Identification of Pho, Scm and RNA PolII bound regions
Peak calling for Pho, Scm (from 13-17 hrs old wildtype embryos) and PolII S5P (from 21-24 hrs old wildtype embryos) ChIP-seq datasets was performed using MACS 2.2.6.
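Only the MACS version is stated above, so the following is merely a sketch of how such a peak-calling run could be assembled from Python. The use of a matching input as control, the `BAMPE` format (paired-end data, as sequenced here) and MACS2's `dm` genome-size shortcut are assumptions, and the file and sample names are hypothetical.

```python
import shlex


def macs2_callpeak_cmd(treatment_bam: str, control_bam: str, name: str,
                       outdir: str = "peaks") -> list:
    """Build a MACS2 `callpeak` command line (illustrative parameters only)."""
    return [
        "macs2", "callpeak",
        "-t", treatment_bam,  # ChIP alignments (e.g. Pho, Scm or Pol II S5P)
        "-c", control_bam,    # matching input alignments (assumed control)
        "-f", "BAMPE",        # paired-end BAM: fragment sizes taken from read pairs
        "-g", "dm",           # MACS2 effective genome size shortcut for Drosophila
        "-n", name,
        "--outdir", outdir,
    ]


if __name__ == "__main__":
    cmd = macs2_callpeak_cmd("pho_13-17h.bam", "input_13-17h.bam", "pho_13-17h")
    print(shlex.join(cmd))
    # To actually run it: subprocess.run(cmd, check=True)
```

Returning an argument list rather than a shell string avoids quoting problems when the list is passed to `subprocess.run`.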

ATAC-seq
Batches of 120 13-17 hrs old wildtype or caly C131S embryos were collected, dechorionated and homogenized in ice-cold Nu1 buffer (15 mM HEPES pH 7.6, 20 mM KCl, 5 mM MgCl2, 0.5 mM EDTA pH 7.9, 0.5 mM EGTA pH 7.9, 20% glycerol, 350 mM sucrose and 0.1% NP40, complemented with protease inhibitors), and the homogenates were filtered through Miracloth (Calbiochem). Nuclei were washed in ATAC Lysis Buffer from the ATAC-Seq kit (Active Motif, #53150), and the washed pellets were snap-frozen in liquid nitrogen and stored at -80 °C. Before the last centrifugation step, 5% of the purified nuclei were collected and diluted into a final volume of 100 µL of PBS for genomic DNA extraction, to determine the number of purified nuclei in each sample. Proteins were digested by addition of 5 µg of Proteinase K for 1 h at 50 °C, and the enzyme was then heat-inactivated at 94 °C for 10 minutes. (Twist Bioscience) containing these four genomic DNA regions, as quantification standard in parallel qPCR reactions. Frozen nuclear pellets were then resuspended in ATAC Lysis Buffer to a concentration of 25,000 nuclei per µL, and 1 µL of the suspension was used for the tagmentation reaction. Tagmentation, DNA purification and PCR amplification of tagmented DNA were performed with the ATAC-seq kit (Active Motif, #53150), following the manufacturer's instructions. Sequencing and read mapping were performed as described for the ChIP-seq analysis. ATAC-seq samples were normalized based on the total number of reads in each dataset. ATAC-seq signal is defined as the normalized number of mapped reads per million reads from an ATAC-seq dataset across a genomic region.
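Nuclei counting relies on qPCR against a synthetic standard (Twist Bioscience) carrying four genomic regions; part of that description is missing above. As a hedged sketch of standard absolute-quantification arithmetic, the snippet fits a standard curve (Ct versus log10 copies), converts sample Ct values to copies, and scales the 5% aliquot back to the frozen pellet (the remaining 95%) to obtain the resuspension volume for 25,000 nuclei per µL. The template volume per qPCR reaction, two genome copies per nucleus (diploid nuclei) and averaging over the four regions are assumptions, not values from the protocol.

```python
from statistics import mean


def fit_standard_curve(log10_copies, ct_values):
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    mx, my = mean(log10_copies), mean(ct_values)
    sxx = sum((x - mx) ** 2 for x in log10_copies)
    sxy = sum((x - mx) * (y - my) for x, y in zip(log10_copies, ct_values))
    slope = sxy / sxx
    return slope, my - slope * mx


def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to get template copies in one qPCR reaction."""
    return 10 ** ((ct - intercept) / slope)


def resuspension_volume_ul(copies_per_reaction, template_ul, aliquot_ul=100.0,
                           aliquot_fraction=0.05, copies_per_nucleus=2,
                           nuclei_per_ul=25_000):
    """Volume (µL) in which to resuspend the frozen pellet to reach `nuclei_per_ul`."""
    copies_in_aliquot = copies_per_reaction * aliquot_ul / template_ul
    nuclei_in_aliquot = copies_in_aliquot / copies_per_nucleus  # assumes diploid nuclei
    # The aliquot held 5% of the nuclei; the remaining 95% ended up in the pellet.
    nuclei_in_pellet = nuclei_in_aliquot * (1 - aliquot_fraction) / aliquot_fraction
    return nuclei_in_pellet / nuclei_per_ul


if __name__ == "__main__":
    slope, intercept = fit_standard_curve([2, 3, 4, 5], [33.36, 30.04, 26.72, 23.40])
    # One hypothetical Ct per genomic region, averaged across the four regions.
    copies = mean(copies_from_ct(ct, slope, intercept) for ct in [30.0, 30.1, 29.9, 30.04])
    print(round(resuspension_volume_ul(copies, template_ul=2.0), 1))
```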

Definition of regions with low, intermediate and high DNA accessibility
The Bioconductor STAN package (Zacher et al, 2017; see above) was used to annotate the genome into 4 states corresponding to regions with low, intermediate and high DNA accessibility, and 'no input' regions (these non-unique DNA sequences are represented in white on the bar track in Figure 5H and Figure S10A). The genome was segmented into 200 bp bins, and STAN annotated each bin with 1 of the 4 states based on the number of overlapping reads from the 13-17 hrs old wildtype embryo ATAC-seq datasets and from a DNA input dataset (as a control to identify genomic regions with non-unique DNA sequences). The Poisson Lognormal distribution was selected, and hidden Markov models were fitted with a maximum of 100 iterations. The overlap between these DNA accessibility-based regions and the four types of chromatin domains (see above and Figure S1) was then determined. For accessibility-based regions overlapping several types of chromatin domains, the same logic described for gene bodies was applied (see above). Among the regions with low or intermediate DNA accessibility, only those larger than 1 kb were considered for further analyses. 4118, 2315, 863, 26, 38, 59, and  RNA-seq datasets were generated. Drosophila transcripts (release 6.35) were quantified from FASTQ files using Salmon (Patro et al, 2017). Abundances were summarized from transcript to gene level with the Bioconductor package tximeta (Love et al, 2020), and differential expression analysis was performed using the DESeq2 package (Love et al, 2014).
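tximeta builds on tximport, which by default sums the estimated counts of all transcripts of a gene (and likewise sums their abundances); it additionally computes abundance-weighted average gene lengths that DESeq2 uses as offsets. The pure-Python sketch below reproduces only the summation step for illustration; the `tx2gene` mapping and the identifiers are hypothetical.

```python
from collections import defaultdict


def summarize_to_genes(tx_values: dict, tx2gene: dict) -> dict:
    """Sum per-transcript values (estimated counts or TPM) into per-gene totals."""
    gene_values = defaultdict(float)
    for tx_id, value in tx_values.items():
        gene_values[tx2gene[tx_id]] += value
    return dict(gene_values)


if __name__ == "__main__":
    tx2gene = {"FBtr_a": "FBgn_1", "FBtr_b": "FBgn_1", "FBtr_c": "FBgn_2"}  # hypothetical IDs
    counts = {"FBtr_a": 10.0, "FBtr_b": 5.5, "FBtr_c": 3.0}
    print(summarize_to_genes(counts, tx2gene))  # {'FBgn_1': 15.5, 'FBgn_2': 3.0}
```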
A first differential expression analysis, including all datasets from a given mutant genotype and all wildtype datasets, was performed to identify, by principal component analysis, a subset of comparable datasets originating from wildtype and mutant embryos of equivalent developmental stage.
This selection was further validated by inspecting embryonic morphology. This led to the selection of 5 Asx 0 and 3 wildtype embryos, and of 4 Sce I48A and 4 wildtype embryos, for the final differential expression analysis. Of the approximately 11,500 genes analyzed, about 7,800, 350, 1,000 and 2,300 are located in H3K36me2-enriched regions, canonical H3K27me3 regions, non-canonical H3K27me3 regions and other genomic regions, respectively.
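The PCA-based selection of comparable datasets can be sketched as follows. DESeq2's own `plotPCA` works on variance-stabilized data and, by default, on the 500 most variable genes; to keep the example self-contained, this NumPy sketch substitutes a simple log2(count + 1) transform, so it approximates rather than reproduces that workflow. Samples are rows and genes are columns, and the toy counts are simulated.

```python
import numpy as np


def pca_scores(counts, n_top=500, n_components=2):
    """PCA of samples on the `n_top` most variable genes after log2(count + 1).

    counts: array of shape (n_samples, n_genes) with raw counts.
    Returns (scores, explained_variance_ratio) for the first `n_components` PCs.
    """
    logc = np.log2(np.asarray(counts, dtype=float) + 1.0)
    top = np.argsort(logc.var(axis=0))[::-1][:n_top]  # most variable genes first
    x = logc[:, top] - logc[:, top].mean(axis=0)       # center each gene
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = s ** 2 / np.sum(s ** 2)
    return scores, explained[:n_components]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    counts = rng.poisson(100, size=(6, 50))
    counts[:3, :10] += 1000  # three samples with a shifted expression program
    scores, explained = pca_scores(counts)
    print(np.round(scores[:, 0], 2), np.round(explained, 3))
```

In the toy data, the three shifted samples and the three others fall on opposite sides of PC1; datasets from embryos at mismatched developmental stages would be expected to separate in a similar way.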

Synthesis of H2AubSS
H2AubSS was prepared as described (Chatterjee et al, 2010; Fierz et al, 2011; Debelouchina et al, 2017). In short, ubiquitin (1-76) was cloned in frame with a single-chain version of the split intein Npu, carrying mutations of the C-terminal catalytic asparagine and of the +1 cysteine (in the extein) to alanine. The construct was expressed in BL21(DE3)pLysS cells (induction for 4 h with 0.5 mM IPTG), the cells were lysed and the Ub-intein fusion was purified over a Ni-NTA affinity column. The protein was eluted with elution buffer containing 600 mM imidazole, 20 mM Tris-HCl, pH 6.8, and 200 mM NaCl. Cysteamine (25 mM) and TCEP (50 mM) were added, and intein cleavage was allowed to proceed overnight (Figure S6A). Ubiquitin-SH was purified by preparative RP-HPLC using a gradient of 0-70% B and analyzed by analytical RP-HPLC (Figure S6B) and electrospray ionization liquid chromatography-mass spectrometry (ESI-LCMS) (Figure S6C).
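To compare ESI-LCMS data against expectations, the expected average masses can be computed from the sequence. The sketch below assumes that "Ubiquitin-SH" is ubiquitin (1-76) carrying a C-terminal cysteamide (Ub-CO-NH-CH2-CH2-SH, the product of cysteamine thiolysis followed by S-to-N acyl shift) and that the construct encodes the canonical 76-residue ubiquitin sequence; it uses standard average residue masses and is an illustration, not the authors' analysis.

```python
# Average residue masses in Da (amino acid minus H2O), standard values.
AVG_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528     # H2O, average mass
CYSTEAMINE = 77.145  # C2H7NS (H2N-CH2-CH2-SH), average mass

# Canonical ubiquitin, residues 1-76 (assumed to be the construct's sequence).
UBIQUITIN_1_76 = (
    "MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRGG"
)


def average_mass(seq: str) -> float:
    """Average mass of a linear peptide with free N- and C-termini."""
    return sum(AVG_RESIDUE_MASS[aa] for aa in seq) + WATER


def cysteamide_mass(seq: str) -> float:
    """Peptide whose C-terminal carboxyl forms an amide with cysteamine (loses H2O)."""
    return average_mass(seq) + CYSTEAMINE - WATER


if __name__ == "__main__":
    print(f"Ub(1-76):    {average_mass(UBIQUITIN_1_76):.1f} Da")
    print(f"Ub(1-76)-SH: {cysteamide_mass(UBIQUITIN_1_76):.1f} Da")
```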
Collected fractions were analyzed by SDS-PAGE, and octamer-containing fractions were pooled and concentrated. Glycerol was added to a final concentration of 50% (v/v), and octamers were stored at -20 °C.

Production of labelled chromatin DNA
Chromatin DNA was produced as shown in Figure S7A and described in (Kilic et al, 2018).
Briefly, the recombinant pieces recP1 and recP5, flanked by DraIII and BsaI restriction sites, were produced in bacterial cells and purified by PEG precipitation at the indicated PEG percentages (see Figure S7B-D for 30 bp linker DNA, and Figure S7E-G for 50 bp linker DNA). The sequence of the repeating unit for both the 30 and 50 bp linker DNA constructs is given below (601 sequence in grey bold (Thåström et al, 1999); donor labelling position in red; acceptor labelling position in violet).

TGAACAGCATGATCAGTACTATGGACCCTATACGCGGCCGCC
Fragments P2, P3 and P4 were produced by PCR, using primers fluorescently labeled with the indicated dyes for P2 and P4 (Figure S7A), and purified. Each piece was digested with the restriction enzymes BsaI-HF and DraIII-HF, resulting in unique, non-palindromic overhangs. Digested DNA fragments were purified by PEG precipitation. For preparative ligations, 30-60 pmol of each DNA piece was used to generate the intermediates in combined volumes of 200-400 µL: P2 was ligated to P1 in 20% excess for 2 h in 1x T4 DNA ligase buffer with 60 U of ligase, then P3 was added in 20% excess relative to P2, and ligation was allowed to proceed overnight. In parallel, P4 was ligated to P5 in 20% excess for 12-16 h (Figure S7H,I, I. DNA ligation). The pieces were purified by PEG precipitation using a stepwise increase in PEG from 7.0% to 8.0% (0.5% steps). Pellets containing the purified chromatin DNA intermediates were redissolved in 60 µL TE buffer (10 mM Tris-HCl, 0.1 mM EDTA, pH 8.0), pooled and stored for later ligations (Figure S7H,I, II. PEG precipitation). 15-35 pmol of the 6x601 intermediates were mixed with a 5-10% excess of P4-P5, and the biotinylated anchor was added together with 1x T4 DNA ligase buffer and 60 U of ligase. The mixture was then left to ligate for 10-16 h (Figure S7H,I, III. DNA ligation). Product formation was confirmed by agarose gel electrophoresis, and the product was purified by stepwise PEG precipitation in the range 5.0-6.0% (Figure S7H,I, IV. PEG precipitation). The pellets were redissolved in TE (10/0.1) and analyzed by gel electrophoresis to pool the purified double-labeled array DNA.
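The ligation amounts above are given in pmol. The sketch below converts between ng and pmol of double-stranded DNA using the common approximation of ~650 g/mol per base pair, and reads "20% excess" as 1.2 molar equivalents of the piece being added relative to the piece it is joined to. That reading is an assumption (the phrase "P2 was ligated to P1 in 20% excess" could also be read the other way), and the 1,000 bp fragment length in the usage lines is hypothetical.

```python
AVG_BP_MASS = 650.0  # g/mol per base pair of double-stranded DNA (common approximation)


def dna_pmol(ng: float, length_bp: int) -> float:
    """Convert a mass of dsDNA (ng) into pmol."""
    return ng * 1000.0 / (length_bp * AVG_BP_MASS)


def dna_ng(pmol: float, length_bp: int) -> float:
    """Convert pmol of dsDNA into ng."""
    return pmol * length_bp * AVG_BP_MASS / 1000.0


def sequential_ligation(p1_pmol: float, excess: float = 0.20) -> dict:
    """pmol of each piece when every added piece is in `excess` over the previous one."""
    p2 = p1_pmol * (1 + excess)
    p3 = p2 * (1 + excess)
    return {"P1": p1_pmol, "P2": p2, "P3": p3}


if __name__ == "__main__":
    for piece, pmol in sequential_ligation(30.0).items():
        # 1,000 bp is a placeholder length, not the real fragment size.
        print(piece, round(pmol, 1), "pmol =", round(dna_ng(pmol, 1000)), "ng")
```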

Chromatin assembly
Chromatin assembly was performed as previously described (Kilic et al, 2018). Chromatin arrays were reconstituted on a scale of ~20 pmol (calculated based on nucleosome positioning sequences (NPS)). Fluorescently labelled chromatin array DNA containing FRET dyes and 30 or 50 bp linker DNA was combined with equimolar amounts of MMTV buffer DNA, NaCl was added to a final concentration of 2 M, and equimolar equivalents of histone octamers (either unmodified or containing H2AubSS) were then added. Where indicated, H1.2 was added at this stage (the number of equivalents was determined experimentally, usually 1-2 equivalents per nucleosome, the variation being due to uncertainties in concentration determination). The mixture was dialyzed over 16 h with a gradient from TEK2000 buffer (10 mM Tris-HCl, 0.1 mM EDTA, pH 7.5, 2000 mM KCl) to TEK10 buffer (10 mM Tris-HCl, 0.1 mM EDTA, pH 7.5, 10 mM KCl).
The dialyzed mixture was taken up in 200 mL TEK10 and further dialyzed for 1 h. The chromatin assembly mixtures were then centrifuged at 25,000 x g for 10 min, and the supernatant was collected and analyzed by native agarose gel electrophoresis (Figure S8A, F, G). The concentration of the crude assembly was determined by UV-vis spectrometry. The quality of the chromatin assemblies was further assessed by ScaI digestion of the arrays, which liberates individual nucleosomes. Chromatin arrays were combined with an equal volume of 1x CutSmart buffer (New England Biolabs) and 10 U of ScaI-HF restriction enzyme, followed by digestion for 4 h at 37 °C. ScaI digests were analyzed by native PAGE (Figure S8B, F, H).
Only chromatin arrays that showed full nucleosomal occupancy in ScaI digest samples were used for further experiments.
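For the ~20 pmol (NPS-based) assembly scale and the octamer and H1.2 equivalents described above, the stock volumes follow from the fact that 1 µM equals 1 pmol/µL. In the sketch below, the stock concentrations and the default of 1.5 H1.2 equivalents are hypothetical placeholders; the source only states that 1-2 equivalents were used, determined experimentally.

```python
from typing import Optional


def volume_ul(pmol: float, stock_um: float) -> float:
    """1 µM = 1 pmol/µL, so pmol divided by µM gives µL."""
    return pmol / stock_um


def assembly_volumes(nps_pmol: float, octamer_stock_um: float, octamer_eq: float = 1.0,
                     h1_stock_um: Optional[float] = None, h1_eq: float = 1.5) -> dict:
    """Stock volumes (µL) for an assembly scaled to `nps_pmol` nucleosome positioning sequences."""
    volumes = {"octamer_ul": volume_ul(nps_pmol * octamer_eq, octamer_stock_um)}
    if h1_stock_um is not None:
        volumes["h1_ul"] = volume_ul(nps_pmol * h1_eq, h1_stock_um)
    return volumes


if __name__ == "__main__":
    print(assembly_volumes(20.0, octamer_stock_um=10.0, h1_stock_um=5.0))
    # {'octamer_ul': 2.0, 'h1_ul': 6.0}
```

Because the effective number of H1.2 equivalents depends on concentration estimates, it is kept as a parameter rather than a constant.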

Flow chamber preparation
Flow chambers for TIRF experiments were prepared as previously described (Kilic et al, 2018). Briefly, borosilicate glass slides with 2 rows of 4 holes and borosilicate coverslips were cleaned by sonication for 20 min in ultra-pure H2O, followed by acetone, ethanol and piranha solution (25% v/v H2O2 and 75% v/v H2SO4). They were then washed with H2O until reaching neutral pH, sonicated again in acetone for 10 min and immersed in 3% v/v (3-aminopropyl)triethoxysilane in acetone for 20 min. Finally, slides and coverslips were washed in ultra-pure H2O and dried with N2. On each slide, four flow chambers were assembled using strips of double-sided 0.12 mm tape and a coverslip. The chambers were sealed with epoxy glue and stored under vacuum at -20 °C until use.
Before measurements, pipette tips were fitted into each of the 2 x 4 holes of the flow chambers. Subsequently, ~1 mg of biotin-mPEG-SVA (5 kDa) was dissolved in 350 μL of 0.1 M tetraborate buffer, pH 8.5, and 175 μL of this solution was transferred to 20 mg of mPEG-SVA (5 kDa), yielding a transparent cloud-point solution after 10 s of centrifugation.
This was mixed to homogeneity and centrifuged for 10 s before 40-45 μL were loaded into each of the four channels of the flow chamber and incubated at RT for 2 h, after which the solution was washed out with degassed ultra-pure H2O.
cacatcctattaaaatatgtactatccttaggtcacgatgtacaggtggaggaggtgtacgatttacagaaacccatcgagagtccatatggcttcatatttctcttccgctggatcgaggagcgacgcgccagacgcaaaattgtggagacaactgctgagatattcgtcaaggatgaggaggccatttccagcattttcttcgcccagcaggtagtccccaatagcagtgccacacacgcgttgctttcagtgctcc (Kaushal et al, 2021).

Figure S1
(E) Western blot analysis on serial dilutions of total nuclear extracts from 21-24 hrs old embryos shows that H2Aub1 bulk levels are about 4-fold higher in caly C131S (top) Figure 2C).

21-24 hr old embryos
(D) Same as in (B) but for H3K27me3 coverage (compare with scatter plot in 21-24 hrs old caly C131S mutant embryos in Figure 3B).
(E) Same as in (C) for H3K27me3 average signal (compare with average H3K27me3 profile in 21-24 hrs old caly C131S mutant embryos in Figure S3B).
were wildtype (wt, top row), homozygous for caly C131S (middle row) or homozygous for Asx 0 (bottom row). The yellow-marked clone tissue is identified by the lighter pigmentation of its bristles compared with the darker bristles of the neighbouring wild-type tissue. Clone cells also lacked the white+ marker, which allowed identification of clone tissue in the eye by the absence of red eye pigment. Of note, the animals were in all cases heterozygous for an M(2)53 mutation, which slowed down developmental progression and resulted in a Minute phenotype (Morata & Ripoll, 1975), whereas y− clone cells, containing two wild-type M(2)53+ alleles, grew and proliferated normally, allowing the generation of very large y− clones. In the head, caly C131S or Asx 0 mutant tissue in the antenna resulted in partial transformations of the a3 antennal segment and the arista (black arrowheads), with variable expressivity, as previously described (Halachmi et al, 2007). Note, however, that yellow-marked caly C131S or Asx 0 mutant clone tissue in other regions of the head forms regular head structures, and that the arrangement of ommatidia in caly C131S or Asx 0 mutant tissue (marked by the absence of red pigment) also appears unperturbed. In the legs, the appearance of extra sex combs on the L2 and L3 legs (black arrows) is the only major morphological defect in caly C131S or Asx 0 mutants; note that in all three genotypes, the leg tissues of the individuals shown here consist mostly of clone cells. In the thorax, severe transformation of the wing (W) into tissue with characteristics of the haltere (H) is the most prominent morphological defect, whereas in the notum (N), the structure and bristle pattern formed by caly C131S or Asx 0 mutant clone tissue appear indistinguishable from wildtype.

Figure S5
(B) Eye-antennal (top) and wing (bottom) imaginal discs with clones of caly C131S mutant cells, stained with antibodies against Antp or Ubx (red), as indicated, and with Hoechst to visualize DNA (blue). The caly C131S homozygous mutant clone tissue is marked by the absence of GFP (green).
Note that the Antp and Ubx misexpression phenotypes are very similar to those seen in caly 0 or Asx 0 mutants (compare with Figure 4C).

Figure S10
Excessive H2Aub1 levels lead to chromatin opening in vivo.
(A) ATAC-seq profiles from 13-17 hrs old wildtype (wt) and caly C131S mutant embryos at a genomic window encompassing the sens-2 gene, which is located in a canonical H3K27me3 domain. Red bars below Track 2 mark regions with a significant gain in DNA accessibility in caly C131S mutants (see also Figure 5H).

Figure S11

Mutant alleles and genotypes of animals used in the different figures
The following mutant alleles were used in this study:
caly C131S: generated in this study; see Materials and Methods for details.
Asx 22P4: null allele (Scheuermann et al, 2010), referred to as Asx 0 in this manuscript.
Pc XT109: null allele (Franke et al, 1995), referred to as Pc 0 in this manuscript.