Influence of the amino-terminal sequence on the structure and function of HIV integrase

Background: Antiretroviral therapy (ART) can mitigate the morbidity and mortality caused by the human immunodeficiency virus (HIV). Successful development of ART can be accelerated by accurate structural and biochemical data on drug targets and their responses to inhibitors. One important ART target, HIV integrase (IN), has historically been studied in vitro in a form adapted to bacterial overexpression, with a methionine or a longer fusion-protein sequence at the N-terminus. In contrast, IN in viral particles is produced by proteolytic cleavage of the Pol polyprotein, which leaves a phenylalanine at the N-terminus (IN 1F). Inspection of available structures suggested that residues added to the N-terminus might disrupt proper protein folding and the formation of multimeric complexes.

Results: We purified HIV-1 IN 1F (residues 1–212) and solved its structure at 2.4 Å resolution, which showed an extended N-terminal helix compared with the published structure of IN residues 1–212. Full-length IN 1F showed increased in vitro catalytic activity in assays of coupled joining of the two viral DNA ends compared with two IN variants containing additional N-terminal residues. IN 1F was also altered in its sensitivity to inhibitors, showing decreased sensitivity to the strand-transfer inhibitor raltegravir and increased sensitivity to allosteric integrase inhibitors. In solution, IN 1F exists as monomers and dimers, in contrast to other IN preparations, which form higher-order oligomers.

Conclusions: The structural, biochemical, and biophysical characterization of IN 1F reveals the conformation of the native HIV-1 IN N-terminus and its accompanying distinct biochemical and biophysical properties. IN 1F thus represents an improved reagent for in vitro integration reactions and for the development of antiretroviral agents.


Background
Integration of a reverse-transcribed DNA copy of the HIV RNA genome into a host cell chromosome is an essential step in retroviral replication [1]. The integrated provirus serves as a template for retroviral gene expression and the production of a new generation of virions. Integration also establishes the potential for latency, a major barrier to the treatment and cure of […] and recombinant full-length IN has been reported to exist in forms ranging from monomers to octamers [13][14][15][16][17][18].
The first N-terminal residue of HIV-1 IN is a highly conserved phenylalanine [52][53][54], liberated by retroviral protease cleavage from the C-terminus of reverse transcriptase. Viruses containing engineered substitutions at IN F1 are replication-incompetent [55] and show defects in reverse transcription and integration, characteristic of class II IN mutations such as those that disrupt the HHCC motif [56][57][58][59]. Another closely studied NTD substitution, Y15A, also affects reverse transcription and integration [60], and IN Y15A is hypo-oligomeric in solution [13,61]. The isolated IN NTD Y15A is structurally constrained, adopting only one of two NTD conformational states (the E form) [62], whereas the wild-type NTD adopts both the E and D forms [3]. The transition between the E and D forms involves significant structural rearrangements in the NTD, including a 6-residue change in the length of the ɑ1 helix [3]. The aberrant phenotypes caused by substitutions at F1 and Y15 led us to investigate the structure and function of the HIV-1 NTD in more detail.
For laboratory studies, IN is often produced by bacterial overexpression, either with an N-terminal methionine (IN MF) [61,63,64] or as an N-terminal fusion protein, such as the Sso7d-IN fusion [23,24,51]. Solution structures of the isolated NTD were determined from constructs purified with a cleavable N-terminal affinity tag [3,65]; thrombin cleavage of the fusion protein left three residues (G-S-H-) preceding F1 (IN GSH). In the solution structure of the IN GSH NTD [3], the backbone carbonyl of F1 contributes the first hydrogen bond of the ɑ1 helix. The solution structure of another variant, IN GSH NTD H12C, which contains a substitution in the HHCC Zn-binding motif, shows a different N-terminal structure: the carbonyl of F1 is not involved in a hydrogen bond, L2 is displaced, and the ɑ1 helix begins with G4 [65]. The only crystal structure containing the HIV-1 IN NTD (PDB: 1K6Y) [66] is of a truncated two-domain form (NTD-CCD), also purified using an N-terminal affinity tag and thrombin cleavage, which leaves the same three residues (G-S-H-) preceding F1 [43,66]. In this structure as well, the ɑ1 helix is shortened, suggesting that the extra N-terminal residues might disrupt native folding of the ɑ1 helix.
Four NTDs in two structurally distinct positions are present in the cryo-EM structures of the HIV-1 core intasome complex determined with Sso7d-IN [23,24]. One NTD, positioned close to the viral DNA and to the catalytic CCD, forms NTD-NTD interactions in the dodecameric HIV-1 intasome and the hexadecameric maedi-visna virus (MVV) intasome [25]. The ɑ1 helix of this NTD is shortened in the first HIV-1 tetrameric intasome structure, where it begins with D3 [24]. The ɑ1 helix is extended in four of five recent intasome structures, with only one structure showing partial disruption [23]. The second NTD does not interact with the viral DNA and is distant from the active site. This NTD does not form NTD-NTD interactions in dodecameric or hexadecameric intasomes and shows a range of ɑ1 helical structures: disordered, partially unstructured, and extended [23]. Intasomes of the closely related simian immunodeficiency virus (SIV) were prepared with IN purified using an N-terminal affinity tag and human rhinovirus 3C protease cleavage, leaving three residues (G-P-G-) preceding F1 [22]. The NTDs in these structures show extended ɑ1 helices.
In this paper, we report a purification scheme for wild-type IN with phenylalanine as the N-terminal residue (IN 1F), and the associated alterations in N-terminal structure and IN function. IN 1F is purified with an N-terminal affinity tag that, when removed, leaves phenylalanine at position 1. We report a crystal structure of the two-domain IN 1F NTD-CCD that shows a continuous helical fold beginning with the backbone carbonyl of F1, in contrast to the existing IN GSH NTD-CCD structure [66]. IN 1F also shows greater concerted integration activity in vitro than IN GSH and IN MF. IN 1F is altered in its sensitivity to inhibitors, showing decreased sensitivity to the strand-transfer inhibitor raltegravir and increased sensitivity to allosteric integrase inhibitors (ALLINIs). Biophysical characterization reveals that IN 1F has oligomeric properties distinct from those of previously studied recombinant IN constructs. We propose that HIV-1 IN 1F more closely recapitulates the structure and functions of IN found in authentic HIV infection.

Construction of IN expression vectors
The NL4-3 HIV-1 IN coding sequence was amplified by PCR, fused to an N-terminal His7-FLAG-SUMO tag using four-primer PCR, and cloned into a pCDFDuet expression vector. The fusion junction contains the sequence G-G-F, where cleavage by the SUMO protease Ulp1 occurs after the second glycine, liberating IN with a phenylalanine at position 1. IN GSH and IN MF were created by inserting additional codons preceding the native phenylalanine by inverse PCR (IN GSH) or site-directed mutagenesis (IN MF). IN 1F NTD-CCD F185K, W131D, F139D was constructed by truncation of the full-length construct and insertion of a synthetic cassette encoding the amino acid substitutions. The lens epithelium-derived growth factor (LEDGF) integrase binding domain (IBD; residues 347–471) was cloned into a pETDuet expression vector with the Mxe intein, a chitin-binding domain, and a His6 tag as previously described [67].

Protein expression and purification
IN constructs were expressed as previously described, with some modifications [61,64,67–69]. Expression plasmids were transformed into E. coli BL21(DE3), and cultures were grown in 800 mL of 2×YT at 37 °C to an optical density of 1.8–2.2. Expression was induced by addition of isopropyl β-D-1-thiogalactopyranoside (IPTG) and allowed to continue for 5 h at 20 °C. Bacteria were then pelleted and frozen at −80 °C.
Diffraction data were reduced with DIALS [72]. Molecular replacement, refinement, and the generation of simulated annealing omit maps were carried out in Phenix [73]. The structure was solved by molecular replacement using 1K6Y as a search model. The asymmetric unit contained four monomers (each with a Zn²⁺ ion, a K⁺ ion, and a phosphate ion) and 226 water molecules. The structure was refined to R and Rfree values of 22.5% and 25.3%, respectively. Molecular models were visualized with PyMOL [74], and secondary structure was analyzed with Define Secondary Structure of Proteins (DSSP) [75,76].
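The Results distinguish a hydrogen-bonded turn from a canonical alpha helix at F1, a distinction made with DSSP. As a rough sketch of the rule involved, following the published DSSP definitions but restricted to alpha-type i→i+4 backbone hydrogen bonds (3₁₀ and π helices, bridges, and strands are ignored), the function below labels residues from a hypothetical list of hydrogen bonds. It is not the DSSP program; `assign_alpha` and its inputs are illustrative only, not data from this structure.

```python
def assign_alpha(n_res, hbonds):
    """Simplified DSSP-style labels using only i -> i+4 backbone hydrogen bonds.

    n_res:  number of residues, numbered 1..n_res
    hbonds: set of (i, j) pairs meaning the C=O of residue i accepts a hydrogen
            bond from the N-H of residue j (hypothetical input)
    Returns one character per residue: 'H' (alpha helix), 'T' (turn) or '-'.
    """
    # A 4-turn at residue i exists when C=O(i) is bonded to N-H(i+4)
    turn = [False] * (n_res + 1)  # index 0 unused
    for i in range(1, n_res + 1):
        turn[i] = (i, i + 4) in hbonds

    ss = ["-"] * (n_res + 1)  # index 0 unused
    # Minimal helix: 4-turns at both i-1 and i mark residues i..i+3 as 'H'
    for i in range(2, n_res + 1):
        if turn[i - 1] and turn[i]:
            for k in range(i, min(i + 4, n_res + 1)):
                ss[k] = "H"
    # Residues enclosed by a 4-turn (i+1..i+3) that are not helical become 'T'
    for i in range(1, n_res + 1):
        if turn[i]:
            for k in range(i + 1, min(i + 4, n_res + 1)):
                if ss[k] != "H":
                    ss[k] = "T"
    return "".join(ss[1:])


# Hypothetical 10-residue N-terminal segments (residue 1 = F1)
continuous = {(1, 5), (2, 6), (3, 7), (4, 8), (5, 9), (6, 10)}
missing_2_to_6 = {(1, 5), (3, 7), (4, 8), (5, 9), (6, 10)}
print(assign_alpha(10, continuous))      # -HHHHHHHH-
print(assign_alpha(10, missing_2_to_6))  # -TTHHHHHH-
```

In the first pattern, the F1 carbonyl's bond is followed by consecutive i→i+4 bonds, so helix labels start at residue 2 (under this rule the residue whose carbonyl makes the first bond is never itself labeled H). Removing the single bond from residue 2 leaves residues 2 and 3 as a turn, with the helix starting at residue 4.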

Integrase 3′-processing assay
The 3′-processing assay was adapted from those described previously [77,78]. HIV integrase at 60 μM in 20 mM HEPES-NaOH pH 7.5, 1 M NaCl, 7 mM CHAPS, 10 mM DTT, and 10 μM Zn(OAc)₂ was diluted to a final assay concentration of 400 nM in 20 mM HEPES-NaOH pH 7.5, 100 nM Alexa Fluor 488-labeled LTR substrate, 50 mM NaCl, 10 mM MgCl₂ or MnCl₂, 10 μM Zn(OAc)₂, and 10 mM DTT. Reactions were incubated at 37 °C. SDS was added to a final concentration of 0.25% to stop the reaction and liberate the cleaved dinucleotide. After 15 min, fluorescence polarization was measured with a plate reader (Victor 3V, PerkinElmer). Significance was evaluated by two-way ANOVA, with P values reported from Tukey's multiple comparisons test. Data analysis was carried out in Prism (GraphPad).
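Because IN is added from a 60 μM stock in 1 M NaCl and 7 mM CHAPS, part of the final salt and detergent comes from the protein stock. Under simple C1V1 = C2V2 dilution, the 3′-processing assay (60 μM to 400 nM) is a 150-fold dilution carrying about 6.7 mM NaCl, and the strand-transfer assay described next (60 μM to 3 μM) is a 20-fold dilution carrying 50 mM NaCl. The text does not state whether the listed final NaCl concentrations include this carry-over; the sketch below only reproduces the arithmetic, and `carryover` is a hypothetical helper.

```python
def carryover(stock_uM, final_uM, storage_buffer_mM):
    """Dilution factor and the final concentration (mM) that each storage-buffer
    component contributes when protein is diluted from stock to final."""
    factor = stock_uM / final_uM
    contributed = {name: conc / factor for name, conc in storage_buffer_mM.items()}
    return factor, contributed


# IN storage buffer components listed in the Methods (mM)
storage = {"NaCl": 1000.0, "CHAPS": 7.0}

for assay, final_uM in [("3'-processing", 0.4), ("strand transfer", 3.0)]:
    factor, contributed = carryover(60.0, final_uM, storage)
    print(f"{assay}: {factor:.0f}-fold dilution, "
          f"NaCl {contributed['NaCl']:.1f} mM, CHAPS {contributed['CHAPS']:.2f} mM")
```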
For the strand-transfer assay, processed U5 LTR substrates with a 5′ Alexa Fluor 488 N-hydroxysuccinimide (NHS) ester label were prepared by annealing the following oligonucleotides (Integrated DNA Technologies): […]. HIV integrase at 60 μM in 20 mM HEPES-NaOH pH 7.5, 1 M NaCl, 7 mM CHAPS, 10 mM DTT, and 10 μM Zn(OAc)₂ was diluted to a final assay concentration of 3 μM in 20 mM HEPES-NaOH pH 7.5, 0.5 μM Alexa Fluor 488-labeled LTR substrate, 0.5 μM LEDGF IBD, 50–250 mM NaCl, 10 mM MgCl₂ or MnCl₂, and 10 μM Zn(OAc)₂. Final assay conditions were identical for IN 1F, IN GSH, and IN MF. After 30 min at 37 °C, 15 nM pUC19 plasmid was added. Reactions were carried out for 1–4 h at 37 °C and then quenched with 0.5% SDS, 15 mM EDTA, and 1 mg/mL proteinase K for 30 min at 37 °C. Reaction products were separated on 1.5% agarose gels in Tris-acetate buffer and imaged using a Typhoon imager (Amersham). Gels were then stained with ethidium bromide and imaged using a Gel Doc imager (Bio-Rad). Reaction products were quantified with ImageJ, and data analysis was carried out in Prism (GraphPad). Significance was evaluated by two-way ANOVA, with P values reported from Tukey's multiple comparisons test. Dose-response curves were fit in Prism (GraphPad) using a three-parameter logistic regression with the Hill slope fixed at −1. The integrase inhibitor raltegravir was a gift from Merck.
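The dose-response model named above, a three-parameter logistic with the Hill slope fixed at −1, can be written Y = Bottom + (Top − Bottom)/(1 + [I]/IC50). The sketch below is not Prism's fitting routine; it is a simple pure-Python least-squares fit that uses the fact that, for a fixed IC50, the model is linear in Top and Bottom, and scans log10(IC50) on a grid. The function names, grid limits, and synthetic data are hypothetical.

```python
def model(conc, top, bottom, ic50):
    """Three-parameter logistic, Hill slope fixed at -1: signal falls with [I]."""
    return bottom + (top - bottom) / (1.0 + conc / ic50)


def fit_ic50(concs, ys, log_lo=-4.0, log_hi=2.0, step=0.001):
    """Least-squares fit by grid search over log10(IC50).

    For each trial IC50 the model is linear in Top and Bottom,
    y = Top*g + Bottom*(1 - g) with g = 1 / (1 + c/IC50),
    so both are solved exactly from the 2x2 normal equations.
    """
    best = None
    n_steps = int(round((log_hi - log_lo) / step))
    for s in range(n_steps + 1):
        ic50 = 10.0 ** (log_lo + s * step)
        gv = [1.0 / (1.0 + c / ic50) for c in concs]
        hv = [1.0 - g for g in gv]
        sgg = sum(g * g for g in gv)
        shh = sum(h * h for h in hv)
        sgh = sum(g * h for g, h in zip(gv, hv))
        sgy = sum(g * y for g, y in zip(gv, ys))
        shy = sum(h * y for h, y in zip(hv, ys))
        det = sgg * shh - sgh * sgh
        if abs(det) < 1e-12:
            continue  # Top and Bottom cannot be separated at this IC50
        top = (sgy * shh - sgh * shy) / det
        bottom = (sgg * shy - sgh * sgy) / det
        sse = sum((y - model(c, top, bottom, ic50)) ** 2 for c, y in zip(concs, ys))
        if best is None or sse < best[0]:
            best = (sse, top, bottom, ic50)
    if best is None:
        raise ValueError("no identifiable fit on the IC50 grid")
    return best[1], best[2], best[3]


# Synthetic, noise-free data (hypothetical concentrations in uM)
concs = [0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1.0, 3.0, 10.0]
ys = [model(c, 100.0, 5.0, 0.1) for c in concs]
top, bottom, ic50 = fit_ic50(concs, ys)
print(f"Top {top:.1f}, Bottom {bottom:.1f}, IC50 {ic50:.3g} uM")
```

Fixing the Hill slope removes one free parameter, which stabilizes the IC50 estimate when the data do not define the steepness of the curve well; the grid step sets the resolution of the reported IC50.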

Aggregation assay for ALLINIs
Assays were performed as previously described [61,63], with some modifications. Final reaction conditions were 20 mM HEPES-NaOH pH 7.5, 15 μM IN, 250–1000 mM NaCl, 7 mM CHAPS, and 30 μM ALLINI. The ALLINIs BI-224436, BI-D, and CX04328 (HIV-1 integrase inhibitor 2) were purchased from MedChemExpress and dissolved in DMSO. Turbidity was measured after 20 min as the absorbance of the reaction solution at 405 nm in a plate reader (Victor 3V, PerkinElmer). Significance was evaluated by two-way ANOVA, with P values reported from Tukey's multiple comparisons test.

Sedimentation velocity analytical ultracentrifugation (SV-AUC)
SV-AUC experiments were performed at 25 °C with an XL-A analytical ultracentrifuge (Beckman Coulter) and a TiAn60 rotor, using two-channel charcoal-filled Epon centerpieces and quartz windows. Experiments were performed in 20 mM HEPES-NaOH pH 7.5, 1 M NaCl, 7 mM CHAPS, 10 μM Zn(OAc)₂, and 10 μM β-mercaptoethanol. Complete sedimentation velocity profiles were collected every 30 s for 200 boundaries at 40,000 rpm. Data were fit using the c(s) distribution model of the Lamm equation as implemented in SEDFIT [81]. After optimization of the meniscus position and fitting limits, the sedimentation coefficients and best-fit frictional ratio (f/f0) were determined by iterative least-squares analysis. Sedimentation coefficients were corrected to s20,w using the solvent density (ρ) and viscosity (η) calculated from the buffer composition with SEDNTERP [82].
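The standard correction to s20,w rescales the observed coefficient for solvent viscosity and buoyancy: s20,w = s_obs · (η/η20,w) · (1 − v̄ρ20,w)/(1 − v̄ρ). A minimal sketch, assuming v̄ is the same at the run temperature and at 20 °C and using approximate values for water at 20 °C (ρ ≈ 0.99823 g/mL, η ≈ 1.002 cP). The buffer density, viscosity, and s value in the usage line are placeholders, not SEDNTERP output for the buffer used here.

```python
WATER_20C_DENSITY = 0.99823    # g/mL
WATER_20C_VISCOSITY = 1.002    # cP (mPa*s)


def s20w(s_obs, vbar, rho, eta,
         rho_20w=WATER_20C_DENSITY, eta_20w=WATER_20C_VISCOSITY):
    """Correct an observed sedimentation coefficient to water at 20 C.

    s_obs:    observed sedimentation coefficient (S) in the experimental buffer
    vbar:     partial specific volume (mL/g), assumed temperature-independent
    rho, eta: buffer density (g/mL) and viscosity (cP) at the run temperature
    """
    viscosity_term = eta / eta_20w
    buoyancy_term = (1.0 - vbar * rho_20w) / (1.0 - vbar * rho)
    return s_obs * viscosity_term * buoyancy_term


# Placeholder inputs for illustration only
print(round(s20w(2.5, 0.74, 1.04, 0.98), 2))
```

A dense, high-salt buffer such as 1 M NaCl reduces the buoyancy term's denominator, so the correction raises s relative to the observed value even when the viscosity is close to that of water.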

Sedimentation equilibrium analytical ultracentrifugation (SE-AUC)
SE-AUC experiments were performed with an XL-A analytical ultracentrifuge (Beckman Coulter) and a TiAn60 rotor, using two-channel charcoal-filled Epon centerpieces and quartz windows. Data were collected at 4 °C with detection at 280 nm at multiple protein concentrations in 20 mM HEPES-NaOH pH 7.5, 1 M NaCl, 7 mM CHAPS, 10 μM Zn(OAc)₂, and 10 μM β-mercaptoethanol. Analyses were carried out in SEDPHAT [83] using global fits, with strict mass conservation, to data acquired at multiple speeds for each concentration. Error estimates for equilibrium constants were determined from a 1000-iteration Monte Carlo simulation. The partial specific volume (v̄), solvent density (ρ), and viscosity (η) were calculated from the chemical composition with SEDNTERP [82]. SE-AUC data are summarized in Table 2.
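For a monomer-dimer self-association of the kind fit globally here, the species distribution at a given loading concentration follows from mass conservation: with Kd = [M]²/[D] and total C = [M] + 2[D] in monomer units, [M] = (Kd/4)(√(1 + 8C/Kd) − 1). As a worked illustration, assuming Kd is defined this way, the sketch below applies the Kd values reported in the Results for IN 1F and IN MF at the 15 μM IN concentration of the aggregation assay. It ignores the differences in temperature and NaCl between the SE-AUC and aggregation conditions, so the numbers are arithmetic only; `monomer_dimer` is a hypothetical helper, not a SEDPHAT function.

```python
import math


def monomer_dimer(total_uM, kd_uM):
    """Free monomer [M] and dimer [D] (uM) for 2M <-> D with Kd = [M]^2/[D].

    total_uM is the loading concentration in monomer units: [M] + 2[D].
    """
    # Mass conservation gives 2[M]^2/Kd + [M] - total = 0; keep the positive root
    m = (kd_uM / 4.0) * (-1.0 + math.sqrt(1.0 + 8.0 * total_uM / kd_uM))
    d = m * m / kd_uM
    return m, d


total = 15.0  # uM, the IN concentration used in the ALLINI aggregation assay
for label, kd in [("IN 1F", 60.0), ("IN MF", 127.0)]:
    m, d = monomer_dimer(total, kd)
    print(f"{label}: [M] = {m:.1f} uM, [D] = {d:.1f} uM, "
          f"{2.0 * d / total:.0%} of protomers in dimers")
```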

Cloning and purification of HIV-1 integrase with a native N-terminus
To determine the biochemical and structural properties of HIV-1 IN with a phenylalanine at the N-terminus, we cloned NL4-3 IN into an expression vector encoding an N-terminal His7-FLAG-SUMO tag immediately preceding F1. The SUMO protease Ulp1 cleaves at a G-G-/-X motif, where / marks the cleavage site and X is any residue except proline [84]. Ulp1 cleavage at the sequence G-G-/-F therefore allows purification of wild-type IN with a native N-terminus (IN 1F) (Additional file 1: Figure S1). For comparison with IN bearing a non-native N-terminus, we inserted additional residues preceding F1. IN GSH contains the three residues (G-S-H) that remain after thrombin cleavage in the construct used to determine the structure of IN GSH NTD-CCD (PDB: 1K6Y) [66], and IN MF contains the N-terminal methionine found in constructs commonly used for bacterial overexpression [61,63,64]. After cleavage, a nickel-affinity step captures Ulp1 and the cleaved affinity tag, and subsequent size-exclusion chromatography yields a highly pure final product (Additional file 1: Figure S1).
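The G-G-/-X rule above can be illustrated with a toy string model. It deliberately ignores that Ulp1 recognizes the folded SUMO domain rather than a bare G-G dipeptide, and it simply cuts after the first G-G not followed by proline in a short junction string. `ulp1_product` and the junction strings are illustrative; FLDGI stands for IN residues F1 to I5 as numbered in this paper.

```python
def ulp1_product(junction):
    """Return the sequence released after the first G-G-/-X site (X != 'P'),
    or None if no such site exists. Toy model of the sequence rule only."""
    for i in range(len(junction) - 2):
        if junction[i:i + 2] == "GG" and junction[i + 2] != "P":
            return junction[i + 2:]
    return None


# Tag C-terminal "GG" followed by the first residues of each construct
for name, junction in [("IN 1F", "GGFLDGI"), ("IN GSH", "GGGSHFLDGI"),
                       ("IN MF", "GGMFLDGI")]:
    print(name, ulp1_product(junction))
```

The same cleavage step yields F1 for IN 1F but leaves G-S-H or M ahead of F1 for the comparison constructs, which is how the three variants differ only at the N-terminus.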

Crystallization of an IN 1F NTD-CCD derivative
To investigate structural differences between IN 1F and IN GSH, we created an IN 1F NTD-CCD construct containing the same solubility-enhancing substitutions (W131D, F139D, and F185K) used to determine the structure of IN GSH NTD-CCD [66]. Affinity purification, Ulp1 cleavage, and size-exclusion chromatography yielded a highly pure final product (Additional file 1: Figure S1) that crystallized readily as described previously [66]. The structure was solved by molecular replacement using the existing NTD-CCD structure (PDB: 1K6Y) as a search model. Four copies each of the NTD and the CCD were present in the asymmetric unit (Fig. 1a), with the interdomain linker (residues 47–55) unresolved in the electron density. In the IN GSH NTD-CCD structure, each NTD is assigned to a "distal" position relative to the CCD (Additional file 2: Figure S2). However, in the crystal structure of the HIV-2 IN NTD-CCD in complex with the LEDGF IBD (PDB: 3F9K) [85], the interdomain linker is well defined in the electron density and places the NTDs in a "proximal" position relative to the CCD (Additional file 2: Figure S2). This is also the favored NTD position in small-angle X-ray scattering (SAXS) analysis of IN NTD-CCD coexpressed with the LEDGF IBD [64]. In the IN 1F NTD-CCD structure, the unresolved 10-residue linker would be long enough to span the unobstructed distance of 28.7–31.8 Å required to place the NTDs in a "proximal" position. We have therefore assigned the NTDs to the "proximal" orientation relative to the CCDs, as observed in the HIV-2 IN NTD-CCD structure (Fig. 1a). Crystallographic statistics are summarized in Table 1.
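A quick plausibility check on the linker argument, assuming the quoted 28.7–31.8 Å is measured between the Cα atoms flanking the unresolved segment and taking about 3.8 Å per Cα–Cα step as the fully extended limit. The helper name is hypothetical, and the bound overstates what a real linker can span, so the margin (roughly 42 Å versus 32 Å) is only a coarse check.

```python
CA_CA = 3.8  # Angstrom; approximate Calpha-Calpha distance across a trans peptide


def max_linker_span(n_unresolved):
    """Upper bound (Angstrom) on the distance between the resolved Calpha atoms
    flanking a gap of n unresolved residues: n + 1 Calpha-Calpha steps laid end
    to end. Real chains fall short of this fully extended bound."""
    return (n_unresolved + 1) * CA_CA


required = (28.7, 31.8)  # distance range quoted in the text (Angstrom)
span = max_linker_span(10)
print(f"max span {span:.1f} A; covers required range: {span >= required[1]}")
```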

Structure of the IN 1F NTD-CCD construct
The overall structure of IN 1F NTD-CCD is highly similar to that of IN GSH NTD-CCD (global RMSD 0.90 Å). A phosphate ion is found near the active site of each CCD. Each copy of the NTD folds into a three-helix motif that coordinates a Zn²⁺ ion with residues H12, H16, C40, and C43. A potassium ion is coordinated by the carbonyl oxygens of V37, A38, C40, and C43. Close inspection of the N-terminus reveals differences between IN 1F and IN GSH (Fig. 1b). Two N-terminal secondary structures are observed among the copies in the IN 1F asymmetric unit. In chains A and C, the ɑ1 helix begins as a hydrogen-bonded turn at the backbone carbonyl of F1, whereas in chains B and D, a canonical alpha helix begins at the backbone carbonyl of F1 (Fig. 1b, Additional file 3: Figure S3). In IN GSH, the ɑ1 helix does not begin until D3 because the L2 side chain is shifted by ~10 Å, accompanied by a ~4.6 Å displacement of the peptide backbone at L2 (Fig. 1b, c, Additional file 3: Figure S3). F1 occupies a similar position in IN 1F and IN GSH, where it caps a hydrophobic core in the NTD formed by I5, L28, P29, and V32. The orientation of the N-terminal amino group also differs between the two structures because of the backbone displacement at L2. In IN GSH, the N-terminal amino group points toward the C-terminal end of the ɑ2 helix, whereas in IN 1F it is flipped ~180° and points toward the ɑ3 helix of a neighboring NTD. The same NTD-NTD interface is observed in the dodecameric HIV-1, hexadecameric MVV, and SIV intasome structures [22][23][24][25], and the NTDs modeled at this position adopt extended ɑ1 helical structures in four of six structures (Additional file 3: Figure S3). The NTDs that do not form an NTD-NTD interface show a variety of structures: disordered, partially unstructured, and extended (Additional file 3: Figure S3).
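For reference, the global RMSD quoted above is the root-mean-square distance between paired atoms after optimal superposition, RMSD = √((1/N) Σ |aᵢ − bᵢ|²). The minimal sketch below computes only that final step on coordinates assumed to be already superposed and paired; it does not perform the superposition (for example, with the Kabsch algorithm) and does not reproduce the atom selection behind the 0.90 Å value, which the text does not specify. The coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD (same units as input) between paired atoms already superposed."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical paired Calpha coordinates (Angstrom)
a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
b = [(0.5, 0.0, 0.0), (3.8, 0.5, 0.0), (7.6, 0.0, 0.5)]
print(f"RMSD = {rmsd(a, b):.2f} A")  # RMSD = 0.50 A
```

Because a global RMSD averages over all paired atoms, a local change confined to a few N-terminal residues, such as the L2 backbone displacement, can coexist with a small overall value.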
Difference maps and simulated annealing omit maps calculated around the N-terminus of each protomer of the IN 1F NTD-CCD structure confirmed the observed differences between the N-termini of IN 1F and IN GSH (Fig. 1d, e).

Activity of IN 1F in vitro
IN carries out two catalytic functions, 3′-processing and strand transfer, both of which can be reproduced in vitro using fluorescently labeled oligonucleotides that mimic the viral long terminal repeat (LTR). To assay 3′-processing, we used a 3′-fluorescently labeled double-stranded oligonucleotide mimicking the viral LTR and monitored release of the terminal dinucleotide (5′-GT-3′) by fluorescence polarization [77]. The unprocessed oligonucleotide emits highly polarized fluorescence, whereas the dinucleotide released by IN cleavage emits fluorescence with low polarization. In the presence of either Mg²⁺ or Mn²⁺, IN 1F, IN GSH, and IN MF showed similar 3′-processing activities (Fig. 2).
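Fluorescence polarization is computed from the emission intensities parallel (I∥) and perpendicular (I⊥) to the polarized excitation, P = (I∥ − G·I⊥)/(I∥ + G·I⊥), where G is an instrument correction factor; results are commonly reported in millipolarization units (mP = 1000·P). The sketch below converts hypothetical plate-reader intensities and, as an illustrative assumption not described in the Methods, estimates the processed fraction by interpolating the related anisotropy r = (I∥ − G·I⊥)/(I∥ + 2G·I⊥) between substrate-only and fully processed controls. Function names and readings are hypothetical.

```python
def polarization_mP(i_par, i_perp, g=1.0):
    """Polarization in millipolarization units from parallel/perpendicular
    intensities; g is the instrument G-factor."""
    return 1000.0 * (i_par - g * i_perp) / (i_par + g * i_perp)


def anisotropy(i_par, i_perp, g=1.0):
    """Fluorescence anisotropy r from the same intensities."""
    return (i_par - g * i_perp) / (i_par + 2.0 * g * i_perp)


def fraction_released(r_obs, r_substrate, r_product):
    """Fraction of label present as free dinucleotide, by linear interpolation
    between substrate-only and fully processed controls. Assumes both species
    are equally bright, so that anisotropies combine linearly."""
    return (r_substrate - r_obs) / (r_substrate - r_product)


# Hypothetical intensity readings (arbitrary units)
r_sub = anisotropy(1300.0, 700.0)   # uncleaved substrate control
r_prod = anisotropy(1050.0, 950.0)  # fully processed control
r_obs = anisotropy(1150.0, 850.0)   # reaction well
print(polarization_mP(1150.0, 850.0), fraction_released(r_obs, r_sub, r_prod))
```

Anisotropy rather than polarization is used for the interpolation because anisotropy, not polarization, adds linearly across species weighted by their intensity fractions.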
To assay strand-transfer activity, we used 5′-fluorescently labeled oligonucleotides mimicking the viral LTR and a supercoiled plasmid mimicking nucleosomal DNA (Fig. 3a). Concerted integration of two viral LTRs by IN linearizes the supercoiled plasmid and incorporates the fluorescent label. Strand-transfer activity in the presence of Mg²⁺ or Mn²⁺ depended on NaCl concentration, with the highest level of concerted integration at 150 mM NaCl with Mg²⁺ and at 200–250 mM NaCl with Mn²⁺ (Additional file 4: Figure S4). Under identical assay conditions, IN 1F showed greater concerted integration activity, forming more two-LTR (concerted) integration products than IN GSH and IN MF at all time points measured (Fig. 3b, c). This difference was observed in the presence of either Mg²⁺ or Mn²⁺.
A partial reaction, integration of a single LTR oligonucleotide, relaxes the supercoiled plasmid and incorporates the fluorescent label. Quantification of the fluorescently tagged relaxed-circular plasmid therefore reports single-end integration activity. Single-end activity, measured as the formation of tagged circular products, was not improved in IN 1F compared with IN GSH or IN MF (Fig. 3c).
The strand-transfer inhibitor raltegravir inhibited both single-end and concerted integration by IN GSH and IN MF more potently than by IN 1F (Fig. 3d, Additional file 5: Figure S5). The IC50 […]

Response of IN 1F and IN MF to ALLINIs
The allosteric inhibitors of integrase (ALLINIs) [86][87][88] […] (Fig. 4). ALLINI-induced aggregation is NaCl-dependent, so we tested aggregation at NaCl concentrations from 250 mM to 1 M. At 1 M NaCl, none of the ALLINIs BI-224436 [87], BI-D [90], or CX04328 (Compound 6 from Christ et al. [86]) induced aggregation. At NaCl concentrations where ALLINI-induced aggregation was observed, ALLINIs induced equal or greater aggregation of IN 1F as […] (Fig. 5a). Sedimentation velocity analytical ultracentrifugation (SV-AUC) experiments performed at similar concentrations and temperatures confirmed the presence of monomers and dimers, showing two discrete species at ~2.8 S and ~4 S, respectively (Fig. 5b). Sedimentation equilibrium analytical ultracentrifugation (SE-AUC) analysis at 4 °C and similar concentrations also confirmed the presence of monomers and dimers, and global fitting of a monomer-dimer equilibrium yielded Kd values of 60 ± 6 µM and 127 ± 23 µM for IN 1F and IN MF, respectively (Fig. 5c, Table 2). A dimer-tetramer equilibrium could be fit only with a Kd > 1 mM. Therefore, no evidence of tetramers was observed by three biophysical methods, in contrast to prior studies performed under similar conditions with IN expression constructs bearing an N-terminal methionine [61,63,64] or N-terminal thrombin [42] or human rhinovirus 3C protease [13,85] cleavage sequences. The monomer-dimer behavior of IN 1F in solution also differs from that of IN with an N-terminal methionine, a C-terminal intein cleavage site, and the solubility-enhancing substitution F185H, which exists as a mixture of dimers and tetramers in solution [61,64] and is replication-competent in virus [91]. Introduction of the F185H substitution into IN 1F (IN 1F F185H) resulted in the formation of dimers and a spectrum of higher-order aggregates in solution, as determined by SEC-MALS (Additional file 6: Figure S6).
IN 1F F185H retained 3′-processing activity similar to that of IN 1F but showed significantly decreased single-end and concerted strand-transfer activity (Additional file 6: Figure S6), indicating an effect of the oligomeric state of IN on strand-transfer activity.

Discussion
In this paper, we report the construction and purification of IN with a native N-terminus (IN 1F). The crystal structure of IN 1F NTD-CCD reveals an extended ɑ1 helix starting at F1, whereas the ɑ1 helix of IN GSH NTD-CCD is shortened. Although the remainder of the structure shows little to no difference, this change at the N-terminus is sufficient to improve concerted integration activity. In contrast, 3′-processing and single-end integration activities were not affected. We also observed a change in sensitivity to IN-targeting antiretroviral drugs: IN 1F was less sensitive to the strand-transfer inhibitor (STI) raltegravir and more sensitive to ALLINI-induced aggregation. We suggest that IN 1F will be useful in future studies of IN function and response to inhibitors.

The zinc finger fold of the NTD is shared with other DNA-binding proteins [92][93][94], in which the residues homologous to IN positions 1–3 lie adjacent to the phosphate backbone of DNA. In retroviral intasomes, the NTD binds to the distal viral DNA ends [19, 23–26, 39, 40]. However, unlike other helix-turn-helix DNA-binding proteins, the NTD does not insert a helix into the major groove of DNA, and F1 is distant from the phosphate backbone. The effect of the N-terminal disruption in IN GSH and IN MF is unclear, because the change is not expected to disrupt a tetrameric intasome. In the hexadecameric maedi-visna virus intasome, however, two pairs of NTDs are closely oriented head-to-head [25], forming an NTD-NTD interface nearly identical to that observed in the IN 1F and IN GSH NTD-CCD structures [66]. This hydrophobic dimerization interface would involve significant contributions from F1, in contrast to the dimerization interface of the isolated NTD, which mainly involves the ɑ3 helix [3]. Additional N-terminal residues, such as the N-terminal Sso7d fusion in Sso7d-IN, could induce a steric clash at this interface [24].
It is possible that such a disruption explains the heterogeneous, poorly resolved higher-order intasomes reported in cryo-EM studies of HIV-1 Sso7d-IN intasomes [23,24]. Additionally, disruption of the ɑ1 helix could affect binding to LEDGF, as the NTD cooperates with the CCD in binding LEDGF [13,85].

[…] [61], and it is not immediately clear how the addition of N-terminal residues affects this process. Previously, we have shown that the NTD is dispensable for ALLINI-induced aggregation [63], although others have reported that constructs lacking the NTD are resistant to ALLINI-induced aggregation [95], suggesting that the NTD plays a role in modulating ALLINI-induced aggregation. In multiple structures [13,66,85], the NTD interacts with the CCD in a manner expected to clash with the CCD-CTD interactions observed in the ALLINI-induced IN polymer. An effect on competition between the NTD and CTD for CCD binding may explain the difference in ALLINI potency among IN 1F, IN GSH, and IN MF. Recently, IN tetramers have been implicated as the preferred target of ALLINIs [95], but we show that IN 1F, which is a mixture of monomers and dimers in solution, is aggregated by ALLINIs. However, aggregation is NaCl-dependent, and we have not determined the oligomeric state of IN 1F at lower NaCl concentrations. In the absence of ALLINI, IN GSH remains soluble at NaCl concentrations that lead to aggregation of IN 1F, demonstrating that additional N-terminal residues can improve solubility. This is consistent with the improved solubility of Sso7d-IN [51] and of PFV IN, which harbors an N-terminal extension domain [26,41]. Additional experiments are needed to determine the details of ALLINI-induced polymer initiation and propagation.
Wild-type IN 1F is a mixture of monomers and dimers in solution, which differs from previously reported IN preparations containing substitutions at F185 or additional N-terminal residues, which are mixtures of dimers and tetramers [13,43,61,64]. We found that the F185H substitution in the IN 1F background resulted in the formation of higher-order species in solution. NTD-CCD interactions between residues such as E11 and K186 have been shown to be important for tetramerization [13,95], and we have now shown that modification of the adjacent residue F185 affects oligomerization in the context of a native N-terminus (the association properties derived from the SE-AUC analysis are provided in Table 2). Notably, the construct used to solve the only HIV IN crystal structure with a naturally occurring F185, HIV-2 IN NTD-CCD coexpressed with the LEDGF IBD, was dimeric in solution [85]. In this structure, the interdomain linker is clearly resolved in the electron density, showing that the NTD contacts the CCD in a "proximal" orientation. This contrasts with the IN GSH NTD-CCD structure (PDB: 1K6Y), in which the interdomain linker is not resolved, short interdomain linkers are assigned, and each NTD is in a "distal" orientation [66]. The interdomain linker is also not resolved in our IN 1F NTD-CCD structure, but we favor longer interdomain linkers that position each NTD in a "proximal" orientation, as observed in the HIV-2 IN NTD-CCD-LEDGF co-crystal structure [85]. Additional work is needed to understand the effect of substitutions at F185 and K186 on NTD-CCD interactions in dimeric forms of IN.

Conclusions
HIV IN containing a native N-terminus adopts a distinct structural configuration, shows improved activity in vitro, and manifests altered sensitivity to inhibitors. Because it mimics the form of IN produced by proteolytic cleavage in the maturing virion, IN 1F provides an improved reagent for the study of IN activity in vitro and for use in antiviral drug development.