Characterization of the glutathione S‐transferase genes in the sand flies Phlebotomus papatasi and Lutzomyia longipalpis shows expansion of the novel glutathione S‐transferase xi (X) class

Abstract Leishmaniasis control often relies upon insecticidal control of phlebotomine sandfly vector populations. Such methods are vulnerable to the evolution of insecticide resistance via a range of molecular mechanisms. There is evidence that two major resistance mechanisms, target site insensitivity and metabolic resistance, have evolved in some sandfly populations and further genetic characterization of resistance would be useful to understand and combat it. To facilitate the study of the mechanisms of metabolic resistance, here we improved the annotation and characterized a major detoxification gene family, the glutathione‐s‐transferases (GST), in the genomes of two sand fly species: Phlebotomus papatasi and Lutzomyia longipalpis. The compositions of the GST gene family differ markedly from those of Aedes and Anopheles mosquitoes. Most strikingly, the xi (X) class of GSTs appears to have expanded in both sand fly genomes. Our results provide a basis for further studies of metabolic resistance mechanisms in these important disease vector species.


INTRODUCTION
The Leishmaniases, a group of neglected parasitic diseases caused by Leishmania parasites, are transmitted by a number of species of phlebotomine sand fly (Burza et al., 2018). Leishmaniases are neglected diseases, often associated with poverty, that collectively exert a large disease burden in over 100 countries. Disease control methods differ in different contexts, though a major pillar of Leishmaniasis control relies upon control of its sand fly vectors (Wilson et al., 2020). However, insecticidal control methods are vulnerable to the evolution of insecticide resistance, so understanding the mechanisms of resistance is important for identifying, tracking and tackling it.
Insecticide resistance can evolve in a number of ways. One is via mutations in the genes encoding the insecticides' target proteins that render the insecticides ineffective (so-called 'target site resistance').
Another is via increased metabolism and removal of the insecticide ('metabolic resistance'). Insecticides of the pyrethroid class (used in impregnated bednets and house spraying) and the organochlorine class (DDT, used in house spraying) both target the voltage-gated sodium channel (VGSC) on the surface of neurons, while carbamate and organophosphate class insecticides target the acetylcholinesterase enzyme. Metabolic resistance is commonly due to overexpression of members of large multi-gene families encoding detoxification enzymes such as cytochrome P450 monooxygenases (CYP), glutathione S-transferases (GST) and carboxylesterases. Both target site and metabolic resistance are widespread in insects (Ffrench-Constant, 2013) and threaten disease control programmes (Hemingway, 2018).
GSTs play roles in a number of endogenous processes such as hormone biosynthesis and intracellular transport, and they are critical in the detoxification of xenobiotic compounds (Ketterman et al., 2011).
GSTs can confer resistance by direct mechanismsmetabolizing and sequestering toxic compoundsand indirect mechanismsprotecting against oxidative stress caused by exposure to the insecticide (Ketterman et al., 2011;Pavlidi et al., 2018). The GSTs form a superfamily, with different GST classes categorized based on sequence similarity. Two broad groups of GSTsmicrosomal and cytosolicare found in insects. Among the cytosolic GSTs, classes omega (GSTO), sigma (GSTS), theta (GSTT), zeta (GSTZ) are found ubiquitously in animals. In addition, in insects and some other arthropods, the GST classes delta (GSTD) and epsilon (GSTE) are found (Enayati et al., 2005;Ketterman et al., 2011) and are often expanded to outnumber the other GST classes (Ranson et al., 2002). Two additional classes, xi (GSTX) and iota (GSTI), were identified (with 2 and 1 members, respectively) and suggested to be specific to Aedes and Anopheles mosquitoes (Lumjuan et al., 2007). Insecticide resistance has most commonly been associated with the insect-specific, cytosolic GSTD and GSTE classes (Enayati et al., 2005). Multiple lines of evidence have confirmed certain GSTs, such as the epsilon class GSTE2 in Anopheles and Aedes mosquitoes, as major drivers of insecticide resistance. Elevated expression of GSTE2 is seen in DDT-resistant populations of Anopheles funestus (Riveron et al., 2014;Weedall et al., 2019;Kouamo et al., 2021), Anopheles gambiae (Ranson et al., 2001) and Aedes aegypti (Lumjuan et al., 2005).
Population genetic analyses show GSTE loci evolving under strong directional selection in An. funestus (Weedall et al., 2020) and An.
Though little-studied compared with aedine and anopheline mosquitoes, insecticide resistance has been reported in sand fly populations subjected to long-term insecticide exposure. In India, Bangladesh and Nepal, where P. papatasi and P. argentipes transmit cutaneous and visceral Leishmaniasis, respectively, and DDT has been used since the 1950s in disease control programmes, DDT resistance has been reported in P. papatasi populations, reviewed by Dhiman & Yadav (2016). Target site mutations in the VGSC have been detected in P. argentipes in India (Dhiman & Yadav, 2016;Gomes et al., 2017;Sardar et al., 2018) and Sri Lanka (Pathirage et al., 2020), where elevated GST and esterase activity was also reported. Target site mutations in the VGSC have also been reported in a Phlebotomus papatasi population in Turkey (Fotakis et al., 2018). Reduced mortality to deltamethrin and permethrin was seen in sand flies in a region of Turkey with long-term exposure but full susceptibility in another region without exposure (Karakus et al., 2017). Similarly in Sudan, P. papatasi in regions of historical insecticide exposure showed T A B L E 1 Numbers of GST gene families identified in the two sand fly genomes compared with two mosquito species and the fruit fly D. melanogaster GST  One putative gene made by joining two annotated partial genes on different scaffolds. g Gene AAEL023181, on an unlocalized scaffold, is identical to AAEL010157 and may be allelic rather than paralogous.
To understand the evolution and molecular mechanisms of metabolic resistance in sand flies, it is important to first characterize the major detoxification enzyme families. An important first step in this effort is manual editing and curation of the annotation of these gene F I G U R E 2 GSTD gene clusters on D. melanogaster 3R (Dm3R), Ae. aegypti chromosome 1 (Aa1) and An. gambiae 2R (Ag2R), compared with P. papatasi (Pp) and L. longipalpis (Ll) GSTD genes. GSTD genes are shown in blue (negative strand) and red (positive strand), with their numbers shown above the genes. Approximate positions of the clusters on each chromosome arm are shown. Light grey arrows show non-GST flanking genes with no clear orthology to other flanking genes. Dark grey arrows indicate flanking genes with orthology to those in other species. Cyp = cytochrome P450 monooxygenase F I G U R E 1 GST complements of the sand flies, compared Anopheles gambiae, Aedes aegypti and Drosophila melanogaster. Stacked bar charts of counts of genes belonging to each of the GST classes. The most striking contrast between the sand flies and the other species is the absence of the GSTE class (orange) and the expansion of the GSTX class (pale green) families in genome assemblies (Weedall et al., 2015). Here, we defined one of these gene families, the glutathione s-transferases (GST), in the P. papatasi and L. longipalpis genomes and compared its composition with that seen in other disease vector species. The composition of the GST gene family is very different in the sand flies, with a major expansion of the insect-specific GSTX subfamily, which may influence resistance evolution in these species.

RESULTS
Glutathione S-Transferase (GST) proteins from the well-annotated mosquitoes Aedes aegypti and Anopheles gambiae were used to search the predicted proteomes of P. papatasi and L. longipalpis using BLASTp. Putative GST proteins were carefully inspected and compared with annotated GSTs to assess and manually edit their gene models and produce an improved annotation and an inventory of GST genes in each of the sand fly genomes. This resulted in 23-25 GST genes annotated in the P. papatasi genome and 44-47 in the L. longipalpis genome. Table 1 and Figure 1 summarize these results.
Full details of the gene edits and gene-wise information are shown in Table S1 (for P. papatasi), Table S2 (for L. longipalpis) and Appendix S1 (gene edits in both genomes).
The most striking differences between the sand fly genomes and those of fellow nematocera Aedes aegypti and Anopheles gambiae mosquitoes and the more distantly-related brachyceran Drosophila melanogaster fruit flies are the apparent complete absence of the epsilon class of GSTs from both sand fly genomes, the small number of delta GSTs in P. papatasi (though not L. longipalpis) and an apparent expansion of the GSTX class (absent from Drosophila melanogaster and present in only two copies in each mosquito species genome). The large difference between the two sandfly species in the overall number of GST genes, almost twice as many in L. longipalpis than in P. papatasi, is mostly due to the greater number of delta-and xi-class genes in L. longipalpis (Table 1, Figure 1). Each GST class is described in more detail below.
F I G U R E 3 GSTX gene clusters in the P. papatasi (Pp) and L. longipalpis (Ll) genomes. Species and scaffold ID or chromosome are shown on the left (L. longipalpis = Ll; P. papatasi = Pp; Ae. aegypti = Aa; An. gambiae = Ag). GSTX genes are shown in blue (negative strand) and red (positive strand). For the sand flies, the gene code (prior to editing) is shown below each gene (for presentation purposes, the beginning of each is not shown: PPAI for P. papatasi and LLOJ for L. longipalpis) and a single letter above to distinguish genes (e.g., 010531 refers to gene LLOJ010531, which was edited to form two genes: LLOJ010531_a and LLOJ010531_b). PPAI010871_e is shown in pink as a complete in-frame coding sequence could not be reconstructed for this gene. Light grey arrows show non-GST flanking genes with no clear orthology to other flanking genes. Dark grey arrows indicate flanking genes with orthology to those in other species. For example, orthologous RNA helicase and tyrosine transporter genes downstream of major GSTX clusters in the sand fly genomes indicate they occur in orthologous genomic regions (though the orientation and order of GSTX genes indicates gain and loss of genes in the cluster) GST delta, epsilon and xi The GST complements of D. melanogaster, Aedes aegypti and Anopheles gambiae are dominated by expanded GSTD and GSTE classes. These insect-specific classes tend to occur in gene clusters (Ranson et al., 2002) and members of these classes are commonly associated with insecticide resistance (Han et al., 2016;Hu et al., 2014;Lu et al., 2016;Kouamo et al., 2021;Lumjuan et al., 2005Lumjuan et al., , 2011Mitchell et al., 2014;Ranson et al., 2001;Riveron et al., 2014;Zhang et al., 2016).
No putative GSTE genes were identified in either the P. papatasi or the L. longipalpis genome. The best matches based on reciprocal BLASTp searches using GSTE proteins as queries were to proteins defined as GSTD and GSTX.
Six genes (PPAI001211, PPAI006595, LLOJ007285, LLOJ004462, LLOJ004461, LLOJ004460) were identified as putative GSTD. Exon numbers and predicted protein sizes varied greatly among these genes and the gene models were carefully inspected. For P. papatasi, the PPAI001211 gene model was left unchanged and PPAI006595 edited F I G U R E 4 Phylogeny of GST classes delta, epsilon and xi. Neighbour-joining phylogeny of GSTD, E and X proteins from P. papatasi (Pp), L. longipalpis (Ll), An. gambiae (Ag), Ae. aegypti (Aa) and D. melanogaster (Dm). Branches leading to the GSTDs are shown in green, to GSTEs in blue and to GSTXs in red. Protein isoforms are indicated by -RA, -RB and so forth (or a transcript ID for Drosophila). Gene IDs are included to distinguish proteins with the same names. Edited gene models in the sand flies are indicated by a, b, c and so forth. The tree shown is an NJ tree where gap-containing sites were removed. Bootstrap support values for the branches basal to the GSTX, E and D classes were 75%, 60% and 46%, respectively (84%, 91% and 84% when gap-containing sites were retained). A putative GSTD from the sand fly Phlebotomus argentipes is shown in bold. Its position in the tree indicates it is a GSTX to add two exons (new gene model denoted PPAI006595_a). For L. longipalpis, LLOJ007285 and LLOJ004462 were left unchanged. One gene (LLOJ004461) was very long and clearly consisted of multiple individual genes misanotated as a single gene. This was edited to split the gene and add and extend exons into seven complete individual genes (LLOJ004461_a-LLOJ004461_g). LLOJ004460 was edited and split into two incomplete genes (LLOJ004460_a, LLOJ004460_b) spanning sequencing gaps. LLOJ004460_a had an internal gap (spanning part of the second and third of its three exons). LLOJ004460_b had a gap at its 3 0 /C-terminal end that covered part of the third of the three exons. After this extensive manual editing (Appendix S1), two GSTD genes were annotated in the P. papatasi genome and 11 were annotated in the L. longipalpis genome (Table 1, Tables S1 and S2, Figure S1). Two GST genes originally annotated as 'unclassified' in Anopheles gambiae (Ding et al., 2003) and labelled 'GSTu2' and 'GSTu3' (the 'u' here represents 'unclassified' and is not to be confused with the plant-specific tau class GSTU proteins) were later designated as members of a novel, putatively mosquito-specific class, xi (X), in An.

F I G U R E 6
Intron-exon structure of GSTI proteins. An alignment of GSTI proteins is shown, with exon-exon junctions shown in bold and highlighted in yellow (including the amino acid immediately upstream and downstream of the junction). Three partial genes (LLOJ002711_a, _b and _c) from a poorly-resolved region of the L. longipalpis genome, that were identical at the amino acid level in their overlapping regions were merged to form a single gene for the alignment (LLOJ002711_a,b,c). Unknown amino acids (due to genome assembly gaps) are represented by 'X'. AgGSTI = AGAP000947 (originally annotated as GST unclassified 1, GSTU1); AaGSTI1 = AAEL011752. In the protein identifiers: aegypti proteins were GSTX and GSTD in D. melanogaster (Tables S1 and S2). This pattern indicates that D. melanogaster lacks GSTX entirely and that GSTD is the next-best match, as the percentage identity between sand fly GSTX and DmGSTD was only around 40% compared with around 60% for sand fly GSTD versus DmGSTD (Tables S1 and S2). GST iota, omega, sigma, theta, zeta and microsomal GSTs A GST originally annotated as 'unclassified' in Anopheles gambiae (Ding et al., 2003) and labelled 'GSTu1' (the 'u' here represents 'unclassified' and is not to be confused with the plant-specific tau class GSTU proteins) was later designated as a member of a novel class, iota, in An. gambiae and Aedes aegypti (Ding et al., 2003;Lumjuan et al., 2007). In the sand fly genomes, two genes (PPAI009870, LLOJ002711) were identified as putative GSTI. For L. longipalpis, LLOJ002711 was edited to form three incomplete genes (LLOJ002711_a -LLOJ002711_c). The poor quality of the genome assembly at this region (the gene or genes crosses two sequencing gaps) mean that it is possible that these are not three distinct genes, but may be fragments of a single gene that is not fully collapsed in the assembly. This is supported by the observation that the overlapping regions are identical at the amino acid level (Appendix S1).
The best match in the D. melanogaster genome was to a 'GSTcontaining FLYWCH zinc-finger protein', gfzf (FBgn0250732) (Dai et al., 2004). The percentage identity was considerably higher to this gene (around 76%) than to the next-best matches (around 44%). Two isoforms of the gfzf gene contain large exons upstream of the GST region that contain repeated FLYWCH zinc-finger domains. One of these isoforms (from transcript FBtr0091512) was used to search the proteomes and genomes of the two sand flies in order to try to extend the gene models in those species. However, neither BLASTp nor tBLASTn searches could identify the non-GST region of the gene. We therefore annotated the genes as GSTI, as defined by Lumjuan et al. (Lumjuan et al., 2007).
The GstI gene models differed between sand flies and mosquitoes ( Figure 6). Both mosquito and sand fly GSTI genes had two exons and one intron, but the intron occurred in a different place in the mosquito and sand fly genes.
Two genes (PPAI000142, LLOJ009136) were identified as putative GSTO. Both gene models were edited (Appendix S1), to alter the boundaries of exons (LLOJ009136_a) and, for PPAI000142_a, the first half of the gene was found, unannotated, on a different scaffold (AJVK01076482.1) by a tBLASTn search. After editing, each sand fly genome contained 1 GSTO gene like An. gambiae and Aedes aegypti but unlike D. melanogaster, which has 4 (Table 1, Tables S1 and S2).
The GstO gene models were largely the same between sand flies and F I G U R E 1 0 Intron-exon structure of GSTZ proteins. An alignment of GSTZ proteins is shown, with exon-exon junctions shown in bold and highlighted in yellow (including the amino acid immediately upstream and downstream of the junction and those spanning the intron shown in red). Two P. papatasi genes (PPAI006902_a and PPAI000943_a) were merged to form one putative gene (missing a middle exon). In each case, both protein isoforms (differing at the first exon/intron) are shown. Unknown amino acids (due to genome assembly gaps) are represented by 'X'. aegypti (AAEL009017) and An. gambiae (AGAP000761). The P. papatasi gene PPAI000341 was edited (to PPAI000341_a) to alter the length of its first exon, producing a two-exon gene. However, PPAI000341_a was shorter than other GSTT genes and its 3 0 end did not align well, suggesting misannotation and a possible missing third exon (a nearby downstream gap may have prevented this being found). Downstream of this gap, a partial sequence matching the end of a GSTT was found (PPAI000341_b) and PPAI000341_a and PPAI000341_b may be a single gene, but they could not be reconstructed into a good single gene model.
Overall, four GSTT genes were identified in the L. longipalpis genome and 2-3 in the P. papatsi genome (Table 1, Tables S1 and S2).
Three genes (PPAI006902, PPAI000943, LLOJ000305) were identified as putative GSTZ. Closer inspection (Appendix S1) suggested that PPAI006902 and PPAI000943 might be the start and end of the same gene (though missing an internal exon) on different scaffolds. LLOJ000305 was a complete gene. At the beginning of each gene, evidence of alternative splicing patterns could be seen allowing two protein isoforms to be reconstructed (as seen in Aedes and Anopheles orthologues). This suggested that a single GSTZ gene, encoding at least two protein isoforms, was present in each genome.
In both sand fly genomes, the number and position of the introns differed from those seen in both mosquitoes ( Figure 10).  (Table 1, Tables S1 and S2, Appendix S1). The gene models were largely the same between sand flies and mosquitoes ( Figure 11).

DISCUSSION
We characterized the glutathione S-transferase complement in the genomes of Phlebotomus papatasi and Lutzomyia longipalpis. From the study we drew two main conclusions: (i) accurate gene annotation in these species is difficult due to the fragmentary nature of the genome assemblies; (ii) the insect-specific GST complement of these sand flies differs markedly from those of mosquitoes and of Drosophila melanogaster, with no GSTE class and an expansion of the GSTX class.
Overall, 23-25 GST genes were identified in P. papatasi and 44-47 in L. longipalpis, both within the typical range for insects but quite different from one another. It is difficult to conclude definitively if this difference in numbers is due to a real biological difference, or is an artefact of the differing quality of the two genome assemblies. The L. longipalpis assembly (11,532 scaffolds; N50 = 85,093) is much more contiguous than that of P. papatasi (106,826 scaffolds; N50 = 27,956), which may affect the number of GST genes identified in each genome. Genes are more likely to go unannotated in, or to be missing from more fragmented assemblies, and during manual editing of the GST gene annotations, we saw a number of cases of genes running into unsequenced assembly gaps in scaffolds and some cases of genes split across two scaffolds. These factors all affect the quality of genome annotation, so our estimates of gene numbers are necessarily cautious ones. With this in mind, we did try to find additional unannotated GST genes (using tBLASTn searching of the genome assemblies), which did identify some partial genes, though not large numbers. We did not, for example, find any unannotated GSTD genes in P. papatasi in addition to the two we annotated, though it remains a possibility that they exist on parts of the genome not represented in the assembly. Nor did we find any evidence for GSTE genes in either genome, lending weight to the conclusion that this class has been lost from the sand fly lineage.
Extensive manual editing and curation of the annotation of key gene families is often required in draft genome assemblies and is an important prerequisite for further analysis such RNAseq-based gene expression profiling (Weedall et al., 2015). Here, the value of manual curation of the gene models and the draft status of the existing annotation are illustrated by the amount of editing that was required. In GSTEs play key roles in resistance to organochlorine, pyrethroid and organophosphate class insecticides in disease vector species including mosquitoes of the genera Anopheles and Aedes (Ranson et al., 2001;Lumjuan et al., 2005Lumjuan et al., , 2011Mitchell et al., 2014;Wilding et al., 2015;Kouamo et al., 2021;Riveron et al., 2014) and in agricultural pest species including the oriental fruit fly Bactrocera dorsalis (Hu et al., 2014;Lu et al., 2016), the Colorado potato beetle Leptinotarsa decemlineata (Han et al., 2016) and the tobacco cutworm moth Spodoptera litura (Zhang et al., 2016). GSTDs (two genes in P. papatsi and 11 in L. longipalpis) are also associated with resistance in many species, including B. dorsalis (Hu et al., 2014;Lu et al., 2016), L. decemlineata (Han et al., 2016), the diamondback moth Plutella xylostella (You et al., 2015), the red spider mite Tetranychus urticae (Pavlidi et al., 2015), the fungus gnat Bradysia odoriphaga (Tang et al., 2019;Tchouakui et al., 2019) and the mosquitoes Culex pipiens (Xu et al., 2017) and An. gambiae (Pavlidi et al., 2015;Isaacs et al., 2018). Given the importance of these GST classes, the apparent absence of GSTE from the genomes of the two sand fly species may affect their ability to evolve resistance to these insecticide classes, as might the apparent reduction in the size of the GSTD class in P. papatasi.
The fact that the GSTX class is expanded in both sand fly species suggests an adaptive role in sand fly evolution, yet their precise roles are not known, nor whether they may be involved in the detoxification of insecticides. It is interesting to speculate on whether GSTX may perform in the sand flies the roles performed by the related GSTE and GSTD classes in mosquitoes and other insects. The roles of the two GSTX genes (GSTX1 and GSTX2) of Aedes aegypti and Anopheles gambiae in metabolic insecticide resistance are unclear (Grant and Hammock, 1992;Lumjuan et al., 2007). While overexpression of GSTX2 (called GST-2 in that study) in Ae. aegypti from South America was associated with DDT resistance (Grant and Hammock, 1992), a subsequent study found no catalytic activity of GSTX2 on DDT (Lumjuan et al., 2007). That study did, however, identify hematin binding activity in GSTX2, not previously seen in other GSTs, and the authors suggested a possible protective role of GSTX in blood-feeding insects, such as the reduction of heme toxicity after a blood meal.
A recent study on the sand fly vector of visceral Leishmaniasis, Phlebotomus argentipes, from Bihar, India reported the amplification, cloning and enzymatic characterization of a GST reported to be a GSTD . P. argentipes populations have evolved DDT resistance in this region  and target site resistance mutations have been reported (Gomes et al., 2017) but no metabolic resistance mechanisms. Importantly in this regard, Hassan et al.  showed that this protein could metabolize DDT in vitro, making it a key metabolic resistance candidate. Based on the GST family characterization work we report here, we found the best matches to this protein in P. papatasi and L. longipalpis are the class we have defined here as GSTX. Therefore, this would indicate that a P. argentipes GSTX can metabolize an insecticide, making proteins of this novel GST class key candidates for further study of metabolic resistance in sand flies. Further work is needed to functionally characterize the sand fly GSTX genes. This would include the full range of complementary approaches applied to the GST genes of other species, including in vitro characterization of enzyme activity (Lumjuan et al., 2005;Riveron et al., 2014;Mitchell et al., 2014), in vivo heterologous protein expression in a model organism (Riveron et al., 2014;Mitchell et al., 2014), gene silencing in vivo (Kouamo et al., 2021;Lumjuan et al., 2011), protein structure analysis and 3D modelling (Wang et al., 2008;Riveron et al., 2014;Mitchell et al., 2014) and gene expression and population genetic analyses (Riveron et al., 2014;Weedall et al., 2019;Kouamo et al., 2021;Ranson et al., 2001;Lumjuan et al., 2005;Weedall et al., 2020; Anopheles gambiae 1000 Genomes Consortium, 2017). Together, these would allow us to further elucidate the biological roles of the novel GSTX class and its potential to confer insecticide resistance in sand flies.
The work presented here, alongside ongoing work to define the cytochrome P450 monooxygenase and carboxylesterase families in these genomes (in preparation), will provide a basis for further study of metabolic resistance mechanisms in these important vectors of Leishmaniasis.

Experimental procedures
Two well-annotated mosquito species, Aedes aegypti (Liverpool strain) (Matthews et al., 2018) and Anopheles gambiae (PEST strain) (Holt et al., 2002), were used to generate initial GST protein lists to query the predicted proteomes of two sand fly species, Phlebotomus papatasi (Israel strain) and Lutzomyia longipalpis (Jacobina strain), in VectorBase (Megy et al., 2012) (https://www.vectorbase.org/). This was done using BLASTp with default parameters and search results defined initial gene family candidate lists for each sandfly species.
These lists were filtered by reciprocal BLASTp searching against Ae.
aegypti, An. gambiae and D. melanogaster predicted proteomes, retaining genes with best matches to GSTs and removing genes with best matches to other gene families.
Protein lengths and exons numbers were recorded for candidate GST genes in sand flies and for their best matches in the mosquito species and D. melanogaster. Where sand fly and mosquito homologues showed strikingly different gene length or exon number, misannotation of the sandfly gene was suspected and gene models were manually edited. This process was guided by sequence alignments of the sand fly, mosquito and fruit fly genes and proteins made using the MUSCLE algorithm (Edgar, 2004) implemented in Seaview version 4.5.4 (Gouy, Guindon and Gascuel, 2010;Petrella et al., 2015). These alignments helped identify missing or truncated exons in the sand fly genomes, which were edited and new gene models checked using the BLASTp was used to determine the best matches of these edited sandfly genes in other insect species and this information was used to assign each GST to a class. In addition to protein versus protein BLASTp searches, sand fly DNA versus non-sand fly protein BLASTx searching was used to identify unannotated sand fly exons, and nonsand fly GST protein versus sand fly genomic DNA tBLASTn searches to locate unannotated genes. All BLAST searches were carried out in vEuPathDB/Vectorbase (Megy et al., 2012) (https://www.vectorbase. org/).

SUPPORTING INFORMATION
Additional supporting information may be found in the online version of the article at the publisher's website.  Table S1. Glutathione S-Transferase genes in Phlebotomus papatasi.
The gene ID, protein length (amino acids; aa) and number of exons (protein coding exons only; exons in UTR excluded) are shown for P.
papatasi genes and their best matching genes in Aedes aegypti, Anopheles gambiae and Drosophila melanogaster. Gene IDs followed by a letter (e.g., '_a') were manually edited from the original gene model.