Inferring inter- and intrachromosomal Dobzhansky-Muller incompatibilities from imbalanced segregation of recombinant haplotypes in hybrid populations

9 Dobzhansky-Muller incompatibilities (DMIs) are a major component of reproductive isolation between 10 species. DMIs imply negative epistasis, exposed when two diverged populations hybridize. Mapping the 11 locations of DMIs has largely relied on classical genetic mapping, but these approaches are stymied by 12 low power and the challenge of identifying DMI loci on the same chromosome, because strong initial 13 linkage of parental haplotypes weakens statistical tests. Here, we propose new statistics to infer negative 14 epistasis from haplotype frequencies in hybrid populations. When two divergent populations hybridize, 15 the variance of two-locus heterozygosity decreases faster with time at DMI loci than at random pairs of 16 loci. If two populations hybridize at near-even admixture proportions, the deviation of the observed 17 variance from its expectation is negative, which enables us to detect signals of intermediate to strong 18 negative epistasis both within and between chromosomes. When the initial proportion of the two parental 19 populations is uneven, only strong DMIs can be detected with our method, unless migration reintroduces 20 haplotypes from the minor parental population. We use the two new statistics to infer candidate DMIs 21 from three hybrid populations of swordtail fish. We identify numerous new DMI candidates some of 22 which are inferred to interact with several loci within and between chromosomes. Moreover, we discuss 23 our results in the context of an expected enrichment in intrachromosomal over interchromosomal DMIs. 24


Introduction 25
Hybrids between diverged populations often suffer from a fitness disadvantage caused by genetic 26 incompatibilities between two or more loci, called (Bateson-)Dobzhansky-Muller incompatibilities, 27 (DMIs). As a result, hybrids may be less fit than their parental species, inviable, or sterile (reviewed in 28 Orr statistical power that is limited to detecting loci with strong selection against hybrids, e.g., in the case of 36 malignant melanoma that reduce viability in two pairs Xiphophorus species (reviewed in Orr and 37 Presgraves 2000; Powell et al. 2020). In another study, researchers were able to map multiple non-38 independent incompatibilities contributed to male sterility in a house mouse hybrid zone (Turner and Harr 39 2014). 40 Another commonly used approach to infer DMIs from hybrid genomes is to search for unexpected 41 statistical associations (linkage disequilibrium; LD) between physically unlinked regions of the genome. 42 Studies of DMIs that use LD signals between physically unlinked loci (interchromosomal DMIs) have 43 taken advantage of data from simulations and experiments (e.g., Arabidopsis thaliana, Simon et al. 2008, 44 and yeast, Li  Another shortcoming of LD-based statistics is that they are not well suited for detecting interacting loci 50 within chromosomes (intrachromosomal DMIs) in the absence of highly accurate recombination maps. To 51 detect intrachromosomal interactions with LD-based methods, researchers would need to know the exact 52 recombination rates between loci before they could distinguish expected LD caused by physical linkage 53 from excess LD caused by DMIs. The authors of this study are not aware of any existing method that 54 infers both inter-and intrachromosomal DMIs from hybrid genomic data. 55 Although intrachromosomal DMIs have not been well-studied in the hybrid incompatibility literature, 56 several lines of evidence suggest that they are at least as common as interchromosomal DMIs. First, from 57 a mechanistic perspective, gene order is not random in the genome. Genes tend to cluster on the same 58 chromosome when they are co-expressed, co-regulated, included in one pathway, or part of a protein-59 protein complex (reviewed in Hurst 2004). When sequencing and protein precipitation technology (high-60 C, micro-C) were combined to profile the physical interactions (e.g., interactions mediated by enhancers, 61 insulators, cohesion binding sites etc.) between DNA sequences, researchers found that most of the 62 interactions were intrachromosomal, manifesting as chromosome territories and topological associated 63 domains (reviewed in Szabo et al. 2019). This genomic architecture of gene regulation suggests that we 64 may expect DMIs within chromosomes may be more common than DMIs between chromosomes from 65 first principles. Martin et al. 2019, Chaturvedi 2020). One potential consequence is that that high recombination rates 71 disassociate incompatible loci more quickly, whereas tight linkage may facilitate speciation in the face of 72 gene flow (gene coupling, e.g., Bay et al. 2017, Byers et al. 2020; inversions, e.g., Noor et al. 2001). In 73 line with this argument, theoretical studies indicated that hybrid populations isolated from their parent 74 populations are most likely to develop into isolated species (homoploid hybrid speciation) with 75 intermediate recombination between DMI loci (Blanckaert and Bank 2018). Given the predicted 76 importance of genome organization in adaptation and speciation, tools to identify intrachromosomal 77 DMIs are needed to understand the architecture of reproductive isolation between species. 78 Here, we present a new statistical approach to close this key knowledge gap and to sensitively infer both 79 inter-and intrachromosomal DMIs. We report that the imbalanced haplotype frequencies caused by 80 selection against DMIs can lead to a unique pattern of LD and heterozygosity. We capture these patterns 81 in two statistics that we name (2) and Δ 2. We show that sensitivity and specificity of detecting DMIs 82 are high in certain demographic scenarios and apply our statistic to infer DMIs in three swordtail fish 83 hybrid populations. Based on this analysis, we find that DMIs are widespread and complex across hybrid 84 populations of swordtail fish. 85 Results 86

Reduced variance of two-locus heterozygosity indicates DMIs 87
Classical studies of multi-locus evolution revealed that the variance of multi-locus heterozygosity 88 increases with linkage disequilibrium (reviewed in Slatkin 2008). In the presence of population structure, 89 the normalized deviation of the variance of n-locus heterozygosity from its expected value, ( ), is 90 elevated (Sved 1968, Brown et al. 1980, Maynard Smith et al. 1993, see Methods). Here, we demonstrate 91 that a pairwise DMI, where n=2, changes (2) in the opposite direction (i.e., it creates a lower-than-92 expected variance of two-locus heterozygosity) and can be inferred by identifying negative (2) (i.e., the 93 two-locus version of ( )) along the genome, within a hybrid population. Importantly, this statistic 94 distinguishes LD that occurs due to physical linkage from the specific association of alleles as caused by 95 selection against a DMI. 96 To first understand the dynamics of (2) in the simplest scenario, we used a haploid toy model of a DMI 97 with recombination (Kimura 1965, see also Bank et al. 2012). Consider a haploid panmictic population 98 with two biallelic loci, A and B, with alleles a, A, and b, B, where aB and Ab represent the parental 99 haplotypes and a hybrid incompatibility exists between alleles A and B. We follow the haplotype 100 frequency dynamics in a model with discrete non-overlapping generations, ignoring genetic drift and 101 assuming that recombination occurs with probability c. We denote the fitness of the haplotypes by !" = 102 1, #" = 1 + , !$ = 1 + , #$ = (1 + )(1 + )(1 + ) ( Figure S1). Throughout the paper, we 103 assume that and are small positive numbers, representing a weak beneficial effect of A and B 104 (following Blanckaert & Bank 2018), and that − ≫ , (the incompatibility is much stronger than 105 directional selection). We use this toy model to analyze the dynamics of (2) and 2 %& (defined below) 106 upon formation of a new hybrid population composed of a fraction of aB parental individuals and a 107 fraction of 1 − of Ab parental individuals (i.e., f denotes the initial admixture proportion). 108 The toy model highlights unique features of haplotype frequency dynamics in new hybrid populations in 109 the presence of a DMI (see also Blanckaert and Bank 2018 for the corresponding diploid model).

110
Recombinant haplotypes ab and AB appear from the second generation onwards. At first, their frequency 111 sharply increases ( Figure S2) due to recombination. Then strong selection against the incompatibility sets 112 in, depleting the AB haplotype and giving the ab haplotype a marginal advantage because offspring of ab 113 individuals never harbor the incompatibility ( Figure S2). Thus, the strong incompatibility drives 114 haplotype frequency dynamics after recombinants appear. 115 The deviation of the variance of two-locus heterozygosity from its expectation under linkage equilibrium 116 , where * denotes the frequency of allele and %& 117 denotes the frequency of haplotype , and where we define { , } (see Methods). The normalized version of Δ '!( is X(2) = '!( / ( ). In a newly formed 119 panmictic hybrid population, (2) is initially positive because of the overrepresentation of parental 120 haplotypes as compared with recombinants. Importantly, (2) becomes negative if the haplotype 121 frequencies are imbalanced, for example if a lethal DMI causes absence of one recombinant haplotype 122 and overrepresentation of the other. Focusing on the two possible recombinants in the scenario described 123 above, this imbalance is captured by Δ 2 = 2 #$ − 2 !" . 124 125 Figure 1. Dynamics of (2) and Δ 2 = 2 !" − 2 #$ in the deterministic toy model for different 126 recombination rates and admixture proportions. The model parameters are = 0.001, = 0.002, = 127 −0.5. Black dots indicate the time with the largest − (2) or -Δ 2 (i.e., strongest DMI signal) in each 128 trajectory. The values of (2) and Δ 2 are represented by blue color intensity. A&B. Both (2) and 129 Δ 2 show more extreme values at intermediate recombination rates. Here, we assumed equal admixture 130 proportions. C&D. Negative (2) and negative Δ 2 show larger values when admixture proportions are 131 more even. In these two panels, the recombination rate between the two DMI loci is 0.1. 132 . CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made the signal is not detected until later generations when recombination rates are lower and admixture 140 proportions are more uneven. 141

Unique dynamics of − ( ) at DMI loci 142
We compared the dynamics of the (2) statistic at DMI loci in scenarios where the strength of epistasis 143 was strong ( = −0.5) relative to scenarios without epistasis (i.e., = 0). Without epistasis and with 144 weak direct selection for A and B ( = 0.001, = 0.002), both the recombinant frequencies and their 145 2 %& values are expected to be close to even and independent of the admixture proportion, because 146 recombinants are generated at equal proportions and because weak directional selection cannot drive 147 significant imbalance in the recombinant haplotypes ( Figure S3 A&B). Conversely, with a strong DMI, 148 segregation of the two recombinants is imbalanced, as is evident from the 2 %& trajectories of the two 149 recombinants ( Figure S3 C&D). Interestingly, without epistasis, (2) rapidly decreases and converges to   . CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted August 5, 2021. ; https://doi.org/10.1101/2021.08.04.454891 doi: bioRxiv preprint The signal of negative (2) generated by selection against a DMI is most pronounced when the 160 admixture proportions are even. In addition, -(2) and -Δ 2 are larger at low and intermediate 161 recombination rates (Figure 1 A&B). At the same time, the lower the recombination rate between the 162 interacting loci, the longer it takes for recombinants to emerge, resulting in a later appearance of negative 163 (2) (Figure 2). In theory, this phenomenon suggests that a DMI at any possible recombination distance 164 will be eventually detectable with our approach. An exception is when two loci have vanishingly small 165 recombination probabilities, such that direct selection and genetic drift undermine the signal of the DMI 166 in (2), because the DMI is purged from the population (by losing one of the interacting partners) before 167 detectable epistasis-induced recombinant imbalance appears. In practice, variation in recombination 168 probabilities will change the time period during which a DMI can be detected. 169 Regardless of the recombination rate, our toy model suggests that detecting a DMI in an isolated hybrid 170 population using (2) requires the initial admixture proportion to be near 0.5 (Figure 1 C&D, minor 171 parental frequency 0.3-0.5). Intuitively, this is because the recombinant imbalance Δ 2 (which is ⋅ 172 ) reaches its largest value at equal admixture proportions 173 since = #$ ⋅ !" − !$ ⋅ !$ is maximal when #" = !$ = 0.5. As the recombinant imbalance 174 becomes extreme, − (2) becomes larger. 175 We were interested in evaluating whether − (2) is diagnostic of epistatic selection or sensitive to 176 selection in general. Without epistasis, two alleles A and B can also generate recombinant imbalance 177 when they are under strong selection in the same direction (by means of Hill-Robertson interference; 178 Felsenstein 1974). To evaluate the dynamics of − (2) in this scenario, we assumed that A and B are 179 under strong negative selection and combine additively, such that haplotype AB is the least fit (but 180 without epistasis). Here, recombinant imbalance results from selection differences between AB and ab 181 ( Figure S1), which result in negative (2) ( Figure S4C) and 2. However, even with strong selection 182 (2) remains negative only for a few generations at the very early stage of hybridization (before 183 generation 10, Figure S4C). With a DMI, negative (2) persists for much longer. The stronger the DMI 184 (i.e., the larger the epistasis parameter ), the larger − (2) is observed (after generation 10, Figure 2 and 185 S4A). Altogether, non-epistatic strong selection results in lower maximum values of − (2) and 2 that 186 occur -and fade -much earlier than in the presence of a DMI ( Figure S4). 187

In simulated data − ( ) robustly indicates intra-and interchromosomal DMIs 188
To demonstrate that detection of intrachromosomal DMIs in genomic data is feasible, we used 189 simulations with SLiM (Haller and Messer 2019, Figure 3). We simulated evolution of a chromosome in 190 a hybrid population of 5000 individuals that is generated by a pulse of admixture between two parental 191 populations at equal frequencies. Two interacting loci generating a DMI were located at 10 centimorgan 192 (cM) and 20 cM such that the recombination rate between the two loci was 0.1. For these parameters, the 193 deterministic trajectory indicated that a DMI should be detectable within a time window that spanned 194 from generation 25 to 509, with the strongest signal (i.e., largest − (2)) at generation 58 ( Figure 2). 195 Figure 3 shows the distribution of normalized D, a commonly used measure of linkage disequilibrium, 196 ( ', see Methods), (2) and 2 at generation 50. At generation 50, parental linkage has mostly broken 197 down between pairs of distant loci, resulting in a peak at ′ = 0. + ≠ 0 only within small genetic 198 distances (Figure 3 A&D). Moreover, the distribution of (2) across all pairs of loci along the simulated 199 chromosome has decreased to near 0 by generation 50 (resulting in a peak of (2) at 0, Figure 3B). In 200 . CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted August 5, 2021. ; https://doi.org/10.1101/2021.08.04.454891 doi: bioRxiv preprint general, negative LD ( = !" ⋅ #$ − #" ⋅ !$ ) correlates with positive (2) ( Figure 3C). However, 201 the selected pair of loci can be pinpointed by scanning for negative (2). Negative ' is also observed 202 between the DMI loci ( Figure 3D&E). Importantly, whereas the ' signal is consistent with increased LD 203 -that may stem from residual linkage, population structure, or epistatic interaction ( Figure 3D) --(2) is 204 specifically associated with unexpected recombinant imbalance ( 2) caused by the DMI ( Figure 3E&F). Since other metrics used to detect DMIs are sensitive to demographic parameters, we next used SLiM to 216 explore whether the same was true for (2). We considered the effect of two demographic factors, 217 population expansion and migration, on the sensitivity of the (2) statistic to DMIs. Natural hybrid 218 populations may suffer from low fertility and high mortality in the early stages of hybridization, and the 219 hybrid population may occupy a new niche or geographical area. During the period in which hybrid 220 incompatibilities are purged, mean population fitness increases and the population may expand. To 221 explore the possible effects of this phenomenon, we performed simulations assuming that the hybrid 222 population would expand within 35 generations and arbitrarily assumed a 5% growth rate per generation. 223 . CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted August 5, 2021. ; https://doi.org/10.1101/2021.08.04.454891 doi: bioRxiv preprint We found that population expansion rarely affects the detection of a DMI ( individuals that derived at least 10% of their genome from each parental population. 235 We next considered continual immigration from one of the parental species into the hybrid population. As 236 shown in Figure 4, high migration rates reduce our ability to detect negative (2) when admixture 237 proportions are even. However, depending on the specific migration scenario, migration can aid in the 238 detection of a DMI. Generally, migration changes the haplotype frequencies and alters the time period 239 during which (2) is negative ( Figure S5). To quantify this effect, we studied three migration rates, no (2) in the presence of a DMI ( Figure 4A). Importantly, we find that migration from the minor parent 244 can increase the probability of detecting a DMI with (2). Whereas we had poor power to detect a DMI 245 in populations with an admixture proportion of 0.3 without migration, power improves substantially with 246 migration. If the minor parental population contributes most migrants, incompatible haplotypes will be 247 continuously formed by recombination between resident and immigrant types. This can result in the DMI 248 persisting in the hybrid population for a longer time period, which makes it possible to use (2) to detect 249 . CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted August 5, 2021. ; https://doi.org/10.1101/2021.08.04.454891 doi: bioRxiv preprint DMIs in scenarios of uneven admixture proportions ( Figure S5). We note that immigration of the major 250 parental population, in turn, is expected to reduce our ability to detect a DMI.

252
With strong migration ( = 0.01), the linkage of parental haplotypes remains strong due to the 253 continuous reintroduction of these haplotypes. In such a scenario, (2) remains larger than 0 and thus 254 makes it difficult to detect DMIs with this statistic ( Figure 4A). Here, strong migration acts like strong 255 selection for the immigrating parental two-locus haplotype, which hides the (still existing) recombinant 256 imbalance. This will eventually result in one parental haplotype taking over the population ( Figure 4B). builds up more slowly than in a model with non-overlapping generations ( Figure 5A). At the same time, 269 alleles A and B segregate in the population for much longer with overlapping generations than with non-270 overlapping generations ( Figure 5B). This leads to a longer maintenance time of the interacting loci, 271 resulting in a longer window in which − (2) can be used to detect the DMI.

273
Given these results, we next evaluated the sensitivity and specificity of (2) for DMI detection under a 274 range of models using non-overlapping generations ( Figure 6). We simulated four chromosomes, two of 275 which contained the DMI loci and two of which evolved neutrally. We computed sensitivity as the 276 . CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted August 5, 2021. ; https://doi.org/10.1101/2021.08.04.454891 doi: bioRxiv preprint proportion of simulations in which the DMI loci showed X(2) < −0.005 & ' < 0, and specificity as one 277 minus the proportion of simulations in which a pair of loci on the neutral chromosomes showed (2) < 278 −0.005 & ' < 0. We considered six scenarios by combining two admixture proportions ( = 0.5 and 279 = 0.3) and three immigration rates (0, 0.005, 0.01) of the minor parental population. We also compared 280 the performance of (2) to the G test which has previously been used to detect DMIs (Li and Zhang 281 2013). The recombination rate in these simulations was set to = 0.5 because the G test can only be 282 applied to unlinked loci. the DMI in the minor parental population. In all scenarios explored here, (2) has very high specificity. 304 . CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted August 5, 2021. ; https://doi.org/10.1101/2021.08.04.454891 doi: bioRxiv preprint In 300 simulations of a null model of neutral evolution, (2) was always greater than -0.001, where 305 negative (2) was caused by genetic drift and random sampling of 300 individuals for the analysis. 306 We compared the performance of (2) with that of the G test, which was identified as one of the best 307 statistics to detect interchromosomal DMIs (Li and Zhang 2013). Figure S6 shows that the G test is highly 308 sensitive in the early stages of hybridization, but its sensitivity decreases after generation 20. There is also 309 an apparent tradeoff between sensitivity and specificity. Interestingly, unlike for the G test there is no 310 tradeoff between sensitivity and specificity for (2). We found that whereas significant hits using the G 311 test had negative (2), the opposite was not true, especially when the parental linkage has broken down 312 from generation 20 onwards ( Figure 6, S6 and S8). Thus, (2) was sensitive to DMIs in scenarios in 313 which the G test was not. 314 Our analysis above relies on the use of phased genotyping data, which can be difficult to collect in 315 practice and can introduce inaccuracies via phasing errors. In contrast, pseudo-phased data is available for 316 many datasets, including the data analyzed below. In this case, only homozygous loci in each genome are 317 used for analyses where the phase can be inferred without computational phasing (e.g., genotyping data 318 reviewed in Davey et al. 2011). We calculated the sensitivity and specificity for the G test and (2) using 319 only homozygous genotypes and found that performance trends were similar to phased data with mildly 320 reduced sensitivities ( Figure S8). 321 We next applied our detection method to simulated genomic data from a range of more general scenarios 322 with one and two DMIs. Here, positions of the interacting loci were distributed randomly across four 323 chromosomes (resulting in both inter-and intrachromosomal DMIs), epistatic coefficients were drawn 324 from a uniform distribution ( ∈ [−1, −0.001]), and the direct selection coefficients and were drawn 325 from an exponential distribution with mean 0.001. We further simulated two neutral chromosomes 326 without DMI loci. 327 The proportion of pairs of loci that were incorrectly inferred as interacting partners (from here on referred 328 to as false positive rate) varied significantly between chromosomes with DMIs and neutral chromosomes. 329 For the neutral chromosomes, this proportion was almost 0 when (2) < −0.005 was used as a 330 threshold (Table S2 & S3) for both phased and pseudo-phased data. However, on chromosomes with 331 DMIs, upwards of 2% of pairwise comparisons (from a total of `, --) a = 79800 pairs, of which 2 pairs are 332 true DMI pairs, corresponding to 0.005% of pairs) were identified as putative DMIs based on the (2) 333 statistic (threshold (2) < −0.005) although the two loci were not involved in a DMI. As expected, both 334 true and false positive rates decreased with a more stringent (2) threshold (Table S1-S3). Comparing to 335 simulations with a single DMI, the false positive rate increased noticeably but not excessively when two 336 DMIs were simulated (from ~1% to a maximum of 2.2% when (2) < −0.005, Table S1 and S2), 337 indicating some interference between DMIs. When only homozygous loci (pseudo-phased data) were 338 used for the inference, the true positive rate remained similar, but the false positive rate was elevated to 339 3.6-6.1% (Table S2 and S3). leave a weaker signal that did not extend as far as the signal of strong DMIs. 354 False positives were most often caused by loci linked to DMIs (e.g., Figure 3E). Markers within 10 cM of 355 DMI loci showed a large number of significant (2) pairs in simulations of two DMIs that were 356 distributed randomly across four chromosomes ( Figure 7A). In the most extreme case observed in the 357 simulations, one DMI locus was involved in significant pairwise interactions with over 200 markers of a 358 total of 400 markers ( Figure 7A). This case involved strong epistasis for both DMI pairs ( ≈ −0.75), 359 . CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted August 5, 2021. ; https://doi.org/10.1101/2021.08.04.454891 doi: bioRxiv preprint and two DMI loci that were located very close to each other on the same chromosome (DMI 1 between 360 chromosome 1 (chr1) position 63 and chr4 position 100; DMI 2 between chr3 position 98 and chr4 361 position 92, with the extreme signal emitted from chr3 position 98). Thus, the extreme signal was likely 362 caused by interference between the two DMIs. At generation 30, the genomic window across which the 363 signal of interaction between two true DMI loci extends is large; the proportion of significant pairs when 364 only true DMI loci and their flanking regions were included in the calculation is high within 10 cM 365 (median ~35% vs. genomic median 1.1%, Figure 7B and Table S2), probably caused by hitchhiking or 366 sorting of large genomic blocks in hybrid populations due to the limited generation time and genetic drift. 367 In analyses of unphased genomic data, we can only rely on information from homozygous genotypes. 368 Therefore, we computed the same statistics for simulated data when only homozygous genotypes are 369 considered. Here, consistent with a higher false positive rate, the proportion of significant pairs between 370 DMI regions decreased more slowly with window size than when the complete data were considered 371 ( Figure S9C; see also Figure 7B). We then applied the strict filtering criteria necessary to clean the 372 genomic data from natural hybrid populations before analysis (see below and Methods), for which a 373 homozygous frequency of >0.05 and a frequency of heterozygotes <0.6 were required. Because most 374 strong DMIs are (nearly) purged by generation 30, few DMI loci pass the filtering, and the signal 375 therefore changes such that the highest proportion of significant pairs is observed for genomic windows 376 that span several cM around the true DMI ( Figure S9 B&C). Here, only the flanking regions of strong 377 DMIs (and not the DMI loci themselves) indicate the DMI. The proportion of significant pairs at up to 378 10cM distance from a true DMI locus is still higher than the genome-wide false positive rate (>35% vs. 379 genomic median up to 7.1%, Figure S9B and Table S3). Taken together, our analysis indicates that, 380 although greater precision could be obtained from phased data, our method shows good results for 381 pseudo-phased data. 382

Using ( ) to detect putative DMIs in swordtail fish hybrid populations 383
Two sibling species of swordtail fish, Xiphophorus birchmanni and X. malinche, were reported to 384 hybridize naturally in Mexico (Culumber et al. 2011, Schumer et al. 2014). Since these hybrid 385 populations formed recently and have been studied extensively, they represent an excellent opportunity 386 for using (2) and Δ 2 to infer DMIs from natural populations. Three hybrid populations called CHAF, 387 TOTO and TLMC were sampled and genotyped in previous work (  Figure S7). 398 We used (2), 2 and ′ to identify putative DMI pairs and to infer the most strongly depleted 399 genotypes in these three populations. To roughly count the DMIs in a fashion that is comparable between 400 . CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted August 5, 2021. ; https://doi.org/10.1101/2021.08.04.454891 doi: bioRxiv preprint populations, we sliced the genome into fragments of 1 Megabase (Mb), which we will refer to as a locus 401 in the following description. A putative DMI was identified if homozygous markers at this locus showed 402 (2) < −0.005 and + < 0 (see also Methods). Overall, the proportion of pairs of loci that fulfilled these 403 criteria was similar to the proportion observed in simulations of one or two DMIs (Table S3 and S5). To 404 increase our confidence in the candidates, we therefore performed bootstrapping to ensure that the locus 405 pair was consistently inferred to have (2) < −0.005 and + < 0 (Table S5, see methods). 406 In total, we detected 2203, 2039 and 1442 putative DMIs in the three hybrid populations CHAF, TLMC 407 and TOTO with false discovery rate <0.001 (15.2%, 12% and 6% of tests, respectively, Table S5). As 408 highlighted in the previous section, these numbers may include many genomic blocks that are close to 409 DMIs but not themselves carry a DMI locus (Figure 7 and S9, Table S5). Given the false positive rate 410 estimated from the simulations, we view this as a preliminary set of candidate DMIs for further 411 investigation. In each population, 200-400 bins (out of the total 711 bins, Table S5) were involved in at 412 least one putative DMI. If the interactions were randomly distributed across the genome, the occurrences 413 of the loci would fit a Poisson distribution ( = 2 ⋅ number of interactions / number of loci involved in 414 interactions). In contrast to this expectation, loci with very small and very large numbers of occurrences 415 were greatly overrepresented among the DMI candidates (One-sample Kolmogorov-Smirnov test, < 416 2.2 − 16, Figure S9). We speculate that the loci that appear in many putative DMI pairs are likely to be 417 true interaction partners in pairwise or complex DMIs, whereas many of the loci that occur only once or 418 twice may be noise caused by the true DMI loci. More than 99.5% of putative interactions involve a small 419 number of loci that are each involved in >10 interactions (54 in CHAF, 61 in TLMC and 66 in TOTO). 420 Among these putative interaction hubs, CHAF and TLMC share 8 loci, CHAF and TOTO share 2, and 421 TLMC and TOTO share 2 (Table S6). Altogether, these results strongly imply that a set of loci is 422 involved in complex interactions in the hybrid 423 Although our overall false positive rate is low, given the large number of tests performed in our 424 evaluation of the real data, we can be most confident in putative DMIs that are detected in multiple 425 populations. Across three hybrid populations we found 17 putative DMI pairs that were shared between 426 TLMC and CHAF, which are the two populations that have the same minor parental population (X. 427 birchmanni) and are therefore expected to share more detectable DMIs (Table S7). There were no shared 428 DMI pairs between the TOTO and the two other populations. Among all putative interaction partners, 14 429 loci were shared by three populations, out of which four were previously identified (Table S8). This set of 430 loci are exciting candidates for future work. 431 In comparisons between TLMC and CHAF, which share a minor parent species, two loci stood out with 432 respect to their many inferred interaction partners. Both chr20:23Mb (specifically, between 23Mb and 433 23.4Mb) and chr15:15M (specifically, between 15.6Mb and 16Mb) are putatively involved in 8 434 interchromosomal interactions. At both loci, alleles from X. malinche are incompatible with alleles at 435 several loci from X. Birchmanni (Table S7). Interestingly, there was no signal of an interaction between 436 these two loci. However, chr18:19Mb is putatively interacting with both chr20:23Mb and chr15:15Mb in 437 both TLMC and CHAF. X. malinche is the major parental population in TLMC and CHAF, leading to the 438 expectation that homozygous X. malinche alleles would be in the majority. In contrast to this expectation, 439 few alleles from X. malinche remained in these regions when combined with several X. birchmanni 440 backgrounds. In TOTO, where X. malinche is the minor parental population, many X. malinche alleles 441 were already lost from the population or are segregating at very low frequencies. This leads to a power 442 . CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made intrachromosomal interactions overlapped between any two populations. For example, Figure S11 shows 452 a putative intrachromosomal DMI in TLMC (chr20:9M hybrid and chr20:10M). In the Δ 2 heat plot of 453 this genomic region, the candidate recombinant from TLMC (composed of X. birchmanni ancestry at 454 chr20:9M and X. malinche ancestry at chr20:10M) also showed a weak signal of elimination in CHAF.

455
However, in TOTO, no such recombinant imbalance is visible. The lack of overlap could stem from the 456 (2) signal of intrachromosomal DMIs appearing later than that of interchromosomal DMIs, combined 457 with the different timescales of hybridization in the three populations, differences in power, or true 458 differences in the frequency of intra versus interchromosomal DMIs. 459 In addition to comparing our candidates with a list of previously identified DMIs (Table S6 and  CHAF, it appears that both homozygous hybrid haplotypes are depleted in a part of this region. Thus, the 467 Xmrk-cd97 interaction could be a hybrid incompatibility in which both recombinant haplotypes are 468 deleterious, in which case no negative (2) signal is expected (since the signal is driven by recombinant 469 imbalance as caused by classical DMIs). Regarding the potential DMI between Xmrk and cdkn2a/b, we 470 found negative (2) in all three hybrid populations that support this interaction ( Figure S12, Table S9). In 471 CHAF, (2) of 49.5% of marker pairs was negative but only 0.2% of these passed the threshold (2) < 472 −0.005. In TOTO, which is of intermediate age, 25.9% of all marker pairs showed (2) < −0.005, 473 which is much higher than the expectation obtained for a DMI from bootstrapping (median=5.7%, Table  474 S5). This is consistent with our expectation of the signal of an intermediate-strength DMI that does not 475 display strong negative (2) until later generations. 476

Discussion 477
DMIs play an important role in maintaining reproductive isolation between species. Despite their crucial 478 importance, existing methods to identify DMIs are severely underpowered and susceptible to high false 479 positive rates. Here, we show that the deviation of the variance of heterozygosity from its expectation, 480 (2), can allow us to identify intra-and interchromosomal DMIs in hybrid genomes. Specifically, a 481 negative (2) statistic is indicative of the presence of a DMI, because epistatic selection transiently leads 482 to an imbalance of the recombinant haplotypes. A second measure that we develop, Δ 2, quantifies this 483 . CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted August 5, 2021. ; https://doi.org/10.1101/2021.08.04.454891 doi: bioRxiv preprint recombinant imbalance more explicitly. After showing the expected sensitivity and specificity of these 484 statistics in simulations, we used (2), Δ 2 and ' to infer candidate DMIs from three hybrid 485 populations of swordtail fish. Although the detection power with these statistics is strongly dependent on 486 the age and demographic history of the hybrid population, false positive rates across simulations of 487 various scenarios were low and are expected to be especially low among interactions detected in multiple 488 populations. Encouraged by this result, we found that natural hybrid populations of swordtail fish 489 harbored many potential DMIs, and our analysis detected known DMI pairs. Interestingly, putative DMI 490 pairs tended to include the same loci more often than expected by chance, which is suggestive of the 491 presence of interaction hubs in these data. 492

DMIs cause distinctive patterns of LD and variance of heterozygosity 493
Linkage disequilibrium (LD) between physically unlinked loci has been frequently used to identify 494 candidate DMIs. However, LD is influenced by many factors including selection, population structure, 495 population differentiation, genetic drift, and epistasis. Sved first proposed the variance of heterozygosity 496 as a measure of multi-locus LD (Sved 1968). Generally, LD tends to raise the variance in heterozygosity 497 above its expected value at linkage equilibrium. This deviation from the expectation (called ( ), where 498 is the number of loci considered) was used to infer population structure within species, e.g., in natural 499 populations of Hordeum spontaneum and bacteria (Brown et al. 1980, Maynard Smith et al. 1993). These 500 studies assumed that the focal loci are independent, and that any disruption of independence would cause 501 LD and a positive deviation from the expected variance of heterozygosity (positive ( )). In this paper, 502 we show that selection against DMIs generates an exceptional pattern of LD and (2), where LD is 503 strong but (2) becomes negative. 504

DMI inference with ( ) does not rely on recombination rate estimates 505
One advantage of our statistics is that they do not rely on knowledge of recombination rates, which are 506 often unknown or estimated with error and may also differ between the parental and the hybrid 507 populations. For example, the protein PRDM9, which is involved in hybrid incompatibility in mice 508 (Mihola et al. 2009), specifies the locations of recombination hotspots in many mammalian species 509 (reviewed in Smukowski and Noor 2011, Penalba and Wolf 2020). Traditional methods of DMI detection 510 are strongly affected by recombination rate variation, whereas in the case of (2) it solely affects the 511 temporal scale over which DMIs are detected. At the same time, incorporating recombination rate 512 information into DMI detection with (2) is an exciting direction for future work. That is because if the 513 recombination map was known, our statistics could be incorporated in a simulation-based framework to 514 estimate not only the location but also the strength of selection against the DMI (i.e., the size of the 515 epistasis coefficient ). 516

Detection time windows differ between inter-and intrachromosomal DMIs 517
The ability to detect a DMI based on negative (2) and Δ 2 statistics is transient, and the time window 518 during which a DMI is detectable with these statistics depends on the demographic history of the 519 population, as well as the strength of the incompatibility and the recombination rate between the two loci 520 involved ( Figure 2). Specifically, the less recombination there is between two DMI loci and the weaker 521 the incompatibility, the longer it takes for their recombinant imbalance to be exposed. Moreover, since an 522 intrachromosomal (or weak interchromosomal) DMI is purged more slowly, it is detectable for a longer 523 . CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted August 5, 2021. ; https://doi.org/10.1101/2021.08.04.454891 doi: bioRxiv preprint time period, unless one of the interacting alleles is lost by other forces (such as genetic drift). This 524 interaction between population history and power to detect a DMI must be considered when interpreting 525 (2) results from real data. Interestingly, we find that certain migration scenarios improve power to 526 detect DMIs over long time periods. 527

No evidence for an enrichment in intrachromosomal DMIs in swordtail population data 528
Two lines of evidence led us to hypothesize that intrachromosomal DMIs may be more prevalent than 529 interchromosomal DMIs. From first principles, systems and molecular biology have established that 530 genes in the same biological pathway tend to cluster physically in the genome, and DMIs may be more 531 likely to arise from disruption or modification of such pathways (reviewed in Hurst 2004). This predicts 532 an enrichment in intrachromosomal DMIs under any scenario of speciation. Secondly, evolutionary 533 theory has predicted that tightly linked DMIs are more likely to accumulate and be maintained in the 534 presence of gene flow (Bank et al., 2012). Thus, both factors should contribute to an enrichment (or 535 reduced purging) of intrachromosomal DMIs. In contrast, in the set of DMI candidates inferred from 536 empirical data in this paper, interchromosomal DMIs were slightly overrepresented as compared to 537 intrachromosomal DMIs (<4% intrachromosomal DMIs compared with the expectation of 4.2% given an 538 average chromosome length 29.2 ± 4.63 Mb; permutation test, p < 0.02). However, given so many 539 unknowns about the true architecture of selection, we are currently not able to conclude whether this is a 540 biological result, or a pattern caused by the limits of our inference method combined with the 541 demographic history of the populations studied here. Although our method is the first that can be applied 542 to detect intrachromosomal DMIs, it is limited to pairs of loci that are at a sufficiently large distance on 543 the chromosome such that recombination occurs with appreciable frequency, thus making the 544 recombinants visible to selection. Therefore, we are likely to miss local interactions, such as those 545 between neighboring genes. Moreover, intrachromosomal DMIs are detected on a different timescale than 546 interchromosomal DMIs, making the two classes not directly comparable. Altogether, our analysis does 547 not allow for a firm statement about the true proportion of intra-to interchromosomal DMIs. 548

The challenge of inferring and understanding complex DMIs 549
We find based on simulations that our statistics are robust to various demographic scenarios such as 550 population expansions and overlapping generations. However, in the modeling and simulation parts of 551 this paper, we mainly study the dynamics of one or two DMI pairs. This is because considering multiple 552 or complex DMIs would generate an intractably large parameter space to explore. Thus, we implicitly 553 assume that DMIs are independent of other incompatible loci in the genome. However, our results 554 indicate that some DMI candidates are involved in many interactions. For example, one locus may 555 interact with multiple loci (e.g., Brideau  results also indicate an interaction beween Xmrk and a genomic block with cdkn2a/b in swordtail fish, 562 which might cause unfit phenotypes other than melanoma. Lu et al. 2020 reported that rad3d, a gene 563 upstream of cdkn2a/b, interacted with Xmrk mapped in hybrids between X. maculatus and X. helleri. 564 . CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted August 5, 2021. ; https://doi.org/10.1101/2021.08.04.454891 doi: bioRxiv preprint Taken together, this suggests the simultaneous presence of multiple hybrid incompatibilities caused by 565 Xmrk. This finding is not specific to the study system; recent GWAS and QTL mapping studies in a house 566 mouse hybrid zone also suggested multiple, non-independent genetic incompatibilities (Turner and Harr  567 2014, Turner et al. 2014). In addition to the presence of true hubs of hybrid incompatibilities, other types 568 of selection and interference could contribute to the signal at these loci. For example, there could be 569 interference between different DMIs (analogous to Hill-Roberson interference), or strong negative 570 interactions may cause local stratification of the genome (e.g., Pool 2015). 571 Moreover, it is important to note that the sensitivity of the (2) statistic is specific to the recombinant 572 imbalance caused by classical DMIs (Orr 1995, Gavrilets 1997, where only one recombinant haplotype 573 suffers from the incompatibility, whereas the other has similar fitness to the parental genotype. The 574 evolution of this type of hybrid incompatibility is easy to explain without invoking multi-locus 575 interactions (Orr 1995 and their overall role during hybridization and speciation. 581

Concluding remarks 582
In this paper, we take a step towards genome-wide identification of inter-and intrachromosomal DMIs, 583 providing a powerful new tool for researchers in speciation genetics. However, the complexity of 584 potential interactions in the genotype-fitness map of an organism is vast (Fragata et al. 2019), and DMIs 585 are only one specific kind of pairwise epistatic interaction that is characterized by the fitness deficit of 586 one recombinant genotype. Even if this type of (negative) epistasis between two loci is inferred, drawing 587 conclusions about the functional underpinnings of this epistasis is difficult. For example, Kryashimsky 588 (2021) recently presented theoretical models that demonstrate how epistasis can propagate across modules 589 of a metabolic network, with negative epistasis at lower levels possibly resulting in amplified negative 590 epistasis at a higher level, and positive epistasis at a lower level possibly turning into negative epistasis at 591 a higher level. These findings reinforce the expectation of wide-spread negative epistasis (including 592 DMIs) for fitness. Mapping and functionally understanding complex interactions and their consequences 593 for genome evolution will require new interdisciplinary approaches that combine the tools and expertise 594 from genomics, systems and molecular biology and population genetics in the future. 595

A toy model illustrates DMI dynamics. 597
We first consider a toy model of a single DMI pair, similar to Bank et al. 2012, that is purged from an 598 isolated hybrid population. Consider two divergent loci resulting in parental haplotypes aB and Ab 599 ( Figure S1). Hybridization and recombination generate two recombinant haplotypes, ab and AB. We 600 assume a DMI between A and B, and that direct selection acts on the incompatible alleles A and B. We 601 denote the fitness of the haplotypes as !" = 0, #" = , !$ = , #$ = . Throughout the manuscript, 602 we usually set these coefficients to = 0.001, = 0.002, = −0.5, representing weak direct selection 603 and a strong DMI. To gain an intuition of the purging process and the statistics, we first compare the 604 . CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted August 5, 2021. ; https://doi.org/10.1101/2021.08.04.454891 doi: bioRxiv preprint frequency dynamics with and without DMIs by assuming haploid hybrid populations with recombination 605 in a deterministic model. 606

Dynamics of the variance of heterozygosity for a two-locus DMI 607
In the model described above, we denote %& as the frequency of haplotype , where ∈ { , } and ∈ 608 { , }. The allele frequency of allele at locus A is % and the allele frequency of allele at locus B is & . 609 The expected heterozygosity ℎ at a locus is heterozygosity at two loci is the sum of the heterozygosity at A and B, 611 We define the two-locus variance of heterozygosity as in Sved 1968, summing over the contributions of 613 numbers of heterozygous loci (see Table S10), as 614 If there is no LD, the alleles that segregate at different loci in an individual are independent, so the 621 expected value of the variance of heterozygosity is ) 245 = ℎ + ℎ − ℎ ) − ℎ ) . 622 The deviation of the variance of heterozygosity from the independent expectation is therefore 623 Its relative deviation from the expectation is 625 which is the focal statistic of this paper. Similar to LD, (2) indicates the association between alleles at 627 two loci. The association is created by recombination, genetic drift, and negative epistasis between alleles. 628 The LD of recombinants is = !" ⋅ #$ − !$ ⋅ #" . 629 Normalizing D to D' we obtain 630 CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted August 5, 2021. ; https://doi.org/10.1101/2021.08.04.454891 doi: bioRxiv preprint where 632 We use D' as classical measure of LD throughout the manuscript. 635

quantifies imbalanced segregation of recombinant haplotypes 636
We also rewrote the deviation of the variance of heterozygosity from its expectation in the form of 637 haplotype frequencies ( %& ) and allele frequencies ( % , & ) 638 ) as a measure of the imbalance of haplotype frequencies, which is a 640 key indicator of a DMI. 641 In a deterministic model without selection, we expect the recombinant haplotypes to be generated at the 642 same frequency after hybridization. It follows that 2 !" and 2 #$ of recombinant haplotypes are 643 expected to be equal initially, independent of the initial admixture proportions. When there is a DMI, the 644 recombinant haplotype AB is depleted, and this expectation does not hold. We use the D2 difference 645 between the two recombinants to refer to DMI-indicating recombinant imbalance and from now on refer 646 to this as Δ 2 ≔ 2 #$ − 2 !" (Figure S3 A-D). can also be written as |1 − # − $ |, showing the relationship between the recombinant imbalance and 652 the elimination of the two alleles that interact negatively in the DMI. 653

Sensitivity and specificity 674
Before applying the statistics to real data, we used simulations under a Wright-Fisher model with variable 675 admixture proportions and migration rates to estimate the sensitivity and specificity of our method in 676 detecting DMIs (Figure 2 & S5). We designed these simulations to correspond to realistic conditions for 677 its application to swordtail fish data. According to empirical estimates, the migration rate m ranges from 678 <<0.01 to ~0.05 in the natural hybrid populations studied here. To roughly explore these dynamics in our Since (2) was always larger than -0.001 in the 300 simulations we performed on neutrally evolving 684 pairs, DMI detection criteria were set to (2) < −0.001 & ' < 0. ′ < 0 ensures that the depleted 685 haplotype is a recombinant but not a parental haplotype. The population size was 5000 in each 686 simulation, and the recombination rate was set to 0.5 f or each pair of loci. We simulated each scenario 687 100 times and randomly sampled 300 individuals for further analysis. Sensitivity and specificity were 688 calculated from generation 10 onwards. 689

False and true positive rates with one and two DMI pairs 690
To generalize the performance analyses, we simulated genomes with one DMI or two DMIs on two or 691 four chromosomes (chromosomes with DMIs are referred to as DMI chromosomes below). In the same 692 genome, two additional neutral chromosomes were simulated to estimate the false positive rate. One 693 hundred markers were simulated on each chromosome, and the population size was set to 5000. The 694 position of the DMI loci were distributed randomly on the DMI chromosomes such that DMIs could be 695 either inter-or interchromosomal. Epistatic coefficients were drawn from a uniform distribution ( ∈ 696 [−1, −0.001])). Direct selection coefficients were drawn from an exponential distribution (mean=0.001, 697 , ). Due to the large parameter space, we only considered two demographic scenarios, equal admixture 698 proportions without migration ( = 0.5, = 0) and uneven admixture proportions with high migration 699 rates ( = 0.3, = 0.01). We simulated each scenario 100 times, such that 100 DMIs in the one-DMI 700 scenario and 200 DMIs in the two-DMI scenario could be used to determine the true positive rates. We 701 also studied the distribution of the total number of retained two-locus haplotypes when only homozygous 702 loci were considered for varying sample sizes ( ∈ {100, 200, 300}) in the two-DMI model. At least 200 703 individuals had to be sampled to obtain >50 homozygous-homozygous haplotypes ( Figure S12). The 704 mean and 95% confidence interval of the false positive rate were estimated by resampling 2000 705 interactions from all pairwise statistics of 100 simulations 1000 times. 706 . CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted August 5, 2021. ; https://doi.org/10.1101/2021.08.04.454891 doi: bioRxiv preprint

Fish data availability, quality control and bias control 707
Previous work has described our approach for local ancestry inference in X. birchmanni x X. malinche 708 hybrids in detail , Schumer et al. 2020). Ancestry informative sites were defined based 709 on a combination of low and high coverage sequenced individuals from two allopatric X. birchmanni 710 (N=150) and two allopatric X. malinche populations (N=33). The difference in sequencing effort reflects 711 a large difference in genetic diversity and the expected number of segregating sites between the two 712 species (X. birchmanni p per basepair is 0.1%, X. malinche is 0.03%). Based on observed allele 713 frequencies at these sites, we excluded any ancestry informative site that had less than a 98% frequency 714 difference between the two samples. We used observed frequencies at these ancestry informative sites, in 715 combination with counts for each allele from low-coverage whole genome sequence data to infer local 716 ancestry using a hidden Markov model (HMM) based approach. The expected performance of this 717 approach was evaluated in simulations, on pure parental individuals not used in the definition of ancestry 718 informative sites and on early generation F1 and F2 hybrids where small switches in ancestry can 719 confidently be inferred to be errors. A subset of simulations also included genetic drift between the 720 reference panels and parental populations that form the hybrid populations (Schumer et al. 2020). Based 721 on these analyses we estimated error rates to be £0.2% errors per ancestry informative site in all tested 722 scenarios , Schumer et al. 2020). Thus, given high expected accuracy of ancestry 723 inference in X. birchmanni x X. malinche hybrids, we do not expect that errors will cause the patterns 724 observed in the empirical data from natural populations.

726
For convenience in downstream analysis, posterior probabilities from the HMM for ancestry state were 727 converted to hard calls. Sites with a posterior probability greater than >0.95 for a particular ancestry state 728 (homozygous X. malinche, heterozygous, homozygous X. birchmanni) were converted to a hard call for 729 that ancestry state. This resulted in 629112, 629582, and 629101 calls at ancestry informative sites for 730 analysis across populations. Individuals were excluded from this dataset if they were nearly pure parental 731 in their ancestry (e.g., <10% of genomes from either parental population). 732 733 To avoid biases, we filtered markers in the three populations independently using the same criteria. In 734 each population, we retained a locus for analysis if its frequency of each homozygous genotype 735 was >0.05, and if the locus was missing (or heterozygous) in fewer than 60% of individuals. Ancestry 736 informative sites were thinned by LD ( ) > 0.9) within a 10kb window, with 1-marker step size by 737 PLINK (v1.9, Chang et al. 2015). The thinning resulted in 18,188, 7,244, and 7,415 markers for CHAF, 738 TLMC, TOTO respectively. 739 740 We then sampled 700 markers and calculated the statistics (including (2), ', Δ 2 and haplotype 741 frequencies) pairwise in this data set. If (2) < −0.005 and ' < 0 (Table S1-S3), we considered a pair 742 of markers as a candidate DMI pair. To compare the DMI locations among bootstrap replicates and 743 between hybrid populations, we next split the genome into 1Mb bins (based on the reference genome 744 of X. maculatus; Schartl et al. 2013). This window size was inspired by our simulations, which showed 745 that the signal of selection at DMI loci could extend to 1-2Mb ( Figure S11). Any two bins that contained 746 pairwise interaction candidates were finally considered a putative DMI. We repeated this procedure 20 747 times. 748 749 . CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted August 5, 2021. ; https://doi.org/10.1101/2021.08.04.454891 doi: bioRxiv preprint