Genepop version 4.7.1
F. Rousset
This documentation: 19 January 2018
1 Introduction
1.1 Purpose
This is the documentation for the Genepop software, distributed both as stand-alone software and as an R package. Genepop implements a mixture of traditional methods and some more focused developments:
- It computes exact tests for Hardy-Weinberg equilibrium, for population differentiation and for genotypic disequilibrium among pairs of loci; 
- It computes estimates of \(F\)-statistics, null allele frequencies, allele size-based statistics for microsatellites, etc., and of number of immigrants by Barton & Slatkin’s 1986 private allele method; 
- It performs analyses of isolation by distance from pairwise comparisons of individuals or population samples, including confidence intervals for “neighborhood size”. 
A formal reference for the current version of Genepop is Rousset (2008). Likelihood methods based on coalescent algorithms are being developed in a separate program, Migraine (Rousset and Leblois 2007; Rousset and Leblois 2012; Leblois et al. 2014).
Genepop also converts data from the Genepop input format to the formats of some programs that were around in Genepop’s youth (Raymond and Rousset 1995b); there has been little need to update this option, as many more recent programs for population genetic analyses read input files in the Genepop format.
1.2 The two Genepop distributions
Genepop is now distributed both as an R package and as stand-alone software. See the Genepop distribution page for the latter. This documentation describes the use of the executable. The functionalities it describes are also available in an R session, through R functions described only in the package documentation.
1.3 Changes since version 4.0
Version 4.7.1
Genepop is now also distributed as an R package. It now uses the implementation of the Mersenne twister pseudo-random number generator found in recent C++ compilers. This has two implications. First, a recent compiler must be used, as described below. Second, test results of previous versions cannot be exactly replicated.
The format of a few file outputs has been modified (in particular the reporting of extreme values of some global tests).
Version 4.6
A bootstrap analysis of mean differentiation has been introduced, in particular to allow comparison of the mean differentiation observed over a given range of geographical distances, in intra vs. inter-ecotypic analyses. It can be called by the setting meanDifferentiationTest.
The Mantel test based on regression slope (not the one on ranks) did not appropriately handle cases where some pairwise data had to be excluded. This is corrected. Such cases concern in particular pairs of samples in the same location (e.g., pairs of individuals) when geographical distance is log-transformed, because the pairwise differentiation between such samples cannot be used for the computation of the regression. The bootstrap analyses were already handling this case correctly.
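The exclusion rule described above can be illustrated with a minimal sketch. It is not Genepop's implementation: the function names, the one-sided p-value, and the choice to permute sample labels in the genetic matrix are assumptions made for illustration. Pairs at zero geographic distance are dropped before the log transform, and the same set of usable pairs is kept in every permutation.

```python
import math
import random

def regression_slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

def usable_pairs(geo):
    """Pairs (i, j), j < i, whose geographic distance can be log-transformed."""
    n = len(geo)
    return [(i, j) for i in range(n) for j in range(i) if geo[i][j] > 0]

def slope_for_order(gen, geo, pairs, order):
    """Slope of genetic distance on log geographic distance, with sample
    labels of the genetic matrix relabelled according to `order`."""
    xs = [math.log(geo[i][j]) for i, j in pairs]
    ys = [gen[order[i]][order[j]] for i, j in pairs]
    return regression_slope(xs, ys)

def mantel_slope_test(gen, geo, n_perm=999, seed=1):
    """Return (observed slope, one-sided permutation p-value)."""
    rng = random.Random(seed)
    pairs = usable_pairs(geo)
    identity = list(range(len(geo)))
    observed = slope_for_order(gen, geo, pairs, identity)
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(n_perm):
        order = identity[:]
        rng.shuffle(order)
        if slope_for_order(gen, geo, pairs, order) >= observed:
            hits += 1
    return observed, hits / (n_perm + 1)
```

Both matrices are assumed to be square, symmetric lists of lists indexed by sample. Because the excluded pairs are determined by the geographic matrix alone, the explanatory values stay fixed across permutations and only the genetic distances are reshuffled.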
Version 4.5
A new keyword inter_all_types for setting “popTypeSelection” allows one to perform spatial regressions (but not Mantel tests) between all pairs of individuals or populations belonging to different types (e.g., individuals belonging to different patches, excluding pairwise statistics for pairs of individuals within patches).
Version 4.4
Mantel tests are by default no longer based on rank correlation. The older rank tests can be performed using the new MantelRankTest setting. In addition, a MaximalDistance setting has been added, affecting the computation of spatial regressions.
Version 4.3
Two new “miscellaneous” conversion options have been added: option 8.5 converts population data to individual data (as 8.4) but keeps the individual names (hence the geographic location of each individual); and option 8.6 randomly samples haploid data at diploid loci.
Version 4.2
One can now perform all isolation-by-distance analyses with a user-provided distance matrix instead of the geographic distance matrix computed from the coordinates of the samples (geoDistFile setting).
Version 4.1
It is possible to test trends in gene diversity among samples.
Analyses of isolation by distance have been strengthened in several ways. Variants of previously described estimators have been implemented for both haploid and diploid data. One can select subsets of the data for analyses of isolation by distance within and between these subsets. Further, analysis of isolation by distance from several one-locus genetic distance matrices is now possible through the MultiMigFile option. In contrast to IsolationFile, this allows the construction of bootstrap confidence intervals. Finally, it is possible to test specific values of the slope of the spatial regression, using the testPoint setting.
The input file reading procedure is better protected against nonstandard file formats (in particular those produced by some Microsoft software under Mac OS X).
The new sub-option 8.4 has been added to convert population-based data to individual-based data (each individual in its own Pop).
Version 4.0
Version 4.0 was a complete rewrite of the fossil version 3.4, with the following changes:
Use of the \(G\) (log likelihood ratio) statistic has been generalized to all contingency tables (though previous probability tests implemented in Genepop are still available). Genepop now provides bootstrap confidence intervals for strength of isolation by distance between groups of individuals, an alternative estimator for analyses of “differentiation between individuals”, and facilities to evaluate the performance of these methods. The genetic distance matrix produced by these options can also be exported in Phylip (Felsenstein 2005) format. The option for null allele estimation implements additional estimators with confidence intervals, and its output is better organized.
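As an illustration of the \(G\) statistic mentioned above, the following sketch computes \(G = 2 \sum O \ln(O/E)\) for a contingency table, with expected counts \(E\) taken from the row and column totals. It only evaluates the statistic; it does not reproduce how Genepop obtains p-values for its tests, and the function name is hypothetical.

```python
import math

def g_statistic(table):
    """Log likelihood ratio statistic for a contingency table.

    `table` is a list of rows of observed counts. Expected counts are
    row_total * col_total / n; empty cells contribute zero to the sum.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    g = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            if observed > 0:
                expected = row_totals[i] * col_totals[j] / n
                g += observed * math.log(observed / expected)
    return 2.0 * g
```

For example, a table of allele counts with samples as rows and alleles as columns gives a value of zero when all samples have identical allele proportions, and larger values as the samples diverge.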
Some additional facilities have been implemented for better ease of use. Earlier versions of Genepop required some effort from the user to deal with either 3-digit-coded alleles or haploid data. Genepop is now more practical, in that haploid and diploid genotypes in both 2- and 3-digit allele codings are automatically recognized as such by the program, and all these different types of data can be mixed in the same input file. The input format is otherwise unchanged, so that input files prepared for earlier versions of Genepop are still read by Genepop (backward compatibility).
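One way such automatic recognition could work is sketched below. This is an assumption for illustration, not Genepop's actual parser: it supposes that a genotype field of 2 or 3 digits is a haploid genotype and that a field of 4 or 6 digits is a diploid genotype with two allele codes of equal width. The function name `parse_genotype` is hypothetical.

```python
def parse_genotype(code):
    """Split a genotype field into a tuple of integer allele codes.

    Assumed rule (illustrative only): 2 or 3 digits is one haploid allele,
    4 or 6 digits is two alleles of 2 or 3 digits each.
    """
    if not code.isdigit():
        raise ValueError(f"non-numeric genotype field: {code!r}")
    if len(code) in (2, 3):
        return (int(code),)
    if len(code) in (4, 6):
        half = len(code) // 2
        return (int(code[:half]), int(code[half:]))
    raise ValueError(f"unexpected genotype field length: {code!r}")
```

With this rule, fields such as `0102`, `001002`, `01` and `125` can appear in the same file and are each resolved from their length alone.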
In addition, Genepop’s behaviour can be controlled using an option file and by inline arguments in a console command line. This allows batch calls to Genepop and repetitive use of Genepop on simulated data. However, those familiar with the old Genepop menus can also use Genepop in an almost unchanged way.
Previous Genepop distributions included two small utilities, hw.bat and struc.bat, for testing of single data matrices using a fast ad hoc data input. These facilities are available in Genepop 4.0 through the HWfile and StrucFile options. Previous Genepop distributions also included the Isolde program for analysis of isolation by distance between groups of individuals, from one genetic distance matrix and one geographic distance matrix. All such analyses can now be performed through the unique Genepop executable (other facilities that were unique to Isolde are now accessible through the IsolationFile setting).
Other minor, and often trivial, differences with earlier versions of Genepop will be pointed out in footnotes.