
WHICHLOCI

A computer program for determining relative discriminatory power among candidate genetic loci

The dramatic increase in the amount of information yielded by highly polymorphic molecular markers such as microsatellites has significantly increased resolving power for discrimination among closely related populations.  Together with increased automation of techniques for resolving genetic variation, this has produced a wealth of new information.  Individual-based methods for assessing most likely population origin are among the innovative statistical techniques emerging to take advantage of this information (Paetkau et al. 1995; Waser and Strobeck 1998; Banks and Eichert 2000).  The program WHICHLOCI uses these individual-based population assignment methods but turns the method back on itself.  Trial assignments with loci one at a time allow ranking of loci in terms of their efficiency for correct population assignment and, conversely, their propensity to cause false assignments.  Subsequent trials with increasing numbers of loci determine the minimum number of loci, and which specific loci, are required to attain a defined power for population assignment.

Requirements

The program runs on Windows 95, 98, 2000 or NT (including Macintosh emulations of these operating systems) and has no specific hardware requirements.

Input File

The program requires data from the populations under consideration, listed either as genotypes per sample (in the same format used for GENEPOP; Raymond and Rousset 1995, http://www.cefe.cnrs-mop.fr/) or as allele frequencies per population (in the same format as allele frequency files created in WHICHRUN; Banks and Eichert 2000, http://www.bml.ucdavis.edu/whichrun.htm).  The program is written to analyze co-dominant as well as haploid data.

Download WHICHLOCI
 

Theory and Program Outline

A resample option allows creation of test data for all populations under consideration.  Computer-generated random numbers specify sampling from an allele table created from the frequency data for each population.  This table consists of an array of the alleles observed in that population, with each allele repeated in proportion to its observed frequency in that population.  The user defines how many samples to generate in this manner and has the option to vary sample size among populations.
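The resampling step can be illustrated with a minimal Python sketch. It is not the WHICHLOCI source code. The function names, the locus and allele labels, the use of allele counts rather than frequencies, and the choice to draw alleles independently with replacement are all assumptions made for illustration.

```python
import random


def build_allele_table(allele_counts):
    """Expand {allele: count} into a flat list, repeating each allele by its count."""
    table = []
    for allele, count in allele_counts.items():
        table.extend([allele] * count)
    return table


def resample_genotypes(counts_by_locus, n_samples, ploidy=2, rng=None):
    """Generate test genotypes for one population.

    counts_by_locus: {locus: {allele: observed count}} (hypothetical layout).
    ploidy: 2 for co-dominant diploid data, 1 for haploid data.
    Alleles are drawn independently, with replacement (an assumption).
    """
    rng = rng or random.Random()
    tables = {locus: build_allele_table(counts)
              for locus, counts in counts_by_locus.items()}
    samples = []
    for _ in range(n_samples):
        genotype = {locus: tuple(rng.choice(table) for _ in range(ploidy))
                    for locus, table in tables.items()}
        samples.append(genotype)
    return samples
```

Calling `resample_genotypes` once per population, with a different `n_samples` for each, would correspond to varying sample size among populations.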

Optimum loci combinations that will match user-defined accuracy for population discrimination are determined through two basic procedures.  First, repeated iterations for assignment of test data using the method applied in WHICHRUN (Banks and Eichert 2000) are performed using data from each locus separately, scoring the number of correct assignments to appropriate source populations for each locus.  A rank order for successful assignment is thus determined among all loci.  A second round of iterations draws loci from this ranking, increasing the number of loci one at a time until the correct assignment score matches the accuracy criteria set by the user.  This procedure scores accuracy across all populations.

An alternate, critical population routine allows focus on accuracy for assignment to a specific population.  Iterations using data from each locus separately occur as above, but loci are scored according to how many of the trial samples from the critical population are assigned correctly.  The number of samples that originate from other populations but are falsely assigned to the critical population is also tallied.  Rank order under the critical population routine is determined by applying the following formula:

LocusScore = % correctly assigned - (% incorrectly assigned * scoreMultiplier), where: 

% correctly assigned =  % of members of the critical population that were correctly assigned

% incorrectly assigned =  # from other populations assigned to critical population / # from other populations

scoreMultiplier =  (100 – User specified accuracy) / User specified inaccuracy 

This allows the user to weight correct assignment or misses according to how important accuracy or inaccuracy might be to the application at hand.  An allele frequency differential following methods described in Shriver et al. (1997) can also be implemented as an alternate means of ranking loci.  As above, a second round of iterations determines empirically how many of which loci are required to match accuracy criteria. 
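The critical population score and the two-round procedure described above can be sketched in Python as follows. This is an illustrative reading of the formula, not the program's own code. All function and parameter names are hypothetical, both percentages are assumed to be on a 0 to 100 scale, and `accuracy_of` is a placeholder callback standing in for a round of trial assignments with a given set of loci.

```python
def score_multiplier(accuracy_pct, inaccuracy_pct):
    """scoreMultiplier = (100 - user specified accuracy) / user specified inaccuracy."""
    return (100.0 - accuracy_pct) / inaccuracy_pct


def locus_score(n_critical_correct, n_critical,
                n_other_to_critical, n_other,
                accuracy_pct, inaccuracy_pct):
    """LocusScore for one locus under the critical population routine."""
    # Percentage of critical-population samples assigned correctly.
    pct_correct = 100.0 * n_critical_correct / n_critical
    # Percentage of samples from other populations falsely assigned
    # to the critical population (0-100 scale is an assumption).
    pct_incorrect = 100.0 * n_other_to_critical / n_other
    return pct_correct - pct_incorrect * score_multiplier(accuracy_pct, inaccuracy_pct)


def rank_loci(scores):
    """Return locus names ordered from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)


def minimum_loci(ranked_loci, accuracy_of, target_pct):
    """Add loci in rank order until the accuracy target is met.

    accuracy_of: hypothetical callback returning the percent of correct
    assignments obtained with a given list of loci.
    Returns the smallest ranked prefix meeting the target, or None.
    """
    for k in range(1, len(ranked_loci) + 1):
        subset = ranked_loci[:k]
        if accuracy_of(subset) >= target_pct:
            return subset
    return None
```

With 90 of 100 critical-population samples assigned correctly, 10 of 200 other samples falsely assigned, and user settings of 95 (accuracy) and 5 (inaccuracy), the multiplier is 1 and `locus_score` returns 85.0.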

There has been increasing interest in the estimation of confidence intervals for assignment results from individual-based methods.  Accuracy of this estimation is closely linked to the accuracy of allele frequency information for the populations under consideration, and is addressed by ensuring that sample sizes among baseline populations match the estimates required to provide accurate allele frequencies for polymorphic marker types (see Banks et al. 2000).  The issue of confidence interval estimation in the context of population assignment, however, becomes multidimensional given a comparison between alternate likelihoods that a sample may come from each of the populations under study.  The critical population routine presented above provides a convenient means of summarizing these multidimensional likelihoods from the perspective of the critical population.  WHICHLOCI provides a means for creating multiple trial data sets.  Summary statistical parameters such as variance, standard deviation and standard error across results from each data set are determined following typical formulae (Sokal and Rohlf 1995).  A sub-routine written in WHICHLOCI allows users to bypass the loci ranking routine to determine assignment accuracy, variance, standard deviation and standard error for a user-selected bank of loci.
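As a minimal sketch of the summary statistics mentioned above, the following Python function computes the mean, variance, standard deviation and standard error of assignment accuracy across trial data sets. The function name and input layout are hypothetical, and the use of the sample variance (n - 1 denominator) is an assumption about which "typical formulae" are meant.

```python
import math
import statistics


def summarize_trials(accuracies):
    """Summarize percent-correct assignment across multiple trial data sets.

    accuracies: one accuracy value per trial data set (at least two values).
    """
    n = len(accuracies)
    mean = statistics.mean(accuracies)
    variance = statistics.variance(accuracies)  # sample variance, n - 1 denominator
    sd = math.sqrt(variance)
    se = sd / math.sqrt(n)  # standard error of the mean
    return {"n": n, "mean": mean, "variance": variance, "sd": sd, "se": se}
```

The same function could be applied to results for a ranked set of loci or for a user-selected bank of loci.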

 We thus present an empirical method for determining which specific combination of loci would most likely provide defined population assignment power for individuals as well as statistical bounds on the performance of any particular group of loci.  Our hope is that this method will allow researchers to maximize power limits in focused population assignment contexts.

Authors

Michael A. Banks (1), Will Eichert (1) and J.B. Olsen (2)

(1) Bodega Marine Laboratory, University of California at Davis, Bodega Bay, CA 94923 USA
(2) Gene Conservation Laboratory, Alaska Department of Fish and Game, 333 Raspberry Road, Anchorage, Alaska 99518-1599

Email:
              michael(dot)banks(at)hmsc(dot)orst(dot)edu
              jeff_olsen(at)fishgame(dot)state(dot)ak(dot)us

Note: This program is under review for Bioinformatics under the title:
                    Which Loci Have The Diagnostic Power You Need?

Thanks

From the Bodega Marine Laboratory, University of California at Davis, P.O. Box 247, Bodega Bay and the Gene Conservation Laboratory, Alaska Department of Fish and Game, USA.  Research and development of WHICHLOCI was supported by funds obtained from CALFED and the California Department of Water Resources.

References

Banks, M.A., Rashbrook, V.K., Calavetta, M.J., Dean, C.A. and Hedgecock, D. (2000)  Analysis of microsatellite DNA resolves genetic structure and diversity of chinook salmon in California’s Central Valley. Can. J. Fish. Aquat. Sci. 57:915-927.

Banks, M.A. and Eichert, W. (2000)  WHICHRUN (version 3.2): A computer program for population assignment of individuals based on multilocus genotype data. J. of Hered. 91:87-89.

Raymond, M. and Rousset, F. (1995) GENEPOP (Version 1.2): Population genetics software for exact tests and ecumenicism. J. of Hered. 86:248-250. 

Paetkau, D., Calvert, W., Stirling, I. and Strobeck, C. (1995) Microsatellite analysis of population structure in polar bears. Mol Ecol 4:347-354.

Shriver, M.D., Smith, M.W., Jin, L., Marcini, A., Akey, J.M., Deka, R. and Ferrell, R.E.  (1997) Ethnic-affiliation estimation by use of population-specific DNA markers.  Amer. J. Hum. Genet. 60:957-964.

Sokal, R.R. and Rohlf, F.J. (1995) Biometry. San Francisco: W.H. Freeman

Waser, P.M. and Strobeck, C. (1998) Genetic signatures of interpopulation dispersal. Trends Ecol. Evol. 13:43-44.

