Mark Wright, known to friends and colleagues by the nickname “Koni”, is a Research Associate in the Department of Plant Breeding and Genetics at Cornell University. Mark studied computer science at Cornell as an undergraduate, with interests in high-performance computing, distributed computing, network security, and cryptography, and graduated in 1998. For the following 3 years, Mark continued his undergraduate research job in the Department of Sociology as a professional programmer, developing and optimizing a dynamic microsimulation model of the United States population for policy research. In 2001, Mark opted for a radical career change and took a Programmer/Analyst position at Cornell with Dr. Steven Tanksley. Over the next 3 years he developed the Solanaceae Genomics Network database and website, which continues today as the “SOL Genomics Network”. This was his first exposure to biology and genetics. During these 3.5 years, Mark used Cornell employee benefits to take one course per semester, gradually accumulating a foundational background in biology, genetics, population genetics, and statistics. In the fall of 2004, Mark began pursuing a doctoral degree at Cornell full time, initially in the laboratory and field, seeking “hands-on” experience, but ultimately returning to the analysis of large-scale genomics datasets under Dr. Carlos Bustamante. Mark completed his Ph.D. in 2010. Currently, Mark works with Dr. Susan McCouch at Cornell, developing and analyzing large genotype and sequence datasets in cultivated Asian rice (Oryza sativa) and its wild progenitor species Oryza rufipogon, as well as related African rice species.
Mark’s research interests include plant and animal domestication, inference of population structure and history from large genomic datasets, detection of signatures of selection at putative domestication loci, simulation studies for the optimal design of multiple-parent quantitative trait mapping populations, and the development of high-performance computational methods that can make use of increasingly large datasets for these and other purposes. More recently, Mark has become interested in applying the methods he has developed to systems other than rice, and has a growing interest in pharmacogenomics and personalized medicine.
Cultivated Asian rice is broadly divided into two subspecies, Japonica and Indica rice, recognized morphologically since ancient times. More recently, analysis of molecular genetic markers has further divided these two groups into 3 Japonica subpopulations and 2 Indica subpopulations. Pairwise FST between these 5 major subpopulations ranges from 0.20 to 0.43. The high level of divergence between the subspecies has been argued to support independent domestications, but the presence of only a single haplotype at key domestication loci suggests otherwise. Alternatively, the high degree of subpopulation structure in rice may reflect a high level and complexity of structure in the related wild species Oryza rufipogon, which may have been used repeatedly in different geographic regions to adapt domesticated rice imported from a single origin to local ecological conditions. While the global ancestry of cultivated rice diversity panels has been extensively studied, population structure in Oryza rufipogon remains largely unexplored. Additionally, while it is known that natural and artificial admixture of cultivated rice populations occurs, there is no extensive study of local genome ancestry in diversity panels and core collections. Using a relatively simple approach, we develop a method for unsupervised discovery of subpopulation structure at the local genome level, to reveal ancient and recent cross-subpopulation introgressions in cultivated rice and to explore the subpopulation structure of Oryza rufipogon. Key considerations were estimating and modeling sample-specific inbreeding coefficients along with other model parameters such as subpopulation allele frequencies, and the ability to handle dense marker data on the order of 100,000 to 500,000 SNP markers or more.
We apply the method to several different types of rice datasets and obtain robust results, even with low-coverage (<0.5X) random-shear NGS data and with restriction-site-anchored genotyping-by-sequencing (GBS) data with high missing data rates (>35%), and no data imputation is required. Additionally, we explore broader applications in other systems such as cattle and horse, and find that this method confirms and extends published findings that used global model-based structure analyses and PCA, while also yielding a map of the genome of each genotyped individual showing introgression tracts in admixed populations. Taken together, these results suggest the method may be useful in a broad range of applications, especially in characterizing large sample collections (e.g., germplasm core collections) with cheap, “dirty” genotyping methods such as GBS or low-coverage sequencing, and in newly emerging systems of study where, unlike rice, there may be little or no history of population genetic analyses characterizing subpopulation structure.
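As a rough illustration of two ingredients named in the abstract (sample-specific inbreeding coefficients combined with subpopulation allele frequencies, and tolerance of missing genotypes without imputation), the following minimal Python sketch may help. It is not the speaker's method: it uses the standard textbook genotype probabilities under inbreeding, and the function names, the window-based assignment, and the toy frequencies are all assumptions made for this example.

```python
import math

def genotype_probs(p, f):
    """Probabilities of carrying 0, 1, or 2 copies of an allele with
    frequency p, for a sample with inbreeding coefficient f
    (standard population-genetics result; f = 0 gives Hardy-Weinberg)."""
    q = 1.0 - p
    return (q * q + f * p * q,        # homozygous for the other allele
            2.0 * p * q * (1.0 - f),  # heterozygous
            p * p + f * p * q)        # homozygous for this allele

def log_likelihood(genotypes, freqs, f):
    """Log-likelihood of one sample's genotypes in a genomic window,
    given one subpopulation's allele frequencies. Missing calls (None)
    are skipped rather than imputed."""
    total = 0.0
    for g, p in zip(genotypes, freqs):
        if g is None:
            continue
        # Floor the probability to avoid log(0), e.g. a heterozygous call when f = 1.
        total += math.log(max(genotype_probs(p, f)[g], 1e-12))
    return total

def best_subpopulation(genotypes, freqs_by_pop, f):
    """Hypothetical helper: the subpopulation whose allele frequencies
    best explain the window."""
    return max(freqs_by_pop,
               key=lambda pop: log_likelihood(genotypes, freqs_by_pop[pop], f))

# Toy window of four SNPs; the third genotype call is missing.
window = [2, 2, None, 0]
freqs = {"A": [0.9, 0.8, 0.5, 0.1], "B": [0.1, 0.2, 0.5, 0.9]}
print(best_subpopulation(window, freqs, f=0.9))
```

In this sketch the allele frequencies are simply given; an unsupervised method of the kind described in the abstract would have to estimate them, together with each sample's inbreeding coefficient, from the data.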
Wednesday Jan 29th, 2014
12:45 PM Lunch: sign up sheet here.
1:15 PM Seminar starts.
Location: Clark Center S360
Host: Carlos Bustamante
Schedule: Rosario Monge (rmonge at stanford.edu)