Feb 5th, David Golan: Accurate Estimation of Heritability in Case-Control GWAS

David Golan (Tel Aviv University)

About David

David Golan is a PhD candidate in the department of statistics at Tel-Aviv University under the supervision of Prof. Saharon Rosset.

His research spans a wide range of problems in genetics and bioinformatics, from modeling and analysis of deep sequencing data to population genetics problems such as heritability estimation using GWAS data. David is a Colton fellow at Tel-Aviv University and a fellow of the Edmond J. Safra Center for Bioinformatics.

Abstract

Linear mixed effects models (LMMs) have recently gained popularity as the method of choice for estimating heritability from GWAS data. Recent results using LMMs suggest that much of the “missing” heritability can be found in common SNPs with small effects which are unidentified by current-day GWAS due to low power.

However, many of the interesting diseases and disorders studied are rare (typically affecting <1% of the population), and so case-control designs are used, wherein the proportion of cases in a study is usually considerably higher than the proportion of cases in the population.

We show that this over-representation of cases invalidates several key assumptions of LMMs, e.g. the normality and independence of the random effects, and show that ignoring these problems results in shrunken estimates of heritability.

We propose an alternative approach for estimating heritability. We derive the relationship between the genetic similarity and the phenotypic similarity of any two individuals as a function of the heritability, while explicitly conditioning on the fact that both individuals were selected for the study. Our method then entails regressing the pairwise phenotypic similarities on the pairwise genetic similarities and using the slope to obtain an estimate of the heritability. We show, using simulations, that our method yields unbiased estimates which are considerably more accurate than the current state-of-the-art methodology.

Applying our method to several well-studied GWAS yields heritability estimates which are considerably higher than previously published results.
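The regression idea in the abstract (pairwise phenotypic similarity regressed on pairwise genetic similarity, with the slope read as a heritability estimate) can be illustrated with a minimal Python sketch. This is the classical Haseman-Elston-style regression for a quantitative trait in an unascertained sample, swapped in for illustration; it does not include the conditioning on study selection that the talk's method adds for case-control data. All function names here are hypothetical.

```python
from itertools import combinations
from statistics import mean, pstdev


def standardize(values):
    """Center to mean 0 and scale to unit (population) standard deviation."""
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        raise ValueError("cannot standardize a constant vector")
    return [(v - mu) / sd for v in values]


def pairwise_similarities(genotypes, phenotypes):
    """Return (genetic, phenotypic) similarity lists over all pairs i < j.

    genotypes: one row per individual, one column per SNP (allele counts 0/1/2).
    phenotypes: one number per individual.
    """
    n_snps = len(genotypes[0])
    # Standardize each SNP column, then transpose back to one row per individual.
    columns = [standardize([row[k] for row in genotypes]) for k in range(n_snps)]
    z = list(zip(*columns))
    y = standardize(phenotypes)
    genetic, phenotypic = [], []
    for i, j in combinations(range(len(y)), 2):
        # Genetic similarity: average product of standardized genotypes.
        genetic.append(sum(a * b for a, b in zip(z[i], z[j])) / n_snps)
        # Phenotypic similarity: product of standardized phenotypes.
        phenotypic.append(y[i] * y[j])
    return genetic, phenotypic


def ols_slope(x, y):
    """Slope of the least-squares line of y on x (with an intercept)."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx


def he_regression(genotypes, phenotypes):
    """Slope of phenotypic similarity on genetic similarity (uncorrected)."""
    genetic, phenotypic = pairwise_similarities(genotypes, phenotypes)
    return ols_slope(genetic, phenotypic)
```

With real data the slope is only meaningful for many individuals and many SNPs; in an ascertained case-control sample this uncorrected slope would not be expected to estimate heritability, which is the problem the abstract addresses.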

Seminar details

Wednesday Feb 5th, 2014
12:45 PM Lunch: sign up sheet here.
1:15 PM Seminar starts.
Location: Clark Center S360
Host: Jonathan Pritchard
Schedule: Tara Trim (ttrim at stanford.edu)

Jan 29th, Mark Wright (Koni): Local Methods for Evaluating Population Structure and Multiple Admixture in Plant and Animal Populations with Inbreeding

Mark (Koni) Wright, Cornell

About Koni

Mark Wright, known to friends and colleagues by the nickname “Koni”, is a Research Associate in the Department of Plant Breeding and Genetics at Cornell University. Mark studied computer science at Cornell as an undergraduate, with interests in high performance computing, distributed computing, network security and cryptography, graduating in 1998. For the following 3 years, Mark continued his undergraduate research job in the Department of Sociology as a professional programmer, developing and optimizing a dynamic microsimulation model of the United States population for policy research. In 2001, Mark opted for a radical career change and took a Programmer/Analyst position at Cornell with Dr. Steven Tanksley. Over the next 3 years he developed the Solanaceae Genomics Network database and website, which continues today as the “SOL Genomics Network”. This was his first exposure to biology and genetics. During these 3.5 years, Mark used Cornell employee benefits to take one course per semester, gradually accumulating a foundational background in biology, genetics, population genetics, and statistics. In the fall of 2004, Mark began pursuing a doctoral degree at Cornell full time, initially in the laboratory and field seeking a “hands on” experience but ultimately returning to analysis of large scale genomics datasets under Dr. Carlos Bustamante. Mark completed his Ph.D. in 2010. Currently, Mark works with Dr. Susan McCouch at Cornell developing and analyzing large genotype and sequence datasets in cultivated Asian rice (Oryza sativa) and its wild progenitor species Oryza rufipogon, as well as related African rice species.

Mark’s research interests include plant and animal domestication, inference of population structure and history from large genomic datasets, detecting signatures of selection at putative domestication loci, simulation studies for the optimal design of multiple parent quantitative trait mapping populations, and the development of high performance computational methods to utilize increasingly larger datasets for these and other purposes. More recently, Mark has developed broader interests in applying the methods he has developed to systems other than rice, and a growing interest in pharmacogenomics and personalized medicine.

Abstract

Cultivated Asian rice is broadly divided into two subspecies, Japonica and Indica rice, recognized morphologically since ancient times. More recently, analysis of molecular genetic markers has further characterized these two groups into 3 Japonica subpopulations and 2 Indica subpopulations. Pairwise FST between these 5 major subpopulations ranges from 0.20 to 0.43. The high level of divergence between the subspecies has been argued to support independent domestications, but the presence of only a single haplotype at key domestication loci suggests otherwise. Alternatively, the high degree of subpopulation structure in rice may reflect a high level and complexity of structure in the related wild species Oryza rufipogon that may have been used repeatedly in different geographic regions to adapt domesticated rice imported from a single origin to local ecological conditions. While global ancestry of cultivated rice diversity panels has been extensively studied, population structure in Oryza rufipogon remains largely unexplored. Additionally, while it is known that natural and artificial admixture of cultivated rice populations occurs, there is no extensive study of local genome ancestry in diversity panels and core collections. Using a relatively simple approach, we develop a method for unsupervised discovery of subpopulation structure at the genome-local level to reveal ancient and recent cross-subpopulation introgressions in cultivated rice and explore the subpopulation structure of Oryza rufipogon. Key considerations were estimating and modeling sample-specific inbreeding coefficients along with other model parameters such as subpopulation allele frequencies, and the ability to handle dense marker data on the order of 100,000 to 500,000 SNP markers, or more.

We apply the method to several different types of rice datasets and obtain robust results, even in the case of low coverage (<0.5X) random-shear NGS data as well as restriction site anchored genotyping-by-sequencing (GBS) data with high missing data rates (>35%), yet no data imputation is required. Additionally, we explore broader applications in other systems such as cattle and horse and find that this method confirms and extends published findings which used global model based structure analyses and PCA, and also yields a map of the genome of the genotyped individuals showing introgression tracts in admixed populations. Taken together, these results suggest the method may be useful in a broad range of applications, especially in characterizing large sample collections (e.g., germplasm core collections) with cheap, “dirty” genotyping methods such as GBS or low coverage sequencing, and in new emerging systems of study where, unlike rice, there may be little or no history of population genetic analyses characterizing subpopulation structure.
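To illustrate why a sample-specific inbreeding coefficient matters when modeling genotypes from subpopulation allele frequencies, here is a minimal Python sketch of the textbook genotype probabilities under inbreeding. It is not the speaker's model; the function name and the biallelic, single-subpopulation setting are assumptions made for illustration.

```python
def genotype_probabilities(p, f):
    """Probabilities of carrying 0, 1 or 2 copies of an allele.

    p: frequency of the allele in the subpopulation (0..1).
    f: inbreeding coefficient of the individual (0..1).
    With f = 0 this reduces to Hardy-Weinberg proportions; with f = 1
    the individual is fully homozygous.
    """
    if not (0.0 <= p <= 1.0 and 0.0 <= f <= 1.0):
        raise ValueError("p and f must both lie in [0, 1]")
    q = 1.0 - p
    return (
        q * q + f * p * q,        # homozygous for the other allele
        2.0 * p * q * (1.0 - f),  # heterozygous
        p * p + f * p * q,        # homozygous for this allele
    )
```

In a largely self-pollinating crop such as rice, f is often close to 1, so heterozygous calls are rare and a model that assumed Hardy-Weinberg proportions would misjudge genotype likelihoods.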

Seminar details

Wednesday Jan 29th, 2014
12:45 PM Lunch: sign up sheet here.
1:15 PM Seminar starts.
Location: Clark Center S360
Host: Carlos Bustamante
Schedule: Rosario Monge (rmonge at stanford.edu)

Jan 27th, Roy Ronen: Learning Natural Selection from the Site Frequency Spectrum

Roy Ronen, UCSD

Genetic adaptation to external stimuli occurs through the combined action of mutation and selection. A central problem in genetics is to identify loci responsive to specific selective constraints. Many tests have been proposed to identify the genomic signatures of natural selection by quantifying the skew in the site frequency spectrum (SFS) under selection relative to neutrality. We build upon recent work that connects many of these tests under a common framework, by describing how selective sweeps impact the scaled SFS (and cross-population SFS). We show that the specific skew depends on many attributes of the sweep, including the selection coefficient and the time under selection. Using supervised learning on extensive simulated data, we characterize the features of the scaled SFS that best separate different types of selective sweeps from neutrality. We develop a test, SFselect, that consistently outperforms many existing tests over a wide range of selective sweeps. Applying SFselect to variation data from a laboratory evolution experiment of Drosophila melanogaster adapted to hypoxia, we identify loci that strengthen the role of the Notch pathway in hypoxia tolerance, but were missed with previous approaches. Finally, we discuss the challenges and possibilities of learning to identify soft selection (e.g. on standing variation) and distinguish it from classic hard selective sweeps.
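For readers unfamiliar with the site frequency spectrum, the following minimal Python sketch computes an unfolded SFS from derived-allele counts and a scaled version of it. It assumes that "scaled" means multiplying the i-th entry by i, which makes the expectation flat under the standard neutral model with constant population size; that convention and the function names are assumptions, and this is not the SFselect implementation.

```python
from collections import Counter


def site_frequency_spectrum(derived_counts, n):
    """Unfolded SFS: entry i-1 is the number of sites with i derived alleles.

    derived_counts: derived-allele count observed at each site.
    n: number of sampled chromosomes; only counts 1..n-1 are segregating.
    """
    tally = Counter(c for c in derived_counts if 0 < c < n)
    return [tally.get(i, 0) for i in range(1, n)]


def scaled_sfs(sfs):
    """Multiply the i-th entry by i (assumed scaling convention)."""
    return [i * xi for i, xi in enumerate(sfs, start=1)]


# Toy counts chosen to follow the neutral 1/i shape exactly.
counts = [1] * 12 + [2] * 6 + [3] * 4 + [4] * 3
sfs = site_frequency_spectrum(counts, n=5)
print(sfs)              # [12, 6, 4, 3]
print(scaled_sfs(sfs))  # [12, 12, 12, 12]
```

Departures of the scaled spectrum from a flat line, such as an excess of high-frequency derived alleles, are the kind of skew that SFS-based tests quantify.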

About Roy

Roy Ronen is a PhD candidate in the Bioinformatics and Systems Biology program at the University of California, San Diego. He works with Dr. Vineet Bafna on the application of statistical learning and discrete algorithms to population genetics. Before this, he completed his undergraduate degree in computer science at Ben-Gurion University in Israel, and worked at Tel Aviv University’s high-throughput sequencing facility as a software developer. Roy is interested in the theory and application of methods for identifying mechanisms of genetic adaptation.

Seminar details

Monday Jan 27th, 2014

12:45 PM Lunch: sign up sheet here. Come early for lunch, because we cannot take food into the seminar room!

1:15 PM Seminar starts.

Location: Munzer Hall (after Li Ka Shing on the medical campus)

Host: Carlos Bustamante

Schedule: Rosario Monge (rmonge at stanford.edu)

Jan 22nd, Yang Li: Use of alternative micro-exons in the developing brain

Yang Li, Oxford University

Less than 5% of all annotated internal exons are 51bp or shorter (micro-exons). The sharp decline in the number of exons of decreasing sizes below 51bp suggests that micro-exons are strongly disfavoured, possibly because the splicing machinery struggles to detect or splice out introns flanking short exons. Surprisingly, over a thousand micro-exons have been annotated within human genes, yet their functional roles have largely been overlooked. I will discuss my work on identifying the roles of micro-exons with a particular focus on micro-exons expressed during human brain development. I will also describe putative regulatory mechanisms that are able to control the usage of these micro-exons in a context-specific manner. Additionally, I will present a simple strategy that allows the discovery of novel micro-exons from next-generation sequencing data, which complements general de novo exon discovery tools such as Cufflinks.
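The definition used in the abstract (internal exons of 51bp or shorter) can be expressed as a small Python filter over annotated exon coordinates. This only illustrates the definition, not the discovery strategy from the talk; the function name and the 0-based half-open coordinate convention are assumptions.

```python
MICRO_EXON_MAX_BP = 51  # length threshold stated in the abstract


def internal_micro_exons(exons, max_bp=MICRO_EXON_MAX_BP):
    """Return the internal exons of one transcript no longer than max_bp.

    exons: (start, end) pairs, 0-based half-open (assumed convention),
    in any order. The first and last exons are not internal and are skipped.
    """
    ordered = sorted(exons)
    internal = ordered[1:-1]
    return [(start, end) for start, end in internal if end - start <= max_bp]
```

For a transcript with exons of 200, 30, 51, 52 and 400 bp in that order, only the 30 bp and 51 bp exons would be reported.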

About Yang

At the end of his maths and computer science undergraduate degree at McGill, Yang started to discover biology as a new interest. Fascinated by ageing and the large differences in life expectancy across animals, he started to work on the comparative biology of ageing, first during a summer internship in George Church’s lab at Harvard Medical School, and then as an MPhil project at the University of Liverpool under the supervision of Joao Pedro de Magalhaes. He later moved to the University of Oxford for his PhD to study comparative genomics with Richard Copley and Chris Ponting. In Oxford, Yang made major contributions to the genome and transcriptome analysis of five East-African cichlids and also of the hyperthermophile worm Paralvinella sulfincola. He also participated in the study of the painted turtle and bowhead whale genomes. Currently, he is investigating gene architecture evolution with particular emphasis on alternative splicing regulation.

Seminar details

Wednesday Jan 22nd, 2014

12:45 PM Lunch: sign up sheet here. Come early for lunch, because we cannot take food into the seminar room!

1:15 PM Seminar starts.

Location: Munzer Hall (after Li Ka Shing on the medical campus)

Host: Jonathan Pritchard

Schedule: Tara Trim (ttrim at stanford.edu)