Aug 13, Kelley Harris: Signatures of demographic history and error-prone polymerase activity in large genomic datasets

Kelley Harris, UC Berkeley

I am an applied math Ph.D. student at UC Berkeley studying population genetics and evolutionary genomics. I work with Rasmus Nielsen and Yun Song on developing methods for DNA sequence analysis that are derived mainly from coalescent theory. I began this work as an M.Phil. student with Richard Durbin at the Wellcome Trust Sanger Institute.

Title: Signatures of demographic history and error-prone polymerase activity in large genomic datasets

When two individuals are closely related to each other, their genomes contain long regions of DNA that are inherited from recent common ancestors. In these regions, genetic mutations distinguishing one individual from the other will be very far apart, and one can find long tracts of identity by state (IBS) where the two genomes are completely identical. Aligning the two genomes and measuring the distances between all sites where they differ, we can obtain a length distribution of IBS tracts that is highly informative about past population size changes and gene flow events. Using coalescent theory, we can derive the expected length distribution of IBS tracts under a variety of demographic models and demonstrate the use of this framework to date key events in the population histories of humans and polar bears. We make these inferences by obtaining models that predict observed IBS tract length distributions to a high degree of accuracy, but we also observe that real genomic data contains an excess of very short IBS tracts that cannot be explained by any demographic processes. We argue that these short IBS tracts are created by multinucleotide mutations (MNMs) that give rise to multiple SNPs in a single generation. We infer the prevalence and distribution of MNMs in a large dataset of 1,092 human genomes by quantifying deviations from patterns that we expect to observe if all SNPs arise independently. In doing so, we uncover a mutation pattern characteristic of the error-prone DNA polymerase Pol zeta, suggesting that some MNMs result from the action of this enzyme in the human germline. 
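As a rough illustration of the tract-measurement step described in the abstract, the sketch below collects IBS tract lengths from two aligned sequences. The function name `ibs_tract_lengths` and the toy sequences are hypothetical and are not taken from the speaker's software. The sketch also assumes that a tract is the run of identical sites strictly between two consecutive differing sites, and that the runs at the two ends of the alignment are left out because only one of their boundaries is observed.

```python
from collections import Counter


def ibs_tract_lengths(seq_a, seq_b):
    """Return lengths of identical runs between consecutive differing sites.

    Assumption (not from the abstract): runs touching either end of the
    alignment are skipped, since only one of their boundaries is observed.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Positions where the two aligned genomes differ.
    diff_sites = [i for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]
    # Number of identical sites strictly between each adjacent pair of differences.
    return [right - left - 1 for left, right in zip(diff_sites, diff_sites[1:])]


# Toy alignment: the sequences differ at positions 2 and 7 (0-based).
lengths = ibs_tract_lengths("ACGTACGTAC", "ACCTACGAAC")
length_distribution = Counter(lengths)  # empirical tract length distribution
```

Tabulating the returned lengths, as with `Counter` here, gives the empirical length distribution that a demographic model would then be asked to predict.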

Seminar details

Date: August 13, 2014

Time: Lunch will be served at 1:00 PM; the lecture will follow at 1:15 PM

Location: Clark S361

June 18th, Susan Holmes: Using the data, all the data

Susan Holmes

UNFORTUNATELY THIS TALK IS CANCELLED

Trained in the French school of Data Analysis in Montpellier, Susan Holmes has been working in nonparametric multivariate statistics applied to Biology since 1985. She has taught at MIT and Harvard and was an Associate Professor of Biometry at Cornell before moving to Stanford in 1998. She teaches the Thinking Matters class: Breaking Codes and Finding Patterns, and likes working on big messy data sets, mostly from the areas of Immunology, Cancer Biology and Microbial Ecology. Her theoretical interests include applied probability, Graph Limit Theory and the topology of the space of Phylogenetic Trees.

Talk: Using the data, all the data

Study of microbiome census data together with covarying tables such as Mass Spectroscopy and Metagenomic data poses statistical and computational challenges linked to heterogeneity of the data structures and data sources. We will give some examples of the problem of irreproducibility in the analysis of such data.

This contains joint work with Joey McMurdie and David Relman and his team.

Seminar details

Wednesday June 18th, 2014
1:15 PM Seminar
Location: Clark Auditorium

June 11th, Maude David: Cross-referencing analysis of autism to identify novel genes and pathways

Maude David

Maude joined the Wall lab in January 2014 and mainly studies the gut microbiome of children with ASD (Autism Spectrum Disorder) using a large-scale, crowdsourced clinical trial approach. Her expertise is in microbiology, bioinformatics and biochemistry, using and integrating metagenomics, metatranscriptomics and metaproteomics to understand microbial community functions. She received her PhD in December 2009 from the Ecole Centrale de Lyon, University of Lyon, France, with Prof. T.M. Vogel, on the origin of the dehalogenases and bioremediation of chlorinated solvents. Her grad-school work focused on bacterial adaptation to chlorinated compounds at the genome (evolution mechanisms) and community (bioremediation) levels. After graduation, she became a post-doctoral fellow at Lawrence Berkeley National Laboratory with Prof. Janet Jansson. Her work looked at the impact of climate change on soil microbial ecology, and specifically at how altered precipitation affects the carbon cycle, using meta-“omics” analysis of microbial carbon cycling responses.

Talk: Cross-referencing analysis of autism to identify novel genes and pathways

Autism Spectrum Disorder (ASD) afflicts one out of 88 people. While the causes of ASD are only partially understood, the disease exhibits an important genetic component, with high heritability and familial clustering.

In order to identify potential candidate genes underlying ASD, we performed two analyses in parallel.

First, we identified rare and de novo gene variants that appear only in a population with ASD. We took advantage of whole-genome sequencing (WGS) as a tool for identifying ASD risk genes as well as unreported mutations in known loci, and applied it to the genomes of 32 trios with ASD sequenced in a previous study (Jiang et al., 2013). To do so, we developed and used a novel software tool, COSMOS, which allowed us to rapidly annotate and analyze this dataset (Gafni et al., submitted). We identified rare, previously overlooked ASD-related gene variants by comparing our annotated dataset with SNPs reported in the 1000 Genomes Project (1000 Genomes Project Consortium et al., 2012).

In parallel with this study, we used the NeuroSynth database to extract brain loci relevant for a set of psychologically relevant terms. We then extracted gene expression values from the GEO database. We performed a Gene Set Enrichment Analysis (GSEA) (Subramanian et al., 2005) and found several genes associated with autism.

Cross-referencing these two methods allowed us to identify several pathways common to both analyses. We analyzed the potential impact of these candidate genes on the physiology of the patient by mining the KEGG database and identifying the affected pathways. Our study identified novel genes, including multiple genes involved in oocyte meiosis and thyroid hormone synthesis, as well as targets previously implicated in ASD, such as RNA transport or the MAPK family of enzymes. This integrative approach gives us novel insights into the genetic variants most likely to be involved in ASD.

References

1000 Genomes Project Consortium, Abecasis, G. R., Auton, A., Brooks, L. D., DePristo, M. A., Durbin, R. M., et al. (2012). An integrated map of genetic variation from 1,092 human genomes. Nature, 491(7422), 56–65. doi:10.1038/nature11632

Jiang, Y.-H., Yuen, R. K. C., Jin, X., Wang, M., Chen, N., Wu, X., et al. (2013). Detection of clinically relevant genetic variants in autism spectrum disorder by whole-genome sequencing. American Journal of Human Genetics, 93(2), 249–263. doi:10.1016/j.ajhg.2013.06.012

Subramanian, A., Tamayo, P., Mootha, V. K., Mukherjee, S., Ebert, B. L., Gillette, M. A., et al. (2005). Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proceedings of the National Academy of Sciences of the United States of America, 102(43), 15545–15550. doi:10.1073/pnas.0506580102

Seminar details

Wednesday June 11th, 2014
1:15 PM Seminar
Location: Clark Auditorium

June 4th, Christine Peterson: Multiple testing procedures for eQTL discovery

Christine Peterson

Christine Peterson is a postdoctoral scholar at Stanford working with Chiara Sabatti. Her current research is focused on the development of statistical methods to account for multiple hypothesis testing in the context of multivariate phenotypes. Before coming to Stanford, Christine earned an undergraduate degree in applied mathematics from Harvard and a PhD in statistics from Rice University. Her doctoral research was focused on the inference of biological networks using Bayesian graphical models, including applications to the inference of cellular metabolic networks and protein networks in cancer.

Talk: Multiple testing procedures for eQTL discovery

We are developing methods to both improve power and reduce errors in identifying genetic variants that are relevant to multivariate phenotypes such as imaging features, actigraphy measures, or gene expression. Since both the predictor and response variables are high-dimensional, this represents a massive multiple testing problem. For expression quantitative trait loci (eQTL) studies, which focus on gene expression as the outcome measure, the current standard approach to identify genetic effects is to compute pairwise tests of association for each SNP to its nearby (or cis) genes, then control the false discovery rate across all such tests. The search for distant (or trans) eQTLs is typically run separately and is often underpowered. In this talk, I will discuss improvements to this procedure which allow the integration of cis and trans eQTLs through appropriately chosen weights and which take into account the problem structure by focusing on families of hypotheses based on genomic location. The proposed methods will be illustrated both through simulations and through an application to a study of traits associated with bipolar disorder.
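The standard approach mentioned above controls the false discovery rate across all SNP-gene tests. The abstract does not name the procedure; the sketch below assumes the widely used Benjamini-Hochberg step-up rule and is not the weighted, family-based method proposed in the talk. The function name `benjamini_hochberg` and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans marking hypotheses rejected at FDR level alpha.

    Assumption: the plain Benjamini-Hochberg step-up rule, used here only as a
    stand-in for "control the false discovery rate across all such tests".
    """
    m = len(p_values)
    # Indices of the tests, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k with p_(k) <= alpha * k / m.
    largest_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            largest_rank = rank
    # Reject every hypothesis ranked at or below that k.
    rejected = [False] * m
    for idx in order[:largest_rank]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values for five SNP-gene pairs.
decisions = benjamini_hochberg([0.015, 0.018, 0.02, 0.9, 0.95], alpha=0.05)
```

In this example the smallest p-value (0.015) exceeds its own threshold of 0.01 but is still rejected, because the rule steps up from the largest rank that passes.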

Seminar details

Wednesday June 4th, 2014
1:00 PM Lunch In Clark S360
1:15 PM Seminar
Location: Clark S360

May 28th, Rui Zhang: Evolutionary Analysis Reveals Regulatory and Functional Landscape of RNA Editing

Rui Zhang

Rui Zhang is a postdoc in Dr. Jin Billy Li’s lab (2011 – now) in the Department of Genetics, studying the dynamics and evolution of RNA editing. He completed his PhD at Kunming Institute of Zoology, Chinese Academy of Sciences in 2009 with Dr. Bing Su, investigating the miRNA-mediated gene regulatory network. Before moving to Stanford, he worked at Beijing Institute of Genomics with Chung-I Wu, studying the evolution of miRNA targeting.

Talk: Evolutionary Analysis Reveals Regulatory and Functional Landscape of RNA Editing

Adenosine-to-inosine RNA editing, catalysed by adenosine deaminases acting on RNA (ADAR), promotes functional diversity and is especially prevalent in neural tissues. A plethora of editing sites has been recently identified; however, how they are selected and regulated and which are functionally important are largely unknown. Using the Drosophila genus as a model, we found the establishment of editing and variation in editing levels are largely explained and predicted by cis regulatory element changes. We show that a large fraction of nonsynonymous and 3’UTR editing sites is under evolutionary constraint, highly edited, and thus likely functional. Furthermore, newborn sites are lowly edited and sparsely distributed across genes with diverse functions, while long-lived sites tend to be highly edited in clusters and enriched in slowly-evolved neuronal genes. Our results suggest that RNA editing, rather than nucleotide substitution at the DNA level, may be the preferred evolutionary means of fine-tuning neuronal functions.

Seminar details

Wednesday May 28th, 2014
1:00 PM Lunch In Clark S360
1:15 PM Seminar
Location: Clark S360

May 21st, Iain Mathieson: Demography and the age of rare variants

Iain Mathieson is a postdoc in David Reich’s lab at Harvard, working on approaches to detect selection in recent human evolution using ancient DNA. He is spending this semester at the Simons Institute for the Theory of Computing at UC Berkeley. Before moving to Harvard, he completed his PhD at Oxford with Gil McVean, investigating the genetics of spatially structured populations.

Talk: Demography and the age of rare variants.

The distribution of rare variants can be highly informative about recent population history. One way to use this information is to infer the age of these variants using their surrounding haplotype structure. These ages vary enormously. For example, depending on population, the age of variants at frequency 0.1% in the 1000 Genomes data varies from tens to thousands of generations, revealing the influence of population splits, admixtures and bottlenecks. We can derive explicit estimators for historical effective population sizes and migration rates by treating these ages as samples from the distribution of coalescent times. This approach is very accurate for the recent past, which makes it a useful complement to sequentially Markovian coalescent approaches, which are most accurate for older events.

Seminar details

Wednesday May 21st, 2014
1:00 PM Lunch In Clark S360 (sign up below)
1:15 PM Seminar
Location: Clark S360

May 14th, Philip Labo: Yeast population dynamics: Stopping times and logistic curves.

Philip Labo

IN CLARK S360!!

Philip Labo studied both biology and computer science as an undergraduate at Penn. He received his B.A. in biology in 2004. During that time he also worked as a programmer/analyst for the Plasmodium falciparum genome database (plasmodb.org). Philip left Penn for Stanford in Fall 2005 to pursue a doctorate in statistics. His doctoral research focused on the modeling of adaptive evolution in certain populations of baker’s yeast. He also studied the modeling of adaptive evolution in general. During the Spring of 2011 he started working with Jamie Jones, of the Stanford Anthropology Department, on the analysis of evolutionary pressures on life history patterns in the Utah Population Database. Philip now works as a post-doctoral scholar with the Prematurity Research Center in the Stanford School of Medicine, lending his statistical expertise to the study of preterm birth in the United States. Jamie Jones and Paul Wise oversee his work.

Talk: Yeast population dynamics: Stopping times and logistic curves

Kao & Sherlock (2008) describes eight experiments with baker’s yeast. Each experiment involves a chemostat, a sugar-limited medium, a numerically large baker’s yeast population, and the evolution of said population over nearly five hundred generations. The output from these experiments leads us to consider the Wright-Fisher and Moran models of population genetics lore (Fisher (1922); Wright (1931); Moran (1958)). How might we expect these populations to behave if under the influence of such simple underlying dynamics? We study expected stopping times and expected path functions, reviewing old results and deriving new ones. We also provide a loose demonstration of our efforts to fit said models to the data from Kao & Sherlock (2008). While these relatively simple models may “fit” these data, this does not suggest that these simple dynamics actually obtain in real life (see for example Desai & Fisher (2007)).
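For readers unfamiliar with the Wright-Fisher model named in the abstract, the sketch below simulates one allele-count trajectory until it hits an absorbing state. It assumes the simplest textbook setting (haploid, neutral, two alleles, constant population size), which is not necessarily the variant studied in the talk. The function name `wright_fisher_trajectory`, the population size, and the generation cap are all hypothetical choices.

```python
import random


def wright_fisher_trajectory(pop_size, initial_count, rng, max_generations=10_000):
    """Simulate allele counts under a neutral, haploid Wright-Fisher model.

    Assumption: each generation is a binomial resample of the previous one.
    The run stops at loss (0), fixation (pop_size), or the generation cap.
    """
    counts = [initial_count]
    while 0 < counts[-1] < pop_size and len(counts) <= max_generations:
        freq = counts[-1] / pop_size
        # Binomial(pop_size, freq) draw: each offspring picks the allele with probability freq.
        counts.append(sum(rng.random() < freq for _ in range(pop_size)))
    return counts


rng = random.Random(0)  # fixed seed so the run is repeatable
trajectory = wright_fisher_trajectory(pop_size=50, initial_count=25, rng=rng)
stopping_time = len(trajectory) - 1  # generations until loss or fixation
```

Averaging `stopping_time` over many independent runs gives a simulation estimate of an expected stopping time, which could be compared against analytical results of the kind the talk reviews.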

Seminar details

Wednesday May 14th, 2014
1:15 PM Seminar
Location: Clark S360!!