Aug 13, Kelley Harris: Signatures of demographic history and error-prone polymerase activity in large genomic datasets


Kelley Harris, UC Berkeley

I am an applied math Ph.D. student at UC Berkeley studying population genetics and evolutionary genomics. I work with Rasmus Nielsen and Yun Song on developing methods for DNA sequence analysis that are derived mainly from coalescent theory. I began this work as an M.Phil. student with Richard Durbin at the Wellcome Trust Sanger Institute.

Title: Signatures of demographic history and error-prone polymerase activity in large genomic datasets

When two individuals are closely related to each other, their genomes contain long regions of DNA that are inherited from recent common ancestors. In these regions, genetic mutations distinguishing one individual from the other will be very far apart, and one can find long tracts of identity by state (IBS) where the two genomes are completely identical. Aligning the two genomes and measuring the distances between all sites where they differ, we can obtain a length distribution of IBS tracts that is highly informative about past population size changes and gene flow events. Using coalescent theory, we can derive the expected length distribution of IBS tracts under a variety of demographic models and demonstrate the use of this framework to date key events in the population histories of humans and polar bears. We make these inferences by obtaining models that predict observed IBS tract length distributions to a high degree of accuracy, but we also observe that real genomic data contains an excess of very short IBS tracts that cannot be explained by any demographic processes. We argue that these short IBS tracts are created by multinucleotide mutations (MNMs) that give rise to multiple SNPs in a single generation. We infer the prevalence and distribution of MNMs in a large dataset of 1,092 human genomes by quantifying deviations from patterns that we expect to observe if all SNPs arise independently. In doing so, we uncover a mutation pattern characteristic of the error-prone DNA polymerase Pol zeta, suggesting that some MNMs result from the action of this enzyme in the human germline. 
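As a rough illustration of the tract-measuring step described in the abstract (a toy sketch, not the speaker's actual pipeline), the Python below walks two aligned sequences and records the length of every run of identical sites between differences. The function name `ibs_tract_lengths`, the example sequences, and the convention of keeping end-of-sequence tracts while dropping zero-length ones are all assumptions made for this example.

```python
from collections import Counter


def ibs_tract_lengths(seq_a, seq_b):
    """Return the lengths of maximal runs of identical sites.

    Hypothetical helper for illustration only. Assumes the two
    sequences are already aligned and have equal length. Runs that
    touch either end of the alignment are kept, even though they are
    truncated by the sequence boundary; zero-length runs (adjacent
    differing sites) are dropped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")

    lengths = []
    run = 0
    for base_a, base_b in zip(seq_a, seq_b):
        if base_a == base_b:
            run += 1          # still inside an identical tract
        else:
            if run > 0:
                lengths.append(run)
            run = 0           # a difference ends the current tract
    if run > 0:
        lengths.append(run)   # tract running to the end of the alignment
    return lengths


# Toy aligned sequences that differ at positions 3 and 8 (0-based).
tracts = ibs_tract_lengths("ACGTACGTAC", "ACGAACGTTC")
length_distribution = Counter(tracts)  # tract length -> number of tracts
```

On real data the same tally would be made over whole-genome alignments, and the resulting histogram of tract lengths is the quantity compared against the expectation under a demographic model.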

Seminar details

Date: August 13, 2014

Time: Lunch will be served at 1:00 pm; the lecture will follow at 1:15 pm

Location: Clark S361


Feb 12th, Dennis Wall: Decoding autisms using machine intelligence and systems medicine

About Dennis


Professor Dennis P. Wall

Dennis Wall is an Associate Professor of Pediatrics at the Stanford University School of Medicine.

The Wall Lab uses machine learning and systems biology to develop clinical solutions for the detection and treatment of autism and other complex human diseases. The lab’s research falls into three general categories: (1) Translating the thinking of systems biology to the field of autism genetics with the intent to develop effective early-stage diagnostics and targets for therapeutic intervention. The work involves the generation and analysis of genomic and phenotypic databases using computational tools of systems biology, machine learning and network inference. (2) Efforts to understand and characterize the clinical significance and utility of human genetic variation. This work involves clinical-grade annotation of human genetic variation, estimating the rates of both true and false positives in present-day genetic testing and their likely impacts on the practice of personalized care, the construction of an authoritative knowledgebase for clinical decision support, and efforts in educating present and future doctors on the potential of genomics in individualized healthcare. (3) Redefining human diseases through computational and comparative network analysis. The work involves the integration and analysis of transcriptomic, genomic and bibliomic data to network all known human diseases. Deliverables include revealing disease connections, properly reshaping blurred boundaries of classification, and identifying opportunities for drug treatment repositioning.

Dr. Wall received his doctorate in Integrative Biology from the University of California, Berkeley, where he pioneered the use of fast evolving gene sequences to trace population-scale diversification across islands. Then, with a postdoctoral fellowship award from the National Science Foundation, he went on to Stanford University to address broader questions in systems biology and computational genomics, work that resulted in comprehensive functional models for both protein mutation and protein interaction.


The incidence of autism has increased dramatically over recent years, making this mental disorder one of the greatest public health challenges of our time. It has a strong genetic component, but its molecular pathology remains unclear despite deep sequencing efforts. Thus, the dominant methods for diagnosis rely on behavioral characteristics, but these take hours to administer and often do not reach children until they have aged past key windows of development. In this talk, I will describe recent efforts in my lab to discover both genetic and behavioral markers that enable rapid, early and accurate detection of autism. For the former, I will describe how we have compared the network of autism gene candidates to the complete genetic systems of behaviorally related disorders, including ADHD, to target novel gene candidates and improve our understanding of the genetic system of autism, and how this work has identified a potentially important role for the immune system. For the latter, I will describe how we have used machine-learning techniques to study over 5,000 autism cases and some of the most commonly used behavioral instruments for autism detection in order to speed up and mobilize the detection of the core features of autism. Deploying an alternative decision tree-learning algorithm, we identified a procedure that could reduce the total complexity by 93% without loss of accuracy and that can be administered outside the clinic and via mobile media. Such an abbreviated diagnostic instrument could have a significant impact on the timeframe of diagnosis, making it possible for more children to receive diagnosis and care early in their development.

Seminar details

Wednesday Feb 12, 2014
12:45 PM Lunch
1:15 PM Seminar
Location: Clark Center S360
Host: Dmitri Petrov