Using functional data analysis to exploit high-resolution “Omics” data

Marzia Cremona

Penn State University

Date: Wednesday 30 January 2019
Time: 2:00 p.m.
Place: Room 241, 2nd floor, Science III building

Recent progress in sequencing technology has revolutionized the study of genomic and epigenomic processes by allowing fast, accurate and cheap whole-genome DNA sequencing, as well as other high-throughput measurements. Functional data analysis (FDA) can be broadly and effectively employed to exploit the massive, high-dimensional and complex “Omics” data generated by these technologies. This approach considers “Omics” data at high resolution, representing them as “curves” of measurements over the DNA sequence. I will demonstrate the effectiveness of FDA in this setting with two applications. In the first, I will present a novel method, called probabilistic K-mean with local alignment, that locally clusters misaligned curves and addresses the problem of discovering functional motifs, i.e., typical “shapes” that may recur several times along and across a set of curves and that capture important local characteristics of these curves. I will demonstrate the performance of the method on simulated data and apply it to discover functional motifs in “Omics” signals related to mutagenesis and genome dynamics. In the second, I will show how a recently developed functional hypothesis test, IWTomics, and multiple functional logistic regression can be employed to characterize the genomic landscape surrounding transposable elements and to detect local changes in the speed of DNA polymerization due to the presence of non-canonical 3D structures.