Registration is closed: the maximum number of participants has been reached.

Presentation

Biostatistics and bioinformatics play a key role in biological research and have developed spectacularly in recent years thanks to technological advances. This meeting is part of a series of meetings and exchanges between regional teams in bioinformatics and biostatistics. It offers researchers and engineers the opportunity to present their work on methodological or technical developments. It also gives biologists the opportunity to present original results obtained by applying bioinformatics/biostatistics methods. It follows the events organized in December 2009, March 2011, March 2012, June 2014, June 2015 and December 2016.

We invite all regional teams (academic laboratories and private companies) to submit an abstract. We hope that this meeting will give students, researchers and engineers from different backgrounds the opportunity to meet, to identify important directions for future research and to start fruitful research collaborations.

This one-day workshop will consist of two invited lectures, contributed talks and poster presentations. The meeting is open to mathematicians, physicists, computer scientists and biologists.

This year, the meeting will be held on Thursday 13 December 2018 at the Inra center in Auzeville.

Invited talks

Camille Marchet (IRISA, INRIA, Rennes)

CARNAC-LR and C2C: de novo clustering and detection of alternative isoforms in Third Generation Sequencing transcriptomes

Long-read sequencing technologies, referred to as Third Generation Sequencing (TGS) and represented by Pacific Biosciences and Nanopore, have recently made it possible to sequence full-length RNA molecules. In doing so, they remove the need to reconstruct transcripts before studying complete RNA transcripts. By avoiding the limitations of previous technologies and giving access to transcript structure, they may complement and improve transcriptome studies. This is particularly crucial for non-model species, where assembly was previously required. Many biological questions (finding gene signatures for a trait, finding expressed variants...) are classically addressed using transcriptome sequencing. However, this gain in length comes at the cost of a computationally challenging error rate (up to 15%) that rules out previous short-read methods. In this work, we propose to support the analysis of RNA long-read sequencing with a clustering method that works at the gene level. It groups transcripts that originate from the same gene. From the clusters, the expression of each gene is obtained and related transcripts are identified, even when no reference is available. In a second step, still work in progress, we rely on those clusters to discover patterns of alternative splicing/alternative transcription and to propose a consensus and a quantification for each isoform. I will give a short introduction to the possibilities that Oxford Nanopore reads offer for transcriptomics, and to their drawbacks. Then I will present a high-level algorithmic overview of our clustering tool CARNAC-LR. I will show results computed on a mouse brain transcriptome data set sequenced at the Genoscope. Finally, I will introduce our current research on isoform detection and consensus computing.
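
As a rough illustration of what reference-free, gene-level clustering of reads can look like, the sketch below links two reads when they share enough k-mers and reports the connected components as clusters. This is not the CARNAC-LR algorithm: the k-mer criterion, the function names, the parameters `k` and `min_shared`, and the toy reads are all assumptions made for the example.

```python
from itertools import combinations


def kmers(seq, k):
    """Return the set of all substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def cluster_reads(reads, k=5, min_shared=3):
    """Group read indices into connected components of a similarity graph.

    Two reads are linked when they share at least min_shared k-mers
    (a hypothetical similarity criterion chosen for this sketch).
    """
    parent = list(range(len(reads)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    kmer_sets = [kmers(read, k) for read in reads]
    for a, b in combinations(range(len(reads)), 2):
        if len(kmer_sets[a] & kmer_sets[b]) >= min_shared:
            parent[find(a)] = find(b)

    clusters = {}
    for i in range(len(reads)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Toy reads (invented): reads 0-1 overlap, reads 2-3 overlap.
reads = [
    "ACGTTGCAAGGCTTACCGATG",
    "ACGTTGCAAGGCTT",
    "TTTTCCCCGGGGAAAATTTCC",
    "CCCCGGGGAAAATT",
]
clusters = cluster_reads(reads)
# The number of reads per cluster is a crude proxy for gene expression.
cluster_sizes = [len(c) for c in clusters]
```

The all-pairs comparison is quadratic in the number of reads, so this only conveys the idea; a real data set would need an indexed similarity search, and a noisy long-read graph generally needs something more robust than plain connected components.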

Boris Hejblum (Université de Bordeaux - ISPED ; Inserm U1219 BPH/Inria SISTM)

Controlling Type-I error and false discoveries in RNA-seq differential analyses through a variance component score test

Next generation sequencing (RNA-seq) is the current state-of-the-art technology for measuring genome-wide gene expression. As transcriptomics studies grow in size, frequency, and importance, it is becoming urgent to develop and refine the statistical tools available for analyzing RNA-seq data. In particular, there is a need for methods that better control the type-I error, as clinical RNA-seq studies include a growing number of subjects and ever cheaper measurements result in larger sample sizes. We propose to model RNA-seq counts as continuous variables, using nonparametric regression to account for their inherent heteroscedasticity, in a principled, model-free, and efficient manner for detecting differentially expressed genes from RNA-seq data. Our method, varseq, can identify the genes whose expression is significantly associated with one or several factors of interest in complex experimental designs, including studies with longitudinal measurement of gene expression. We rely on a powerful variance component score test that can account for both adjustment covariates and data heteroscedasticity without assuming any specific parametric distribution for the (transformed) RNA-seq counts. Despite the presence of a nonparametric component, our test statistic has a simple form and a limiting distribution that can be computed quickly. A permutation version of the test is also derived for small sample sizes, with a principled approach to multiple testing correction suited to a permutation test. Applied to both simulated data and real datasets, our approach shows very good statistical properties, with an increase in stability and power compared to the state-of-the-art methods limma/voom, edgeR, and DESeq2. In particular, we show that those three methods can all fail to control the type-I error and the False Discovery Rate under realistic settings when the sample size becomes larger, while our method behaves as expected.
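
To make the idea of a per-gene permutation test followed by a multiple testing correction concrete, here is a minimal sketch. It is not the varseq variance component score test: it swaps in a plain difference in group means as the test statistic and the standard Benjamini-Hochberg adjustment, and the function names, gene names and expression values are invented for illustration.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def permutation_pvalue(values, labels, n_perm=2000, rng=None):
    """Two-sided permutation p-value for a difference in group means."""
    if rng is None:
        rng = random.Random(0)

    def stat(lab):
        group1 = [v for v, g in zip(values, lab) if g]
        group0 = [v for v, g in zip(values, lab) if not g]
        return abs(mean(group1) - mean(group0))

    observed = stat(labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # random reassignment of group labels
        if stat(shuffled) >= observed:
            hits += 1
    # The add-one correction keeps the p-value strictly positive.
    return (hits + 1) / (n_perm + 1)


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy data (invented): 5 samples per condition, 2 genes.
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
genes = {
    "gene_shifted": [5.1, 4.9, 5.3, 5.0, 4.8, 9.2, 9.0, 8.7, 9.4, 9.1],
    "gene_flat": [5.1, 9.0, 5.3, 8.8, 4.8, 9.2, 5.0, 8.7, 4.9, 9.1],
}
rng = random.Random(0)
pvals = [permutation_pvalue(v, labels, rng=rng) for v in genes.values()]
adjusted = benjamini_hochberg(pvals)
```

With only five samples per group there are few distinct label assignments, so the smallest attainable p-value is bounded away from zero. That discreteness is one reason a multiple testing correction may need to be adapted to permutation tests, as the abstract notes.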

 
