Research Topics
I am currently preparing a PhD thesis in Applied Mathematics at ENS Cachan, in collaboration with the Systems Biology Unit at Institut Pasteur for the biotechnology aspects. In short, I apply mathematical tools and abstraction to proteomics data generated on a Liquid Chromatography / Mass Spectrometry platform.
Proteomics Data
The genome is not everything. The proteins are often the molecules that perform the biological functions, whereas DNA provides the instructions for building the proteins. There are extensive interactions between the genome, the proteome and environmental factors. Proteomics approaches to biology study the set of proteins in a global, undirected fashion, and aim at deriving overall characteristics of biological processes as well as the detailed mechanics of life. Proteomics approaches typically use large-scale data acquisition techniques.
The basic tasks in the analysis of the proteome are:
- identification of all the proteins in a sample (determining the amino-acid sequence)
- quantification of all the proteins (measuring concentration)
More advanced questions include:
- comparative analysis of several samples (biomarker discovery)
- model building (pathway analysis)
- post-translational modifications (exact state of proteins, more precise than amino-acid sequence).
Standard proteomics analyses are carried out using 2D gels. This consists of a two-dimensional separation of the proteins, followed by additional experiments for each spot on the gel. This is slow, especially as the number of protein species for simple living organisms is in the range of 10,000.
Mass spectrometry is a very old analytical technique, which has only recently been applied to proteomics. Applying it to proteins is technically difficult because proteins are large molecules, because they are usually present in an aqueous solvent, and because they have diverse physico-chemical properties that make it hard to analyse everything on the same platform. Mass spectrometry is nevertheless very promising because it is very fast and automated. It can easily provide identification and quantification information for thousands of proteins in a 30-minute experiment. The more advanced questions are being addressed by current research initiatives.
In the course of my PhD thesis, I focus on data acquired on Liquid Chromatography / Mass Spectrometry platforms. The chromatography makes it possible to separate protein signals, while the mass spectrometer measures the mass-to-charge ratio of the molecules, which eventually leads to their identification. I use data generated on the Mass Spectrometry platform of Institut Pasteur and also data from international collaborations.
Signal Processing, Image Analysis and Statistics
The application of mass spectrometry to proteomics requires several adjustments to the experimental setup and the signal processing. Here is a list of usual processing tasks:
- Pre-processing
  - m/z calibration
  - retention time alignment
  - intensity normalisation
  - baseline detection and removal
  - contaminant detection and removal
- Identification
  - feature detection
  - deconvolution
  - MS1 identification
  - MS2 identification
- Relative Quantification
  - associate pairs of intensity levels in two images
  - compute quantification of a feature
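As an illustration of two of the pre-processing tasks listed above, here is a minimal Python sketch of baseline removal and intensity normalisation on a single spectrum. It is not the method used in the thesis: the rolling-minimum baseline estimate, the total-intensity normalisation, and the names `remove_baseline`, `normalise_total` and `half_window` are all assumptions chosen for simplicity.

```python
def remove_baseline(intensities, half_window=2):
    """Subtract a rolling-minimum baseline estimate from each sample.

    Assumption: the baseline varies slowly, so the minimum over a small
    neighbourhood is a rough estimate of it.
    """
    n = len(intensities)
    corrected = []
    for i in range(n):
        lo = max(0, i - half_window)
        hi = min(n, i + half_window + 1)
        baseline = min(intensities[lo:hi])
        corrected.append(intensities[i] - baseline)
    return corrected


def normalise_total(intensities):
    """Scale a spectrum so that its intensities sum to 1.

    An all-zero spectrum is returned unchanged to avoid dividing by zero.
    """
    total = sum(intensities)
    if total == 0:
        return list(intensities)
    return [x / total for x in intensities]


# Hypothetical toy spectrum: a flat background of about 10 with one peak.
spectrum = [10, 10, 11, 50, 12, 10, 10]
corrected = remove_baseline(spectrum)
normalised = normalise_total(corrected)
```

In practice the window has to be wider than the peaks, otherwise the rolling minimum follows the peak itself and removes part of the signal.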
Most of these tasks are strongly related to image processing algorithms, and LC/MS data can easily be reorganised into 2D images. However, it is important to note that these images differ significantly from natural images. In particular, the grey level itself is important, and not only the level sets.
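The reorganisation into a 2D image can be sketched as follows, assuming the data is available as a list of (retention time, m/z, intensity) triples. The function name `lcms_to_image`, the uniform binning and the toy values are hypothetical; real data would typically need a finer, possibly non-uniform m/z grid.

```python
def lcms_to_image(peaks, rt_range, mz_range, shape):
    """Accumulate (retention_time, mz, intensity) triples on a 2D grid.

    Rows index retention time, columns index m/z. Ranges are half-open
    [min, max); triples falling outside them are ignored.
    """
    rt_min, rt_max = rt_range
    mz_min, mz_max = mz_range
    n_rows, n_cols = shape
    image = [[0.0] * n_cols for _ in range(n_rows)]
    for rt, mz, intensity in peaks:
        if not (rt_min <= rt < rt_max and mz_min <= mz < mz_max):
            continue
        row = int((rt - rt_min) / (rt_max - rt_min) * n_rows)
        col = int((mz - mz_min) / (mz_max - mz_min) * n_cols)
        # Intensities falling in the same pixel are summed, so the grey
        # level keeps its quantitative meaning.
        image[row][col] += intensity
    return image


# Hypothetical toy data: (retention time in minutes, m/z, intensity).
peaks = [(0.5, 400.2, 10.0), (0.6, 400.4, 5.0), (1.5, 850.0, 7.0), (5.0, 400.0, 1.0)]
image = lcms_to_image(peaks, rt_range=(0.0, 2.0), mz_range=(400.0, 1000.0), shape=(2, 3))
```

Summing intensities per pixel, rather than thresholding them, reflects the remark above that the grey level matters and not only the level sets.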
Teaching Activities
Current
My PhD allowance includes teaching activities called "monitorat". It consists of 64h of "Travaux Dirigés" (tutorial courses, recitations) teaching mathematics at ENS Cachan.
- 2008-2009:
  - 18h Signal Processing (Master 1, ENS Cachan)
  - 46h Probability and Statistics (Préparation à l'agrégation de mathématiques, ENS Cachan) http://en.wikipedia.org/wiki/Agrégation
- 2007-2008:
  - 16h Signal Processing (Master 1, ENS Cachan)
  - 48h Probability and Statistics (Préparation à l'agrégation de mathématiques, ENS Cachan) http://en.wikipedia.org/wiki/Agrégation
- 2006-2007:
  - 16h Signal Processing (Master 1, ENS Cachan)
  - 48h Probability and Statistics (Préparation à l'agrégation de mathématiques, ENS Cachan) http://en.wikipedia.org/wiki/Agrégation
Past
I taught programming and computer science in Caml at Lycée Stanislas. These were "Travaux Dirigés" sessions for MPSI/MP* classes.
Contact Information
CMLA UMR 8536
Ecole Normale Supérieure de Cachan
61 avenue du président Wilson
94235 Cachan cédex
Email: <lithiaote AT SPAMFREE cmla.ens-cachan DOT fr>
Unité de Biologie Systémique
Institut Pasteur
RdC Bâtiment Laveran
28, rue du Docteur Roux
75015 Paris
