11/08/2022
The amount of data generated in the life sciences is increasing exponentially. This has largely been precipitated by the advent of “-omics” technologies, which facilitate high-throughput measurement of all molecules of a certain type in a biological sample. Whereas a PhD project might once have been concerned with analysing the expression of a single gene, a PhD project today is more likely to involve collecting information about the expression of all genes in multiple cell types and conditions. This has left researchers data-rich, but not necessarily information-rich. Machine learning approaches, however, can be used to interrogate these large datasets for trends or patterns and so turn the data into information.
Machine learning algorithms take sample data (known as training data) and build models that are able to make predictions or decisions without being explicitly programmed to do so.
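To make this concrete, here is a minimal sketch in Python of a model being built from training data. It assumes a toy dataset invented purely for illustration, and the function names `fit_line` and `predict` are hypothetical. The relationship between input and output is never written into the program; it is estimated from the examples by least squares and then used to predict an unseen case.

```python
# Toy illustration only: the numbers below are invented, not real measurements.

def fit_line(xs, ys):
    """Estimate (slope, intercept) of a straight line by least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(model, x):
    """Apply the learned line to a new input."""
    slope, intercept = model
    return slope * x + intercept

# Training data: inputs paired with observed outputs.
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [2.0, 4.0, 6.0, 8.0]

model = fit_line(train_x, train_y)   # the "learning" step
prediction = predict(model, 5.0)     # prediction for an input not seen in training
print(prediction)
```

Real machine learning models are usually far more flexible than a straight line, but the pattern is the same: fit to training data, then predict.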
When the COVID-19 pandemic hit during my PhD, and access to lab facilities was severely reduced, I wondered whether machine learning could help drive my project forwards from the comfort of my own home. In a nutshell, the answer was yes!
My project was on the spatial arrangement of DNA in cells that produce antibodies (B cells). I was interested in how the two metres of DNA in each cell folds up and fits in the nucleus, especially the sections of DNA encoding antibodies. I had generated information on DNA interaction frequencies (i.e. information that indicates whether two sections of DNA are close in three-dimensional space) and had access to information on gene expression and on the distribution of epigenetic marks. I was then able to use machine learning to interrogate these large datasets, to determine whether gene expression or the distribution of epigenetic marks could predict DNA interaction frequencies. This work revealed trends that furthered our understanding of how the section of DNA that encodes the immunoglobulin light chain folds.
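One common way to ask whether a feature "could predict" a measurement is to fit a model on part of the data and check it on the rest. The sketch below illustrates that idea only. It is not the analysis used in the project: the data are synthetic, the single feature and the straight-line model are assumptions chosen for brevity, and all names are hypothetical.

```python
import random

random.seed(0)  # fixed seed so the synthetic data are reproducible

# Synthetic stand-ins (assumed for illustration): one feature score per pair of
# DNA regions, and an "interaction frequency" that depends on it plus noise.
feature = [random.uniform(0.0, 1.0) for _ in range(200)]
interaction = [3.0 * f + random.gauss(0.0, 0.2) for f in feature]

# Hold out the last 50 examples; the model never sees them during fitting.
train_x, test_x = feature[:150], feature[150:]
train_y, test_y = interaction[:150], interaction[150:]

def fit_line(xs, ys):
    """Estimate (slope, intercept) of a straight line by least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def mean_squared_error(predicted, observed):
    """Average squared difference between predictions and observations."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)

slope, intercept = fit_line(train_x, train_y)

# Error of the fitted model on the held-out examples.
model_error = mean_squared_error([slope * x + intercept for x in test_x], test_y)

# Baseline that ignores the feature: always predict the training mean.
baseline = sum(train_y) / len(train_y)
baseline_error = mean_squared_error([baseline] * len(test_y), test_y)

print("model error:", model_error)
print("baseline error:", baseline_error)
```

If the model's held-out error is clearly lower than the baseline's, the feature carries predictive information about the measurement. If the two are similar, it does not, at least for this kind of model.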
My experience showed that machine learning can yield results that support the generation of clear, focused hypotheses for further experimentation. It is also cost-effective and time-efficient, and it helps to minimise the use of model organisms.
Whilst my venture into machine learning was limited to the analysis of “-omics” data, its applications in life sciences reach much further. From microscopy image classification to predicting protein conformation and designing drug molecules, the potential uses span a wide range of life sciences disciplines.
Moreover, machine learning is not limited to academia; it can also provide great commercial benefit. The findings of the 2019 World Intellectual Property Organization (WIPO) Technology Trends report on Artificial Intelligence support this: patent applications in the machine learning field are more likely to also be classified in the life and medical sciences field than in any other field analysed, including telecommunication, transportation and security. Given the rate of uptake of machine learning in life sciences in recent years, and the advantages it offered during the pandemic, I anticipate that the prominence of machine learning in life sciences patenting will continue for many years.
This article is for general information only. Its content is not a statement of the law on any subject and does not constitute advice. Please contact Reddie & Grose LLP for advice before taking any action in reliance on it.