Pharmaceutical research and development, which involves creating new drugs and making old ones safer and cheaper, is a difficult, error-prone, and exorbitantly expensive process. As in most research-driven fields, trial and error is responsible for the discovery of new products and the optimization of existing processes. From drug discovery through formulation to manufacturing, data-driven techniques play a significant role in the pharma industry – a role that may have been overlooked until recently.
A majority of drug discovery projects now use bioinformatics tools to identify suitable molecular structures for drugs and to test their interactions with host molecules in humans. Vast publicly available databases exist and need to be mined to extract meaningful information. The idea is to establish in silico whether a new drug molecule will work in vivo. The arrival of new drugs has stagnated over the last few decades, a trend that is expected to reverse owing to the availability of vast computational resources and scientists' growing interest in computational solutions to such problems.
As for existing molecules, pharma faces problems in coming up with suitable dosage forms (tablets, capsules, injections, sprays, ointments, etc.) and formulations (mixtures of the active ingredient with other chemicals called excipients, which are mostly inert stabilizing agents). Suitable dosage forms are chosen based on the target user, the drug delivery requirements, or experience. For the purpose of this article, let's take tablets, as they make up more than 80% of pharma products. It is imperative to get the formulation right every time. Creating a single tablet requires the right amount of each powder, mixed to the right consistency and then compressed into a tablet of a specific strength. Now consider that a single tableting machine can churn out a hundred thousand tablets in an hour. How much variation can that cause? A lot. That is exactly why regulatory authorities in different countries are so strict about the quality of dosage forms. In its efforts to meet these standards, a pharma manufacturer cannot do without manual optimization, experienced personnel, and, inevitably, monetary losses.
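To get a feel for that scale, here is a minimal sketch of how a small per-tablet variation adds up over an hour of production. It assumes, purely for illustration, that tablet weight is normally distributed around a 500 mg target and that a tablet is acceptable within ±5% of that target. Neither number comes from a pharmacopoeia or from any real production line, and the function name is hypothetical.

```python
from statistics import NormalDist


def expected_out_of_spec(tablets_per_hour, target_mg, rel_sd, tolerance):
    """Expected number of tablets per hour whose weight falls outside
    target_mg * (1 ± tolerance), assuming normally distributed weights.

    rel_sd is the relative standard deviation (0.02 means 2% of target).
    All inputs are illustrative assumptions, not regulatory limits.
    """
    dist = NormalDist(mu=target_mg, sigma=target_mg * rel_sd)
    lower = target_mg * (1 - tolerance)
    upper = target_mg * (1 + tolerance)
    fraction_within = dist.cdf(upper) - dist.cdf(lower)
    return tablets_per_hour * (1 - fraction_within)


if __name__ == "__main__":
    # Compare a tight, a moderate, and a loose process at the same line speed.
    for rel_sd in (0.01, 0.02, 0.03):
        n = expected_out_of_spec(100_000, 500.0, rel_sd, 0.05)
        print(f"relative SD {rel_sd:.0%}: about {n:.0f} tablets per hour out of range")
```

Under these assumptions, going from a 1% to a 2% relative standard deviation moves the process from almost no rejects to more than a thousand out-of-range tablets every hour, which is why small improvements in process control matter so much.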
Improving these processes would mean safer drugs, possibly lower prices, and a shorter time to market. Scientists like Bourquin, Rowe, and Mendyk have been working on data-driven solutions to exactly such problems. Small amounts of data, drawn from lab experiments that imitate the manufacturing line, are used to build computational models that represent a particular process for powders with certain properties. The models are mostly based on detailed formulation characteristics and manufacturing process conditions. Different versions of random forests, fuzzy systems, and neural networks are being explored for different problems. Such models help detect failures beforehand, provide optimized solutions, and reduce variation between batches. Furthermore, there is a case for transparency in models, since regulatory authorities require full disclosure of the methods used. Black-box methods are not well suited to such problems; hence, symbolic regression, which produces an explicit formula, looks promising. R is the more prevalent language of choice, in part because the FDA accepts it for statistical analyses. Systematic studies are being conducted in a generously funded EU project called IPROCOM.
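As a rough illustration of what a transparent, data-driven process model looks like, the sketch below fits a straight line relating compression force to tablet hardness and uses it to flag a planned setting before any tablet is made. Ordinary least squares is used here only as the simplest possible stand-in; it is not the random forest, fuzzy, neural network, or symbolic regression models mentioned above, and it is not any of the named authors' methods. The data are synthetic numbers invented for the example, and the variable names, units, and the minimum hardness threshold are all assumptions.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Evaluate the fitted line at x."""
    slope, intercept = model
    return slope * x + intercept


# Synthetic "lab" data (invented for illustration, not measured):
# compression force in kN versus tablet hardness in N.
forces_kn = [5.0, 10.0, 15.0, 20.0, 25.0]
hardness_n = [61.0, 98.0, 140.5, 181.5, 219.0]

model = fit_line(forces_kn, hardness_n)

# Hypothetical quality threshold: tablets softer than this would fail.
MIN_HARDNESS_N = 80.0

if __name__ == "__main__":
    slope, intercept = model
    # The whole model is one readable formula, which is easy to disclose.
    print(f"hardness = {slope:.2f} * force + {intercept:.2f}")
    for planned_force in (6.0, 12.0):
        predicted = predict(model, planned_force)
        status = "ok" if predicted >= MIN_HARDNESS_N else "predicted failure"
        print(f"force {planned_force} kN -> {predicted:.1f} N ({status})")
```

The point of the design is the transparency argument made above: the fitted model is a single explicit equation that can be written down in full, which is much harder to do for a black-box method.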
The quality characteristics of tablets are important because they are an indirect indicator of the efficacy and bioavailability of the drug once swallowed. Finding out how much of the drug actually reached the bloodstream is yet another cumbersome process, not to mention the enormous risks involved. The standard way is to conduct clinical trials and collect that data. It goes without saying that conducting clinical trials is a risky and expensive exercise – and it would be great to have tools that can circumvent the process where possible. Mendyk has developed an empirical way of emulating the clinical trial in silico using data from lab experiments only (open source tool(s) available on SourceForge).
Data methods empower the research and development teams of pharma companies, big and small alike, to develop new drugs and to make existing ones cheaper and safer. Data scientists should focus their energies on this field – I think wonders are waiting to happen.