Recently, simulation-based artificial intelligence methods such as reinforcement learning (RL) algorithms and Monte Carlo Tree Search (MCTS) have been used to solve complex optimization, control, and planning problems. MCTS estimates the value of an action by combining Monte Carlo rollouts under a rollout policy with an upper confidence bound (UCB) rule for selecting actions inside the search tree. More recently, deep neural networks were combined with MCTS in AlphaGo, the program that plays the game of Go. In materials science, simulation-based optimization with MCTS can be used to find the material structure that optimizes a target property. In chemistry, it can be used to optimize chemical reactions in drug discovery. Both applications are valuable in industrial research. Related deep learning ideas have also been used in AlphaFold to predict protein structure. In this paper we study policy gradient (PG) algorithms for RL, which optimize a parameterized policy model. We further analyze the combination of PG and MCTS, and we extend the study of deep neural network (DNN) architectures for PG-MCTS and MCTS policy optimization. We also analyze Bootstrap DQN, which is designed for deep exploration, and Ensemble DQN. We employ these simulation-based RL methods to optimize chemical reactions in drug discovery. They provide an efficient optimization technique that is less time-consuming than traditional optimization methods. We provide numerical examples to illustrate the performance of the proposed approach in drug discovery and chemical reaction optimization. Next, we extend the study of these algorithms to the material design problem in materials science, where the goal is to design a material with desired properties by optimizing the composition of its structure. Standard optimization tools may not be appropriate for such large-scale problems, which involve large structures and many material components.
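As an illustration of the MCTS value estimate described above, where a UCB rule selects actions inside the tree and random Monte Carlo rollouts evaluate the leaves, the following minimal UCT sketch builds a design one choice at a time. It is not the method used in this work. The binary design space, the hidden `TARGET` pattern, the `reward` function standing in for a target property, and the parameters `n_iter`, `c`, and `seed` are all hypothetical.

```python
import math
import random

# Hypothetical toy problem: a "design" is a sequence of DEPTH binary choices,
# and its property score is the fraction of positions matching a hidden TARGET.
TARGET = (1, 0, 1, 1, 0, 1)
ACTIONS = (0, 1)
DEPTH = len(TARGET)


def reward(design):
    """Score a complete design in [0, 1] (stand-in for a target property)."""
    return sum(a == t for a, t in zip(design, TARGET)) / DEPTH


class Node:
    def __init__(self, state):
        self.state = state        # partial design: tuple of chosen actions
        self.children = {}        # action -> Node
        self.visits = 0
        self.value_sum = 0.0

    def is_terminal(self):
        return len(self.state) == DEPTH

    def ucb_child(self, c):
        # UCB1: mean rollout value plus an exploration bonus
        log_n = math.log(self.visits)
        return max(
            self.children.values(),
            key=lambda ch: ch.value_sum / ch.visits + c * math.sqrt(log_n / ch.visits),
        )


def rollout(state, rng):
    """Monte Carlo rollout: complete the design with uniformly random actions."""
    state = list(state)
    while len(state) < DEPTH:
        state.append(rng.choice(ACTIONS))
    return reward(tuple(state))


def mcts(n_iter=2000, c=1.4, seed=0):
    rng = random.Random(seed)
    root = Node(())
    for _ in range(n_iter):
        node, path = root, [root]
        # 1. Selection: descend by UCB1 while the node is fully expanded
        while not node.is_terminal() and len(node.children) == len(ACTIONS):
            node = node.ucb_child(c)
            path.append(node)
        # 2. Expansion: add one untried action
        if not node.is_terminal():
            untried = [a for a in ACTIONS if a not in node.children]
            a = rng.choice(untried)
            node.children[a] = Node(node.state + (a,))
            node = node.children[a]
            path.append(node)
        # 3. Simulation: estimate the value with a random rollout
        value = rollout(node.state, rng)
        # 4. Backpropagation: update visit counts and value sums on the path
        for n in path:
            n.visits += 1
            n.value_sum += value
    # Read off the design by following the most-visited child at each level
    node, design = root, []
    while node.children:
        a, node = max(node.children.items(), key=lambda kv: kv[1].visits)
        design.append(a)
    return tuple(design)
```

Calling `mcts()` returns a full-length design. The final design is chosen by visit count rather than by mean value because visit counts are usually the more robust signal. In an AlphaGo-style variant, a learned policy or value network would replace the uniform rollout.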
We show that MCTS and PG-MCTS perform this optimization efficiently and compute material designs that achieve the desired properties with reasonable accuracy. Finally, we present numerical simulations to illustrate the performance of the proposed algorithms.
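The policy gradient side of this approach can be illustrated with REINFORCE, a basic PG algorithm, on a toy composition problem. The sketch below assumes an independent softmax policy at each site, a running-average baseline, and a hypothetical `reward` that counts matches against a hidden `TARGET` composition. The names `SITES`, `CHOICES`, `reinforce`, and `greedy_design` are illustrative only. This is a sketch of plain PG and does not include the tree search that PG-MCTS would add on top of the learned policy.

```python
import math
import random

# Hypothetical toy design problem: each of SITES positions takes one of
# CHOICES candidate components; the "property" is the fraction of positions
# matching a hidden TARGET composition.
SITES, CHOICES = 4, 3
TARGET = (2, 0, 1, 2)


def reward(design):
    return sum(a == t for a, t in zip(design, TARGET)) / SITES


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def reinforce(n_iter=2000, lr=0.2, seed=0):
    rng = random.Random(seed)
    # theta[i][k]: logit of choosing component k at site i
    theta = [[0.0] * CHOICES for _ in range(SITES)]
    baseline = 0.0
    for _ in range(n_iter):
        probs = [softmax(row) for row in theta]
        # Sample a complete design from the current policy
        design = [rng.choices(range(CHOICES), weights=p)[0] for p in probs]
        r = reward(design)
        advantage = r - baseline
        # Gradient ascent on log pi: d/dtheta_k log softmax = 1[k == a] - p_k
        for i, a in enumerate(design):
            for k in range(CHOICES):
                grad = (1.0 if k == a else 0.0) - probs[i][k]
                theta[i][k] += lr * advantage * grad
        # Baseline uses only past rewards, so it reduces variance without bias
        baseline = 0.9 * baseline + 0.1 * r
    return theta


def greedy_design(theta):
    """Pick the highest-probability component at every site."""
    return tuple(max(range(CHOICES), key=lambda k: row[k]) for row in theta)
```

For example, `greedy_design(reinforce())` returns the composition that the trained policy favors. Each site has its own softmax to keep the policy simple. A DNN policy, as studied in this work, would instead share parameters across sites and condition on the structure.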
Rahul Meshram has been an Assistant Professor in the Department of Electronics and Communication, Indian Institute of Information Technology Allahabad, since April 2021. He obtained his B.E. from Nagpur University in 2006, his M.E. from the Indian Institute of Science, Bangalore, in 2010, and his PhD from IIT Bombay in 2017. He was a postdoctoral fellow at the University of Waterloo, Canada, in 2018, and an Institute Postdoctoral Fellow at IIT Madras from July 2019 to July 2020. His current research interests are reinforcement learning, deep reinforcement learning, stochastic optimization, Markov decision processes, multi-armed bandits, and their applications to drug discovery.