|
Avoiding Side Effects in Complex Navigation Environments
We explore methods to train agents to complete tasks while simultaneously avoiding
side effects in the SafeLife environment. We demonstrate the effectiveness of
MT-DQN, a multi-task variant of Deep Q-Networks, for side effect avoidance.
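The update that DQN (and hence MT-DQN) approximates with a neural network can be sketched in tabular form; the two-state, two-action toy MDP below is illustrative, not the SafeLife environment:

```python
# Tabular Q-learning update -- the core temporal-difference rule that
# DQN approximates with a neural network.
def q_update(Q, s, a, r, s_next, alpha=0.5, gamma=0.9):
    """One TD update: Q(s,a) += alpha * (target - Q(s,a))."""
    target = r + gamma * max(Q[s_next])   # bootstrapped target
    Q[s][a] += alpha * (target - Q[s][a])
    return Q

Q = [[0.0, 0.0], [0.0, 0.0]]             # toy 2-state, 2-action table
q_update(Q, s=0, a=1, r=1.0, s_next=1)
print(Q[0][1])  # 0.5, i.e. alpha * (1.0 + 0.9*0 - 0)
```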
|
|
Distributed Q-Learning
We implement a distributed version of DQN using the Ray distributed computing framework.
|
|
Offline-RL for Bipedal Robots
Reinforcement learning typically requires either a complete model of the world or interactive
access to it. However, the world model may not always be known, and it may be expensive or unsafe
to perform multiple interactions with the world. In such scenarios, we would like to use existing
transition data to learn a control policy. This is addressed by a class of algorithms referred to
as "Offline Reinforcement Learning". In this work, we study and implement "Behaviour Cloning" (BC),
"TD3", and a combination of both, "TD3+BC", for offline reinforcement learning. We evaluate them on
various synthetic datasets and investigate the performance of each on datasets of different
quality. We also attempt to use offline RL on the real-world bipedal robot "Cassie" and introduce
several datasets for a bipedal locomotion task.
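Behaviour Cloning in particular reduces offline RL to supervised learning on the logged (state, action) pairs; a minimal sketch on a synthetic linear dataset (illustrative, not one of the datasets described above):

```python
import numpy as np

# Behaviour Cloning: fit a policy to logged (state, action) pairs by
# regression. Here a linear policy is recovered by least squares from
# a noise-free synthetic dataset.
rng = np.random.default_rng(0)
states = rng.normal(size=(100, 3))      # logged states
true_w = np.array([1.0, -2.0, 0.5])     # unknown behaviour policy
actions = states @ true_w               # logged actions

w_bc, *_ = np.linalg.lstsq(states, actions, rcond=None)
print(np.allclose(w_bc, true_w))  # True: BC recovers the behaviour policy
```

TD3+BC keeps this regression term as a regularizer alongside the TD3 actor objective, which is what lets it stay close to the data distribution.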
|
|
Studying Robustness of Semi-supervised Visual Features to Adversarial Attacks
Neural network verification is an important tool for gauging robustness to adversaries. In this
report, I summarise the work of Salman et al., who formulate most past work on LP-based neural
network verification as a convex relaxation problem. The framework can handle different activation
functions and pooling layers, as well as both primal and dual versions of verification. In my own
work, I evaluate the adversarial robustness of classifiers that are trained to simultaneously
classify and reconstruct the input. I focus on two domains: image classification on the CIFAR-10
dataset and Q-learning in the OpenAI Gym CartPole environment.
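As a concrete instance of the attacks such robustness evaluations use, here is the fast gradient sign method (FGSM) on a toy linear classifier (hypothetical weights; the classifiers in the report are neural networks):

```python
import numpy as np

# FGSM on a linear classifier f(x) = sign(w . x).
w = np.array([1.0, -1.0])   # toy weights
x = np.array([0.5, 0.2])    # w @ x = 0.3 > 0, classified as +1
eps = 0.4                   # perturbation budget

# For label y = +1, the loss gradient w.r.t. x points along -w,
# so the FGSM step x + eps * sign(grad) becomes x - eps * sign(w).
x_adv = x - eps * np.sign(w)
print(w @ x_adv)  # -0.5: the prediction flips to -1
```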
|
|
MC Dropout for Efficient Exploration
Agents need to explore the world intelligently in order to discover new skills that are useful for
downstream tasks. Several exploration methods have been introduced in the literature; however, they
lack a one-to-one comparison under the same policy setting. There is a discrepancy in whether a
model-based or a model-free policy is used to perform exploration, and the choice of policy can
affect the sample efficiency of the agent significantly. In this project, we implement three
exploration methods in the model-based reinforcement learning setting and thoroughly investigate
their qualitative and quantitative performance on the continuous-control problem of Point Maze. Our
experiments show that while the ensemble-based Plan2Explore (Sekar et al. 2020) performs best, a
naive and simple method such as Monte Carlo Dropout can perform on par with other exploration
methods.
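The core of the Monte Carlo Dropout estimate can be sketched in a few lines: keep dropout active at prediction time and read the spread across stochastic forward passes as epistemic uncertainty (toy one-layer network with hypothetical sizes, not our actual model):

```python
import numpy as np

# MC Dropout: repeated stochastic forward passes with dropout enabled;
# the spread of the predictions estimates epistemic uncertainty.
rng = np.random.default_rng(42)
W = rng.normal(size=(16, 1))                     # toy output weights

def predict(x, p_drop=0.5):
    h = np.maximum(x, 0.0)                       # toy hidden features
    mask = rng.random(h.shape) > p_drop          # fresh dropout mask
    return float((h * mask / (1 - p_drop)) @ W)  # inverted dropout

x = rng.normal(size=16)
samples = [predict(x) for _ in range(100)]
mean, std = np.mean(samples), np.std(samples)
# std > 0: disagreement across dropout masks; states with high std
# are the ones an exploration bonus would prioritise.
```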
|
|
Motion Planning for Bipedal Robots
Using Sim2Real reinforcement learning, Siekmann et al. have demonstrated robust bipedal gaits such as standing, walking, hopping, and running on the bipedal robot Cassie. However, real-world applications require robot autonomy above the level of heading velocity and direction to achieve meaningful goals. Motion and path planning using learned behaviors is still an open area of research in robotics, due to the computational requirements of planning algorithms and the rapid updates required for real-world applications. We tackle this problem for the Cassie robot on simulated terrains, using the RRT* algorithm to rapidly and reliably generate trajectory waypoints toward the desired goals. These waypoints and Cassie's real-time pose are used by a controller to set the targets for the policy that performs the lower-level locomotion behavior. We successfully demonstrate our system in simulation in a number of challenging scenarios.
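The tree-growing loop underlying this family of planners can be sketched as plain RRT; RRT* adds a rewiring step on top of this loop, and obstacle checks and terrain parameters are omitted here (the 10x10 workspace and step size are illustrative, not Cassie's actual environment):

```python
import math, random

# Minimal 2-D RRT: grow a tree from the start by stepping toward
# random samples (with goal biasing) until the goal region is reached.
def rrt(start, goal, step=0.5, iters=2000, goal_tol=0.5, seed=0):
    rng = random.Random(seed)
    nodes = [start]
    parent = {start: None}
    for _ in range(iters):
        # Goal biasing: sample the goal 10% of the time.
        sample = goal if rng.random() < 0.1 else \
            (rng.uniform(0, 10), rng.uniform(0, 10))
        near = min(nodes, key=lambda n: math.dist(n, sample))
        d = math.dist(near, sample)
        t = min(1.0, step / d) if d > 0 else 0.0
        new = (near[0] + t * (sample[0] - near[0]),
               near[1] + t * (sample[1] - near[1]))
        nodes.append(new)
        parent[new] = near
        if math.dist(new, goal) < goal_tol:      # reached the goal region
            path = [new]
            while parent[path[-1]] is not None:  # walk back to the start
                path.append(parent[path[-1]])
            return path[::-1]                    # waypoints start -> goal
    return None

path = rrt((0.0, 0.0), (9.0, 9.0))
# path is the list of waypoints a tracking controller could follow
```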
|
|
Visualizing QMDP-net
I created a full-fledged GUI visualizer using Python's Tkinter library to understand the QMDP-net
algorithm. I visualize various components of a POMDP, such as the reward map, belief, and value
function, to build intuition for how the algorithm works.
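The QMDP approximation that QMDP-net unrolls is simple to state: solve the underlying MDP, then score each action in expectation under the current belief, Q(b, a) = sum_s b(s) * Q_MDP(s, a). A toy three-state, two-action example (hypothetical values):

```python
import numpy as np

# QMDP action selection: weight the MDP's state-action values by the
# belief over states, then pick the best action.
Q_mdp = np.array([[1.0, 0.0],        # toy Q_MDP(s, a) table
                  [0.0, 2.0],
                  [0.5, 0.5]])
belief = np.array([0.2, 0.7, 0.1])   # current belief over the 3 states

q_b = belief @ Q_mdp                 # Q(b, a) for each action
action = int(np.argmax(q_b))
print(q_b, action)  # [0.25 1.45] -> action 1
```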
|
|
Deep Learning for Table Interest Point Detection
I attempt to find interest points, or corner points, of tables in a scene using cues from
semantic segmentation and vanishing lines. The availability of semantic information such as
interest points can help mobile robots navigate more effectively.
|
|
Automating GrabCut for Multilabel Image Segmentation
Performing image segmentation for three labels without user guidance by learning a GMM
for each label and applying the alpha-expansion algorithm using the MRF2.2 library.
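The per-label colour model supplies the unary term that alpha expansion then smooths with a pairwise MRF term; below is a single-Gaussian stand-in for the per-label GMMs on synthetic 1-D "colour" data (illustrative values, not the actual pipeline):

```python
import numpy as np

# Unary term of the segmentation: fit one colour model per label from
# seed pixels, then score each pixel by per-label log-likelihood.
rng = np.random.default_rng(1)
means = np.array([0.0, 5.0, 10.0])                       # 3 labels
seeds = {k: rng.normal(means[k], 0.3, 50) for k in range(3)}

# "Learn" each label's Gaussian from its seed pixels
mu = np.array([seeds[k].mean() for k in range(3)])
var = np.array([seeds[k].var() for k in range(3)])

def label(pixel):
    # log N(pixel | mu_k, var_k), up to an additive constant
    ll = -0.5 * np.log(var) - (pixel - mu) ** 2 / (2 * var)
    return int(np.argmax(ll))

print(label(0.1), label(4.8), label(9.9))  # 0 1 2
```

Alpha expansion then trades these unary scores against a smoothness penalty between neighbouring pixels to produce the final labelling.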
|