NASA 1998 SBIR Phase I


PROPOSAL NUMBER: 98-1 24.01-1215

PROJECT TITLE: Reinforcement Learning Approaches for Automated Intelligence Gathering

TECHNICAL ABSTRACT (LIMIT 200 WORDS)

Future autonomous systems must learn optimal behavior by interacting with imperfectly known environments.  For space exploration, one definition of optimal behavior is gathering a maximum amount of information with minimal cost and risk.  Such complex, multi-objective optimization problems cannot be solved easily using traditional techniques, especially when the stochastic nature of the environment, resources, and external interactive entities is taken into account.  Reinforcement learning (RL) controllers are well suited to such problems, but most RL approaches degrade when the system state and action spaces are continuous and high-dimensional.  Recent research has demonstrated that residual methods can avoid many of these problems.  The proposed work will investigate and refine residual reinforcement learning techniques suitable for high-dimensional systems that have many complex subsystem interactions and are characterized by a mix of continuous, discrete, logical-element, and binary states.  The work shall demonstrate, via simulation, residual reinforcement learning executives that gather information in unknown environments.  The authors will demonstrate that improved performance can be obtained using structure-learning nonparametric models to store the continuous-time policies learned by the RL controller.  The resulting methods will be directly applicable to numerous military and commercial-sector systems, including reconnaissance, data mining, and simulation-based optimization.

POTENTIAL COMMERCIAL APPLICATIONS

In addition to space exploration, the RL algorithms can yield improved routing and sensor-allocation strategies for single or multiple coordinated unmanned vehicles.  In the commercial sector, approaches that seek to maximize information gathered with minimal cost can be applied to challenging tasks that range from data mining to autonomous Internet agents.  Additionally, when coupled with high-fidelity simulations, the proposed RL-based agents can be used to help designers achieve complex system objectives that cannot be achieved readily with traditional optimization methods.

NAME AND ADDRESS OF PRINCIPAL INVESTIGATOR

David G. Ward
Barron Associates, Inc.
1160 Pepsi Place, Suite 300
Charlottesville, VA 22901

NAME AND ADDRESS OF OFFEROR

Barron Associates, Inc.
1160 Pepsi Place, Suite 300
Charlottesville, VA 22901-0807