Complex ecological behaviors are often driven by an internal model, which integrates sensory information over time and facilitates long-term planning. Inferring this internal model is crucial for interpreting the neural activity of agents and is beneficial for imitation learning. We introduce methods to infer an agent's internal model and dynamic beliefs for dynamic foraging and game-like navigation tasks. We model agents as rational according to their (possibly defective) understanding of the task and of the relevant causal variables that cannot be fully observed. Using a novel gradient-based constrained EM algorithm, we show that it is possible to invert a Partially Observable Markov Decision Process (POMDP) from behavior with unknown transition dynamics, partially unknown observation functions, and parametrically unknown rewards. We allow that the agent may hold wrong assumptions about the task, and our method learns these assumptions from the agent's actions. We validate our method on simulated agents performing suboptimally on a foraging task and successfully recover the agents' actual models. We show how to extend this approach to a larger range of ecological tasks. The result is a method for eliciting trajectories of latent belief states from behavior, which can serve as a powerful tool for interpreting neural activity.