Dissecting the Learning Curve of Taxi Drivers


Many real-world human behaviors can be modeled and characterized as sequential decision-making processes, such as taxi drivers' choices of working regions and times. Each driver possesses unique preferences over these sequential choices and, over time, improves their working efficiency. Understanding the dynamics of such preferences can help accelerate the learning process of taxi drivers. Prior work on taxi operation management mostly focuses on finding optimal driving strategies or routes, and lacks in-depth analysis of what drivers learn during this process and how it affects their performance. In this work, we make the first attempt to inversely learn taxi drivers' preferences from data and to characterize the dynamics of these preferences over time. We extract two types of features, i.e., profile features and habit features, to model the decision space of drivers. Then, through inverse reinforcement learning, we learn the drivers' preferences with respect to these features. The results illustrate that self-improving drivers tend to keep adjusting their preferences for habit features to increase their earning efficiency while keeping their preferences for profile features invariant. In contrast, experienced drivers have stable preferences over time.

Stage 1: Data Preprocessing

Stage 2: Inverse Preference Learning

Stage 3: Preference Dynamic Analysis
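
To make the inverse preference learning stage more concrete, the following is a minimal sketch of how preference weights over decision features might be recovered from observed choices. It is not the method used in this work: it assumes a simplified single-step setting with a linear reward and a softmax (maximum-entropy style) choice model, fitted by matching the feature averages of the observed choices. The function names, the two illustrative features, and all numeric values are hypothetical.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over a 1-D array."""
    shifted = logits - logits.max()
    exp = np.exp(shifted)
    return exp / exp.sum()


def learn_preferences(features, choice_counts, lr=0.5, iters=2000):
    """Estimate linear preference weights from observed choices.

    Assumption (not from the paper): the driver picks action a with
    probability proportional to exp(theta . features[a]).

    features:      (n_actions, n_features) array, one row per candidate action
    choice_counts: (n_actions,) array, how often each action was observed
    """
    theta = np.zeros(features.shape[1])
    observed_probs = choice_counts / choice_counts.sum()
    # Average feature vector of the observed (expert) choices.
    observed_feat = observed_probs @ features
    for _ in range(iters):
        policy = softmax(features @ theta)
        # Average feature vector under the current model.
        model_feat = policy @ features
        # Log-likelihood gradient: observed minus model feature averages.
        theta += lr * (observed_feat - model_feat)
    return theta


# Hypothetical example: four candidate actions described by two binary
# features (say, one profile feature and one habit feature).
features = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]])
true_theta = np.array([1.0, -0.5])
choice_counts = 1000.0 * softmax(features @ true_theta)  # synthetic data
estimated_theta = learn_preferences(features, choice_counts)
```

Under these assumptions, fitting such weights separately for successive time windows and comparing them would give a simple view of how a driver's preferences change over time, which is the kind of question the preference dynamic analysis stage addresses.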