Data and Code Release

This release contains the datasets and code for the preference recovery and analysis introduced in the SDM19 paper "Dissecting the Learning Curve of Taxi Drivers: A Data-Driven Approach". Two datasets are included: the MDP (spatial-temporal region based) trajectory data and the feature data, along with the code for inverse preference learning and preference dynamics analysis. If you reuse the resources released on this website, please cite our publication.



Description of Datasets

Two datasets are released here: 1) MDP trajectory data, and 2) feature data.

MDP trajectory data

The MDP trajectory data contains the trajectories of 200 self-improving drivers and 200 stabilized drivers in July and December. The data is stored in two Python pickle files:
trajectories_drivers_07.pkl,
trajectories_drivers_12.pkl.

Take "trajectories_drivers_07.pkl" as an example. Loading the pickle file in Python yields a list, called trajs_si here. The length of the list is the number of drivers, i.e., 200. Each element of the list is itself a list, called trajs here, which holds the trajectories of one driver in that month. Each trajectory in trajs is a list, traj, of steps, and the i-th step is represented by:

[grid_index_x_i, grid_index_y_i, time_i],

where grid_index_x_i is the horizontal (west-east) index of the grid at step i, grid_index_y_i is the vertical (south-north) index of the grid at step i, and time_i is the index of the time slot of step i.
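
The nested layout described above can be traversed as in the following minimal sketch. Since the released file is not bundled with this snippet, it first writes a tiny synthetic pickle with the same driver / trajectory / step nesting. The helper names load_trajectories and count_steps, the file name trajectories_sample.pkl, and all sample values are hypothetical and not part of the release.

```python
import os
import pickle
import tempfile


def load_trajectories(path):
    """Load a pickle laid out as: drivers -> trajectories -> steps."""
    with open(path, "rb") as f:
        return pickle.load(f)


def count_steps(trajs_si):
    """Return (number of trajectories, total number of steps) per driver."""
    summary = []
    for trajs in trajs_si:  # one entry per driver
        n_steps = sum(len(traj) for traj in trajs)
        summary.append((len(trajs), n_steps))
    return summary


# Synthetic stand-in for trajectories_drivers_07.pkl (2 drivers, made-up values).
sample = [
    [[[3, 5, 10], [3, 6, 11]], [[7, 2, 40]]],  # driver 0: two trajectories
    [[[1, 1, 0], [2, 1, 1], [2, 2, 2]]],       # driver 1: one trajectory
]
path = os.path.join(tempfile.mkdtemp(), "trajectories_sample.pkl")
with open(path, "wb") as f:
    pickle.dump(sample, f)

trajs_si = load_trajectories(path)
summary = count_steps(trajs_si)

# Each step unpacks into the three indices described above.
grid_index_x, grid_index_y, time_slot = trajs_si[0][0][0]
```

To try this on the released data, point load_trajectories at trajectories_drivers_07.pkl or trajectories_drivers_12.pkl instead of the synthetic file.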

Feature data

The feature data contains the profile features for each driver and the habit features for all drivers. Each feature corresponds to a grid and a time slot. In the IRL program, the features are attached to each trajectory according to the steps in that trajectory.

The profile features are stored in a Python pickle file, profile_info.pkl. Loading it in Python yields a dictionary, called features_drivers here, whose keys are the driver IDs (to protect the privacy of the drivers, we rename the drivers as "si_0", "si_1", etc.).
The habit features, stored in habit_features.pkl, are a dictionary, called features here. The keys of features are the grid and time slot indices, and each value is a list of feature values in the following order:

[num_pickups, mean_trip_dist, mean_trip_time, traffic_condition, dist_to_train, dist_to_airport].
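
As a rough illustration of how habit features could be attached to the steps of a trajectory, the sketch below looks up each step in a features dictionary and sums the vectors per trajectory. It assumes that each key is a tuple (grid_index_x, grid_index_y, time); the exact key layout in habit_features.pkl is not specified here and should be checked against the file. The helper names encode_trajectory and sum_features and all sample values are hypothetical. Summing is one common way to form trajectory-level feature counts in IRL, and it may differ from what the released notebook does.

```python
FEATURE_NAMES = [
    "num_pickups", "mean_trip_dist", "mean_trip_time",
    "traffic_condition", "dist_to_train", "dist_to_airport",
]


def encode_trajectory(traj, features):
    """Return one feature vector per step, skipping steps without an entry."""
    encoded = []
    for grid_index_x, grid_index_y, time_slot in traj:
        key = (grid_index_x, grid_index_y, time_slot)  # assumed key layout
        if key in features:
            encoded.append(features[key])
    return encoded


def sum_features(encoded):
    """Sum the feature vectors element-wise over a trajectory."""
    return [sum(column) for column in zip(*encoded)]


# Made-up habit features for two (grid, time slot) cells.
features = {
    (3, 5, 10): [12, 2.5, 9.0, 0.7, 1.2, 15.0],
    (3, 6, 11): [8, 3.0, 11.0, 0.5, 1.0, 14.5],
}
# The last step has no feature entry and is skipped.
traj = [[3, 5, 10], [3, 6, 11], [9, 9, 99]]

encoded = encode_trajectory(traj, features)
totals = dict(zip(FEATURE_NAMES, sum_features(encoded)))
```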



Description of Code

In this release, we publish two programs for the paper: 1) inverse preference learning, and 2) preference dynamics analysis. The code is written in Python and uses the following libraries: scipy, sklearn, numpy, pandas, and itertools.

Code for inverse preference learning

The program is named "inverse_preference_learning.ipynb". It takes the trajectories of a group of drivers in a month, together with the features, as input, and outputs the preference of each driver in that month.

Code for preference dynamics analysis

After the preference of each driver in each group has been learned for each month, the preference dynamics analysis program, named "preference_dynamics_analysis.ipynb", takes the preferences in two months as input and outputs the t-value for each dimension of the preference.
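
To make the idea of a per-dimension t-value concrete, here is a minimal sketch of a paired t-statistic computed for each preference dimension. It assumes that the preferences of a month are stored as a list of per-driver weight vectors and that both months list the same drivers in the same order. The notebook may use a different test or data layout, and the function name paired_t_values and the sample numbers are hypothetical.

```python
import math
import statistics


def paired_t_values(prefs_month_a, prefs_month_b):
    """Paired t-statistic of (month_b - month_a) for each preference dimension."""
    n_dims = len(prefs_month_a[0])
    t_values = []
    for d in range(n_dims):
        # Per-driver change in dimension d between the two months.
        diffs = [b[d] - a[d] for a, b in zip(prefs_month_a, prefs_month_b)]
        mean_diff = statistics.mean(diffs)
        sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
        if sd_diff == 0:
            # The statistic is undefined when all differences are identical.
            t_values.append(float("nan"))
            continue
        t_values.append(mean_diff / (sd_diff / math.sqrt(len(diffs))))
    return t_values


# Made-up preferences: 3 drivers, 2 preference dimensions.
prefs_july = [[1.0, 2.0], [2.0, 2.0], [3.0, 5.0]]
prefs_december = [[2.0, 2.0], [4.0, 3.0], [6.0, 4.0]]

t_values = paired_t_values(prefs_july, prefs_december)
```

With the data converted to arrays, scipy.stats.ttest_rel(prefs_december, prefs_july, axis=0) should give the same statistic together with p-values.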