Methodology:
This tool uses available WPDx attributes, such as #water_tech, #water_source, #pay, and others, as training data for a machine learning classification model. The target variable is #status_id. The models are tuned to optimize precision (the percent of water points identified as high risk that are actually broken) and recall (the percent of all broken water points that are identified as high risk). Predictions are based on the age of each water point, calculated from #install_year and the current year. A priority (high/medium/low) is assigned to each water point based on the relative number of water points within 1 kilometer and the population within 1 kilometer.
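The age calculation and the two tuning metrics described above can be illustrated with a minimal sketch. This is not the tool's actual code: the function names water_point_age and precision_recall are hypothetical, and clamping negative ages to zero and returning None for a missing #install_year are assumptions made only for this example.

```python
from datetime import date


def water_point_age(install_year, current_year=None):
    """Return the age of a water point in years, or None if install_year is missing.

    Assumption: an install year in the future is clamped to an age of 0.
    """
    if install_year is None:
        return None
    if current_year is None:
        current_year = date.today().year
    return max(current_year - install_year, 0)


def precision_recall(actual_broken, flagged_high_risk):
    """Compute precision and recall from two equal-length lists of booleans.

    precision: share of points flagged high risk that are actually broken
    recall:    share of all broken points that were flagged high risk
    """
    true_positives = sum(
        1 for a, f in zip(actual_broken, flagged_high_risk) if a and f
    )
    flagged = sum(1 for f in flagged_high_risk if f)
    broken = sum(1 for a in actual_broken if a)
    precision = true_positives / flagged if flagged else 0.0
    recall = true_positives / broken if broken else 0.0
    return precision, recall


# Five illustrative water points (made-up values, not WPDx data).
actual = [True, True, False, False, True]
flagged = [True, False, False, False, True]
precision, recall = precision_recall(actual, flagged)
```

In this made-up example both flagged points are broken, so precision is 1.0, while only two of the three broken points were flagged, so recall is about 0.67. Tuning a model typically involves trading one of these against the other.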
Limitations:
Like all predictions, these are based on probabilities and may not reflect the actual status of a water point at a given point in time.