The WPdx Data Standard was created in 2015 by an expert working group and defines a set of basic parameters that should be collected when gathering water point data. The Standard was designed to accept data in a variety of formats so that a global dataset of water point records can be compiled. One challenge in compiling data from many sources is transforming the different datasets to use a uniform set of terms and categories, so that the final dataset is consistent and analysis-ready. Because the Standard allows open-text responses, substantial data cleaning is needed to produce a consistently formatted dataset. WPdx automates this cleaning process using a combination of pre-defined categories, natural language processing (NLP), and detailed reviews, but recognizes the potential for error and misinterpretation.

The purpose of a data standard is to ensure not only that the right parameters are collected, but also that data is collected in a way that is consistent and comparable with data from other organizations and collection efforts. WPdx created a new Monitoring, Evaluation, Adapting, and Learning (MEAL) Guide as an annex to the WPdx User Guide to reflect recent updates to the WPdx Data Standard, which both defines the standard parameters and provides a suite of recommended responses. The WPdx platform will continue to clean and categorize data as needed, but recommends that organizations treat this document and its associated parameters and responses as the minimum requirement during data collection efforts.

A simple example that demonstrates the need for this process is the set of entries provided under the #water_tech parameter, which describes the system used to transport water from the source to the point of collection. A common entry for this parameter is a hand pump, and one common manufacturer of hand pumps is Afridev. Depending on the organization collecting the data, datasets uploaded to WPdx might describe an Afridev hand pump as "Afridev Handpump", "Afridev hand pump", "HP – AfriDev", "Afri Dev pump", "Afri Ev", etc.

To make this information analysis-ready, the terms above must be translated into a consistent format. The table below provides a sample of common entries received and how they appear in the WPdx dataset.

Table 1. Examples of how #water_tech entries are transformed to #water_tech_clean

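| #water_tech (common entries received for Afridev Hand Pump) | #water_tech_clean (amended entry on WPdx) |
| --- | --- |
| Afridev | Hand pump – Afridev |
| AfriDev | Hand pump – Afridev |
| Afridev Handpump | Hand pump – Afridev |
| AfriDev Handpump | Hand pump – Afridev |
| Hand Pump Afridev | Hand pump – Afridev |
| Aferdive pump | Hand pump – Afridev |
| Afridev, Hand pump | Hand pump – Afridev |
| Pump AFRIDEV | Hand pump – Afridev |
| Hand pump Afridev | Hand pump – Afridev |
| Handpump-Afridev | Hand pump – Afridev |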
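
The mapping from raw #water_tech entries to a single #water_tech_clean value can be illustrated with a short Python sketch. This is not the WPdx cleaning pipeline, which combines pre-defined categories, NLP, and detailed reviews. It is a minimal illustration that assumes a hypothetical lookup table (CLEAN_LABELS), an assumed list of generic words (GENERIC_WORDS), and an arbitrary similarity threshold, and it uses simple fuzzy string matching from the Python standard library.

```python
import re
from difflib import SequenceMatcher

# Hypothetical lookup from a normalized brand key to the clean label.
# Only the Afridev label comes from Table 1; a real list would be longer.
CLEAN_LABELS = {"afridev": "Hand pump – Afridev"}

# Assumed list of generic words that do not identify a manufacturer.
GENERIC_WORDS = {"hand", "pump", "handpump", "hp"}


def normalize(raw: str) -> str:
    """Lowercase, keep ASCII letter runs, drop generic words, join the rest."""
    tokens = re.findall(r"[a-z]+", raw.lower())
    return "".join(t for t in tokens if t not in GENERIC_WORDS)


def clean_water_tech(raw: str, threshold: float = 0.7):
    """Return the best-matching clean label, or None if nothing is close enough."""
    key = normalize(raw)
    best_label, best_score = None, 0.0
    for brand, label in CLEAN_LABELS.items():
        # ratio() is a similarity score between 0.0 and 1.0.
        score = SequenceMatcher(None, key, brand).ratio()
        if score > best_score:
            best_label, best_score = label, score
    return best_label if best_score >= threshold else None


for entry in ["Afridev Handpump", "HP – AfriDev", "Pump AFRIDEV", "Afri Ev"]:
    print(f"{entry!r} -> {clean_water_tech(entry)!r}")
```

Entries that differ only in case, spacing, punctuation, or word order reduce to the same key and match exactly. Heavily misspelled entries may fall below the threshold and come back as None, which is where a manual review step would be needed.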

Undetected errors and discrepancies introduced during the data cleaning process may affect the results of predictive analyses such as the WPdx Decision Support Tools. If there is a collective interest in having data compiled for analysis, it is imperative to adopt standard parameters and responses across organizations. The WPdx MEAL Guide is a tool for partner organizations to reference either while designing data collection surveys or after data collection but before upload to the WPdx platform, so that those most familiar with the data can ensure it is interpreted accurately during the data cleaning process.

When developing a survey, please review the WPdx User Guide and the MEAL Guide annex for the recommended standardized responses to both required and optional parameters in the WPdx Data Standard.

Questions and Feedback

Please reach out to info@waterpointdata.org for more information.

Interested in sharing data with WPdx? Please see here for more details.