Integrating Governance Factors into WPdx

Governance is recognized as a key aspect of sustainable rural water services. The USAID Governance Research on Rural Water Systems (GROWS) activity was designed to identify and disseminate innovative governance and private sector-derived models and tools to improve water services and help accelerate the elimination of extreme poverty in sub-Saharan Africa. GROWS was targeted to USAID Democracy, Human Rights and Governance (DRG) Officers at Missions across Africa to support increased cross-sectoral programming, leading to enhanced delivery of rural water services. The final product of GROWS is a comprehensive toolkit which includes key research findings and program design support for USAID.

A key finding from GROWS was that the effective sharing of data at the community scale had the potential to improve the governance of local drinking water systems through greater transparency, greater accountability, and increased trust between water users and providers. An extension of this finding is that sharing data from individual community water systems can strengthen district and/or regional governance by providing a comprehensive understanding of what is and is not working at different geographic scales.

The Water Point Data Exchange (WPdx) provides a free and open user-friendly platform for data sharing, access, and use, with the ultimate goal of supporting evidence-based decision-making. Through GROWS, WPdx added specific governance-oriented parameters and features to the WPdx+ dataset and decision-support tools app. This includes the integration of new governance-oriented datasets into our status prediction models. These datasets include:

WPdx also conducted further data cleaning on the #management parameter to create standardized categories for management, including: community based management, direct government operations, private/delegated management, health care facility management, school management, religious institution management, other institutional management, no management and unknown. Users can filter to view the results from the WPdx decision support tools by management type by selecting “Filter By Attribute” from the menu, as shown below:

The comprehensive WPdx user guide, which provides additional details about the platform, is available here.
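As a rough illustration of how free-text #management entries might be mapped onto the standardized categories listed above, the following is a minimal Python sketch. The keyword table and the function name standardize_management are hypothetical and are not taken from the WPdx cleaning pipeline; only the category names come from the text above.

```python
# Hypothetical keyword table: only the category names come from the text above.
# The keywords themselves are illustrative assumptions.
KEYWORDS = [
    ("health care facility management", ("health", "clinic", "hospital")),
    ("school management", ("school",)),
    ("religious institution management", ("church", "mosque")),
    ("community based management", ("community", "committee")),
    ("direct government operations", ("government", "district", "ministry")),
    ("private/delegated management", ("private", "delegated", "operator")),
    ("other institutional management", ("institution",)),
    ("no management", ("none", "no management")),
]


def standardize_management(raw):
    """Map a raw #management entry to a standardized category (sketch only)."""
    text = (raw or "").strip().lower()
    if not text:
        return "unknown"
    # The first category with a matching keyword wins, so order matters.
    for category, words in KEYWORDS:
        if any(word in text for word in words):
            return category
    return "unknown"


for entry in ["Water User Committee", "District Water Office", "Primary School", ""]:
    print(repr(entry), "->", standardize_management(entry))
```

A simple keyword lookup like this is easy to audit, but it will miss misspellings; a production pipeline would likely need something more tolerant.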

Comparing water point based and household survey based access estimates

Figure shows how DHS regions in Uganda (labeled) compared to WPdx regions (sub-counties from GADM) in red

What data should I use? Is the data valid? These are some of the driving questions facing decision makers around the world. Multiple sources of data are available to decision makers on the state of water access and services. There is relatively strong agreement that reliable data for decision making is needed. At the same time, it is not always clear which data sources are both available and appropriate to answer the questions about where and how to invest resources in water services and how to appropriately target the poorest.

With funding from the United States Agency for International Development (USAID) Governance Research on Rural Water Systems (GROWS) activity, a study was commissioned to determine how water point coverage estimates based on publicly available data from the Water Point Data Exchange (WPdx) compare and contrast with the official Joint Monitoring Programme of WHO/UNICEF (JMP) figures. The goal is to provide recommendations about how these different estimates could be used in tandem and to identify their respective strengths and limitations. The study was carried out by Nick Dickinson of WASHNote.

Comparing metrics and triangulating different measured results can be useful to validate conclusions and inform decision-making. This study finds a relatively strong correlation and linear trend between the two estimates in four countries. This suggests that using household surveys and water point inventories together can be useful to decision makers who have only one of the two data sources or who want to validate the conclusions from one against the other.
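To make this kind of comparison concrete, the sketch below computes a Pearson correlation and a least-squares trend line between two sets of regional coverage estimates. The numbers are invented placeholders, not results from the study, and the function names are illustrative.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def linear_fit(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var_x
    return slope, mean_y - slope * mean_x


# Placeholder values only: percent of population with access, one value per region.
survey_estimates = [40.0, 55.0, 62.0, 71.0, 80.0]
water_point_estimates = [35.0, 50.0, 60.0, 66.0, 78.0]

r = pearson_r(survey_estimates, water_point_estimates)
slope, intercept = linear_fit(survey_estimates, water_point_estimates)
print(f"r = {r:.2f}, trend: y = {slope:.2f}x + {intercept:.2f}")
```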

The full paper has been submitted for peer review. A link to the paper will be added here once the paper has been accepted.

 

Acknowledgements from the Author, Nick Dickinson of WASHNote

This study would not have been possible without the contribution of open data on water points by data providers to WPdx. Members of the Water Point Data Exchange (WPdx) working group reviewed both the proposal and findings of this work. Katy Sill of WPdx first recognized the potential of the work, provided invaluable feedback, and responded quickly with explanations about how the WPdx algorithms work while investigating and delivering improvements to the tools when required to make this comparison possible.

Similarly, the National Statistics Offices (NSOs) and the Demographic and Health Surveys (DHS) Program of the United States Agency for International Development (USAID) made it possible to use household survey data from different countries. I would like to thank the Joint Monitoring Programme of WHO/UNICEF (JMP) team for sharing country, regional and global estimates of progress on drinking water, sanitation and hygiene (WASH) in households as well as the estimates for the sub-indicators required to generate those estimates, for providing clarifications about the JMP methodology, and for taking time to reflect on study findings.

This material is based upon work supported by USAID under award number 7200AA18CA00033.

Sierra Leone Data Use Impact Desktop Study

Map image from Tonkolili showing investments made and WPdx recommendations.

A desktop study on the potential impact of data use in Sierra Leone was completed by Global Water Challenge for Akvo in support of the Data to Decisions program. A brief summary of the study is shared below, and the full report can be viewed here.

Objective
A common hypothesis is that using evidence to inform decisions regarding placement and repair of water points will lead to more impactful investments compared with traditional methods which rely heavily on political pressures and assumptions. The objective of this desktop study is to determine how many additional people could have theoretically received water services (defined as access to a functional water point within a 1km radius) if decisions about water point investments used evidence-based decision-support tools rather than traditional approaches in Sierra Leone.

Approach
This study compared the number of people reached by water point investments made during 2012 in 12 districts in Sierra Leone with the number who might have been reached if the investment decisions had been driven by evidence. The districts included in the analysis were: Bombali, Bo, Bonthe, Kailahun, Kambia, Kenema, Koinadugu, Kono, Moyamba, Port Loko, Pujehun, and Tonkolili.

Data from water point investments made prior to 2012 were downloaded from the Water Point Data Exchange (WPdx) database to provide a baseline of the data which could have been used to inform decisions made in 2012. Data from investments made in 2012 were also downloaded for the 12 districts from the WPdx database. The majority of data was provided by the Ministry of Water Resources, with additional contributions from non-governmental organizations (NGOs) working in Sierra Leone.

Using the evaluation and repair priority methods, the number of people reached by water point installations and rehabilitations in the twelve districts was analyzed and compared with the number of people who could have been reached based on recommendations from the (first generation) WPdx decision-support tools. The 2012 Sierra Leone dataset did not clearly differentiate between rehabilitations and new constructions.

 Key Findings

From the available data, there were 1,561 water investments made in 2012, which reached a total of 28,556 people. Had WPdx data been available and used for making decisions on water investments in 2012, it would have been possible to reach nearly four times as many people at only about a third of the cost in water point investments. WPdx recommendations included 430 water point rehabilitations reaching 109,043 people. This is equivalent to a reduction in cost per person reached from $54.66 to $3.94.
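The per-person figures above can be reproduced with simple arithmetic. The sketch below assumes a flat cost of $1,000 per water point investment. That unit cost is not stated in this summary; it is inferred here only because it is consistent with the reported $54.66 and $3.94 figures, and the full report should be consulted for the actual costing method.

```python
# Assumption: flat unit cost in USD per investment (not stated in the summary).
ASSUMED_COST_PER_INVESTMENT = 1_000


def cost_per_person(investments, people_reached, unit_cost=ASSUMED_COST_PER_INVESTMENT):
    """Total assumed spend divided by the number of people reached."""
    return investments * unit_cost / people_reached


actual = cost_per_person(1_561, 28_556)        # investments made in 2012
recommended = cost_per_person(430, 109_043)    # WPdx-recommended rehabilitations

print(f"Actual 2012 investments: ${actual:.2f} per person")
print(f"WPdx recommendations:    ${recommended:.2f} per person")
print(f"People reached ratio:    {109_043 / 28_556:.1f}x")
```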

 The full study is available here.

Acknowledgements

With appreciation to Angela Cotugno for her assistance with the GIS analysis and Daniel Siegel for his contributions to the development of the first generation of WPdx geospatial decision support tools.

*Please note that the study used the first generation of WPdx decision tools. These have since been updated, though the current tools use similar algorithms.

Building Capacity and Improving Decisions with YouthMappers

Photo courtesy of Project team/Gulu YouthMappers

With contributions from Stella Nacakwa, M.S. Candidate, West Virginia University & Courtney Clark, YouthMappers.

The YouthMappers Chapter at Gulu University, in partnership with YouthMappers, West Virginia University, the Water Point Data Exchange (WPdx) and the U.S. Agency for International Development (USAID), recently launched the Uganda Water Infrastructure Mapping Project (U-WIMP). The U-WIMP project has two complementary goals:

  • building the technical capacity of Ugandan YouthMappers in both mapping and spatial decision-support analysis, and
  • supporting the Gulu District Water office in making evidence-based decisions through the collection of digitized water data and the application of cutting-edge data analytics.

Water point data collection in Gulu has historically been paper-based, making it challenging to compile, analyze, share, and use the data to inform decisions. The YouthMappers U-WIMP project will provide much-needed data updates as well as pilot an approach to digitize drinking water resource monitoring. Dr. Denis Nono, a Gulu University lecturer specializing in water resources, and advisor to the project, applauded the engagement of YouthMappers students as this will provide a significant resource to the Gulu Water District Office as well as a hands-on learning opportunity.

Data collected through the project will be uploaded and harmonized with existing records on the Water Point Data Exchange (WPdx). WPdx is an online platform for sharing, accessing, and using water point data that currently hosts over 600,000 records from over 50 countries. The WPdx database includes 557 water point records in Gulu District, reported between 2010 and 2022. WPdx hosts a suite of decision-support tools designed to help decision-makers optimize limited resources by prioritizing locations for investments, including preventative maintenance, rehabilitation, and new construction, and by providing updated estimates of drinking water service coverage in rural areas for local communities, host country governments, and watershed authorities. Results from the decision-support tools are visualized on interactive web-maps and summary graphs which can be used by a range of audiences. Additionally, the collected data will be used in combination with high-resolution satellite imagery to build models which can automatically detect water points.

Following data collection and analysis, the team will host a series of workshops with government, NGO and academic stakeholders to share the findings and explore how the information can be used to inform decision-making processes and work planning in the district.

The project activities will build the chapter student members’ technical knowledge of the water sector and serve to connect these youth with their communities. The project also emphasizes gender considerations through the YouthMappers Everywhere She Maps campaign which seeks to improve the availability of geographic datasets for women’s economic empowerment. 

This project will also raise global awareness among YouthMappers Chapters of the importance of evidence-based decision making and effectively managing water points and water resource allocation, and it will build on WPdx’s current work to improve decisions which ultimately lead to an increase in sustainable access to water services.  

Project Objectives

  1. Build capacity of the Gulu University YouthMappers chapter through introduction to rural water challenges and practical experience in collecting field data.  
  2. Pilot deployment of a team of YouthMappers students utilizing open-source software to collect up-to-date digitized water point data to provide a better understanding of current water access and create a more updated dataset for analysis through the WPdx platform.
  3. Partner with district, regional, and national government authorities to integrate findings from decision-support tools into budgeting and planning decisions. 
  4. Explore the feasibility of using satellite imagery to identify existing water points, both manually and through AI/ML predictive models to help build a more comprehensive inventory which can then be used to help prioritize investments in water point monitoring, preventative maintenance, and rehabilitation to support sustainable service delivery.

Photo courtesy of Project team/Gulu YouthMappers

Project Timeline

An initial set of in-person trainings for the Gulu University YouthMappers student chapter was held in December 2021 and January 2022, including introductions to rural water, OpenStreetMap, and WPdx, and a practical session on water point data collection. Virtual trainings have continued throughout early 2022, focused on the application of OpenStreetMap to map and visualize features. Additional details and updates on the project's progress can be found on the OpenStreetMap wiki platform.

Next Steps

A comprehensive data collection effort for two sub-counties in Gulu district is scheduled for early summer 2022. Collected data will be cleaned and uploaded to both OSM and WPdx. Collected data will also be utilized to build a training data set using high-resolution satellite imagery for the development of models which can automatically detect water point locations. Decision-support analyses to estimate current coverage levels, and priority areas for rehabilitation, new construction and preventative maintenance will be conducted through the WPdx platform. The results of the data collection and combined analyses will be shared with government and NGO stakeholders during a collaborative workshop.

Celebrating Open Data Day 2022: The Power of Rural Water Point Data for Evidence-Based Decisions

WPdx is excited to continue to promote transparent data sharing and use of open data in the rural water sector through our second annual Open Data Day celebration!

Launch of Decision Support Tools Web App

To celebrate Open Data Day 2022, we are pleased to announce the launch of the new beta WPdx Decision Support Tools web app.

The WPdx Decision Support Tools interactive web app allows users to view and explore available water point data and analytical results from the WPdx+ dataset and suite of decision-support tools.

For more information on the launch of the tools, please visit our related blog post and/or see the links below which provide detailed information on each of the available tools. 

  1. Administrative Region Analysis
  2. Rehabilitation Priority Analysis
  3. New Construction Priority Analysis
  4. Data Staleness Analysis
  5. Functional Status Prediction Analysis (coming soon)

Recognition of Leaders in Data Sharing

Over the past year, over 50,000 new water point records have been uploaded to the WPdx platform from 14 different organizations. We want to take this opportunity to recognize and celebrate the following entities that have demonstrated their commitment to transparency and accountability by sharing data with the WPdx platform.

Interested in sharing data with WPdx? Please see here for more details or contact info@waterpointdata.org with questions.

Direct Contributors to WPdx

  • Inter Aide
  • IRC WASH
  • Uganda Water Project
  • USAID Lowland WASH Activity, implemented by DT Global
  • Village Water
  • Water4
  • Water & Sanitation for the Urban Poor (WSUP)
  • Water For People
  • World Serve International
  • YouthMappers

Contributors via Open Data Portals 

Africa GeoPortal

  • Grid3

Humanitarian Data Exchange (HDX)

  • iMMAP
  • United Nations Office for the Coordination of Humanitarian Affairs
  • REACH Initiative

Transforming Data to Action

Over the past year, WPdx has continued to work with government and NGO partners in Ethiopia, Ghana, Uganda and Sierra Leone.

During 2022 we continue to work to integrate the results from the WPdx decision support tools to strengthen existing decision-making processes.

Thank you to our generous funders and key partners:

 

 

Thank you to the entities which have shared data with WPdx in the past year:

 

Launch of WPdx Decision Support Tools

New WPdx Decision Support Tools

We are excited to release the new suite of WPdx Decision Support Tools (v.1.0 beta).

The WPdx Decision Support Tools interactive web app allows users to view and explore available water point data and results from the WPdx+ dataset and suite of decision-support tools.

The decision-support tools provide insights on rural basic water services for each available administrative division, recommendations for where water points should be rehabilitated or constructed, an overview of the average age of available data, and predictions of likely water point status (coming soon). The results from these analyses can provide decision-makers with tangible evidence for allocating resources and developing work plans to improve rural water services. 

For more information on each decision-support tool, please visit the following links or open the “Information” pop-up on the web app.

  1. Administrative Region Analysis
  2. Rehabilitation Priority Analysis
  3. New Construction Priority Analysis
  4. Data Staleness Analysis
  5. Functional Status Prediction Analysis (coming soon)

Please check out our detailed WPdx User Guide for more information about how to use the entire WPdx platform.

WPdx Decision Support Tools Quick Guide

1. Focus. The landing page of the app showcases the entire WPdx+ dataset. From here, users can scroll to view data and results from the global dataset or zoom into geographies of interest by selecting ‘Filter by Region’ from the header bar. Depending on the country, users can filter all the way down to the Admin 4 level.

2. Filter. Users can also filter available data by water point source, water point technology or management type if they are seeking information on specific types of water points. Please note that the analyses are conducted on the comprehensive dataset, not on the filtered view.

3. Explore. Users can select the desired decision-support tool from the drop-down menu and view the results from each tool for their geography of interest.

4. Download. Results are available for download in CSV format. Select Download data from the menu in the upper left corner and choose the results file.

Example Use Cases

View Water Points

The View Water Points tool allows users to explore all functional and non-functional water points available in the WPdx+ dataset. Users can choose to filter by region to a specific country, district, or sub-district of interest, and click on individual water points to learn more about that point. Additional filtering options allow users to view water points based on source, technology, and management.

Administrative Region Analysis

The Administrative Region Analysis Tool provides an overview of the rural population with access to basic services, without access to basic services, and uncharted (i.e., data is not available in WPdx to determine access for these populations) for each available administrative level.

  • Rural Population with Basic Access: Population within 1km of a functional water point
  • Rural Population without Basic Access: Population within 1km of a non-functional water point (but not within 1km of a functional water point)
  • Uncharted Rural Population: Population for which no data on water services is available in WPdx. These populations may be without basic access or basic services may exist, but data has not been shared with WPdx.

Users can view choropleth maps, which show, for each administrative region, the percentage of the rural population With Basic Access, Without Basic Access, and Uncharted.
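The three population categories defined above can be sketched in Python as follows. This is a simplified illustration and not the WPdx implementation: the haversine_km helper, the tuple-based inputs, and the treatment of each population cell as a single point are all assumptions.

```python
from math import asin, cos, radians, sin, sqrt


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def classify_cell(cell, water_points, radius_km=1.0):
    """cell: (lat, lon); water_points: list of (lat, lon, is_functional)."""
    nearby = [functional for lat, lon, functional in water_points
              if haversine_km(cell[0], cell[1], lat, lon) <= radius_km]
    if any(nearby):
        return "with basic access"       # a functional point within 1km
    if nearby:
        return "without basic access"    # only non-functional points within 1km
    return "uncharted"                   # no water point data within 1km


def summarize(cells, water_points):
    """cells: list of (lat, lon, population). Returns population per category."""
    totals = {"with basic access": 0, "without basic access": 0, "uncharted": 0}
    for lat, lon, population in cells:
        totals[classify_cell((lat, lon), water_points)] += population
    return totals
```

Dividing each total by the region's rural population would give the percentages shown on the maps.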

Illustrative Uses

  • Prioritizing administrative divisions for budget and resource allocations
  • Identifying target administrative divisions for interventions
  • Evaluating equity

Rehabilitation Priority Analysis

The Rehabilitation Priority Tool provides recommendations for which non-functional water points should be considered for rehabilitation and repair. The tool also provides insights on which water points are critical in that there are limited nearby alternatives and which water points are being over-utilized. Results can be viewed and filtered based on:

  • Potential population that would regain access if point was repaired (default)
  • Total population within 1km of water point
  • Crucialness of water point (i.e., are there alternative water points nearby)
  • Pressure on the water point (i.e., is the water point over or under-utilized)

Illustrative Uses

  • Prioritizing which water points to rehabilitate
  • Highlighting areas where there are limited alternative water points available
  • Understanding which water points are over- or under-utilized
  • Benchmarking rehabilitation needs to inform district budgets and workplans  

New Construction Priority Analysis

The New Construction Priority Analysis Tool evaluates all possible locations where a water point could be constructed in a given administrative area and estimates how many people who are not near an existing water point (regardless of functionality) could gain access if a water point were constructed in that location.
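One way to picture this analysis is as a search over candidate sites, scoring each by the population that is not already near any water point. The sketch below is illustrative only: it assumes coordinates already projected to kilometres, a hypothetical list of candidate sites, and simple point-based population cells. The actual WPdx algorithm is not described here.

```python
from math import dist

RADIUS_KM = 1.0


def unserved_cells(cells, water_points):
    """Cells (x_km, y_km, population) farther than 1km from every existing water point."""
    return [c for c in cells
            if all(dist(c[:2], wp) > RADIUS_KM for wp in water_points)]


def rank_candidates(candidates, cells, water_points):
    """Return (site, population gaining access) pairs, highest impact first."""
    unserved = unserved_cells(cells, water_points)
    scores = []
    for site in candidates:
        gained = sum(pop for x, y, pop in unserved
                     if dist((x, y), site) <= RADIUS_KM)
        scores.append((site, gained))
    return sorted(scores, key=lambda item: item[1], reverse=True)
```

Existing water points are counted regardless of functionality, matching the description above, so a site next to a broken water point scores low here and would instead surface in the rehabilitation analysis.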

 

Illustrative Uses

  • Identify locations to construct new water points
  • Evaluate the relative benefit of new construction compared to rehabilitating existing water points
  • Provide insights on potential data gaps which could be filled by uploading data to WPdx

Data Staleness Analysis

The Data Staleness Analysis provides a relative measure of the average age of data available from the WPdx+ dataset. 
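A minimal sketch of an average-age calculation is shown below. It assumes staleness is simply the mean age of each record's report date within a region. How WPdx actually scores staleness is not specified here, and the record layout is hypothetical.

```python
from collections import defaultdict
from datetime import date


def average_age_years(records, as_of):
    """records: iterable of (region, report_date). Returns mean record age in years per region."""
    ages = defaultdict(list)
    for region, report_date in records:
        ages[region].append((as_of - report_date).days / 365.25)
    return {region: sum(values) / len(values) for region, values in ages.items()}


# Hypothetical records for illustration only.
sample = [
    ("Region A", date(2020, 1, 1)),
    ("Region A", date(2022, 1, 1)),
    ("Region B", date(2021, 1, 1)),
]
print(average_age_years(sample, as_of=date(2022, 1, 1)))
```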

Illustrative Uses

  • Identifying areas for targeted data sharing outreach
  • Selecting areas for focused data collection
  • Ensuring a clear understanding of the age of data available for other analyses

Questions and Feedback

Please contact info@waterpointdata.org with any questions.

Interested in sharing data with WPdx? Please see here for more details.

Global Water Challenge Announces the Development of WHdx – an Open Data Exchange to Improve WASH in Healthcare Facilities

Global Water Challenge and the Water Point Data Exchange (WPdx) are excited to announce the development of a water, sanitation, and hygiene (WASH) in health care facilities data exchange platform. Please see the press release for additional details.

The new platform will be a critical resource for governments, NGOs, and companies to close the gap of 1 in 4 healthcare facilities without basic water services. The platform will be developed in partnership with the Millennium Water Alliance and with funding from the Conrad N. Hilton Foundation. The WASH Health Facility Data Exchange (WHdx) platform will support decision makers to improve health services through optimized water, sanitation and hygiene (WASH) investments.

According to WHO/UNICEF 2020, globally, 1 in 4 healthcare facilities lack basic water services, impacting more than 1.8 billion people – worsened by large gaps in sanitation, hygiene, and waste management services. As a result, healthcare providers are unable to provide quality patient healthcare and put themselves at risk of infection, a reality further intensified during the COVID-19 pandemic. Given the often-limited resources available, health and WASH leaders must prioritize which facilities receive improvements even when they lack a clear understanding of the gaps.

WHdx will harmonize healthcare facility WASH data into a singular, publicly available dataset through the establishment of a data standard, providing unique data analysis and decision-making tools for both the water and health sectors. Furthermore, WHdx will be able to provide WASH service records from individual health facilities over time and compare health facilities across geographies from village to country-levels, showing locations of greatest need, problematic issues, and recommendations for highest impact interventions.

Building on the Water Point Data Exchange (WPdx), the world’s largest rural water open data platform with 600,000 water point records from over 80 organizations across more than 50 countries, development of the WHdx platform is a collaboration between WASH and health sector experts to ensure that consistent, user-friendly data is readily available for evidence-based decisions.

The WHdx platform will be guided by a working group including Catholic Relief Services, Centers for Disease Control and Prevention (CDC), Emory University, Helvetas, the Safe Water and AIDS Project (SWAP), Millennium Water Alliance, and Global Water Challenge. The process of selecting standard parameters for the platform is currently underway.

 

 

 

WPdx Launches New Rehabilitation Priority Tool

In an ongoing effort to support improved rural water access investment decision-making, WPdx announces the launch of its updated Rehabilitation Priority Tool which enables users to immediately identify specific water points for prioritized rehabilitation or repair based on population.

The input for this updated analytical tool is the new WPdx+ dataset, a further enhanced and refined version of the original WPdx dataset (WPdx-Basic) which includes additional data cleaning and processing steps for more robust analysis.

Rehabilitation Priority Tool Overview:

  • A series of geospatial population-based analyses to prioritize water points based on potential impact.
  • Additional parameters to consider when prioritizing areas for rehabilitation, including:
    • Population within 1km – Total population within 1km of the water point
    • Users who would gain access – Estimated number of people who would gain access if a currently non-functional water point was rehabilitated. Population assigned to water point considers the existence of functional water points within a 1km radius. Populations are assigned based on relative distance between each population grid cell and the water points.
    • Likely current users – Estimated number of people who could be currently using a working water point. Population assigned to water point considers the existence of functional water points within a 1km radius. Populations are assigned based on relative distance between each population grid cell and the water points.
    • Crucialness score (0-100%) is the ratio of potential users to the total local population within a 1km radius of the water point. Crucialness provides a measure of water system redundancy. For example, if there is only 1 water point within a 1km radius, the water point crucialness score is 100%, meaning that there are no nearby alternatives. If there are two functional water points within 1km, the crucialness score for each point will be ~50% indicating there is some redundancy in the system, so if one water point is broken down, users have an alternative water point available. For non-functional water points, the crucialness score shows how important the water point would be if it were to be rehabilitated. See example here.
    • Pressure score (0-100%) is calculated based on the ratio of the number of people assigned to that water point over the theoretical maximum population which can be served based on the technology. If a point is serving less than the recommended maximum, the pressure score will be less than 100% (e.g., 250/500 = 50%). If a point is serving more than the recommended maximum, the pressure score will be over 100% (e.g., 750/500 = 150%). The following recommended maximum values (extended from Sphere Guidelines) are currently in use:
      • 250 people per tap [tapstand, kiosk, rainwater catchment]
      • 500 people per hand pump [all hand pumps]
      • 400 people per open hand well [rope and bucket]
      • 1,000 people per mechanized well
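The crucialness and pressure scores can be sketched directly from the definitions above. The function names and technology keys are illustrative, and returning zero when no population is within 1km is an assumption; the capacity values are the recommended maximums listed above.

```python
# Recommended maximum users per water point, as listed above.
MAX_USERS = {
    "tap": 250,              # tapstand, kiosk, rainwater catchment
    "hand pump": 500,        # all hand pumps
    "open hand well": 400,   # rope and bucket
    "mechanized well": 1000,
}


def crucialness(assigned_users, population_within_1km):
    """Share of the local population (within 1km) that depends on this water point."""
    if population_within_1km == 0:
        return 0.0  # assumption: no local population means no score
    return assigned_users / population_within_1km


def pressure(assigned_users, technology):
    """Assigned users relative to the recommended maximum for the technology."""
    return assigned_users / MAX_USERS[technology]


print(f"{pressure(250, 'hand pump'):.0%}")   # 50%
print(f"{pressure(750, 'hand pump'):.0%}")   # 150%
print(f"{crucialness(400, 800):.0%}")        # 50%
```

In the last line, a water point assigned 400 of the 800 people within 1km scores 50%, which corresponds to the two-water-point example in the crucialness definition.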

Quick peek:

6 key tool features and options:

 

1. Users can filter based on country and administrative division name down to the administrative division 3 (adm3) level.

 

 

2. Users can filter water points by source (borehole, shallow well, spring, etc.), technology (handpump, mechanized pump, etc.) and management (community management, direct government operations, etc.)

3. The Top Water Points table shows the top 15 water points which would be recommended for priority consideration.

  • The default setting will show priority based on number of ‘Served Pop.’
  • For working water points, ‘Served Pop.’ represents ‘Likely Current Users’ and for non-functional points, ‘Served Pop.’ represents ‘Potential users who could regain access’.
  • Users can also click on ‘Population within 1km’, ‘Crucialness’ or ‘Pressure’ and the table will be updated to show the priority for each of these parameters.
  • Users can select to show/hide functional points and points in urban areas, and the table will update to reflect these choices.
 
 

 

 

4. Users can select options to show/hide different layers, including functional points, population data and roads/buildings. Key options are available in the top selection bar, with additional options in Settings.

 

 

5. The Legend describes the different visualizations possible through various Settings selections.

 

 

6. Users can download the full table of results by selecting ‘Download Data’.

  • If you have filtered to a specific location, all data in that administrative area will be included in the download.
  • If you have zoomed in to a sub-area of interest, the download will include all visible data or all filtered data.

Please feel free to ask questions and provide feedback on the new tool.

Introducing WPdx+

The Water Point Data Exchange (WPdx) is pleased to announce the launch of a new analysis-ready dataset, Water Point Data Exchange Plus (WPdx+). WPdx+ is a further enhanced and refined version of the original WPdx dataset, now known as WPdx-Basic.

Through the online data playgrounds, all users can:

  • Sort and filter data
  • Create custom sub-datasets based on location or other parameters of interest
  • Visualize data using charts, graphs and simple maps
  • Download/export data

The WPdx+ dataset is focused on a subset of countries for which WPdx has enough data for the decision support tools to be activated. The tools and dataset enhancements can be made available for any country if a representative dataset can be shared with WPdx.

The WPdx+ dataset is the input for the new suite of decision-support tools which are under development. Please check out the recently released updated Rehabilitation Priority tool. Additional updates to the remaining tools will be released in the coming months.

For more information on how to add a new country to WPdx+, email info@waterpointdata.org with “New Country Interest” in the subject line.

Please see below for a brief summary of the two datasets:

WPdx-Basic

  • All data shared to WPdx is included in the WPdx-Basic Global Data Repository.
  • There are five validation/data-cleaning steps which occur during the ingestion process:
    • Ensure that all records contain the required parameters. Records that do not contain required parameters are not uploaded. A summary of records included in the upload can be found in the WPdx Data Catalog.
    • Check to ensure that points are located within a country boundary per GADM boundaries. Points which do not fall within country boundaries (e.g., in the middle of the ocean) are not uploaded. A summary of records included in the upload can be found in the WPdx Data Catalog.
    • Formatting of entries for consistency. For example, for the Presence of Water When Assessed (#status_id) parameter, entries in the repository will be shown as “Yes” or “No”.
    • Addition of ‘clean’ versions of country name, #adm1, #adm2, #adm3 based on provided GPS coordinates and GADM boundaries. ‘Clean’ values are appended to the record, leaving all original data intact.
    • Addition of “water_source_clean”, “water_tech_clean” and “management_clean” columns. These new columns are created using fuzzy matching to organize entries into consistent categories. For more information on the cleaning process, please see here.

WPdx+

  • Focused dataset on countries for which WPdx has enough data for decision-support tools to be activated.
  • Further data processing steps including:
    • Removal of records identified as having location mismatches (i.e., data provided states that record is from Country X, but GPS location is in Country Y)
    • De-duplication for any records mistakenly uploaded twice (exact matches only).
    • Assignment of a WPdx_id which matches water point records shared by different organizations and on different dates, based on GPS location.
  • Addition of external relevant data sources which are used in the water point status prediction models, including:
    • Distance between water point and nearest road (primary, secondary and tertiary), town and city using OpenStreetMap data.
    • Additional external data coming soon!
  • Tabular access to results from advanced decision-support tools:
    • Rehabilitation Priority
      • Which non-functional water point should be prioritized for repair?
      • Population living within 1km of water point
      • Likely current and potential users
      • Crucialness of water point (are there alternate working points nearby?)
      • Pressure on water point (is the point over or under utilized?)
    • Water Point Status Predictions – which water points are at a higher risk of failure? (coming soon)
    • Construction Priority – which locations should be considered for new construction to reach unserved populations? (coming soon)
    • Measure water access by administrative division – how does coverage vary within different sub-national divisions? (coming soon)
  • Inclusion of additional key external parameters (coming soon)
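
For readers who want to try similar checks on their own data before sharing it, two of the processing steps listed above (removing location mismatches and removing exact duplicates) can be sketched in a few lines of Python. This is only an illustration, not the WPdx processing code. The field names country_stated and country_from_gps and both function names are hypothetical, and the sketch assumes the GPS-derived country has already been looked up.

```python
def drop_location_mismatches(records):
    """Keep records whose stated country matches the country found from GPS."""
    # "country_stated" and "country_from_gps" are hypothetical field names.
    return [
        record for record in records
        if record["country_stated"] == record["country_from_gps"]
    ]


def drop_exact_duplicates(records):
    """Keep the first copy of each record; only exact matches are dropped."""
    seen = set()
    unique = []
    for record in records:
        # Two records are duplicates only if every field and value is identical.
        key = tuple(sorted(record.items()))
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique
```

Records that differ in any field, even slightly, are kept by this sketch, which mirrors the "exact matches only" wording above.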

Share your data with WPDx.. in 30 minutes or less!

Sharing data with WPDx has never been easier. In fall 2020, WPDx completed a major overhaul of our ingestion engine to streamline the process for data sharing. This blog will take you step-by-step through the upload process. In most cases, this will take less than 30 minutes to complete! If you have questions, please reach out to info@waterpointdata.org.

Before you start, please review our Data Submission Policy to ensure that you have the correct permissions to share the data.

The first step is to review the WPDx data standard and compare with your organization’s dataset. The ingestion notes file can help you document how to map your data to the standard which will save you time later in the process.

To upload data, the minimum requirements are for the dataset to include location (latitude, longitude in decimal degrees), presence of water when assessed (functional status), date of data inventory, data source (organization providing the data), and information on either/both the source and technology of the water point. While these are the minimum requirements, we highly encourage organizations to share as many parameters as possible to provide a more complete entry. These additional parameters, such as install year or management, are utilized in the predict water point status tool.

Accessing the ingestion engine

Once you know which columns from your dataset you want to share, you are ready to start the upload process. Go to http://upload.waterpointdata.org to access the WPDx ingestion engine.

  • Click on “Login to the System.”
  • Please note, the ingestion engine requires a Google account.

After login, you’ll arrive at the ingestion engine dashboard:

Sharing your data file

There are two options for uploading data:

  • Upload a physical file (.xlsx, .xls, .csv) from your computer
  • Provide a web link to an API endpoint, Google Sheet, Dropbox or other online system

To upload a physical file:

Before you upload the file, please rename the file using the following format:

  • Organization Name_Countries included_Month Year of Data included
  • For example, Global Water Challenge_Uganda_Jan2020
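
If you share files regularly, a tiny helper can keep the naming consistent. The Python sketch below simply assembles the three parts of the format above. The function name and the choice to join multiple countries with a hyphen are assumptions made for illustration, not WPDx requirements.

```python
def upload_file_name(org_name, countries, month_year, extension=".csv"):
    """Build a name as Organization Name_Countries included_Month Year."""
    # Assumption: several countries are joined with a hyphen.
    country_part = "-".join(countries)
    return f"{org_name}_{country_part}_{month_year}{extension}"


print(upload_file_name("Global Water Challenge", ["Uganda"], "Jan2020", ".xlsx"))
# → Global Water Challenge_Uganda_Jan2020.xlsx
```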

Select the “Source Data” tab

  • Select “+ Upload Data File”
  • Click on “Select File”, browse to your organization’s data file and click “Open”
  • “File Upload Successful” message will appear at top of screen

Share data via weblink

To upload from a weblink, you must provide a weblink with the appropriate access permissions. You will enter the weblink on the Data Import Workbench page after first providing some basic information about your dataset.

  • For Akvo Flow, request an API endpoint from your program manager. The API endpoint will be used in the direct URL box at the beginning of a processing task. For more details, please see here.
  • For mWater, create a datagrid formatted per the WPDx standard. This creates a permanent URL. Click on “Download as XLSX” and copy the download link. Use this in the direct URL box at the beginning of a new processing task. For more details, please see here.
  • For Dropbox, copy the download link (not the sharing link) to use in the WPDx ingestion engine. Use this link in the direct URL box at the beginning of a new processing task. Select the appropriate format from the dropdown.
  • For Google Sheets, ensure that the document is shared publicly (select “Anyone on the internet with this link can edit” from the share settings). Enter the URL for the Google Sheet in the direct URL box at the beginning of a new processing task. Be sure to select Google spreadsheet from the format dropdown.
  • For custom data platforms, please contact us to determine how we can best connect.

Start New Processing Task

Select “Processing Tasks” tab

Select “+ New Processing Task”

Task Name and Description

Enter the Task Name in the following format:

OrgName_Country/Region_Month/Year of data

For example, Global Water Challenge_Global_2019

Provide the main purpose for the collected data under Description

Metadata

Complete the metadata prompts to provide a detailed overview of the data within your dataset.

The metadata will be visible on the data page for your dataset within the WPDx data catalog.

Point of Contact

Complete Point of Contact details for dataset.

To protect privacy, one option is to use an organizational level email (e.g., data@name.org) which can be forwarded by your organization to relevant contacts.

Agree to Data Sharing Terms

Check box to agree to Data Sharing Terms

Leave visibility as “Only Visible to Me”

Select: Save & go to Workbench

Data Import Workbench

Select your source file from the dropdown.

Allow data to process (this may take a few minutes). The Direct URL and format boxes will auto-populate.

If there are multiple sheets in your file, make sure the correct one is selected.

Scroll down to continue (the “Data is Processing” message may still appear)

If using a web address, enter directly in Direct URL text box and select the appropriate format option.

For JSON formats, be sure to leave the JSON Path field blank.

Data Structure

If your dataset is formatted to include only the column headers and the data, leave Skip Rows/Columns as “0”

If there are additional rows or columns which should be skipped (e.g., additional headers or title cells), enter the number of rows/columns to skip.

For the sample data shown below, you would enter “2” in Skip Rows. Leave Skip Columns at “0”
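
To see what skipping rows does, here is a small Python sketch that reads a CSV with two title rows above the real column headers. It is only an illustration of the idea. The sample text, the function name read_table and its parameters are made up for this example and are not part of the ingestion engine.

```python
import csv
import io


def read_table(text, skip_rows=0, skip_columns=0):
    """Return (headers, data_rows) after skipping leading rows and columns."""
    rows = list(csv.reader(io.StringIO(text)))
    rows = [row[skip_columns:] for row in rows[skip_rows:]]
    return rows[0], rows[1:]


# Hypothetical file: two title rows, then the header row, then the data.
sample = (
    "Water point survey\n"
    "Exported from field database\n"
    "lat,lon,status\n"
    "0.35,32.58,Yes\n"
)
headers, data = read_table(sample, skip_rows=2)
print(headers)  # ['lat', 'lon', 'status']
```

With skip_rows left at 0, the title row would be read as the header, which is why the Skip Rows value matters.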

Ignored Values

If your dataset includes terms for blank/unknown values which should be ignored (e.g., Unknown, N/A), please enter those terms in the text box.

Use a comma as a separator between terms. Do NOT include any blank spaces between commas and terms.

For example: “unknown,Unknown,N/A,0,null,blank”
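
If you keep your list of placeholder terms in a script or spreadsheet, the entry can be built programmatically so that no stray spaces slip in. The Python sketch below is illustrative only. Both function names are hypothetical, and the exact, case-sensitive matching in blank_ignored is an assumption suggested by the example above listing both “unknown” and “Unknown”.

```python
def ignored_values_entry(terms):
    """Join terms with commas and no spaces around them."""
    cleaned = [term.strip() for term in terms if term.strip()]
    return ",".join(cleaned)


def blank_ignored(value, ignored_entry):
    """Return an empty string when value exactly matches an ignored term."""
    # Assumption: matching is exact and case-sensitive.
    ignored = set(ignored_entry.split(","))
    return "" if value in ignored else value


entry = ignored_values_entry([" unknown", "Unknown ", "N/A"])
print(entry)  # unknown,Unknown,N/A
```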

Data Mapping: Getting Started

There are two methods to complete the data mapping process:

Primary method..

  • Using the dropdown menu, scroll to select the column header from your dataset which matches the WPDx standard.
  • Some parameters may pre-populate, especially if your dataset is labeled with the WPDx #titles. Verify these selections.
  • Note: you cannot map the same column to two different standard parameters.

Optional method..

If there is a parameter which is not in your dataset, but for which a common value can be applied to all datapoints, select “Constant…” from the dropdown.

  • Examples
  • #source – Data Source –> Constant: Name of Org
  • #country_id – Country –> Constant: “UG” or “GH”
  • #orig_lnk – Public Data Source URL –> Constant: URL

Data Mapping: Required Fields

There are 6 mandatory parameters (the last can be met by #water_source, #water_tech, or both):

  • #lat_deg – Latitude
  • #lon_deg – Longitude
  • #status_id – Presence of Water when Assessed
  • #report_date – Date of Data Inventory
  • #source – Organization providing data
  • #water_source – Water Source AND/OR
  • #water_tech – Water Point Technology
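
Before starting the mapping, it can help to check your planned mapping against this list. The Python sketch below takes a dictionary from WPDx parameter to the column (or constant) you intend to use and reports what is still missing. It is a planning aid written for this guide, not the ingestion engine's own validation, and the names ALWAYS_REQUIRED, ONE_OF and missing_required are hypothetical.

```python
ALWAYS_REQUIRED = ["#lat_deg", "#lon_deg", "#status_id", "#report_date", "#source"]
ONE_OF = ["#water_source", "#water_tech"]  # at least one must be mapped


def missing_required(mapping):
    """List the mandatory parameters that have no column or constant yet."""
    missing = [param for param in ALWAYS_REQUIRED if not mapping.get(param)]
    if not any(mapping.get(param) for param in ONE_OF):
        missing.append(" or ".join(ONE_OF))
    return missing


planned = {
    "#lat_deg": "Latitude",
    "#lon_deg": "Longitude",
    "#status_id": "Working",
    "#report_date": "Visit date",
    "#water_tech": "Pump type",
}
print(missing_required(planned))  # ['#source']
```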

Data Mapping: #lat_deg and #lon_deg

Latitude and longitude must be in decimal degrees in WGS84.

Select the appropriate column header which matches with #lat_deg.

Go to the next dropdown and make the selection to match #lon_deg.
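
If your coordinates were recorded in degrees, minutes and seconds, they need to be converted to decimal degrees before upload. The Python sketch below shows the standard conversion along with a simple range check. The function names are made up for this example, and the sketch does not handle datum conversion to WGS84.

```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds to signed decimal degrees."""
    value = abs(degrees) + minutes / 60 + seconds / 3600
    # South and West are negative in decimal degrees.
    return -value if hemisphere.upper() in ("S", "W") else value


def is_valid_coordinate(lat, lon):
    """Check that latitude and longitude fall in their valid ranges."""
    return -90 <= lat <= 90 and -180 <= lon <= 180


print(dms_to_decimal(32, 30, 0, "E"))  # 32.5
```

A quick range check like this also catches a common mistake: latitude and longitude columns that have been swapped.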

Data Mapping: #status_id

Select the appropriate column header from the dropdown

Default values include Yes/No. “Unknown” values (see Ignored Values above) will be converted to a blank cell in the WPDx Global Data Repository.

If your dataset does not include Yes/No, but instead terms such as “Functional/Partial/Non-functional” select “more settings..” and enter those terms.

True Values = terms which indicate the water point IS functional

False Values = terms which indicate the water point is NOT functional

Do not leave any spaces between terms, just a comma (e.g., Yes,functional)
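
The effect of the True Values and False Values entries can be pictured with a short Python sketch. It is an illustration of the idea only, not the ingestion engine's code, and the function names are hypothetical. In the usage example, “Partial” is placed under True Values purely as one possible choice; how to classify such terms is up to the data provider.

```python
def parse_terms(entry):
    """Split a comma-separated entry such as "Yes,functional" into a set."""
    return {term for term in entry.split(",") if term}


def to_status_id(value, true_values="Yes", false_values="No"):
    """Return "Yes", "No", or "" for terms in neither list."""
    if value in parse_terms(true_values):
        return "Yes"
    if value in parse_terms(false_values):
        return "No"
    return ""


print(to_status_id("Partial", "Functional,Partial", "Non-functional"))  # Yes
```

Because the entry is split on commas only, a space after a comma would become part of the term and stop it from matching, which is why the entry should contain no spaces between terms.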

Data Mapping: #report_date

Select the appropriate column header from the dropdown

The system will automatically detect the format of the dates in your dataset

If there are errors indicated, select “more settings…” and choose a specific format. (This should only be an issue in rare circumstances.)
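
To get a feel for why a specific format is occasionally needed, the Python sketch below tries a few candidate formats and returns the first one that parses every date in a column. This is a simplified stand-in written for this guide, not the detection logic the system uses. The candidate list and function name are assumptions.

```python
from datetime import datetime

# Assumed candidates for illustration; real datasets may need others.
CANDIDATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%m/%d/%Y")


def detect_date_format(values, formats=CANDIDATE_FORMATS):
    """Return the first format that parses every value, or None."""
    for fmt in formats:
        try:
            for value in values:
                datetime.strptime(value, fmt)
        except ValueError:
            continue  # this format failed on some value, try the next
        return fmt
    return None


print(detect_date_format(["15/01/2020", "03/02/2020"]))  # %d/%m/%Y
```

A column containing only dates such as 03/02/2020 fits both day-first and month-first formats, and that kind of ambiguity is the typical case where choosing the format by hand is the safer option.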

Data Mapping: #source

Provide the name of the organization providing the data.

If your dataset includes data from multiple sources, please map the parameter to the appropriate column header that lists each organization.

Otherwise, the entry for Data Source in the About the Data section will be applied to all uploaded records.

Data Mapping: #water_source & #water_tech

At least one of #water_source or #water_tech must be mapped for the upload to proceed.

Select the appropriate column header/s from the dropdown

If the information is constant for all values, you can instead select “Constant..” and enter the appropriate value in the text box.

Data Mapping: Optional Fields

The “Optional Fields” are not required, but they do help to provide a more robust dataset for understanding the status of the local water sector.

Please map as many of the WPDx parameters as possible.

For any parameters which do not align with your dataset, you can select “No value for this field” (this is the default selection) and go on to the next parameter.

For example, if your dataset does not include any information on payment:

 

Data Mapping: #country_id

Use the ISO two-letter country code, chosen from the list of all ISO country codes.

If your dataset includes entries from different countries, this information should be included in your data file. Select the appropriate column header from the dropdown menu.

If your dataset only includes entries from a single country, you can select “Constant..” and enter a value to be applied to all rows.

Data Mapping: #adm1, #adm2, #adm3

#adm1, #adm2, and #adm3 are official administrative division designations

If you have questions, look at GADM.org (see the tutorial below) or statoids.com to determine the appropriate designations.

GADM.org: Check administrative divisions

1. Go to GADM.org and Select “Maps”

2. Click on country of interest

3. Select “Show sub-divisions”

4. This creates a map and a list of first-level subdivisions

5. Click on one of the first level sub-divisions

6. Click on “Show sub-divisions”

7. This creates a map and list of second level subdivisions

Data Mapping: #activity_id

Select the appropriate column header from the dropdown

If a locally or globally recognized standardized identification number exists (e.g., a physical well ID number or barcode) within your dataset, please use that column

OR

If your organization has a unique id system which would allow water points to be matched within your organization over time, please use that column

 

Data Mapping: #scheme_id

Select appropriate column header from dropdown

Data Mapping: #install_year

Select the appropriate column header from the dropdown.

Note that this field accepts a four-digit year or a full installation date. Only the year will be extracted from full date entries.
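
The year extraction described above can be pictured with a short Python sketch. This is an assumption about the general idea rather than the system's actual rule: it looks for a standalone run of exactly four digits, and the function name is hypothetical.

```python
import re


def extract_year(value):
    """Return the four-digit year in a year or full date entry, else None."""
    # Match exactly four digits that are not part of a longer number.
    match = re.search(r"(?<!\d)(\d{4})(?!\d)", str(value))
    return int(match.group(1)) if match else None


print(extract_year("30/06/2015"))  # 2015
```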

Data Mapping: #installer

Select appropriate header from dropdown.

Data Mapping: #rehab_year

Select the appropriate column header from the dropdown.

Note that this field accepts a four-digit year or a full rehabilitation date. Only the year will be extracted from full date entries.

Data Mapping: #rehabilitator

Select appropriate header from dropdown.

Data Mapping: #management

Select appropriate column header from dropdown.

Select the management classification of the entity that directly manages the water point. Example management types include:

  • Direct Government Operation
  • Private Operator/Delegated Management
  • Community Management
  • School
  • Healthcare Facility
  • Other Institutional Management
  • Other

Data Mapping: #pay

Select appropriate column header from dropdown.

Data Mapping: #status

Select appropriate header from dropdown.

Please note that the system cannot map the same column to two different WPDx parameters. If you would like to use the same column, please duplicate it in your dataset (and change one of the column headers). For example, it may be useful to use a duplicated version of your functionality column for both #status_id and #status.
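
Duplicating a column is easy to do in a spreadsheet, but for CSV files it can also be scripted. The Python sketch below copies one column under a new header using only the standard library. The function name and the column names in the example are hypothetical.

```python
import csv
import io


def duplicate_column(csv_text, column, new_name):
    """Return CSV text with `column` copied into a new column `new_name`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = list(reader.fieldnames) + [new_name]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row[new_name] = row[column]  # same value under a second header
        writer.writerow(row)
    return out.getvalue()


source = "id,functionality\n1,Functional\n"
print(duplicate_column(source, "functionality", "functionality_detail"))
```

One copy can then be mapped to #status_id and the other to #status.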

Data Mapping: #orig_lnk

If the data is available via a public link, select ‘Constant’ from the dropdown and enter it so that it can be applied to all rows.

If there is no public link, leave as ‘No value for this field’

Data Mapping: #photo_lnk

Select appropriate column header from dropdown.

If there is no public link, leave as ‘No value for this field’

Data Mapping: #fecal_coliform_presence

Select appropriate column header from the dropdown

Default values include Present/Presence and Absent/Absence. If your dataset includes other terms, select ‘more settings…’ and enter the terms into the True Value and False Value text boxes.

Separate terms with a comma but do not include any spaces.

Complete associated metadata questions at the bottom of the page (see Water Quality Metadata section for more information).

 

Data Mapping: #fecal_coliform_value

Select appropriate column header from dropdown

Complete associated metadata questions

Data Mapping: #subjective_quality

Select appropriate column header from dropdown

Complete associated metadata questions

Data Mapping: #notes

Select the appropriate column header from the dropdown, or apply a Constant value if appropriate.

The #notes parameter can be used to enter custom data which the host country government or organization has selected.

For example, some organizations want to track seasonality, additional administrative districts, or some combination.

Multiple parameters can be included by creating a column that includes the parameters of interest, separated by a “;” or “…” delimiter.
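
A combined notes column can be built with a few lines of Python. The sketch below is one possible layout, not a WPDx requirement: the "name: value" labelling, the function name, and the example parameter names are all assumptions made for illustration.

```python
def build_notes(values, delimiter=";"):
    """Combine several custom parameters into one #notes entry."""
    # Skip empty values so the entry has no dangling delimiters.
    parts = [f"{name}: {value}" for name, value in values.items() if value]
    return delimiter.join(parts)


print(build_notes({"seasonality": "dry season only", "sub_county": "Example"}))
# → seasonality: dry season only;sub_county: Example
```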

Water Quality and Notes Metadata

If you mapped the #fecal_coliform_presence, #fecal_coliform_value or #notes columns, please complete the additional metadata question section.

Once mapping is complete

Select “Save” or “Save and Submit for Approval”

Select Save and Submit for Approval when your data has been fully mapped and is ready for upload

The status in the Processing Tasks tab will now show as “Pending”

An administrator will be notified and will complete the uploading process

Once approved, an email will be sent to the uploader’s email address

If the mapping was not successful, you will see an error message indicating which parameter was not mapped and an explanation of why. Once the error has been fixed, you can submit the processing task for approval.

Successful Upload!

Once the data upload has been completed by an administrator, the status in the Processing Task will be marked as “Success”. An auto-generated email will also be sent to the account email address. 

You can view an overview of the dataset in the WPDx data catalog by clicking on the eye icon.

The data catalog dataset page includes:

  • Metadata and contact details
  • Ingestion report – summary statistics of the number of rows uploaded and any errors encountered
  • Link to download source file

Data will be visible on the WPDx data repository within 24 hours.

Need to make changes?

Users can edit their datasets and processing tasks to correct errors or make other additions (i.e., add a new column that was not previously mapped).

To remove data from WPDx, please contact the administrator at info@waterpointdata.org with “Request to remove data from WPDx” in the subject line. Include the name of the source file and the reason for the removal request.

Source Data: Update Contents or Delete

If you realize you have made an error and/or need to edit or amend an existing dataset, go to the Source Data tab, select ‘Update Contents’ and upload a revised file.

Once the file has been updated, go back to the associated Processing Task and check/edit the Processing Task content and data mapping and hit “Save and Submit” at the bottom of the Data Import Workbench page.

Do not use ‘Update Contents’ to initiate a new dataset upload as this will replace any previously shared data. Instead upload a new file and start a new Processing Task.

Editing a Processing Task

If you want to add/edit the metadata for your dataset and/or make changes to the way that the data is mapped to the standard, select “Edit” from the Processing Task tab.

Make any changes and hit “Save and Submit” at the bottom of the Data Import Workbench page.

An admin will be alerted of your update and will review and process the upload.

Questions?

Please contact: info@waterpointdata.org

Check out our Resources and FAQs