Celebrating Open Data Day 2022: The Power of Rural Water Point Data for Evidence-Based Decisions

WPdx is excited to continue to promote transparent data sharing and use of open data in the rural water sector through our second annual Open Data Day celebration!

Launch of Decision Support Tools Web App

To celebrate Open Data Day 2022, we are pleased to announce the launch of the new beta WPdx Decision Support Tools web app.

The WPdx Decision Support Tools interactive web app allows users to view and explore available water point data and analytical results from the WPdx+ dataset and suite of decision-support tools.

For more information on the launch of the tools, please visit our related blog post and/or see the links below which provide detailed information on each of the available tools. 

  1. Administrative Region Analysis
  2. Rehabilitation Priority Analysis
  3. New Construction Priority Analysis
  4. Data Staleness Analysis
  5. Functional Status Prediction Analysis (coming soon)

Recognition of Leaders in Data Sharing

Over the past year, more than 50,000 new water point records have been uploaded to the WPdx platform from 14 different organizations. We want to take this opportunity to recognize and celebrate the following entities that have demonstrated their commitment to transparency and accountability by sharing data with the WPdx platform.

Interested in sharing data with WPdx? Please see here for more details or contact info@waterpointdata.org with questions.

Direct Contributors to WPdx

  • Inter Aide
  • IRC WASH
  • Uganda Water Project
  • USAID Lowland WASH Activity, implemented by DT Global
  • Village Water
  • Water4
  • Water & Sanitation for the Urban Poor (WSUP)
  • Water For People
  • World Serve International
  • YouthMappers

Contributors via Open Data Portals 

Africa GeoPortal

  • Grid3

Humanitarian Data Exchange (HDX)

  • iMMAP
  • United Nations Office for the Coordination of Humanitarian Affairs
  • REACH Initiative

Transforming Data to Action

Over the past year, WPdx has continued to work with government and NGO partners in Ethiopia, Ghana, Uganda and Sierra Leone.

During 2022 we continue to work to integrate the results from the WPdx decision support tools to strengthen existing decision-making processes.

Thank you to our generous funders and key partners:

 

 

Thank you to the entities which have shared data with WPdx in the past year:

 

Introducing WPdx+

The Water Point Data Exchange (WPdx) is pleased to announce the launch of a new analysis-ready dataset, Water Point Data Exchange Plus (WPdx+). WPdx+ is a further enhanced and refined version of the original WPdx dataset, now known as WPdx-Basic.

Through the online data playgrounds, all users can:

  • Sort and filter data
  • Create custom sub-datasets based on location or other parameters of interest
  • Visualize data using charts, graphs and simple maps
  • Download/export data

The WPdx+ dataset is focused on a subset of countries for which WPdx has enough data for the decision support tools to be activated. The tools and dataset enhancements can be made available for any country if a representative dataset can be shared with WPdx.

The WPdx+ dataset is the input for the new suite of decision-support tools which are under development. Please check out the recently released updated Rehabilitation Priority tool. Additional updates to the remaining tools will be released in the coming months.

For more information on how to add a new country to WPdx+, email info@waterpointdata.org with “New Country Interest” in the subject line.

Please see below for a brief summary of the two datasets:

WPdx-Basic

  • All data shared to WPdx is included in the WPdx-Basic Global Data Repository.
  • There are five validation/data-cleaning steps which occur during the ingestion process:
    • Ensure that all records contain the required parameters. Records that do not contain required parameters are not uploaded. A summary of records included in the upload can be found in the WPdx Data Catalog.
    • Check to ensure that points are located within a country boundary per GADM boundaries. Points which do not fall within country boundaries (e.g., in the middle of the ocean) are not uploaded. A summary of records included in the upload can be found in the WPdx Data Catalog.
    • Formatting of entries for consistency. For example, for the Presence of Water When Assessed (#status_id) parameter, the repository will show values as “Yes” or “No”.
    • Addition of ‘clean’ versions of country name, #adm1, #adm2 and #adm3 based on provided GPS coordinates and GADM boundaries. ‘Clean’ values are appended to the record, leaving all original data intact.
    • Addition of “water_source_clean”, “water_tech_clean” and “management_clean” columns. These new columns are created using fuzzy matching to organize entries into consistent categories. For more information on the cleaning process, please see here.

WPdx+

  • Focused dataset on countries for which WPdx has enough data for decision-support tools to be activated.
  • Further data processing steps including:
    • Removal of records identified as having location mismatches (i.e., data provided states that record is from Country X, but GPS location is in Country Y)
    • De-duplication for any records mistakenly uploaded twice (exact matches only).
    • Assignment of a WPdx_id which matches water point records shared by different organizations and on different dates, based on GPS location.
  • Addition of external relevant data sources which are used in the water point status predictions models, including:
    • Distance between water point and nearest road (primary, secondary and tertiary), town and city using OpenStreetMap data.
    • Additional external data coming soon!
  • Tabular access to results from advanced decision support tools:
    • Rehabilitation Priority
      • Which non-functional water point should be prioritized for repair?
      • Population living within 1km of water point
      • Likely current and potential users
      • Crucialness of water point (are there alternate working points nearby?)
      • Pressure on water point (is the point over or under utilized?)
    • Water Point Status Predictions – which water points are at a higher risk of failure? (coming soon)
    • Construction Priority – which locations should be considered for new construction to reach unserved populations? (coming soon)
    • Measure water access by administrative division – how does coverage vary within different sub-national divisions? (coming soon)
  • Inclusion of additional key external parameters (coming soon)
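
Two of the processing steps listed above, fuzzy matching of free-text entries into consistent categories and removal of exact duplicates, can be illustrated with a short Python sketch. This is only an illustration using the standard library: the category list, the similarity cutoff and the function names are assumptions made for the example, and the actual WPdx cleaning pipeline may work differently.

```python
import difflib

# Hypothetical target categories; the real WPdx category lists are not given here.
CATEGORIES = ["Borehole", "Protected Spring", "Protected Well", "Rainwater Harvesting"]


def clean_category(raw, categories=CATEGORIES, cutoff=0.6):
    """Return the closest category for a free-text entry, or None if nothing is close."""
    lookup = {c.lower(): c for c in categories}
    matches = difflib.get_close_matches(
        raw.strip().lower(), list(lookup), n=1, cutoff=cutoff
    )
    return lookup[matches[0]] if matches else None


def drop_exact_duplicates(records):
    """Keep the first copy of each record; later exact copies are dropped."""
    seen = set()
    unique = []
    for rec in records:
        # Sort items so that key order does not affect the comparison.
        key = tuple(sorted(rec.items()))
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique
```

Appending the cleaned value as a new column, rather than overwriting the original entry, mirrors the approach described above of leaving all original data intact.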

Share your data with WPDx... in 30 minutes or less!

Sharing data with WPDx has never been easier. In fall 2020, WPDx completed a major overhaul of our ingestion engine to streamline the process for data sharing. This blog will take you step-by-step through the upload process. In most cases, this will take less than 30 minutes to complete! If you have questions, please reach out to info@waterpointdata.org.

Before you start, please review our Data Submission Policy to ensure that you have the correct permissions to share the data.

The first step is to review the WPDx data standard and compare with your organization’s dataset. The ingestion notes file can help you document how to map your data to the standard which will save you time later in the process.

To upload data, the minimum requirements are for the dataset to include location (latitude, longitude in decimal degrees), presence of water when assessed (functional status), date of data inventory, data source (organization providing the data), and information on either/both the source and technology of the water point. While these are the minimum requirements, we highly encourage organizations to share as many parameters as possible to provide a more complete entry. These additional parameters, such as install year or management, are used in the water point status prediction tool.

Accessing the ingestion engine

Once you know which columns from your dataset you want to share, you are ready to start the upload process. Go to http://upload.waterpointdata.org to access the WPDx ingestion engine.

  • Click on “Login to the System.”
  • Please note, the ingestion engine requires a Google account.

After login, you’ll arrive at the ingestion engine dashboard:

Sharing your data file

There are two options for uploading data:

  • Upload a physical file (.xlsx, .xls, .csv) from your computer
  • Provide a web link to an API endpoint, Google Sheet, Dropbox or other online system

To upload a physical file:

Before you upload the file, please rename the file using the following format:

  • Organization Name_Countries included_Month Year of Data included
  • For example, Global Water Challenge_Uganda_Jan2020
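
If you prepare many files, the naming format above can be generated with a small helper. This is a minimal sketch: the function name, the fixed month abbreviations and the use of a hyphen to join several countries are assumptions for illustration, not WPDx requirements.

```python
# English month abbreviations, hard-coded so the result does not depend on locale.
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def upload_file_name(org, countries, year, month, extension=".csv"):
    """Build 'Organization Name_Countries included_MonthYear' plus a file extension."""
    # Joining multiple countries with '-' is an assumption made for this example.
    country_part = "-".join(countries)
    return f"{org}_{country_part}_{MONTHS[month - 1]}{year}{extension}"


print(upload_file_name("Global Water Challenge", ["Uganda"], 2020, 1))
```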

Select the “Source Data” tab

  • Select “+ Upload Data File”
  • Click on “Select File”, browse to your organization’s data file and click “Open”
  • “File Upload Successful” message will appear at top of screen

Share data via weblink

To upload from a weblink, you must provide a weblink that has the required access permissions. You will enter the weblink on the Data Import Workbench page after first providing some basic information about your dataset.

  • For Akvo Flow, request an API endpoint from your program manager. The API endpoint will be used in the direct URL box at the beginning of a processing task. For more details, please see here.
  • For mWater, create a datagrid formatted per the WPDx standard. This creates a permanent URL. Click on “Download as XLSX” and copy the download link. Use this in the direct URL box at the beginning of a new processing task. For more details, please see here.
  • For Dropbox, copy the download link (not the sharing link) to use in the WPDx ingestion engine. Use this link in the direct URL box at the beginning of a new processing task. Select the appropriate format from the dropdown.
  • For Google Sheets, ensure that the document is shared publicly (select “Anyone on the internet with this link can edit” from the share settings). Enter the URL for the Google Sheet in the direct URL box at the beginning of a new processing task. Be sure to select Google spreadsheet from the format dropdown.
  • For custom data platforms, please contact us to determine how we can best connect.

Start New Processing Task

Select “Processing Tasks” tab

Select “+ New Processing Task”

Task Name and Description

Enter the Task Name in the following format:

OrgName_Country/Region_Month/Year of data

For example, Global Water Challenge_Global_2019

Provide the main purpose for the collected data under Description

Metadata

Complete the metadata prompts to provide a detailed overview of the data within your dataset.

The metadata will be visible on the data page for your dataset within the WPDx data catalog.

Point of Contact

Complete Point of Contact details for dataset.

To protect privacy, one option is to use an organizational level email (e.g., data@name.org) which can be forwarded by your organization to relevant contacts.

Agree to Data Sharing Terms

Check box to agree to Data Sharing Terms

Leave visibility as “Only Visible to Me”

Select: Save & go to Workbench

Data Import Workbench

Select your source file from the dropdown.

Allow data to process (this may take a few minutes). The Direct URL and format boxes will auto-populate.

If there are multiple sheets in your file, make sure the correct one is selected.

Scroll down to continue (the “Data is Processing” message may still appear)

If using a web address, enter directly in Direct URL text box and select the appropriate format option.

For JSON formats, be sure to leave the JSON Path field blank.

Data Structure

If your dataset is formatted to include only the column headers and the data, leave Skip Rows/Columns as “0”

If there are additional rows or columns which should be skipped (e.g., additional headers or title cells), enter the number of rows/columns to skip.

For the sample data shown below, you would enter “2” in Skip Rows. Leave Skip Columns at “0”
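
To see what the Skip Rows/Columns settings do, you can reproduce the effect locally before uploading. The sketch below uses a made-up sample (not the sample referred to above) with two title rows above the real header; the function name and the sample contents are assumptions for illustration.

```python
import csv
import io

# Hypothetical file contents: two title rows, then the header, then the data.
SAMPLE = (
    "Water point inventory,,\n"
    "Exported 2020,,\n"
    "lat,lon,status\n"
    "0.35,32.58,Yes\n"
)


def read_with_skips(text, skip_rows=0, skip_columns=0):
    """Drop leading rows/columns, then treat the first remaining row as the header."""
    rows = list(csv.reader(io.StringIO(text)))
    rows = [row[skip_columns:] for row in rows[skip_rows:]]
    header, data = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in data]


records = read_with_skips(SAMPLE, skip_rows=2)
print(records)
```

With `skip_rows=0` the title row would be read as the header, which is the situation the Skip Rows setting is meant to avoid.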

Ignored Values

If your dataset includes terms for blank/unknown values which should be ignored (e.g., Unknown, N/A), please enter those terms in the text box.

Use a comma as a separator between terms. Do NOT include any blank spaces between commas and terms.

For example: “unknown,Unknown,N/A,0,null,blank”
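
The following sketch shows one way such a comma-separated list could be applied, and why stray spaces matter. It assumes the terms are compared as exact strings; that is an assumption for illustration, not a description of the ingestion engine's internals, and the function names are hypothetical.

```python
def parse_ignored_values(spec):
    """Split on commas only; spaces are kept, so ' N/A' and 'N/A' are different terms."""
    return set(spec.split(","))


def blank_if_ignored(value, ignored):
    """Return an empty string for values that appear in the ignored set."""
    return "" if value in ignored else value


ignored = parse_ignored_values("unknown,Unknown,N/A,0,null,blank")
print(blank_if_ignored("N/A", ignored))   # ignored, becomes blank
print(blank_if_ignored("Yes", ignored))   # kept as-is

# With a space after the comma, the stored term is ' N/A' and 'N/A' no longer matches.
print("N/A" in parse_ignored_values("unknown, N/A"))
```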

Data Mapping: Getting Started

There are two methods to complete the data mapping process:

Primary method:

  • Using the dropdown menu, scroll to select the column header from your dataset which matches the WPDx standard.
  • Some parameters may pre-populate, especially if your dataset is labeled with the WPDx #titles. Verify these selections.
  • Note: you cannot map the same column to two different standard parameters.

Optional method:

If there is a parameter which is not in your dataset, but for which a common value can be applied to all datapoints, select “Constant…” from the dropdown.

  • Examples
  • #source – Data Source –> Constant: Name of Org
  • #country_id – Country –> Constant: “UG” or “GH”
  • #orig_lnk – Public Data Source URL –> Constant: URL
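
Conceptually, the mapping step pairs each WPDx parameter with either a column from your dataset or a constant value. The sketch below models that idea in Python; the data structure, function names and sample column headers are assumptions for illustration and do not reflect how the ingestion engine stores mappings.

```python
from collections import Counter


def apply_mapping(row, mapping):
    """Build one WPDx-style record from a source row and a parameter mapping."""
    record = {}
    for param, (kind, value) in mapping.items():
        if kind == "constant":
            record[param] = value            # same value for every row
        else:
            record[param] = row.get(value, "")  # value taken from the named column
    return record


def duplicated_columns(mapping):
    """List source columns mapped to more than one parameter (not allowed)."""
    counts = Counter(v for kind, v in mapping.values() if kind == "column")
    return sorted(col for col, n in counts.items() if n > 1)


# Hypothetical source column names.
mapping = {
    "#lat_deg": ("column", "Latitude"),
    "#lon_deg": ("column", "Longitude"),
    "#country_id": ("constant", "UG"),
}
row = {"Latitude": "0.35", "Longitude": "32.58"}
print(apply_mapping(row, mapping))
print(duplicated_columns(mapping))
```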

Data Mapping: Required Fields

There are 6 mandatory parameters (the sixth is satisfied by #water_source, #water_tech, or both):

  • #lat_deg – Latitude
  • #lon_deg – Longitude
  • #status_id – Presence of Water when Assessed
  • #report_date – Date of Data Inventory
  • #source – Organization providing data
  • #water_source – Water Source AND/OR
  • #water_tech – Water Point Technology
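
Before starting the mapping, it can help to check that every mandatory parameter has a counterpart in your dataset. The sketch below encodes the list above, including the either/or rule for #water_source and #water_tech; the function name is hypothetical.

```python
REQUIRED = ["#lat_deg", "#lon_deg", "#status_id", "#report_date", "#source"]
EITHER = ["#water_source", "#water_tech"]  # at least one of these must be mapped


def missing_required(mapped_params):
    """Return the mandatory parameters that have not been mapped yet."""
    mapped = set(mapped_params)
    missing = [p for p in REQUIRED if p not in mapped]
    if not mapped.intersection(EITHER):
        missing.append("#water_source or #water_tech")
    return missing


print(missing_required(["#lat_deg", "#lon_deg", "#status_id", "#water_tech"]))
```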

Data Mapping: #lat_deg and #lon_deg

Latitude and longitude must be in decimal degrees in WGS84.

Select the appropriate column header which matches with #lat_deg.

Go to the next dropdown and make the selection to match #lon_deg.
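
If your coordinates are stored as degrees, minutes and seconds, they need to be converted to decimal degrees first. The sketch below shows the conversion and a basic range check; the function names are hypothetical, and the code does not perform any datum transformation, so it assumes the coordinates are already in WGS84.

```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds to decimal degrees (S and W are negative)."""
    value = degrees + minutes / 60 + seconds / 3600
    return -value if hemisphere.upper() in ("S", "W") else value


def is_valid_coordinate(lat, lon):
    """Check that latitude and longitude fall within their valid ranges."""
    return -90 <= lat <= 90 and -180 <= lon <= 180


lat = dms_to_decimal(0, 30, 0, "N")
lon = dms_to_decimal(32, 15, 0, "E")
print(lat, lon, is_valid_coordinate(lat, lon))
```

A common sign of swapped columns is a latitude outside the range of -90 to 90, which the range check will flag.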

Data Mapping: #status_id

Select the appropriate column header from the dropdown

Default values include Yes/No. “Unknown” values (see Ignored Values above) will be converted to a blank cell in the WPDx Global Data Repository.

If your dataset does not include Yes/No, but instead terms such as “Functional/Partial/Non-functional”, select “more settings…” and enter those terms.

True Values = terms which indicate the water point IS functional

False Values = terms which indicate the water point is NOT functional

Do not leave any spaces between terms, just a comma (e.g., Yes,functional)
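
The True Values and False Values lists amount to a lookup from your own terms to Yes/No. A minimal sketch of that idea follows. The function names are hypothetical, the case-insensitive comparison is an assumption, and whether a term such as “Partial” counts as functional is a decision for your organization.

```python
def parse_terms(spec):
    """Turn a comma-separated list (no spaces) into a set of lower-case terms."""
    return {term.lower() for term in spec.split(",") if term}


def to_status_id(value, true_values, false_values):
    """Map a source term to 'Yes', 'No', or '' when the term is not recognized."""
    key = value.strip().lower()
    if key in parse_terms(true_values):
        return "Yes"
    if key in parse_terms(false_values):
        return "No"
    return ""


# Treating "Partial" as functional is an assumption made for this example.
TRUE_VALUES = "Yes,Functional,Partial"
FALSE_VALUES = "No,Non-functional"

for term in ["Functional", "Non-functional", "Unknown"]:
    print(term, "->", repr(to_status_id(term, TRUE_VALUES, FALSE_VALUES)))
```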

Data Mapping: #report_date

Select the appropriate column header from the dropdown

The system will automatically detect the format of the dates in your dataset

If there are errors indicated, select “more settings…” and choose a specific format. (This should only be an issue in rare circumstances.)
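
Automatic detection can fail when dates are ambiguous, for example when day and month are both 12 or less. The sketch below illustrates the general idea of trying candidate formats against a sample of values. The candidate list and function name are assumptions; the ingestion engine's own detection logic is not documented here.

```python
from datetime import datetime

# Candidate formats tried in order; this short list is an assumption.
CANDIDATE_FORMATS = ["%Y-%m-%d", "%d/%m/%Y", "%m/%d/%Y"]


def detect_format(samples, formats=CANDIDATE_FORMATS):
    """Return the first format that parses every sample, or None if none does."""
    for fmt in formats:
        try:
            for sample in samples:
                datetime.strptime(sample, fmt)
        except ValueError:
            continue  # this format failed on at least one sample
        return fmt
    return None


print(detect_format(["31/01/2020", "01/02/2020"]))
print(detect_format(["01/31/2020"]))
```

Note that a column containing only values such as 01/02/2020 would match both day-first and month-first formats, which is the kind of case where choosing a specific format by hand is the safer option.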

Data Mapping: #source

Provide the name of the organization providing the data.

If your dataset includes data from multiple sources, please map the parameter to the appropriate column header that lists each organization.

Otherwise, the entry for Data Source in the About the Data section will be applied to all uploaded records.

Data Mapping: #water_source & #water_tech

At least one of #water_source or #water_tech must be mapped for the upload to proceed.

Select the appropriate column header/s from the dropdown

If the information is constant for all values, you can instead select “Constant…” and enter the appropriate value in the text box.

Data Mapping: Optional Fields

The “Optional Fields” are not required, but they do help to provide a more robust dataset for understanding the status of the local water sector.

Please map as many of the WPDx parameters as possible.

For any parameters which do not align with your dataset, you can select “No value for this field” (this is the default selection) and go on to the next parameter.

For example, if your dataset does not include any information on payment, leave the #pay parameter set to “No value for this field”.

 

Data Mapping: #country_id

Use the ISO two-letter country code, chosen from the list of all ISO country codes.

If your dataset includes entries from different countries, this information should be included in your data file. Select the appropriate column header from the dropdown menu.

If your dataset only includes entries from a single country, you can select “Constant…” and enter a value to be applied to all rows.

Data Mapping: #adm1, #adm2, #adm3

#adm1, #adm2, and #adm3 are official administrative division designations

If you have questions, look at GADM.org (see the steps below) or statoids.com to determine the appropriate designations.

GADM.org: Check administrative divisions

1. Go to GADM.org and select “Maps”

2. Click on country of interest

3. Select “Show sub-divisions”

4. This creates a map and a list of first-level subdivisions

5. Click on one of the first level sub-divisions

6. Click on “Show sub-divisions”

7. This creates a map and list of second level subdivisions

Data Mapping: #activity_id

Select the appropriate column header from the dropdown

If a locally or globally recognized standardized identification number exists (e.g., a physical well ID number or barcode) within your dataset, please use that column

OR

If your organization has a unique id system which would allow water points to be matched within your organization over time, please use that column

 

Data Mapping: #scheme_id

Select appropriate column header from dropdown

Data Mapping: #install_year

Select the appropriate column header from the dropdown.

Note that this field accepts a four-digit year or a full installation date. Only the year will be extracted from full date entries.
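
The behavior described above, keeping only the year from a full date, can be sketched as follows. This is an illustration only: the function name is hypothetical, and the ingestion engine may parse dates differently.

```python
import re


def extract_year(value):
    """Return the first standalone four-digit number in the entry, or None."""
    match = re.search(r"(?<!\d)(\d{4})(?!\d)", str(value).strip())
    return int(match.group(1)) if match else None


for entry in ["2015", "2015-03-21", "21/03/2015", "03/21/15", ""]:
    print(repr(entry), "->", extract_year(entry))
```

Entries with a two-digit year, such as 03/21/15, are not resolved by this sketch and would need to be corrected in the source file.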

Data Mapping: #installer

Select appropriate header from dropdown.

Data Mapping: #rehab_year

Select the appropriate column header from the dropdown.

Note that this field accepts a four-digit year or a full rehabilitation date. Only the year will be extracted from full date entries.

Data Mapping: #rehabilitator

Select appropriate header from dropdown.

Data Mapping: #management

Select appropriate column header from dropdown.

Select the management classification of the entity that directly manages the water point. Example management types include:

  • Direct Government Operation
  • Private Operator/Delegated Management
  • Community Management
  • School
  • Healthcare Facility
  • Other Institutional Management
  • Other

Data Mapping: #pay

Select appropriate column header from dropdown.

Data Mapping: #status

Select appropriate header from dropdown.

Please note that the system cannot map the same column to two different WPDx parameters. If you would like to use the same column, please duplicate it in your dataset (and change one of the column headers). For example, it may be useful to use a duplicated version of your functionality column for both #status_id and #status.

Data Mapping: #orig_lnk

If the data is available via a public link, select ‘Constant’ from the dropdown and enter it so that it can be applied to all rows.

If there is no public link, leave as ‘No value for this field’

Data Mapping: #photo_lnk

Select appropriate column header from dropdown.

If there is no public link, leave as ‘No value for this field’

Data Mapping: #fecal_coliform_presence

Select appropriate column header from the dropdown

Default values include Present/Presence and Absent/Absence. If your dataset includes other terms, select ‘more settings…’ and enter the terms into the True Value and False Value text boxes.

Separate terms with a comma but do not include any spaces.

Complete associated metadata questions at the bottom of the page (see Water Quality Metadata section for more information).

 

Data Mapping: #fecal_coliform_value

Select appropriate column header from dropdown

Complete associated metadata questions

Data Mapping: #subjective_quality

Select appropriate column header from dropdown

Complete associated metadata questions

Data Mapping: #notes

Select the appropriate column header from the dropdown, or apply a Constant value if appropriate.

The #notes parameter can be used to enter custom data which the host country government or organization has selected.

For example, some organizations want to track seasonality, additional administrative districts, or some combination.

Multiple parameters can be included by creating a column that includes the parameters of interest, separated by a “;” or “…” delimiter.
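
A combined notes column of this kind can be built before upload. The sketch below joins several custom columns with a “;” delimiter; the column names, the “label: value” layout and the function name are assumptions for illustration rather than a WPDx requirement.

```python
def build_notes(row, columns, delimiter=";"):
    """Join the non-empty custom columns of a row into a single notes string."""
    parts = [f"{col}: {row[col]}" for col in columns if row.get(col)]
    return delimiter.join(parts)


# Hypothetical custom columns tracked by an organization.
row = {"seasonality": "dry season only", "sub_district": "Zone 4", "comment": ""}
print(build_notes(row, ["seasonality", "sub_district", "comment"]))
```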

Water Quality and Notes Metadata

If you mapped the #fecal_coliform_presence, #fecal_coliform_value or #notes columns, please complete the additional metadata question section.

Once mapping is complete

Select “Save” or “Save and Submit for Approval”

Select Save and Submit for Approval when your data has been fully mapped and is ready for upload

The status in the Processing Tasks tab will now show as “Pending”

An administrator will be notified and will complete the uploading process

Once approved, an email will be sent to the uploader’s email address

If the mapping was not successful, you will see an error message indicating which parameter was not mapped and an explanation of why. Once the error has been fixed, you can submit the processing task for approval.

Successful Upload!

Once the data upload has been completed by an administrator, the status in the Processing Task will be marked as “Success”. An auto-generated email will also be sent to the account email address. 

You can view an overview of the dataset in the WPDx data catalog by clicking on the eye icon.

The data catalog dataset page includes:

  • Metadata and contact details
  • Ingestion report – summary statistics of the number of rows uploaded and any errors encountered
  • Link to download source file

Data will be visible on the WPDx data repository within 24 hours.

Need to make changes?

Users can edit their datasets and processing tasks to correct errors or make other additions (i.e., add a new column that was not previously mapped).

To remove data from WPDx, please contact the administrator at info@waterpointdata.org with “Request to remove data from WPDx” in the subject line. Include the name of the source file and the reason for the removal request.

Source Data: Update Contents or Delete

If you realize you have made an error and/or need to edit or amend an existing dataset, go to the Source Data tab, select ‘Update Contents’ and upload a revised file.

Once the file has been updated, go back to the associated Processing Task and check/edit the Processing Task content and data mapping and hit “Save and Submit” at the bottom of the Data Import Workbench page.

Do not use ‘Update Contents’ to initiate a new dataset upload as this will replace any previously shared data. Instead upload a new file and start a new Processing Task.

Editing a Processing Task

If you want to add/edit the metadata for your dataset and/or make changes to the way that the data is mapped to the standard, select “Edit” from the Processing Task tab.

Make any changes and hit “Save and Submit” at the bottom of the Data Import Workbench page.

An admin will be alerted of your update and will review and process the upload.

Questions?

Please contact: info@waterpointdata.org

Check out our Resources and FAQs

Increasing water point data sharing for evidence-based decision-making in Ethiopia: The start of a journey

Guest Blog by Laura Brunson, Ph.D., Deputy Director, Millennium Water Alliance

Ethiopia is home to 112 million people with more than 81 million living in rural areas. According to a 2017 Joint Monitoring Program report, in rural areas of Ethiopia, only 4% have access to safely managed water services, 30% have basic water service, and 26% have only limited water service, with the rest consuming water from unimproved or surface water sources. Many households spend 30 minutes or longer to obtain drinking water daily. Despite Ethiopia’s achievement of the 2015 Millennium Development Goal on water, there are several unaddressed challenges that hinder safe and sustainable water service delivery for millions of rural Ethiopians.

The One WASH National Program (OWNP) of Ethiopia is a flagship program that has enabled the WASH sector to collaborate more broadly and achieve substantial progress since its launch in 2013. The evaluation of the first phase of the OWNP (2013-2017), commissioned by the Ministry of Water, Irrigation and Electricity (MoWIE), revealed a set of bottlenecks encountered by the water supply sector in Ethiopia. These include: lack of an independent regulatory entity, inadequate involvement of and resources for the private sector including microfinance institutions, absence of harmonization between water inventory and other sector data, and the absence of an operational Management Information System (MIS) for data.

Challenges with Data Management

Cognizant of the gap in water point data availability and utilization, a series of workshops conducted through the MWA partners in 2018-2019 in the Amhara Region provided clarity on the type and extent of challenges faced by WASH sector partners due to this data challenge. Some of the major problems associated with the lack of up-to-date water point data included: inability on the side of water actors to make evidence-based decisions on operation and maintenance, increased non-functionality rate of water points, and difficulties with financial resource allocation. Rural water point data is housed in a multitude of formats and places with woreda (district) governments, national government, and NGOs which all have their own sets of data for particular geographic areas that are often outdated.

MWA and its members and partners recognize the urgent need to strengthen the government’s capacity in water supply data management, analysis, and evidence-based decision making. Strengthening the government-led monitoring system is one of the priorities identified for MWA and partners in the on-going Sustainable WASH Program, 2019-2024, funded by the Conrad N. Hilton Foundation. This program operates in the Dera, Farta and North Mecha woredas; MWA members in Ethiopia, via other programs and funders, are working on water in more than 100 woredas.

A Platform for Data Sharing, Access and Analysis

WPDx is a free and open-source platform that serves as a repository for rural water point data and provides decision-making tools to support governments and other stakeholders. Over time the WPDx Working Group developed a short list of standard parameters that are required in any data upload, alongside a much larger list of optional parameters which support a more robust data set. Recent updates to WPDx have resulted in a fast and easy method for sharing data that is collection-tool agnostic (e.g. data collected in ODK, mWater, Akvo and other forms can all be used). WPDx provides maps and easy ways to search for and access data by organization, location, and other parameters. One of the strengths of WPDx is its ability to compile data from multiple entities or platforms (e.g. government and NGO data from one district) and make all of it available for use. WPDx provides four decision-support tools, three using geospatial analytics and one using machine learning to make status predictions. The three geospatial tools include: assessment of basic water access by district, prioritized locations for implementation of new water points, and prioritized locations where repairs are most impactful. The predictive tool provides insights on the probability that any given water point will be functioning or not.

WPDx can serve as a helpful tool for governments and NGOs as many are seeking to improve their data collection, use and analysis. With quick and easy data uploading, dispersed and fragmented data sets can be combined for easy visualization and then free access to the analysis and decision-making tools. The Government of Sierra Leone has already incorporated WPDx as one of its focus tools and has mandated that all rural water point data collected across the country must be shared into the WPDx platform. The national Ministry of Water Resources has a national directive in place to require the use of WPDx decision support tools in all investment decisions for rural water services. You can see more about the role of WPDx in Sierra Leone here.

A New Partnership

In 2020 MWA and the Global Water Challenge (GWC) developed a partnership with the goal to provide support in response to these identified rural water point data challenges in Ethiopia. As a starting point, MWA had a series of discussions with the Water Development Commission (WDC), which sits within MoWIE, about their challenges and whether or not WPDx could be a useful tool to help strengthen rural water service delivery, functionality and informed decision-making in line with the Sustainable Development Goal target 6.1. The WDC expressed interest in using WPDx and getting support from MWA, its members, and other stakeholders to use WPDx in Ethiopia.

Several resulting actions took place:

  • WDC issued a formal letter requesting NGOs to share rural water point data to WPDx
  • WDC issued a formal letter of support to GWC to collaborate to implement WPDx
  • WDC assigned a focal WPDx person to support these efforts.

Building on this expressed interest and formal requests from the Ministry, MWA provided a series of training opportunities for NGOs and government partners to learn more about WPDx. This first set of trainings included:

  1. Purpose and value-add of WPDx in the water sector
  2. How to easily upload data using the new WPDx ingestion engine
  3. Example from Sierra Leone showing use of WPDx by national government
  4. Noting that MoWIE, via WDC, has approved partnership with WPDx and encourages organizations to share their data to WPDx.

A second training was developed to provide information on the decision-making tools. This training included:

  1. Overview of the available decision support tools
  2. How to use the tools
  3. Discussion on how the decision support tools can be useful to regional and woreda governments in prioritizing locations for new water points or rehabilitation and for monitoring basic service delivery levels.

Training sessions were guided by presentations delivered by MWA with support from GWC and then followed by interactive question and reflection sessions by the participants. Participants were encouraged to share data during the trainings to practice data uploading and then return to their home organizations or offices and share larger recent data sets to the WPDx platform.

Select Lessons Learned from the Training Process

  • The introduction of the WPDx initiative to NGO partners and their willingness to engage in trainings and upload data has demonstrated the desire and readiness of WASH stakeholders in Ethiopia for a robust rural water point data platform for planning and decision-making. The importance of monitoring data for the WASH sector has been highlighted in several workshops convened by the government for some years. Nevertheless, an open-source platform like WPDx that can be used by each and every WASH stakeholder has never been planned or practiced in Ethiopia. The reflection from the trained focal persons about its accessibility and ease of application has indicated potential for continued use and ability to add value.
  • The WPDx initiative and the training provided motivated NGO partners to start compiling the most recent data they have and updating existing water point data. On average, most organizations required about a month to obtain necessary permissions to share data, from government or headquarter offices, clean their data, and upload it to WPDx.
  • The reflection and expressions of demand from the Water Development Commission of Ethiopia implies the potential for WPDx to be aligned with the National MIS for water supply.

Just within a three-month period following the trainings, NGOs and government entities have uploaded more than 20,000 data points.

While this is great progress, particularly during a time of pandemic and political unrest, this is only the beginning. The MWA, GWC, and WDC partnership continues with next steps including another training series for more organizations and government entities, support for a critical mass of uploads in target districts to support use of the analysis and decision-making tools and then development of a case study, and support to increase the geographic scope in which WPDx data is shared within the country. Stay tuned for future updates on the uptake and use of WPDx by National and local government in Ethiopia.

 

Funding for the work discussed in the post was generously provided by the DT Institute and the Conrad N. Hilton Foundation.

Photos: Credit to Tedla Mulatu, Millennium Water Alliance. Photos show government and NGO partners engaged in WPDx training sessions.

 

The Millennium Water Alliance is a permanent global alliance of leading humanitarian and private organizations that convenes opportunities and partnerships, accelerates learning and effective models, and influences the WASH space by leveraging the expertise and reach of its members and partners to scale quality, sustained WASH services. MWA’s 20 members work in more than 90 countries around the world. MWA serves as a hub for major programs in Kenya and Ethiopia.

The Water Point Data Exchange is an initiative of the Global Water Challenge (GWC). GWC is a coalition of leading organizations committed to achieving universal access to safe water, sanitation, and hygiene (WASH) and women’s empowerment. With companies, civil society partners, and governments, GWC accelerates the delivery of safe water and sanitation and supports gender equality through partnerships that catalyze financial support and drive innovation for sustainable solutions.

How to use machine learning to predict water point status

Guest Blog by Lars Heemskerk, Consultant for Akvo

< The water point you selected is probably no longer functional > 

If you’re responsible for providing drinking water to as many people as possible, this is the kind of information you want to have access to – especially when you’re hundreds of miles from the water point in question. Thanks to the support of the Dutch Ministry of Foreign Affairs and The Coca-Cola Foundation (TCCF), Akvo, together with WPDx and DataRobot, was able to conduct a pilot in Sierra Leone that uses machine learning algorithms to automate decision intelligence.

Improving water services in Sierra Leone

Since 2012, the government of Sierra Leone has been monitoring water points through a large-scale national inventory, complemented by small-scale monitoring efforts by NGOs. Data has been collected on functionality, year of construction, type of pump, type of management, distance to village, and more, to calculate the percentage of the population that has access to drinking water. This data provides an overall view of the state of WASH infrastructure in the country and, because Sierra Leone is at the forefront of African countries sharing data openly, a lot of this data is available on platforms like WASH data Sierra Leone and WPDx.

Unfortunately, this data is not regularly updated, so the information on these portals quickly becomes outdated and therefore less reliable. Thanks to various efforts from WPDx, among others, the importance of regularly uploading data has been emphasised in the National Digital Monitoring Approach. The recent signing of a letter by the director of the Water Directorate, which makes sharing of water point data mandatory for every organisation or government body in Sierra Leone, is an indispensable step in this process.

 In addition, Akvo, in collaboration with WPDx and the Ministry of Water Resources, has started to explore how more can be done with the existing data, at local and national level, to generate data-driven insights that can improve decision making. Machine learning is relatively new in the water sector, but can be applied very well to historical data to predict outcomes and uncover patterns not easily spotted by humans.

Setting up the foundation for advanced analytics

Machine learning is about recognising patterns in data. Using data collected in the past, machine learning techniques can recognise patterns and make predictions for the future. This can be applied to historical water point data, too.

Based on the available data, and with the help of DataRobot software, we have been able to determine a number of indicators that are related to the metric we want to predict: functionality. By combining functionality with other indicators, such as district, county, management, age, water source, and type, the system can teach itself to predict the probability that a water point will be functional now or in the future. The tool is made available on the Water Point Data Exchange.
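To make the idea of learning a functionality probability from historical records more concrete, here is a minimal sketch in Python. It is not the DataRobot model used in the pilot; it is a much simpler stand-in that estimates the share of functional water points among historical records with the same characteristics. The records, the feature names (management, pump type, age band), and the smoothing settings are all hypothetical and only serve as illustration.

```python
from collections import defaultdict

# Hypothetical historical records: (management, pump_type, age_band, functional)
HISTORY = [
    ("community", "hand pump", "0-5", True),
    ("community", "hand pump", "0-5", True),
    ("community", "hand pump", "10+", False),
    ("community", "hand pump", "10+", False),
    ("community", "hand pump", "10+", True),
    ("none", "hand pump", "10+", False),
]


def fit_frequency_model(records):
    """Count functional and total observations for each feature combination."""
    counts = defaultdict(lambda: [0, 0])  # key -> [functional, total]
    for *features, functional in records:
        key = tuple(features)
        counts[key][0] += int(functional)
        counts[key][1] += 1
    return dict(counts)


def predict_functional_probability(model, features, prior=0.5, strength=2):
    """Smoothed share of functional points among similar historical records.

    Feature combinations never seen before fall back to `prior`.
    """
    functional, total = model.get(tuple(features), (0, 0))
    return (functional + prior * strength) / (total + strength)


model = fit_frequency_model(HISTORY)
p_new = predict_functional_probability(model, ("community", "hand pump", "0-5"))
p_old = predict_functional_probability(model, ("community", "hand pump", "10+"))
p_unseen = predict_functional_probability(model, ("private", "tap", "0-5"))
```

In this toy data, newer community-managed hand pumps receive a higher estimated probability of being functional than older ones. A real model would learn from many more indicators and records, and would be checked against data it has not seen.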

By using the DataRobot platform, we were able to predict which water points are going to break with an accuracy of 85%. By applying these machine learning models, it’s possible to determine which broken water point, out of thousands, should be fixed first to help the most people. In addition to this tool, decision makers can use other geographic information system (GIS) tools that have been developed to analyse water points, identify high-impact locations for rehabilitation and construction, and estimate basic water coverage in line with the Sustainable Development Goals (SDGs).
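As a rough illustration of how predictions could feed into a repair priority list, the sketch below ranks water points that are likely broken by the expected number of people affected. This is one possible scoring rule chosen for illustration, not the method used in the WPDx tools; the field names (`p_functional`, `people_served`), the 0.5 threshold, and the sample values are hypothetical.

```python
# Hypothetical records: predicted probability of being functional and people served
water_points = [
    {"id": "WP-001", "p_functional": 0.15, "people_served": 400},
    {"id": "WP-002", "p_functional": 0.90, "people_served": 1200},
    {"id": "WP-003", "p_functional": 0.30, "people_served": 900},
    {"id": "WP-004", "p_functional": 0.10, "people_served": 150},
]


def rank_for_repair(points, threshold=0.5):
    """Return likely-broken points, largest expected number of people affected first."""
    likely_broken = [p for p in points if p["p_functional"] < threshold]
    return sorted(
        likely_broken,
        # Expected people without service = P(not functional) * people served
        key=lambda p: (1 - p["p_functional"]) * p["people_served"],
        reverse=True,
    )


ranked_ids = [p["id"] for p in rank_for_repair(water_points)]
print(ranked_ids)
```

With these sample values, WP-003 comes first because it combines a high chance of being broken with a large population served, while WP-002 is left out because it is probably still working.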

Pilot training and support 

When implementing these new advanced analytics techniques, it is just as important to involve and train stakeholders. This is not an easy process because it involves major process changes and the involvement of various governmental and non-governmental organisations. In 2019, the Global Water Challenge held a three-day training session with all district water directorates to discuss the transformation of the WASH sector to improve efficiency through the use of data. Following this session, a meeting was held to brief NGOs on the WPDx approach. Building on this general training, more focused training was provided to district mapping officers and NGOs. The next step was to set up a plan on how to use and implement the decision support tool. At the time of writing, a draft plan has been created and a workshop has been organised to dig deeper into how the decision support tools can contribute to safe water for all in Sierra Leone.

The need for more accurate data

Besides the involvement of NGOs and government bodies, reliable and up-to-date data is crucial for making correct predictions. Since the last national inventory dates back to 2016, it’s important that the water points are monitored regularly. With the above-mentioned letter from the Water Directorate, more recent data should become available, which will certainly have a positive effect.

We also encourage stakeholders to test whether the machine learning predictions correspond to reality. This can be done on a small scale. There are talks with the Ministry of Water Resources and InterAide to carry this out and test whether the outcomes of the tools are correct and usable in the daily life of decision makers. We would like to continue with this in 2021, in order to prove the power of advanced analytics, but above all to provide drinking water to as many Sierra Leoneans as possible.

Celebrating Open Data Day 2021: The Power of Rural Water Point Data to Improve Decisions

WPDx is excited to promote transparent data sharing in the rural water sector through our first Open Data Day celebration!

Bringing Together the Pieces of the Puzzle

Across the WASH sector there is growing recognition that regular monitoring, data collection, and evidence-based decision making can improve water access program outcomes, and many organizations and governments are working diligently to collect data in their areas of operation. However, unless data is openly shared, entities are only able to utilize their own data to make decisions – which is only one piece of the puzzle.

Sharing data through the WPDx platform enables the puzzle pieces to come together to show the entire landscape and provide a more comprehensive understanding of the water sector.  This link shows how WPDx works to harmonize data regardless of which organization collected the data or which collection platform was utilized. The harmonized dataset, available on the WPDx Data Repository, also serves as a starting point for robust decision-support analysis.

New Predict Water Point Status Tool… Coming Soon

To demonstrate the power of using open data to improve rural water decisions, we will soon be launching an updated version of our Predict Water Point Status tool. The results from this tool provide insights about which water points may break down in the near future, which can be used to inform decisions around preventative maintenance, increased monitoring and resource allocations. We are working on similar updates to our remaining tools which will launch later in 2021.

Recognition of Leaders in Data Sharing

To mark our first celebration of Open Data Day, we take this opportunity to recognize the entities that have demonstrated their commitment to transparency and accountability by sharing data with the WPDx platform, contributing over 40,000 new water point records from 28 countries in the past year. Special recognition goes to the following organizations: 

Countries with the most water point records uploaded in the last year

  • Ethiopia
  • Sierra Leone

Governments with demonstrated national commitment to collecting and using WPDx data for decisions

  • Ministry of Water Resources, Sierra Leone
  • Water Development Commission in the Ministry of Water, Irrigation and Energy, Ethiopia

Government agencies that have shared the most data in the past year

  • Ministry of Basic and Secondary School Education of Sierra Leone (in partnership with the Ministry of Water Resources of Sierra Leone)
  • Dera, Farta, and North Mecha Water and Energy Offices (Ethiopia)

Organizations that shared the most data in the past year

  • Community-Led Accelerated WASH program (COWASH)
  • Living Water International

Organizations that shared data from the most countries in the past year

  • Living Water International
  • WaterAid

Organization that demonstrated their commitment with automated updates

  • Ugandan Water Project

 

Thank you to our generous funders and key partners

Entities which have shared data with WPDx in the past year

 

Upcoming Open Data Day 2021 Celebration

*Please see here for our updated post and special recognitions!*

March 6th, 2021 is International Open Data Day, an opportunity to promote awareness and use of open data.

The Water Point Data Exchange (WPDx), the world’s largest open data repository for rural water point data, is going to celebrate Open Data Day by sharing information about how data use can improve decisions, encouraging data sharing with the WPDx platform, and recognizing contributing organizations.

The celebration will include an updated post on the WPDx website to recognize the organizations that have shared data with WPDx in the past year, demonstrating their commitment to open and transparent data sharing. A few organizations and countries will be given special recognition in categories such as:

– Organization and country with the most water point records uploaded in the past year
– Organization which has shared data across the most countries

Thank you to those of you who have recently shared data to WPDx!

If you have not shared data yet or have more data to share, please do so by February 28th to have your organization recognized by WPDx on Open Data Day!

If you need any help sharing data, please contact info@waterpointdata.org

Please pass along this information to others in your network who may also wish to share data from other programs or countries.

Updates to the WPDx Data Standard

In January 2021, the WPDx Working Group voted to approve the addition of three new parameters to the WPDx Data Standard, plus some minor edits and clarifications. The new parameters include:

• Tertiary Administrative Division (#adm3)
Description: Provide the name of the tertiary administrative division. The correct unit can be found at http://www.statoids.com. This corresponds to “Third Order” and “Third Level” administrative units at http://Geonames.org and http://www.gadm.org respectively.
Format: Open Text

• Rehabilitation Year (#rehab_year)
Description: Provide the 4-digit year when the most recent major rehabilitation (not just regular maintenance) occurred.
Format: Four numbers (ex. 1994)

• Rehabilitator (#rehabilitator)
Description: Provide the name of the entity or entities that completed the most recent rehabilitation of the water system. This should be the entities that complete or were directly responsible for the construction, rather than a donor or other involved stakeholder.
Format: Open Text, with multiple entities separated with a “;”

The addition of #adm3 allows users to provide details at the “district” level in countries such as Ethiopia, which have additional administrative divisions. The addition of #rehab_year and #rehabilitator allows users to differentiate between installation and rehabilitation events. The full updated standard can be found here.
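For organizations preparing uploads, the formats described above can be checked before sharing. The following Python sketch shows one way this might be done. It assumes each record is a dictionary of strings keyed by the standard’s hashtags and treats the new fields as optional; both are assumptions made for illustration. The function names and sample values are hypothetical and not part of the WPDx platform.

```python
import re


def split_entities(value):
    """Split a ';'-separated list of entities and trim surrounding whitespace."""
    return [part.strip() for part in value.split(";") if part.strip()]


def check_record(record):
    """Return a list of problems with the new fields; an empty list means none found."""
    problems = []

    # #rehab_year: four numbers, e.g. 1994 (treated as optional in this sketch)
    rehab_year = record.get("#rehab_year", "")
    if rehab_year and not re.fullmatch(r"[0-9]{4}", rehab_year):
        problems.append("#rehab_year must be four digits, e.g. 1994")

    # #rehabilitator: open text, with multiple entities separated by ';'
    rehabilitator = record.get("#rehabilitator", "")
    if rehabilitator and not split_entities(rehabilitator):
        problems.append("#rehabilitator has no entity names")

    # #adm3 is open text, so no format check is applied here
    return problems


record = {
    "#adm3": "Example District",
    "#rehab_year": "2019",
    "#rehabilitator": "Organization A; Organization B",
}
print(check_record(record))
print(split_entities(record["#rehabilitator"]))
```

A record with a two-digit year such as "94" would be flagged, while the sample record above passes and its rehabilitator field splits into two entity names.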

We welcome comments, questions, and suggestions from the sector. Please feel free to leave a comment on this blog or email info@waterpointdata.org

All comments received will be compiled, shared, and discussed with the WPDx Working Group.

Reflecting on 2020: The need for WASH

2020 was a year we will not forget.

Our global community faced new challenges which changed day-to-day life in unprecedented ways.  However, the importance of consistent access to water, sanitation, and hygiene (WASH) services in communities and especially at health care facilities (HCFs) did not change, but was instead underscored, as we encountered many unanswered questions about the new coronavirus pandemic.

Handwashing remains one of the primary barriers against the spread of infectious disease. Access to WASH services is key to the resilience of communities as we seek to mitigate the health and economic impacts of coronavirus and ensure the continuation and expansion of sustainable water services. Beyond the linkage to preventing the spread of coronavirus, consistent access to WASH services results in substantial time savings for women and girls, and is associated with positive health, economic, and educational opportunities.

For WPDx, our goal of supporting governments and their partners in using data-based decisions to improve rural water access is more pressing than ever. As such, our work last year gained momentum, including:

  • Making data sharing even easier.  The launch of our new website and ingestion engine enables organizations to more easily share, access, and analyze water point data. The new ingestion engine allows for organizations to share data in a variety of formats, with minimal processing, and in just minutes, removing one of the major barriers to sharing data. Training materials can be found on the resources section of the new website.
  • Initiating a WASH in HCF platform. To support a better understanding of the current status of WASH in HCFs and to help optimize investments, WPDx, together with the Millennium Water Alliance launched a new effort to build a WASH in HCF open data sharing platform.
  • Continuing our partnership with the Ministry of Water Resources in Sierra Leone. Over the past several years, Sierra Leone has been on an ambitious path to digitize their rural water data to enable improved decisions from national budgeting to district work-planning. The Ministry of Water Resources issued a letter of support requesting that all NGOs share data directly with the WPDx platform. Together with Akvo, WPDx continued to support the Ministry of Water Resources in these efforts, providing data on a shared platform, and using cutting-edge analysis to inform priority locations for investment, maintenance, and repair.
  • Launching a new partnership with the Ministry of Water, Irrigation, and Energy in Ethiopia. Almost 70 percent of the rural population of Ethiopia lacks access to at least basic water services. In partnership with the Millennium Water Alliance, we initiated a new workstream to support the Government of Ethiopia to harmonize diverse datasets and transform that data into estimates of basic water coverage across the country to help track SDG progress. The Ministry of Water, Irrigation and Energy provided letters of support to work with WPDx on their national monitoring efforts and for working directly with NGOs to share data with the platform.
  • Improving our decision-support tools. We continue our work to refine our suite of decision-support tools. In a collaboration with DataRobot and Akvo, we are continuing efforts to improve our machine learning models for our Current Water Point Status tool.
  • Sharing lessons learned. WPDx sponsored a Rural Water Supply Network (RWSN) webinar featuring progress and lessons learned from Sierra Leone and Ethiopia.

Looking forward in 2021: Continuing momentum

As we embark on 2021, WPDx has an ambitious slate of activities planned to help governments and their partners optimize their limited resources for maximum reach, including:

  • Celebrating Open Data Day (March 6th 2021). WPDx is encouraging organizations to share data ahead of Open Data Day. Data sharing is a key aspect of transparency and accountability for the water sector. More details to follow.  
  • Continuing existing and building new partnerships. WPDx will continue work in Ethiopia and Sierra Leone and seek to replicate and scale successes in additional geographies, including Ghana and Uganda.
  • Responding to stakeholder needs. Collaborating closely with governments and NGOs to regularly share data and refine and improve our suite of decision-support tools to help optimize rural water investments remains a top priority.
  • Launching an improved dataset. In the coming months we will launch WPDx+, a subset of the larger WPDx database for geographies with full national or district coverage. The WPDx+ dataset will include additional data processing steps, including data cleaning and addition of new geospatially derived parameters to bolster decision-support analysis. 
  • Establishing a new data standard for WASH in HCF. For the WASH in HCF platform, we will be working with a group of sector leaders to define a data standard and build a new platform to allow WASH in HCF data sharing.

The activities described above could not have been completed without generous support and partnership from The Coca-Cola Foundation, the Conrad N. Hilton Foundation, DT Institute, and the Vitol Foundation and in-kind contributions from DataRobot and Esri.

 

Ministry of Water Resources in Sierra Leone Requests NGOs Share Data with WPDx

On December 3rd, 2020, the Ministry of Water Resources in Sierra Leone issued a letter to all international and local non-governmental organizations (NGOs) requesting that water point data be shared with the Water Directorate via the Water Point Data Exchange (WPDx). The letter establishes that District Mapping Officers, working under the District WASH Engineer, will be the key point of contact for data sharing, uploads, and analysis. The letter also encourages organizations to work with Akvo to collect water point data using standardized WASH surveys. All data collected should be shared with the District Mapping Officer within 30 days of collection.

Compiled water point data will be analyzed using the WPDx decision-support tools to inform annual workplans and other items.