How to use machine learning to predict water point status

Guest Blog by Lars Heemskerk, Consultant for Akvo

< The water point you selected is probably no longer functional > 

If you’re responsible for providing drinking water to as many people as possible, this is the kind of information you want to have access to – especially when you’re hundreds of miles from the water point in question. Thanks to the support of the Dutch Ministry of Foreign Affairs and the Coca-Cola Foundation (TCCF), Akvo, together with WPDx and DataRobot, was able to conduct a pilot in Sierra Leone that uses machine learning algorithms to automate decision intelligence.

Improving water services in Sierra Leone

Since 2012, the government of Sierra Leone has been monitoring water points through a large-scale national inventory, complemented by small-scale monitoring efforts by NGOs. Data has been collected on functionality, year of construction, type of pump, type of management, distance to village, etc. to calculate the percentage of the population that has access to drinking water. This data provides an overall insight into the state of WASH infrastructure in the country and, because Sierra Leone is at the forefront of African countries sharing data openly, a lot of this data is available on platforms like WASH data Sierra Leone and WPDx.

Unfortunately, this data is not regularly updated, so the information on these portals quickly becomes outdated and therefore less reliable. Thanks to various efforts from WPDx, among others, the importance of regularly uploading data has been emphasised in the National Digital Monitoring Approach. The recent signing of a letter by the director of the Water Directorate, which makes the sharing of water point data mandatory for every organisation or government body in Sierra Leone, is an indispensable step in this process.

 In addition, Akvo, in collaboration with WPDx and the Ministry of Water Resources, has started to explore how more can be done with the existing data, at local and national level, to generate data-driven insights that can improve decision making. Machine learning is relatively new in the water sector, but can be applied very well to historical data to predict outcomes and uncover patterns not easily spotted by humans.

Setting up the foundation for advanced analytics

Machine learning is about recognising patterns in data. Using data collected in the past, machine learning techniques can recognise patterns and make predictions for the future. This can be applied to historical water point data, too.

Based on the available data, and with the help of DataRobot software, we have been able to determine a number of indicators that are related to the metric we want to predict – functionality. By combining functionality with other indicators, such as district, county, management, age, water source, and type, the system can teach itself to predict the probability that a water point will be functional now or in the future. The tool is made available on the Water Point Data Exchange.
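To make the idea concrete, here is a minimal sketch of how a model can learn the probability that a water point is functional from indicators such as age and type. It is not the DataRobot model used in the pilot: the records are invented toy data, the field names are hypothetical, and a small hand-written logistic regression stands in for whatever algorithms the platform actually selected.

```python
import math

# Hypothetical toy records: (age in years, pump type, functional? 1 = yes, 0 = no).
# These values are invented for illustration and are not WPDx data.
RECORDS = [
    (2, "hand pump", 1), (3, "hand pump", 1), (4, "tap stand", 1),
    (5, "tap stand", 1), (6, "hand pump", 1), (9, "hand pump", 1),
    (11, "tap stand", 0), (12, "hand pump", 0), (14, "hand pump", 0),
    (15, "tap stand", 0), (18, "hand pump", 0), (20, "tap stand", 0),
]
PUMP_TYPES = ["hand pump", "tap stand"]


def encode(age_years, pump_type):
    """Build a numeric feature vector: bias, age in decades, one-hot pump type."""
    one_hot = [1.0 if pump_type == t else 0.0 for t in PUMP_TYPES]
    return [1.0, age_years / 10.0] + one_hot


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def score(weights, features):
    """Probability of 'functional' for one feature vector."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)))


def train(records, epochs=2000, learning_rate=0.1):
    """Fit logistic regression weights with plain batch gradient descent."""
    weights = [0.0] * len(encode(0, PUMP_TYPES[0]))
    for _ in range(epochs):
        gradient = [0.0] * len(weights)
        for age, pump_type, label in records:
            features = encode(age, pump_type)
            error = score(weights, features) - label
            for i, x in enumerate(features):
                gradient[i] += error * x
        weights = [w - learning_rate * g / len(records)
                   for w, g in zip(weights, gradient)]
    return weights


def predict_functional(weights, age_years, pump_type):
    """Estimated probability that a water point is functional."""
    return score(weights, encode(age_years, pump_type))


WEIGHTS = train(RECORDS)
P_NEW = predict_functional(WEIGHTS, 3, "hand pump")
P_OLD = predict_functional(WEIGHTS, 18, "hand pump")
print(f"3-year-old hand pump:  {P_NEW:.2f}")
print(f"18-year-old hand pump: {P_OLD:.2f}")
```

In this toy data, older points are more often broken, so the model assigns a lower probability of functionality to the 18-year-old pump than to the 3-year-old one. A real model would use many more indicators (district, management, water source) and far more records.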

By using the DataRobot platform, we were able to predict which water points are going to break with an accuracy of 85%. By applying these machine learning models, it’s possible to determine which broken water point, out of thousands, should be fixed first to help the most people. On top of this tool, decision makers can also make use of other geospatial information services (GIS) tools that have been developed to analyse water points to determine high impact locations for rehabilitation, construction and estimating basic water coverage aligned with the Sustainable Development Goals (SDGs).
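As an illustration of how predictions can feed prioritisation, the sketch below ranks water points by the expected number of people affected, that is, the predicted probability of being broken multiplied by the number of people served. This scoring rule, the `WaterPoint` fields, and the example values are assumptions made for illustration; the actual WPDx tools may weigh other factors.

```python
from dataclasses import dataclass


@dataclass
class WaterPoint:
    point_id: str
    prob_broken: float   # hypothetical model output between 0 and 1
    people_served: int   # hypothetical estimate of users of this point


def expected_people_affected(point):
    """Expected number of people without service if nothing is done."""
    return point.prob_broken * point.people_served


def rank_for_repair(points):
    """Order water points so the highest expected impact comes first."""
    return sorted(points, key=expected_people_affected, reverse=True)


# Invented example values, not real water points.
POINTS = [
    WaterPoint("WP-001", 0.90, 150),
    WaterPoint("WP-002", 0.40, 800),
    WaterPoint("WP-003", 0.95, 300),
]

RANKED_IDS = [p.point_id for p in rank_for_repair(POINTS)]
print(RANKED_IDS)
```

Note that the point with the highest probability of being broken is not automatically first: a less certain failure that serves many more people can rank higher.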

Pilot training and support 

When implementing these new advanced analytics techniques, it is just as important to involve and train stakeholders. This is not an easy process because it involves major process changes and the involvement of various governmental and non-governmental organisations. In 2019, the Global Water Challenge held a three-day training session with all district water directorates to discuss the transformation of the WASH sector to improve efficiency through the use of data. Following this session, a meeting was held to brief NGOs on the WPDx approach. Building on this general training, more focused training was provided to district mapping officers and NGOs. The next step was to set up a plan on how to use and implement the decision support tool. At the time of writing, a draft plan has been created and a workshop has been organised to dig deeper into how the decision support tools can contribute to safe water for all in Sierra Leone.

The need for more accurate data

Besides the involvement of NGOs and government bodies, reliable and up-to-date data is crucial for making correct predictions. Since the last national inventory dates back to 2016, it’s important that the water points are monitored systematically. The above-mentioned letter from the Water Directorate should bring in more recent data, which will certainly have a positive effect.

We also encourage stakeholders to test whether the machine learning predictions correspond to reality. This can be done on a small scale. There are talks with the Ministry of Water Resources and InterAide to carry this out and test whether the outcomes of the tools are correct and usable in the daily life of decision makers. We would like to continue with this in 2021, in order to prove the power of advanced analytics, but above all to provide drinking water to as many Sierra Leoneans as possible.
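A small-scale check of this kind can be as simple as comparing predicted status with what field teams observe. The sketch below assumes two hypothetical dictionaries keyed by water point ID, one with predictions and one with field observations, and reports accuracy plus a breakdown of the agreement and disagreement cases.

```python
def compare_with_field_visits(predicted, observed):
    """Compare predicted and observed functionality for the points visited.

    Both arguments map a water point ID to True (functional) or False (broken).
    Only points present in both dictionaries are compared.
    """
    shared_ids = sorted(set(predicted) & set(observed))
    if not shared_ids:
        raise ValueError("no water points in common to compare")

    counts = {
        "predicted functional, found functional": 0,
        "predicted broken, found broken": 0,
        "predicted functional, found broken": 0,
        "predicted broken, found functional": 0,
    }
    for point_id in shared_ids:
        p, o = predicted[point_id], observed[point_id]
        if p and o:
            counts["predicted functional, found functional"] += 1
        elif not p and not o:
            counts["predicted broken, found broken"] += 1
        elif p and not o:
            counts["predicted functional, found broken"] += 1
        else:
            counts["predicted broken, found functional"] += 1

    correct = (counts["predicted functional, found functional"]
               + counts["predicted broken, found broken"])
    return correct / len(shared_ids), counts


# Invented example: five predictions, four of the points visited in the field.
PREDICTED = {"A": True, "B": False, "C": True, "D": False, "E": True}
OBSERVED = {"A": True, "B": False, "C": False, "D": False}

ACCURACY, COUNTS = compare_with_field_visits(PREDICTED, OBSERVED)
print(f"accuracy on visited points: {ACCURACY:.0%}")
for case, n in COUNTS.items():
    print(f"  {case}: {n}")
```

The breakdown matters as much as the single accuracy figure: a point predicted functional but found broken leaves people without water, while the opposite error only costs an unnecessary visit.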

Celebrating Open Data Day 2021: The Power of Rural Water Point Data to Improve Decisions

WPDx is excited to promote transparent data sharing in the rural water sector through our first Open Data Day celebration!

Bringing Together the Pieces of the Puzzle

Across the WASH sector there is growing recognition that regular monitoring, data collection, and evidence-based decision making can improve water access program outcomes, and many organizations and governments are working diligently to collect data in their areas of operation. However, unless data is openly shared, entities are only able to utilize their own data to make decisions – which is only one piece of the puzzle.

Sharing data through the WPDx platform enables the puzzle pieces to come together to show the entire landscape and provide a more comprehensive understanding of the water sector.  This link shows how WPDx works to harmonize data regardless of which organization collected the data or which collection platform was utilized. The harmonized dataset, available on the WPDx Data Repository, also serves as a starting point for robust decision-support analysis.

New Predict Water Point Status Tool… Coming Soon

To demonstrate the power of using open data to improve rural water decisions, we will soon be launching an updated version of our Predict Water Point Status tool. The results from this tool provide insights about which water points may break down in the near future, which can be used to inform decisions around preventative maintenance, increased monitoring and resource allocations. We are working on similar updates to our remaining tools which will launch later in 2021.

Recognition of Leaders in Data Sharing

To mark our first celebration of Open Data Day, we take this opportunity to recognize the entities that have demonstrated their commitment to transparency and accountability by sharing data with the WPDx platform, contributing over 40,000 new water point records from 28 countries in the past year. Special recognition goes to the following organizations: 

Countries with the most water point records uploaded in the last year

  • Ethiopia
  • Sierra Leone

Governments with demonstrated national commitment to collecting and using WPDx data for decisions

  • Ministry of Water Resources, Sierra Leone
  • Water Development Commission in the Ministry of Water, Irrigation and Energy, Ethiopia

Government Agencies that have shared most data in the past year

  • Ministry of Basic and Secondary School Education of Sierra Leone (in partnership with the Ministry of Water Resources of Sierra Leone)
  • Dera, Farta, and North Mecha Water and Energy Offices (Ethiopia)

Organizations that shared the most data in the past year

  • Community-Led Accelerated WASH program (COWASH)
  • Living Water International

Organizations that shared data from the most countries in the past year

  • Living Water International
  • WaterAid

Organization that demonstrated their commitment with automated updates

  • Ugandan Water Project

 

Thank you to our generous funders and key partners

 

 

 

Entities which have shared data with WPDx in the past year

 

Upcoming Open Data Day 2021 Celebration

*Please see here for our updated post and special recognitions!*

March 6th, 2021 is International Open Data Day, an opportunity to promote awareness and use of open data.

The Water Point Data Exchange (WPDx), the world’s largest open data repository for rural water point data, is going to celebrate Open Data Day by sharing information about how data use can improve decisions, encouraging data sharing with the WPDx platform, and recognizing contributing organizations.

The celebration will include an updated post on the WPDx website to appreciate the organizations which have shared data to WPDx in the past year, demonstrating their commitment to open and transparent data sharing. A few organizations and countries will be given special recognition for categories such as:

– Organization and country with the most water point records uploaded in the past year
– Organization which has shared data across the most countries

Thank you to those of you who have recently shared data to WPDx!

If you have not shared data yet or have more data to share, please do so by February 28th to have your organization recognized by WPDx on Open Data Day!

If you need any help sharing data, please contact info@waterpointdata.org

Please pass along this information to others in your network who may also wish to share data from other programs or countries.

Updates to the WPDx Data Standard

In January 2021, the WPDx Working Group voted to approve the addition of three new parameters, plus some minor edits and clarifications to the WPDx Data Standard. The new parameters include:

• Tertiary Administrative Division (#adm3)
Description: Provide the name of the tertiary administrative division. The correct unit can be found at http://www.statoids.com. This corresponds to “Third Order” and “Third Level” administrative units at http://Geonames.org and http://www.gadm.org respectively.
Format: Open Text

• Rehabilitation Year (#rehab_year)
Description: Provide the 4-digit year when the most recent major rehabilitation (not just regular maintenance) occurred.
Format: Four numbers (ex. 1994)

• Rehabilitator (#rehabilitator)
Description: Provide the name of the entity or entities that completed the most recent rehabilitation of the water system. This should be the entities that completed or were directly responsible for the rehabilitation work, rather than a donor or other involved stakeholder.
Format: Open Text, with multiple entities separated with a “;”

The addition of #adm3 allows users to provide details on the “district” level in countries such as Ethiopia, which have additional administrative divisions. The addition of #rehab_year and #rehabilitator allows users to differentiate between installation and rehabilitation events. The full updated standard can be found here.
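For organisations preparing an upload, a quick pre-check of the new parameters might look like the sketch below. It only encodes the formats stated above (four digits for #rehab_year, “;”-separated names for #rehabilitator, open text for #adm3). Treating the fields as optional, and representing a record as a dictionary of strings, are assumptions made for this example rather than rules taken from the standard.

```python
import re

FOUR_DIGITS = re.compile(r"\d{4}")


def split_rehabilitators(value):
    """Split a '#rehabilitator' value into entity names on ';'."""
    return [name.strip() for name in value.split(";") if name.strip()]


def check_new_fields(record):
    """Return a list of problems found in the three new parameters.

    `record` is assumed to map column names to strings; a missing or empty
    value is treated as 'not provided' and is not reported as a problem.
    """
    problems = []

    rehab_year = record.get("#rehab_year", "").strip()
    if rehab_year and not FOUR_DIGITS.fullmatch(rehab_year):
        problems.append(f"#rehab_year should be four digits, got {rehab_year!r}")

    rehabilitator = record.get("#rehabilitator", "").strip()
    if rehabilitator and not split_rehabilitators(rehabilitator):
        problems.append("#rehabilitator contains separators but no names")

    # '#adm3' is open text, so there is no format to check here.
    return problems


# Invented example records.
GOOD = {"#adm3": "Example Ward", "#rehab_year": "1994",
        "#rehabilitator": "Org A; Org B"}
BAD = {"#rehab_year": "94", "#rehabilitator": " ; "}

print(check_new_fields(GOOD))
print(check_new_fields(BAD))
print(split_rehabilitators(GOOD["#rehabilitator"]))
```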

We welcome comments, questions, and suggestions from the sector. Please feel free to leave a comment on this blog or email info@waterpointdata.org

All comments received will be compiled, shared, and discussed with the WPDx Working Group.

Reflecting on 2020: The need for WASH

2020 was a year we will not forget.

Our global community faced new challenges which changed day-to-day life in unprecedented ways.  However, the importance of consistent access to water, sanitation, and hygiene (WASH) services in communities and especially at health care facilities (HCFs) did not change, but was instead underscored, as we encountered many unanswered questions about the new coronavirus pandemic.

Handwashing remains one of the primary barriers to the spread of infectious disease. Access to WASH services is key to the resilience of communities as we seek to mitigate the health and economic impacts of coronavirus and ensure the continuation and expansion of sustainable water services. Beyond the linkage to preventing the spread of coronavirus, consistent access to WASH services results in substantial time savings for women and girls, and is associated with positive health, economic, and educational opportunities.

For WPDx, our goal to support governments and their partners in using data-based decisions to improve rural water access is more pressing than ever. As such, our work last year gained momentum, including:

  • Making data sharing even easier.  The launch of our new website and ingestion engine enables organizations to more easily share, access, and analyze water point data. The new ingestion engine allows for organizations to share data in a variety of formats, with minimal processing, and in just minutes, removing one of the major barriers to sharing data. Training materials can be found on the resources section of the new website.
  • Initiating a WASH in HCF platform. To support a better understanding of the current status of WASH in HCFs and to help optimize investments, WPDx, together with the Millennium Water Alliance, launched a new effort to build a WASH in HCF open data sharing platform.
  • Continuing our partnership with the Ministry of Water Resources in Sierra Leone. Over the past several years, Sierra Leone has been on an ambitious path to digitize their rural water data to enable improved decisions from national budgeting to district work-planning. The Ministry of Water Resources issued a letter of support requesting that all NGOs share data directly with the WPDx platform. Together with Akvo, WPDx continued to support the Ministry of Water Resources in these efforts, providing data on a shared platform, and using cutting-edge analysis to inform priority locations for investment, maintenance, and repair.
  • Launching a new partnership with the Ministry of Water, Irrigation, and Energy in Ethiopia. Almost 70 percent of the rural population of Ethiopia lacks access to at least basic water services. In partnership with the Millennium Water Alliance, we initiated a new workstream to support the Government of Ethiopia to harmonize diverse datasets and transform that data into estimates of basic water coverage across the country to help track SDG progress. The Ministry of Water, Irrigation and Energy provided letters of support to work with WPDx on their national monitoring efforts and for working directly with NGOs to share data with the platform.
  • Improving our decision-support tools. We continue our work to refine our suite of decision-support tools. In a collaboration with DataRobot and Akvo, we are continuing efforts to improve our machine learning models for our Current Water Point Status tool.
  • Sharing lessons learned. WPDx sponsored a Rural Water Supply Network (RWSN) webinar featuring progress and lessons learned from Sierra Leone and Ethiopia.

Looking forward in 2021: Continuing momentum

As we embark on 2021, WPDx has an ambitious slate of activities planned to help governments and their partners optimize their limited resources for maximum reach, including:

  • Celebrating Open Data Day (March 6th 2021). WPDx is encouraging organizations to share data ahead of Open Data Day. Data sharing is a key aspect of transparency and accountability for the water sector. More details to follow.  
  • Continuing existing and building new partnerships. WPDx will continue work in Ethiopia and Sierra Leone and seek to replicate and scale successes in additional geographies, including Ghana and Uganda.
  • Responding to stakeholder needs. Collaborating closely with governments and NGOs to regularly share data and refine and improve our suite of decision-support tools to help optimize rural water investments remains a top priority.
  • Launching an improved dataset. In the coming months we will launch WPDx+, a subset of the larger WPDx database for geographies with full national or district coverage. The WPDx+ dataset will include additional data processing steps, including data cleaning and addition of new geospatially derived parameters to bolster decision-support analysis. 
  • Establishing a new data standard for WASH in HCF. For the WASH in HCF platform, we will be working with a group of sector leaders to define a data standard and build a new platform to allow WASH in HCF data sharing.

The activities described above could not have been completed without generous support and partnership from The Coca-Cola Foundation, the Conrad N. Hilton Foundation, DT Institute, and the Vitol Foundation and in-kind contributions from DataRobot and Esri.

 

Ministry of Water Resources in Sierra Leone Requests NGOs Share Data with WPDx

On December 3rd, 2020 the Ministry of Water Resources in Sierra Leone issued a letter to all international and local non-governmental organizations (NGOs) requesting water point data be shared with the Water Directorate via the Water Point Data Exchange (WPDx). The letter establishes that District Mapping Officers, working under the District WASH Engineer, will be the key point of contact for data sharing, uploads, and analysis. The letter also encourages organizations to work with Akvo to collect water point data using standardized WASH surveys. All data collected should be shared with the District Mapping Officer within 30 days of collection.

Compiled water point data will be analyzed to inform annual workplans and other items using the WPDx decision-support tools.

 

WPDx launches new website and ingestion engine: making data sharing, access, and analysis easier

In November 2020, WPDx launched a re-designed website to make it even easier for users to share, access and analyze water point data.

A key feature of the new site is a new and improved “ingestion engine” which simplifies the process for  sharing data with WPDx.  The ingestion engine can accept data through a direct file upload or through a connection to a URL or API endpoint in any of the following formats: csv, xls(x), Google Sheets, JSON, or HTML table. After an organization has reviewed the data standard, the upload process can be completed in minutes.
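To give a sense of what accepting several formats involves, the sketch below normalises two of the listed formats, CSV and JSON, into the same list-of-records shape. It is not the WPDx ingestion engine, whose internals are not described here; the function name, the assumption that JSON arrives as an array of objects, and the column names in the example are all hypothetical.

```python
import csv
import io
import json


def parse_records(text, fmt):
    """Parse uploaded text into a list of dictionaries, one per water point.

    Only 'csv' and 'json' are handled in this sketch; the other formats
    listed above would need their own readers.
    """
    if fmt == "csv":
        # The first row is treated as the header row.
        return [dict(row) for row in csv.DictReader(io.StringIO(text))]
    if fmt == "json":
        data = json.loads(text)
        if not isinstance(data, list):
            raise ValueError("expected a JSON array of records")
        return data
    raise ValueError(f"unsupported format: {fmt}")


# Invented example inputs with hypothetical column names.
CSV_TEXT = "id,status\nWP-1,functional\nWP-2,broken\n"
JSON_TEXT = '[{"id": "WP-1", "status": "functional"}, {"id": "WP-2", "status": "broken"}]'

FROM_CSV = parse_records(CSV_TEXT, "csv")
FROM_JSON = parse_records(JSON_TEXT, "json")
print(FROM_CSV == FROM_JSON)
```

Once every source has been reduced to the same record shape, the remaining work of mapping columns to the data standard can be done once rather than per format.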

Please check out our step-by-step how-to guide and short video for more information on how to share data with WPDx today!

DataRobot partners with WPDx to pilot transformative Artificial Intelligence for Good initiative

DataRobot, an automated machine learning platform that helps build and deploy accurate predictive models, has partnered with the Global Water Challenge (GWC) to pilot their Artificial Intelligence (AI) for Good: Powered by DataRobot initiative. DataRobot created AI for Good because they believe artificial intelligence has the potential to help solve the greatest challenges facing society. Whether it’s improving healthcare and education, mitigating the effects of climate change, or fighting cybercrime, DataRobot trusts that AI has the power to make meaningful change.

In 2018, DataRobot and GWC, the Water Point Data Exchange’s (WPDx) parent organization, worked closely to develop sustainable water solutions and advance WPDx’s capacity. More than 500,000 data points were analyzed to create WPDx’s advanced analytics tools. Building on the successful results achieved with GWC’s pilot collaboration, DataRobot launched AI for Good: Powered by DataRobot. Now, they are looking for other non-profit organizations to partner with them and discover how AI can also help them reinforce their work. DataRobot’s goal is simple: Impact one million lives in the first year. At WPDx, we are thrilled that our partnership not only led to the enhancement of our platform’s power but that its effective results have inspired a company like DataRobot to use data to create sustainable and lasting impacts.

To learn more about GWC and DataRobot’s collaboration, please watch this video.

WPDx is highlighted as a global leader in WASH standards

In their first Global Report on Water, Sanitation, and Hygiene (WASH), the Open Government Partnership (OGP) selected the Water Point Data Exchange (WPDx) as the leader in setting a standard for mapping and collecting rural water data. WPDx’s standard was classified under “Basic level and quality of service data.” The OGP established that the nominees in this category must provide a clear and globally endorsed data standard, technical guidance, and a global data repository that enables all stakeholders to access and analyze data about water services easily. “For OGP countries looking to address WASH through their action plans, standards around water reporting build on the experience of other systems and allow for learning and comparison. Using existing standard reporting processes reduces conceptual work and makes systems compatible and comparable across service providers and countries,” the report added. WPDx is honored that OGP recommended our standard for collecting rural water data. We remain committed to improving water access for millions of people as a result of better standardized open-source data.

To read the full report, click here.

WPDx Featured in Gartner Case Study

The Water Point Data Exchange was featured in a recent Gartner research paper. “Four Real-World Case Studies: Implement Augmented DSML to Enable Expert and Citizen Data Scientists,” authored by Carlie Idoine and Jim Hare of Gartner, explored how augmented data science and machine learning (DSML) not only gives citizen data scientists access to DSML capabilities, but also makes experts more efficient and productive. Idoine and Hare selected four cases that they concluded data and analytics leaders should study to understand the business impact of augmented DSML. WPDx was honored to be one of the exemplary cases chosen. Specifically, the authors utilized WPDx as the example to “incorporate governance to manage and guide your augmented data science approach, with significant focus on data access and data quality.” According to the report, “the data provided [by WPDx] guides the Ministries’ planning and providing facts, not just assumptions. And it ultimately provides more trust in the approach and in the results. As Mohamed Small Jueah Bah, Program Officer for Monitoring and Evaluation, Sierra Leone Ministry of Water Resources explained, an augmented approach has made his job ‘easier, easier, easier, easier, easier. It’s as simple as eating a banana!’”

You can read the full report here.