[Welcome Skyler Young of Connect211 back to the blog!]
When it comes to human services, comparing apples to apples is harder than you might think.
At Connect211, we work with human service referral providers in more than twenty states to “orchestrate” resource data pipelines and support directory services. In areas where we serve multiple clients, or help them collaborate with their partners, we also facilitate collaboration across redundant yet siloed directory systems.
There are many potential benefits of merging siloed resource data sets – including better service coverage, improved analytics, reduced maintenance costs, crowdsourced quality, and more.
Usually, the first step in this process of resource dataset alignment (after reaching agreement to cooperate, of course) is determining who is maintaining data about which organizations. Toward that end, we need to compare the contents of different datasets.
This is where we encounter a surprising amount of difficulty. That’s because the same organization might be described with many different names.
Sometimes this is obvious at a glance (e.g., Young Men’s Christian Association obviously means YMCA), but often local knowledge is needed to discern a match (e.g., a local YWCA program might actually be part of the YMCA organization… or might not). This means that identifying matching organizations among overlapping datasets can take days or even weeks of painstaking manual labor.
We decided to build our own record-matching tool, and we’re excited to share it as an open-source technology that can benefit the entire field.
The Record-Matcher is an AI-powered, human-assured tool for comparing overlapping resource directory datasets.
The tool works specifically with data in Open Referral’s HSDS format. Standards are necessary to enable this kind of apples-to-apples comparison, because they enable data from different sources to be translated into a common language.
Once multiple datasets are loaded, the Record-Matcher identifies potential matches and assigns each a confidence rating to indicate the likelihood that the two records refer to the same thing.
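To make the idea of a confidence rating concrete, here is a minimal sketch in Python. It is not the Record-Matcher's actual algorithm: it assumes plain string similarity on the HSDS organization `name` field, using the standard library's `difflib`, and the function names and the 0.6 threshold are hypothetical choices for illustration only.

```python
import re
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]", " ", name.lower())
    return " ".join(cleaned.split())


def confidence(a: str, b: str) -> float:
    """Return a 0..1 similarity score for two organization names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def candidate_matches(left, right, threshold=0.6):
    """Compare every record in `left` against every record in `right`.

    Each record is a dict with at least the HSDS `id` and `name` fields.
    The threshold is an illustrative assumption, not a tuned value.
    """
    pairs = []
    for l_rec in left:
        for r_rec in right:
            score = confidence(l_rec["name"], r_rec["name"])
            if score >= threshold:
                pairs.append((l_rec["id"], r_rec["id"], round(score, 2)))
    # Highest-confidence candidates first, so a human reviewer sees them first.
    return sorted(pairs, key=lambda p: p[2], reverse=True)
```

A score like this only catches names that look alike. It would not connect "Young Men’s Christian Association" with "YMCA", which is exactly why a richer pattern library and human review are needed on top.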
This tool leverages a library of algorithms that we have developed over years of pattern analysis and user feedback in order to do rapid “identity resolution” – that is, finding records that might describe the same institution even if their names and other information look different. This library is where we train our AI model on a growing set of patterns that enable the model to ‘know’ what to look for in the hunt for matching records. The best part is that the more we run the tool, the more insights we generate about common patterns – so that our library of patterns keeps growing, and we keep getting better at tackling this challenge.
Ultimately, the machines can do a lot of “dumb” work very fast – and we want to privilege the humans’ time to focus on the hard work of exercising judgment.
Once the user has reviewed the most likely matches – and “mapped” together the entities that are actually the same – the user can then report on overlapping and unique records for deeper analysis.
We’ve found that this tool can help reduce the labor of comparing datasets by days or even weeks. One client had their trainee complete a comparison that previously might have taken days – in just 30 minutes, with very favorable results.
We are also testing the ability to deduplicate datasets before we publish them to public-facing directory interfaces, applying a given set of rules that establish “official” sources and remove redundant records. As a result, this tool can even help users improve the quality of their own data.
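As a rough illustration of rule-based deduplication, the sketch below keeps one record from each group of confirmed matches, preferring whichever source ranks highest in a priority list. This is an assumption about how such rules could work, not a description of our implementation. The `source` field, the `matched_groups` input, and the function name are all hypothetical.

```python
def deduplicate(records, matched_groups, source_priority):
    """Keep one record per matched group, preferring "official" sources.

    records         -- list of dicts with `id` and a hypothetical `source` field
    matched_groups  -- iterable of sets of record ids confirmed as the same entity
    source_priority -- source names, most authoritative first (an assumed rule)
    """
    rank = {src: i for i, src in enumerate(source_priority)}
    by_id = {rec["id"]: rec for rec in records}
    drop = set()
    for group in matched_groups:
        members = [by_id[rec_id] for rec_id in group]
        # Sources missing from the priority list rank last.
        keeper = min(members, key=lambda rec: rank.get(rec["source"], len(rank)))
        drop.update(rec["id"] for rec in members if rec["id"] != keeper["id"])
    # Preserve the original order of the records that survive.
    return [rec for rec in records if rec["id"] not in drop]
```

Records that belong to no matched group pass through untouched, so running this on a dataset with no confirmed matches returns it unchanged.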
This technology is useful for our work with our own clients, but we’re happy to share it so that others can benefit too.
Not only that – we think this is just the next step on an epic journey for the entire field. Through this record-matching progress, we see our first glimpse of a promising new future: federation of resource data information systems. By federation, we mean a system of systems – in which responsibilities for data management can be divvied up, and the resulting data supply shared, across a distributed network of information managers. Federation begins with the ability to identify overlaps among datasets managed by different players in the field. From there we can start developing agreements about who should manage which records.
Toward that end, beyond the horizon of this record-matching challenge, we can anticipate additional objectives on our roadmap, including the ability to delegate stewardship responsibilities, route feedback, and resolve conflicts in ways that can propagate across many systems.
Right now, we’re eager to get more people involved in this process – piloting new features, building capacity, and further developing these tools as shared infrastructure that can improve the whole human service sector. We believe these models, patterns, and tools are most useful when the widest set of people are able to use them and contribute knowledge to them. So we’re going to make the Record-Matcher available under an open license.
Email us to learn more. You can join us in discussion about this topic in the Open Referral Forum. (And if you’re interested in the technical details of our model, stay tuned for a follow-up post in which I’ll dive into the nitty-gritty 🙂)

