Markaaz Data Matching Solutions
Data Matching is the process of identifying and merging duplicate data records. It can be applied across databases to keep matching data aligned, for example by matching public data from global registries to the business records in the Markaaz Directory. It is also used to match customer accounts, or companies being onboarded, to the Markaaz Directory in order to determine Business Verification and Business Risk.
Matching Methodology and Strategies
There are two API options for matching: Standard Match and Advanced Match. Advanced Match can return additional metadata as well as multiple "matches", depending on how many entities are requested. The "best match", the one with the highest match confidence score, is always returned as the first entity in the response. Additional information on the Advanced Match API can be found in the section on integration and in the API Reference.
Matching Methodology
Standardization of Data (Structure, Business Name, Address and other identifying data):
- Company Name Formatting and Standardization—Global and Regional views of Company Forms, Address Formatting and parsing of data into proper fields to maximize the effectiveness of the Matching Engine.
- Leveraging Multiple Addresses of Companies—Registered Address, Physical Address, Mailing Address and Former Locations.
- Evaluation of Match Candidates to Identify Best Match—Each identifying company attribute is scored to determine the best match candidate based on the input record submitted. This process narrows down the match candidate pool. The best match candidate is identified from the assessment of the grades of each attribute used for matching.
Matching Strategies
Many different business attributes are used to identify the best candidates based on the input record submitted.
- Identifying Attributes—Name (Including Legal Name, AKA/DBA and Former Names), Address (Including Registered, Physical, Mailing and Former Addresses)
- Location Attributes—Phone Number, URL
- National IDs and Other IDs—National ID (ex. VAT), TaxID (FEIN, EIN, TaxID), LEI
Markaaz Data Matching Algorithm
Clean and normalize input data to improve the accuracy and speed of the algorithm:
As part of this step, the following operations are performed on both the original string and the string it is compared against. Some examples include:
- Lowercase all the letters.
- Remove any special characters.
- Remove any additional white space.
- Replace abbreviations with the original words they stand for.
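The cleaning operations listed above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `normalize`, the `ABBREVIATIONS` table and its entries are hypothetical, and the actual rules used by the Markaaz algorithm are not published in this document.

```python
import re

# Hypothetical abbreviation table (assumption); the real mapping is not documented here.
ABBREVIATIONS = {
    "mfg": "manufacturing",
    "co": "company",
    "intl": "international",
}


def normalize(text: str) -> str:
    """Apply the four example cleaning steps to a single string."""
    text = text.lower()                        # lowercase all the letters
    text = re.sub(r"[^\w\s]|_", "", text)      # remove special characters, keep letters and digits
    tokens = text.split()                      # split() also drops any additional white space
    tokens = [ABBREVIATIONS.get(token, token) for token in tokens]  # expand abbreviations
    return " ".join(tokens)


print(normalize("  ABC   Mfg., Co. "))
```

Both the input string and each candidate string would pass through the same function, so that later comparisons are made on like-for-like text.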
Check common tokens:
- This step sets the flag hasCommonTokens to true if both strings share at least one token that matches in full.
- For example, when comparing two strings (string1 and string2):
- string1 = ABC Manufacturing Company and string2 = ABC Company sets the flag to true, because the tokens "ABC" and "Company" appear in both.
- The tokens for string1 are: ["ABC", "Manufacturing", "Company"]
- The tokens for string2 are: ["ABC", "Company"]
- string1 = ABC Manufacturing Company and string2 = XYZ Ltd sets the flag to false, as there are no matching tokens.
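The common-token check can be illustrated with a short Python sketch. It assumes that tokens are produced by splitting on white space and that the comparison is case-insensitive (consistent with the lowercasing step); the function name `has_common_tokens` is hypothetical and simply mirrors the hasCommonTokens flag.

```python
def has_common_tokens(string1: str, string2: str) -> bool:
    """Return True if the two strings share at least one full token."""
    # Assumption: tokens are white-space separated and compared case-insensitively.
    tokens1 = set(string1.lower().split())
    tokens2 = set(string2.lower().split())
    return not tokens1.isdisjoint(tokens2)


print(has_common_tokens("ABC Manufacturing Company", "ABC Company"))  # True
print(has_common_tokens("ABC Manufacturing Company", "XYZ Ltd"))      # False
```

Only whole tokens count here: "Manufacturing" and "Manufacturer" would not set the flag, since neither is a full match for the other.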
Calculate Match Grade:
- The scoring algorithm is configurable and can be updated if needed.
- The Jaro-Winkler string similarity algorithm is used to calculate the Match Grade.
- The threshold score for each grade is configurable and can be updated as needed.
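To make the grading step concrete, the sketch below implements the standard Jaro-Winkler similarity and maps the score to a grade through a configurable threshold table. The grade labels ("A", "B", "C", "F") and the threshold values are hypothetical placeholders, not the values Markaaz uses, and the function names are illustrative only.

```python
def jaro(s1: str, s2: str) -> float:
    """Standard Jaro similarity, in the range 0.0 to 1.0."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)  # how far apart matching characters may be
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order in the two strings.
    out_of_order = 0
    k = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order // 2
    return (matches / len1 + matches / len2 + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1) -> float:
    """Jaro similarity boosted for a shared prefix of up to four characters."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return base + prefix * prefix_scale * (1 - base)


# Hypothetical grade labels and thresholds (assumption), highest grade first.
GRADE_THRESHOLDS = [("A", 0.95), ("B", 0.85), ("C", 0.70)]


def match_grade(s1: str, s2: str, thresholds=GRADE_THRESHOLDS) -> str:
    """Map the Jaro-Winkler score of two strings to a grade label."""
    score = jaro_winkler(s1, s2)
    for grade, minimum in thresholds:
        if score >= minimum:
            return grade
    return "F"  # below every configured threshold
```

With the classic textbook pair, `jaro_winkler("martha", "marhta")` evaluates to roughly 0.961, which falls in the top grade under the placeholder thresholds. Because the thresholds are passed in as data, they can be retuned without touching the scoring code.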
Match Score/Grade Output
- The output includes a Match Grade at the candidate/entity level.
- Each input field is graded against the fields in the response payload. The Advanced Match API returns those field grades if the option is enabled.
- Entities that match more parameters will have a higher match confidence and a better chance of being discovered in the directory.
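One way to picture how matching more parameters raises confidence, and why the best match comes first in the response, is the sketch below. The aggregation rule (averaging field scores over every supplied input field, with unmatched fields counted as zero), the function name `rank_candidates`, and the entity identifiers are all assumptions made for illustration; the document does not specify how Markaaz combines field grades into a confidence score.

```python
def rank_candidates(
    field_scores_by_candidate: dict[str, dict[str, float]],
    input_fields: list[str],
) -> list[tuple[str, float]]:
    """Return (candidate_id, confidence) pairs, highest confidence first."""
    ranked = []
    for candidate_id, scores in field_scores_by_candidate.items():
        # Hypothetical rule: a field the candidate did not match contributes 0.
        total = sum(scores.get(field, 0.0) for field in input_fields)
        ranked.append((candidate_id, total / len(input_fields)))
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked


# Illustrative field scores for two made-up candidates.
candidates = {
    "entity-1": {"name": 0.9},
    "entity-2": {"name": 0.9, "address": 0.8, "phone": 1.0},
}
ranking = rank_candidates(candidates, ["name", "address", "phone"])
print(ranking[0][0])  # entity-2 matches more parameters, so it ranks first
```

Under this rule, two candidates with the same name score are separated by how many of the other supplied fields they also match.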