The Challenges of Cleaning, De-Duping and Cross-Referencing Legal Entity Databases
Doing business in today’s world requires working with legal entity data from multiple sources, including data vendors, exchanges, regulators, rating agencies and LEI-issuing Local Operating Units (LOUs). As a result, companies find it difficult to maintain a single golden record for each entity to support their compliance and risk management needs.
High-quality data is the foundation of good business, but cleaning, de-duping and cross-referencing hundreds of data sources poses significant challenges. There are four main obstacles affecting entity data management.
1. Lack of Unique Identifiers
Matching information from different entity data sources is particularly challenging when the sources share no unique identifier. Even with a clean, well-formatted dataset, there is plenty of work to be done to match each record accurately and precisely.
For example, the IRS provides a free download of all entities registered with a GIIN (Global Intermediary Identification Number) under FATCA, but the list alone is not enough to match an entity with a GIIN to the corresponding entity in your database. The GIIN does not link to any proprietary identifiers, nor does it link to any public identifiers such as the LEI (Legal Entity Identifier) or CIK (Central Index Key).
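To make the identifier gap concrete, here is a minimal sketch of an identifier-first join. The field names (`lei`, `cik`, `giin`) and all identifier values are hypothetical placeholders, not real formats or codes. A record that carries only a GIIN shares no identifier with records keyed by LEI or CIK, so the join returns nothing and matching has to fall back to names.

```python
from typing import Optional

# Hypothetical identifier fields; real schemas will differ.
ID_FIELDS = ("lei", "cik", "giin")


def shared_identifiers(a: dict, b: dict) -> list:
    """Identifier fields populated in both records."""
    return [f for f in ID_FIELDS if a.get(f) and b.get(f)]


def match_by_identifier(record: dict, candidates: list) -> Optional[dict]:
    """Return the first candidate that agrees with `record` on a shared identifier."""
    for cand in candidates:
        if any(record[f] == cand[f] for f in shared_identifiers(record, cand)):
            return cand
    return None


# Illustrative records with made-up placeholder identifier values.
internal = [
    {"id": "CP-001", "name": "Example Bank AG", "lei": "LEI-EXAMPLE-0001"},
    {"id": "CP-002", "name": "Sample Fund", "cik": "CIK-EXAMPLE-0002"},
]
giin_row = {"giin": "GIIN-EXAMPLE-0003", "name": "EXAMPLE BANK AG", "country": "DE"}
lei_row = {"lei": "LEI-EXAMPLE-0001", "name": "Example Bank AG"}

print(match_by_identifier(lei_row, internal))   # matches CP-001 via the LEI
print(match_by_identifier(giin_row, internal))  # None: no identifier in common
```

The GIIN row clearly describes the same bank as CP-001 by name, yet the identifier join cannot see it; that is the gap name matching has to fill.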
2. Lack of Consistency
In the absence of a consistent legal entity identifier, name matching would still be a fairly simple task if every entity were listed by its formal legal name. In reality, however, matching entities on the names provided is particularly difficult. Names are frequently recorded “as entered” by the relationship manager, customer or counterparty, so they often differ from the registered legal name.
Some entities may be recorded under their local name, some with abbreviations, others with old names. Furthermore, many databases have entries in the name field that are “overloaded” with geographical and other modifiers, producing a concatenation of disparate information, for example “BlackRock MultiAsset Portfolio III (Exclusively for Qualified Institutional Investors with ReSale Restrictions for the Japanese Investors).” Without any normalization or validation to match the entity to its formal legal name, there is no consistent name to compare against existing counterparties.
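As an illustration of the normalization step, the sketch below reduces a free-text name to a comparison key: it lowercases, drops parenthetical qualifiers, turns punctuation into spaces and removes legal-form tokens. The `LEGAL_FORMS` set and the `normalize_name` helper are hypothetical and deliberately small; real rule sets are much longer and jurisdiction-specific.

```python
import re

# A deliberately small, hypothetical list of legal-form tokens; real lists are much
# longer and jurisdiction-specific.
LEGAL_FORMS = {"inc", "ltd", "limited", "llc", "plc", "ag", "sa", "gmbh", "corp", "corporation"}


def normalize_name(raw: str) -> str:
    """Reduce a free-text entity name to a comparison key."""
    name = raw.casefold()
    # Drop parenthetical qualifiers such as "(Exclusively for Qualified ... Investors)".
    # Note: this simple pattern does not handle nested parentheses.
    name = re.sub(r"\([^)]*\)", " ", name)
    # Turn punctuation into spaces so "Inc." and "Inc" tokenize the same way.
    name = re.sub(r"[^\w\s]", " ", name)
    # Remove legal-form tokens and collapse whitespace.
    return " ".join(t for t in name.split() if t not in LEGAL_FORMS)


overloaded = ("BlackRock MultiAsset Portfolio III (Exclusively for Qualified Institutional "
              "Investors with ReSale Restrictions for the Japanese Investors)")
print(normalize_name(overloaded))  # blackrock multiasset portfolio iii
print(normalize_name("Example Holdings, Inc.") == normalize_name("EXAMPLE HOLDINGS INC"))  # True
```

Stripping qualifiers and legal forms favors recall over precision: two distinct sub-funds can collapse to the same key, so it is usually safer to keep the original name alongside the normalized one.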
3. Regional and Language Variations
Name matching gets even more complicated once you account for variations based on language. In the IRS GIIN database, for example, entities in countries whose languages use the Latin alphabet are mostly listed in the native language: French in France, German in Germany and Turkish in Turkey (e.g. the French name BNP Paribas Obli Revenus).
By contrast, Chinese, Greek and Saudi Arabian entities are generally listed in English (e.g. Zhejiang Yongkang Rural Cooperative Bank). Russian and Ukrainian entities fall somewhere in between: they are written mostly in English, with proper nouns and other non-translatable words left in the native language (e.g. PrJSC IC PZU Ukraine Life Insurance).
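One small, well-understood piece of handling Latin-script variation is folding diacritics and case, so that an accented native-language spelling and an unaccented entry compare equal. The sketch below uses Unicode NFKD decomposition from Python's standard `unicodedata` module, plus a small hypothetical table for letters such as Turkish dotless "ı" that have no decomposition. It does not transliterate non-Latin scripts or reconcile a native name with its English translation; those cases need transliteration rules or curated alias lists.

```python
import unicodedata

# Letters with no Unicode decomposition, so NFKD alone cannot fold them
# (illustrative, not a complete list).
EXTRA_FOLDS = str.maketrans({"ı": "i", "ø": "o", "Ø": "O", "ł": "l", "Ł": "L", "đ": "d", "Đ": "D"})


def fold_accents(text: str) -> str:
    """Strip diacritics and case so spelling variants of Latin-script names compare equal."""
    decomposed = unicodedata.normalize("NFKD", text.translate(EXTRA_FOLDS))
    # Drop combining marks (accents, cedillas, dots) left behind by the decomposition.
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return base.casefold()


print(fold_accents("Société Générale") == fold_accents("SOCIETE GENERALE"))  # True
print(fold_accents("İş Bankası"))  # is bankasi
```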
4. Limited Attribute Coverage
When the name alone is not enough for quality record matching, the absence of other attributes, such as registration address, city, and state or province, further reduces confidence in any match. If you are working with the GIIN dataset, all the IRS provides is the name and country.
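With only a name and a country, about the best a matcher can do is block on country and score name similarity. The sketch below does exactly that with the standard library's `difflib.SequenceMatcher`. The 0.85 threshold, the record layout and the sample names are hypothetical choices for illustration; production tools often use token-based or Jaro-Winkler measures and route borderline scores to human review.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Character-based similarity in [0, 1] from difflib's ratio()."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


def candidate_matches(record: dict, candidates: list, threshold: float = 0.85) -> list:
    """Score candidates that share the record's country; keep those above a threshold."""
    scored = []
    for cand in candidates:
        # Block on country, the only attribute besides the name in this scenario.
        if cand["country"] != record["country"]:
            continue
        score = name_similarity(record["name"], cand["name"])
        if score >= threshold:
            scored.append((cand["id"], round(score, 3)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical GIIN-style row (name and country only) and internal candidates.
giin_row = {"name": "Example Rural Cooperative Bank", "country": "CN"}
internal = [
    {"id": "A", "name": "Example Rural Co-operative Bank", "country": "CN"},
    {"id": "B", "name": "Example Rural Cooperative Bank", "country": "HK"},
    {"id": "C", "name": "Unrelated Trading Ltd", "country": "CN"},
]
print(candidate_matches(giin_row, internal))  # [('A', 0.984)]
```

Candidate B has an identical name but is excluded by the country block, which shows how much weight the one extra attribute carries. With so little to go on, similarly named funds or branches can easily clear the threshold, which is why missing attributes lower confidence in every match.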
Data is at the core of accurately identifying clients, counterparties and vendors. Even a small data error can have ripple effects across an organization. Investing in data management tools to keep up with expanding data sources and maintain accurate records will help to resolve data challenges and pave the way to future business growth.
Get in touch with our team to learn how we can help you address these and other entity data management risks.