DAta Cleansing

Understanding Data: Normalization Procedures with U.S. Customs Data

PIERS has an excellent graphic that explains the process by which a reputable TI Provider should handle the U.S. Customs Waterborne Import Manifest (bill of lading) data. Many other TI providers, particularly the newcomers to the market, jump from collection to publishing. After all, if you eliminate cleansing, standardization, verification, validation and enhancing you’re bound to save time and money.  That’s why there are some data providers offering access to the U.S. Customs data for as little as $30.10 per month.

During my tenure as founder/CEO at CenTradeX, we worked very hard to make sense of data, connect it in innovative ways and provide easy access and graphic delivery.  We had a lot of smart and creative people working a long time toward those objectives. We spent over two years working out the bugs before the first interface integrating and utilizing the U.S. Customs data could be launched.  It’s complex and obtuse.  It’s also perhaps the most valuable single source of Trade Data available.

The inherent treasures buried within the data have only begun to be unearthed. PIERS has gotten the furthest, particularly with their acquisition of key CenTradeX applications and technologists, but even they have a long, long way to go.  It is my hope that beyond succumbing to the recent and base marketplace inertia that has led to the commoditization and devaluation of the data, that necessary capital and creativity will be applied to the task of furthering innovation in the trade intelligence field by those with the vision and resources enough to carry it further which may or may not be PIERS.

The following illustrations address one of the necessary aspects in the normalization, integration and enhancement processes involved with U.S. Customs Data. The first two diagrams (which can be clicked upon to display full size) relate to the identification, normalization and enhancement of the Foreign Shipper and U.S. Importer of record.

The respective U.S. Customs data fields containing Shipper and Importer names and addresses need to be normalized (including standardization of the many iterations of those names) and broken down into separate “tokens” such as zip code, state, phone, city, etc. These tokens are matched against a refined and dependable company data repository derived from third-party sources (we used Hoovers, D&B, Kompass, PIERS and others) as well as the perfected names collected over time from the Manifests themselves. The number of tokens matched are then scored on a reliability or veracity scale that is internally developed.

The same procedure can be utilized to normalize and match several “silos” of disparate data held between different companies or divisions of the same company such as marketing, operations and finance.  CenTradeX was once consulted by Maersk for a project in which they wanted to normalize, standardize and match their own internal company information, after which time they then could connect it to the individual shipment manifests for themselves or their competitors.

Comments are closed.