U.S. Customs Data: Parsing & Normalization. The First Steps in Its Long, Transformational Journey.

At CenTradeX, it took us several years to develop an intelligent system for quickly and seamlessly assimilating the daily Customs feeds.

Over time we developed and incorporated automated procedures and administered them from a single umbrella control panel. Statistical data updates from the U.S. Census and U.N. Comtrade were initiated from this centralized control panel. The initial processing and normalization of U.S. Customs data, as well as company, parent and location matching, were also conducted from the same control panel.
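The post does not describe how the control panel dispatched its jobs internally. As a minimal sketch, assuming each update process can be wrapped in a callable, a single registry lets one panel launch the Census, Comtrade, Customs and matching jobs. The `ControlPanel` class, the task names and the placeholder job bodies are all hypothetical.

```python
from typing import Callable, Dict, List

TaskFn = Callable[[], str]


class ControlPanel:
    """Single entry point that registers and launches update jobs (hypothetical)."""

    def __init__(self) -> None:
        self._tasks: Dict[str, TaskFn] = {}
        self.log: List[str] = []

    def register(self, name: str) -> Callable[[TaskFn], TaskFn]:
        """Decorator that files a job under the name shown on the panel."""
        def decorator(fn: TaskFn) -> TaskFn:
            self._tasks[name] = fn
            return fn
        return decorator

    def tasks(self) -> List[str]:
        return sorted(self._tasks)

    def run(self, name: str) -> str:
        """Launch one job by name and record its outcome in the panel log."""
        if name not in self._tasks:
            raise KeyError(f"unknown task: {name}")
        result = self._tasks[name]()
        self.log.append(f"{name}: {result}")
        return result


panel = ControlPanel()


# Placeholder jobs; real ones would call the import and matching procedures.
@panel.register("census_update")
def census_update() -> str:
    return "U.S. Census statistics refreshed"


@panel.register("comtrade_update")
def comtrade_update() -> str:
    return "U.N. Comtrade statistics refreshed"


@panel.register("ams_normalize")
def ams_normalize() -> str:
    return "AMS data parsed and normalized"


@panel.register("company_matching")
def company_matching() -> str:
    return "company, parent and location matching done"


if __name__ == "__main__":
    for task in panel.tasks():
        print(panel.run(task))
```

Keeping every job behind one registry means a new data source only needs one more registered function, not a change to the panel itself.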

A detailed diagram of the individual components that make up the control panel (itself a constituent part of the A.I. Engine) can be downloaded from Google Docs by clicking this link.

Component of the A.I. Engine: the control panel used to initiate data imports.

Company data collections came from sundry vendors, and because each contained its own unique, non-standardized characteristics, each collection was initially processed with a different array of queries and procedures. The collections were then integrated into the combined company repository, which, in turn, was correlated with the U.S. Customs and statistical data. See U.S. Customs Data Primer Part 4: Enlightenment Through Graphics & Diagrams for illustrative diagrams.
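The post does not spell out the queries used to merge the vendor collections. One common technique, shown here only as a sketch, is to reduce every company name to a normalized match key before integrating records. The suffix list, the record fields (`vendor`, `name`) and the function names are assumptions for illustration.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Hypothetical list of legal-form suffixes; a production list would be much longer.
LEGAL_SUFFIXES = {"INC", "INCORPORATED", "CORP", "CORPORATION", "CO", "COMPANY",
                  "LLC", "LTD", "LIMITED"}


def company_key(name: str) -> str:
    """Reduce a company name to a match key: uppercase, '&' spelled out,
    punctuation removed, trailing legal-form suffixes dropped."""
    text = name.upper().replace("&", " AND ")
    text = re.sub(r"[^A-Z0-9 ]+", " ", text)
    tokens = text.split()
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def merge_vendor_records(records: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """Group records from different vendors under a shared company key."""
    merged: Dict[str, List[str]] = defaultdict(list)
    for rec in records:
        merged[company_key(rec["name"])].append(rec["vendor"])
    return dict(merged)


if __name__ == "__main__":
    sample = [
        {"vendor": "A", "name": "Acme Trading Co., Inc."},
        {"vendor": "B", "name": "ACME TRADING COMPANY"},
        {"vendor": "C", "name": "Smith & Sons Ltd"},
    ]
    print(merge_vendor_records(sample))
```

Exact-key matching like this only catches formatting differences; misspellings and abbreviations would still need fuzzier matching on top.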

Data sources and processing schedule.

U.S. Customs data, which we referred to as AMS (Automated Manifest System) data, went through six distinct processes, depicted below. An illustrative diagram of all six processes and the eight sequential databases (or collections) can be viewed by clicking this link.

Customs data was received and processed daily, but the final databases used to serve web reports were refreshed weekly to allow for enhancements (“beauty treatments”) and interconnection with other data collections.
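The two cadences can be sketched as a simple per-day job list. This assumes the weekly rebuild runs on Sundays (the post does not say which day), and the job names are hypothetical.

```python
from datetime import date, timedelta
from typing import List

# Assumption: the weekly rebuild runs on Sundays; the post does not name the day.
WEEKLY_REBUILD_WEEKDAY = 6  # date.weekday(): Monday == 0 ... Sunday == 6

# Hypothetical job names for illustration only.
DAILY_JOBS = ["import_customs_feed", "parse_and_normalize"]
WEEKLY_JOBS = ["enrich_report_databases", "link_other_collections"]


def jobs_for(day: date) -> List[str]:
    """Daily jobs run every day; the report databases are rebuilt once a week."""
    jobs = list(DAILY_JOBS)
    if day.weekday() == WEEKLY_REBUILD_WEEKDAY:
        jobs.extend(WEEKLY_JOBS)
    return jobs


if __name__ == "__main__":
    start = date(2024, 1, 1)  # a Monday
    for offset in range(7):
        day = start + timedelta(days=offset)
        print(day.isoformat(), jobs_for(day))
```

Separating the cadences lets the daily intake stay fast while the heavier enrichment and cross-linking work happens only once per week.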

AMS (Customs) data requires many steps of parsing, normalizing, refining, integrating and optimizing before it is ready for “prime time”.

Let’s look at the steps from the beginning. Roughly speaking, the first task is to import all the data properly: correctly parsing every element contained in the original “flat file” and organizing the elements within a relational database. Every data element, and every permutation and aberration, must be accounted for. The diagram below depicts the second of eight databases (the first “DB” is really just a collection of all the raw AMS, or Customs, data itself). This database is produced by the first processing step and is refreshed daily.
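As a minimal sketch of this first step, the code below slices fixed-width flat-file lines into named fields and loads them into relational tables with Python's built-in `sqlite3`. The record types (`BL`, `CN`), field offsets and table names are hypothetical; real AMS layouts differ. Lines that cannot be parsed or loaded are kept in a `rejected` table rather than dropped, so every aberration stays accounted for.

```python
import sqlite3
from typing import Dict, List, Tuple

# Hypothetical fixed-width layout: a 2-character record type, then fields at
# fixed (start, end) offsets. Real AMS layouts differ; these are illustrative.
LAYOUTS: Dict[str, List[Tuple[str, int, int]]] = {
    "BL": [("bill_no", 2, 14), ("port_code", 14, 19), ("arrival", 19, 27)],
    "CN": [("bill_no", 2, 14), ("container_no", 14, 25)],
}

SCHEMA = """
CREATE TABLE bill (bill_no TEXT PRIMARY KEY, port_code TEXT, arrival TEXT);
CREATE TABLE container (bill_no TEXT, container_no TEXT);
CREATE TABLE rejected (line_no INTEGER, raw TEXT, reason TEXT);
"""

INSERTS = {
    "BL": "INSERT INTO bill VALUES (:bill_no, :port_code, :arrival)",
    "CN": "INSERT INTO container VALUES (:bill_no, :container_no)",
}


def parse_line(line: str) -> Tuple[str, Dict[str, str]]:
    """Slice one flat-file line into named, whitespace-trimmed fields."""
    rtype = line[:2]
    if rtype not in LAYOUTS:
        raise ValueError(f"unknown record type {rtype!r}")
    return rtype, {name: line[start:end].strip() for name, start, end in LAYOUTS[rtype]}


def load(lines: List[str], conn: sqlite3.Connection) -> None:
    """Create the tables and load every line; aberrant lines go to `rejected`."""
    conn.executescript(SCHEMA)
    for line_no, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        try:
            rtype, record = parse_line(line)
            conn.execute(INSERTS[rtype], record)
        except (ValueError, sqlite3.IntegrityError) as exc:
            # Record the problem instead of silently dropping the line.
            conn.execute("INSERT INTO rejected VALUES (?, ?, ?)", (line_no, line, str(exc)))
    conn.commit()


if __name__ == "__main__":
    sample = [
        "BL" + "ABCD12345678" + "2704".ljust(5) + "20240107",
        "CN" + "ABCD12345678" + "CSQU3054383",
        "ZZ unexpected record",
    ]
    conn = sqlite3.connect(":memory:")
    load(sample, conn)
    print(conn.execute("SELECT * FROM bill").fetchall())
    print(conn.execute("SELECT line_no, reason FROM rejected").fetchall())
```

Driving the parser from a layout table keeps new record types or shifted offsets a data change rather than a code change.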

Parsed Customs data sorted and organized within a relational database structure.

A higher-resolution version of the diagram above can be obtained from our Google Docs site by clicking this link.

Next comes the “normalization” process, wherein each element of parsed data is refined and standardized. For instance, a simple port code, whether foreign or domestic, is resolved to its corresponding state or province/region, country and normalized name. Each container code is translated into presentable information about the container: its type (refrigerated or not), height, length and identifying number. Company name, address and contact variations are also resolved within this normalization process.
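A minimal sketch of the port and container lookups follows. It assumes container size/type codes follow the ISO 6346 scheme (the post does not name a standard) and that port codes are resolved through a reference table; the two port entries are a tiny excerpt, the code tables are partial, and the function names are hypothetical. The check-digit test implements the ISO 6346 algorithm for the 11-character container number.

```python
from typing import Dict, NamedTuple


class Port(NamedTuple):
    name: str
    region: str  # state, province or region
    country: str


# Tiny illustrative excerpt; a real table would be loaded from full code lists
# (e.g. Schedule D for U.S. ports, Schedule K for foreign ports).
PORTS: Dict[str, Port] = {
    "1001": Port("New York", "NY", "United States"),
    "2704": Port("Los Angeles", "CA", "United States"),
}

# Partial ISO 6346 size/type tables (first, second and third characters).
LENGTH = {"2": "20 ft", "4": "40 ft", "L": "45 ft"}
HEIGHT = {"0": "8 ft", "2": "8 ft 6 in", "5": "9 ft 6 in"}
TYPE_GROUP = {"G": "general purpose", "R": "refrigerated", "U": "open top", "T": "tank"}


def _letter_values() -> Dict[str, int]:
    """ISO 6346 letter values: A=10 upward, skipping multiples of 11."""
    values, v = {}, 10
    for ch in "ABCDEFGHIJKLMNOPQRSTUVWXYZ":
        if v % 11 == 0:
            v += 1
        values[ch] = v
        v += 1
    return values


LETTER_VALUES = _letter_values()


def normalize_port(code: str) -> Port:
    code = code.strip()
    return PORTS.get(code, Port("UNKNOWN " + code, "", ""))


def decode_size_type(code: str) -> str:
    """Turn a 4-character size/type code such as '45R1' into readable text."""
    code = code.strip().upper()
    if len(code) != 4:
        return "unrecognized size/type code"
    length = LENGTH.get(code[0], "unknown length")
    height = HEIGHT.get(code[1], "unknown height")
    kind = TYPE_GROUP.get(code[2], "other type")
    return f"{length} x {height}, {kind}"


def check_digit_ok(container_no: str) -> bool:
    """Validate an 11-character container number such as 'CSQU3054383'."""
    no = container_no.strip().upper()
    if (len(no) != 11 or not all(c in LETTER_VALUES for c in no[:4])
            or not all(c in "0123456789" for c in no[4:])):
        return False
    total = sum(
        (LETTER_VALUES[c] if c in LETTER_VALUES else int(c)) * 2 ** i
        for i, c in enumerate(no[:10])
    )
    return total % 11 % 10 == int(no[10])


if __name__ == "__main__":
    print(normalize_port("2704"))
    print(decode_size_type("45R1"))
    print(check_digit_ok("CSQU3054383"))
```

Unknown codes fall through to explicit "unknown" values rather than raising, so unmatched entries can be reviewed and added to the reference tables later.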

Below is a diagram of the third of eight databases, produced by the second step along the Customs data transformation journey. A higher-resolution image is available for download from our Google Docs site.

The second step in the Customs data transformation process.
