About eight years ago, I wrote a paper for CUED (now the International Economic Development Council) entitled “Methods for Generating Useful Databases when Industry Codes Fail.” The gist of the paper was that most industry and market data is available based upon NAICS industry classifications or some similar proxy. Forecasted and estimated employment, occupations, wages, gross national/regional product, even revenue, productivity, regional specialization and many other indicators can be derived from federal or commercial sources using NAICS definitions for the market or industry you are interested in.

However, Kurzweil’s Law of Accelerating Returns holds that we are reshaping technologies, redefining and obsoleting industries, and inventing whole new markets at an exponential rate. The NAICS system, by contrast, is updated only through a linear, periodic, regulatory process, which guarantees that it is inevitably and increasingly inappropriate for cutting-edge, research- and innovation-based, or technology- and knowledge-oriented industries.

The result is that researchers try to use NAICS to generate statistics on industries like biotech, agritech, biofuels, modeling and simulation, photonics, digital media, nanotechnology or nanomaterials that are comparable to what they can learn about more traditional industries (I have been asked to research each of these industries recently). As long as nobody looks “under the hood,” statistics get published that are wildly inconsistent across geographies or researchers, and nobody is the wiser.

The Wrong Way

The worst method I’ve seen for tackling this problem is blind aggregation. A novel “industry cluster” is defined by adding together all the NAICS categories that the targeted technology companies might fall into. The result is a bloated estimate that only vaguely resembles the target industry. For example, I had access to the research behind a study on digital media performed by one of the Big Four accounting firms, which simply rolled up all the NAICS categories related to public relations, advertising, newspapers and magazines, film and television, and arts and entertainment. While these sectors are all potential users and creators of digital media, it was a big stretch (especially five years ago, when the report was authored) to claim they were all digital media companies. The stretch was confirmed when many of these companies were contacted for follow-on research and collaboration. This definitional methodology is probably the most inaccurate, but I suspect it may be the most prevalent because it’s the easiest solution for analysts.

The Right Way That Might Not Fit

A second category of solutions requires a bit of mathematical wizardry. My favorite is a fuzzy-logic based system developed by my colleague Dr. Ed Feser, which identifies “core” and “supporting” industries and assigns various (non-mutually-exclusive) values for how strongly each NAICS category is linked to each of a series of pre-identified clusters. His cluster definitions have held up in the empirical, applied research projects in which I’ve implemented them, and they produce results comparable or superior to those of many other strategies. Alternative solutions in this category include linear algebra and statistical approaches like factor analysis, statistical cluster analysis (hierarchical, genetic, K-means, etc.), or neural networks and pattern recognition software. Definitional strategies like these have the advantage of producing self-consistent, empirical statistics and data manipulations. However, if you can’t match one of the clusters generated by these methods to your target industry (such as nanotech), then you are still completely out of luck.
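
To make the contrast with blind aggregation concrete, here is a minimal sketch in Python. It is not Dr. Feser's actual method, only a simplified weighted sum that illustrates the idea of non-exclusive linkage values. The category labels, employment counts, and weights are all hypothetical placeholders invented for illustration.

```python
# Hypothetical employment counts by NAICS category (placeholder labels, made-up numbers).
employment = {"NAICS_A": 1200, "NAICS_B": 300, "NAICS_C": 800}

# Hypothetical linkage weights between each category and one target cluster.
# 1.0 means a "core" industry; smaller values mean weaker, "supporting" links.
cluster_weights = {"NAICS_A": 0.6, "NAICS_B": 1.0, "NAICS_C": 0.1}


def blind_total(employment, codes):
    """Blind aggregation: count every worker in every candidate category."""
    return sum(employment[code] for code in codes)


def weighted_total(employment, weights):
    """Weighted aggregation: scale each category by its linkage weight."""
    return sum(employment[code] * weight for code, weight in weights.items())


blind = blind_total(employment, cluster_weights)
weighted = weighted_total(employment, cluster_weights)
print(f"blind aggregation: {blind}, weighted estimate: {weighted:.0f}")
```

With these made-up numbers the blind roll-up is more than double the weighted estimate, which is the kind of bloat described under “The Wrong Way.”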

The Best Way Is Difficult

The method that remains is the most difficult to implement - empirical self-identification of a cluster’s members. The precedent is called snowball sampling:

  1. Start with a solid, pre-identified core
  2. For each company or organization, conduct an interview or survey to confirm their membership and ask them to identify their competitors, customers, vendors and suppliers
  3. For each competitor, customer, vendor, or supplier, repeat until the respondents no longer self-identify or associate themselves with the targeted industry or market.
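
The three steps above can be sketched as a breadth-first traversal. This is only an illustration under stated assumptions: the `survey` callable stands in for a real interview or questionnaire, and the company names and responses are hypothetical.

```python
from collections import deque


def snowball_sample(core, survey):
    """Expand a pre-identified core by following referrals from confirmed members.

    `survey(name)` is a hypothetical stand-in for an interview. It returns
    (is_member, referrals), where referrals lists the competitors, customers,
    vendors and suppliers the respondent names.
    """
    members = set()
    contacted = set(core)
    queue = deque(core)
    while queue:
        org = queue.popleft()
        is_member, referrals = survey(org)
        if not is_member:
            continue  # respondent does not self-identify, so stop expanding here
        members.add(org)
        for referral in referrals:
            if referral not in contacted:
                contacted.add(referral)
                queue.append(referral)
    return members, contacted


# Hypothetical survey responses, invented for illustration.
responses = {
    "NanoCoat": (True, ["TubeWorks", "LabSupply"]),
    "TubeWorks": (True, ["NanoCoat", "FreightCo"]),
    "LabSupply": (False, ["PaperCo"]),
    "FreightCo": (False, []),
}


def canned_survey(name):
    # Anyone without a recorded response is treated as a non-member.
    return responses.get(name, (False, []))


members, contacted = snowball_sample(["NanoCoat"], canned_survey)
print(sorted(members))
print(len(contacted), "organizations contacted")
```

Note that every contact costs an interview whether or not the respondent turns out to be a member, which is why the number contacted matters as much as the number confirmed when budgeting a project like this.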

It’s obvious that you could integrate the sampling process with a census or social network analysis. The converse of snowball sampling is often performed by research call centers, which work from a pre-generated list (such as the aggregated cluster above) and call each candidate to remove those that do not match the target profile. I generally refer to this as “casting a wide net.” Unfortunately, both methods are time- and labor-intensive and difficult to scale up (e.g., nationally).

I find that snowball or “wide net” sampling methods are often the only viable ways to identify a research population that stands up to the “common sense” and “newspaper” tests. I’m keeping an eye out for social media tools that will allow groups of people, organizations or industries to self-organize in a way that’s useful for public relations, marketing, and economic research, but tools like discussion forum member lists, email opt-in and newsletter systems still have a long way to go. They are still ineffective for getting consistent and comprehensive participation from businesses and organizations, particularly when time constraints exist (as they always do for research projects).

If your community is promoting a cutting-edge technology or research industry, some of the best analytical investments you can make are ways to keep a comprehensive and up-to-date “opt-in” directory of members!


 



 

