
Data-Driven Supplier Scouting - Why Web Data is the Best Source of Information for Automated Supplier Scouting

Although we have been living in the digital age for many years, many business processes are still carried out much as they were decades ago. Meanwhile, digital information on every aspect of our lives is becoming increasingly available. With our series of articles, we want to show how this enables new solutions to old challenges and how you can benefit from an increasingly data-driven world.

Every large manufacturer or distributor regularly faces the challenge of finding new suppliers for certain parts or materials. Besides manual research supported by the standard corporate digital toolkit (Excel, Google and email), trade shows were still the preferred way to find new business partners - and they will most likely return once things go back to normal. But in an increasingly complex and volatile world, it becomes difficult to analyze supply markets with such traditional approaches. Leveraging the potential of digital technologies becomes an increasingly attractive alternative, but how does it actually work?

What is supplier scouting at scale and why do it?

A traditional supplier scouting process works like a funnel, similar to a sales funnel. In the first step, an initial set of potential suppliers is collected. Each subsequent step applies additional validation until at least one very well-suited company remains. As with a sales funnel, the more candidates you feed into the top, the more likely you are to find one or more good matches. The challenge, however, is that the whole process is so time-consuming that in many cases it starts with only a handful of potential suppliers.


With every validation step in the scouting funnel, the number of remaining candidates decreases so that at the end of the funnel, only suitable suppliers are left.

While the later steps will still require human interaction for the foreseeable future, the upper part of the funnel can be automated well using external data sources (i.e. data that is not collected inside your company) and machine learning. The biggest advantage of this approach is its scalability: it becomes possible to scan the whole universe of suppliers and condense it into a list that serves as input for the subsequent scouting steps. This reduces the time and resources required for the initial step while significantly reducing the probability of missing relevant candidates.
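The funnel idea can be illustrated with a minimal Python sketch. The supplier records, field names and validation steps below are hypothetical and chosen only for illustration; a real funnel would use far richer checks, and its later steps would involve people rather than simple predicates.

```python
def run_funnel(candidates, steps):
    """Apply each (name, predicate) step in order.

    Returns the surviving candidates and the candidate count after each stage.
    """
    counts = [("initial", len(candidates))]
    remaining = list(candidates)
    for name, keep in steps:
        remaining = [c for c in remaining if keep(c)]
        counts.append((name, len(remaining)))
    return remaining, counts


# Hypothetical supplier records (illustrative only).
suppliers = [
    {"name": "A", "makes_part": True, "certified": True, "country": "DE"},
    {"name": "B", "makes_part": True, "certified": False, "country": "DE"},
    {"name": "C", "makes_part": False, "certified": True, "country": "FR"},
    {"name": "D", "makes_part": True, "certified": True, "country": "US"},
]

# Hypothetical validation steps, ordered from cheap to expensive.
steps = [
    ("offers the part", lambda s: s["makes_part"]),
    ("has certification", lambda s: s["certified"]),
    ("in target region", lambda s: s["country"] in {"DE", "FR"}),
]

survivors, counts = run_funnel(suppliers, steps)
for stage, n in counts:
    print(f"{stage}: {n} candidates")
```

The counts shrink at every stage, which is why the size of the initial set matters so much: a funnel that starts with only a handful of candidates can easily end up empty.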


What data is relevant and why is website data the most useful?

Many data sources can be very valuable throughout the supplier scouting process. For filtering and validating candidates in particular, data from various sources and providers on risk, compliance, news, financials, certifications and more can be put to good use. For the first step of the funnel, however, where the main goal is to find out whether a company could produce what we are looking for, the content of company websites is the most valuable source of information.

Company website data offers several advantages that make it the best source of information for the upper funnel of supplier scouting:

It is the richest source of information on what the company is doing:

  • The goal of creating a good company website is providing relevant information on products and services to potential buyers - exactly the information we are looking for;
  • The information is not restricted to the structure of a directory, so companies are free to provide as much information as they want;
  • Compared with keeping directory profiles up to date, companies allocate a comparatively large budget to presenting the relevant information on their website;
  • Directories and also data providers usually struggle to correctly and extensively cover the products offered by companies. In particular, the data collected from national registers quite poorly represents the actual business areas and product offerings of a company.

It is usually the freshest source of information on a company:

  • Companies publish new products, company news and other information on their website first;
  • Directories are often not updated at all, or at least not regularly or in detail.

It is generally free and (comparatively) easily accessible:

  • There is no direct paywall, and using the content for search or analysis generally does not violate copyright;
  • Information in directories or from data vendors can be expensive, especially when you need it in high quality.


However, this type of data comes with one big challenge: it is unstructured (or, more precisely, semi-structured) and extremely heterogeneous across countries, companies and industries. This means that, while company websites contain a lot of fresh and deep information, extracting it in good quality is extremely hard and expensive.

Simpler approaches such as pure keyword matching can already retrieve potential candidates for certain products, materials or services quite effectively. But many cases require more advanced, machine-learning-based algorithms and deep knowledge bases (e.g. knowledge trees or graphs of products, intermediate products, materials and processing steps), because companies might not promote all the products and services they could provide, or because the relevant keywords are too ambiguous.
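To make the simple end of this spectrum concrete, here is a minimal sketch of pure keyword matching in Python. The domains, page texts and keywords are made up for illustration, and the function names are our own; this is not a description of any particular product.

```python
import re


def keyword_hits(text, keywords):
    """Return the keywords that occur in text as whole words (case-insensitive)."""
    hits = []
    for kw in keywords:
        pattern = r"\b" + re.escape(kw) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            hits.append(kw)
    return hits


def find_candidates(pages, keywords, min_hits=1):
    """pages maps a domain to its website text; keep domains with enough hits."""
    candidates = {}
    for domain, text in pages.items():
        hits = keyword_hits(text, keywords)
        if len(hits) >= min_hits:
            candidates[domain] = hits
    return candidates


# Hypothetical website texts (illustrative only).
pages = {
    "acme-castings.example": "We produce aluminium die casting parts and offer CNC machining.",
    "bakery.example": "Fresh bread and pastries every morning.",
    "steelworks.example": "Steel forging and die casting for automotive customers.",
}
keywords = ["die casting", "CNC machining"]

result = find_candidates(pages, keywords)
print(result)
```

The sketch also shows the limits mentioned above: a company that could do die casting but never uses the term on its website is not found, and an ambiguous keyword would match many irrelevant pages.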

The good news is that a good process around data usage can often compensate for weaknesses in quality or structure (Augmented Intelligence). For example, if a human analyst reviews the output of a machine learning model, we can tolerate far more errors while still benefiting from the model's scalability. Identifying the actual hits among a handful of candidates (or even 100) is quite an easy task for a human, whereas searching through the full initial dataset of thousands or millions of records is simply impossible. Sounds familiar? Yes, that’s exactly how Google works!
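This division of labor can be sketched in a few lines of Python. The scores stand in for the output of some machine learning model and the analyst decisions are hard-coded; both are hypothetical placeholders.

```python
import heapq


def shortlist(scored_companies, k):
    """Return the k highest-scoring (company, score) pairs, best first."""
    return heapq.nlargest(k, scored_companies, key=lambda pair: pair[1])


# Hypothetical model output: (company, relevance score between 0 and 1).
model_scores = [
    ("a.example", 0.91),
    ("b.example", 0.12),
    ("c.example", 0.78),
    ("d.example", 0.40),
    ("e.example", 0.05),
]

# The model condenses the full dataset into a short list ...
for_review = shortlist(model_scores, k=2)

# ... and a human analyst performs the final validation (simulated here).
analyst_confirms = {"a.example"}
confirmed = [name for name, _ in for_review if name in analyst_confirms]
print(for_review, confirmed)
```

Because a person checks the short list, a false positive such as c.example costs only a few seconds of review time, while the model still does the heavy lifting of scanning every record.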


So is it just like using Google?

Well, not exactly, but it is a very similar approach. In the beginning, Google was competing with many other companies that had the same goal: making the web more easily accessible. A common approach back then was to build extensive directories of all existing websites, similar to today’s company directories. Such a directory has many advantages, including the availability of structured information that can easily be used for searching and filtering. But the number of websites was growing so fast that it was impossible to maintain those directories at sufficient quality. Thus, Google turned the approach around: it collected all the available content first and then developed techniques for filtering and ranking the websites based on the search criteria (mainly search terms) provided by the user. The results might not be perfect as returned, but because users only have to deal with a well-condensed set of potential results, they can perform the last validation step quite easily in most cases. And as we all know, history has shown that this is a very good approach to information discovery when you have huge amounts of data.

Quite naturally, Google also became a common tool for “digital supplier scouting”, as most of us use it heavily in our private lives and it can theoretically surface information from any company website in the world. But is it actually a good tool for finding suppliers? Not really. Google is not primarily made for finding (B2B) companies but rather for consumers to find content, products and services. Therefore, you often have to skim through hundreds of irrelevant results in order to find the companies you are looking for - with many or even most good candidates never showing up within an acceptable number of result pages.


A narrower focus is the key

The key to leveraging web data effectively and efficiently for supplier scouting is a narrower focus. Instead of searching through all the web data in the world, we only want to search through the relevant data, i.e. websites of companies - or, even better, of relevant companies. To build such a dataset you can choose between two approaches: Top-Down or Bottom-Up. The first requires collecting a huge initial dataset and filtering it down with suitable algorithms until only the relevant subset remains. With the Bottom-Up approach, we gradually build up our dataset from various sources of information until we reach the desired full dataset. Both approaches have their own advantages and disadvantages (we will cover them in a separate article), which leads us back to the one challenge that always comes with web data: it is huge and unstructured, which makes creating this focused subset of websites very challenging and resource-intensive. But once we overcome this challenge, we get a very powerful, information-rich and fresh data foundation for supplier scouting at scale and in depth.
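The difference between the two approaches can be sketched as follows. This is a toy illustration under strong assumptions: the URLs and source lists are invented, the domain normalization is deliberately simple, and the `looks_like_company` check is a hypothetical stand-in for a real classifier.

```python
from urllib.parse import urlsplit


def normalize_domain(url):
    """Lower-case host name without a leading 'www.' (deliberately simple)."""
    host = urlsplit(url if "://" in url else "//" + url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def build_top_down(crawled_pages, looks_like_company):
    """Start from a huge crawl (url -> text) and filter it down."""
    return {
        normalize_domain(url)
        for url, text in crawled_pages.items()
        if looks_like_company(text)
    }


def build_bottom_up(*sources):
    """Merge URL lists from several sources, deduplicated by domain."""
    domains = set()
    for source in sources:
        for url in source:
            domain = normalize_domain(url)
            if domain:
                domains.add(domain)
    return domains


# Top-Down: hypothetical crawl plus a naive stand-in for a classifier.
crawl = {
    "https://acme.example": "Our products: aluminium castings",
    "https://blog.example/post": "My holiday in Spain",
    "https://www.steelworks.example/about": "Our products and services",
}
top_down = build_top_down(crawl, lambda text: "our products" in text.lower())

# Bottom-Up: hypothetical lists, e.g. from a register and a trade fair.
register = ["https://www.acme.example/", "steelworks.example"]
trade_fair = ["http://acme.example/products", "https://castings.example"]
bottom_up = build_bottom_up(register, trade_fair)

print(sorted(top_down), sorted(bottom_up))
```

In this toy example the Top-Down set misses castings.example because it was never crawled, while the Bottom-Up set contains only what its sources happen to list.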

Luckily, you don’t have to solve your data challenges on your own. The number of free data sources, data vendors and data-driven solution providers is constantly growing, so you have many options. Whether you need a turnkey solution or want to build your own powerful data foundation, we support you as your end-to-end partner for external data. Just get in touch with us via info@alpha-affinity.com or use our contact form.