Methodology
Measuring accuracy in open-web brand data has real theoretical and practical limits. We want to be explicit about how the numbers in this document should be interpreted.
A defining property of our dataset is that it must work across the full range of domains on the open internet. Input quality varies materially, and output quality is a direct function of it. When a site is slow, broken, or content-thin, the resulting data footprint is thinner, which reduces coverage. We do not compensate for this. A thin footprint is itself a strong indicator of low brand maturity or a weak digital presence.
For practical purposes, this evaluation is sample-based. Our goal is to construct samples that are random enough to avoid gaming, but representative enough to reflect real production usage.
We therefore built two datasets:
- Core distribution. A baseline sample across public and private brands, primarily in the US and EU.
- Long-tail distribution. An uncurated global sample of micro-businesses and local service providers.
The results should be read with three principles in mind:
- First-party sources. We rely primarily on first-party surfaces (the brand’s own website and managed social profiles) and cross-check across them when possible to increase accuracy.
- Long-tail coverage. We deliver strong performance on the long tail, showing that the pipeline works beyond curated cases.
- Non-padding. When we return null, it reflects weak, missing, or ambiguous public signals. In open-web data, missing information often correlates with low brand maturity or weak digital presence.
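To make the non-padding principle concrete, coverage for an attribute can be read as the share of sampled brands for which a non-null value is returned. The following is a minimal sketch under that assumption. The record layout, the field names, and the `coverage` helper are hypothetical illustrations, not the evaluation code behind this document.

```python
from typing import Any, Optional


def coverage(records: list[dict[str, Optional[Any]]], field: str) -> float:
    """Share of records where `field` is present and not None (null)."""
    if not records:
        raise ValueError("cannot compute coverage on an empty sample")
    hits = sum(1 for r in records if r.get(field) is not None)
    return hits / len(records)


# Hypothetical sample: None stands in for a null returned by the pipeline.
sample = [
    {"logo": "logo-a.svg", "foundedYear": 1998},
    {"logo": "logo-b.png", "foundedYear": None},
    {"logo": None, "foundedYear": None},
    {"logo": "logo-d.svg", "foundedYear": 2015},
]

logo_coverage = coverage(sample, "logo")            # 3 of 4 records
founded_coverage = coverage(sample, "foundedYear")  # 2 of 4 records
print(f"logo: {logo_coverage:.1%}, foundedYear: {founded_coverage:.1%}")
# prints "logo: 75.0%, foundedYear: 50.0%"
```

Nulls are counted as misses rather than dropped from the denominator, so a thin footprint lowers coverage instead of being hidden.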
The attributes reported here are the primary inputs used to compute both production and evaluation signals in the Signal Catalog.
Core distribution
Coverage was measured on an expanded sample of 542 brands built from existing financial-industry customer datasets. The sample is 30–40% public and 60–70% private (startups and SMBs). It is mostly US and EU, with extra weight on SMBs, local businesses, and long-tail domains to stress-test performance outside large enterprise brands.
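Because the evaluation is sample-based, each coverage figure carries sampling uncertainty. As a rough illustration, the sketch below computes a 95% Wilson score interval for a coverage of 95% observed on 542 brands. This interval is not part of the methodology described here. It assumes a simple random sample, which the deliberate extra weighting toward SMBs and long-tail domains does not strictly satisfy, so the result should be read as indicative only.

```python
import math


def wilson_interval(p_hat: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion p_hat observed on n samples.

    z = 1.96 corresponds to an approximate 95% confidence level.
    """
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# Illustrative inputs taken from this section: 95% coverage on 542 brands.
low, high = wilson_interval(0.95, 542)
print(f"95% coverage on n=542: {low:.1%} to {high:.1%}")
```

With these inputs the interval is a few percentage points wide, roughly 93% to 97%, which is a useful scale to keep in mind when comparing datapoints whose coverage differs by only a point or two.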
Core attributes
Core attributes used to identify and resolve merchant entities.
| Datapoint | Coverage |
|---|---|
| Logo (Any) | 95% |
| Logo (Dark) | 87.5% |
| Icon (Any) | 83.8% |
| Logo (Light) | 58.3% |
| Symbols | 33.4% |
| Colors | 97% |
| Banner | 84% |
Key takeaway: Core identity coverage (Logo (Any) and Colors) is at or above 95%. This indicates the system is driven by algorithmic discovery and resolution. Lower coverage for symbols and specific variants is an organic signal of the maturity of a merchant’s digital and brand operations.
Firmographic and financial data
Data density supporting KYB and compliance workflows.
| Datapoint | Coverage |
|---|---|
| Description | 94.1% |
| longDescription | 95% |
| Social Links | 86.5% |
| Country | 77.1% |
| City | 76% |
| Kind | 71% |
| Founded year | 66.8% |
| Employees Count | 79.3% |
| ISIN | 32.1% |
| Stock | 31.7% |
Key takeaway: Coverage drops in a predictable way as fields rely more on formal disclosure and public-company status. This is expected in a mixed enterprise and long-tail sample. Financial identifiers remain limited to public companies.
Long-tail distribution
Coverage was measured on a global sample of 397 micro-businesses. These entities were randomly selected from Google Maps across a diverse set of cities and regions, without filtering for brand maturity, technical sophistication, or even the presence of a working domain.
The sample spans 16 cities across Europe, North America, Latin America, Africa, the Middle East, and Asia (e.g., Lausanne, Rotterdam, London, Barcelona, Tashkent, Dubai, Hanoi, Fukuoka, Dallas, New York, Nagpur, Accra, Lima, Cali, Durban).
This dataset is designed to test performance at the extreme long tail: local, non-tech, offline-first businesses that may lack a formal brand, a maintained website, or any structured public footprint. It represents a lower bound on expected coverage.
Core attributes
Core attributes used to identify and resolve merchant entities.
| Datapoint | Coverage |
|---|---|
| Logo (Any) | 85.9% |
| Logo (Dark) | 65.7% |
| Icon (Any) | 58.4% |
| Logo (Light) | 20.4% |
| Symbols | 0.76% |
| Colors | 79.6% |
| Banner | 36.3% |
Key takeaway: Even at the extreme long tail, 70–82% of merchants still expose at least one core identity signal. When coverage is lower (for example, for symbols), this reflects a weak or immature public footprint, which is itself a meaningful signal.
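To see where the long tail diverges most from the core distribution, the two core-attribute tables can be compared datapoint by datapoint. The sketch below uses only the coverage figures reported in this document. The dictionary layout and variable names are illustrative assumptions.

```python
# Coverage (%) copied from the core-attribute tables in this document.
core = {
    "Logo (Any)": 95.0, "Logo (Dark)": 87.5, "Icon (Any)": 83.8,
    "Logo (Light)": 58.3, "Symbols": 33.4, "Colors": 97.0, "Banner": 84.0,
}
long_tail = {
    "Logo (Any)": 85.9, "Logo (Dark)": 65.7, "Icon (Any)": 58.4,
    "Logo (Light)": 20.4, "Symbols": 0.76, "Colors": 79.6, "Banner": 36.3,
}

# Gap in percentage points between core and long-tail coverage.
gaps = {name: round(core[name] - long_tail[name], 2) for name in core}

# List datapoints from the smallest gap to the largest.
for name, gap in sorted(gaps.items(), key=lambda item: item[1]):
    print(f"{name:<13} {gap:6.2f} pts")
```

On these figures the smallest gap is Logo (Any) at 9.1 points and the largest is Banner at 47.7 points, which matches the reading that basic identity survives at the long tail while richer brand assets thin out.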
Firmographic and financial data
Data density supporting KYB and compliance workflows.
| Datapoint | Coverage |
|---|---|
| Description | 80.6% |
| longDescription | 88.9% |
| Social Links | 70.0% |
| Country | 35.8% |
| City | 34.5% |
| Kind | 34.0% |
| Founded year | 27.7% |
| Employees Count | 36.5% |
Key takeaway: Given that these are random micro-businesses with very little formal data, this level of coverage is strong. With descriptive and social signals at roughly 70–89% coverage, most businesses still expose enough digital surface to allow identity resolution, even when traditional firmographics are missing.