How We Source and Verify Contact Data

The most common question we get from new clients is some version of: "Where does this data actually come from, and how do I know the phone number is real?" It is a fair question. The lead data industry has a long history of recycled lists, inflated hit rates, and vague sourcing language. This piece explains exactly what we do, step by step, including the parts that add cost and time but produce a meaningfully different product.

Step 1: Direct Pulls from Secretary of State Records

Every record starts at the Secretary of State level. We pull LLC formation filings from official state databases across our 12 covered states. This is not web scraping. Each state offers some combination of official data feeds, bulk download services, or structured APIs for commercial data users. We access the primary source, not a third party that has already processed it.

Why this matters: scraping introduces lag and error. A scraped record from Texas might be 3 to 7 days behind a direct pull. A scraper may also miss filings that appear in the bulk data but not yet on the public search interface, or pick up test records and duplicates that the official feed resolves correctly. We get the filing data the same day it becomes available through official channels. For most states, that means the same day as the filing or the next business day.

Each filing record at this stage contains the company name, formation date, registered agent, organizer name and address, and the principal office address when provided. This is the legal record. It is accurate as of filing, but it does not contain the phone number or email address you actually need to reach someone.

Step 2: Parsing the Filing for Identity Signals

Most LLC filings are thin. The minimum required information is the company name, a registered agent, and an organizer. The registered agent is almost always a commercial service, not the owner. The organizer is sometimes an attorney. The principal address might be a home address, a UPS store, or a coworking space.

We parse each filing to extract three identity signals: the organizer name, the principal address, and the company name itself. These three pieces, taken together, give us enough to run enrichment. The company name tells us what kind of business this is. The principal address tells us where the owner is operating. The organizer name is usually the owner, and when it is, that is the contact we are looking for.

When the organizer is clearly an attorney or a registered agent mill (identifiable by name patterns and by addresses that match our internal blocklist), we mark the organizer field as non-contactable and shift enrichment to the other two signals, the principal address and the company name.
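The routing rule described above can be sketched in a few lines of Python. This is an illustration only, not our production matcher: the blocklist entries, the field names, and the `enrichment_target` helper are all hypothetical, and the fallback query (company name plus principal address) is an assumption about what replaces the organizer.

```python
from dataclasses import dataclass

# Hypothetical blocklist entries, for illustration only.
BLOCKED_NAME_TERMS = ("registered agent", "law office", "attorney at law")
BLOCKED_ADDRESSES = {"100 example plaza suite 1"}


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in text.lower())
    return " ".join(cleaned.split())


@dataclass
class Filing:
    company_name: str
    organizer_name: str
    organizer_address: str
    principal_address: str


def enrichment_target(filing: Filing) -> dict:
    """Decide which identity signals to send to enrichment."""
    name = normalize(filing.organizer_name)
    address = normalize(filing.organizer_address)
    blocked = any(term in name for term in BLOCKED_NAME_TERMS) or address in BLOCKED_ADDRESSES
    if blocked:
        # Organizer is an attorney or formation service: do not treat it as the owner.
        return {
            "organizer_contactable": False,
            "query": (filing.company_name, filing.principal_address),
        }
    return {
        "organizer_contactable": True,
        "query": (filing.organizer_name, filing.organizer_address),
    }


attorney_filing = Filing("Blue Heron LLC", "Smith Law Office, PLLC",
                         "9 Court St", "12 Oak Lane")
owner_filing = Filing("Blue Heron LLC", "Dana Reyes", "12 Oak Lane", "12 Oak Lane")
print(enrichment_target(attorney_filing))
print(enrichment_target(owner_filing))
```

Normalizing both sides before comparison keeps punctuation and capitalization differences ("Suite 1" versus "suite 1,") from defeating the blocklist.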

Step 3: Contact Enrichment

This is where the filing data becomes lead data. We append phone and email through commercial enrichment providers, cross-referencing against multiple sources before settling on a contact record.

The process is not a single lookup. We run each organizer name and address combination against multiple data providers and score the results by match confidence and source agreement. A phone number that appears in three independent data sources with consistent name matching gets a high confidence score. A phone number that appears in one source with a partial name match gets flagged for additional verification.
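As a rough sketch of how source agreement could translate into a score, the Python below groups provider results by phone number and applies the two cases described above. The `Candidate` fields, the 0.9 name-match cutoff, and the status labels are assumptions made for the example. The source text only defines the two ends of the scale, so everything short of three agreeing sources is routed to additional verification here.

```python
from collections import defaultdict
from dataclasses import dataclass

HIGH_CONFIDENCE_SOURCES = 3   # "three independent data sources"
CONSISTENT_NAME_MATCH = 0.9   # assumed cutoff for a consistent name match


@dataclass(frozen=True)
class Candidate:
    provider: str      # hypothetical provider label
    phone: str
    name_match: float  # 0.0 to 1.0 similarity between provider name and organizer name


def score_phones(candidates):
    """Group candidates by phone number and score each by source agreement."""
    by_phone = defaultdict(list)
    for candidate in candidates:
        by_phone[candidate.phone].append(candidate)

    results = {}
    for phone, hits in by_phone.items():
        sources = len({hit.provider for hit in hits})  # count distinct providers
        weakest = min(hit.name_match for hit in hits)
        if sources >= HIGH_CONFIDENCE_SOURCES and weakest >= CONSISTENT_NAME_MATCH:
            status = "high_confidence"
        else:
            status = "needs_verification"
        results[phone] = {"sources": sources, "status": status}
    return results


candidates = [
    Candidate("provider_a", "5125550101", 0.97),
    Candidate("provider_b", "5125550101", 0.95),
    Candidate("provider_c", "5125550101", 0.93),
    Candidate("provider_a", "5125550199", 0.60),
]
print(score_phones(candidates))
```

Counting distinct providers rather than raw hits matters: the same provider returning a number twice is not independent confirmation.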

We do not output records with no contact data. If enrichment fails to find a phone or email with sufficient confidence, the record is held and we note it as contact-incomplete. These records are not sent to clients. You only receive records where we have at least one contactable field.

What the Enrichment Sources Include

Commercial data providers aggregate contact information from a range of inputs: business registrations, self-reported professional directories, public records, and opt-in data sources. No single provider covers every individual. Running against multiple providers substantially improves coverage without proportionally increasing cost on our end. The specific providers we use are not something we publish, both for competitive reasons and because our provider mix changes as we evaluate quality over time.

Step 4: Verification

Enrichment gives us a candidate contact record. Verification tells us whether that record is live.

Phone Verification

Every phone number goes through a carrier-level ping before delivery. This is not a call. It is a lookup that confirms the number is assigned to an active line and identifies whether it is a mobile or landline. Numbers that return as disconnected, unassigned, or ported out of service are removed.

After carrier verification, we cross-reference against a DNC registry lookup and flag any numbers on federal or state do-not-call lists. We pass those flags to clients in the delivery file so you can make your own decision on outreach method rather than having us silently suppress the record.
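The two phone checks can be illustrated with a short Python sketch. The carrier lookup is passed in as a function because the actual lookup service is not described here; `CarrierResult`, its status values, and the `dnc` flag name are hypothetical. The point of the example is the difference in handling: inactive numbers are dropped, while DNC matches are kept and flagged.

```python
from dataclasses import dataclass


@dataclass
class CarrierResult:
    status: str     # assumed values: "active", "disconnected", "unassigned"
    line_type: str  # "mobile" or "landline"


def verify_phones(phones, carrier_lookup, dnc_numbers):
    """Drop inactive numbers; keep active ones with line type and a DNC flag."""
    deliverable = []
    for phone in phones:
        result = carrier_lookup(phone)
        if result.status != "active":
            continue  # removed before delivery
        deliverable.append({
            "phone": phone,
            "line_type": result.line_type,
            "dnc": phone in dnc_numbers,  # flagged, not suppressed
        })
    return deliverable


# Stand-in lookup table; a real lookup would query a carrier data service.
fake_carrier_data = {
    "5125550101": CarrierResult("active", "mobile"),
    "5125550102": CarrierResult("disconnected", "landline"),
    "5125550103": CarrierResult("active", "landline"),
}
rows = verify_phones(
    ["5125550101", "5125550102", "5125550103"],
    carrier_lookup=fake_carrier_data.__getitem__,
    dnc_numbers={"5125550103"},
)
print(rows)
```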

Email Verification

Email addresses go through a three-stage check: syntax validation, domain MX record verification, and mailbox-level deliverability testing. The mailbox check confirms that the address exists on the mail server without sending a message. Addresses that fail any stage are removed from the deliverable record.
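A minimal Python sketch of the three-stage check follows. The MX lookup and the mailbox probe are injected as functions, since both depend on network services that are outside the scope of this example. The regular expression is a deliberately simple syntax check, not a full RFC 5322 validator, and the result labels are hypothetical.

```python
import re

# Simplified syntax pattern, assumed for illustration.
SIMPLE_EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def verify_email(address, has_mx, mailbox_exists):
    """Run the stages in order and stop at the first failure."""
    if not SIMPLE_EMAIL.fullmatch(address):
        return "failed_syntax"
    domain = address.rsplit("@", 1)[1].lower()
    if not has_mx(domain):           # stage 2: the domain publishes MX records
        return "failed_mx"
    if not mailbox_exists(address):  # stage 3: the mailbox is known to the mail server
        return "failed_mailbox"
    return "deliverable"


# Stand-in checks; real ones would query DNS and the receiving mail server.
known_domains = {"example.com"}
known_mailboxes = {"owner@example.com"}

for candidate in ["owner@example.com", "not-an-email", "a@nomx.test", "ghost@example.com"]:
    status = verify_email(candidate, known_domains.__contains__, known_mailboxes.__contains__)
    print(candidate, status)
```

Ordering the stages from cheapest to most expensive means most bad addresses are rejected before any network call is made.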

Quality Metrics

Based on records delivered in 2024 and Q1 2025:

Metric                                           Rate
Phone connect rate (carrier-verified active)     89%
Email deliverability (mailbox-level verified)    94%
Hard bounce rate (email)                         <3%
Records delivered with both phone and email      71%
Records held for insufficient contact data       ~12% of raw filings

The 89% phone connect rate is a carrier-level metric, not a conversation rate. It means the number is active and can receive calls. Actual answer rates depend on timing, your caller ID reputation, and the individual owner's habits. We report what we can measure objectively.

What We Filter Out

Not every LLC filing represents an actual small business owner who needs services. A significant portion of filings are noise for our clients' purposes, and we filter them before delivery.

Registered Agent Mills

Companies like Northwest Registered Agent, Incfile, ZenBusiness, and dozens of others act as the registered agent and often the organizer for LLCs they form on behalf of clients nationwide. A filing where the organizer is one of these services has no useful owner contact information at the filing level. We maintain a blocklist of known formation mills and their address patterns. Filings that match are filtered unless we can identify and verify an underlying owner through enrichment.

Franchise Filings

A new Subway franchise or Anytime Fitness location forming an LLC is not an independent business owner who needs business insurance or a new CPA. It is a franchisee who already has vendor relationships dictated by the franchise agreement. These filings are identifiable by company name patterns and registered agent information. We filter them.

Shelf Companies and Serial Filers

Some individuals file multiple LLCs in a short period, often for asset protection structures, real estate holdings, or speculative purposes. A person who files 6 LLCs in 30 days is not 6 separate new business prospects. We identify serial filer patterns and consolidate or flag these records accordingly.
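One way to detect this pattern is a sliding window over each organizer's filing dates, sketched below in Python. The thresholds are borrowed from the example above (6 filings in 30 days) and are not necessarily the production cutoffs. The organizer key is assumed to be a normalized name and address that already identifies one person.

```python
from collections import defaultdict
from datetime import date, timedelta


def serial_filers(filings, window_days=30, min_filings=6):
    """Return organizer keys with at least min_filings inside any window_days span.

    filings is an iterable of (organizer_key, filing_date) pairs.
    """
    dates_by_organizer = defaultdict(list)
    for organizer, filed_on in filings:
        dates_by_organizer[organizer].append(filed_on)

    window = timedelta(days=window_days)
    flagged = set()
    for organizer, dates in dates_by_organizer.items():
        dates.sort()
        start = 0
        for end in range(len(dates)):
            # Shrink the window from the left until it spans at most window_days.
            while dates[end] - dates[start] > window:
                start += 1
            if end - start + 1 >= min_filings:
                flagged.add(organizer)
                break
    return flagged


burst = [("organizer_a", date(2025, 1, day)) for day in (1, 5, 10, 15, 20, 25)]
spread = [("organizer_b", date(2025, 1, 1) + timedelta(days=10 * i)) for i in range(6)]
print(serial_filers(burst + spread))
```

Flagged organizers can then be consolidated into a single prospect record or marked in the delivery file, depending on how the records are handled downstream.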

Large Business Subsidiaries

Publicly traded companies and large private businesses routinely form subsidiary LLCs for specific projects, acquisitions, or liability segmentation. These are identifiable through registered agent patterns (national corporate services firms) and company naming conventions. They are not the small business market our clients serve.

Why Exclusivity Matters

Data quality is partly a function of how many people are using the same records to contact the same individuals. A new LLC owner who receives 12 calls in their first week from insurance agents, accountants, and attorneys is going to stop answering numbers they do not recognize. The data does not degrade in the file, but its usability degrades in the market.

We sell exclusive access by territory and vertical: one insurance agent per county, one CPA per county, and so on. This is not a sales pitch for scarcity. It is a recognition that the product stops working if we sell it to everyone who asks. A contact record with a 94% deliverability rate that 10 different people have already emailed is worth considerably less than one that only one person has emailed.

The exclusivity model also gives us better signal on data quality. When a client in a territory reports that a contact bounced or a number was wrong, we know it came from us and not from a competing provider. That feedback loop improves the product over time in a way that is not possible when we are one of many sources a client is mixing together.

If you want to see what a sample file looks like for your area, the request link below will get you actual records from the most recent filing batch, not a demo with fake names.

See what filed this week.

Free sample from your area. Real data, real contact info.

Request Sample