How to clean a messy lead list: the 7-step playbook small sales teams actually use

A practical, repeatable workflow for turning a chaotic CSV of inbound leads into a clean, deduplicated, follow-up-ready list — with the formulas, gotchas, and AI shortcuts that save you the afternoon you'd otherwise lose to it.

May 14, 20269 min read

The hidden cost of a messy lead list

Most small sales teams quietly bleed 10–30% of their pipeline to data hygiene problems and never put a number on it. The leaks are mundane, which is why no one fixes them: a duplicate gets called by two reps, the lead with no phone number gets skipped, the row that says na, na, nasits in the queue forever, and the messy CSV from last Tuesday's campaign breaks the import into HubSpot so you give up halfway through.

You don't need a CRM consultant to fix this. You need a 30-minute ritual you actually run. Here's the 7-step playbook teams actually use, in order, to turn a chaotic spreadsheet of inbound leads into a clean, deduplicated, follow-up-ready list.

Step 1: Get every lead into one place

Cleanup starts with consolidation. Before you touch a single row, pull every place leads currently live into one master file: the CSV export from your ad platform, the Google Sheet your receptionist maintains, the inbox where form submissions land, the trade-show scanner export, the rough notes you took on the phone last week.

Two practical rules for this step:

Date-stamp the master file — leads_master_2026-05-14.csv. You'll want a snapshot if anything goes wrong later, and future-you needs to know which version is current.
Don't clean partial copies.If you start deduping the Google Sheet without merging in the trade-show export first, you'll find the duplicates again the next time you consolidate. Always consolidate first, then clean once.

If you're in Google Sheets, the fastest way to stack two sheets vertically is the array literal: ={SheetA!A:G; SheetB!A:G}. The semicolon stacks rows; the comma would put them side by side. Same headers in both sheets is a precondition — which is conveniently the next step.

Step 2: Standardise your column headers

The most common header chaos: Name, Full Name, Lead Name, First, Customer Name, and Contactall meaning the same field across five different sources. You can't dedupe what isn't aligned.

The canonical schema small sales teams should standardise on:

first_name
last_name
email
phone
source
status
notes
created_at
next_follow_up

These names map cleanly to HubSpot, Salesforce, Pipedrive, and Zoho import schemas, so future-you doesn't have to rename anything. One nuance: AI-powered cleaners read row content rather than relying on column names, so if you go that route you can technically skip this step. But if you're cleaning manually, do it now — everything downstream depends on it.

Step 3: Normalise phones and emails

You can't dedupe what isn't normalised. Two rows with (604) 555-1234 and +1-604-555-1234 will look like two different leads to a string match. Same with Mike@Example.com and mike@example.com.

Phones. Pick one format and convert everything to it. The international standard is E.164: +1XXXYYYZZZZ(no spaces, no dashes, no parens, country code in front). In Google Sheets, strip everything that isn't a digit with =REGEXREPLACE(A2, "[^0-9]", ""), then prepend the country code with ="+1" & A2. Leads from outside North America need their country code preserved — if every lead is local, you can hardcode it.

Emails. Lowercase and trim whitespace. =LOWER(TRIM(B2)). That's it — don't try to validate or correct typos in this step, you'll surface those later.

One trap to avoid: don't overwrite the original column. Add a phone_clean and email_clean alongside the raw fields so you can audit what changed if a row looks wrong.

Step 4: Remove duplicate leads

Now you can dedupe. The match priority that works in practice:

Email match (strict). Same normalised email = same person. Auto-merge.
Phone match (strict). Same normalised phone = same person. Auto-merge.
Name + locality match (probable).Same first + last name and same area code or city, different email/phone — flag for human review, don't auto-merge.

The simplest spreadsheet method: in a helper column, =COUNTIF(B$2:B2, B2)>1 will return TRUE on every row whose email already appeared in a row above. Filter for TRUE and review.

Merge, don't delete. If row A has (jane@example.com, no phone) and row B has (jane@example.com, +16045551234), delete row A and keep row B with the phone preserved. You almost never want to lose a known contact method.

The edge case worth thinking about: same first name, same last name, same company, two different emails. That can be the same person (work + personal) or two literal different humans (siblings, common name). Surface those for review before merging — calling the wrong "Mike Brown" is the kind of mistake that ends a conversation.

Step 5: Filter junk rows

Some rows shouldn't be in the pipeline at all. Junk falls into three buckets:

No contact method. No email, no phone, no social handle. There is no way to follow up. Skip.
Test entries. test test test 555-555-5555, asdf, asdf, asdf@asdf.com. Common when a developer was checking the form last week.
Bot/spam form fills. Random Cyrillic name, gibberish email, no phone. Easy to spot in batch.

Don't auto-delete junk.Move it to a separate "rejects" sheet so you can audit volume by week. A sudden spike in junk usually means the form has been picked up by a bot or the "phone is required" rule got dropped from the landing page during a redesign. The reject sheet is your early warning system.

Step 6: Enforce required fields before CRM import

Most CRM imports fail because rows are missing the fields the CRM insists on. The non-negotiables, by tool:

HubSpot— first name, last name, and at least one of email or phone.
Salesforce— last name and assigned owner. Email and company are usually required by org-level validation rules even when the field is technically optional.
Pipedrive— person name, plus a contact method. Stage and owner if you're importing as deals.

Add a single helper column called _ready_for_importthat returns TRUE only when every required field is non-empty. Filter the sheet to TRUE rows for the import; keep FALSE rows in a separate sheet you can come back to once you've enriched the missing data manually.

This single step prevents the import-rejected-rows nightmare where you spend an afternoon hand-fixing a CSV the CRM keeps spitting back at you.

Step 7: Save an audit copy and import

Before you run the import, save two files side by side:

leads_clean_2026-05-14.csv — the CRM-ready sheet you're about to import.
leads_rejects_2026-05-14.csv — everything you removed: dupes, junk, missing-required rows.

Two reasons this matters. One: if the CRM import goes sideways (bad mapping, wrong owner assigned, accidentally tagged the wrong campaign), you have the original to reimport from. Two: when someone asks "hey, what happened to that lead from Bob's referral last month?" you have the trail to answer them. Without an audit copy, the only honest answer is "I don't know."

When the manual playbook is too slow

The seven steps above work. They're also slow. A clean cycle on 200 messy rows is 30–60 minutes if you're fluent in the formulas, longer if you're looking up syntax as you go. If you run a campaign every week, that's an afternoon a month spent deduping spreadsheets instead of selling.

The natural next step is automation. An AI CSV lead cleaner runs the same playbook in one pass: extracts fields from messy rows (even when headers are missing or merged), normalises phones and emails, detects duplicates by email and phone, flags junk rows for review, and outputs a CRM-ready CSV. Same canonical schema, same merge-don't-delete logic, same "flag for review" behaviour on probable matches. The difference is it takes 60 seconds instead of an afternoon.

The other half of the equation, once your list is clean, is making sure no lead falls through the follow-up cracks — which is a different problem with a different toolset. If you're feeling that pain, the companion playbook is tracking sales follow-ups without a full CRM.

Honest advice: if you only run one campaign a quarter and your list is small, manual cleanup is fine. The time-to-automation tipping point is usually around "I'm doing this every week and starting to dread it."

The 7-step checklist (save this)

The whole playbook in scannable form:

Consolidate. Pull every lead source into one date-stamped master file before you touch any row.
Standardise headers. first_name, last_name, email, phone, source, status, notes, created_at, next_follow_up.
Normalise. Phones to E.164, emails lowercased and trimmed, into clean helper columns.
Dedupe.Match on email, then phone, flag probable name+locality matches for review. Merge, don't delete.
Filter junk. No-contact rows, test entries, bot spam to a separate rejects sheet.
Required fields. Helper column _ready_for_import; only import TRUE rows.
Save audit copies. Date-stamped clean and rejects files before you import.

Run this every time a campaign drops fresh leads. Or skip the spreadsheet altogether and let the AI do it — the workflow is the same either way.

Stop fighting your spreadsheet.

Lead Sorted does the cleanup, dedup, and follow-up tracking you keep meaning to get to. Free demo, no card, 14-day trial on every paid plan.

Try the demo See pricing

Discord