Helper Function: Normalize Phone Numbers

Published May 15, 2025

This is the first in a series of helpful R functions that I’ve developed, massaged, and used over the years. I’m titling these “helpR” posts in the naming style of many R packages.

This first function normalizes phone numbers. I wrote it while trying to match user-entered numbers from our patient guarantor data with those from our fundraising database.
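Here is a minimal illustration, with made-up numbers, of why normalization matters for this kind of matching. The phone_key() helper is hypothetical. It performs only the punctuation-and-space stripping step, not the full formatting:

```r
# Made-up user-entered numbers from two hypothetical sources
guarantor_phones <- c("(617) 555-1234", "555.9876", "617 555 0000")
donor_phones <- c("617-555-1234", "555-9876", "508-555-1111")

# Exact string comparison misses numbers that differ only in formatting
raw_matches <- guarantor_phones %in% donor_phones

# Hypothetical key: strip punctuation and whitespace so only the digits remain
phone_key <- function(x) gsub("[[:punct:][:space:]]", "", x)

# Comparing the keys finds the numbers that are really the same
key_matches <- phone_key(guarantor_phones) %in% phone_key(donor_phones)
print(raw_matches)
print(key_matches)
```

Exact comparison finds no matches. Comparing the stripped keys matches the first two numbers.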

It strips punctuation and spaces from each number, then formats the remaining digits as:

•   XXX-XXXX for 7-digit numbers
•   XXX-XXX-XXXX for 10-digit numbers

Numbers that don't have exactly 7 or 10 digits are treated as invalid and replaced with a value you choose through the invalid argument (NA by default).

This doesn't handle numbers with extensions or country codes. A leading 1, for example, leaves 11 digits, so the number is marked invalid. Because my use case is mainly home and cell phone numbers, that hasn't been a big issue so far.
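If US country codes did start to show up, one possible pre-processing step would be to drop a leading 1 from 11-digit strings before normalizing. This is a sketch only. strip_us_country_code() is a hypothetical helper, not part of the function below, and it assumes that every 11-digit number starting with 1 is a US number:

```r
# Hypothetical helper (not part of normalize_phone()): drop a leading US
# country code "1" from 11-digit strings so they become 10-digit numbers
strip_us_country_code <- function(digits) {
  sub("^1(\\d{10})$", "\\1", digits)
}

# Made-up entries: one with a country code, one plain, one with an extension
entered <- c("+1 (617) 555-1234", "617-555-1234", "555-1234 ext. 89")

# Strip punctuation and whitespace first, as normalize_phone() does
digits <- gsub("[[:punct:][:space:]]", "", entered)

# Only the 11-digit string starting with 1 is shortened
cleaned <- strip_us_country_code(digits)
print(cleaned)
```

The extension case still leaves letters and extra digits behind, so it would remain invalid.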

# phone is a character vector of phone numbers to be cleaned
# invalid is what to assign to numbers that don't have 7 or 10 digits
normalize_phone <- function(phone, invalid = NA) {
  # remove punctuation and whitespace (spaces, tabs, newlines)
  phone <- gsub("[[:punct:][:space:]]", "", phone)
  # flag strings made up of exactly 7 or exactly 10 digits; anything else,
  # including leftover letters or NA input, counts as invalid
  is7 <- grepl("^\\d{7}$", phone)
  is10 <- grepl("^\\d{10}$", phone)
  # format 7-digit numbers as XXX-XXXX
  phone[is7] <- sub("^(\\d{3})(\\d{4})$", "\\1-\\2", phone[is7])
  # format 10-digit numbers as XXX-XXX-XXXX
  phone[is10] <- sub("^(\\d{3})(\\d{3})(\\d{4})$", "\\1-\\2-\\3", phone[is10])
  # mark invalid numbers last, so a digit-like invalid value is never reformatted
  phone[!(is7 | is10)] <- invalid
  # return the cleaned numbers
  phone
}
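The formatting steps rely on regular-expression capture groups. Each parenthesized group in the pattern captures a run of digits, and \1, \2, and \3 in the replacement (written "\\1" and so on inside an R string) put those runs back with dashes in between. Here are the same patterns applied to a few made-up digit strings:

```r
# 7 digits: two capture groups, rejoined with one dash
seven <- sub("^(\\d{3})(\\d{4})$", "\\1-\\2", "5551234")
# 10 digits: three capture groups, rejoined with two dashes
ten <- sub("^(\\d{3})(\\d{3})(\\d{4})$", "\\1-\\2-\\3", "6175551234")
# A string of the wrong length fails to match, so sub() returns it unchanged
short <- sub("^(\\d{3})(\\d{4})$", "\\1-\\2", "55512")
print(c(seven, ten, short))
```

A non-matching string comes back unchanged, so the regex alone can't flag bad input. That is why numbers without 7 or 10 digits need to be marked invalid in a separate step.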