# phone is a character vector of phone numbers to be cleaned
# invalid is what to assign to numbers that don't have 7 or 10 digits
normalize_phone <- function(phone, invalid = NA) {
  # remove punctuation
  phone <- gsub("[[:punct:]]", "", phone)
  # remove spaces
  phone <- gsub(" ", "", phone)
  # mark invalid numbers
  phone[!nchar(phone) %in% c(7, 10)] <- invalid
  # format 7-digit numbers
  phone[nchar(phone) %in% 7] <- gsub(
    "(^\\d{3})(\\d{4}$)",
    "\\1-\\2",
    phone[nchar(phone) %in% 7]
  )
  # format 10-digit numbers
  phone[nchar(phone) %in% 10] <- gsub(
    "(^\\d{3})(\\d{3})(\\d{4}$)",
    "\\1-\\2-\\3",
    phone[nchar(phone) %in% 10]
  )
  # return the cleaned number
  phone
}
This is the first in a series of helpful R functions that I’ve developed, massaged, and used over the years. I’m titling these “helpR” posts in the naming style of many R packages.
This first function normalizes phone numbers. I wrote it while trying to match user-entered numbers from our patient guarantor data with those from our fundraising database. It strips punctuation and spaces, then formats what remains as:
• XXX-XXXX for 7-digit numbers
• XXX-XXX-XXXX for 10-digit numbers
Invalid numbers (not 7 or 10 digits) are replaced with a custom value (default NA).
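The two output formats come from regular-expression capture groups: each parenthesized group grabs a block of digits, and the replacement string rejoins the blocks with hyphens. Here is a minimal sketch of just that step, using the same patterns on made-up digit strings (the variable names `seven`, `ten`, `seven_fmt`, and `ten_fmt` are only for illustration):

```r
# Made-up digit strings, assumed already stripped of punctuation and spaces
seven <- "1234567"
ten   <- "5551234567"

# Two capture groups (3 + 4 digits), rejoined with a hyphen
seven_fmt <- gsub("(^\\d{3})(\\d{4}$)", "\\1-\\2", seven)

# Three capture groups (3 + 3 + 4 digits), rejoined with hyphens
ten_fmt <- gsub("(^\\d{3})(\\d{3})(\\d{4}$)", "\\1-\\2-\\3", ten)

seven_fmt  # "123-4567"
ten_fmt    # "555-123-4567"
```

Because the patterns are anchored with `^` and `$` and only match digits, a string of the right length that contains letters is left unchanged by this step.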
Obviously this doesn’t work for numbers that have extensions or country codes, but as my use case is primarily for home and cell phone numbers, that hasn’t been a big issue so far.
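To see where extensions and country codes fall through, here is a rough sketch that applies the same two cleaning steps to a few made-up inputs and checks the resulting lengths. The sample numbers and the names `raw`, `cleaned`, `n_chars`, and `is_valid` are hypothetical, not from my data:

```r
# Made-up sample inputs: two ordinary numbers, one with a country code,
# one with an extension
raw <- c("(555) 123-4567", "123.4567", "1-555-123-4567", "555-1234 ext. 12")

# Same cleaning steps as the function: drop punctuation, then spaces
cleaned <- gsub(" ", "", gsub("[[:punct:]]", "", raw))

# The validity check counts characters, not digits
n_chars <- nchar(cleaned)

# TRUE where the entry would be kept and formatted
is_valid <- n_chars %in% c(7, 10)

cleaned   # "5551234567" "1234567" "15551234567" "5551234ext12"
n_chars   # 10 7 11 12
is_valid  # TRUE TRUE FALSE FALSE
```

The last two entries end up with 11 and 12 characters, so they would be replaced with the invalid value rather than formatted.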