Blog

Notes and tutorials on R, data science, and prospect development.
Written for my future self and hopefully useful to you too.

Stopping Scripts in R

January 15, 2026

I use a series of scripts to generate regular reports, and I often want the process to pause at key points—either to check progress or to flag issues that need my attention. One common example involves a lookup table that maps each fundraiser to their team. I append that team information to my reports so I can easily segment the data by team and individual fundraiser.

But what happens when something changes? A new team member joins and isn’t in the lookup table yet. Someone gets married and their last name changes. Suddenly, that person is no longer associated with the correct team, and the report quietly becomes wrong. Here’s how I catch those issues before they slip through.

Build the Data

Portfolio

#| warning: false
#| message: false

# Load the dplyr library
library(dplyr)

# Build a sample portfolio
df <- tibble(fundraiser = c("Carol", 
                            "Bob", 
                            "Rachel"),
             constituent = c("Charles Dickens", 
                             "Victor Hugo", 
                             "Jane Austen")
             )

# Look at the data
df

Lookup Table

# Create a fundraiser lookup table
lookup <- tibble(fundraiser = c("Carol", "Rachel"),
                 team = c("Major Giving", "Annual Giving"))

# Look at the data
lookup

Join the Data

Now, let’s see what happens when I join lookup to df.

df <- left_join(df, lookup, by = c("fundraiser" = "fundraiser"))

df

If I wasn’t paying attention, I might keep going, produce a beautifully formatted report, save it to Excel, and only then realize that Bob isn’t connected with a team.

Stopping the Script

There are a few ways to have R stop the script. The basic structure is if a condition exists, then do something. Here, we check for empty strings or NA’s in the data and, if they’re found, stop and display the message.

#| error: true

if(any(df$team == "" | is.na(df$team))) {
  stop(paste("Someone is missing a team. Does the lookup table need to be updated?"))
}

This is a simple way to throw an error if something doesn’t look right, but we can take it a step further by listing the names of the fundraisers who are missing a team.

#| error: true

if(any(df$team == "" | is.na(df$team))) {
  stop(
    paste(
      "These fundraisers have no team:",
      paste(df$fundraiser[df$team == "" | is.na(df$team)], collapse = ", ")
    )
  )
}

Wrapping it in a Function

It is conceiveable that we might want to run similar checks on other columns in the data. Here is a function that does the same process, but wrapped in a more flexible package that can be pointed at any data set, name column, and target column.

check_missing <- function(data, name_col, target_col) {
  
  # Convert column names to symbols
  name_col <- rlang::ensym(name_col)
  target_col <- rlang::ensym(target_col)
  
  # Identify rows where the target column is blank or NA
  missing_rows <- data |> 
    dplyr::filter(is.na(!!target_col) | !!target_col == "") |> 
    dplyr::pull(!!name_col)
  
  # If any are missing, stop with a helpful message
  if(length(missing_rows) > 0) {
    stop(
      paste(
        "These entries have missing values in", rlang::as_string(target_col), "-", paste(missing_rows, collapse = ", ")
      )
    )
  }
  invisible(TRUE)
}

#| error: true

check_missing(df, fundraiser, team)

→ The Most Important Machine Learning Equations ★

September 01, 2025

link machine learning

Whether you’re building models, optimizing algorithms, or simply trying to understand how ML works under the hood, mastering the core equations is essential. This blog post is designed to be your go-to resource, covering the most critical and “mind-breaking” ML equations—enough to grasp most of the core math behind ML.

I love seeing people dig into the math behind machine learning algorithms. For each one, the blog shows the:

Equation
Explanation
Practical use
Python implementation

→ P-Hacking in Startups ★

June 26, 2025

link statistics

You looked at the results, picked the one that cleared the bar, and ignored the fact that the bar was never calibrated for this setup.

This is a nice, real world example of the use case for the Bonferroni correction.

Helper Function: Load Files

June 12, 2025

This function looks at the files inside of a specified folder and find the most recent file. I often do regular reporting and, depending on the project, I’ll organize my files in one of two ways:

By date I ran the data (i.e 2025-06-12 Report Data/2025-06-12 File 1.csv)
By file type (i.e. Address Data/2025-06-12 Address Data.csv)

In the case of the latter, I want to be able to get the most recent file without manually typing in the date into my code. Doing things that way ensures I will inevitably forget to adjust the date the next time and end up with the wrong results.

Here is the code:

#| eval: false
#| message: false
#| warning: false
#| include: true

library(tidyverse)

load_file_list <- function(folder) {
  
  # Load the list of files into a tibble called 'files' 
  files <- tibble(filename = list.files(folder))
  
  # Extract the dates from the file names in their own column
  # These represent different ways of formatting dates
  files <- files |> 
    mutate(
      filedate = case_when(
        str_detect(filename, "\\d{4}-\\d{2}-\\d{2}") ~ ymd(str_extract(filename, "\\d{4}-\\d{2}-\\d{2}")),
        str_detect(filename, "\\d{2}.\\d{2}.\\d{4}") ~ mdy(str_extract(filename, "\\d{2}.\\d{2}.\\d{4}")),
        str_detect(filename, "\\d{2}.\\d{1}.\\d{4}") ~ mdy(str_extract(filename, "\\d{2}.\\d{1}.\\d{4}")),
        str_detect(filename, "\\d{1}.\\d{1}.\\d{4}") ~ mdy(str_extract(filename, "\\d{1}.\\d{1}.\\d{4}")),
        str_detect(filename, "\\d{8}")               ~ mdy(str_extract(filename, "\\d{8}")),
        TRUE ~ NA_Date_
      )
    )
  
  # Extract the most recent report date for later file naming
  file_date <- files |> 
    filter(filedate == max(filedate)) |> 
    distinct()
}

How do I use it? The first thing I usually do is extract the date of the most recent file. I use this when naming my output files so that the date of my file matches the dates of the data.

The here package is incredibly helpful for referencing files relative to the project you’re working in and prevent things from breaking when switching machines or sharing code.

#| eval: false
#| message: false
#| warning: false
#| include: true

# Get the list of files from the specified folder
address_data <- load_file_list(here::here("data - input", "Address Data"))

# Extract the most recent date
address_date <- pull(pyr_date[,1])

# Ensure the date is formatted as a date
address_date <- ymd(pyr_date)

# Read in the files from the folder
address_df <- read_csv(here("data - input", 
                            "Address Data", 
                            address_date, 
                            "Address Data.csv")
                       )

Getting Time in Stage from Blackbaud’s SKY API

May 28, 2025

prospect development blackbaud r

Background

There are many things I find frustrating about Blackbaud’s Raiser’s Edge NXT product, but one that has caused more headaches than anything else has been the inability to query when a prospect status was applied to a record. I can click on a prospect’s record and see when a status was last changed, but there is no way to get that data for a list of prospects from within lists, queries, reports, or even a SQL backup of the database.

The only place to access that information is via Blackbaud’s SKY API. The specific API - Prospect status (Get) - returns five data points:

comments - comments made when the status was changed
constituent_id - the system record ID of the prospect
days_elapsed - the number of days the prospect has been in the status
start - the start date of the prospect status
status - the current prospect status

Setup

Here are the libraries I’m using.

#| label: libraries
#| eval: false
#| message: false
#| warning: false
#| include: true

# Load libraries
library(tidyverse)
library(httr)
library(jsonlite)

To set up the application, you will need five pieces of information:

Authentication URL
Token URL
Application ID
Client Secret
API Key

1 & 2 are straightforward; the URLS can be found in the code below. 3-5 are a bit more tricky, but Blackbaud has a good walkthrough to get started.

Setting up an Application

When setting up your application, you may need to work with your organization’s database manager to get it connected and authorized with your system.

Tip

I recommend setting all of your application permissions to read only. That way there is no danger of accidentally overwriting or deleting information.

Tip

The website URL is listed as a required field, but that’s more for publishing your application on the marketplace. I used https://example.com in mine.

While I was focused on accessing the prospect status history, I was also interested in playing with Blackbaud’s Power BI Connector. If you do that, adding these redirect URIs to your application can be helpful:

https://oauth.powerbi.com/views/oauthredirect.html
http://localhost:8080
http://localhost:1410

The Code

Authentication

While there are ways to securely store these data points in R, I left mine in the code since this script was just for me. Obviously, you’ll need to paste your own application id, client secret, and API key.

#| label: app-details
#| eval: false
#| message: false
#| warning: false
#| include: true

auth_url <- "https://app.blackbaud.com/oauth/authorize"
token_url = "https://oauth2.sky.blackbaud.com/token"
application_id <- "paste application id here"
client_secret <- "paste client secret"
api_key_primary <- "paste api key"

Now we’ll build the client that will connect to the SKY API.

#| label: client-setup
#| eval: false
#| message: false
#| warning: false
#| include: true

# Build the client 
blackbaud_client <- function() { 
  oauth_app(appname = "sky-api-demo", 
            key = application_id, 
            secret = client_secret) }

Building the authentication function is honestly what took me the longest to figure out.

#| label: auth-setup
#| eval: false
#| message: false
#| warning: false
#| include: true

# Function to get the token 
get_blackbaud_token <- function() { 
  
  # Perform the OAuth 2.0 authorization flow 
  token <- oauth2.0_token( 
    endpoint = oauth_endpoint(authorize = auth_url, 
                              access = token_url), 
    app = blackbaud_client(), 
    scope = "constituent_read", 
    cache = FALSE ) 
  
  # Store the token in the global environment 
  assign("blackbaud_token", token, envir = .GlobalEnv)
  
  # Print token details 
  print(paste("Token: ", 
              token$credentials$access_token)) 
  print(paste("Expires in: ", 
              token$credentials$expires_in, 
              " seconds")) }

This code looks for an existing authorization token. If one doesn’t exist, it opens up a web browser and requests you sign in to authenticate. Once authenticated, the token will be stored in your R environment and you can close your browser.

#| label: authenticate
#| eval: false
#| message: false
#| warning: false
#| include: true
# Fetch the token if it's not already in the environment 
if (!exists("blackbaud_token")) { get_blackbaud_token() 
} else { 
  # Deserialize the token JSON back to an R object 
  blackbaud_token <- fromJSON(blackbaud_token) 
  
  # Print token details 
  print(paste("Token: ",
              blackbaud_token$credentials$access_token))
  print(paste("Expires in: ",
              blackbaud_token$credentials$expires_in, "
              seconds")) 
}

Accessing SKY API

Here is a function to call the prospect status API, pass it the authentication token and record ID and get the response back.

Warning

The API references constituent_id, but be aware that what it actually wants is the system_record_id.

#| label: fun-get-status
#| eval: false
#| message: false
#| warning: false
#| include: true

# Define the function to get the prospect status
get_prospect_status <- function(constituent_id) {
  url <- paste0(
    "https://api.sky.blackbaud.com/constituent/v1/constituents/",
    constituent_id,
    "/prospectstatus"
  )
  response <- GET(
    url,
    add_headers(
      Authorization = paste("Bearer", blackbaud_token$credentials$access_token),
      Accept = "application/json",
      `Bb-Api-Subscription-Key` = api_key_primary
    )
  )

  if (status_code(response) == 200) {
    return(content(response, "parsed"))
  } else {
    stop(
      "Failed to retrieve prospect status. Status code: ",
      status_code(response)
    )
  }
}

To be polite, we need to pay attention to the rate limits that Blackbaud places on calls to its API so that our function isn’t limited or cut off.

These options will be used in the call to break up how many IDs we sent in a batch and to keep track of how many calls we’ve made.

#| label: api-process
#| eval: false
#| message: false
#| warning: false
#| include: true
# Create API Call Process -------------------------------------------------

# Define parameters
batch_size <- 10 # Number of calls per batch
daily_limit <- 25000
calls_made <- 0

Finally, we’re ready to call the API and get our data. Here I pass a tibble - sql_re_statuses - that contains the system record IDs for everyone in Raiser’s Edge who has any prospect status. Since our teams are more familiar with the Constituent ID, I also include any other fields that my team might find helpful when working with these results including Constituent ID and Constituent Name.

#| label: run-api
#| eval: false
#| message: false
#| warning: false
#| include: true

# Process IDs with batching and rate limiting
time_in_stage <- sql_re_statuses |>
  mutate(batch = ceiling(row_number() / batch_size)) |>
  group_by(batch) |>
  group_map(~ {
    if (calls_made >= daily_limit) {
      stop("Daily API limit reached.")
    }
    Sys.sleep(1) # Pause for 1 second between batches
    calls_made <<- calls_made + nrow(.x) # Update the call counter
    .x |> mutate(response = map(record_id, ~ get_prospect_status(.))) # Use map to apply function
  }, .keep = TRUE) %>% # Keep grouping structure
  bind_rows()

The Results

Once the API runs (about 15-20 minutes for ~6,500 constituent records), we can take the results, apply a small amount of cleanup, and store them in a tibble for futher analysis or saving.

#| label: clean-results
#| eval: false
#| message: false
#| warning: false
#| include: true

# Combine all results
time_in_stage <- time_in_stage |>
  select(-batch) # Clean up temporary batch column

# Unlist results into a tibble
time_in_stage <- time_in_stage |>
  unnest_wider(response, names_sep = "_") |>
  mutate(response_start = as.Date(response_start))

# Clean up data before saving
time_in_stage <- time_in_stage |>
  select(
    record_id,
    constituent_id,
    constituent_name,
    response_status,
    response_comments,
    response_start,
    response_days_elapsed
  ) |>
  rename(
    prospect_status = response_status,
    ps_comments = response_comments,
    ps_start = response_start,
    ps_days_elapsed = response_days_elapsed
  ) |>
  mutate(ps_start = ymd(ps_start))

Something I’ve started doing is immediately saving the results out to an .RData file. This ensures I have a clean copy in case I accidentally filter the data or restart the session. I can simply load the file and the results will be back in my environment exactly as they were at the end of the previous step.

#| label: save-results
#| eval: false
#| message: false
#| warning: false
#| include: true
# Save an explicit copy of this data since it is so time consuming to refresh
save(time_in_stage, file = here("data - output",
                                sql_refresh_date,
                                paste(sql_refresh_date, 
                                      "Time in Stage - SkyAPI.RData")))

Disqualifying Prospects

May 28, 2025

prospect development

I imagine that most of us are familiar with the traditional prospect statuses that make up the donor lifecycle.

What About Prospects Who Won’t Become Donors?

Not every prospect will become a donor—and that’s where disqualification comes in.

Disqualified prospects are not considered active or eligible for portfolio assignment. Any disqualification must be accompanied by an action that clearly documents the reason(s) behind the decision. These records should be reviewed regularly to determine whether a prospect should be re-evaluated and possibly reintroduced into the prospect pool.

The Importance of Status and Documentation

Many organizations use different disqualification statuses. The success of any status depends on the quality of the documentation supporting it. Each record should summarize:

What status is being applied
Why it is being applied
Specific details justifying the decision

In cases like non-responsiveness, documentation may include a history of contact attempts. In sensitive cases (e.g., medical or legal issues), less specific but still explanatory documentation may be appropriate.

The goal is to maintain a clear history of the prospect’s engagement to guide future staff. Ask yourself: if you left your role tomorrow, would others understand your decisions and actions?

Important

When writing documentation, always keep in mind that a prospect can request their records at any time. Are you comfortable with them seeing what you’ve written?

Disqualification Status Examples

Non-Responsive

A prospect who has not responded to repeated, documented contact attempts—across multiple channels over a defined period—may be marked as “Non-Responsive.” The criteria for frequency, channels, and timing should be defined by your organization.

Before applying this status, fundraisers are expected to have exhausted all known outreach methods.

Not a Current Prospect

These individuals may be willing to give but are currently unable due to timing—such as financial obligations, life events, or other circumstances.

They may remain in a portfolio if active cultivation is ongoing, or be unassigned and revisited later.

Not a Major Giving Prospect

These prospects have the capacity for a major gift but have indicated they will not give at that level. Definitions of “major gift” vary by organization.

Although they may continue giving at lower levels, they’re not currently suitable for major gift portfolios. They may be stewarded elsewhere, with the hope of future capacity-aligned engagement.

Not a Prospect

These are individuals who should not be included in a fundraiser’s portfolio, even if they remain donors. This status may result from the prospect’s own request, staff recommendation, or external input.

Documentation must include the rationale and who recommended the status. In some cases, solicit codes may be added to reflect communication preferences.

Permanent Stewardship

Reserved for a small group, this status applies to individuals who have made a lifetime gift and should not be solicited again (e.g., the widow of a major donor).

Use this designation only with leadership approval, as it signifies a permanent shift from solicitation to appreciation.

Relationship Assignment

These individuals are in the portfolio due to their relationship to a primary prospect (e.g., a prospect’s child or a foundation’s executive director).

They are not prospects themselves but are tracked for context and relational strategy.

Self-Sustaining Donor

Some donors give consistently but are unlikely to increase their giving, regardless of wealth indicators. Examples include year-end donors or those who give during events like Giving Tuesday.

These donors don’t require portfolio assignment, as their giving is independent and recurring.

Helper Function: Normalize Phone Numbers

May 15, 2025

This is the first in a series of helpful R functions that I’ve developed, massaged, and used over the years. I’m titling these “helpR” posts in the naming style of many R packages.

This first function allows for the normalization of phone numbers. I created this function as I was trying to match user-entered numbers from our patient guarantor data with those from our fundraising database.

•   XXX-XXXX for 7-digit numbers
•   XXX-XXX-XXXX for 10-digit numbers

Invalid numbers (not 7 or 10 digits) are replaced with a custom value (default NA).

Obviously this doesn’t work for numbers that have extensions or country codes, but as my use case is primarily for home and cell phone numbers, that hasn’t been a big issue so far.

#| eval: false
#| message: false
#| warning: false
#| include: true

# phone is a character vector of phone numbers to be cleaned
# invalid is what to assign to numbers that don't have 7 or 10 digits
normalize_phone <- function(phone, invalid = NA) {
  # remove punctuation
  phone <- gsub("[[:punct:]]", "", phone)
  # remove spaces
  phone <- gsub(" ", "", phone)
  # mark invalid numbers
  phone[!nchar(phone) %in% c(7, 10)] <- invalid
  # format 7-digit numbers
  phone[nchar(phone) %in% 7] <- gsub(
    "(^\\d{3})(\\d{4}$)",
    "\\1-\\2",
    phone[nchar(phone) %in% 7]
  )
  # format 10-digit numbers
  phone[nchar(phone) %in% 10] <- gsub(
    "(^\\d{3})(\\d{3})(\\d{4}$)",
    "\\1-\\2-\\3",
    phone[nchar(phone) %in% 10]
  )
  # return the cleaned number
  phone
}

Excel to R: VLOOKUP

September 20, 2022

excel r fundraising

One of the advantages to moving away from Microsoft Excel to something like R is the ability to script out exactly what you want the program to do. You might be thinking, “That sounds like a lot of work. I’m already fast in Excel and even have some macros I use to automate my work.” That’s fantastic, and in the short term you may well be quicker in Excel. But what happens if the steps to the macro change or the report is requested more frequently?

Here, we’ll look at recreating the functionality of a VLOOKUP formula in R. In this example, we’ll take a list of gift officers and append their fundraising team. You might imagine using this to create a report segmented by team.

First we need some data. This could come from a .csv or .xlsx file, but for this example I’ll create some sample data.

library(dplyr)
set.seed(123)

fundraisers <- data.frame(PM = randomNames::randomNames(
  n = 10,
  name.order = "first.last",
  name.sep = " "
))

fundraisers

Next, I’ll create a reference table containing my list of fundraisers along with which team they’re a part of. In my regular work, this table usually lives as an Excel file that I update as people come and go from the organization.

lookup <- data.frame(
  Fundraiser = fundraisers$PM,
  Team = c(
    "Major Giving",
    "Annual Giving",
    "Corporate Giving",
    "Corporate Giving",
    "Principal Gifts",
    "Planned Giving",
    "Major Giving",
    "Major Giving",
    "Leadership",
    "Annual Giving"
  )
)

lookup

To join our fundraisers with their respective teams, we can use the left_join function from the dplyr package.

left_join(fundraisers, lookup, by = c("PM" = "Fundraiser"))

Browse by Tag

Reuse

CC BY-SA 4.0