Graphically Pawsitive Vibes

Proposal

This project will graphically depict trends, patterns, and analysis of Animal Shelter data derived from the TidyTuesday initiative. The dataset includes information about animals taken in and released from the shelter; there are variables such as intake type, outcome type, animal type, breed, age, and more. The primary goal of this project is to identify key factors that influence an animals outcome.
Author
Affiliation

Isle of Dogs

School of Information, University of Arizona

Dataset

longbeach <- tt_load("2025-03-04")
longbeach = longbeach$longbeach

This TidyTuesday dataset is a conglomeration of different species retrieved in Long Beach, CA. It contains 22 columns and 29,787 rows from 2021-2025, providing potential insights into rescued animals at the city’s shelter. The underlying data originate from the City of Long Beach Animal Care Services Open Data portal, which publishes daily intake and outcome records for the Long Beach Animal Shelter. The data includes variables such as intake type, outcome type, animal type, age, breed, and geographical jurisdiction. We chose this dataset because it offers an opportunity to explore animal shelters and outcomes. Meaningful and relevant analysis could have direct implications for animal welfare policies and shelter resource management.

Questions

  1. How do intake conditions and animal types affect the animal’s outcome?

    • Hypothesis: Animals with poor intake conditions (sick or injured) will have a higher likelihood of negative outcomes (euthanasia). Certain species, such as dogs, tend to have higher adoption rates compared to others.
  2. Over the years 2021–2025, how have different species and intake conditions influenced the duration from intake to adoption?

    • Hypothesis: Healthy animals will more commonly experience a shorter duration from intake to adoption.

Analysis plan

Variables

The following variables will be used in our analysis:

  • intake_condition: The condition of the animal at the time of intake
  • animal_type: The type of animal
  • outcome_type: The outcome of the animal
  • dob: Date of birth of the animal
  • days_to_outcome: Date the animal had an outcome outcome_date - income_date: Date the animal arrived
  • intake_year: Year of intake
  • species_group: A simplified grouping of animal_type into “Dog,” “Cat,” and “Other”
  • days_to_adoption: The number of days from intake to adoption

Cleaning

We go ahead and clean and mutate the data we know the team will need.

longbeach_clean <- longbeach |>
  
  # standardize intake_condition
  mutate(
    intake_condition = intake_condition |>
      str_squish()             # collapse multiple spaces
      |> str_replace_all("/", " ")  # replace slashes with spaces
      |> str_to_lower()             # convert to lowercase
  ) |>
  
  # create a simplified species_group
  mutate(
    species_group = case_when(
      animal_type == "dog" ~ "Dog",
      animal_type == "cat" ~ "Cat",
      TRUE ~ "Other"
    ) |> 
      factor(levels = c("Dog", "Cat", "Other"))
  ) |>
  
  # compute age_at_intake and flag unknowns
  mutate(
    age_at_intake = as.numeric(intake_date - dob) / 365,
    age_unknown   = is.na(age_at_intake)
  ) |>
  
  # compute days_to_adoption
  mutate(
    days_to_adoption = as.integer(outcome_date - intake_date)
  )

Other Species Breakdown

Dogs and cats account for the majority of shelter intakes, so we recode animal_type into a three-level factor—Dog, Cat, and Other. Because the “Other” category still comprises thousands of records, we include a descending horizontal bar chart of animal_type—filtered to species_group == "Other"—to show the most frequent non-dog/cat species. We order bars by count and label each directly. This helps readers understand which rabbits, birds, reptiles, and other species fall into the “Other” category before we proceed with Questions 1 and 2.

Data Diagnostics

Let’s take a look at the data to see any issues:

longbeach_clean |>
  diagnose() |>
  filter(variables %in% c("intake_condition",
                          "animal_type",
                          "outcome_type",
                          "dob",
                          "days_to_outcome",
                          "intake_year",
                          "species_group",
                          "days_to_adoption")) |>
  formattable()
variables types missing_count missing_percent unique_count unique_rate
animal_type character 0 0.0000000 10 0.0003357169
dob Date 3591 12.0555947 5656 0.1898814919
intake_condition character 0 0.0000000 17 0.0005707188
outcome_type character 187 0.6277906 19 0.0006378622
species_group factor 0 0.0000000 3 0.0001007151
days_to_adoption integer 177 0.5942190 360 0.0120858092

Potential Issues

DOB (date of birth) has a seemingly significant number of missing entries. Upon inspection, this seems not to be a factor in our analysis because of how data will be processed. We will need to check this number and see if the effects are meaningful.

Project Goals

1. How do intake conditions and animal types affect the animal’s outcome?

We will evaluate how intake_condition and animal_type influence the likelihood of outcomes for animals in a shelter environment. We hope to identify disparities to inform shelter strategies. For clarity and consistency, animal_type has been re-coded into a species_group category to capture generalized trends across dogs, cats, and other species. For exploratory purposes, we’ll use all 19 of outcome_type’s unique values. Based on the outcome_type distribution, we’ve decided to mutate outcome_type into three categorical values of death, non-death, and adopted.

longbeach_clean |>
  mutate(outcome_category = case_when(
    outcome_type %in% c("adoption", "foster to adopt", 
                        "homefirst") ~ "adopted",
    outcome_type %in% c("euthanasia", "died", 
                        "disposal") ~ "death",
    outcome_type %in% c("rescue", "transfer", 
                        "return to owner", "shelter, neuter, return", 
                        "return to rescue", "transport", 
                        "community cat", "return to wild habitat", 
                        "foster", "trap, neuter, release", 
                        "missing", "NA", 
                        "duplicate") ~ "non-death",
    TRUE ~ "non-death"
  )) |>
  count(outcome_category)
# A tibble: 3 × 2
  outcome_category     n
  <chr>            <int>
1 adopted           6544
2 death             6285
3 non-death        16958

Independent Variables:

  • intake_condition

  • animal_type

Dependent Variables:

  • outcome_type

Proposed Visualizations

Proportion of Outcomes by Animal Type: Bar plot showing the proportion of outcome_types to each faceted species_group. This will highlight which outcomes dominate each species category.

Distribution of Outcomes Across Intake Condition: Bar plot showing distribution of intake_condition for a specific faceted outcome_type. This will identify which intake conditions are more likely to lead to a certain outcome.

2. Over the years 2021–2025, how have different species and intake conditions influenced the duration from intake to adoption?

We’ll mutate() a new variable days_to_adoption as the time difference between intake and outcome dates, then analyze its variation by intake_year and outcome_type across years. Because dogs and cats dominate the data, we collapse all other species into a three-level factor species_group (“Dog,” “Cat,” “Other”) for clearer comparisons.

Independent Variables:

  • animal_type

  • intake_condition

  • intake_year (derived from intake_date)

  • outcome_type

Dependent Variables:

  • days_to_adoption (computed by outcome_date - intake_date)

Proposed Visualizations

Time‐Series Lines: Median days_to_adoption by intake_year (2021–2025), faceted by species_group.

Violin Distribution: Yearly distribution of days_to_adoption, split by intake_condition

Weekly Plan of Attack

Task Name Status Due Priority Summary
Analysis of dataset and project decision Complete 2025-06-13 High Team found dataset, selected questions. Discussed approach and division of work
Data ingestion & cleaning Complete 2025-06-13 High Load data; inspect structure; derive and clean data
Q1 & Q2 plot exploration and decision Complete 2025-06-13 High Decide on plots and narrative
Draft narrative & interpretation Complete 2025-06-20 High Summarize results; write narrative for both questions
Finalize report & slides Complete 2025-06-30 High Refine visuals; polish text; build and rehearse presentation