Graphically Pawsitive Vibes

Proposal

This project will graphically depict trends, patterns, and analysis of Animal Shelter data derived from the TidyTuesday initiative. The dataset includes information about animals taken in and released from the shelter; there are variables such as intake type, outcome type, animal type, breed, age, and more. The primary goal of this project is to identify key factors that influence an animals outcome.

Author

Affiliation

Isle of Dogs

School of Information, University of Arizona

Dataset

longbeach <- tt_load("2025-03-04")
longbeach = longbeach$longbeach

This TidyTuesday dataset is a conglomeration of different species retrieved in Long Beach, CA. It contains 22 columns and 29,787 rows from 2021-2025, providing potential insights into rescued animals at the city’s shelter. The underlying data originate from the City of Long Beach Animal Care Services Open Data portal, which publishes daily intake and outcome records for the Long Beach Animal Shelter. The data includes variables such as intake type, outcome type, animal type, age, breed, and geographical jurisdiction. We chose this dataset because it offers an opportunity to explore animal shelters and outcomes. Meaningful and relevant analysis could have direct implications for animal welfare policies and shelter resource management.

Questions

How do intake conditions and animal types affect the animal’s outcome?
- Hypothesis: Animals with poor intake conditions (sick or injured) will have a higher likelihood of negative outcomes (euthanasia). Certain species, such as dogs, tend to have higher adoption rates compared to others.
Over the years 2021–2025, how have different species and intake conditions influenced the duration from intake to adoption?
- Hypothesis: Healthy animals will more commonly experience a shorter duration from intake to adoption.

Analysis plan

Variables

The following variables will be used in our analysis:

intake_condition: The condition of the animal at the time of intake
animal_type: The type of animal
outcome_type: The outcome of the animal
dob: Date of birth of the animal
days_to_outcome: Date the animal had an outcome outcome_date - income_date: Date the animal arrived
intake_year: Year of intake
species_group: A simplified grouping of animal_type into “Dog,” “Cat,” and “Other”
days_to_adoption: The number of days from intake to adoption

Cleaning

We go ahead and clean and mutate the data we know the team will need.

longbeach_clean <- longbeach |>
  
  # standardize intake_condition
  mutate(
    intake_condition = intake_condition |>
      str_squish()             # collapse multiple spaces
      |> str_replace_all("/", " ")  # replace slashes with spaces
      |> str_to_lower()             # convert to lowercase
  ) |>
  
  # create a simplified species_group
  mutate(
    species_group = case_when(
      animal_type == "dog" ~ "Dog",
      animal_type == "cat" ~ "Cat",
      TRUE ~ "Other"
    ) |> 
      factor(levels = c("Dog", "Cat", "Other"))
  ) |>
  
  # compute age_at_intake and flag unknowns
  mutate(
    age_at_intake = as.numeric(intake_date - dob) / 365,
    age_unknown   = is.na(age_at_intake)
  ) |>
  
  # compute days_to_adoption
  mutate(
    days_to_adoption = as.integer(outcome_date - intake_date)
  )

Other Species Breakdown

Dogs and cats account for the majority of shelter intakes, so we recode animal_type into a three-level factor—Dog, Cat, and Other. Because the “Other” category still comprises thousands of records, we include a descending horizontal bar chart of animal_type—filtered to species_group == "Other"—to show the most frequent non-dog/cat species. We order bars by count and label each directly. This helps readers understand which rabbits, birds, reptiles, and other species fall into the “Other” category before we proceed with Questions 1 and 2.

Data Diagnostics

Let’s take a look at the data to see any issues:

longbeach_clean |>
  diagnose() |>
  filter(variables %in% c("intake_condition",
                          "animal_type",
                          "outcome_type",
                          "dob",
                          "days_to_outcome",
                          "intake_year",
                          "species_group",
                          "days_to_adoption")) |>
  formattable()

variables	types	missing_count	missing_percent	unique_count	unique_rate
animal_type	character	0	0.0000000	10	0.0003357169
dob	Date	3591	12.0555947	5656	0.1898814919
intake_condition	character	0	0.0000000	17	0.0005707188
outcome_type	character	187	0.6277906	19	0.0006378622
species_group	factor	0	0.0000000	3	0.0001007151
days_to_adoption	integer	177	0.5942190	360	0.0120858092

Potential Issues

DOB (date of birth) has a seemingly significant number of missing entries. Upon inspection, this seems not to be a factor in our analysis because of how data will be processed. We will need to check this number and see if the effects are meaningful.

Project Goals

1. How do intake conditions and animal types affect the animal’s outcome?

We will evaluate how intake_condition and animal_type influence the likelihood of outcomes for animals in a shelter environment. We hope to identify disparities to inform shelter strategies. For clarity and consistency, animal_type has been re-coded into a species_group category to capture generalized trends across dogs, cats, and other species. For exploratory purposes, we’ll use all 19 of outcome_type’s unique values. Based on the outcome_type distribution, we’ve decided to mutate outcome_type into three categorical values of death, non-death, and adopted.

longbeach_clean |>
  mutate(outcome_category = case_when(
    outcome_type %in% c("adoption", "foster to adopt", 
                        "homefirst") ~ "adopted",
    outcome_type %in% c("euthanasia", "died", 
                        "disposal") ~ "death",
    outcome_type %in% c("rescue", "transfer", 
                        "return to owner", "shelter, neuter, return", 
                        "return to rescue", "transport", 
                        "community cat", "return to wild habitat", 
                        "foster", "trap, neuter, release", 
                        "missing", "NA", 
                        "duplicate") ~ "non-death",
    TRUE ~ "non-death"
  )) |>
  count(outcome_category)

# A tibble: 3 × 2
  outcome_category     n
  <chr>            <int>
1 adopted           6544
2 death             6285
3 non-death        16958

Independent Variables:

intake_condition
animal_type

Dependent Variables:

outcome_type

Proposed Visualizations

Proportion of Outcomes by Animal Type: Bar plot showing the proportion of outcome_types to each faceted species_group. This will highlight which outcomes dominate each species category.

Distribution of Outcomes Across Intake Condition: Bar plot showing distribution of intake_condition for a specific faceted outcome_type. This will identify which intake conditions are more likely to lead to a certain outcome.

2. Over the years 2021–2025, how have different species and intake conditions influenced the duration from intake to adoption?

We’ll mutate() a new variable days_to_adoption as the time difference between intake and outcome dates, then analyze its variation by intake_year and outcome_type across years. Because dogs and cats dominate the data, we collapse all other species into a three-level factor species_group (“Dog,” “Cat,” “Other”) for clearer comparisons.

Independent Variables:

animal_type
intake_condition
intake_year (derived from intake_date)
outcome_type

Dependent Variables:

days_to_adoption (computed by outcome_date - intake_date)

Proposed Visualizations

Time‐Series Lines: Median days_to_adoption by intake_year (2021–2025), faceted by species_group.

Violin Distribution: Yearly distribution of days_to_adoption, split by intake_condition

Weekly Plan of Attack

Task Name	Status	Due	Priority	Summary
Analysis of dataset and project decision	Complete	2025-06-13	High	Team found dataset, selected questions. Discussed approach and division of work
Data ingestion & cleaning	Complete	2025-06-13	High	Load data; inspect structure; derive and clean data
Q1 & Q2 plot exploration and decision	Complete	2025-06-13	High	Decide on plots and narrative
Draft narrative & interpretation	Complete	2025-06-20	High	Summarize results; write narrative for both questions
Finalize report & slides	Complete	2025-06-30	High	Refine visuals; polish text; build and rehearse presentation

--- title: "Graphically Pawsitive Vibes" subtitle: "Proposal" author: - name: "Isle of Dogs" affiliations: - name: "School of Information, University of Arizona" description: "This project will graphically depict trends, patterns, and analysis of Animal Shelter data derived from the TidyTuesday initiative. The dataset includes information about animals taken in and released from the shelter; there are variables such as intake type, outcome type, animal type, breed, age, and more. The primary goal of this project is to identify key factors that influence an animals outcome." format: html: code-tools: true code-overflow: wrap code-line-numbers: true embed-resources: true editor: visual code-annotations: hover execute: warning: false --- ```{r} #| label: load-pkgs #| message: false #| echo: false if (!require("pacman")) install.packages("pacman") pacman::p_load(tidyverse, tidytuesdayR, formattable, dlookr) ``` ### Dataset ```{r} #| label: load-dataset longbeach <- tt_load("2025-03-04") longbeach = longbeach$longbeach ``` This TidyTuesday dataset is a conglomeration of different species retrieved in Long Beach, CA. It contains 22 columns and 29,787 rows from 2021-2025, providing potential insights into rescued animals at the city's shelter. The underlying data originate from the City of Long Beach Animal Care Services Open Data portal, which publishes daily intake and outcome records for the Long Beach Animal Shelter. The data includes variables such as `intake type`, `outcome type`, `animal type`, `age`, `breed`, and `geographical jurisdiction`. We chose this dataset because it offers an opportunity to explore animal shelters and outcomes. Meaningful and relevant analysis could have direct implications for animal welfare policies and shelter resource management. ### Questions 1. How do intake conditions and animal types affect the animal's outcome? - Hypothesis: Animals with poor intake conditions (sick or injured) will have a higher likelihood of negative outcomes (euthanasia). Certain species, such as dogs, tend to have higher adoption rates compared to others. 2. Over the years 2021–2025, how have different species and intake conditions influenced the duration from intake to adoption? - Hypothesis: Healthy animals will more commonly experience a shorter duration from intake to adoption. ## Analysis plan ### Variables The following variables will be used in our analysis: - `intake_condition`: The condition of the animal at the time of intake - `animal_type`: The type of animal - `outcome_type`: The outcome of the animal - `dob`: Date of birth of the animal - `days_to_outcome`: Date the animal had an outcome `outcome_date` - `income_date`: Date the animal arrived - `intake_year`: Year of intake - `species_group`: A simplified grouping of `animal_type` into "Dog," "Cat," and "Other" - `days_to_adoption`: The number of days from intake to adoption ### Cleaning We go ahead and clean and mutate the data we know the team will need. ```{r} #| label: data-cleaning longbeach_clean <- longbeach |> # standardize intake_condition mutate( intake_condition = intake_condition |> str_squish() # collapse multiple spaces |> str_replace_all("/", " ") # replace slashes with spaces |> str_to_lower() # convert to lowercase ) |> # create a simplified species_group mutate( species_group = case_when( animal_type == "dog" ~ "Dog", animal_type == "cat" ~ "Cat", TRUE ~ "Other" ) |> factor(levels = c("Dog", "Cat", "Other")) ) |> # compute age_at_intake and flag unknowns mutate( age_at_intake = as.numeric(intake_date - dob) / 365, age_unknown = is.na(age_at_intake) ) |> # compute days_to_adoption mutate( days_to_adoption = as.integer(outcome_date - intake_date) ) ``` #### Other Species Breakdown Dogs and cats account for the majority of shelter intakes, so we recode `animal_type` into a three-level factor—Dog, Cat, and Other. Because the “Other” category still comprises thousands of records, we include a descending horizontal bar chart of `animal_type`—filtered to `species_group == "Other"`—to show the most frequent non-dog/cat species. We order bars by count and label each directly. This helps readers understand which rabbits, birds, reptiles, and other species fall into the “Other” category before we proceed with Questions 1 and 2. #### Data Diagnostics Let's take a look at the data to see any issues: ```{r} #| label: data-diagnostics longbeach_clean |> diagnose() |> filter(variables %in% c("intake_condition", "animal_type", "outcome_type", "dob", "days_to_outcome", "intake_year", "species_group", "days_to_adoption")) |> formattable() ``` ### Potential Issues DOB (date of birth) has a seemingly significant number of missing entries. Upon inspection, this seems not to be a factor in our analysis because of how data will be processed. We will need to check this number and see if the effects are meaningful. ## Project Goals ### 1. How do intake conditions and animal types affect the animal's outcome? We will evaluate how `intake_condition` and `animal_type` influence the likelihood of outcomes for animals in a shelter environment. We hope to identify disparities to inform shelter strategies. For clarity and consistency, `animal_type` has been re-coded into a `species_group` category to capture generalized trends across dogs, cats, and other species. For exploratory purposes, we'll use all 19 of `outcome_type`'s unique values. Based on the `outcome_type` distribution, we've decided to mutate `outcome_type` into three categorical values of `death`, `non-death`, and `adopted`. ```{r} #| label: outcome-type-distribution #| echo: false longbeach_clean |> count(outcome_type) |> ggplot(aes(x = reorder(outcome_type, n), y = n)) + geom_bar(stat = "identity", fill = "cornsilk4") + coord_flip() + labs( title = "Distribution of Outcome Types", x = "Outcome Type", y = "Count" ) + theme_minimal() ``` ```{r} #| label: outcome-category-classification longbeach_clean |> mutate(outcome_category = case_when( outcome_type %in% c("adoption", "foster to adopt", "homefirst") ~ "adopted", outcome_type %in% c("euthanasia", "died", "disposal") ~ "death", outcome_type %in% c("rescue", "transfer", "return to owner", "shelter, neuter, return", "return to rescue", "transport", "community cat", "return to wild habitat", "foster", "trap, neuter, release", "missing", "NA", "duplicate") ~ "non-death", TRUE ~ "non-death" )) |> count(outcome_category) ``` Independent Variables: - `intake_condition` - `animal_type` Dependent Variables: - `outcome_type` #### Proposed Visualizations Proportion of Outcomes by Animal Type: Bar plot showing the proportion of `outcome_types` to each faceted `species_group`. This will highlight which outcomes dominate each species category. Distribution of Outcomes Across Intake Condition: Bar plot showing distribution of `intake_condition` for a specific faceted `outcome_type`. This will identify which intake conditions are more likely to lead to a certain outcome. ### 2. Over the years 2021–2025, how have different species and intake conditions influenced the duration from intake to adoption? We'll mutate() a new variable `days_to_adoption` as the time difference between intake and outcome dates, then analyze its variation by `intake_year` and `outcome_type` across years. Because dogs and cats dominate the data, we collapse all other species into a three-level factor `species_group` (“Dog,” “Cat,” “Other”) for clearer comparisons. ```{r} #| label: species-distribution #| echo: false longbeach_clean |> count(animal_type) |> ggplot(aes(x = reorder(animal_type, n), y = n)) + geom_bar(stat = "identity", fill = "cornsilk4") + coord_flip() + labs( title = "Distribution of Species", x = "Species Type", y = "Count" ) + theme_minimal() ``` Independent Variables: - `animal_type` - `intake_condition` - `intake_year` (derived from `intake_date`) - `outcome_type` Dependent Variables: - `days_to_adoption` (`computed by outcome_date` - `intake_date`) #### Proposed Visualizations Time‐Series Lines: Median `days_to_adoption` by `intake_year` (2021–2025), faceted by `species_group`. Violin Distribution: Yearly distribution of `days_to_adoption`, split by `intake_condition` ## Weekly Plan of Attack | Task Name | Status | Due | Priority | Summary | |---------------|---------------|---------------|---------------|---------------| | Analysis of dataset and project decision | Complete | 2025-06-13 | High | Team found dataset, selected questions. Discussed approach and division of work | | Data ingestion & cleaning | Complete | 2025-06-13 | High | Load data; inspect structure; derive and clean data | | Q1 & Q2 plot exploration and decision | Complete | 2025-06-13 | High | Decide on plots and narrative | | Draft narrative & interpretation | Complete | 2025-06-20 | High | Summarize results; write narrative for both questions | | Finalize report & slides | Complete | 2025-06-30 | High | Refine visuals; polish text; build and rehearse presentation |