Graphically Pawsitive Vibes
Proposal
Dataset
This TidyTuesday dataset is a collection of records for animals of many species taken in by the shelter in Long Beach, CA. It contains 22 columns and 29,787 rows covering 2021–2025, offering insight into the animals rescued by the city's shelter. The underlying data originate from the City of Long Beach Animal Care Services Open Data portal, which publishes daily intake and outcome records for the Long Beach Animal Shelter. The data include variables such as intake type, outcome type, animal type, age, breed, and geographical jurisdiction. We chose this dataset because it lets us explore how animals enter the shelter and what happens to them afterward. Meaningful analysis could have direct implications for animal welfare policies and shelter resource management.
Questions
How do intake conditions and animal types affect the animal’s outcome?
- Hypothesis: Animals with poor intake conditions (sick or injured) will have a higher likelihood of negative outcomes such as euthanasia. Certain species, such as dogs, will have higher adoption rates than others.
Over the years 2021–2025, how have different species and intake conditions influenced the duration from intake to adoption?
- Hypothesis: Healthy animals will more commonly experience a shorter duration from intake to adoption.
Analysis plan
Variables
The following variables will be used in our analysis:
- `intake_condition`: The condition of the animal at the time of intake
- `animal_type`: The type of animal
- `outcome_type`: The outcome of the animal
- `dob`: Date of birth of the animal
- `intake_date`: Date the animal arrived
- `outcome_date`: Date the animal had an outcome
- `days_to_outcome`: Number of days from intake to outcome (`outcome_date` - `intake_date`)
- `intake_year`: Year of intake
- `species_group`: A simplified grouping of `animal_type` into “Dog,” “Cat,” and “Other”
- `days_to_adoption`: The number of days from intake to adoption
Cleaning
We clean the data and derive the variables the team will need.
library(dplyr)
library(stringr)

longbeach_clean <- longbeach |>
  # standardize intake_condition: replace slashes first so that the squish
  # step also collapses any repeated spaces the replacement creates
  mutate(
    intake_condition = intake_condition |>
      str_replace_all("/", " ") |> # replace slashes with spaces
      str_squish() |>              # trim ends and collapse repeated spaces
      str_to_lower()               # convert to lowercase
  ) |>
  # create a simplified species_group
  mutate(
    species_group = case_when(
      animal_type == "dog" ~ "Dog",
      animal_type == "cat" ~ "Cat",
      TRUE ~ "Other"
    ) |>
      factor(levels = c("Dog", "Cat", "Other"))
  ) |>
  # compute age_at_intake in years and flag unknowns
  mutate(
    age_at_intake = as.numeric(intake_date - dob) / 365.25,
    age_unknown = is.na(age_at_intake)
  ) |>
  # compute days_to_adoption (filled for every outcome type, not only
  # adoptions, so restrict to adoptions before analyzing adoption times)
  mutate(
    days_to_adoption = as.integer(outcome_date - intake_date)
  )
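The order of the string operations in the cleaning step matters: if slashes are replaced after squishing, a value like "Injured / Severe" keeps a run of spaces. The sketch below reproduces the normalization in base R, with `gsub()` and `trimws()` standing in for stringr's `str_replace_all()` and `str_squish()`. The input strings are hypothetical, not taken from the dataset, and the helper names `normalize_condition()` and `squish_first()` are illustrative only.

```r
# Hypothetical intake_condition strings (invented; not taken from the dataset)
raw <- c("Ill  Mild", "Injured / Severe", "NORMAL", "sick/injured")

# Order used in the cleaning step: replace slashes, then squish, then lowercase
normalize_condition <- function(x) {
  x <- gsub("/", " ", x, fixed = TRUE)  # slashes become spaces
  x <- trimws(gsub("\\s+", " ", x))     # base-R stand-in for stringr::str_squish()
  tolower(x)
}

# Reversed order: squishing first leaves the spaces created by the replacement
squish_first <- function(x) {
  x <- trimws(gsub("\\s+", " ", x))
  tolower(gsub("/", " ", x, fixed = TRUE))
}

normalize_condition(raw)  # "ill mild" "injured severe" "normal" "sick injured"
squish_first(raw)[2]      # "injured   severe" (three spaces)
```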
Other Species Breakdown
Dogs and cats account for the majority of shelter intakes, so we recode `animal_type` into a three-level factor: Dog, Cat, and Other. Because the “Other” category still comprises thousands of records, we include a descending horizontal bar chart of `animal_type`, filtered to `species_group == "Other"`, to show the most frequent species that are neither dogs nor cats. Bars are ordered by count and labeled directly, so readers can see which species (rabbits, birds, reptiles, and so on) make up the “Other” category before we turn to Questions 1 and 2.
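A minimal base-R sketch of the data behind that chart, assuming a small hypothetical stand-in for `longbeach_clean` with invented `animal_type` values: it rebuilds `species_group` with the same Dog/Cat/Other rule, keeps only the “Other” rows, and sorts the species counts in descending order.

```r
# Hypothetical stand-in for longbeach_clean; animal_type values are invented
shelter <- data.frame(
  animal_type = c("dog", "cat", "rabbit", "bird", "rabbit",
                  "reptile", "bird", "rabbit", "cat", "dog"),
  stringsAsFactors = FALSE
)

# Same Dog / Cat / Other rule as the cleaning step
shelter$species_group <- factor(
  ifelse(shelter$animal_type == "dog", "Dog",
         ifelse(shelter$animal_type == "cat", "Cat", "Other")),
  levels = c("Dog", "Cat", "Other")
)

# Counts of each species inside "Other", largest first
other_counts <- sort(
  table(shelter$animal_type[shelter$species_group == "Other"]),
  decreasing = TRUE
)
other_counts  # rabbit 3, bird 2, reptile 1
```

With the real data, `barplot(rev(other_counts), horiz = TRUE, las = 1)` would draw horizontal bars with the most frequent species at the top; the project itself may use ggplot2 for the final figure.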
Data Diagnostics
We run a quick diagnostic on the cleaned data to check variable types, missing values, and the number of unique values:
library(dlookr)      # diagnose()
library(formattable) # formattable()

longbeach_clean |>
  diagnose() |>
  filter(variables %in% c("intake_condition",
                          "animal_type",
                          "outcome_type",
                          "dob",
                          "days_to_outcome",
                          "intake_year",
                          "species_group",
                          "days_to_adoption")) |>
  formattable()
variables | types | missing_count | missing_percent | unique_count | unique_rate |
---|---|---|---|---|---|
animal_type | character | 0 | 0.0000000 | 10 | 0.0003357169 |
dob | Date | 3591 | 12.0555947 | 5656 | 0.1898814919 |
intake_condition | character | 0 | 0.0000000 | 17 | 0.0005707188 |
outcome_type | character | 187 | 0.6277906 | 19 | 0.0006378622 |
species_group | factor | 0 | 0.0000000 | 3 | 0.0001007151 |
days_to_adoption | integer | 177 | 0.5942190 | 360 | 0.0120858092 |
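To sanity-check the `diagnose()` output, the missing counts and percentages can be recomputed directly. This base-R sketch uses a small hypothetical data frame in place of `longbeach_clean`; the last line reproduces the `dob` figure from the table above (3,591 missing out of 29,787 rows).

```r
# Hypothetical toy data; in the project this would be longbeach_clean
toy <- data.frame(
  dob = as.Date(c("2019-05-01", NA, "2020-01-15", NA)),
  outcome_type = c("adoption", "euthanasia", NA, "transfer"),
  stringsAsFactors = FALSE
)

# Per-column missing count and percentage, analogous to the
# missing_count and missing_percent columns from diagnose()
missing_summary <- data.frame(
  variables = names(toy),
  missing_count = colSums(is.na(toy)),
  missing_percent = colMeans(is.na(toy)) * 100,
  row.names = NULL
)
missing_summary

# The dob row of the table above: 3,591 missing out of 29,787 rows
round(3591 / 29787 * 100, 4)  # 12.0556
```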
Potential Issues
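`dob` (date of birth) is missing for 3,591 records, about 12.1% of the data. Because `dob` only feeds the derived `age_at_intake`, which neither research question relies on, we expect little effect on our analysis; we will still use the `age_unknown` flag to check whether animals without a recorded birth date differ in outcome before setting the issue aside. The gaps in `outcome_type` (187 records, 0.63%) and `days_to_adoption` (177 records, 0.59%) are small by comparison. Finally, `days_to_outcome` and `intake_year` do not appear in the diagnostic output even though the filter requests them, because neither column exists in `longbeach_clean` yet; `intake_year` must be derived from `intake_date` before Question 2.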
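One way to run the check described above is to compare the outcome mix for records with and without a birth date. The sketch assumes a toy data frame holding the `age_unknown` flag from the cleaning step and an `outcome_type` column, with invented values. If the row proportions differ noticeably on the real data, a chi-squared test such as `chisq.test()` on the underlying counts could formalize the comparison.

```r
# Hypothetical toy rows: age_unknown mirrors the flag from the cleaning step,
# outcome_type values are invented
toy <- data.frame(
  age_unknown = c(TRUE, TRUE, FALSE, FALSE, FALSE, TRUE),
  outcome_type = c("transfer", "euthanasia", "adoption",
                   "adoption", "transfer", "transfer"),
  stringsAsFactors = FALSE
)

# Share of each outcome within known-age (FALSE) and unknown-age (TRUE) records;
# margin = 1 makes each row sum to 1
outcome_by_missing_dob <- prop.table(
  table(toy$age_unknown, toy$outcome_type),
  margin = 1
)
round(outcome_by_missing_dob, 2)
```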
Project Goals
1. How do intake conditions and animal types affect the animal’s outcome?
We will evaluate how `intake_condition` and `animal_type` influence the likelihood of each outcome for animals in the shelter. We aim to identify disparities that could inform shelter strategies. For clarity and consistency, `animal_type` has been recoded into `species_group` to capture general trends across dogs, cats, and other species. For exploratory purposes, we will first look at all 19 unique values of `outcome_type`. Based on the `outcome_type` distribution, we then collapse `outcome_type` into three categories: `death`, `non-death`, and `adopted`.
longbeach_clean |>
  mutate(outcome_category = case_when(
    outcome_type %in% c("adoption", "foster to adopt",
                        "homefirst") ~ "adopted",
    outcome_type %in% c("euthanasia", "died",
                        "disposal") ~ "death",
    outcome_type %in% c("rescue", "transfer",
                        "return to owner", "shelter, neuter, return",
                        "return to rescue", "transport",
                        "community cat", "return to wild habitat",
                        "foster", "trap, neuter, release",
                        "missing", "duplicate") ~ "non-death",
    # fallback: any remaining values, including genuinely missing (NA)
    # outcome_type, are also counted as non-death
    TRUE ~ "non-death"
  )) |>
  count(outcome_category)
# A tibble: 3 × 2
outcome_category n
<chr> <int>
1 adopted 6544
2 death 6285
3 non-death 16958
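As a quick arithmetic check on the collapsed categories, the counts from the tibble above can be converted to shares. The three counts sum to 29,787, matching the row count, so every record lands in exactly one category; roughly 22% are adopted, 21% end in death, and 57% are non-death outcomes (a group that also absorbs records whose `outcome_type` is missing, because of the fallback rule).

```r
# Counts copied from the outcome_category tibble above
counts <- c(adopted = 6544, death = 6285, `non-death` = 16958)

sum(counts)                     # 29787, the full row count
round(counts / sum(counts), 3)  # adopted 0.220, death 0.211, non-death 0.569
```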
Independent Variables:
intake_condition
animal_type
Dependent Variables:
outcome_type
Proposed Visualizations
Proportion of Outcomes by Animal Type: Bar plot showing the proportion of each `outcome_type` within each `species_group`, faceted by `species_group`. This will highlight which outcomes dominate each species category.
Distribution of Outcomes Across Intake Condition: Bar plot showing the distribution of `intake_condition` within each `outcome_type`, faceted by `outcome_type`. This will identify which intake conditions are more likely to lead to a given outcome.
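The bar heights for both proposed plots are conditional proportions. The base-R sketch below uses a hypothetical toy data frame with invented values, where `outcome_category` stands in for the collapsed outcome derived above: rows of the first table sum to 1 within each `species_group`, and columns of the second sum to 1 within each outcome.

```r
# Hypothetical toy rows (invented values); outcome_category stands in for the
# collapsed outcome derived above
toy <- data.frame(
  species_group = factor(c("Dog", "Dog", "Cat", "Cat", "Cat", "Other"),
                         levels = c("Dog", "Cat", "Other")),
  intake_condition = c("normal", "injured severe", "normal",
                       "ill severe", "normal", "normal"),
  outcome_category = c("adopted", "non-death", "adopted",
                       "death", "adopted", "non-death"),
  stringsAsFactors = FALSE
)

# Plot 1: share of each outcome within each species group (rows sum to 1)
outcome_props <- prop.table(
  table(toy$species_group, toy$outcome_category), margin = 1
)

# Plot 2: share of each intake condition within each outcome (columns sum to 1)
cond_by_outcome <- prop.table(
  table(toy$intake_condition, toy$outcome_category), margin = 2
)

round(outcome_props, 2)
round(cond_by_outcome, 2)
```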
2. Over the years 2021–2025, how have different species and intake conditions influenced the duration from intake to adoption?
We’ll use `mutate()` to create `days_to_adoption` as the difference between the outcome and intake dates, then analyze how it varies by `intake_year` and `outcome_type`. Because dogs and cats dominate the data, we collapse all other species into the three-level factor `species_group` (“Dog,” “Cat,” “Other”) for clearer comparisons.
Independent Variables:
animal_type
intake_condition
intake_year (derived from intake_date)
outcome_type
Dependent Variables:
days_to_adoption (computed as outcome_date - intake_date)
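The derivations listed above can be sketched in base R. This assumes a toy data frame with `intake_date`, `outcome_date`, and `outcome_type` (values invented) and, as one possible choice, keeps `days_to_adoption` only where `outcome_type` is "adoption", while `days_to_outcome` holds the duration for every outcome. The cleaning step above computes `days_to_adoption` for every record regardless of outcome, so some restriction like this is needed before reading it as time to adoption.

```r
# Hypothetical toy rows; dates and outcomes are invented
toy <- data.frame(
  intake_date  = as.Date(c("2021-03-01", "2022-07-10", "2023-01-05", "2024-11-20")),
  outcome_date = as.Date(c("2021-03-15", "2022-07-12", NA, "2025-01-04")),
  outcome_type = c("adoption", "transfer", NA, "adoption"),
  stringsAsFactors = FALSE
)

# intake_year: calendar year of intake_date
toy$intake_year <- as.integer(format(toy$intake_date, "%Y"))

# days_to_outcome: subtracting Dates gives a difftime in days
toy$days_to_outcome <- as.integer(toy$outcome_date - toy$intake_date)

# days_to_adoption: kept only for adoptions; %in% treats a missing
# outcome_type as "not an adoption" instead of propagating NA
toy$days_to_adoption <- ifelse(toy$outcome_type %in% "adoption",
                               toy$days_to_outcome, NA_integer_)

toy[, c("intake_year", "days_to_outcome", "days_to_adoption")]
```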
Proposed Visualizations
Time-Series Lines: Median `days_to_adoption` by `intake_year` (2021–2025), faceted by `species_group`.
Violin Distribution: Yearly distribution of `days_to_adoption`, split by `intake_condition`.
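The data behind the time-series plot is a grouped median. This sketch uses a hypothetical toy data frame and base R's `aggregate()`; the formula interface drops rows with a missing `days_to_adoption` by default, which here corresponds to ignoring animals that were not adopted.

```r
# Hypothetical toy rows; NA marks an animal that was not adopted
toy <- data.frame(
  intake_year = c(2021, 2021, 2021, 2022, 2022, 2022),
  species_group = c("Dog", "Dog", "Cat", "Dog", "Cat", "Cat"),
  days_to_adoption = c(10, 20, 30, 8, 12, NA),
  stringsAsFactors = FALSE
)

# Median days_to_adoption for every intake_year x species_group cell;
# the formula method drops rows with NA by default (na.action = na.omit)
median_by_year <- aggregate(days_to_adoption ~ intake_year + species_group,
                            data = toy, FUN = median)
median_by_year
```

With ggplot2, these medians could then be drawn with `geom_line()` and `facet_wrap(~ species_group)`, and the violin panel with `geom_violin()`; the exact plotting code is left to the report.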
Weekly Plan of Attack
Task Name | Status | Due | Priority | Summary |
---|---|---|---|---|
Analysis of dataset and project decision | Complete | 2025-06-13 | High | Team found dataset, selected questions. Discussed approach and division of work |
Data ingestion & cleaning | Complete | 2025-06-13 | High | Load data; inspect structure; derive and clean data |
Q1 & Q2 plot exploration and decision | Complete | 2025-06-13 | High | Decide on plots and narrative |
Draft narrative & interpretation | Complete | 2025-06-20 | High | Summarize results; write narrative for both questions |
Finalize report & slides | Complete | 2025-06-30 | High | Refine visuals; polish text; build and rehearse presentation |