Examining Childcare Costs and Financial Welfare in US Counties from 2008-2018

Project Proposal

Author

Camilla Hanson, Hanrui Huang, Laura Peng

library(tidyverse)

Dataset

childcare_costs <- read_csv("data/childcare_costs.csv")
counties <- read_csv("data/counties.csv")

Our data come from the National Database of Childcare Prices and cover the years 2008 to 2018. The database was created by ICF and the Women’s Bureau, and it was designed to meet the requirements of the Paperwork Reduction Act. Between December 2019 and February 2020, ICF reached out to each US state and DC for access to all available market rate survey (MRS) reports from 1998-2018, and 35 states provided them. ICF also used MRS reports posted on state childcare agency websites. The data combine these state childcare market rate surveys with county-level demographic and labor market data from the American Community Survey 5-year estimates. The childcare price information was created entirely from the final MRS reports that states produced. The surveys usually collected prices from regulated childcare centers and regulated family childcare homes.

We found the dataset in the 2023 collection of #TidyTuesday datasets.

https://github.com/rfordatascience/tidytuesday/blob/master/data/2023/2023-05-09/readme.md

The childcare_costs dataset has 61 variables and 34567 observations. The counties dataset has 4 variables and 3144 observations.

nrow(childcare_costs)
[1] 34567
nrow(counties)
[1] 3144
ncol(childcare_costs)
[1] 61
ncol(counties)
[1] 4

We chose this dataset because of the large amount of data it contains and because its variables can be plotted intuitively over time, by county, or by state. By using this data, we hope to offer key insights into the causes of childcare costs and their social impacts.

Questions

  1. How does the relationship between median household income and childcare expenses differ across regions of the United States and between Family Childcare versus Center-Based Care?
  2. How have the unemployment rate and poverty rate varied in the US from 2008 to 2018?

Analysis Plan

We need to merge childcare_costs.csv and counties.csv, which we will do using county_fips_code, a variable both datasets share. We also plan to group the counties into regions by state (Northeast, Southeast, Midwest, Southwest, and West). To answer our first question, we will set the x variable to median household income (mhi_2018). From there, we will overlay two scatter plots: one using mcsa, the median center-based cost of childcare for school-aged children, as the y-variable, and one using mfccsa, the median family childcare cost for school-aged children, as the y-variable on the right axis. To differentiate between these two variables, we will plot their points in different colors. We will also facet by region, to show a total of 5 panels.
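The merge, region grouping, and faceted scatter plot could be sketched roughly as follows. This is only an illustration under stated assumptions: the two toy tibbles stand in for the CSV files and their values are invented, region_lookup is a hypothetical state-to-region table that would need one row per state, and the family-care column is written as mfccsa, which is our reading of the TidyTuesday data dictionary. The sketch also uses one shared y-axis for both costs rather than a separate right-hand axis.

```r
library(tidyverse)

# Toy stand-ins for childcare_costs.csv and counties.csv.
# All numeric values are made up for illustration.
childcare_toy <- tibble(
  county_fips_code = c(1001, 1001, 6037, 36061),
  study_year       = c(2017, 2018, 2018, 2018),
  mhi_2018         = c(57000, 58000, 68000, 85000),
  mcsa             = c(100, 105, 150, 210),
  mfccsa           = c(85, 90, 130, 180)
)
counties_toy <- tibble(
  county_fips_code   = c(1001, 6037, 36061),
  county_name        = c("Autauga County", "Los Angeles County", "New York County"),
  state_name         = c("Alabama", "California", "New York"),
  state_abbreviation = c("AL", "CA", "NY")
)

# Hypothetical lookup table; the real one would cover every state
# and reflect whichever region definition we settle on.
region_lookup <- tibble(
  state_abbreviation = c("AL", "CA", "NY"),
  region             = c("Southeast", "West", "Northeast")
)

care_long <- childcare_toy %>%
  # Attach county and state names using the shared key.
  left_join(counties_toy, by = "county_fips_code") %>%
  # Attach the region for each state.
  left_join(region_lookup, by = "state_abbreviation") %>%
  # One row per county-year-care type, so color can map to care type.
  pivot_longer(
    cols      = c(mcsa, mfccsa),
    names_to  = "care_type",
    values_to = "weekly_cost"
  )

care_plot <- ggplot(
  care_long,
  aes(x = mhi_2018, y = weekly_cost, color = care_type)
) +
  geom_point(alpha = 0.6) +
  facet_wrap(~region) +
  labs(
    x     = "Median household income (2018 dollars)",
    y     = "Median weekly cost, school-age care",
    color = "Care type"
  )
```

Reshaping to long format before plotting is a design choice: it lets a single geom_point() layer handle both cost variables and produces the color legend automatically, instead of adding one layer per variable.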

For our second question, we will start by using group_by(study_year) to calculate summary statistics for the individual poverty rate (pr_p), the family poverty rate (pr_f), and the unemployment rate of people aged 16 or older (unr_16) for each year. The summary statistics we want are the mean and standard deviation of each of the 3 variables in each year. For our visualization, we will layer geom_line() and geom_ribbon() for all 3 variables on one plot, with study_year on the x-axis and the variables on the y-axis. The x-axis will show the year, and the y-axis will show the rate as a percentage, since the unemployment rate and poverty rates are both calculated as a percent of the total population. The visualization will show 3 lines, each with a shaded band around it, documenting the average individual poverty rate, family poverty rate, and unemployment rate, along with their standard deviations, over 2008-2018.
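A minimal sketch of the yearly summary and the line-plus-ribbon plot is shown here. It makes a few assumptions: rates_toy is a made-up stand-in for the merged data, the ribbon spans one standard deviation on either side of the mean, and the data are pivoted to long format first so that grouping by study_year and measure yields the mean and standard deviation of all 3 variables in one step. This is a variation on a plain group_by(study_year).

```r
library(tidyverse)

# Toy county-year rows; the rates are invented for illustration only.
rates_toy <- tibble(
  study_year = c(2008, 2008, 2008, 2009, 2009, 2009),
  pr_p       = c(10, 14, 18, 11, 15, 19),
  pr_f       = c(8, 10, 12, 9, 11, 13),
  unr_16     = c(5, 6, 7, 8, 9, 10)
)

yearly_rates <- rates_toy %>%
  # Stack the three rate columns so each can be summarised the same way.
  pivot_longer(
    cols      = c(pr_p, pr_f, unr_16),
    names_to  = "measure",
    values_to = "rate"
  ) %>%
  group_by(study_year, measure) %>%
  summarise(
    mean_rate = mean(rate, na.rm = TRUE),
    sd_rate   = sd(rate, na.rm = TRUE),
    .groups   = "drop"
  )

rates_plot <- ggplot(
  yearly_rates,
  aes(x = study_year, y = mean_rate, color = measure, fill = measure)
) +
  # Shaded band: mean plus or minus one standard deviation.
  geom_ribbon(
    aes(ymin = mean_rate - sd_rate, ymax = mean_rate + sd_rate),
    alpha = 0.2,
    color = NA
  ) +
  geom_line() +
  labs(
    x     = "Study year",
    y     = "Rate (%)",
    color = "Measure",
    fill  = "Measure"
  )
```

With the real data, yearly_rates would have one row per year and measure (33 rows for 11 years and 3 measures), which is small enough to inspect directly before plotting.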