Part 1: Set Up Your R Markdown File (1 point):

Create a new R Markdown filed called “Assignment 1: Census Data Analysis”
Include your name and date in the YAML header.
Set the output format to ‘html_document’ with a theme of your choice.

Part 2: Retrieving Census Data (1 points):

Retrieve 2020 ACS 5-year data for any state of your choosing at the county level. Include the following variables:
- Median household income (B19013_001)
- Total Population (B01003_001)
Add a chunk option to suppress messages and warnings.

pop2022 <- get_acs(
  geography="county",
  variables=c(median_income ="B19013_001",
              tot_pop= "B01003_001"),
  state="NY",
  year=2020,
  survey= "acs5",
  output="wide"
)

Part 3: Processing and Analyzing Data (4 points):

Add a new column named MOE_Percentage that calculates the percentage of the margin of error relative to the estimate for median household income (MOE / Estimate * 100).
Identify counties where MOE_Percentage is greater than 10%. Flag these counties in a new column called ‘Unreliable_Estimates’

pop2022 <- pop2022 %>%
  mutate(MOE_Percentage = (median_incomeM/ median_incomeE) * 100,
         Unreliable_Estimates = if_else(MOE_Percentage > 10, "Yes", "No"))

Using knitr::kable(), create a table showing the top 5 counties by MOE percentage along with their median household income, margin of error, and MOE percentage.

pop2022 %>%
  arrange(desc(MOE_Percentage)) %>%
  select(GEOID,NAME, median_incomeE, median_incomeM, MOE_Percentage) %>%
  slice(1:5) %>%
  knitr::kable()

GEOID	NAME	median_incomeE	median_incomeM	MOE_Percentage
36041	Hamilton County, New York	60625	10526	17.362474
36097	Schuyler County, New York	53291	4447	8.344749
36039	Greene County, New York	56681	4389	7.743335
36023	Cortland County, New York	59194	3852	6.507416
36035	Fulton County, New York	51663	3298	6.383679

Discussion

write a short comment in your R Markdown - imagine you are the state’s data analyst, and you want to let others know about MOEs and their potential impact on analyses. Use your results to support your comment.

The margin of error (MOE) measures the uncertainty associated with the estimate. When the margin of error is bigger, the estimate becomes less reliable. In this analysis, we identified the top 5 counties with the highest MOE percentage for median household income. These counties have a higher level of uncertainty in their estimates, which may impact the reliability of the data. For instance, the median income MOE in Hamilton County, NY, is 10,226 dollars, which indicates the estimated median household income falls within a range of 60,625 ± 10,226—a relatively broad interval, with a 95% confidence level. It is crucial to consider the MOE when interpreting the data and making decisions based on it.

Part 4: Exploring Racial Demographics (4 points)

Retrieve 2020 5-year ACS data for the same state at the tract level. Include the following variables:
- White alone (B03002_003).
- Black or African American alone (B03002_004).
- Hispanic or Latino (B03002_012).
- Total population (B03002_001).

race2022 <- get_acs(
  geography="tract",
  variables=c(white = "B03002_003",
              black = "B03002_004",
              hispanic = "B03002_012",
              tot_pop = "B03002_001"),
  state="NY",
  year=2020,
  survey= "acs5",
  output="wide"
)

Calculate the percentage of each racial group for each county. Reshape the data to long format using pivot_longer()

race2022 <- race2022 %>%
  mutate(countyFIPS= substr(GEOID, 1, 5))

race2022couty<-race2022 %>%
  group_by(countyFIPS)%>%
  summarize(whitetot=sum(whiteE),
            blacktot=sum(blackE),
            hispanictot=sum(hispanicE),
            tot_pop=sum(tot_popE))%>%
  mutate(whiteP=whitetot/tot_pop*100,
         blackP=blacktot/tot_pop*100,
         hispanicP=hispanictot/tot_pop*100)%>%
  select(countyFIPS, whiteP, blackP, hispanicP)%>%
  pivot_longer(cols = c(whiteP, blackP, hispanicP),
               names_to = "name",
               values_to = "percentage")

Identify the county with the highest percentage of Hispanic or Latino population and display its name

highest_hispanic <- race2022couty %>%
  filter(name == "hispanicP") %>%
  arrange(desc(percentage)) %>%
  slice(1)

fips_name <- read.csv("data/county_fips.csv")
highest_hispanic_name <- fips_name %>%
  filter(County.FIPS == highest_hispanic$countyFIPS) %>%
  select(County.Name,County.FIPS)
highest_hispanic_name$County.FIPS<- as.character(highest_hispanic_name$County.FIPS)

left_join(highest_hispanic_name, highest_hispanic, by = c("County.FIPS" = "countyFIPS"))%>%
  rename(Hispanic_percentage=percentage)%>%
  select(County.Name, Hispanic_percentage)%>%
  knitr::kable()

County.Name	Hispanic_percentage
Bronx	56.043

Use group_by() and summarize () to calculate the average percentage of each racial group across all counties in your state.

race2022couty %>%
  group_by(name) %>%
  summarize(avg_percentage = mean(percentage)) %>%
  knitr::kable()

name	avg_percentage
blackP	5.737881
hispanicP	8.047394
whiteP	80.315756

Finally, perform an analysis on the MOEs for the race/ethnicity variables at the tract level. Calculate and flag high MOE tracts and write a small commentary discussing the implications of your findings. As the state’s data scientist, provide guidance on when and where other planners might need to pay attention to MOEs or whether it is fine to just drop that field from their analyses. Who might be impacted and in which ways? (1 paragraph max). Support your discussion with your data.

race2022_tract<-race2022%>%
  mutate(MOE_percentage= (whiteM/whiteE)*100)%>%
  mutate(Unreliable_Estimates = if_else(MOE_percentage > 10,"Yes", "No"))%>%
  select(NAME, whiteE, whiteM, MOE_percentage, Unreliable_Estimates)%>%
  filter(whiteE>0)%>%
  arrange(desc(MOE_percentage))

race2022_tract %>%
  slice(1:5)%>%
  knitr::kable()

NAME	whiteE	whiteM	MOE_percentage	Unreliable_Estimates
Census Tract 478.02, Queens County, New York	7	41	585.7143	Yes
Census Tract 254.01, Queens County, New York	1	5	500.0000	Yes
Census Tract 294, Queens County, New York	46	151	328.2609	Yes
Census Tract 121.02, Bronx County, New York	2	5	250.0000	Yes
Census Tract 213.02, Bronx County, New York	2	5	250.0000	Yes

race2022_tract %>%
  arrange(order(MOE_percentage)) %>%
  slice(1:5) %>%
  knitr::kable()

NAME	whiteE	whiteM	MOE_percentage	Unreliable_Estimates
Census Tract 615, Saratoga County, New York	3529	27	0.7650893	No
Census Tract 267, Oneida County, New York	5280	60	1.1363636	No
Census Tract 401.02, Cayuga County, New York	2986	34	1.1386470	No
Census Tract 119.03, Broome County, New York	2597	32	1.2321910	No
Census Tract 9703, Wyoming County, New York	3217	38	1.1812247	No

Discussion

The Margin of Error (MOE) measures the uncertainty associated with the estimate. Since some census tracts have a relatively small white population, the MOE percentage tends to be higher. For example, in Census Tract 284, Queens County, NY, the estimate of the white population is 1, but the MOE for this estimate is 5. In this way, the MOE percentage of this census tract is 500, which indicates the unreliable estimation of the white population in the census tracts. Conversely, the white population estimate in 267, Oneida County, NY, is 5280. The MOE of this estimate is 60, which is much higher than the Census tract 284. However, the MOE percentage of this census tract is 1.13 because of the larger white population. So, the MOE becomes even more critical when the estimate is small.

Planners should exercise caution when using racial data in small or sparsely populated tracts, where a high MOE percentage could mislead the conclusions about community racial composition. Instead of dropping the MOE entirely, the planner should always calculate the MOE percentage (Estimate/MOE*100), especially looking at subdivision population data at census tract or census block levels. When there is a high MOE, planners should consider using alternative sources to verify the data or aggregate the data across multiple census tracts to draw conclusions. This is important for policy implication and resource allocation, as unreliable estimates could misdirect funding and services and disproportionately affect minority communities.

MUSA 5080 Assignment 1: Census Data Analysis With R Markdown

Zhanchao Yang

2025-03-22

Part 1: Set Up Your R Markdown File (1 point):

Part 2: Retrieving Census Data (1 points):

Part 3: Processing and Analyzing Data (4 points):

Discussion

Part 4: Exploring Racial Demographics (4 points)

Discussion

Back to Main Page