Part 1: Set Up Your R Markdown File (1 point):

Part 2: Retrieving Census Data (1 point):

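# Packages used throughout: tidycensus for get_acs(), tidyverse for dplyr/tidyr verbs
library(tidycensus)
library(tidyverse)

# County-level median household income and total population for New York
# (5-year ACS; the object is named pop2022, but the request uses year = 2020)
pop2022 <- get_acs(
  geography = "county",
  variables = c(median_income = "B19013_001",
                tot_pop = "B01003_001"),
  state = "NY",
  year = 2020,
  survey = "acs5",
  output = "wide"
)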

Part 3: Processing and Analyzing Data (4 points):

# MOE as a percentage of the estimate; flag anything above 10% as unreliable
pop2022 <- pop2022 %>%
  mutate(MOE_Percentage = (median_incomeM / median_incomeE) * 100,
         Unreliable_Estimates = if_else(MOE_Percentage > 10, "Yes", "No"))

# Five counties with the highest MOE percentage
pop2022 %>%
  arrange(desc(MOE_Percentage)) %>%
  select(GEOID, NAME, median_incomeE, median_incomeM, MOE_Percentage) %>%
  slice(1:5) %>%
  knitr::kable()
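
| GEOID | NAME                      | median_incomeE | median_incomeM | MOE_Percentage |
|:------|:--------------------------|---------------:|---------------:|---------------:|
| 36041 | Hamilton County, New York |          60625 |          10526 |      17.362474 |
| 36097 | Schuyler County, New York |          53291 |           4447 |       8.344749 |
| 36039 | Greene County, New York   |          56681 |           4389 |       7.743335 |
| 36023 | Cortland County, New York |          59194 |           3852 |       6.507416 |
| 36035 | Fulton County, New York   |          51663 |           3298 |       6.383679 |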

Discussion

  • Write a short comment in your R Markdown file. Imagine you are the state’s data analyst and want to let others know about MOEs and their potential impact on analyses. Use your results to support your comment.

The margin of error (MOE) measures the uncertainty associated with an estimate: the larger the MOE relative to the estimate, the less reliable the estimate. In this analysis, we identified the five counties with the highest MOE percentage for median household income. These counties have the most uncertain estimates, and only Hamilton County exceeds the 10% threshold used here to flag an estimate as unreliable. Its median household income MOE is 10,526 dollars, so the estimate falls within a range of 60,625 ± 10,526 dollars (50,099 to 71,151) at the 90% confidence level that the ACS uses for its published MOEs. That is a relatively broad interval. It is crucial to consider the MOE when interpreting the data and making decisions based on it.
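
The interval arithmetic for Hamilton County can be sketched in a few lines of base R. This is only an illustration using the two values from the table above. The conversion to a 95% interval assumes the usual normal approximation, with z-scores of 1.645 (90%) and 1.96 (95%), and the variable names are made up for this example.

```r
# Values for Hamilton County, NY, copied from the county table above.
estimate <- 60625   # median_incomeE
moe_90   <- 10526   # median_incomeM (ACS MOEs are published at 90% confidence)

# 90% confidence interval: estimate plus or minus the published MOE.
ci_90 <- c(lower = estimate - moe_90, upper = estimate + moe_90)

# Standard error implied by the 90% MOE (z = 1.645), then a 95% MOE (z = 1.96).
std_error <- moe_90 / 1.645
moe_95    <- std_error * 1.96
ci_95     <- c(lower = estimate - moe_95, upper = estimate + moe_95)

# MOE as a share of the estimate, the same ratio used for MOE_Percentage.
moe_pct <- moe_90 / estimate * 100

print(ci_90)
print(round(ci_95))
print(round(moe_pct, 2))
```

The 95% interval is wider than the 90% one, which is why the confidence level should always be stated alongside an MOE.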

Part 4: Exploring Racial Demographics (4 points)

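# Tract-level counts from table B03002 (Hispanic origin by race) for New York:
# non-Hispanic white, non-Hispanic Black, Hispanic or Latino, and total population
race2022 <- get_acs(
  geography = "tract",
  variables = c(white = "B03002_003",
                black = "B03002_004",
                hispanic = "B03002_012",
                tot_pop = "B03002_001"),
  state = "NY",
  year = 2020,
  survey = "acs5",
  output = "wide"
)

# The first five characters of a tract GEOID are the state + county FIPS code
race2022 <- race2022 %>%
  mutate(countyFIPS = substr(GEOID, 1, 5))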

# Sum tract counts to counties, convert to percentages, and reshape to long format
race2022couty <- race2022 %>%
  group_by(countyFIPS) %>%
  summarize(whitetot = sum(whiteE),
            blacktot = sum(blackE),
            hispanictot = sum(hispanicE),
            tot_pop = sum(tot_popE)) %>%
  mutate(whiteP = whitetot / tot_pop * 100,
         blackP = blacktot / tot_pop * 100,
         hispanicP = hispanictot / tot_pop * 100) %>%
  select(countyFIPS, whiteP, blackP, hispanicP) %>%
  pivot_longer(cols = c(whiteP, blackP, hispanicP),
               names_to = "name",
               values_to = "percentage")

# County with the highest Hispanic share
highest_hispanic <- race2022couty %>%
  filter(name == "hispanicP") %>%
  arrange(desc(percentage)) %>%
  slice(1)

# Look up the county name for that FIPS code in a local lookup table
fips_name <- read.csv("data/county_fips.csv")
highest_hispanic_name <- fips_name %>%
  filter(County.FIPS == highest_hispanic$countyFIPS) %>%
  select(County.Name, County.FIPS)

# Convert the FIPS code to character so the join keys have the same type
highest_hispanic_name$County.FIPS <- as.character(highest_hispanic_name$County.FIPS)

left_join(highest_hispanic_name, highest_hispanic, by = c("County.FIPS" = "countyFIPS")) %>%
  rename(Hispanic_percentage = percentage) %>%
  select(County.Name, Hispanic_percentage) %>%
  knitr::kable()

| County.Name | Hispanic_percentage |
|:------------|--------------------:|
| Bronx       |              56.043 |

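# Simple (unweighted) average of the county-level percentages for each group;
# every county counts equally regardless of its population
race2022couty %>%
  group_by(name) %>%
  summarize(avg_percentage = mean(percentage)) %>%
  knitr::kable()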

| name      | avg_percentage |
|:----------|---------------:|
| blackP    |       5.737881 |
| hispanicP |       8.047394 |
| whiteP    |      80.315756 |

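# Tract-level MOE percentage for the white population estimate.
# Tracts with an estimate of 0 are dropped to avoid dividing by zero.
race2022_tract <- race2022 %>%
  mutate(MOE_percentage = (whiteM / whiteE) * 100) %>%
  mutate(Unreliable_Estimates = if_else(MOE_percentage > 10, "Yes", "No")) %>%
  select(NAME, whiteE, whiteM, MOE_percentage, Unreliable_Estimates) %>%
  filter(whiteE > 0) %>%
  arrange(desc(MOE_percentage))

# Five tracts with the highest MOE percentage
race2022_tract %>%
  slice(1:5) %>%
  knitr::kable()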

| NAME                                         | whiteE | whiteM | MOE_percentage | Unreliable_Estimates |
|:---------------------------------------------|-------:|-------:|---------------:|:---------------------|
| Census Tract 478.02, Queens County, New York |      7 |     41 |       585.7143 | Yes                  |
| Census Tract 254.01, Queens County, New York |      1 |      5 |       500.0000 | Yes                  |
| Census Tract 294, Queens County, New York    |     46 |    151 |       328.2609 | Yes                  |
| Census Tract 121.02, Bronx County, New York  |      2 |      5 |       250.0000 | Yes                  |
| Census Tract 213.02, Bronx County, New York  |      2 |      5 |       250.0000 | Yes                  |

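# Five tracts with the lowest MOE percentage.
# Sort on the column itself: arrange(order(MOE_percentage)) would sort by the
# index positions that order() returns, not by the values, and can leave rows
# out of ascending order.
race2022_tract %>%
  arrange(MOE_percentage) %>%
  slice(1:5) %>%
  knitr::kable()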

| NAME                                         | whiteE | whiteM | MOE_percentage | Unreliable_Estimates |
|:---------------------------------------------|-------:|-------:|---------------:|:---------------------|
| Census Tract 615, Saratoga County, New York  |   3529 |     27 |      0.7650893 | No                   |
| Census Tract 267, Oneida County, New York    |   5280 |     60 |      1.1363636 | No                   |
| Census Tract 401.02, Cayuga County, New York |   2986 |     34 |      1.1386470 | No                   |
| Census Tract 119.03, Broome County, New York |   2597 |     32 |      1.2321910 | No                   |
| Census Tract 9703, Wyoming County, New York  |   3217 |     38 |      1.1812247 | No                   |

Discussion

The margin of error (MOE) measures the uncertainty associated with an estimate. Because some census tracts have a very small white population, their MOE percentage tends to be high. For example, in Census Tract 254.01, Queens County, NY, the estimated white population is 1, but the MOE for this estimate is 5. The MOE percentage for this tract is therefore 500%, which indicates that the estimate of the white population there is unreliable. Conversely, the white population estimate in Census Tract 267, Oneida County, NY, is 5,280 with an MOE of 60, which in absolute terms is much larger than the MOE for Census Tract 254.01. However, the MOE percentage for this tract is only 1.14% because the white population is so much larger. The MOE therefore matters most when the estimate is small.

Planners should exercise caution when using racial data for small or sparsely populated tracts, where a high MOE percentage could lead to misleading conclusions about a community’s racial composition. Instead of ignoring the MOE, planners should always calculate the MOE percentage (MOE/Estimate*100), especially when looking at population data for small geographies such as census tracts or block groups. When the MOE is high, planners should consider verifying the data against alternative sources or aggregating it across multiple census tracts before drawing conclusions. This matters for policy and resource allocation, because unreliable estimates could misdirect funding and services and disproportionately affect minority communities.
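
As a rough illustration of the aggregation idea, the sketch below combines the two Bronx tracts from the tract-level table above. It assumes the commonly used approximation in which the MOE of a sum is the square root of the sum of the squared MOEs (tidycensus offers `moe_sum()` for the same purpose). Base R is used here so the example runs without any packages, and the data frame and variable names are made up for this example.

```r
# Two Bronx tracts from the tract-level table above (white population).
tracts <- data.frame(
  tract  = c("121.02", "213.02"),
  whiteE = c(2, 2),   # estimates
  whiteM = c(5, 5)    # published MOEs
)

# Relative MOE (percent) of each tract on its own.
tracts$moe_pct <- tracts$whiteM / tracts$whiteE * 100

# Aggregate: estimates add; MOEs combine as the root sum of squares
# (an approximation that treats the tract estimates as independent).
combined_estimate <- sum(tracts$whiteE)
combined_moe      <- sqrt(sum(tracts$whiteM^2))
combined_moe_pct  <- combined_moe / combined_estimate * 100

print(tracts)
print(round(c(estimate = combined_estimate,
              moe      = combined_moe,
              moe_pct  = combined_moe_pct), 1))
```

The combined MOE percentage is lower than that of either tract alone, but it is still far above the 10% threshold, so combining two very small estimates is not enough on its own to make them reliable.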
