Part 3: Processing and Analyzing Data (4 points):
- Add a new column named MOE_Percentage that calculates the percentage
of the margin of error relative to the estimate for median household
income (MOE / Estimate * 100).
- Identify counties where MOE_Percentage is greater than 10%. Flag these
counties in a new column called ‘Unreliable_Estimates’.
pop2022 <- pop2022 %>%
  mutate(MOE_Percentage = (median_incomeM / median_incomeE) * 100,
         Unreliable_Estimates = if_else(MOE_Percentage > 10, "Yes", "No"))
- Using knitr::kable(), create a table showing the top 5 counties by MOE
percentage along with their median household income, margin of error,
and MOE percentage.
pop2022 %>%
  arrange(desc(MOE_Percentage)) %>%
  select(GEOID, NAME, median_incomeE, median_incomeM, MOE_Percentage) %>%
  slice(1:5) %>%
  knitr::kable()
| GEOID | NAME | median_incomeE | median_incomeM | MOE_Percentage |
|---|---|---|---|---|
| 36041 | Hamilton County, New York | 60625 | 10526 | 17.362474 |
| 36097 | Schuyler County, New York | 53291 | 4447 | 8.344749 |
| 36039 | Greene County, New York | 56681 | 4389 | 7.743335 |
| 36023 | Cortland County, New York | 59194 | 3852 | 6.507416 |
| 36035 | Fulton County, New York | 51663 | 3298 | 6.383679 |
Discussion
- Write a short comment in your R Markdown. Imagine you are the state’s
data analyst, and you want to let others know about MOEs and their
potential impact on analyses. Use your results to support your comment.
The margin of error (MOE) measures the uncertainty associated with an
estimate: the larger the MOE relative to the estimate, the less reliable
the estimate. In this analysis, we identified the five counties with the
highest MOE percentage for median household income. Of these, only
Hamilton County exceeds the 10% threshold (17.4%); the other four range
from 6.4% to 8.3%. Hamilton County's estimated median household income is
60,625 dollars with an MOE of 10,526 dollars, so the plausible range runs
from 50,099 to 71,151 dollars, a relatively broad interval, at the 90%
confidence level that the ACS uses for its published MOEs. It is crucial
to consider the MOE when interpreting the data and making decisions based
on it.
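To make the interval arithmetic concrete, here is a minimal base R sketch. It assumes the published ACS MOE corresponds to the 90% confidence level (critical value 1.645); the helper name `moe_to_interval` is hypothetical, and the inputs are Hamilton County's values from the table above.

```r
# Hypothetical helper: turn an ACS margin of error into a standard error,
# a coefficient of variation, and an interval for a chosen critical value.
# Assumption: the published MOE was computed with z = 1.645 (90% level).
moe_to_interval <- function(estimate, moe, z = 1.645) {
  se <- moe / 1.645                    # standard error implied by the 90% MOE
  data.frame(
    estimate = estimate,
    se       = se,
    cv_pct   = se / estimate * 100,    # coefficient of variation, in percent
    lower    = estimate - z * se,
    upper    = estimate + z * se
  )
}

# Hamilton County: 90% interval (as published) and a wider 95% interval
hamilton_90 <- moe_to_interval(60625, 10526, z = 1.645)
hamilton_95 <- moe_to_interval(60625, 10526, z = 1.96)

rbind(hamilton_90, hamilton_95)
```

The 90% row reproduces the published estimate plus or minus the MOE. The 95% row is wider because the same standard error is multiplied by a larger critical value.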
Part 4: Exploring Racial Demographics (4 points)
- Retrieve 2020 5-year ACS data for the same state at the tract level.
Include the following variables:
- White alone (B03002_003).
- Black or African American alone (B03002_004).
- Hispanic or Latino (B03002_012).
- Total population (B03002_001).
race2022 <- get_acs(
  geography = "tract",
  variables = c(white = "B03002_003",
                black = "B03002_004",
                hispanic = "B03002_012",
                tot_pop = "B03002_001"),
  state = "NY",
  year = 2020,
  survey = "acs5",
  output = "wide"
)
- Calculate the percentage of each racial group for each county.
Reshape the data to long format using pivot_longer().
# The first five characters of a tract GEOID are the state + county FIPS code
race2022 <- race2022 %>%
  mutate(countyFIPS = substr(GEOID, 1, 5))
# Sum tract counts to counties, convert to percentages, then reshape to long
race2022couty <- race2022 %>%
  group_by(countyFIPS) %>%
  summarize(whitetot = sum(whiteE),
            blacktot = sum(blackE),
            hispanictot = sum(hispanicE),
            tot_pop = sum(tot_popE)) %>%
  mutate(whiteP = whitetot / tot_pop * 100,
         blackP = blacktot / tot_pop * 100,
         hispanicP = hispanictot / tot_pop * 100) %>%
  select(countyFIPS, whiteP, blackP, hispanicP) %>%
  pivot_longer(cols = c(whiteP, blackP, hispanicP),
               names_to = "name",
               values_to = "percentage")
- Identify the county with the highest percentage of Hispanic or
Latino population and display its name
# County with the largest Hispanic or Latino share
highest_hispanic <- race2022couty %>%
  filter(name == "hispanicP") %>%
  arrange(desc(percentage)) %>%
  slice(1)
# Look up the county name from a local FIPS-to-name table
fips_name <- read.csv("data/county_fips.csv")
highest_hispanic_name <- fips_name %>%
  filter(County.FIPS == highest_hispanic$countyFIPS) %>%
  select(County.Name, County.FIPS)
# Convert the key to character so it matches countyFIPS in the join
highest_hispanic_name$County.FIPS <- as.character(highest_hispanic_name$County.FIPS)
left_join(highest_hispanic_name, highest_hispanic, by = c("County.FIPS" = "countyFIPS")) %>%
  rename(Hispanic_percentage = percentage) %>%
  select(County.Name, Hispanic_percentage) %>%
  knitr::kable()
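The lookup above matches a FIPS code read from a CSV against the character code derived from GEOID. The following is a hedged sketch of normalizing the key before matching, assuming the CSV stores `County.FIPS` as an integer (which drops the leading zero for states such as Alabama, FIPS 01). The toy `fips_lookup` data frame is made up for illustration and stands in for the real file.

```r
# Toy stand-in for data/county_fips.csv; the integer column type is an
# assumption about how read.csv() would parse the real file.
fips_lookup <- data.frame(
  County.FIPS = c(1001L, 36005L),
  County.Name = c("Autauga County", "Bronx County")
)

# Pad to five characters so the key lines up with substr(GEOID, 1, 5)
fips_lookup$County.FIPS <- sprintf("%05d", fips_lookup$County.FIPS)

# The padded key now matches a character FIPS code exactly
fips_lookup$County.Name[fips_lookup$County.FIPS == "01001"]
```

For New York (state FIPS 36) the padding changes nothing, but it keeps the same code working for states whose codes begin with zero.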
- Use group_by() and summarize() to calculate the average percentage of
each racial group across all counties in your state.
race2022couty %>%
  group_by(name) %>%
  summarize(avg_percentage = mean(percentage)) %>%
  knitr::kable()
| name | avg_percentage |
|---|---|
| blackP | 5.737881 |
| hispanicP | 8.047394 |
| whiteP | 80.315756 |
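Note that `mean(percentage)` weights every county equally, so a small rural county counts as much as a large urban one. As a sketch of the difference, the made-up two-county data below (hypothetical values, not ACS results) compares that unweighted mean with a population-weighted share.

```r
library(dplyr)

# Hypothetical counties: one large and diverse, one small and mostly white
toy_counties <- data.frame(
  county    = c("A", "B"),
  tot_pop   = c(900000, 100000),
  white_pop = c(450000, 90000)
)

toy_avgs <- toy_counties %>%
  mutate(whiteP = white_pop / tot_pop * 100) %>%
  summarize(
    unweighted_avg = mean(whiteP),                        # each county counts once
    weighted_avg   = sum(white_pop) / sum(tot_pop) * 100  # each resident counts once
  )

toy_avgs
```

In this toy case the unweighted average is 70% while the population-weighted share is 54%. The same gap is worth keeping in mind when reading the averages in the table above, which are averages across counties rather than statewide shares.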
- Finally, perform an analysis on the MOEs for the race/ethnicity
variables at the tract level. Calculate and flag high MOE tracts and
write a small commentary discussing the implications of your findings.
As the state’s data scientist, provide guidance on when and where other
planners might need to pay attention to MOEs or whether it is fine to
just drop that field from their analyses. Who might be impacted and in
which ways? (1 paragraph max). Support your discussion with your
data.
race2022_tract <- race2022 %>%
  mutate(MOE_percentage = (whiteM / whiteE) * 100) %>%
  mutate(Unreliable_Estimates = if_else(MOE_percentage > 10, "Yes", "No")) %>%
  select(NAME, whiteE, whiteM, MOE_percentage, Unreliable_Estimates) %>%
  filter(whiteE > 0) %>%
  arrange(desc(MOE_percentage))
race2022_tract %>%
  slice(1:5) %>%
  knitr::kable()
| NAME | whiteE | whiteM | MOE_percentage | Unreliable_Estimates |
|---|---|---|---|---|
| Census Tract 478.02, Queens County, New York | 7 | 41 | 585.7143 | Yes |
| Census Tract 254.01, Queens County, New York | 1 | 5 | 500.0000 | Yes |
| Census Tract 294, Queens County, New York | 46 | 151 | 328.2609 | Yes |
| Census Tract 121.02, Bronx County, New York | 2 | 5 | 250.0000 | Yes |
| Census Tract 213.02, Bronx County, New York | 2 | 5 | 250.0000 | Yes |
race2022_tract %>%
  arrange(MOE_percentage) %>%  # sort ascending by the value itself, not by order()
  slice(1:5) %>%
  knitr::kable()
| NAME | whiteE | whiteM | MOE_percentage | Unreliable_Estimates |
|---|---|---|---|---|
| Census Tract 615, Saratoga County, New York | 3529 | 27 | 0.7650893 | No |
| Census Tract 267, Oneida County, New York | 5280 | 60 | 1.1363636 | No |
| Census Tract 401.02, Cayuga County, New York | 2986 | 34 | 1.1386470 | No |
| Census Tract 119.03, Broome County, New York | 2597 | 32 | 1.2321910 | No |
| Census Tract 9703, Wyoming County, New York | 3217 | 38 | 1.1812247 | No |
Discussion
The margin of error (MOE) measures the uncertainty associated with an
estimate. Because some census tracts have a very small white population,
their MOE percentage tends to be high. For example, in Census Tract
254.01, Queens County, NY, the estimated white population is 1, but the
MOE is 5. The resulting MOE percentage of 500% means the estimate for
this tract is effectively unreliable. Conversely, in Census Tract 267,
Oneida County, NY, the estimated white population is 5,280 with an MOE of
60. In absolute terms that MOE is much larger than the one for Census
Tract 254.01, yet the MOE percentage is only about 1.14% because the
estimate itself is so much larger. The MOE therefore matters most when
the estimate is small.
Planners should exercise caution when using racial data in small or
sparsely populated tracts, where a high MOE percentage could lead to
misleading conclusions about a community's racial composition. Rather
than dropping the MOE field, planners should always calculate the MOE
percentage (MOE / Estimate * 100), especially when working with
small-area population data at the census tract or block group level. When
the MOE is high, planners should consider verifying the data against
alternative sources or aggregating across multiple census tracts before
drawing conclusions. This matters for policy and resource allocation,
because unreliable estimates could misdirect funding and services and
disproportionately affect minority communities.
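As a sketch of the aggregation suggestion, the Census Bureau's approximation for the MOE of a sum of estimates is the square root of the sum of the squared MOEs. The example below applies it to the three Queens County tracts from the high-MOE table. The data frame name `queens_tracts` and the output column names are illustrative, and the formula ignores covariance between estimates, so the result should be read as an approximation.

```r
library(dplyr)

# White-alone estimates and MOEs for three Queens County tracts,
# copied from the high-MOE table above
queens_tracts <- data.frame(
  tract  = c("478.02", "254.01", "294"),
  whiteE = c(7, 1, 46),
  whiteM = c(41, 5, 151)
)

queens_combined <- queens_tracts %>%
  summarize(
    combinedE = sum(whiteE),
    combinedM = sqrt(sum(whiteM^2))   # root-sum-of-squares approximation
  ) %>%
  mutate(MOE_percentage = combinedM / combinedE * 100)

queens_combined
```

Pooling brings the MOE percentage down to roughly 290%, lower than any of the three tracts individually but still far above the 10% threshold. Aggregation helps, but combining a handful of very small estimates may not be enough on its own. The tidycensus package also provides a `moe_sum()` helper for the same calculation.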