A question about a new article on police shootings

May 19, 2017

A new article has just been published in the Journal of Criminal Justice, “The prevalence of fatal police shootings by U.S. police, 2015–2016: Patterns and answers from a new data set” by Shane, Lawton, and Swenson (2017).

The authors make some good points that I mostly agree with, but I cannot make sense of their finding that the mean rate of black civilians killed by police (per 100,000) is lower than that of whites. The finding is reported in Table 3 and in the discussion of their descriptive statistics:

“The mean rate for Blacks was lower than for Whites, Hispanics (p < 0.000) and Native Americans (p < 0.002), but was not different for Asians and others” (3)

“The mean rate of fatalities for Whites (M =0.741) is significantly lower than for Hispanics (M = 0.897; p < 0.000) and Native Americans (M = 1.106; p < 0.005), but is higher than for Blacks (M=0.633; p < 0.000).” (5)

Their data come from the Washington Post and cover 2015-2016. They aggregate deaths by state and calculate the “mean fatality rate per 100,000 population (based on 2016 U.S. Census Bureau state populations).”

From everything I’ve read and the data I’ve examined, when normalized by race-specific population, the black rate of police shooting deaths is higher than that for whites, at the aggregate national level. You can look at this data yourself with an online R Shiny App I created here:

https://johnbradford.shinyapps.io/policeshooting/

So I decided to examine the Washington Post data again, calculating the fatality rate for each racial group for each state. The code and results are posted below.

I. Load Washington Post Data.

First, we have to load the necessary R packages and Washington Post data. I include at the beginning a function for returning state abbreviations from state names regardless of capitalization or punctuation. I also filter to exclude cases after 2016.

getState.abb <- function(x) {
    ## remove punctuation and trim white space, convert to lower case
    x <- tolower(trimws(gsub("[[:punct:]]", "", x)))
    x[which(x == "washington dc")] <- "district of columbia"
    
    states.abb <- c(state.abb, "DC", "DC", "PR")
    states.name <- c(state.name, "District of Columbia", "District of Columbia", 
        "Puerto Rico")
    
    m.abb <- tolower(states.abb)
    m.name <- tolower(gsub("[[:punct:]]", "", states.name))
    
    x[which(x %in% m.abb)] <- states.abb[match(x[which(x %in% m.abb)], m.abb)]
    x[which(x %in% m.name)] <- states.abb[match(x[which(x %in% m.name)], m.name)]
    
    return(x)
}
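
To see what the cleaning step in this helper does, here is a minimal sketch that repeats only the normalization line and then looks the result up in base R's built-in `state.name` and `state.abb` vectors. The input strings are made up for illustration and do not come from the Washington Post file.

```r
## Made-up inputs (not from the WaPo file) to exercise the cleaning step
raw <- c("  New York ", "north carolina", "Texas.")

## Same normalization as in getState.abb: drop punctuation, trim, lower-case
cleaned <- tolower(trimws(gsub("[[:punct:]]", "", raw)))

## Look the cleaned names up in base R's built-in state vectors
abbs <- state.abb[match(cleaned, tolower(state.name))]

print(data.frame(raw = raw, cleaned = cleaned, abb = abbs))
```

The full function additionally handles inputs that are already abbreviations, plus the District of Columbia and Puerto Rico, which the built-in vectors do not cover.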


library(dplyr)
library(readr)

wapo <- read_csv("https://cdn.rawgit.com/washingtonpost/data-police-shootings/master/fatal-police-shootings-data.csv") %>%
    ## Recode the race letters, extract the year, and standardize the state
    dplyr::transmute(Race = race,
        Race = replace(Race, which(Race == "W"), "White"),
        Race = replace(Race, which(Race == "B"), "Black"),
        Race = replace(Race, which(Race == "H"), "Hispanic"),
        Race = replace(Race, which(Race == "A"), "Asian"),
        Race = replace(Race, which(Race == "N"), "Native American"),
        Race = replace(Race, which(Race == "O"), "Other"),
        year = as.numeric(format(date, "%Y")),
        State = getState.abb(state)) %>%
    ## Exclude cases after 2016
    dplyr::filter(year < 2017) %>%
    ## Sum across years for each state and race
    dplyr::group_by(State) %>%
    dplyr::summarise(killed.White = length(which(Race == "White")),
        killed.Black = length(which(Race == "Black")),
        killed.Hispanic = length(which(Race == "Hispanic")),
        killed.Asian = length(which(Race == "Asian")),
        killed.NativeAmerican = length(which(Race == "Native American")),
        killed.Total = n())

knitr::kable(wapo)

State	killed.White	killed.Black	killed.Hispanic	killed.Asian	killed.NativeAmerican	killed.Total
AK	5	0	0	0	5	11
AL	28	12	0	0	0	42
AR	13	3	0	0	0	20
AZ	47	3	29	0	8	92
CA	98	55	134	13	0	327
CO	28	7	17	2	1	60
CT	5	0	1	0	0	6
DC	0	9	0	0	0	9
DE	3	1	0	0	0	4
FL	54	42	16	1	0	120
GA	26	19	5	1	0	55
HI	0	1	0	1	0	8
IA	7	3	0	0	0	10
ID	10	0	3	0	0	13
IL	14	28	5	0	0	47
IN	19	10	2	0	0	33
KS	16	1	2	0	0	19
KY	25	6	2	0	0	34
LA	18	25	0	1	0	46
MA	9	7	5	0	0	21
MD	8	21	0	0	0	30
ME	3	0	0	0	0	4
MI	17	11	1	0	0	29
MN	17	5	1	1	0	26
MO	24	15	1	0	0	42
MS	11	5	0	0	0	16
MT	9	0	0	0	0	9
NC	32	21	1	1	0	56
ND	1	0	0	0	1	2
NE	12	3	0	0	0	15
NH	5	0	0	0	0	5
NJ	10	11	3	1	0	27
NM	15	0	24	0	1	41
NV	15	5	10	0	0	33
NY	16	17	1	1	0	36
OH	25	24	0	2	0	55
OK	35	13	4	0	0	58
OR	26	2	0	0	0	30
PA	17	18	1	0	0	40
RI	0	1	1	0	0	2
SC	23	13	0	0	0	36
SD	5	0	0	0	2	7
TN	28	7	2	1	0	42
TX	82	35	52	1	1	180
UT	17	0	0	0	0	18
VA	15	18	2	0	0	35
VT	2	0	0	0	1	3
WA	25	4	4	2	4	42
WI	19	7	2	0	0	28
WV	16	3	0	0	0	22
WY	5	0	1	0	1	8

II. Load Census Data

Next, we have to load the state-level population data by race. Shane, Lawton, and Swenson use 2016 Census data, but we should get similar results from the 2015 Census estimates, which are just as legitimate since the Washington Post data cover 2015 and 2016. The census data can be retrieved online:

State Characteristics Datasets: Annual State Resident Population Estimates for 5 Race Groups (5 Race Alone or in Combination Groups) by Age, Sex, and Hispanic Origin: April 1, 2010 to July 1, 2015

Using this data, you can also normalize by gender, age, and race and their cross-tabulations. Below I only examine race. The state names also have to be abbreviated to match the Washington Post data.

library(dplyr)
library(tidyr)
library(readr)

census_state <- read_csv("https://www2.census.gov/programs-surveys/popest/datasets/2010-2015/state/asrh/sc-est2015-alldata5.csv") %>%
    ## filtering to include both males and females
    dplyr::filter(SEX == 0) %>%
    ## selecting only 2015 data, renaming the columns
    dplyr::transmute(State = NAME, Hisp = ORIGIN, Race = RACE, AGE = AGE, population = POPESTIMATE2015) %>%
    ## Creating Race categories - 'Hispanics' are Hispanic Whites. But the
    ## results are roughly the same if you include Hispanic Blacks, Asians, etc.,
    ## since there are so few of them. I'm also combining Asians and Pacific
    ## Islanders since they are not distinguished in the WaPo data.
    dplyr::mutate(State = getState.abb(State),
        Race = replace(Race, which(Race == 1 & Hisp == 1), "White"),
        Race = replace(Race, which(Race == 1 & Hisp == 2), "Hispanic"),
        Race = replace(Race, which(Race == 2 & Hisp == 0), "Black"),
        Race = replace(Race, which(Race == 3 & Hisp == 0), "Native American"),
        Race = replace(Race, which(Race == 4 & Hisp == 0), "Asian"),
        Race = replace(Race, which(Race == 5 & Hisp == 0), "Asian")) %>%
    dplyr::select(-c(Hisp)) %>%
    dplyr::filter(Race %in% c("White", "Hispanic", "Black", "Asian", "Native American")) %>%
    ## Summing across all ages for each race within each state
    dplyr::group_by(State, Race) %>%
    dplyr::summarise(population = sum(population, na.rm = T)) %>%
    tidyr::spread(Race, population) %>%
    ## Creating a 'Total' population category
    dplyr::mutate(Total = White + Hispanic + Black + Asian + `Native American`)

knitr::kable(census_state)

State	Asian	Black	Hispanic	Native American	White	Total
AK	74220	40105	41640	143826	496799	796590
AL	92339	1338287	175866	66733	3268304	4941529
AR	69177	490231	197047	58338	2226051	3040844
AZ	331963	403715	1935766	429351	3930262	7031057
CA	6869486	2989322	14040493	1078986	15788764	40767051
CO	255130	304298	1068750	148150	3855642	5631970
CT	197283	466878	473242	40962	2500213	3678578
DC	37653	335822	55945	9141	254510	693071
DE	44550	228005	70669	12698	615841	971763
FL	766873	3642783	4622640	203379	11482439	20718114
GA	491234	3364359	820463	112516	5650933	10439505
HI	1173481	56565	101049	37862	524496	1893453
IA	92785	140006	164384	31348	2754083	3182606
ID	44902	22007	187076	46816	1395356	1696157
IL	810389	2009937	2025465	139840	8132912	13118543
IN	179317	712550	402399	62546	5400383	6757195
KS	109477	222885	309984	68432	2290079	3000857
KY	86311	413580	135678	36636	3837964	4510169
LA	110022	1559188	200763	63567	2817869	4751409
MA	512363	659668	618260	70214	5101413	6961918
MD	463421	1926573	478070	76800	3241665	6186529
ME	23323	26532	18654	19660	1264763	1352932
MI	366630	1530388	432381	153104	7688454	10170957
MN	313841	393395	252291	117564	4554134	5631225
MO	166654	783917	222123	82875	4969582	6225151
MS	43864	1146449	76423	30132	1734474	3031342
MT	16435	11400	30771	86007	917350	1061963
NC	358676	2348191	778108	229119	6560680	10274774
ND	15364	23131	22546	49847	662513	773401
NE	58998	113682	178335	39891	1547639	1938545
NH	43327	27973	40507	11838	1230066	1353711
NJ	956570	1439670	1521368	100180	5146692	9164480
NM	56475	69601	939053	246717	829130	2140976
NV	346130	311617	744673	77354	1544500	3024274
NY	1948015	3786724	2910344	318526	11356124	20319733
OH	310274	1634412	358475	101965	9476886	11882012
OK	121260	364882	342672	532016	2797325	4158155
OR	265197	120634	465792	136727	3203032	4191382
PA	513870	1653996	702016	107484	10088820	13066186
RI	48117	100579	120418	18738	799027	1086879
SC	107421	1404522	230290	53394	3193778	4989405
SD	16539	21594	24718	88266	727491	878608
TN	155792	1192750	300823	69939	5007437	6726741
TX	1545389	3672788	10192520	452905	12161654	28025256
UT	148489	58496	378560	67414	2421365	3074324
VA	664701	1776464	656862	104395	5443876	8646298
VT	13891	11993	10215	7822	594460	638381
WA	838055	398503	793680	244833	5258283	7533354
WI	192210	435145	345318	100614	4810050	5883337
WV	21885	84131	24881	14728	1730320	1875945
WY	10516	11757	53106	21746	501682	598807
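
The census step above sums population over all age rows within each state and race, then spreads the races into columns. As a rough illustration of that reshaping, here is a sketch in base R using `tapply` in place of the dplyr/tidyr calls. The state codes "AA" and "BB" and all of the numbers are hypothetical toy values, not census figures.

```r
## Hypothetical long-format rows: one row per state, race, and age group
toy <- data.frame(
  State      = c("AA", "AA", "AA", "AA", "BB", "BB", "BB", "BB"),
  Race       = c("Black", "Black", "White", "White",
                 "Black", "Black", "White", "White"),
  AGE        = c(0, 1, 0, 1, 0, 1, 0, 1),
  population = c(10, 20, 100, 200, 5, 15, 50, 150)
)

## Sum over ages within each State x Race cell; the result is a wide
## State-by-Race matrix, analogous to the table printed above
wide <- tapply(toy$population, list(State = toy$State, Race = toy$Race), sum)

print(wide)
```

Each cell of `wide` is the total across the age rows for that state and race, which is the quantity used as the denominator in the next section.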

III. Combine Data and Calculate Rates per 100,000 Population

wapoRates <- dplyr::left_join(wapo, census_state) %>% dplyr::group_by(State) %>% 
    dplyr::transmute(Total = 1e+05 * killed.Total/Total, White = 1e+05 * killed.White/White, 
        Black = 1e+05 * killed.Black/Black, Hispanic = 1e+05 * killed.Hispanic/Hispanic, 
        `Native American` = 1e+05 * killed.NativeAmerican/`Native American`, 
        Asian = 1e+05 * killed.Asian/Asian)

knitr::kable(wapoRates)

State	Total	White	Black	Hispanic	Native American	Asian
AK	1.3808860	1.0064432	0.0000000	0.0000000	3.4764229	0.0000000
AL	0.8499394	0.8567135	0.8966687	0.0000000	0.0000000	0.0000000
AR	0.6577121	0.5839938	0.6119564	0.0000000	0.0000000	0.0000000
AZ	1.3084804	1.1958490	0.7430985	1.4981150	1.8632774	0.0000000
CA	0.8021184	0.6206946	1.8398821	0.9543824	0.0000000	0.1892427
CO	1.0653466	0.7262085	2.3003766	1.5906433	0.6749916	0.7839141
CT	0.1631065	0.1999830	0.0000000	0.2113084	0.0000000	0.0000000
DC	1.2985683	0.0000000	2.6799912	0.0000000	0.0000000	0.0000000
DE	0.4116230	0.4871387	0.4385869	0.0000000	0.0000000	0.0000000
FL	0.5792033	0.4702834	1.1529646	0.3461226	0.0000000	0.1303997
GA	0.5268449	0.4601010	0.5647435	0.6094120	0.0000000	0.2035690
HI	0.4225085	0.0000000	1.7678777	0.0000000	0.0000000	0.0852165
IA	0.3142079	0.2541681	2.1427653	0.0000000	0.0000000	0.0000000
ID	0.7664385	0.7166630	0.0000000	1.6036263	0.0000000	0.0000000
IL	0.3582715	0.1721401	1.3930785	0.2468569	0.0000000	0.0000000
IN	0.4883683	0.3518269	1.4034103	0.4970191	0.0000000	0.0000000
KS	0.6331525	0.6986659	0.4486619	0.6451946	0.0000000	0.0000000
KY	0.7538520	0.6513870	1.4507471	1.4740783	0.0000000	0.0000000
LA	0.9681339	0.6387806	1.6033987	0.0000000	0.0000000	0.9089091
MA	0.3016410	0.1764217	1.0611398	0.8087212	0.0000000	0.0000000
MD	0.4849246	0.2467868	1.0900184	0.0000000	0.0000000	0.0000000
ME	0.2956542	0.2371986	0.0000000	0.0000000	0.0000000	0.0000000
MI	0.2851256	0.2211108	0.7187720	0.2312775	0.0000000	0.0000000
MN	0.4617113	0.3732872	1.2709872	0.3963677	0.0000000	0.3186327
MO	0.6746824	0.4829380	1.9134679	0.4502010	0.0000000	0.0000000
MS	0.5278190	0.6341980	0.4361293	0.0000000	0.0000000	0.0000000
MT	0.8474872	0.9810868	0.0000000	0.0000000	0.0000000	0.0000000
NC	0.5450242	0.4877543	0.8943054	0.1285169	0.0000000	0.2788032
ND	0.2585981	0.1509404	0.0000000	0.0000000	2.0061388	0.0000000
NE	0.7737762	0.7753746	2.6389402	0.0000000	0.0000000	0.0000000
NH	0.3693551	0.4064823	0.0000000	0.0000000	0.0000000	0.0000000
NJ	0.2946157	0.1942996	0.7640640	0.1971909	0.0000000	0.1045402
NM	1.9150145	1.8091252	0.0000000	2.5557663	0.4053227	0.0000000
NV	1.0911710	0.9711881	1.6045338	1.3428713	0.0000000	0.0000000
NY	0.1771677	0.1408931	0.4489369	0.0343602	0.0000000	0.0513343
OH	0.4628846	0.2637997	1.4684180	0.0000000	0.0000000	0.6445916
OK	1.3948494	1.2511953	3.5627956	1.1672970	0.0000000	0.0000000
OR	0.7157544	0.8117309	1.6579074	0.0000000	0.0000000	0.0000000
PA	0.3061337	0.1685034	1.0882735	0.1424469	0.0000000	0.0000000
RI	0.1840131	0.0000000	0.9942433	0.8304406	0.0000000	0.0000000
SC	0.7215289	0.7201502	0.9255818	0.0000000	0.0000000	0.0000000
SD	0.7967148	0.6872937	0.0000000	0.0000000	2.2658781	0.0000000
TN	0.6243737	0.5591683	0.5868791	0.6648428	0.0000000	0.6418815
TX	0.6422778	0.6742504	0.9529545	0.5101781	0.2207969	0.0647086
UT	0.5854946	0.7020833	0.0000000	0.0000000	0.0000000	0.0000000
VA	0.4047975	0.2755390	1.0132488	0.3044780	0.0000000	0.0000000
VT	0.4699388	0.3364398	0.0000000	0.0000000	12.7844541	0.0000000
WA	0.5575206	0.4754404	1.0037566	0.5039815	1.6337667	0.2386478
WI	0.4759204	0.3950063	1.6086592	0.5791763	0.0000000	0.0000000
WV	1.1727423	0.9246845	3.5658675	0.0000000	0.0000000	0.0000000
WY	1.3359897	0.9966473	0.0000000	1.8830264	4.5985469	0.0000000
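
A few cells of this table can be spot-checked by hand. The sketch below recomputes three rates from counts and populations copied from the two tables above; the vector names are my own labels, chosen only for this illustration.

```r
## Counts and populations copied from the tables above
killed <- c(CA_Black = 55, OK_Black = 13, VT_NativeAmerican = 1)
pop    <- c(CA_Black = 2989322, OK_Black = 364882, VT_NativeAmerican = 7822)

## Deaths per 100,000 residents of the same race in the same state
rate <- 1e5 * killed / pop

print(round(rate, 4))
```

The Vermont figure rests on a single death in a population of 7,822, which shows how much a state-level rate can swing when the denominator is small.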

IV. Calculate Means of State Rates for each Race

wapoRates_sum <- wapoRates %>% dplyr::ungroup() %>% dplyr::select(-State) %>% 
    dplyr::summarise_all(mean, na.rm = T) %>% tidyr::gather(Race, `Mean Killed Rate`, 
    1:6) %>% dplyr::arrange(desc(`Mean Killed Rate`))

knitr::kable(wapoRates_sum)

Race	Mean Killed Rate
Black	1.0334919
Total	0.6647738
Native American	0.5868548
White	0.5416090
Hispanic	0.4393706
Asian	0.0910665
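
Note that a mean of state rates weights every state equally, which is not the same thing as an aggregate national rate. The sketch below contrasts the two using the Black counts and populations for just three states copied from the tables above. It is an illustrative subset only, so neither number should be read as a national estimate.

```r
## Black deaths and Black population for three states, copied from the tables
killed <- c(CA = 55, TX = 35, VT = 0)
pop    <- c(CA = 2989322, TX = 3672788, VT = 11993)

state_rates <- 1e5 * killed / pop

## Unweighted mean of state rates: each state counts equally
mean_of_rates <- mean(state_rates)

## Pooled rate: total deaths over total population, so each resident counts equally
pooled_rate <- 1e5 * sum(killed) / sum(pop)

## The pooled rate is the population-weighted mean of the state rates
weighted_rate <- weighted.mean(state_rates, pop)

print(c(mean_of_rates = mean_of_rates, pooled_rate = pooled_rate,
        weighted_rate = weighted_rate))
```

In this subset the small state with zero deaths pulls the unweighted mean below the pooled rate, so the choice between the two summaries matters when comparing against published figures.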

As shown above, the mean rate for Blacks is higher than that for any other group. Next, I’ll plot the results. Note that Washington, D.C. results are not included. I’ve calculated this elsewhere and it doesn’t much change the results.

V. Graph the results.

library(ggplot2)
library(ggsci)
library(ggthemes)
library(ggrepel)

## First change to long form
wapoRates_L <- wapoRates %>%
    dplyr::group_by(State) %>%
    tidyr::gather(Race, KilledR, 2:7) %>%
    dplyr::ungroup() %>%
    dplyr::filter(!is.na(Race)) %>%
    dplyr::mutate(Race = factor(Race, levels = c("Total", "White", "Black",
        "Hispanic", "Asian", "Native American"))) %>%
    dplyr::group_by(Race) %>%
    dplyr::mutate(KilledR_mean = mean(KilledR, na.rm = T))

ggplot(data = wapoRates_L, aes(x = Race, y = KilledR, color = Race, fill = Race,
    shape = Race)) +
    geom_boxplot(alpha = 0.5, position = "dodge", color = "black") +
    ## Dots for states with at least one death in that group
    geom_dotplot(data = subset(wapoRates_L, KilledR > 0), alpha = 0.5,
        position = "dodge", color = "black", binaxis = "y",
        stackdir = "centerwhole", dotsize = 0.75, binwidth = 0.1) +
    xlab(label = "") +
    ylab(label = "") +
    ## Label each group with its mean rate
    geom_label(aes(x = Race, y = KilledR_mean, group = Race,
        label = paste0(round(KilledR_mean, 2))), size = 3, fill = "lightgray",
        alpha = 0.1, color = "black", position = position_dodge(width = 0.9)) +
    ggsci::scale_fill_d3(palette = "category10") +
    ggsci::scale_color_d3(palette = "category10") +
    theme_gdocs() +
    theme(text = element_text(size = 12, face = "bold"),
        legend.position = c(0.12, 0.9),
        legend.text = element_text(size = 12, face = "bold"),
        legend.background = element_rect(fill = alpha("gray", 0.2)),
        axis.title.y = element_text(size = 12),
        axis.text = element_text(size = 10)) +
    guides(size = FALSE, stat = FALSE, fill = FALSE, shape = FALSE, color = FALSE) +
    ggtitle(label = "Race of Civilians Killed by Police per 100,000 Residents",
        subtitle = "State-Level Data from  Washington Post (2015-2016) & US Census (2015).\nSource: johnbradford.github.io") +
    coord_cartesian(ylim = c(0, 5)) +
    ## Name the states with rates above 2.5
    ggrepel::geom_text_repel(data = subset(wapoRates_L, KilledR > 2.5),
        aes(x = Race, y = KilledR, label = State), color = "black", size = 3)

These results contradict those of Shane, Lawton, and Swenson (2017) and agree with what is already well known. I thought that maybe they had calculated their mean fatality rate using total population rather than race-specific population as the denominator, but this seems doubtful. Re-calculating with total population as the denominator, the white rate is higher than the black rate, but the results are still far from theirs: I get 0.37 white killings per 100,000 people (of all races) and 0.14 black killings per 100,000 people (of all races), rather than 0.74 and 0.63, respectively. Moreover, I could not replicate the ‘mean fatality base rate’ of 0.897 for Hispanics.
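
To make the difference between the two denominators concrete, here is a sketch for a single state, using the California row copied from the tables above. The variable names are mine, and `pop_total` is the Total column from the census table, that is, the sum of the five race categories built in section II.

```r
## California counts and populations, copied from the tables above
killed    <- c(White = 98, Black = 55)
pop_race  <- c(White = 15788764, Black = 2989322)
pop_total <- 40767051

## Race-specific denominator: deaths per 100,000 residents of the same race
race_specific <- 1e5 * killed / pop_race

## Total-population denominator: deaths per 100,000 residents of all races
total_denominator <- 1e5 * killed / pop_total

print(rbind(race_specific, total_denominator))
```

With the race-specific denominator the Black rate is the larger of the two; with the total-population denominator the ordering flips, because the comparison then reflects only the raw counts.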

Was this a mistake on their part, or did I miss something?

jhbradford
