Appendix A — Colorado-specific data

Author

Allison Bauman

In this file, we pull data specifically related to Colorado. Data is provided by the Colorado Department of Agriculture and the Colorado Department of Public Health and the Environment.

Some data were available publicly on the website while others were requested through a public records request, Colorado Open Records Act (CORA). The goal is to identify food systems infrastructure (e.g., manufacturing, processing, distribution) that touches Colorado grown and raised products.

The data provided by the state has business addresses. We use a Google API to get latitude and longitude for each business to include in the map. If you would like to do this on your own, you need to register to get a Google API key and add it in the code below.

A.1 State and county data

We use tidycensus to get state and county names by FIPS so they are uniform across all data sets.

In the tidycensus data, there is no data for FIPS 02010 Aleutian Islands Census Area, Alaska. This FIPS is found in the Census of Agriculture. We add this fips to our county data based on the Geographic Area Codes and Titles from the U.S. Bureau of Labor Statistics.

To crosswalk zip codes to fips codes we use yearly 1st quarter ZIP-COUNTY data from the HUD USPS zip code crosswalk files. If a ZIP is in multiple counties, the crosswalk matches each zip to the county with the largest ratio of all addresses in the ZIP. It is not a perfect one-to-one match, but the best we can do with the available data. The HUD data does not including zip codes that exclusively serve PO Boxes. We match the missing data to another data set provided by United States Zip Codes with the hopes of matching more zip codes to fips. This data set does not have the largest ratio of addresses, so we have to manually address any match with multiple fips for one zip.

To geocode data from addresses, we use ggmap and the Google Geocoding API.

library(tidyverse, quietly = TRUE)
library(janitor, quietly = TRUE)
library(readxl, quietly = TRUE)
library(sf, quietly = TRUE)
library(sp, quietly = TRUE)
library(ggmap, quietly = TRUE)

# Get county and state fips, state name, county name 
county <- tidycensus::fips_codes %>% 
  unite("fips", 
        c(state_code, county_code), 
        sep = "", remove = FALSE) %>% 
  rename(county_name = county) %>% 
  select(fips, county_name, state_name) 

# Import county spatial data frame using Tigris
county_sf <- tigris::counties(progress_bar = FALSE) %>% 
  clean_names()

# Get state data and add "00" US
state <- tidycensus::fips_codes %>% 
  select(state_code, state_name) %>% 
  rename(fips = state_code) %>% 
  distinct()

# Merge so we have county and state data in one data frame
county_state <- bind_rows(county, state)

# Geocode using ggmap and google API 
# Register your google API - add your api key here
register_google("")

# Define a function to geocode and return lat and long. This function will be used for each of the sections below where we geocode addresses
geocode_address <- function(address) {
  result <- geocode(address, output = "latlona", source = "google")
  tibble(lat = result$lat, lon = result$lon)
}

A.2 Business licences - Colorado Department of Agriculture data

We gather data from the Colorado Department of Agriculture (CDA) on all business licences under the following programs.

  • Commodity Warehouse Program: performs audits and inspections of all licensed commodity warehouses within Colorado.
  • Farm Products Dealers Program: designed to protect sellers of Colorado farm products from fraudulent dealers and handlers. Farm products include unprocessed products of the soil, livestock, milk, honey, and hay. It does not include poultry, nursery stock, timber products, livestock not sold within 90 days or commodities (grain and dry edible beans).
    • Due to the Livestock Confidentiality Act, Farm Products dealers that deal with livestock do not include addresses, only names. These businesses are available in the raw data (“Farm Products Livestock.xlsx”) but will not be displayed on the map.
  • Meat - Custom Processing program: inspects custom processing facilities that process domestic livestock and/or wild game animals for the animal’s owners. These facilities are exempt from the United States Department of Agriculture’s (USDA) official inspection. Facilities are in compliance with USDA BSE regulations. Meat processed by custom processors may not be sold to anyone and may only be consumed by the animal’s owners. This program also inspects custom processing facilities that process poultry. Poultry processing facilities are inspected for proper sanitation, record-keeping, and labeling. Poultry processed by licensed custom processors may be sold to individuals and to retail establishments.
  • Liscensed Seed Dealers: requires the seed label, the germination, and purity content of the seed is accurate. Download data from the Liscensed Seed Dealer Search
  • Aquaculture Permit: issues operating permits to private aquaculture facilities.
    • These businesses are available in the raw data (“Aqua Permitees.xlsx”) but will not be displayed on the map because there are no addresses.
# Import files that are in the same format
file_list <- fs::dir_ls(path = "data_raw/colorado/cda")

df1 <- file_list %>% 
  map(~read_xlsx(.)) %>%
  bind_rows(.id = "org_type") %>%
  clean_names() %>%
  mutate(
    org_type = str_remove_all(org_type, "data_raw/colorado/cda/|\\.xlsx"), 
    org_type = str_to_sentence(org_type))

# Make address for geocoding
df1 <- df1 %>% 
  mutate(
    org_name = str_to_title(business_name), 
    address = str_c(str_to_title(address1), 
                    str_to_title(city), 
                    state_code, sep = ", "), 
    address = str_c(address, zip_code, sep = " ")) %>% 
  select(org_name, org_type, address)

# Import remaining files with different file structures 
df2 <- read_xlsx("data_raw/colorado/cda_alt_format/Custom Exempt Poultry Processors.xlsx") %>% 
  clean_names()

# Make address for geocoding
df2 <- df2 %>% 
  mutate(
    org_type = "Custom Exempt Poultry Processors",
    org_name = str_to_title(business_name), 
    address = str_c(str_to_title(address), 
                    str_to_title(city), 
                    state, sep = ", "), 
    address = str_c(address, zip, sep = " ")) %>% 
  select(org_name, org_type, address)

# bind data 
df <- bind_rows(df1, df2)
rm(df1, df2)

# Filter out Farm Products Livestock and aquaculture- they are missing addresses
df <- df %>% 
  filter(!is.na(address))

# Add columns 
df <- df %>% 
  mutate(
    variable_name = tolower(str_replace_all(org_type, " ", "_")),
    category = "Processing & Distribution", 
    topic_area = case_when(
      org_type %in% c("Custom Exempt Meat", 
                      "Custom Exempt Poultry Processors") ~ "Meat and Poultry", 
      TRUE  ~ "Distribution"),  
    value = 1, 
    value_codes = NA)

# Geocode using ggmap and google API and function defined in the state and county data section (1 observation was not able to get a lat/long)
df <- df %>%
  mutate(geocoded = map(address, geocode_address)) %>%
  unnest(geocoded)

## Get FIPS codes from lat/long data using the spatial county data from the Tigris package, defined above as county_sf
# make data frame into a spatial data frame and keep original lat/long variables
df_sf <- df %>% 
  filter(!is.na(lat)) %>% 
  rename(long = lon) %>%
  st_as_sf(coords = c("long", "lat"),
           crs = st_crs(county_sf), 
           remove = FALSE)

# intersect our spatial point-level data with the tigris county spatial data frame 
intersected <- st_intersects(df_sf, county_sf)

# get the fips code for each entry
df_sf <- df_sf %>%
  mutate(
    intersection = as.integer(intersected), 
    fips = county_sf$geoid[intersection]) 

rm(intersected)

# Turn back into a regular data frame
df <- as_tibble(df_sf) %>%
  select(!c(geometry, intersection))

rm(df_sf)

# One county was not matched because it is located in Canada, obs. dropped
df <- df %>% 
  filter(!is.na(fips))

# Add county and state names 
df <- df %>% left_join(county)
 
# Define point data frame
df_CDA_point <- df %>%
  mutate(
    year = "2023", 
    org_address = address) %>%
  select(
    fips, county_name, state_name, category, 
    topic_area, year, variable_name, value, value_codes, 
    lat, long, org_name, 
    org_type, org_address)

## Number of businesses per county
df_county <- df %>% 
  group_by(fips, state_name, county_name,
           category, topic_area,  
           variable_name) %>% 
  summarise(value = sum(value)) 

# Number of operations per state, add state fips code
df_state <- df_county %>% 
  group_by(variable_name, state_name, category, topic_area) %>% 
  summarise(value = sum(value)) %>% 
  left_join(state)

# Join data 
df_agg <- bind_rows(df_county, df_state)
rm(df_county, df_state)

# Define data frame 
df_CDA <- df_agg %>% 
  mutate(value_codes = NA, 
         year = "2023") %>% 
  select(
    fips, county_name, state_name, category, 
    topic_area, year, variable_name, value, value_codes)

# Make meta data 
meta_CDA <- df_CDA_point %>% 
  group_by(category, topic_area, variable_name, org_type) %>% 
  count() %>% 
  select(-n)

# Add columns 
meta_CDA <- meta_CDA %>%
  mutate(
    years = "2023", 
    user_friendly_variable_name = org_type, 
    variable_definition = case_when(
      variable_name == "active_seed_dealers" ~ 
        "Seed dealers licensed in Colorado", 
      variable_name == "commodity_handlers" ~ 
        "Licensed commodity warehouses within Colorado", 
      variable_name == "farm_products" ~ 
        "Farm products dealers licensed in Colorado (not including livestock dealers", 
       variable_name == "custom_exempt_meat" ~ 
        "Inspected custom processing facilities that process domestic livestock and/or wild game animals for the animal's owners; facilities are exempt from the United States Department of Agriculture's (USDA) official inspection", 
      variable_name == "custom_exempt_poultry_processors" ~ 
        "Inspected custom processing facilities that process poultry animals for the animal's owners; facilities are exempt from the United States Department of Agriculture's (USDA) official inspection"), 
    periodicity = "continuous", 
    aggregation = "count", 
    format = "integer", 
    keywords = "CDA|Colorado Department of Agriculture",
    hashtags = "#CDA|#License", 
    chart_type1 = "BarChart", 
    chart_type2 = NA, 
    chart_axis_x1 = user_friendly_variable_name,
    chart_axis_x2 = NA, 
    chart_axis_y1 = NA, 
    chart_axis_y2 = NA, 
    source = "Colorado Department of Agriculture",
    url = case_when(
      variable_name == "active_seed_dealers" ~ 
        "https://www.ag.state.co.us/eLicense/Licenses/External/LicensedSeedDealerSearch.aspx", 
      variable_name == "commodity_handlers" ~ 
        "https://ag.colorado.gov/ics/commodity-handler-program",
      variable_name == "farm_products" ~ 
        "https://ag.colorado.gov/ics/farm-products/farm-products-dealer-program", 
      variable_name == "custom_exempt_meat" ~ 
        "https://ag.colorado.gov/ics/meat-custom-processing", 
      variable_name == "custom_exempt_poultry_processors" ~ 
       "https://ag.colorado.gov/ics/meat-custom-processing"),
    citation = case_when(
      variable_name == "active_seed_dealers" ~ 
        "Colorado Department of Agriculture, Division of Plant Industry, Seed", 
      variable_name == "commodity_handlers" ~ 
        "Colorado Department of Agriculture, Commodity Warehouse Program", 
      variable_name == "farm_products" ~ 
        "Colorado Department of Agriculture, Farm Products Dealers Program", 
      variable_name == "custom_exempt_meat" ~
      "Colorado Department of Agriculture, Meat - Custom Processing program", 
      variable_name == "custom_exempt_poultry_processors" ~ 
        "Colorado Department of Agriculture, Meat - Custom Processing program"))

rm(df, df_agg)
#save(df_CDA, file = "data_processed/df_CDA.RData")
#save(df_CDA_point, file = "data_processed/df_CDA_point.RData")

A.3 Dairy - Colorado Department of Public Health and the Environment data

We provide data on operations with permits from Colorado Department of Public Health and the Environment (CDPHE) for: - Milk Program: inspects, regulates and samples all dairy products and enforce Colorado production and transportation rules. We do not include data on Dairy Haulers as there is no location data available. Data includes - Grade A Dairy farms - Grade A Dairy plants - Food Manufacturing and Storage: Food manufacturing, warehousing, and wholesaling in Colorado is regulated by CDPHE, the FDA, and the USDA. Operators who manufacture foods and dietary supplements with less than 2% cooked meat must be registered with CDPHE and if ingredients or finished products have interstate commerce a registration with the FDA is also required.

# Import dairy data 
sheets <- c("Grade A Farms", "Grade A Plants")

df1 <- sheets %>% 
  map_dfr(~read_xlsx("data_raw/colorado/cdphe/CDPHE Dairy Data.xlsx", 
                     sheet = .)) %>% 
  clean_names()

# Change column names to match our data format 
df1 <- df1 %>% 
  mutate(
    org_type = firm_type, 
    org_name = firm_name, 
    address = str_c(site_address_1, str_to_title(site_city), site_state, sep = ", "), 
    address = str_c(address, site_zipcode, sep = " ")) %>% 
  select(org_name, org_type, address)

# Import food manufacturing data
df2 <- read_xlsx("data_raw/colorado/cdphe/Food Manufacturing and Storage.xlsx") %>% 
  clean_names() %>% 
  mutate(
    org_type = "Food Manufacturing and Storage", 
    org_name = establishment_name) %>% 
  select(org_name, org_type, address)

# Bind data 
df <- bind_rows(df1, df2)
rm(df1, df2)

# Geocode using ggmap and google API and function defined in the state and county data section 
df <- df %>%
  mutate(geocoded = map(address, geocode_address)) %>%
  unnest(geocoded)

## Get FIPS codes from lat/long data using the spatial county data from the Tigris package, defined above as county_sf
# make data frame into a spatial data frame and keep original lat/long variables
df_sf <- df %>% 
  filter(!is.na(lat)) %>% 
  rename(long = lon) %>%
  st_as_sf(coords = c("long", "lat"),
           crs = st_crs(county_sf), 
           remove = FALSE)

# intersect our spatial point-level data with the tigris county spatial data frame 
intersected <- st_intersects(df_sf, county_sf)

# get the fips code for each entry
df_sf <- df_sf %>%
  mutate(
    intersection = as.integer(intersected), 
    fips = county_sf$geoid[intersection]) 

rm(intersected)

# Turn back into a regular data frame
df <- as_tibble(df_sf) %>%
  select(!c(geometry, intersection))
rm(df_sf)

# Add county and state names 
df <- df %>% left_join(county)
 
# Define point data frame
df_CDPHE_point <- df %>%
  mutate(
    year = "2023", 
    org_address = address, 
    category = "Processing & Distribution", 
    topic_area = "Dairy", 
    variable_name = tolower(str_replace_all(org_type, " ", "_")), 
    value = 1, 
    value_codes = NA) %>%
  select(
    fips, county_name, state_name, category, 
    topic_area, year, variable_name, value, value_codes, 
    lat, long, org_name, 
    org_type, org_address)

## Number of businesses per county
df_county <- df_CDPHE_point %>% 
  group_by(fips, state_name, county_name,
           category, topic_area,  
           variable_name) %>% 
  summarise(value = sum(value)) 

# Number of operations per state, add state fips code
df_state <- df_county %>% 
  group_by(variable_name, state_name, category, topic_area) %>% 
  summarise(value = sum(value)) %>% 
  left_join(state)

# Join data 
df_agg <- bind_rows(df_county, df_state)
rm(df_county, df_state)

# Define data frame 
df_CDPHE <- df_agg %>% 
  mutate(value_codes = NA, 
         year = "2023") %>% 
  select(
    fips, county_name, state_name, category, 
    topic_area, year, variable_name, value, value_codes)

# Make meta data 
meta_CDPHE <-  df_CDPHE_point %>% 
  group_by(category, topic_area, variable_name, org_type) %>% 
  count() %>% 
  select(-n)

# Add columns 
meta_CDPHE <- meta_CDPHE %>%
  mutate(
    years = "2023", 
    user_friendly_variable_name = org_type, 
    variable_definition = case_when(
      variable_name == "dairy_farm" ~ 
        "Grade A Dairy farms", 
      variable_name == "dairy_plant" ~ 
        "Grade A Dairy plants", 
      variable_name == "food_manufacturing_and_storage" ~ 
        "Food manufacturing, warehousing, and wholesaling businesses licensed in Colorado"), 
    periodicity = "continuous", 
    aggregation = "count", 
    format = "integer", 
    keywords = "CDPHE|Colorado Department of Public Health and the Environment",
    hashtags = "#CDPHE|#License|#Dairy", 
    chart_type1 = "BarChart", 
    chart_type2 = NA, 
    chart_axis_x1 = user_friendly_variable_name,
    chart_axis_x2 = NA, 
    chart_axis_y1 = NA, 
    chart_axis_y2 = NA, 
    source = "Colorado Department of Public Health and the Environment",
    url = case_when(
      str_detect(variable_name, "dairy") ~ "https://cdphe.colorado.gov/milk-program",
      variable_name == "food_manufacturing_and_storage" ~ 
        "https://cdphe.colorado.gov/food-manufacturing-and-storage"), 
    citation = case_when(
      str_detect(variable_name, "dairy") ~"Colorado Department of Public Health and the Environment, Milk Program", 
      variable_name == "food_manufacturing_and_storage" ~ 
        "Colorado Department of Public Health and the Environment, Food Manufacturing and Storage"))

#save(df_CDPHE, file = "data_processed/df_CDPHE.RData")
#save(df_CDPHE_point, file = "data_processed/df_CDPHE_point.RData")

A.4 Farm fresh directory - Colorado Department of Agriculture

Data on farms and farmers markets in Colorado is provided through the Farm Fresh Director

Data provided had location addresses and names, our team added a market type identifier (i.e., farmers market/farm stand, farm/ranch, winery/vineyard). Of the 134 records we received, 42 did not have addresses are are not included in this map. Data are only included as point data and not aggregated at the county- or state-level as they are not representative of all operations in the state.

# Import data 
df <- read_xlsx("data_raw/colorado/CDA_Farm Fresh Data 2024_processed.xlsx") %>% 
  clean_names()

# Drop obs. without address 
df <- df %>% 
  filter(!is.na(address))

# Change column names to match our data format 
df <- df %>% 
  mutate(
    org_type = category, 
    org_name = farm_name, 
    address = str_c(address, city, state, sep = ", "), 
    address = str_c(address, zip, sep = " ")) %>% 
  select(org_name, org_type, address)

# Geocode using ggmap and google API and function defined in the state and county data section 
df <- df %>%
  mutate(geocoded = map(address, geocode_address)) %>%
  unnest(geocoded)

## Get FIPS codes from lat/long data using the spatial county data from the Tigris package, defined above as county_sf
# make data frame into a spatial data frame and keep original lat/long variables
df_sf <- df %>% 
  filter(!is.na(lat)) %>% 
  rename(long = lon) %>%
  st_as_sf(coords = c("long", "lat"),
           crs = st_crs(county_sf), 
           remove = FALSE)

# intersect our spatial point-level data with the tigris county spatial data frame 
intersected <- st_intersects(df_sf, county_sf)

# get the fips code for each entry
df_sf <- df_sf %>%
  mutate(
    intersection = as.integer(intersected), 
    fips = county_sf$geoid[intersection]) 

rm(intersected)

# Turn back into a regular data frame
df <- as_tibble(df_sf) %>%
  select(!c(geometry, intersection))
rm(df_sf)

# Add county and state names 
df <- df %>% left_join(county)
 
# Define point data frame
df_farmfresh_point <- df %>%
  mutate(
    year = "2023", 
    org_address = address, 
    category = "Food Retail", 
    topic_area = org_type, 
    variable_name = tolower(str_replace_all(org_type, "/", "_")), 
    value = 1, 
    value_codes = NA) %>%
  select(
    fips, county_name, state_name, category, 
    topic_area, year, variable_name, value, value_codes, 
    lat, long, org_name, 
    org_type, org_address)


# Make meta data 
meta_farmfresh <-  df_farmfresh_point %>% 
  group_by(category, topic_area, variable_name, org_type) %>% 
  count() %>% 
  select(-n)

# Add columns 
meta_farmfresh <- df_farmfresh_point %>% 
  select(category, topic_area, variable_name) %>% 
  distinct() %>% 
  mutate(
    years = "2023", 
    user_friendly_variable_name = topic_area, 
    variable_definition = "Location listed in the Colorado Department of Agriculture Farm Fresh Directory",
    periodicity = "yearly", 
    aggregation = "point", 
    format = "point", 
    keywords = "CDA|Colorado Department of Agriculture",
    hashtags = "#CDA|#FarmFreshDirectory", 
    chart_type1 = NA, 
    chart_type2 = NA, 
    chart_axis_x1 = NA,
    chart_axis_x2 = NA, 
    chart_axis_y1 = NA, 
    chart_axis_y2 = NA, 
    source = "Colorado Department of Agriculture",
    url = "https://coloradoproud.com/resources/farm-fresh-directory/",
    citation = "Colorado Department of Agriculture, Farm Fresh Directory")

#save(df_farmfresh_point, file = "data_processed/df_farmfresh_point.RData")

A.5 Combine all data and write to file

# Get metadata file for all data
meta_colorado <- bind_rows(meta_CDA, meta_CDPHE, 
                           meta_farmfresh) %>%
  mutate(
    years = as.character(years)) %>%
  ungroup() %>%
  mutate(
    `2 pager title` = "Business Development & Infrastructure",
    last_update_date = "4/19/24") %>%
  select(`2 pager title`, category, topic_area, variable_name,
         user_friendly_variable_name, variable_definition, 
         years, periodicity, aggregation, format, 
         keywords, hashtags, 
         chart_type1, chart_type2, 
         chart_axis_x1, chart_axis_x2, chart_axis_y1, 
         chart_axis_y2, source, url, citation, last_update_date)

# Get df for all point-level 
colorado_point <- bind_rows(df_CDA_point, df_CDPHE_point, df_farmfresh_point)

# Put in correct order 
colorado_point <- colorado_point %>%
  mutate(colorado_point = as.character(year)) %>%
  select(
    fips, county_name, state_name, category, 
    topic_area, year, variable_name, value, value_codes, 
    lat, long, org_name, 
    org_type, org_address)

# Get df for all aggregated data 
df_colorado <- bind_rows(df_CDA, df_CDPHE, df_demo) %>% 
  ungroup()

# Drop rows with no data and put columns in correct order 
df_colorado <- df_colorado %>%
  filter(!(is.na(value) & is.na(value_codes))) %>% 
  mutate(year = as.character(year)) %>%
  select(
    fips, county_name, state_name, category, 
    topic_area, year, variable_name, value, value_codes)

# write to file 
write_csv(meta_colorado, "data_final/meta_colorado.csv")
write_csv(df_colorado, "data_final/df_colorado.csv")
write_csv(colorado_point, 
          "data_final/df_colorado_point.csv")