Dataset of food flow networks
We derive annual, intranational food flow networks for the USA using the Freight Analysis Framework version 4 (FAF4) database^{30}. The derived networks are for different food sectors and include all metropolitan areas in the USA. The FAF4 database consists of annual commodity flows during 2012−2015 for 115 geographic areas in the USA and 43 different sectors. We focus on the following four food sectors in the FAF4 database: crops, live animals, animal feed and meat. The 115 geographic areas in the FAF4 database cover the entire contiguous USA, including 69 metropolitan statistical areas and 46 remainders of states (the remainder is the area of a state that is not part of a FAF4 metropolitan area).
To obtain food flows for all metropolitan areas in the USA, we disaggregate the FAF4 database from 115 to 329 areas (Supplementary Fig. 4), out of which 284 are metropolitan or combined statistical areas (120 metropolitan and 164 combined statistical areas). The disaggregation is performed using different socioeconomic and agriculturalrelated variables as attractors of supply and demand. For each food sector, a flow with origin o and destination d in the FAF4 database is disaggregated to a metropolitanlevel flow with origin o′ and destination d′ using a disaggregation variable a as the best attractor of supply or demand.
The disaggregation is performed in two stages. In the first stage, the supply U of each FAF4 remainder of state is disaggregated to include all the metropolitan areas in that remainder of state as follows:
$${U}_{{o}^{{prime} }d}^{c}=frac{{U}_{od}^{c}}{{a}_{o}}times {a}_{{o}^{{prime} }},$$
(3)
where ({U}_{{o}^{{prime} }d}^{c}) (in tons per year) is the disaggregated supply for food sector c in origin o′ that satisfies demand at the FAF4 destination d, ({U}_{od}^{c}) (in tons per year) is the FAF4 food flow for sector c between areas o and d, and ({a}_{{o}^{{prime} }}) and a_{o} are the attractor variables for the new origin o′ and FAF4 origin o, respectively. In the second stage, ({U}_{{o}^{{prime} }d}^{c}) is further disaggregated into demand E using:
$${E}_{{o}^{{prime} }{d}^{{prime} }}^{c}=frac{{U}_{{o}^{{prime} }d}^{c}}{{a}_{d}}times {a}_{{d}^{{prime} }},$$
(4)
where ({E}_{{o}^{{prime} }{d}^{{prime} }}^{c}) (in tons per year) is the demand at destination d′ for food sector c supplied by origin o′, while ({a}_{{d}^{{prime} }}) and ({a}_{d}) are the attractor variables at the disaggregated destination d′ and FAF4 destination d, respectively.
The FAF4 database includes food flow data at both the state level (48 states) and metropolitan level (115 areas including 69 metropolitan areas). Prior to performing the disaggregation, we jointly use the FAF4 state data and the FAF4 metropolitan data to select the best performing attractor variables. That is, we first use equations (3) and (4) to disaggregate the FAF4 statelevel data to the metropolitanlevel for the metropolitan and remainderofstate areas in FAF4. By comparing the performance of our disaggregated flow data against the empirical FAF4 metropolitanlevel data, we select the best attractor variable for each food sector. The following attractor variables are considered: population^{47}, employment^{48}, wages^{48}, number of establishments^{48} and cropland area^{49}. These variables are selected on the basis of previous analysis and data availability^{50}.
To assess the performance of the attractor variables, we use the Pearson correlation coefficient between the empirical FAF4 flows and the disaggregated flows for the metropolitan areas and remainderofstate areas in FAF4 (Extended Data Fig. 3). The performance is high with correlation values greater than 0.87 and an average of 0.95. Using the bestperforming disaggregation variables, we build the food flow networks employed in this study. The nodes in the networks represent metropolitan and remainderofstate areas, and the weighted links represent annual food flows during 2012−2015 for crops, live animals, feed and meat (Supplementary Fig. 5).
The FAF4 metropolitan and remainderofstate areas we used to select the attractor variables span a wide range of populations, cropland areas, and number of establishments, since these FAF4 areas include the largest cities in the USA and a broad range of mediumsize cities. The values of the attractor variables used in the disaggregation are within the ranges implied by the FAF4 metropolitan data (Supplementary Fig. 6), indicating that the variables are reliable. The exception to this is population, which is only used to disaggregate meat demand. Population, however, has a high disaggregation performance with a correlation coefficient of 0.97 (Extended Data Fig. 3). In addition, the use of population to disaggregate meat demand is consistent with previous scaling results for metropolitan areas in the USA^{51} that have shown that metropolitanlevel variables that are related to resource consumption scale approximately linearly with population.
Food inflows supply chain diversity
To determine annual supply chain diversity, we extract the annual food buyer–supplier subgraph of each city and food sector from the food flow networks^{19}. We refer to each of these subgraphs as a food system. The food buyer–supplier subgraph of a city i consists of all the supply chain interactions with its trading partners or neighbours j for a specific food sector. Our measure of supply chain diversity is based on the notion of functional distance^{52}. We compute the functional distance d between i and any of its trading partners j by combining five different indicators: physical distance, climate correlation, urban classification, economic specialization and network modularity. The indicators are described below in the ‘Functional distance indicators’ section of the Methods. We also perform statistical analyses to evaluate the influence of the attractor variables on these indicators (Methods). The indicators represent stable characteristics of cities and therefore tend to remain fairly constant during our study period.
The functional distance ({d}_{ij}^{r}) for an indicator r between any pair of connected nodes (i,j) is calculated as
$${d}_{ij}^{r}={N}^{1}{r}_{k}{r}_{i},$$
(5)
where the normalization constant N is determined as the maximum value of ({r}_{k}{r}_{i}) between any node k in the network and i. In equation (5), ({d}_{ij}^{r}=0) for functionally similar nodes and ({d}_{ij}^{r}=1) for dissimilar nodes.
For each city’s buyer−supplier subgraph and food sector, any pair of connected nodes has 5 different functional distance indicators associated with it. To combine these distance indicators into a single measure, we calculate the average functional distance indicator (langle {d}_{ij}^{r}rangle ) as the arithmetic average of the 5 functional distance indicators for any pair (i, j) of connected nodes. We use the discrete probability distribution of food inflows binned by (langle {d}_{ij}^{r}rangle ) categories, together with Shannon entropy^{53}, to calculate the supply chain diversity ({D}_{i,c}^{t}) of node i and sector c at year t:
$${D}_{i,c}^{t}=frac{{sum }_{k=1}^{K}{Y}_{i,c}^{t}(k)mathrm{ln},{Y}_{i,c}^{t}(k)}{log ,K}.$$
(6)
For sector c and year t, ({Y}_{i,c}^{t}(k)) is the proportion of food inflows to node i within bin k. The k bin is obtained by binning all the (langle {d}_{ij}^{r}rangle ) values for node i into a total number of K bins.
({D}_{i,c}^{t}) is sensitive to the total number of bins K. Thus, for each node in our food flow networks, we tested the sensitivity of ({D}_{i,c}^{t}) to the total number of bins K. For K = 15, D values stabilize (Supplementary Fig. 7); therefore, we used 15 bins when performing all calculations of functional diversity.
Functional distance indicators
The average functional distance between a city and its trading partners is based on the following five indicators:

(1)
Physical distance indicator (PDI). The PDI is obtained by calculating the Euclidean distance from the centroid of each geographic area to the centroid of all other areas. The geometric centroids of all geographical areas are calculated using the GIS software ArcMap (https://desktop.arcgis.com/en/arcmap/).

(2)
Climate indicator (CI). To account for different climates in cities across the USA, the Palmer Drought Severity Index (PDSI) is used^{54}. The monthly PDSI is obtained from the National Oceanic and Atmospheric Administration for the years 1895−2015 at the climate division geographic level. An areaweighted average is performed to aggregate the PDSI data to the metropolitan level. The CI is obtained by calculating the monthly correlation between an area and all other areas.

(3)
Urban classification indicator (UCI). To identify the urbanization level of a geographical area, the UrbanRural Classification indicator of the National Center for Health Statistics is employed^{55}. This indicator classifies counties using a scale from 1 to 6, where a value of 1 indicates the county is highly rural and a value of 6 means highly urban. The UCI is obtained at the metropolitan level using an areaweighted average of the countylevel values.

(4)
Network modularity indicator (NMI). This indicator identifies geographical areas (network nodes) that belong to the same community. A community is a group of nodes whose strength interactions are stronger than with the rest of the network. To identify the network’s communities, we aggregate the flows from the four food sectors (crops, live animals, feed and meat) into a singlelayer network. The communities are identified by maximizing the modularity measure of Newman^{56,57} using the greedy optimization algorithm of Blondel et al.^{58,59}. Network nodes that lie in the same community are assigned a NMI of 1 and 0 otherwise.

(5)
Economic specialization indicator (ESI). Each geographical area is assigned a score based on its dominant economic supply sector. Supply is quantified using the FAF4 intranational commodity flows^{30}. Areas with a dominant meat sector are assigned an ESI of 1, crops an ESI of 2, fruits and vegetables an ESI of 3, animal feed an ESI of 4, live animals an ESI of 5, milled grains an ESI of 6, and industrial products an ESI of 7.
Probabilities of food supply chain shock
The annual probability of food supply chain shock is calculated as the probability that food inflows to a city fall below a percentage of the average inflows for that city during 2012−2015^{8}. To compute this probability, we group all nodes from the 4 food flow networks (1,221 observations) into 6 diversity bins ordered from lowest to highest functional diversity D. The bin size is selected to obtain bins with similar number of observations, approximately 204 observations in each bin. For each city i and food sector c in a bin, we calculate the food supply chain shock ({S}_{i}^{c}) as
$${S}_{i}^{c}=left[1frac{min({I}_{i}^{c})}{langle {I}_{i}^{c}rangle }right]times 100,$$
(7)
where ({I}_{i}^{c}) is the time series of total food inflows to node i for sector c during 2012−2015, and ({rm{min }}({I}_{i}^{c})) and (langle {I}_{i}^{c}rangle ) are the minimum and average values of the time series ({I}_{i}^{c}), respectively.
For each diversity bin b, we count the number of observations n_{b} that meet the criteria ({S}_{i}^{c} > s) for (sin {3,4,5,ldots ,15}), with s being the shock intensity threshold. The probability of a food supply shock S being greater than s in bin b is calculated as:
$${P}_{b}(S > s)=frac{{n}_{b}}{{N}_{b}},$$
(8)
where N_{b} is the total number of observations in bin b. Thus, for each shock intensity s, we obtain a set of probabilities of food supply chain shock,
$$P(S > s)={P}_{b}(S > s),{rm{for}},b={1,ldots ,6}.$$
(9)
Furthermore, we adapt equations (8) and (9) to calculate the probability of a food supply chain shock S being greater than s, P′(S > s), under cooccurrence conditions. We define cooccurrence as any city that experiences a shock to 2 or more food sectors during 2012−2015. With this definition, P′(S > s) is calculated in a fashion similar to that described above. We bin the network’s nodes into 6 groups from lowest to highest diversity and determine the percentage of food supply chain shock with equation (7). Letting ({n}_{b}^{{prime} }) be the total number of cities for which ({S}_{i}^{c} > s,) holds for 2 or more food sectors, the probability of a food supply chain shock S being greater than the shock intensity s in bin b is now
$${P}_{b}^{{prime} }(S > s)=frac{{n}_{b}^{{prime} }}{{N}_{b}^{{prime} }},$$
(10)
where ({N}_{b}^{{prime} }) is the total number of cities in bin b. Thus, under cooccurrence conditions, the new set of probabilities for each shock intensity s is
$$P{prime} (S > s)={P}_{b}^{{prime} }(S > s),{rm{for}},b={1,ldots ,6}.$$
(11)
Statistical analyses
We use the disaggregated food flow data to calculate both the probability of food supply chain shock and supply chain diversity. Therefore, we perform two complementary analyses to test whether the attractor variables used in the disaggregation are causing a circularity issue in the empirical relationship between the probability of food supply chain shock and supply chain diversity. For the first analysis, we determine the Pearson correlation between the functional distance indicators (PDI, CI, UCI, NMI and ESI) and the attractor variables (Supplementary Fig. 8). We find that the attractor variables are weakly correlated with the functional distance indicators (Supplementary Table 1). For the second analysis, we determine the Pearson correlation between the food supply chain shock intensities and attractor variables for the 4 food sectors (Supplementary Fig. 9). The attractor variables are also weakly correlated with the food supply chain shock intensities (Supplementary Table 2). Thus, circularity is not unduly influencing the empirical relationship between the probability of shock and supply chain diversity.
We also evaluate whether the empirical relationship between the probability of food supply chain shock and supply chain diversity is driven by the disaggregation of the original FAF4 data. For this, we recalculate the probability of shock and supply chain diversity using the FAF4 data. For the FAF4 data, the probability of shock also declines with rising supply chain diversity (Supplementary Fig. 10), similar to the reduction observed using the disaggregated food flow data (Fig. 1a), suggesting that the latter data are not driving the relationship.
Furthermore, we test whether the relationship between the probability of food supply chain shock and supply chain diversity holds for different demand levels. To control for demand, we stratify all the data into low, medium and high demand levels using population or food inflows as proxies for demand. For both stratifications, the bounds are chosen so that each level has approximately the same number of data points. Using the stratified data, we recalculate the Pearson correlation between the probability of shock at 3%, 5%, 10% and 15% shock intensities and supply chain diversity for each level of population or food inflows (Supplementary Table 3). We find that the relationship between the probability of food supply chain shock and supply chain diversity holds for these different demand levels (Supplementary Table 3). Using the same exponential model in equation (1) to fit the relationship for the stratified data (Supplementary Fig. 11), we determine the exponential model parameters k_{s} and D_{0,s} for each demand level (Extended Data Table 6). These parameters fall within the 95% confidence interval of the parameters of the exponential model in the main text (Extended Data Table 1), indicating that the model is robust.
We perform two different sensitivity analyses to assess the influence of the five distance indicators on the empirical relationship between the probability of food supply chain shock and supply chain diversity. The first analysis compares singleindicator diversity measures against the multiindicator diversity measure calculated using all 5 indicators. Five different singleindicator diversity measures are compared, one measure for each of the 5 indicators: PDI, CI, UCI, NMI and ESI. For the second sensitivity analysis, we leave out one indicator at a time to calculate diversity using the 4 remaining indicators, which results in another 5 different diversity measures. The diversity measures for the sensitivity analyses are all calculated following the approach in the ‘Food inflows supply chain diversity’ section of the Methods. To perform the sensitivity analyses, we plot the empirical relationship between the probability of food supply chain shock and each diversity measure (Supplementary Figs. 12 and 13), and calculate the Pearson correlation of the data (Extended Data Table 7). The correlation coefficients are used to quantify the influence of the distance indicators on the relationship between the probability of food supply chain shock and supply chain diversity (Extended Data Table 7). The probabilities of food supply chain shock are calculated following the approach in the ‘Probabilities of food supply chain shock’ section of the Methods. We find that the five indicators have a varied influence on the relationship between the probability of food supply chain shock and supply chain diversity (Extended Data Table 7). The inclusion of all 5 indicators, however, in the supply chain diversity measure increases the Pearson correlation between the probability of food supply chain shock and supply chain diversity (Extended Data Table 7).
Recent Comments