A Primer on Recent Advancement on Freight Transportation

The efficient and reliable flow of urban goods and services is essential for the economic well-being of the United States (U.S.), particularly for the majority of Americans who live in urbanized areas. According to recently published estimates (FAF3, 2007), the U.S. freight system moved about 19 billion tons of goods valued at $17 trillion in 2007. These shipments resulted in a total of 5.7 trillion ton-miles of movement during the same year. The performance of the freight system has economic impacts on national productivity, the costs of goods and services, and the global competitiveness of American industries. While demand for freight transportation has been rising steadily and shows little sign of abating, the capacity of the freight system has increased only modestly. Without an efficient freight system, other critical systems, such as energy supply, can be seriously impacted.


Introduction
To prepare for future transportation challenges, transportation agencies at the federal, state and local levels are heavily engaged in comprehensive transportation planning efforts. Local, regional, and national planners and policymakers are beginning to understand more deeply how freight transportation issues apply within urban areas in the United States and around the world. Furthermore, freight mobility is a leading factor in the decision of a private business to locate in or near a particular city. The goods movement supply chain is very complex, however, because it uses regional, state, and national infrastructure while adopting increasingly complicated intermodal and multimodal transport solutions. Therefore, both the public and private sectors conduct planning and analysis on which to base critical strategic decisions. This requires that freight data, modelling techniques and other computational and visualization tools be available to aid transportation stakeholders in their decision-making and analysis.
For over a decade, the Center for Transportation Analysis (CTA) of the Oak Ridge National Laboratory (ORNL) has provided assistance to federal transportation agencies in the development of comprehensive national and regional freight databases and network flow models. The objective of this chapter is to provide readers with an understanding of recent ORNL advancements in freight analysis and visualization.
For the freight transportation system to move commodities effectively and efficiently, it needs to overcome geographic barriers within specified time constraints. To this end, the freight system must be multidimensional and dynamic, consider different modes of transportation (truck, rail, water, air, and pipeline), involve both public and private sectors, and continue to evolve through time. To understand this complex and complicated system, one needs not only reliable data but also sophisticated computational tools.
Over the years, practitioners within the transportation research community have collected and created a wealth of information stored in many database systems throughout their organizations. Some of these systems are well established and maintained, while others involve smaller data sets collected by individuals or by an office for a specific purpose. In either case the data are valuable by themselves, and could offer broader insight if users were given the ability to integrate these individual data sources.
Because of the spatial and temporal components embedded in almost all forms of freight data, the Geographic Information System (GIS) is an ideal computational framework for freight transportation analysis and modelling. GIS is well suited to managing, planning, evaluating and maintaining the transportation system. Huge geographic databases can be created and maintained, and many forms of spatial data can be integrated into a GIS platform. It provides the means for researchers to evaluate the system and determine the impacts of capacity enhancements, operational improvements, and public transportation investments. The network representation of the freight system in GIS is invaluable for analysts to visualize critical segments, links or nodes, and makes possible large network analyses that would be computationally intensive on other platforms. Therefore, GIS was the tool selected to integrate data within the transportation community.
With the aid of a set of analytical and interactive tools, a GIS framework enables freight transportation analysts to understand the complex and dynamic spatial interactions among commodities (ranging from raw materials to semi-finished and finished products), shippers (farmers, mining companies, factories, and manufacturers), carriers (trucking, rail and marine companies, freight forwarders, third-party logistics providers), consignees, and end consumers.
network sub-systems. We then address some of the modelling techniques and computational tools used by practitioners to estimate current and future freight demand and flows on the network system. Specifically, this chapter focuses on geographic databases and analytical tools that have been developed by the CTA to understand U.S. goods movement. After reading this chapter, the reader will have a general understanding of the current freight transportation system and of several analytical tools that have been developed to estimate freight demand and evaluate system performance.

Understanding the movement of goods
To understand how the movement of goods is shaped, we need to define the main agents, or system components, affecting the demand for freight. The freight activity pattern that results from the interactions between these components can be captured through data surveys. This section introduces the freight transportation system and presents the framework for freight data developed by the CTA group.

Freight transportation system
The term freight primarily refers to the long-haul component of the supply chain. Freight shipments themselves can move by truck, rail, ocean or air and can be characterized as intercity, port to transport terminal, terminal to terminal, interplant, plant to distribution center, and distribution center to distribution center. The freight transportation system can be seen as a system composed of three main components: decision makers or users with a demand for goods movements (producers, consumers, and transportation companies); the physical system supply (transportation infrastructure); and commodities (all different types of goods to be produced and transported).
The demand for the movement of freight involves multiple decision makers: producers of goods and services, who decide how much and how to produce, and where and at what price to sell; consumers, either intermediate (production companies) or final (households, businesses, public agencies, etc.), who decide what to consume and how much; and transportation companies (shippers and carriers), who decide how to provide transportation services. These decision makers are responsible for production, logistics, distribution, and marketing. The location of producers and consumers certainly affects the spatial distribution of transportation supply, and therefore the pattern of goods movement.
Commodities are the result of production and represent the entity to be transported between geographic locations. The range of products needed and produced is vast and many final products (finished goods) can require a set of other products (semi-finished or raw materials) to be produced. Therefore, in assessing the freight demand, an analyst must be aware of these commodity matrices. Furthermore, a great variety of vehicle types are necessary to match different commodity types and shipment sizes. Depending on the characteristics (e.g. hazardous materials: toxics and flammable substances; high value; perishable) of a particular commodity, its means of transportation may require specific treatments, in terms of routing and vehicle specifications.
The physical system is an intermodal and multimodal system composed of transportation modes; sub-networks, where commodities move; and the interfaces between sub-networks, where commodities are transferred or handled between different modes. In a basic sense, freight transportation modes can be classified as truck, rail, water, air, multiple-modes, and pipeline. Each mode can include different types of fleet and equipment classifications from different transportation companies. The concept of mode in freight transportation is distinctly different from the case of passenger transportation. In freight transportation, mode encompasses physical (the sequence of transportation modes used for a consignment) and organizational (the sequence of entities responsible for transportation) aspects of movements.
In abstract terms, sub-networks are composed of subsets of the single-mode network systems, i.e. highway, railroad, waterway and airway, and each of these subsets represents a different reality, whether it is a type of service available, a company ownership, or a different type of vehicle (Southworth and Peterson, 2000). The interfaces connecting two or more sub-networks can be either transfer terminals (e.g. seaports and airports, where commodities are transferred between different modes or different vehicles) or intermodal points (e.g. rail yards, where commodities transported on the same single-mode system are handled by different companies).

Framework for geographic freight data and analysis
Transportation engineers and policy makers require data and analytical tools to fully understand the complex movement of goods on the freight transportation system. Although considerable mode- and shipper-specific data are collected for a variety of purposes, the sheer magnitude of the freight data system, as well as industrial confidentiality concerns, limits the freight data made available to the public. Typically, a considerable portion of the freight data is not disclosed by public agencies.
The Commodity Flow Survey (CFS) is probably the most comprehensive survey of freight activities in the U.S. The survey is conducted every five years as a part of the Economic Census by the U.S. Census Bureau, in partnership with the Bureau of Transportation Statistics (BTS) under the Research and Innovative Technology Administration (RITA) of the Department of Transportation (DOT). The last survey was conducted in 2007; previous survey years include 1993, 1997, and 2002. For 2007, approximately 102,000 establishments were selected from a universe of about 760,000 "in-scope" establishments.
The BTS consolidates the CFS sample, estimates the mileage and ton-miles of shipments, and publishes versions of the survey for public access. The publicly available CFS data are subject to confidentiality and reliability constraints. Reliability refers to the level of confidence in the estimates provided by the sampling method (e.g. estimates from a small sample with large variance are less reliable), whereas confidentiality corresponds to the restrictions imposed on the survey to avoid disclosure of particular industry activities. Consequently, many cells in the public database have no data. In addition, the data are aggregated to higher geographic levels. The most detailed data in this database are at the state level.
The CFS data are used as a base source for other freight data sources, either proprietary or in the public domain. TranSearch, one of the best-known commercial freight databases, for example, is compiled from the CFS data, supplemented by private freight data, updated annually, and available commercially through IHS Global Insight. The TranSearch database provides estimates at the county level.
The Freight Analysis Framework (FAF) is a public database sponsored by the Federal Highway Administration (FHWA), which is composed of U.S. domestic and international (import and export) freight flows. FAF integrates CFS data (in-scope to the CFS) with data from a variety of supplemental sources for industry sectors not in the CFS (out-of-scope to the CFS) to create a comprehensive picture of freight movement among states and major metropolitan areas by all modes of transportation. FAF also provides forecasts of freight flows up to 30 years into the future, as well as annual provisional updates for the current year and truck flow assignments for the base year and the outlying future year. FAF3 is the third database of its kind, with FAF1 providing similar freight data products for calendar year 1997, and FAF2 providing freight data products for calendar year 2002. The CTA was heavily involved in the development and execution of the methodology for FAF2 and FAF3. With data from the 2007 CFS and additional sources, CTA developed estimates of tonnage and value by commodity type, mode, origin, and destination for 2007, as well as ton-miles by mode.
The movements in FAF are characterized by freight volume (weight in thousand tons and dollar value), geographic dimension, transportation mode, and commodity type. In terms of the geographic dimension, FAF reports freight trade between 123 domestic zones and 8 external zones. The geographic level within the national boundaries is based on the CFS geographic strata, as shown in Figure 1, and includes 74 metropolitan areas, 33 remainders of states, and 16 regions identified as entire states. External zones, or foreign zones, are defined by 8 world regions: Canada, Mexico, and six other regions defined according to United Nations geographic regions. Hence a domestic flow is spatially characterized by its origin and destination zones; imports are reported by foreign origin, FAF domestic zone of entry, and FAF destination zone; and exports are reported by FAF domestic origin, FAF domestic zone of exit, and foreign destination.
In terms of commodity classification, FAF reports freight flows using the same 43 two-digit Standard Classification of Transported Goods (SCTG) classes as the CFS. For FAF3, the CTA added estimates of mileage and ton-miles for freight flows. These ton-mile estimates were derived using models of the network system (see Section 4.1) and freight demand models (see Section 4.2). The CTA team manages detailed geographic data on a large multimodal and intermodal freight network, including highways, railroads, waterways, airways, and pipelines, and their associated infrastructure (e.g., intermodal terminals, transfer points, seaports and airports). Section 4.1 describes the construction of the geographic representation of the highway-waterway-railway network systems. Among other analyses, this geographic tool makes it possible to obtain likely routes for freight movements between geographic zones. The corresponding routing distances in miles are then used as estimates of mileage for freight movements.
Because the geographic detail of the network representation is finer than the geographic level of the FAF database, freight movements must be disaggregated from a higher geographic level (state, metropolitan area, and remainder of state) to a lower one (county) in order to provide ton-mile estimates. This disaggregation of the FAF database is done using freight demand models, which relate freight activity to exogenous variables describing economic activity and network measurements (e.g. travel times, monetary costs, distances, etc.). A discussion of freight demand models is presented in Section 4.2, and some models used by CTA are presented in Section 4.3. Section 4.4 presents how ton-miles were estimated for the 2007 FAF.
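The ton-mile calculation described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that a zone-to-zone tonnage is split across county pairs in proportion to a single activity weight (such as employment) and that a routed distance is available for every county pair. The proportional-share rule, the function names, the county labels and all numbers are hypothetical; they are not the demand models or data used by CTA.

```python
def disaggregate_flow(tons, origin_weights, dest_weights):
    """Split a zone-to-zone tonnage across county pairs in proportion to
    the product of the origin and destination activity shares."""
    total_o = sum(origin_weights.values())
    total_d = sum(dest_weights.values())
    flows = {}
    for o, w_o in origin_weights.items():
        for d, w_d in dest_weights.items():
            flows[(o, d)] = tons * (w_o / total_o) * (w_d / total_d)
    return flows


def ton_miles(county_flows, distance):
    """Sum tons times routed miles over all county pairs."""
    return sum(t * distance[pair] for pair, t in county_flows.items())


# Hypothetical inputs: two counties in each zone, weights such as employment.
origin_weights = {"A1": 300.0, "A2": 100.0}
dest_weights = {"B1": 50.0, "B2": 150.0}
distance = {  # routed miles between county centroids (made up)
    ("A1", "B1"): 100.0, ("A1", "B2"): 120.0,
    ("A2", "B1"): 90.0, ("A2", "B2"): 110.0,
}

county_flows = disaggregate_flow(1000.0, origin_weights, dest_weights)
total_ton_miles = ton_miles(county_flows, distance)
print(round(total_ton_miles, 1))
```

The disaggregated tonnages always sum back to the zone-level total, so the choice of weights changes only how the ton-miles are distributed over routes of different length.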

CTA's intermodal and multimodal network
The CTA team at ORNL maintains a computerized representation of the national intermodal and multimodal network system. This national network system was created by combining earlier digitized representations of the three single-mode network systems: highway, railway, and waterway (Southworth and Peterson, 2000). The following digital databases were used to construct the network:
The abstract representation of a network system is composed of a set of links and nodes, where links represent events (goods movements) and nodes represent connections as well as starting or ending points for events. A "line haul" link is defined as a unidirectional link with positive length formed by two endpoints. Another important abstract concept is the route, which is defined as a sequence of directed, connected links. The geographic scope of the analysis defines the level of network detail needed. At the national level, only main physical links and routes (e.g. interstates, arterials, railroad mainlines and branch lines, oceans and rivers, etc.) are included in the network representation.
The CTA network system was proposed with the main intention of estimating the routes, and therefore the mileages, for the domestic and export shipments reported in the 1997 CFS. Therefore, in an effort to simulate all activities reported by the CFS, physical links are represented by logical links, which in turn simulate different realities in each single-mode network. In the railway system, the same physical link is represented by several logical links to simulate not only the railroad owner but all the railroad companies that have trackage rights over the link. Similarly, in the highway system, logical links were included to represent both for-hire and private services. The waterway system was separated into three different sub-networks, each one representing a different type of vessel and/or movement: inland and inter-coastal (largely barge traffic), Great Lakes, and trans-oceanic or "deep sea". This logical separation of the single-mode systems was important to model transfer costs between trucking services, railroad companies, and vessels.
Special links were designed to simulate the interfaces between the logical sub-networks described above. These logical links are named "terminal links" and "interline links". The former simulate unloading/loading operations within terminals, where goods are handled between different vehicle types. The latter model the locations where goods movements are switched between two railroad companies. It is worth noting that terminals are represented by nodes in the network, and their corresponding transfer links are zero-length links connecting two logical endpoints at the same location. Interline links are also zero-length links connected at the same physical node.
Points of origination and termination of freight movements are represented by node centroids. This representation simplifies the network model, given that it is infeasible to include all the actual locations where freight movements originate and terminate. In the current static CTA network these nodes represent the county centroids.
The connection between generator points (centroids and terminals) and the network system is made by specific "access/egress links". Such links should therefore represent all movements connecting the real freight generators of a geographic area to the network system. For the CTA network, a computational routine has been deployed to create these links "on the fly", that is, every time there is a request to route a movement from an origin centroid to a destination centroid. Figure 4 illustrates how a shipment with the mode sequence truck-rail-truck is routed over the CTA network system. In simple terms, a route is generated using a "shortest-path" routing algorithm that executes the following sequence of searches on the network: initiate the route by accessing the highway network (create access links), search for a connection via a truck-rail terminal to the rail sub-network, and return to the highway network via a second multimodal terminal transfer.
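The truck-rail-truck search described above can be sketched as a shortest-path problem over logical nodes. The sketch below is an illustration under stated assumptions, not the CTA routine: it uses Dijkstra's algorithm over states (node, position in the mode sequence), treats a link whose two ends belong to different modes as a terminal link, and uses made-up node names and impedances.

```python
import heapq


def route(links, origin, destination, mode_sequence):
    """Least-impedance route that follows a prescribed sequence of modes.

    links: iterable of (from_node, to_node, mode_from, mode_to, impedance).
    A line-haul or access link has mode_from == mode_to; a terminal link
    has different modes at its two ends and advances the mode sequence.
    """
    adjacency = {}
    for u, v, mode_from, mode_to, impedance in links:
        adjacency.setdefault(u, []).append((v, mode_from, mode_to, impedance))

    last_stage = len(mode_sequence) - 1
    best = {(origin, 0): 0.0}
    previous = {}
    queue = [(0.0, origin, 0)]
    while queue:
        cost, node, stage = heapq.heappop(queue)
        if cost > best.get((node, stage), float("inf")):
            continue  # stale queue entry
        if node == destination and stage == last_stage:
            path, state = [node], (node, stage)
            while state in previous:
                state = previous[state]
                path.append(state[0])
            return cost, path[::-1]
        for nxt, mode_from, mode_to, impedance in adjacency.get(node, []):
            if mode_from != mode_sequence[stage]:
                continue  # link belongs to a sub-network not in use now
            if mode_to == mode_from:
                new_stage = stage
            elif stage < last_stage and mode_sequence[stage + 1] == mode_to:
                new_stage = stage + 1  # transfer through a terminal link
            else:
                continue
            new_cost = cost + impedance
            if new_cost < best.get((nxt, new_stage), float("inf")):
                best[(nxt, new_stage)] = new_cost
                previous[(nxt, new_stage)] = (node, stage)
                heapq.heappush(queue, (new_cost, nxt, new_stage))
    return None  # no route honours the requested mode sequence


# Hypothetical network: O and D are centroids, T1 and T2 are terminals,
# each split into one logical node per sub-network.
links = [
    ("O", "T1.hwy", "truck", "truck", 10.0),      # highway access
    ("T1.hwy", "T1.rail", "truck", "rail", 5.0),  # terminal link
    ("T1.rail", "T2.rail", "rail", "rail", 20.0), # rail line haul
    ("T2.rail", "T2.hwy", "rail", "truck", 5.0),  # terminal link
    ("T2.hwy", "D", "truck", "truck", 8.0),       # highway egress
    ("O", "D", "truck", "truck", 30.0),           # all-highway alternative
]
print(route(links, "O", "D", ["truck", "rail", "truck"]))
```

Carrying the position in the mode sequence as part of the search state is one simple way to force the route through the terminals; in this toy network the direct highway link has a lower impedance but is rejected because it never uses rail.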
Although each individual layer (sub-network) of the network system can be stored and maintained in a commercial GIS, the combined multimodal network poses impediments to its use in GIS. These impediments are related to the overlapping of geographic features (logical links of the same physical link) and to the representation of many geographic features (i.e., terminal links and interlines) over a single degenerate point. Many standard GIS software packages do not accept zero-length links and require that links meeting at the same location be connected. Therefore, a special version of the CTA network has been prepared for use in standard GIS. In this version, the locations of logical nodes that fall at the same geographic location, along with the end vertices of the polylines incident to them, are slightly perturbed, which prevents spurious link connections and gives the zero-length links slightly positive lengths. A view of the CTA network in GIS is shown in Figure 5.
A route selection routine was developed to determine likely routes over the network for shipments with a known sequence of modes. This required the development of impedance functions to represent the generalized cost of different en-route activities over network facilities (line hauls, terminal links, interlines, access and egress links). This process started with a set of what Southworth and Peterson (2000) termed "native link impedance functions". Routes over the highway network are determined according to link operational speed; highway impedances are therefore surrogates for link travel time. In the railway network, impedances are assigned on the basis of the rail line class, i.e. main lines (long-haul, high-capacity lines, thus with lower impedance to movement) and branch lines (short-haul, lower-capacity lines, and therefore with higher impedances). Waterway links received identical native impedances because there is rarely more than one choice of route between any pair of geographic zones.
Native impedance values have been allocated to terminal links and interline links in an attempt to simulate the transfer costs of unloading/loading operations in terminals and the cost of switching between railroad companies at interlines, respectively. Highway and railway access/egress links received an impedance value of 5 times their link length. It is worth noting that the lengths of such links were increased by a circuity factor of 20%. Lengths for water access/egress links were set to zero by assuming that most of the originators of water movements are near the dock locations alongside the waterway network. However, the real link lengths were used to calculate the impedance of water access links so that the waterway network could be accessed at the points closest to the centroid locations.
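As a small worked illustration of the access/egress rules above, the sketch below computes a reported length and an impedance for one link. Two points are assumptions on our part, because the text does not settle them: that the factor of 5 is applied to the length after the 20% increase, and that water access links use the same factor of 5 on their real length. The function and constant names are hypothetical.

```python
CIRCUITY_FACTOR = 1.20   # 20% increase applied to access/egress lengths
ACCESS_MULTIPLIER = 5.0  # impedance per unit of access/egress length


def access_link(mode, real_miles):
    """Return (reported_length, impedance) for an access/egress link."""
    if mode in ("highway", "rail"):
        length = real_miles * CIRCUITY_FACTOR
        # Assumption: the multiplier applies to the inflated length.
        return length, ACCESS_MULTIPLIER * length
    if mode == "water":
        # Length is reported as zero, but the real length still drives the
        # impedance so that the nearest waterway access point is preferred.
        # Assumption: the same multiplier is used as for highway and rail.
        return 0.0, ACCESS_MULTIPLIER * real_miles
    raise ValueError(f"unknown mode: {mode}")


print(access_link("highway", 10.0))
print(access_link("water", 10.0))
```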
Given the native impedances, the next step was to determine the relative costs of transport between different modes. Although transportation costs are in reality affected by many factors (related to both cargo and mode characteristics), if we assume that the sequence of modes is known for a given shipment, the routing problem becomes one of selecting the most likely transfer points between modes. To this end, the routing algorithm also assumes that less expensive modes are preferentially used for as large a proportion of the trip as practicable, relegating more expensive modes to an access role. Therefore, the native impedances defined above were normalized to reflect differences in transportation costs between modes for multimodal movements. This normalization was such that if water was used, it dominated the route miles. Otherwise rail dominated, with highway usually used for terminal access and/or egress.

Freight demand models
As opposed to demand modelling for passenger transportation, there is not a universal paradigm to model freight demand, only individual examples. However, some of the techniques developed specifically to model inter-city and urban travel demand have been also used to model freight demand. Some of these approaches are presented in this section, most of which are based on the book written by Cascetta (2009).
The objective of estimating freight demand models is to represent the production and distribution of goods, either for intermediate use or final consumption, for a given time period. With the aid of freight databases and economic variables related to the production and consumption of goods, a system of freight demand models can be formally expressed as

$$d_{od}[p, c, m, k] = d(A, T; \beta) \qquad (1)$$

where $d_{od}$ represents a demand flow (usually expressed in tons) between an origin zone o and a destination zone d. The characteristics p, c, m and k are associated with sectors of economic activity, commodity types, transportation modes and routes, respectively. The A variables reflect the economics of production and consumption. The T variables are related to attributes of the different transportation modes and services (times, costs, service reliability, etc.). The vector β denotes the model coefficients.
Models can be classified according to their modelling assumptions or to their level of data aggregation (see Cascetta, 2009). Based on the model assumptions, models are descriptive if they merely describe empirical relationships between the exogenous variables (explanatory variables related to the economic system) and the endogenous variables (response variables related to freight demand), or behavioural if they explain the behaviour of decision makers in choosing among the universe of choices involved in the production and distribution of goods. According to the unit of the variables available for model calibration or estimation, models can also be aggregate or disaggregate. Aggregate models use averages of variables for aggregate units (e.g. all companies of a given industry sector) at the geographic zone level, whereas disaggregate models use variables for small units within the geographic zone, such as individual companies or individual shipments.
In this section, we discuss national freight models (aggregate and disaggregate) for predicting annual domestic and foreign trade in the U.S. In the following discussion, the sector p represents the producer sector (or originating economic sector), which is consistent with the CFS database. In addition, in most of the following discussion the subscript p is omitted from the equations to simplify the notation. The allocation of freight flows on the transportation system (i.e., the interaction between demand and transportation supply) is not represented in the models discussed here.
The model in equation (1) represents the freight demand resulting from choices made by the decision makers of a given economic sector with respect to production and to the spatial and modal distribution of freight demand. Such decisions are interrelated. However, when the system of models of equation (1) represents the complete universe of choices that decision makers face, a single model may not be feasible either computationally or analytically. To simplify the analytical representation, the single model of equation (1) is usually decomposed into a sequence of sub-models, each representing a step within a sequential decision process. Partial share models can be estimated following the traditional paradigm of transportation demand modelling (the four-stage model), which consists of separately estimating generation, distribution, mode choice, and route choice models (see Ortuzar and Willumsen, 2011).   is a constant parameter that should be calibrated to balance out quantities of production and attraction between pairs of geographic zones.

Equations (2)-(3) below present ways of decomposing the global model into sub-models:

$$d_{od}[p, c, m] = d_{o\cdot}[pc](A) \; p[d/pco](A, T) \; p[m/pcod](A, T)$$
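A short numerical sketch may help to read the decomposition above: the flow to each destination by each mode is the tonnage generated at the origin, multiplied by the share choosing that destination, multiplied by the share choosing that mode for that origin-destination pair. The function name and all numbers below are hypothetical and serve only to show the arithmetic.

```python
def od_mode_flows(production_tons, dest_share, mode_share):
    """Apply d_od[m] = d_o. * p[d/o] * p[m/od] for a single origin.

    dest_share: {destination: share}, shares summing to 1.
    mode_share: {destination: {mode: share}}, shares summing to 1 per
    destination.
    """
    flows = {}
    for d, p_d in dest_share.items():
        for m, p_m in mode_share[d].items():
            flows[(d, m)] = production_tons * p_d * p_m
    return flows


# Hypothetical example: 1000 tons produced at the origin zone.
dest_share = {"d1": 0.6, "d2": 0.4}
mode_share = {
    "d1": {"truck": 0.7, "rail": 0.3},
    "d2": {"truck": 0.5, "rail": 0.5},
}
flows = od_mode_flows(1000.0, dest_share, mode_share)
for key, tons in sorted(flows.items()):
    print(key, round(tons, 1))
```

Because each set of shares sums to one, the flows add back up to the generated tonnage, which is what allows the sub-models to be estimated separately.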
In the following sections, we briefly describe each of the sub-models in equations (3)-(4) and illustrate how some of them can be formulated as functions of aggregate variables provided by the CFS regional database, variables related to the transportation system from the CTA network, and variables related to a given economic pattern provided by the U.S. Economic Census. Below we describe aggregate descriptive models for generation, and descriptive and behavioural models for distribution and mode choice. A descriptive model for the joint generation and distribution of freight demand is also presented.
Generation models, $d_{o\cdot}[pc](A)$ and $d_{\cdot d}[pc](A)$, describe how much is produced (supply) and how much is needed (demand) for a given pattern of economic activity in a geographic area. Descriptive generation models represent the empirical relationship between the freight demand (produced or attracted) of a geographic area and a given pattern of economic activity. Such models can be applied to short-term analysis in which the pattern of economic activity is given. In this case we try to estimate the amount of goods (in tons) generated as a result of decisions about what and how much to produce. The freight demand resulting from long-term decisions, such as where to produce, is not considered in this simplified descriptive approach. Modelling long-term decisions should incorporate behavioural aspects, which may require disaggregate data.
Equation (5) shows an example of a linear model for the production of goods, where the units of aggregation are economic sectors within regions. The parameters of these models are usually obtained with statistical estimation methods such as multiple regression analysis.

$$d_{o\cdot}[p] = \sum_k \beta_k X_{kpo} \qquad (5)$$

where $X_{kpo}$ are exogenous variables related to the economic activity of sector p in zone o, and $\beta_k$ are the model coefficients to be calibrated.
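To make the calibration step concrete, the sketch below fits the coefficients of a linear production model by ordinary least squares. It is only an illustration: the model has no intercept, the normal equations are solved with a small hand-written elimination routine, and the two explanatory variables (employees and payroll) and every number are made up so that the true coefficients are known.

```python
def fit_linear(X, y):
    """Least-squares coefficients b for y ~ X b (no intercept).

    Solves the normal equations (X'X) b = X'y by Gauss-Jordan elimination
    with partial pivoting. X is a list of rows, y a list of observations.
    """
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Hypothetical zone-level observations for one sector:
# columns are employees (thousands) and payroll (million dollars).
X = [[10.0, 40.0], [20.0, 50.0], [15.0, 90.0], [30.0, 60.0]]
# Tons produced (thousands), generated here as 2.0 * employees + 0.5 * payroll.
y = [40.0, 65.0, 75.0, 90.0]

beta = fit_linear(X, y)
predicted = sum(b * x for b, x in zip(beta, [25.0, 70.0]))
print([round(b, 3) for b in beta], round(predicted, 1))
```

In practice a statistical package would be used, since it also reports standard errors and goodness of fit, which are needed to judge whether a variable belongs in the model.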
Attraction models are not straightforward to estimate, since they should represent the total amount of goods attracted to a zone that are produced by a given sector, by companies located in different zones, and used by multiple sectors in that zone. Therefore, one way to represent an aggregate attraction model is:
where W is a matrix containing coefficient factors that represent the industry-to-industry trade of goods. These are binary values that are equal to one whenever the sector associated with a row can provide goods to the user associated with a column, and equal to zero otherwise. These assignment coefficients can be obtained from regional or national input-output accounts. X are variables related to the economic activity of those industry sectors in the destination zone that use commodities produced by sector p, as well as demographic variables representing the final consumption of goods.
Examples of variables related to economic activity are the number of employees or total payroll by industry sector in each geographic zone. Demographic variables are represented by population or number of households within each geographic zone.
Distribution models estimate the trade flow between geographic zones (or regions) of the study area. The product of such models is an origin-destination matrix of freight flows that satisfies the "trip-end" production and attraction constraints. is the total freight demand attracted to d that is produced by sector p. The name "gravity" comes from the model's resemblance to Newton's law of gravity. The impedance function f(Cod) is a monotonically decreasing function of the generalized transportation cost Cod. One typical expression for this function, which can be derived from the entropy maximization problem (see Wilson, 1967), is:
In the context of passenger transportation, the formulation of equation (7) is obtained by finding the most likely distribution pattern, which corresponds to maximizing an entropy function (see Subsection 4.3.2) subject to macro constraints on the total number of trips produced and/or attracted as well as on the overall transportation cost. The entropy function is a measure of the number of possible arrangements of individuals that give rise to a certain distribution pattern. To extend this model to the case of freight, we would have to assume that each trip represents a unit of freight (tons) being transported from an origin to a destination point.
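The sketch below illustrates a doubly constrained gravity model of the kind discussed above. It rests on assumptions of ours rather than on the chapter's own formulas: the impedance is taken to be the negative exponential exp(-beta * C_od) commonly associated with entropy maximization, the balancing factors are found by simple iterative proportional fitting, and the zone names, tonnages, costs and the value of beta are invented.

```python
import math


def gravity_model(productions, attractions, cost, beta, iterations=100):
    """Doubly constrained gravity model with f(C) = exp(-beta * C).

    productions, attractions: {zone: tons}, with equal grand totals.
    cost: {origin: {destination: generalized cost}}.
    Returns {(origin, destination): tons}.
    """
    f = {
        (o, d): math.exp(-beta * cost[o][d])
        for o in productions
        for d in attractions
    }
    a = {o: 1.0 for o in productions}  # origin balancing factors
    b = {d: 1.0 for d in attractions}  # destination balancing factors
    for _ in range(iterations):
        for o in productions:
            a[o] = productions[o] / sum(b[d] * f[(o, d)] for d in attractions)
        for d in attractions:
            b[d] = attractions[d] / sum(a[o] * f[(o, d)] for o in productions)
    return {(o, d): a[o] * b[d] * f[(o, d)] for (o, d) in f}


# Hypothetical two-zone example.
productions = {"A": 100.0, "B": 200.0}
attractions = {"A": 150.0, "B": 150.0}
cost = {"A": {"A": 1.0, "B": 5.0}, "B": {"A": 5.0, "B": 1.0}}

od_matrix = gravity_model(productions, attractions, cost, beta=0.3)
for pair, tons in sorted(od_matrix.items()):
    print(pair, round(tons, 1))
```

Raising beta makes flows more sensitive to cost and concentrates them on the cheaper origin-destination pairs, while the row and column totals stay fixed by the balancing factors.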

The behavioural model, p[d/po](A,T)
The behavioural model estimates the probability of choosing a destination d for a given industry sector p and origin of shipment o. Therefore, this model combines the distribution and attraction models into a single formulation. Assuming that the set of alternatives is formed by all zones in the study area, a formulation for this model can be estimated on the basis of random utility theory. Under this paradigm, the most common model form is the Multinomial Logit (MNL) model, as expressed below:

p[d/po](A,T) = exp(Vd /θd) / ∑i exp(Vi /θi),
where Vd is the expected utility, or systematic utility, of choosing destination d for a given industry sector p and origin zone o; θd represents the Gumbel distribution parameter of the perceived utility Ud, such that Ud = Vd + εd, where the εd values are the random residuals, that is, deviations from the mean value Vd, which are assumed to be independently and identically distributed Gumbel random variables with zero mean and scale parameter θd.
The systematic utility is formulated as a function of attributes related to the characteristics of the alternatives and of the decision makers. Equation (9) shows a typical linear specification.
The attributes of the systematic utility are grouped into attributes of the activity system in zone d, or attractiveness attributes, and attributes that quantify the accessibility or cost of travel between zones o and d. Attractiveness attributes are variables that measure the attractiveness of a zone as a destination. As mentioned before, they might be a function of the number of employees or the total payroll of a given consumer industry, or client industry. Demographic variables such as population are also measures of attractiveness. Cost or accessibility attributes are variables reflecting the generalized cost of moving goods between o and d. They can be the straight-line distance connecting the centroids of zones o and d, or generalized cost variables that take into account the contributions of each of the modes available between zones o and d. By explicitly showing the measures of attractiveness and transportation cost, equation (9) becomes where Ahd are the measures of attractiveness in zone d.
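As a minimal sketch of the destination choice model described above, the Python snippet below evaluates MNL probabilities from a linear systematic utility with one attractiveness attribute and one cost attribute. The function name `destination_probabilities`, the coefficient values, and the zone data are hypothetical and serve only as an illustration; a single scale parameter θ common to all alternatives is assumed.

```python
import math

def destination_probabilities(attractiveness, cost, beta_a, beta_c, theta=1.0):
    """MNL probability of choosing each destination zone d.

    Assumes a linear systematic utility V_d = beta_a * A_d + beta_c * C_od
    and one common Gumbel scale parameter theta (illustrative assumptions).
    """
    utilities = [beta_a * a + beta_c * c for a, c in zip(attractiveness, cost)]
    v_max = max(utilities)  # subtract the maximum to avoid overflow in exp()
    weights = [math.exp((v - v_max) / theta) for v in utilities]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical data for three candidate destination zones.
payroll = [5.0, 2.0, 8.0]    # attractiveness A_d (arbitrary units)
distance = [1.0, 0.5, 3.0]   # generalized cost C_od (arbitrary units)
probs = destination_probabilities(payroll, distance, beta_a=0.3, beta_c=-0.8)
print([round(p, 3) for p in probs])
```

With a negative cost coefficient, a zone that is attractive but distant can still receive a smaller share than a closer, less attractive zone, which is the trade-off the utility specification is meant to capture.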

Mode choice model, p[m/pod](A,T)
The mode choice model predicts the fraction, or choice probability, of decision makers in a given sector p who select mode or service m to ship goods from zone o to zone d. The first random utility models were formulated to analyze transportation mode choice. The MNL formulation can then be applied to predict these mode choice probabilities.
The definition of the mode choice alternatives constitutes the first step in the modelling process. In the context of freight transportation, the alternatives of a mode choice model are the individual transportation modes (truck, rail, water, air, pipeline, etc.) and combinations of single modes (truck and rail, truck and water, truck and air), as described in the CFS database. Different services provided by carriers (related to delivery time, security, levels of priority, type and capacity of vehicles, suitability of vehicles for certain types of commodities, etc.) can also be used to represent elementary transportation alternatives.
The next step is the definition of the choice set, which is the set of mutually exclusive alternatives, or groups of alternatives, available to a given decision maker. Some alternatives may be unavailable to a given decision maker and should therefore be excluded from the choice set. It may also happen that the analyst does not have exact knowledge of the alternatives available. Special treatments for handling mode availability have been proposed in the literature (see Cascetta, 2009). By assuming that all decision makers face the same set of alternatives, the MNL formulation for the choice probability is

p[m/pod](A,T) = exp(Vm /θm) / ∑i exp(Vi /θi).
An alternative to deal with mode availability would be to force the systematic utility to minus infinity, whenever the mode is not available, which would result in a choice probability of zero.
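The availability device described above can be sketched in a few lines of Python. The function name `mode_probabilities` and the utility values are hypothetical, and a single scale parameter θ is assumed; the point is only that a systematic utility of minus infinity yields a choice probability of exactly zero while leaving the relative shares of the remaining modes unchanged.

```python
import math

UNAVAILABLE = float("-inf")  # systematic utility assigned to an unavailable mode

def mode_probabilities(utilities, theta=1.0):
    """MNL mode choice probabilities from a dict {mode: V_m}."""
    v_max = max(utilities.values())
    if v_max == UNAVAILABLE:
        raise ValueError("at least one mode must be available")
    # math.exp(-inf) evaluates to 0.0, so unavailable modes get zero weight.
    weights = {m: math.exp((v - v_max) / theta) for m, v in utilities.items()}
    total = sum(weights.values())
    return {m: w / total for m, w in weights.items()}

# Hypothetical utilities; water is not available for this origin-destination pair.
shares = mode_probabilities({"truck": -1.0, "rail": -1.5, "water": UNAVAILABLE})
print(shares)
```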

The joint generation and distribution model, dod[p](A,T)
The joint generation and distribution model predicts the actual demand from a zone o to a zone d for a given pattern of economic activity and transportation system. This model combines the generation and distribution models into a single functional form. In some sense, such models are analogous to the gravity model presented before, with the number of trips (or zone mass) produced by or attracted to a zone replaced by exogenous measures of production and attractiveness of that zone. The descriptive formulation presented below is the traditional gravity model formulation in economics (Brocker et al., 2011): where the coefficient indexed by p is a multiplier representing the overall scale of industry p; Ao and Ad are the measures of production and attractiveness of zones o and d, respectively (the Gross Domestic Product (GDP) can be used as an indicator for Ao and Ad); fpod is a distance decay function representing the trade-impeding effect of transport cost and other barriers; and ξp and ζp are the elasticities to be estimated.

CTA's freight models
The CTA group developed descriptive freight generation models (production and attraction models) to predict freight demand by U.S. state resulting from domestic interstate commerce as a function of economic activity (Oliveira Neto et al., 2012). Fundamentally, industry activity, expressed as capital and labour by industry sector, should be the main basis for explaining the economic activity within a geographic area. In addition, the CTA team uses the structure of the national freight database, together with GIS and network analysis tools, to estimate descriptive models for freight distribution. Specifically, the process of applying the gravity model to a given origin-destination matrix of freight demand and a given transportation supply system is presented here.

Generation models
This section presents the estimation of a static freight demand model for domestic trade in the U.S. for the year 2007. Two major data sources were used in this effort: 2007 CFS tabulations and the 2007 County Business Patterns (CBP). Specific CFS tabulations, requested as supplementary information for the FAF estimation process, were used to obtain sample estimates of freight productions and attractions in weight units. These tables contain information on domestic movements of goods between U.S. states for 28 industry sectors in 2007, classified by the 3-digit North American Industry Classification System (NAICS), which are responsible for generating freight movements. The list of these NAICS codes can be found in the CFS documentation referenced above and on the BTS website. It is worth noting that in this special tabulation only a small number of cells were suppressed, due to likely small sample sizes and their resulting large sample variances. These cells were treated as missing information during the modelling process.
The CBP is an annual series that provides sub-national economic data by industry. The series is useful for studying the economic activity of small areas and for analyzing economic changes over time. In addition, it can be used as a benchmark for statistical series, surveys, or other databases between economic censuses. The survey covers most U.S. economic activity, except self-employed individuals, employees of private households, railroad employees, agricultural production employees, and most government employees. The database provides information on payroll, number of establishments, and number of employees for industries classified, since 1998, according to NAICS. The variable chosen to represent the economic activity of a given geographic area was the annual payroll by industry sector.
In the modelling process, it was assumed that the origin zones (U.S. states) are producers, where the products or commodities (raw materials or final products) are obtained or produced. In contrast, the destination zones are considered the "users", where the products or commodities are used, assembled, or modified by intermediate industry sectors. Both production and attraction models by industry sector were specified as a power model with a single explanatory variable by the following equation: where ypi denotes the response variable (shipment tons by industry sector p and state i); xpi denotes the exogenous variable (annual payroll by industry sector p and state i); and the two coefficients of the power function, one of which is denoted β, are the model parameters to be estimated.
Production and attraction equations have been estimated for each of the 28 industry sectors. With respect to the production equations, the response variables in the CFS represent the total demand produced by a given industry sector and state. In this case the exogenous variable was the corresponding payroll by industry and state. As for the attraction equations, the response variable represents the total demand attracted by a state that was originally produced by one of the industry sectors. Therefore, the total demand attracted is not separated by the industries that use the goods; rather, it is classified by the industry sectors responsible for production. To be consistent with the CFS data, the single exogenous, or explanatory, variable in the attraction equations is composed of the payroll by state of those industries that use the commodities produced by the originating sectors. Figure 6 shows a scatter plot of freight produced versus total payroll for industry sector 311 - Food Manufacturing. The data points are sample observations for the 50 U.S. states and the District of Columbia. The fitted curve and the estimated production equation are also presented in the chart.
It is worth noting that Oliveira Neto et al. (2012) compared the structure of the 2007 models with models estimated from similar 2002 CFS tabulations. The results of their empirical analysis indicated some structural change in the production and attraction models between 2002 and 2007, possibly due to an increase in productivity and significant economic growth over this short period. In addition, even when no significant change in model structure was detected, the modelling process did not produce reasonable predictions of freight for a future year. After applying the 2002 models to predict freight volumes in 2007, it was found that other factors besides payroll may need to be included in the modelling process.
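To make the single-variable power specification concrete, the sketch below fits a model of the form y = a * x**beta by ordinary least squares on log-transformed data. This estimation route is an assumption made for illustration, since the text does not state which estimator was used; the names `fit_power_model`, `a`, and `beta` are hypothetical, and the data are synthetic rather than CFS or CBP values.

```python
import math

def fit_power_model(x, y):
    """Estimate (a, beta) in y = a * x**beta by least squares on logarithms."""
    log_x = [math.log(v) for v in x]
    log_y = [math.log(v) for v in y]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    s_xx = sum((u - mean_x) ** 2 for u in log_x)
    s_xy = sum((u - mean_x) * (w - mean_y) for u, w in zip(log_x, log_y))
    beta = s_xy / s_xx                     # slope of the log-log regression
    a = math.exp(mean_y - beta * mean_x)   # back-transformed intercept
    return a, beta

# Synthetic, noise-free data generated from y = 2.5 * x**0.8.
payroll = [10.0, 50.0, 200.0, 1000.0, 5000.0]
tons = [2.5 * v ** 0.8 for v in payroll]
a_hat, beta_hat = fit_power_model(payroll, tons)
print(round(a_hat, 4), round(beta_hat, 4))
```

Suppressed cells, which the text treats as missing information, would simply be left out of `x` and `y` before fitting.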

Distribution models
In this section we illustrate how a gravity model can be estimated employing the same 2007 CFS tabulation used earlier for estimating the freight generation models. As seen in Section 4.2, in transportation planning the gravity model is used to balance an origin-destination trip matrix so that the zone-to-zone flows are consistent with the total trips generated at each origin zone and the total demand terminating at each destination zone. The model form that will be estimated for this exercise is
Tpij = αij Ppi Apj exp(-βp cij), (14)
where Tpij is the annual freight flow in thousand tons to be estimated; Ppi is the total amount of freight (thousand tons) produced in state i by sector p; Apj is the total amount of freight (thousand tons) attracted to state j from the originating industry p; cij is a measure of the distance that separates states i and j; βp is the parameter that indicates how important the distance variable is in explaining the trade between zones i and j for a given producer p; and αij are adjustment factors that should be calibrated to balance the freight flows by ensuring that ∑j Tpij = Ppi for all i and ∑i Tpij = Apj for all j.
The formulation in Equation (14) is known as the classical gravity model. As demonstrated by Wilson (1967), it is derived by applying Lagrange multipliers to the optimization problem whose objective is given by Equation (15), subject to the constraints
∑j Tpij = Ppi, for all i, (16)
∑i Tpij = Apj, for all j, (17)
∑i ∑j Tpij cij = Cp, (18)
where the objective function, Equation (15), is a monotonic function often referred to as the entropy function; Equations (16) and (17) are constraints representing our knowledge about the total productions and attractions per zone; and Equation (18) is a constraint corresponding to our knowledge about the total expense, denoted by Cp, of using the network system by industry sector p. If all cij are measured in miles, Cp is therefore a measure of the total annual ton-miles loaded on the network system for a given industry sector.
It is important to mention that if the parameter βp in Equation (14) is known, the values exp(-βp cij) become reference factors, that is, the elements of an a priori trip matrix, and the model reduces to what the literature calls the ordinary gravity model. In this case the adjustment factors αij can be estimated by a bi-proportional matrix balancing method, also known as "iterative proportional fitting". This balancing method was apparently first described by Kruithof (1937), who used it to predict telephone traffic distribution (see Lamond and Stewart, 1981). In transportation planning this method is referred to as the "Fratar Method" in the U.S. or the "Furness Method" (see Furness, 1965) elsewhere.
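The bi-proportional balancing step can be sketched as follows for the case in which the impedance parameter is taken as known. This is a generic illustration of iterative proportional fitting, not the CTA implementation: the function name `furness`, the tolerance, and the numerical inputs are hypothetical. In this sketch the row and column factors a[i] and b[j] absorb the production and attraction totals, so their product plays the role of the adjustment applied to each seed cell.

```python
import math

def furness(seed, productions, attractions, tol=1e-9, max_iter=1000):
    """Bi-proportional balancing of a positive seed matrix.

    Returns flows T[i][j] = a[i] * b[j] * seed[i][j] whose row sums match
    `productions` and whose column sums match `attractions`. Assumes both
    sets of totals add up to the same grand total.
    """
    n_o, n_d = len(productions), len(attractions)
    b = [1.0] * n_d
    for _ in range(max_iter):
        a = [productions[i] / sum(b[j] * seed[i][j] for j in range(n_d))
             for i in range(n_o)]
        b = [attractions[j] / sum(a[i] * seed[i][j] for i in range(n_o))
             for j in range(n_d)]
        flows = [[a[i] * b[j] * seed[i][j] for j in range(n_d)]
                 for i in range(n_o)]
        # Columns are exact after the b update, so only rows need checking.
        if max(abs(sum(flows[i]) - productions[i]) for i in range(n_o)) < tol:
            break
    return flows

# Hypothetical example: two origin states and two destination states.
beta_p = 0.1                            # impedance parameter, assumed known
cost = [[5.0, 20.0], [20.0, 5.0]]       # distances c_ij
seed = [[math.exp(-beta_p * c) for c in row] for row in cost]
flows = furness(seed, productions=[100.0, 200.0], attractions=[150.0, 150.0])
print([[round(t, 2) for t in row] for row in flows])
```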
Since we do not have complete information about the total expenditure Cp, βp cannot be estimated by directly solving problem (15)-(18). If Cp were known, it can be shown that a unique solution of the problem could be found by solving a system of linear equations in the flows Tpij and the Lagrange multipliers, one for each constraint. Approximation methods have been proposed for estimating βp when the overall system expenditure is unknown. The method devised by Hyman (1969) was used to obtain estimates of the parameters βp for each industry sector listed in Table 2. Hyman's method is an iterative procedure based on successive applications of matrix balancing techniques for a sequence of estimates of βp that are readjusted at each iteration (see Ortúzar and Willumsen, 2011).
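A compact sketch of an iterative calibration in the spirit of Hyman's method is given below. It assumes that the target is the mean distance computed from an observed flow matrix, and it follows the textbook update described by Ortúzar and Willumsen (a first estimate of one over the observed mean cost, followed by secant-type corrections); the actual CTA implementation may differ. All function names and data are hypothetical, and the "observed" matrix is synthetic.

```python
import math

def balance(seed, prods, attrs, tol=1e-10, max_iter=2000):
    """Furness balancing of a positive seed matrix to the given totals."""
    n_o, n_d = len(prods), len(attrs)
    b = [1.0] * n_d
    for _ in range(max_iter):
        a = [prods[i] / sum(b[j] * seed[i][j] for j in range(n_d))
             for i in range(n_o)]
        b = [attrs[j] / sum(a[i] * seed[i][j] for i in range(n_o))
             for j in range(n_d)]
        flows = [[a[i] * b[j] * seed[i][j] for j in range(n_d)]
                 for i in range(n_o)]
        if max(abs(sum(flows[i]) - prods[i]) for i in range(n_o)) < tol:
            break
    return flows

def mean_cost(flows, cost):
    """Flow-weighted mean cost of a matrix."""
    total = sum(map(sum, flows))
    return sum(t * c for fr, cr in zip(flows, cost)
               for t, c in zip(fr, cr)) / total

def modelled_mean_cost(beta, cost, prods, attrs):
    seed = [[math.exp(-beta * c) for c in row] for row in cost]
    return mean_cost(balance(seed, prods, attrs), cost)

def hyman(cost, prods, attrs, observed_mean, tol=1e-8, max_iter=50):
    """Search for beta such that the modelled mean cost matches the observed one."""
    beta_prev = 1.0 / observed_mean
    c_prev = modelled_mean_cost(beta_prev, cost, prods, attrs)
    beta = beta_prev * c_prev / observed_mean
    for _ in range(max_iter):
        c_now = modelled_mean_cost(beta, cost, prods, attrs)
        if abs(c_now - observed_mean) < tol or c_now == c_prev:
            break
        # Secant-type update of beta from the two latest iterates.
        beta_next = ((observed_mean - c_prev) * beta
                     - (observed_mean - c_now) * beta_prev) / (c_now - c_prev)
        beta_prev, c_prev, beta = beta, c_now, beta_next
    return beta

# Synthetic check: build "observed" flows with a known beta and recover it.
cost = [[1.0, 4.0, 6.0], [4.0, 1.0, 3.0], [6.0, 3.0, 1.0]]
prods, attrs = [100.0, 150.0, 50.0], [120.0, 80.0, 100.0]
true_beta = 0.3
observed = balance([[math.exp(-true_beta * c) for c in row] for row in cost],
                   prods, attrs)
beta_hat = hyman(cost, prods, attrs, mean_cost(observed, cost))
print(round(beta_hat, 4))
```

Each evaluation of the modelled mean cost requires a full matrix balancing, which is why the procedure is described as successive applications of balancing for a sequence of parameter estimates.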
To estimate the model of Equation (14), it is first necessary to obtain a matrix of distances cij for travelling between origin and destination zones. In this exercise we assumed that the impedance to travel between a pair of U.S. states is determined by the average distance in miles of the set of paths on the CTA network system between all corresponding pairs of contiguous counties, as expressed by cij = (∑r ∑s drs) / n, for all r ∈ i and s ∈ j, where r denotes a county centroid within origin state i and s denotes a county centroid within destination state j; n is the number of possible pair-wise combinations of counties that exist between zones i and j; and drs is the average distance in miles to travel on the CTA network between county centroids r and s.
The routes between county centroids are determined in terms of the impedance functions defined in Section 4.1. For the highway and waterway systems, distances between counties are determined on the basis of shortest paths, so that a single route is found for each pair of counties. When railway is available, the distance is estimated as an average over the set of shortest routes obtained from all possible combinations of available railroad carriers. Recall that, since the sequence of modes is unknown in the CFS tabulation used for calibration, the resulting distance between counties will be predominantly on water modes, whenever available, followed by railway distances, with highway used for short hauls.
After applying Hyman's method using the 2007 CFS tabulation and the average distance matrix described above, we obtained the parameters listed in Table 1. A statistic for comparing the set of estimated flows {Tpij} and the set of observed flows {Npij} is also provided in Table 1. This measure, Equation (20), dubbed the standardized root mean square (s.r.m.s.), was proposed by Pitfield (1978) as an alternative to deal with sparse matrices and to account for the scale of the variables involved. However, it has no statistical properties; it is merely a descriptive measure of goodness-of-fit.
where M is the number of cells in the estimated set {Tpij}. Table 1 also shows the R² statistics of the models and the total freight flows in thousands of tons for each industry sector. Note that conclusions about a model's goodness-of-fit should not be based solely on the R² statistic, as it cannot capture systematic errors and differences in variances.
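For readers who wish to compute a comparable statistic, the sketch below uses a commonly cited form of the standardized root mean square error: the root mean square difference between estimated and observed cells divided by the mean observed flow. That this form coincides with Equation (20) is an assumption, and the function name `srms` and the flow values are hypothetical.

```python
import math

def srms(estimated, observed):
    """Root mean square error divided by the mean observed flow."""
    t = [v for row in estimated for v in row]
    n = [v for row in observed for v in row]
    m = len(t)  # number of cells M
    rmse = math.sqrt(sum((a - b) ** 2 for a, b in zip(t, n)) / m)
    return rmse / (sum(n) / m)

observed = [[10.0, 20.0], [30.0, 40.0]]    # hypothetical observed flows N
estimated = [[12.0, 18.0], [30.0, 40.0]]   # hypothetical estimated flows T
value = srms(estimated, observed)
print(round(value, 4))
```

A value of zero indicates a perfect match, and larger values indicate a poorer fit relative to the average cell size.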

Ton-miles for U.S. freight movements
This section presents two methods for estimating the ton-miles of freight movement over the U.S. network system (within the U.S. boundaries). Based on the freight network and demand models described so far, two procedures can be identified to estimate ton-miles for freight movements:
• Shipment allocation for a given sequence of modes;
• Prediction of freight flows and average mileage between counties using CTA's network system.
The first alternative was the one proposed by Southworth and Peterson (2000) to allocate CFS shipments on the network system when the detailed sequence of modes is given. This method can be used to allocate freight movements onto the U.S. network system resulting from both U.S. domestic and foreign trade, as long as the sequence of modes and the main geographic references (i.e. origin, destination, and U.S. ports of entry/exit) are provided. Note that in the CFS a shipment is characterized by its volume (weight and monetary value), the ZIP Codes of origin and destination, and the sequence of modes used, as well as the port of exit and foreign city for exports. As discussed in Section 4.1, routing procedures were developed to find the most likely route and transfer points for a shipment with a given sequence of modes. Such a computational tool can be used to allocate shipper-based databases onto the CTA network system to provide estimates of link ton-miles and, subsequently, the overall system ton-miles by transportation mode, with Equation (21). As an illustration, Figure 7 shows an example of freight flows for shipments on the main railroad lines (i.e. Class I railroads), on the highway system, and on the inland waterway sub-network.

Cm = ∑a Tma dma, (21)
where Cm denotes the overall ton-miles by mode of transportation m, as listed in Table 1; Tma denotes the total freight tons through link a; and dma is the distance in miles over link a.
When a detailed description of shipments is not available, which is the case for the public CFS, the second procedure may be used. In general, the method is designed to estimate freight flows between U.S. counties as well as the mileages over the most likely routes connecting county centroids. With respect to the estimation of freight flows, freight models based on aggregated data (freight movements and economic activity) are applied to estimate freight movements at a more disaggregated geographic level, based on disaggregated economic data. This process is called disaggregation of freight data. In our case, the disaggregation procedure relied on the estimation of separate nationwide generation and distribution models by industry sector, as discussed in Section 4.3. Note that the models in Section 4.3 only represent U.S. domestic trade and may not be suitable for estimating freight movements on the U.S. network resulting from foreign trade between the U.S. and external zones (trading countries). Nevertheless, as we will see, such models have been used to estimate ton-miles on the U.S. system for the 2007 FAF database resulting from both domestic and foreign trade.
As seen in Subsection 4.3.1, a single variable (i.e. state payroll by industry sector) was used to explain the effect of economic activity on internal freight demand. In that process a set of 28 production and attraction equations was estimated. With these models, freight productions and attractions were estimated at the county level (using the county payroll by industry sector). Such estimates were then used as weights to disaggregate each of the total FAF origin-destination flows to the corresponding county productions and attractions. These final estimates were then used as marginal totals for the distribution models. Regarding the distribution models, gravity models with an exponential deterrence function have been estimated for each industry sector. The variable (argument of the deterrence function) used to explain the resistance to travelling between zones was an estimate of the average distance to travel between states on the CTA network system (see Subsection 4.3.2). Using the estimated model parameters, freight movements between counties by the 7 classes of modes listed in Table 1 can then be obtained.
The following steps describe the FAF disaggregation procedure:
a) the FAF database is organized by industry sector and mode, and its flows classified by SCTG are grouped into the corresponding 28 NAICS groups;
b) for each FAF origin-destination pair (classified by NAICS and mode), productions and attractions by industry sector are estimated (using the freight generation models) for the counties within the corresponding FAF origin and destination zones, respectively;
c) using the estimated values from b) as weights, the FAF flow is then disaggregated to generate the productions and attractions of the corresponding counties;
d) a matrix balancing procedure (i.e. the gravity model by industry sector) is applied to distribute the estimated total productions and attractions from c) in order to generate the freight flows between the corresponding counties.
The set of Equations (22)-(26) summarizes the disaggregation process.
Ppmr = Tpmuv wpur,
Apms = Tpmuv wpvs,
Tpmrs = αprs Ppmr Apms exp(-θp dmrs),
where Tpmuv is the FAF freight flow between FAF domestic zones u and v by mode m to be disaggregated; wpur is the estimated fraction of the total freight produced by sector p in FAF zone u that is generated by county r; wpvs is the estimated fraction of the total freight produced by sector p and terminated in FAF zone v that is attracted to county s; Ppmr and Apms are the estimates of freight production and attraction in counties r and s, respectively, shipped and delivered by mode m; μp, γp and θp are the estimated model parameters by sector p; αprs are the county adjustment factors calibrated for each industry sector p; and dmrs is the distance in miles to travel over the modal network denoted by m.
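Steps b) to d) of the disaggregation can be sketched as follows for a single FAF origin-destination flow, one sector, and one mode. The county shares are assumed here to be proportional to the predictions of a power model applied to county payroll; this weighting rule, the function names, and all numerical values are hypothetical illustrations, not the estimated CTA parameters.

```python
import math

def county_shares(payroll, a, beta):
    """Shares of a zone total, proportional to power-model predictions."""
    predicted = [a * x ** beta for x in payroll]
    total = sum(predicted)
    return [v / total for v in predicted]

def disaggregate(faf_flow, shares_origin, shares_dest, dist, theta, n_iter=200):
    """Split one FAF zone-to-zone flow into county-to-county flows."""
    prods = [faf_flow * w for w in shares_origin]   # county productions
    attrs = [faf_flow * w for w in shares_dest]     # county attractions
    seed = [[math.exp(-theta * d) for d in row] for row in dist]
    n_r, n_s = len(prods), len(attrs)
    b = [1.0] * n_s
    for _ in range(n_iter):  # fixed number of balancing sweeps for simplicity
        a = [prods[r] / sum(b[s] * seed[r][s] for s in range(n_s))
             for r in range(n_r)]
        b = [attrs[s] / sum(a[r] * seed[r][s] for r in range(n_r))
             for s in range(n_s)]
    return [[a[r] * b[s] * seed[r][s] for s in range(n_s)] for r in range(n_r)]

# Hypothetical inputs: 2 origin counties and 3 destination counties.
w_origin = county_shares([40.0, 60.0], a=1.2, beta=0.9)
w_dest = county_shares([10.0, 30.0, 20.0], a=0.8, beta=1.1)
dist = [[50.0, 120.0, 200.0], [80.0, 60.0, 150.0]]  # miles between centroids
flows = disaggregate(1000.0, w_origin, w_dest, dist, theta=0.01)
print([[round(t, 1) for t in row] for row in flows])
```

Because the shares in each zone sum to one, the county flows add back up to the original FAF flow, so the disaggregation redistributes tonnage without creating or losing any.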
The set of modal distances to travel over the U.S. network system is the second piece of information necessary for estimating freight ton-miles. To this end, the CTA network system is used for estimating the mileages dmrs for likely freight movements between county centroids. In this case, distances over the U.S. network are estimated for each mode class (see FAF3, 2007, for a description of mode categories), as follows:
• For truck, rail and water modes, the distance between a pair of county centroids is determined based on the most likely route over the corresponding single-mode systems (highway, railway and waterway networks);
• Distances for air are estimated on the basis of Great Circle Distances (GCD), that is, the distance along the earth's circumference between any two geographic points;
• The distances over the highway are used as surrogates for pipeline distances;
• For the multiple mode & mail category, distances are estimated by finding the most likely routes for a given truck-rail-truck mode sequence;
• For the Other & Unknown modes category, distances are determined by finding the most likely routes over the multimodal and intermodal network system, similarly to that described in Subsection 4.3.2;
• The no domestic mode category does not use the domestic U.S. network and is therefore ignored in the ton-miles calculation.
With the disaggregated estimates of flows between counties and the county distances by mode, ton-miles for freight movements by mode can be calculated as follows: Cm = ∑p∑r∑s Tpmrs dmrs.
As expected, all FAF3 estimates are larger than the corresponding CFS estimates, due to the inclusion of freight activities for out-of-scope industries and import movements in the FAF3 database. The FAF3 estimates of ton-miles for rail and water are consistent with the estimates reported by the modal data programs, AAR and USACE, respectively, with a difference of 2% for waterway movements and about 5% for railway movements. The discrepancy observed for railway movements is likely due to the following reasons. The FAF3 database does not account for transhipments from Canada (e.g., Canada-Mexico), which are estimated at about 15 billion ton-miles annually. Furthermore, the AAR report includes some movements of empty containers and the weight of containers for mixed freight (estimated at about 60 billion ton-miles), which are not considered in the FAF3 database. With these two considerations, the gap between FAF3 and the AAR can be reduced to less than 1%.
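The ton-miles sum over sectors and county pairs, Cm = ∑p∑r∑s Tpmrs dmrs, amounts to an element-by-element product of a flow matrix and a distance matrix followed by a total. The short sketch below illustrates this for one mode; the sector labels, flows, and distances are hypothetical values chosen so the result is easy to verify by hand.

```python
def ton_miles(flows, distances):
    """Sum of flow times distance over all county pairs (r, s)."""
    return sum(t * d
               for flow_row, dist_row in zip(flows, distances)
               for t, d in zip(flow_row, dist_row))

# Hypothetical county-to-county flows (thousand tons) for two sectors
# and hypothetical rail distances (miles) between the same counties.
flows_by_sector = {
    "311": [[10.0, 5.0], [0.0, 20.0]],
    "325": [[2.0, 8.0], [4.0, 1.0]],
}
rail_miles = [[30.0, 250.0], [250.0, 40.0]]

# Total over sectors, in thousand ton-miles for the rail mode.
c_rail = sum(ton_miles(f, rail_miles) for f in flows_by_sector.values())
print(c_rail)
```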

Concluding remarks
This chapter presented a framework for national freight data suitable for national decisions regarding the movement of goods in the U.S. The CTA team at ORNL developed and maintains a comprehensive database with all the dimensions (geography, commodity type, mode of transportation) needed to understand how goods move on the U.S. transportation system. To construct such a database, geographic representations of the network system and analytical tools were used extensively. The chapter provided a number of analytical examples for modelling the demand for freight, and specifically presented the demand models developed by the CTA in recent years. Although it is mostly descriptive, the chapter presented a simple application showing how to estimate ton-miles over the freight transportation system.
Estimation of ton-miles is an important application of such tools, used to gauge freight system usage and to provide insightful information for national policy decisions. The methodology for estimating ton-miles over the transportation system was based on the prediction of freight flows and distances between U.S. counties. On the demand side, the disaggregation of freight flows was a necessary step to reconcile the freight database with the detailed representation of the network system and to derive more accurate estimates of ton-miles. As for the transportation system, the geographic representation of multimodal and intermodal interfaces over the transportation network allows the analyst to predict likely routes that are more representative of the real-world activities involved in the transportation of goods. In sum, for a given transportation mode, ton-miles are estimated by an element-by-element multiplication of a matrix of freight volumes and a matrix of average distances for the set of likely routes.
Models of freight demand and of the transportation supply system can also be applied to several other transportation-related problems. Besides ton-miles estimation, the analytical and geographic tools described in this paper can support macro-level impact analysis, such as the energy use and environmental impacts of transportation activity, as well as the effects of external events. In future work CTA will apply the proposed models to investigate alternatives (e.g. modal shifting, vehicle technologies, etc.) for alleviating some of the negative effects of the high use of fossil fuels. In addition, the network models will be used to identify the system's vulnerability and resilience to damage and disruptions caused by natural and man-made events (e.g. hurricanes, flooding, terrorist attacks, chemical spills, etc.).

Author details
Shih The contents of this document reflect the views of the authors, who are responsible for the facts and accuracy of the information presented herein. The contents do not necessarily reflect the official views or policies of the Department of Transportation State or the Federal Highway Administration. This report does not constitute a standard, specification or regulation.