Improving Population Demand Estimation with Transit Chaining Breaks

Jin Haitao1, 2, 3, *, Jin Fengjun1, 3, Ni Yong1, 3, 4, Huang Jianling2, Du Yong2
1 Key Laboratory of Regional Sustainable Development Modeling, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing 100101, China
2 Beijing Transportation Information Center, Beijing 100161, China
3 University of Chinese Academy of Sciences, Beijing 100049, China
4 China National Environmental Monitoring Centre, Beijing 100012, China

Creative Commons License
© 2018 Haitao et al.

open-access license: This is an open access article distributed under the terms of the Creative Commons Attribution 4.0 International Public License (CC-BY 4.0), a copy of which is available at: This license permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

* Address correspondence to this author at the Key Laboratory of Regional Sustainable Development Modeling, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, 11A, Datun Road, Chaoyang District, Beijing 100101, China; Tel: 0086-13911172325; E-mail:



Data mining of smart card data collected through Automated Fare Collection (AFC) systems has proved useful in estimating public transport demand. Most demand estimations are made by analyzing the origins or destinations of unchained transits. However, the organization of bus and metro routes compels riders to make many unnecessary transfers, and transfer points reflect neither the population’s actual origins nor their destinations.

Aims and Objectives:

The objective of this paper is to improve estimations of population demand by identifying the transfer activities of riders using public transportation. The durations and displacements of transit chaining breaks are checked to identify transfer activities.

Boarding stops used for making transfers are excluded from the estimation of transportation demand. The effectiveness of the new approach, which uses transit chaining breaks, is evaluated by calculating Pearson product-moment correlation coefficients between the transportation demand estimation and the population distribution.

Result and Conclusion:

Durations and displacements of transit chaining breaks could be used to identify transfer activities. The use of the transit chaining approach reduces the occurrence of false demand, resulting in the estimation being more objective in relation to the population.

The results of the study indicated that the inclusion of transit chaining breaks leads to more accurate estimations of public transport demand within a population.

Keywords: Public transport, Urban transportation, Transport demand estimation, Transit chaining breaks, Smart card data, Population equity.


Reliable estimations of transit demand can facilitate improvements in public transport [1]. Traditional estimations of the demand for public transportation are based mainly on surveys [2], which are expensive and tedious to conduct [3]. The use of smart cards enabling Automated Fare Collection (AFC) is becoming increasingly popular [4], and data collected through AFC systems have proved useful in transit planning [5]. Most estimations of demand are made by analyzing transit origins or destinations [3]; in practice, however, transfer points reflect neither transit riders’ actual origins nor their destinations. The practical organization of bus or metro routes compels riders to make unnecessary transfers. Consequently, an investigation of how such unnecessary transfers can be avoided is pertinent [6].

Because transit OD (Origin–Destination) pairs are not ideal for use in estimations of demands for transportation, some researchers have explored the use of trip-chaining approaches, according to which transits are viewed as complete journeys. Trip chaining approaches provide researchers with useful information [7], and the availability of comprehensive profiles of riders’ transit behaviors enables modelers and public facilities planners to improve estimations of demands for transportation [2]. The trip-chaining approach basically entails connecting sequenced legs of the trips of smart card holders and listing their public transport trips. Particular criteria, such as whether passengers’ boarding stops are close to the stops at which they previously alighted, and whether a reasonable amount of transfer time exists, are used to identify transfers made between transits. However, the reliability of these methods has not been well investigated [8], and applications of the chaining approach have not been adequately explored.

This study is aimed at investigating how transit chaining and Transit Chaining Breaks (TCBs) can be used to identify transfers made during riders’ use of public transportation. Moreover, based on our findings, we discuss whether and how the adoption of this approach could lead to improved estimations of population demand in public transport.


2.1. Description of the Transit Chaining Breaks approach

A trip chain comprises a series of trips made by a rider on a daily basis, and the sequence of trips demonstrates the rider’s traveling behavior [9]. The transit chaining method is normally applied by connecting a passenger’s trip legs [8]. Some public transportation riders may arrive directly at their shopping or work destinations, whereas others may make transfers immediately after alighting at their stops. Still others may commence their transit after reaching a stop by taxi or on foot. Regardless of the commuting behaviors of riders, transit chains comprise single transits connected by breaks. In some cities, AFC systems record the boarding and alighting times of transit riders, thereby generating geo-tagged transit data that enable the calculation of displacements and durations, which are important attributes of Transit Chaining Breaks (TCBs). Fig. (1) depicts a cardholder’s transit chain. Point A denotes the first boarding station, point B denotes the first alighting station, and so forth. A displacement occurs between the stop at which the rider alights and the next boarding stop. All three breaks have particular durations, but the third break does not entail a displacement.

Fig. (1). An illustration of a transit chaining break.

A key issue addressed in this study is whether TCBs are transfers or actual destinations. Transfers are treated as unnecessary demand generated by an imperfect transportation system. TCBs whose durations and displacements fall within specified thresholds are therefore treated as transfers and are not counted in estimations of transportation demand.

2.2. TCB Duration

TCB duration is defined as the time that elapses between a rider’s alighting from one transit and the swiping of his or her smart card when boarding the next. It includes the time the rider spends checking out, walking between stations, and finally entering the public transport system and checking in with the smart card. The duration of the TCB that follows transit number n of a cardholder is calculated as follows:

$$\text{duration}_n = \text{boarding\_time}_{n+1} - \text{alighting\_time}_n$$

where $\text{boarding\_time}_{n+1}$ denotes the boarding time for transit number n+1 and $\text{alighting\_time}_n$ denotes the alighting time for transit number n.
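As a minimal sketch of this calculation, the break duration can be obtained by subtracting each leg’s alighting time from the next leg’s boarding time. The record layout (a list of boarding/alighting time pairs per cardholder, already sorted in travel order) and the function name are assumptions made for illustration, not the format of the Beijing transit logs.

```python
from datetime import datetime


def tcb_durations(legs):
    """Return TCB durations in minutes for one cardholder.

    `legs` is a hypothetical list of (boarding_time, alighting_time)
    datetime pairs, sorted in the order the legs were travelled.
    """
    durations = []
    # Pair every leg with the leg that follows it in the chain.
    for current_leg, next_leg in zip(legs, legs[1:]):
        alighting_time = current_leg[1]
        next_boarding_time = next_leg[0]
        delta = next_boarding_time - alighting_time
        durations.append(delta.total_seconds() / 60.0)
    return durations


# Illustrative (made-up) chain with three legs and therefore two breaks.
legs = [
    (datetime(2016, 8, 17, 7, 30), datetime(2016, 8, 17, 7, 55)),
    (datetime(2016, 8, 17, 8, 3), datetime(2016, 8, 17, 8, 40)),
    (datetime(2016, 8, 17, 18, 10), datetime(2016, 8, 17, 18, 50)),
]
print(tcb_durations(legs))
```

A chain with k legs yields k-1 breaks, so a single-leg chain returns an empty list.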

Many dimensions need to be considered to determine the purposes of transit. However, an analysis can be performed using the threshold time of transfer within a public transport system to determine whether a TCB is a transfer. The time required to make transfers varies, and it is difficult to ascertain the required transfer times using the smart card data alone [6]. Previous studies have shown that the threshold transfer time may vary, usually ranging from 30 minutes to 60 minutes, and even extending up to 90 minutes [10-12].

2.3. TCB Displacement

TCB displacement is defined as the distance between the boarding station and the previous station at which the rider alighted. Because the lengths of actual routes are complex and difficult to ascertain, great-circle distances between boarding stations and previous alighting stations are treated here as TCB displacements, and the displacements are calculated using the following haversine formula:

$$d = 2r \arcsin\left(\sqrt{\sin^2\left(\frac{\phi_2 - \phi_1}{2}\right) + \cos\phi_1 \cos\phi_2 \sin^2\left(\frac{\Delta\lambda}{2}\right)}\right)$$

where ϕ1, λ1 and ϕ2, λ2 denote the geographical latitude and longitude, in radians, of the boarding station and the previous alighting station, respectively; Δλ is the absolute difference between λ1 and λ2; d is the TCB displacement; and r is the radius of the Earth.
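The great-circle displacement between two stations can be sketched with the standard haversine formula as follows. The function name, the use of degrees as input, and the mean Earth radius of 6,371 km are assumptions for illustration; the paper does not state which radius value was used.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1_deg, lon1_deg, lat2_deg, lon2_deg,
                 radius_km=EARTH_RADIUS_KM):
    """Great-circle distance in km between two points given in degrees."""
    phi1 = math.radians(lat1_deg)
    phi2 = math.radians(lat2_deg)
    delta_phi = phi2 - phi1
    delta_lambda = math.radians(lon2_deg - lon1_deg)
    a = (math.sin(delta_phi / 2.0) ** 2
         + math.cos(phi1) * math.cos(phi2)
         * math.sin(delta_lambda / 2.0) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot.
    return 2.0 * radius_km * math.asin(math.sqrt(min(1.0, a)))


# Identical boarding and alighting coordinates give a displacement of 0 km.
print(haversine_km(39.9, 116.4, 39.9, 116.4))
```

Because the haversine term is squared, the sign of the longitude difference does not matter, which matches the use of an absolute difference for Δλ.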

Previous studies have shown that the displacement of transfer-type TCBs can range from 400 m to 1,100 m [8, 13]. On its own, a TCB displacement does not indicate whether the break is a transfer. Displacements in combination with TCB durations are more effective in identifying transfer activities.

2.4. Identification of Transfer Activity

Here, we introduce the duration-displacement matrix of TCBs. Based on this matrix, the following criteria were used to determine transfer activity relating to TCBs: (i) the TCB duration is too short for implementing activities other than transfers and (ii) the displacement occurs within a walkable distance.
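The two criteria can be sketched as a simple threshold test. The default thresholds below (20 minutes and 2 km) are the values reported for Beijing in Section 3.2; the function name and the use of strict inequalities are assumptions made for illustration.

```python
def is_transfer(duration_min, displacement_km,
                max_duration_min=20.0, max_displacement_km=2.0):
    """Classify a TCB as a transfer when it is both short and walkable.

    Thresholds default to the Beijing values from Section 3.2; whether the
    boundaries are inclusive is an assumption of this sketch.
    """
    short_enough = duration_min < max_duration_min
    walkable = displacement_km < max_displacement_km
    return short_enough and walkable


# Hypothetical breaks as (duration in minutes, displacement in km).
breaks = [(8.0, 0.4), (570.0, 0.0), (10.0, 3.5)]
labels = [is_transfer(duration, displacement)
          for duration, displacement in breaks]
print(labels)
```

Both conditions must hold: a long break at the same station (for example, a workday between two commutes) is not a transfer, and neither is a short break followed by boarding far away.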

2.5. Verification and Validation of the Approach

The residential population of a zone is considered to be positively related to its demand for public transport [14]. The test criterion was therefore the consistency of the demand estimation with the distribution of the population investigated in this study. The demand estimation of zone number i can be expressed as:

$$\text{demand}_i = \sum_{j} w_{ij}\,\text{count}_j$$

where $\text{demand}_i$ denotes the outcome of the demand estimation for zone i and $\text{count}_j$ denotes the assumed demand of transit station number j. The value of $w_{ij}$ is 1 if station number j is located in zone i and 0 otherwise.
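Because each weight is either 0 or 1, the zone-level sum reduces to adding up the counts of the stations located in each zone. A minimal sketch, assuming a hypothetical station-to-zone lookup table and made-up station and zone identifiers:

```python
from collections import defaultdict


def zone_demand(station_counts, station_zone):
    """Sum station counts per zone (w_ij = 1 when station j lies in zone i).

    `station_counts` maps station id -> count; `station_zone` maps
    station id -> zone id. Both layouts are assumptions of this sketch.
    """
    demand = defaultdict(int)
    for station, count in station_counts.items():
        zone = station_zone.get(station)
        if zone is not None:  # stations outside every zone contribute nothing
            demand[zone] += count
    return dict(demand)


# Illustrative (made-up) inputs.
station_counts = {"S1": 120, "S2": 80, "S3": 50, "S4": 7}
station_zone = {"S1": "Z1", "S2": "Z1", "S3": "Z2"}
print(zone_demand(station_counts, station_zone))
```

The same function serves both groups described below: the control group passes raw boarding counts, and the test group passes counts from which transfer boardings have been removed.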

Control group: Daily boarding volumes at stations were treated as public demand, and the counts were projected into cell zones based on a population survey. Array Xcontrol = [demand1, demand2, demand3, … demandi]control, where demandi denotes the need for public transportation in zone i.

Test group: After excluding transfer TCBs, the counts were projected on to population survey zones. Array Xtest = [demand1, demand2, demand3, … demandi]test.

Pearson product-moment correlation coefficients (Pearson) were calculated between the population array and each demand array to measure how closely the estimations generated by the new method tracked the population. The formula used for calculating Pearson was:

$$\rho_{X,Y} = \frac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y}$$

where cov denotes covariance and σX and σY are the standard deviations of X and Y.

Y = [population1, population2, … populationi], where populationi is the population of zone i.

If Pearson(control) < Pearson(test), then the TCB method can be considered to be more objective than the non-TCB approaches.
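The comparison can be sketched as follows. The function implements the standard Pearson product-moment coefficient in plain Python; the small arrays are invented solely to show the mechanics and are not the study’s data.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant array")
    # The 1/n factors of covariance and standard deviations cancel out.
    return cov / math.sqrt(var_x * var_y)


# Hypothetical zone-level arrays (not taken from the paper).
population = [10.0, 20.0, 30.0, 40.0]
x_control = [50.0, 20.0, 35.0, 45.0]
x_test = [12.0, 22.0, 29.0, 41.0]

tcb_more_consistent = pearson(x_test, population) > pearson(x_control, population)
print(tcb_more_consistent)
```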


3.1. Data Description

An assessment of transit data for Beijing’s bus and metro systems revealed that there was very little change in traffic volumes at each station on workdays. For example, fluctuations in card checking counts at the 100 busiest stations in Beijing were less than 5% during the period August 15 to 19, 2016. For this study, the transit logs for Beijing’s bus and metro systems were obtained for August 17, 2016, as they yielded typical data for workdays. Each entry contains a card number, boarding station, boarding time, alighting station, and alighting time. A total of 13,000 bus stations and 345 metro stations were covered in the study. Analysis of the data indicated that there were 3,586,286 transit chains with 6,351,735 TCBs.

3.2. Duration-displacement Matrix

Of the 6.35 million TCB displacements that were identified, 1.7 million were 0 km in distance, indicating that riders boarded at the same stations at which they had previously alighted. A graphic depiction of the duration-displacement matrix (Fig. 2) clearly reveals the relation between TCB displacements and durations. Fig. (2) shows that most transfers were made within a displacement distance of 2 km and with durations of less than 15 minutes (breaks with displacements of 0 km are not depicted).

Fig. (2). Distribution of the TCB duration-displacement.
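A duration-displacement matrix of this kind can be sketched by counting TCBs in two-dimensional bins. The bin widths (5 minutes and 0.5 km), the input layout, and the function name are assumptions for illustration; the paper does not state the binning used for the figure.

```python
from collections import Counter


def duration_displacement_matrix(tcbs, duration_bin_min=5.0,
                                 displacement_bin_km=0.5):
    """Count TCBs per (duration bin index, displacement bin index) cell.

    `tcbs` is a hypothetical iterable of (duration_min, displacement_km)
    pairs; bin widths are assumed values.
    """
    cells = Counter()
    for duration_min, displacement_km in tcbs:
        duration_bin = int(duration_min // duration_bin_min)
        displacement_bin = int(displacement_km // displacement_bin_km)
        cells[(duration_bin, displacement_bin)] += 1
    return cells


# Illustrative (made-up) breaks.
tcbs = [(8.0, 0.4), (9.0, 0.3), (570.0, 0.0), (12.0, 1.2)]
matrix = duration_displacement_matrix(tcbs)
print(matrix.most_common(1))
```

Cells with high counts at short durations and small displacements correspond to the dense region the figure describes.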

A total of 6.3 million displacements were less than 2 km. Consequently, we set a TCB displacement of 2 km as the transfer threshold distance. Fig. (3) indicates that 20 minutes was a reasonable transfer threshold time for identifying transfer activities.

Fig. (3). TCB duration count.

3.3. Identification of Transfers

Table 1 shows the 10 stations where the most transfers took place. Those stations were generally treated as major demand sources. However, not all transfer activities occurring at those stations should be considered to indicate a demand for public transport.

Table 1. Top 10 stations where transfers occurred in Beijing.
Station Name | Transfer Count
Dongzhimen | 16,788
Liuliqiao East | 16,708
Xizhimen | 15,599
Guomao | 13,702
Sanyuanqiao | 12,599
Dongzhimen hub | 7,247
Yuquanlu | 6,476
Jishuitan | 6,438
Beijing West Station | 6,148
Huoying | 5,953

3.4. Validation of the Approach

Beijing was divided into 306 sub-district zones in China’s sixth national population census, conducted in 2010. A traditional estimation method based on transit OD pairs (Xcontrol) and a method using TCBs (Xtest) were both applied. The resulting Pearson product-moment correlation coefficients, Pearson(control) and Pearson(test), indicated that the transit chaining approach generated higher correlations between X (transport demand estimation) and Y (population distribution), especially in the most active zones: the coefficient rose from 0.48 to 0.87 for the 10 most active zones and from 0.69 to 0.72 across all 306 zones (Table 2). The use of the transit chaining approach reduced the occurrence of false demand, resulting in the estimation being more objective in relation to the population (Fig. 4).

Table 2. Pearson correlations with population for the control and test groups.
Zones | Pearson(control) | Pearson(test)
top 10 active zones | 0.48 | 0.87
200 active zones | 0.58 | 0.65
all 306 zones | 0.69 | 0.72

Fig. (4). Non-chaining approach (a) and discernable improvement using the chaining approach (b).


The main contribution of this study lies in its elaboration of a TCB approach and its demonstration that this approach could improve the objectivity and reliability of estimations of transportation demand. A TCB duration-displacement matrix was developed and applied to identify transfer activities. One of the advantages of this approach is that it yields a more reliable estimation of transportation needs. In traditional estimations, transfers are incorporated into transport needs. Consequently, service shortage areas are less prominent and more difficult to identify. However, a limitation of this study was that it depended solely on data extracted from transit logs. Further studies are therefore required to confirm the improvements before this method is used in transport policy making.


Information on TCB durations and displacements could be used to identify transfer activities. The findings of the study suggest that applying the transit chaining method in estimations of the demand for public transport could yield more reliable and objective results than those obtained using transit OD pair-based estimations.


Not applicable.


The authors declare no conflict of interest, financial or otherwise.


The authors wish to thank Yangxue from the Beijing Transportation Information Center for providing the data for the study. We thank Radhika Johari from Liwen Bianji, Edanz Editing China, for editing the English text of a draft of this manuscript. We also gratefully acknowledge the financial support from the Strategic Priority Research Program of the Chinese Academy of Sciences, Grant No. XDA19040402.


[1] Boyle DK, Foote PJ, Karash KH. Public Transportation Marketing and Fare Policy 2000.
[2] Chu K, Chapleau R. Augmenting transit trip characterization and travel behavior comprehension: Multiday location-stamped smart card transactions. Transp Res Rec 2010; (2183): 29-40.
[3] Munizaga MA, Palma C. Estimation of a disaggregate multimodal public transport Origin-Destination matrix from passive smart card data from Santiago, Chile. Transp Res, Part C Emerg Technol 2012; 24: 9-18.
[4] Trépanier M, Tranchant N, Chapleau R. Individual trip destination estimation in a transit smart card automated fare collection system. J Intell Transp Syst 2007; 11(1): 1-14.
[5] Pelletier M-P, Trépanier M, Morency C. Smart card data use in public transit: A literature review. Transp Res, Part C Emerg Technol 2011; 19(4): 557-68.
[6] Nishiuchi H, Todoroki T, Kishi Y. A fundamental study on evaluation of public transport transfer nodes by data envelop analysis approach using smart card data. Transp Res Procedia 2015; 6: 391-401.
[7] Seaborn C, Attanucci J, Wilson N. Analyzing multimodal public transport journeys in London with smart card fare payment data. Transp Res Rec 2009; (2121): 55-62.
[8] Alsger A, et al. Validating and improving public transport origin–destination estimation algorithm using smart card fare data. Transp Res, Part C Emerg Technol 2016; 68: 490-506.
[9] McKenzie B, Rapino M. Commuting in the United States: 2009 2011.
[10] Ma X, et al. Mining smart card data for transit riders’ travel patterns. Transp Res, Part C Emerg Technol 2013; 36: 1-12.
[11] Bagchi M, White P. What role for smart-card data from bus systems? Munic Eng 2004; 157(1): 39-46.
[12] Hofmann M, O’Mahony M. Transfer journey identification and analyses from electronic fare collection data in Intelligent Transportation Systems 2005.
[13] Zhao J, Rahbee A, Wilson NHM. Estimating a rail passenger trip origin-destination matrix using automatic data collection systems. Comput Aided Civ Infrastruct Eng 2007; 22(5): 376-87.
[14] Balcombe R, et al. The demand for public transport: A practical guide 2004.