Abstract :
[en] In urban and transportation research, important information is often scattered over a wide variety of independent datasets which vary in terms of described variables and sampling rates. As activity-travel behavior of people depends particularly on socio-demographics and transport/urban-related variables, there is an increasing need for advanced methods to merge information provided by multiple urban/transport household surveys. In this paper, we propose a hierarchical algorithm based on a Hidden Markov Model (HMM) and an Iterative Proportional Fitting (IPF) procedure to obtain quasi-perfect marginal distributions and accurate multi-variate joint distributions. The model allows for the combination of an unlimited number of datasets. The model is validated on the basis of a synthetic dataset with 1,000,000 observations and 8 categorical variables. The results reveal that the hierarchical model is particularly robust as the deviation between the simulated and observed multivariate joint distributions is extremely small and constant, regardless of the sampling rates and the composition of the datasets in terms of variables included in those datasets. Besides, the presented methodological framework allows for an intelligent merging of multiple data sources. Furthermore, heterogeneity is smoothly incorporated into micro-samples with small sampling rates subjected to potential sampling bias. These aspects are handled simultaneously to build a generalized probabilistic structure from which new observations can be inferred. A major impact in term of expert systems is that the outputs of the hierarchical model (HM) model serve as a basis for a qualitative and quantitative analyses of integrated datasets.
Scopus citations®
without self-citations
20