Journal of Shanghai Jiaotong University (Natural Science Edition) ›› 2013, Vol. 47 ›› Issue (08): 1246-1250.

• Automation Technology, Computer Technology •

Estimation of Enumerative Missing Values Based on Relational Markov Model

CHEN Shuang1,2,3, SONG Jinyu1, DIAO Xingchun1,2, CAO Jianjun2

  (1. Institute of Command Information System, PLA University of Science and Technology, Nanjing 210007, China; 2. The 63rd Research Institute, PLA General Staff Headquarters, Nanjing 210007, China; 3. The 47th Division, Jilin Army Reservist Infantry, Jilin 132000, China)
  • Received: 2012-10-22 Online: 2013-08-29 Published: 2013-08-29
  • Supported by:

    China Postdoctoral Science Foundation Special Funded Project (201003797), Jiangsu Province Postdoctoral Research Funding Program (0901014B), Pre-research Foundation of PLA University of Science and Technology (20110604)

Abstract:

To address the problem of missing data in data quality, an approach for estimating enumerative missing values based on the relational Markov model (RMM) is proposed. The approach takes full account of the correlations between attributes and combines RMM with the dynamic attribute selection (DAS) method, making maximum use of the information in complete cases to improve estimation performance. RMM is used to compute the transition probabilities from source states to target states, and missing values are then filled in using two imputation methods: maximum posterior probability (MaxPost) and probability proportional (ProProp). Comparative experiments on well-known datasets verify the effectiveness and advantage of the approach.

Key words: data missing, relational Markov model (RMM), dynamic attribute selection (DAS), imputation method

CLC Number:
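
The abstract names two imputation strategies, MaxPost and ProProp, but does not spell out how they are computed. The following is only a minimal sketch of one plausible reading, assuming the RMM transition probabilities for a missing attribute are already available as a mapping from candidate value to probability. The function names `max_post` and `pro_prop` and the example values are hypothetical and do not come from the paper.

```python
import random


def max_post(probs):
    """Assumed MaxPost rule: pick the candidate with the highest probability."""
    return max(probs, key=probs.get)


def pro_prop(probs, rng=random):
    """Assumed ProProp rule: draw a candidate with probability
    proportional to its weight."""
    values = list(probs)
    weights = [probs[v] for v in values]
    return rng.choices(values, weights=weights, k=1)[0]


# Hypothetical transition probabilities for one missing enumerative value.
probs = {"red": 0.6, "green": 0.3, "blue": 0.1}

print(max_post(probs))                    # deterministic choice
print(pro_prop(probs, random.Random(0)))  # randomized, seeded choice
```

Under this reading, MaxPost always returns the same value for a given distribution, while ProProp is randomized, so repeated imputations tend to follow the estimated distribution rather than collapsing onto its mode.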