

International Journal of Advanced Network, Monitoring and Controls      Volume 05, No.01, 2020 

DOI: 10.21307/ijanmc-2020-001

Research on House Price Prediction Based on Multi-Dimensional Data Fusion
 

Yang Yonghui 

School of Computer Science and Engineering  

Xi’an Technological University 

Xi’an, 710021, China 

E-mail: yangyh26@qq.com 

 
Abstract—The price of commercial housing is related to the 

process of urbanization in China and the living standards of
residents, so predicting the price of commercial housing is
important. A major difficulty in regression-based prediction
is how to handle attributes of different types and fuse them.
This paper proposes a house price prediction model based on
multi-dimensional data fusion and a fully connected neural
network. The model is built in four steps: first, the sample
data are normalized; second, the normalized data are
interpolated to increase the data density; third, the
normalized sample data are converted into a pixel matrix;
finally, a fully connected neural network model is established
that maps the pixel matrix to the price of the commercial
house. Once the neural network model has been established, the
price of a house can be obtained by entering the attributes of
the house into the model.

Keywords-Multi-Dimensional Data Fusion; Fully Connected 

Neural Network Model; House Price Prediction 

I. INTRODUCTION 

Urbanization[1] refers to the process of population
gathering towards cities, the expansion of cities, and
the series of economic and social changes that result
from it. Its essence is a change in economic, social,
and spatial structures. Urbanization is a core
proposition of China's modernization process and
sustained economic growth. In recent years, as China's
urbanization has progressed, more and more young
people have moved to third-tier, second-tier, and even
first-tier cities. A major factor affecting young
people's entry into big cities is the price of local
commercial housing. In other words, urban house prices
are a major factor affecting China's urbanization,
which is why it is necessary to forecast them. The
attributes that affect house prices are transaction
date, house age, distance from the subway station, the
number of convenience stores within walking distance,
the dimension of the house, and the longitude of the
house. This paper builds a data fusion model whose
input is the attributes listed above and whose output
is the price of commercial housing. Once the data
fusion model has been established, the price of a
commercial house can be obtained simply by entering its
attributes into the model.

A.  Research Background and Significance 

With the development of China's economy, people's
living standards have gradually improved, and people
now expect more of the places they live in. According
to data from the National Bureau of Statistics[2], from
January to December 2018 nationwide investment in real
estate development was 12,266.4 billion yuan, an
increase of 9.5% over the previous year; the growth
rate was 0.2 percentage points lower than in the
January-November period and 2.5 percentage points
higher than in the same period of the previous year.
Of this, residential investment was 8,529.2 billion
yuan, an increase of 13.4%; its growth rate was 0.2
percentage points lower than in the January-November
period and 4 percentage points higher than in the
previous year. Residential investment accounted for
70.8% of real estate development investment. As housing
sales have increased, housing prices have also
increased. According to relevant data, China's housing
prices have at least doubled since 2015. As house
prices rise, people pay more attention to predicting
them. This paper builds a data fusion model whose input
is six attributes that affect house prices
(transaction date, house age, distance from the subway
station, the number of convenience stores within
walking distance, the dimension of the house, and the
longitude of the house) and whose output is the price
of the commercial house. Once the data fusion model has
been established, the price of a commercial house can
be obtained simply by entering its six attributes into
the model. Research on house price prediction based on
multi-dimensional data fusion can provide a reference
for house price prediction in China and further promote
the development of urbanization in China.

B.  Data sources 

The data in this paper comes from the Boston house
price data provided by Kaggle, and the amount of data
selected is relatively small. The data set contains 404
training samples and 102 test samples, for a total of 506
samples. In this house price prediction problem, six
attributes affect the house price: transaction date
$x_1$, house age $x_2$, distance from the subway station
$x_3$, the number of convenience stores within walking
distance $x_4$, the dimension of the house $x_5$, and
the longitude of the house $x_6$. The dependent variable
is the house price $y$.

II. KEY TECHNOLOGY 

A.  Research methods for regression problems 

House price forecasting is a prediction problem, and
prediction problems of this kind are handled by
regression analysis. This section describes the research
methods of regression analysis. Regression analysis[3]
is a statistical method for analyzing data. Its purpose
is to determine whether two or more variables are
related, to measure the direction and strength of that
relationship, and to establish a mathematical model
that predicts the variable of interest from the
observed variables. Regression analysis involves
independent and dependent variables. An independent
variable is one that changes actively, for example the
factors that affect house prices in this paper. A
dependent variable is one that changes passively in
response to the independent variables, for example the
house price in this paper. Regression analysis can
therefore be understood as a method for analyzing the
relationship between independent and dependent
variables. Common regression methods are linear
regression, logistic regression, polynomial regression,
and stepwise regression.

Linear regression establishes a linear equation
between the independent variables and the dependent
variable. It is the best-known regression model. In
this type of model, the independent variables may be
discrete or continuous, and the dependent variable must
be continuous. Logistic regression establishes a
logistic equation from the independent variables to the
dependent variable. It is used to estimate the
probability that an event succeeds or fails. In this
type of model, the independent variables may be
discrete or continuous, and the dependent variable must
lie in the interval [0, 1]. Polynomial regression
establishes a polynomial equation between the
independent variable and the dependent variable, and is
commonly used in machine learning. In this model, too
low a polynomial degree leads to underfitting, and too
high a degree leads to overfitting. When there are many
independent variables, stepwise regression can be
used[4]. Standard stepwise regression adds or removes
an independent variable at each step. The selection of
independent variables is automated and does not involve
manual intervention.
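As a small illustration of the linear regression described above, the following Python sketch fits a straight line by ordinary least squares. It is not taken from the paper: the function name `fit_line` and the toy age and price values are hypothetical, and only one independent variable is used to keep the example short.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one independent variable)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and sum of cross deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical toy data: the price falls by 2 units per year of house age.
ages = [0.0, 5.0, 10.0, 20.0]
prices = [50.0, 40.0, 30.0, 10.0]
slope, intercept = fit_line(ages, prices)
```

Because the toy data lie exactly on a line, the fitted slope is negative, which matches the idea that house age behaves as an attribute whose increase lowers the price.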

B.  Research methods for data fusion 

Data fusion[5] is a technology that combines values
from different attributes. Fusing multiple attributes
generally gives better results than relying on a single
attribute. Data fusion is widely used in
multidisciplinary and multi-scenario settings. For
example, a patient's physiological and psychological
information can be monitored through different hardware
devices, and the patient's physical condition can then
be obtained through data fusion.

Data fusion design involves two main difficulties.
The first is how to handle different attribute types, and
the second is how to fuse the attributes. This paper
details the handling of attribute types in the
"Handling of attribute types" section and the data
fusion method in the "Data Fusion" section.

C.  Handling of attribute types 

The attribute type refers to the data type of the
attribute. The attribute types[6] considered in this
paper are: Large_Attributes, Small_Attributes,
Intermediate_Attributes, and Enumerated_Attributes.

1) Large_Attributes
For Large_Attributes, the larger the independent
variable, the larger the dependent variable. That is,
the independent variable has a positive effect on the
dependent variable; in other words, the two are
positively correlated. The processing method for
Large_Attributes is shown in (1).



 
 (1)

Among them,      is the maximum value of the 
attribute value;      is the minimum value of the 
attribute value;   is the original value of the attribute 
value;    is the normalized attribute value. 
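A minimal Python sketch of this step follows, assuming (1) is the usual min-max scaling $(x - x_{\min}) / (x_{\max} - x_{\min})$. The function name `normalize_large`, the handling of a constant column, and the store counts are illustrative assumptions, not part of the paper.

```python
def normalize_large(values):
    """Min-max scaling for a Large_Attributes column: larger raw value -> larger score."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Degenerate column: every value is identical, so there is nothing to rank.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical counts of convenience stores for five houses.
stores = [0, 2, 5, 8, 10]
scaled_stores = normalize_large(stores)
```

The smallest count maps to 0 and the largest to 1, so the ordering of the raw values is preserved.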

2) Small_Attributes
For Small_Attributes, the larger the independent
variable, the smaller the dependent variable. That is,
the independent variable has a negative effect on the
dependent variable; in other words, the two are
negatively correlated. The processing method for
Small_Attributes is shown in (2).

    
 (2)

Among them,      is the maximum value of the 
attribute value;      is the minimum value of the 
attribute value;   is the original value of the attribute 
value;    is the normalized attribute value. After 
processing by the above method, the extremely small 
attributes have been transformed into extremely large 
attributes. 
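The reversal can be sketched in Python as follows, assuming (2) is the mirrored min-max scaling $(x_{\max} - x) / (x_{\max} - x_{\min})$. The function name `normalize_small` and the distances are hypothetical.

```python
def normalize_small(values):
    """Reversed min-max scaling for a Small_Attributes column: smaller raw value -> larger score."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Degenerate column: every value is identical, so there is nothing to rank.
        return [0.0 for _ in values]
    return [(hi - v) / (hi - lo) for v in values]

# Hypothetical distances to the nearest subway station, in metres.
distances = [100.0, 400.0, 1000.0, 1900.0]
scaled_distances = normalize_small(distances)
```

Under this assumption the closest house receives the score 1 and the farthest receives 0, so the scaled column can be treated like a Large_Attributes column afterwards.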

3) Intermediate_Attributes
Intermediate_Attributes are characterized by a
threshold. When the independent variable is below the
threshold, it behaves like a Large_Attributes: it is
positively correlated with the dependent variable. When
the independent variable is above the threshold, it
behaves like a Small_Attributes: it is negatively
correlated with the dependent variable. The processing
method for Intermediate_Attributes is shown in (3).

$$x' = \begin{cases} \dfrac{x - x_{\min}}{x_0 - x_{\min}}, & x \le x_0 \\ \dfrac{x_{\max} - x}{x_{\max} - x_0}, & x > x_0 \end{cases} \qquad (3)$$

where $x_{\max}$ is the maximum value of the attribute,
$x_{\min}$ is the minimum value of the attribute, $x$ is
the original attribute value, $x'$ is the normalized
attribute value, and $x_0$ is the threshold. After this
processing, Intermediate_Attributes have been
transformed into Large_Attributes.
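The threshold behaviour can be illustrated with a short Python sketch. It assumes a piecewise-linear form that rises from the minimum to the threshold and falls from the threshold to the maximum; that form, the function name `normalize_intermediate`, and the sample values are assumptions made for illustration.

```python
def normalize_intermediate(values, threshold):
    """Piecewise-linear scaling that peaks at `threshold` (assumed form, see lead-in)."""
    lo, hi = min(values), max(values)
    if not lo < threshold < hi:
        raise ValueError("threshold must lie strictly between the column minimum and maximum")
    scaled = []
    for v in values:
        if v <= threshold:
            scaled.append((v - lo) / (threshold - lo))   # rising branch, like Large_Attributes
        else:
            scaled.append((hi - v) / (hi - threshold))   # falling branch, like Small_Attributes
    return scaled

# Hypothetical attribute whose most favourable value is 20.
raw = [0.0, 10.0, 20.0, 30.0, 60.0]
scaled = normalize_intermediate(raw, threshold=20.0)
```

The value at the threshold receives the score 1, and scores decrease towards 0 on both sides of it.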

4) Enumerated_Attributes
For Enumerated_Attributes, the value of the
independent variable has no real measurement meaning.
The dependent variable is still affected by the value
of the independent variable, but the relationship is
difficult to express numerically. Enumerated_Attributes
are processed as follows:

Step1: List all the values of the input attribute.
Suppose the input attribute takes $n$ values: $a_1$,
$a_2$, …, $a_n$.

Step2: Convert each attribute value to One-Hot[7]
form. Since $a_i$ is the $i$-th attribute value, it can
be replaced by a vector in which only the $i$-th
position is 1:

$a_1$ can be expressed as $(1, 0, \dots, 0)$;

$a_2$ can be expressed as $(0, 1, \dots, 0)$; ……

$a_n$ can be expressed as $(0, 0, \dots, 1)$.

At this point, all values of the attribute have been
expressed in One-Hot form.
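The two steps above can be sketched in Python as follows. The function name `one_hot_encode`, the alphabetical ordering of the categories, and the district labels are assumptions made for illustration; the paper does not specify how the values are ordered.

```python
def one_hot_encode(values):
    """Map each distinct value to a vector with a single 1 at that value's index."""
    categories = sorted(set(values))                  # Step1: list all distinct values
    index = {cat: i for i, cat in enumerate(categories)}
    vectors = []
    for v in values:                                  # Step2: one vector per value
        vec = [0] * len(categories)
        vec[index[v]] = 1
        vectors.append(vec)
    return categories, vectors

# Hypothetical enumerated attribute (the labels are illustrative only).
districts = ["north", "south", "north", "east"]
categories, encoded = one_hot_encode(districts)
```

Each encoded vector has as many positions as there are distinct values, and exactly one position is set to 1.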

D.  Data Fusion 

This section addresses the problem of data fusion,
that is, how to merge Large_Attributes,
Small_Attributes, Intermediate_Attributes, and
Enumerated_Attributes. This paper proposes a
pixel-based data fusion method: first establish a pixel
matrix; then use a fully connected neural network model
to process the pixel matrix.

1) Create a pixel matrix
This section transforms multiple attributes into a
pixel arrangement. Specifically, assume that the data
set contains $m$ samples and each sample contains $n$
attributes, that is,

All values for the 1st sample are: $x_{1,1}$, $x_{1,2}$, …,
$x_{1,j}$, …, $x_{1,n}$;

All values for the 2nd sample are: $x_{2,1}$, $x_{2,2}$, …,
$x_{2,j}$, …, $x_{2,n}$;

……

All values for the $m$-th sample are: $x_{m,1}$, $x_{m,2}$, …,
$x_{m,j}$, …, $x_{m,n}$.

Then the 1st pixel matrix is
$(x_{1,1}, x_{1,2}, \dots, x_{1,n})$;

the 2nd pixel matrix is
$(x_{2,1}, x_{2,2}, \dots, x_{2,n})$;

……

and the $m$-th pixel matrix is
$(x_{m,1}, x_{m,2}, \dots, x_{m,n})$.
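One way to read this construction is sketched below in Python: each sample's normalized attributes are laid out as one row of "pixels" with values in [0, 1]. Flattening One-Hot vectors into the same row, the function name `build_pixel_matrices`, and the sample values are assumptions made for illustration, not details given in the paper.

```python
def build_pixel_matrices(samples):
    """Turn each normalized sample into a flat row of "pixels" (values in [0, 1])."""
    pixel_matrices = []
    for sample in samples:
        row = []
        for value in sample:
            if isinstance(value, (list, tuple)):
                row.extend(float(v) for v in value)   # One-Hot vector: append every component
            else:
                row.append(float(value))              # scalar attribute: one pixel
        if any(p < 0.0 or p > 1.0 for p in row):
            raise ValueError("attributes must be normalized to [0, 1] first")
        pixel_matrices.append(row)
    return pixel_matrices

# Two hypothetical samples: three scaled attributes plus a 2-way One-Hot code.
samples = [
    [0.25, 1.0, 0.5, [1, 0]],
    [0.75, 0.0, 0.1, [0, 1]],
]
pixels = build_pixel_matrices(samples)
```

The result contains one row per sample, and every row has the same length, which is what a fully connected input layer requires.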

2) Processing pixel matrix
"Create a pixel matrix" produced one pixel matrix per
sample; a neural network is then used to process these
pixel matrices.

Choice of network structure: there are many neural
network structures, such as fully connected neural
networks, convolutional neural networks, long
short-term memory networks, and residual networks.
Because the application scenario in this paper is
simple, a fully connected neural network model is the
more appropriate choice.

Selection of activation function: The activation
function runs on each neuron and maps the neuron's
input to its output. Common activation functions are
the Sigmoid function (Figure 1), the Tanh function
(Figure 2), the ReLU function (Figure 3), and the Leaky
ReLU function (Figure 4), where Leaky ReLU is a variant
of ReLU. Regarding how to select the activation
function, Andrew Ng gives the following guidance in
“Neural Networks and Deep Learning”: ReLU is very
common in machine learning, and the activation function
generally defaults to ReLU. Leaky ReLU is generally
better than ReLU, but ReLU is more widely used. Sigmoid
is used in the output layer for binary classification
problems and rarely in other cases. Tanh is almost
always better than Sigmoid. Tanh and Sigmoid share a
disadvantage: when the magnitude of the independent
variable is large, the slope is small, which slows
gradient descent. Except in the output layer, linear
activation functions are rarely used, because a neural
network that uses only linear activation functions
produces a final result that is merely a linear
combination of the input features.

 
Figure 1. Sigmoid 

 
Figure 2. Tanh 

 
Figure 3. ReLU 

 
Figure 4. Leaky ReLU 
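The four functions in Figures 1 to 4 and the saturation effect mentioned above can be checked with a short Python sketch. The leak factor 0.01 for Leaky ReLU and the evaluation point 10 are common illustrative choices assumed here, not values from the paper.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def tanh(z):
    return math.tanh(z)

def relu(z):
    return max(0.0, z)

def leaky_relu(z, slope=0.01):
    # `slope` is the small gradient kept for negative inputs (assumed value).
    return z if z > 0 else slope * z

def numeric_slope(f, z, h=1e-6):
    """Central-difference estimate of the derivative of f at z."""
    return (f(z + h) - f(z - h)) / (2 * h)

# Far from zero, Sigmoid and Tanh are almost flat, while ReLU keeps slope 1.
slopes_at_10 = {f.__name__: numeric_slope(f, 10.0) for f in (sigmoid, tanh, relu, leaky_relu)}
```

Comparing the entries of `slopes_at_10` shows why gradient descent slows down with saturating activations: the Sigmoid and Tanh slopes are close to zero at this input, while the ReLU slopes stay at 1.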

III. NORMALIZATION OF ATTRIBUTES 

This part normalizes the attributes in the data set:
the data type of each attribute is first analyzed in
"Attribute Analysis", and the attributes are then
normalized in "Normalization".



A.  Attribute Analysis 

As mentioned in "Data sources", the data in this
paper comes from the Boston house price data provided
by Kaggle, and the amount of data selected is
relatively small. The data set contains 404 training
samples and 102 test samples, for a total of 506
samples. In this house price prediction problem, six
attributes affect the house price: transaction date
$x_1$; house age $x_2$; distance from the subway station
$x_3$; the number of convenience stores within walking
distance $x_4$; the dimension of the house $x_5$; and
the longitude of the house $x_6$. The dependent variable
is the house price $y$.

Transaction date $x_1$ is a time variable; house age
$x_2$ is a Small_Attributes; distance from the subway
station $x_3$ is a Small_Attributes; the number of
convenience stores within walking distance $x_4$ is a
Large_Attributes; the dimension of the house $x_5$ and
the longitude of the house $x_6$ are
Enumerated_Attributes.

B.  Normalization 

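Each attribute is normalized according to the type
identified in "Attribute Analysis", using the
corresponding method from "Handling of attribute
types". House age $x_2$ and distance from the subway
station $x_3$ are Small_Attributes and are normalized
by (2). The number of convenience stores within walking
distance $x_4$ is a Large_Attributes and is normalized
by (1). The dimension of the house $x_5$ and the
longitude of the house $x_6$ are Enumerated_Attributes
and are converted to One-Hot form. Transaction date
$x_1$ is a time variable.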

IV. DATA FUSION 

In this part, the data normalized in "Normalization
of attributes" is fused: first, the pixel matrix is
established in "Building a Pixel Matrix"; then the fully
connected neural network model is established in
"Building a Neural Network Model".

A.  Building a Pixel Matrix 

The pixel matrix is established by the method
described in "Data Fusion". As described in "Data
sources", the data in this paper comes from the Boston
house price data provided by Kaggle, and the data set
contains 404 training samples and 102 test samples, for
a total of 506 samples. Writing $n$ for the number of
normalized attribute values per sample, we have:

All values for the 1st sample: $x_{1,1}$, $x_{1,2}$, …, $x_{1,n}$;

All values for the 2nd sample: $x_{2,1}$, $x_{2,2}$, …, $x_{2,n}$;

……

All values for the 506th sample: $x_{506,1}$, $x_{506,2}$, …,
$x_{506,n}$.

B.  Building a Neural Network Model 

This paper builds a neural network model that maps
house attributes to house prices. The inputs are the
house attributes: transaction date $x_1$; house age
$x_2$; distance from the subway station $x_3$; the
number of convenience stores within walking distance
$x_4$; the dimension of the house $x_5$; and the
longitude of the house $x_6$. The output is the house
price $y$.

Step1: Design the network structure
Following the analysis in "Data Fusion", this paper
builds a fully connected neural network model. The
network structure is shown in Figure 5: the input layer
contains 7 input nodes; there are 5 hidden layers, each
containing 4 nodes; the output layer contains 1 output
node; all layers use the Tanh activation function
(tansig in the Appendix code). The training settings
are: training epochs: 50000; target accuracy:
$10^{-5}$; learning rate: 0.01.

 
Figure 5. Network structure 
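To make the layer sizes concrete, the following Python sketch builds a network with the shape described above (7 inputs, five hidden layers of 4 nodes, 1 output, tanh everywhere) and runs a single forward pass. The random initialisation, the seed, and the input vector are assumptions for illustration, and no training is performed.

```python
import math
import random

LAYER_SIZES = [7, 4, 4, 4, 4, 4, 1]   # input, five hidden layers, output

def init_network(layer_sizes, seed=0):
    """Random weights and zero biases; the initialisation scheme is an assumption."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers

def forward(layers, inputs):
    """Propagate one sample through every layer, applying tanh after each."""
    activations = list(inputs)
    for weights, biases in layers:
        activations = [
            math.tanh(sum(w * a for w, a in zip(row, activations)) + b)
            for row, b in zip(weights, biases)
        ]
    return activations

network = init_network(LAYER_SIZES)
n_parameters = sum(len(w) * len(w[0]) + len(b) for w, b in network)
prediction = forward(network, [0.5] * 7)   # hypothetical normalized input
```

With these layer sizes the network has 117 trainable parameters. Because the output layer also uses tanh, the raw output lies in (-1, 1), so the targets have to be scaled into that range before training and scaled back afterwards.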

Step2: Selection of training tools
There are many tools for training neural networks,
such as TensorFlow, Caffe, MXNet, Torch, and Theano,
and nntool in Matlab. nntool is a network model
training tool that is easy to deploy and needs little
environment setup. In this paper, the neural network
model shown in Figure 5 is trained with nntool
(Figure 6).

 
Figure 6. Nntool 



 
Step3: Code design 
See Appendix 

Step4: Training process
Part of the neural network training process in Matlab
is shown in Figure 7. The Performance plot is shown in
Figure 8, the Training State plot in Figure 9, and the
Regression plot in Figure 10.

 
Figure 7. Training process 

 
Figure 8. Performance 

 
Figure 9. Training State 

 
Figure 10. Regression 

Step5: Results
The results of the neural network model are presented
in two parts: a partial display of the results, shown
in Figure 11, and the error graph, shown in Figure 12.
As can be seen from Figure 10, the accuracy of the
network model is 97.87%.

 

 
Figure 11. Result 

 
Figure 12. Error_graph

V. SUMMARY 

This paper established a neural network model that
maps house attributes to house prices. The inputs are
the attributes of the commercial house: transaction
date $x_1$; house age $x_2$; distance from the subway
station $x_3$; the number of convenience stores within
walking distance $x_4$; the dimension of the house
$x_5$; and the longitude of the house $x_6$. The output
is the house price $y$. Once the neural network model
has been established, the corresponding house price can
be obtained by entering the six attributes of a
commercial house into the model. The accuracy of the
network model is 97.87%.

VI. APPENDIX 

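% p, t: training inputs and targets; ptest: test inputs.
% premnmx, tramnmx, postmnmx and this newff syntax are legacy
% Neural Network Toolbox functions.

% Normalize the training inputs and targets to [-1, 1]
[pn,minp,maxp,tn,mint,maxt]=premnmx(p,t);

% Five hidden layers with 4 nodes each, and 1 output node
NodeNum1=4;
NodeNum2=4;
NodeNum3=4;
NodeNum4=4;
NodeNum5=4;
TypeNum=1;

% tansig (hyperbolic tangent) activation for every layer
TF1='tansig';
TF2='tansig';
TF3='tansig';
TF4='tansig';
TF5='tansig';
TF6='tansig';

% traingdx: gradient descent with momentum and adaptive
% learning rate (traingdm is an alternative)
net=newff(minmax(pn),[NodeNum1,NodeNum2,NodeNum3,NodeNum4,NodeNum5,TypeNum], ...
    {TF1 TF2 TF3 TF4 TF5 TF6},'traingdx');

net.trainParam.show=50;       % report progress every 50 epochs
net.trainParam.epochs=50000;  % maximum number of epochs
net.trainParam.goal=1e-5;     % performance goal
net.trainParam.lr=0.01;       % learning rate

net=train(net,pn,tn);

% Normalize the test inputs with the training ranges, predict,
% and map the predictions back to the original price scale
p2n=tramnmx(ptest,minp,maxp);
an=sim(net,p2n);
a=postmnmx(an,mint,maxt)      % no semicolon: prints the predictions

% Plot the targets (o) and the predictions (+)
plot(1:length(t),t,'o',1:length(t)+1,a,'+');
title('o: actual value, +: predicted value')
grid on

% Pad t with the last prediction so that it matches the length
% of a (this assumes length(a) == length(t)+1)
m=length(a);
t1=[t,a(m)];
err=t1-a;     % renamed from "error" to avoid shadowing the built-in

figure
plot(1:length(err),err,'-.')
title('error_graph')
grid on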
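The normalization round trip in the listing above (premnmx, tramnmx, postmnmx) can be sketched in Python as follows, assuming the min-max mapping to [-1, 1] that those legacy Matlab functions perform. The function names and the price values are hypothetical, and the sketch handles a single column rather than a matrix.

```python
def scale_to_unit_range(values):
    """Map values linearly so that the minimum becomes -1 and the maximum becomes +1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot scale a constant column")
    scaled = [2.0 * (v - lo) / (hi - lo) - 1.0 for v in values]
    return scaled, lo, hi

def apply_scaling(values, lo, hi):
    """Scale new data (for example the test set) with ranges learned from the training set."""
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in values]

def unscale(scaled, lo, hi):
    """Invert the mapping to recover values on the original scale."""
    return [0.5 * (s + 1.0) * (hi - lo) + lo for s in scaled]

# Hypothetical training prices and one test price.
train_prices = [10.0, 25.0, 40.0]
scaled_prices, p_min, p_max = scale_to_unit_range(train_prices)
scaled_test = apply_scaling([55.0], p_min, p_max)
restored = unscale(scaled_prices, p_min, p_max)
```

A test value outside the training range maps outside [-1, 1], as `scaled_test` shows, which matters here because a tanh output node cannot produce values beyond that range.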



REFERENCES 

[1] Lee W C, Cheong T S, Wu Y. The Impacts of Financial Development, 
Urbanization, and Globalization on Income Inequality: A Regression-
Based Decomposition Approach [J]. SSRN Electronic Journal, 2017. 

[2] Tan Paul. House prices have been stagnant [J]. Journalist observation, 
2019 (4). 

[3] Gogtay N J, Deshpande S P, Thatte U M. Principles of Regression 
Analysis [J]. The Journal of the Association of Physicians of India, 
2017, 65(4):48-52. 

[4] Gooch J W. Stepwise Regression [J]. Encyclopedic Dictionary of 
Polymers, 2011. 

[5] Bleiholder J, Naumann F. Data fusion [J]. ACM Computing Surveys, 
2008, 41(1):1-41. 

[6] Han Zhonggeng. Mathematical model for comprehensive evaluation 
and prediction of Yangtze River water quality [J]. Journal of 
Engineering Mathematics (7): 69-79. 

[7] Shuntaro Okada, Masayuki Ohzeki, Shinichiro Taguchi. Efficient 
partition of integer optimization problems with one-hot encoding[J]. 
Scientific Reports, 2019, 9(1). 

[8] Wang Zhaoqing, Lu Xiaoyang. A Macro Element Method for Solving 
Potential Problems Based on Mean Value Interpolation [J]. Journal on 
Numerical Methods and Computer Applications (3): 21-29. 

[9] Hershey S, Chaudhuri S, Ellis D P W, et al. CNN architectures for 
large-scale audio classification[C]// 2017.