May 21, 2010 21:20 chaos˙v2

Fuzzy weather forecast in forecasting pollution concentrations

D. Domańska
Department of Modelling and Computer Graphics, Institute of Informatics, University of Silesia,
Będzińska 39, Sosnowiec, 41-200, Poland
ddomanska@poczta.onet.pl

M. Wojtylak
Institute of Meteorology and Water Management (IMGW), Bratków 10,
Katowice, 40-045, Poland
monitoring.katowice@imgw.pl

In this paper we analyse fuzzy weather forecasts, which are computed in our system
and used to forecast pollution concentrations. The system works on real data: weather forecasts,
meteorological situations and pollution concentrations. We compare the defuzzified fuzzy
weather forecast with the weather forecast from the Institute of Meteorology and Water Management.
This analysis allows us to investigate the effectiveness of forecasting pollution
concentrations, to put in order the dependences between particular attributes describing the weather
forecast, and to justify the use of fuzzy numbers in air pollution forecasting.
The model is created for data measured and forecast in Poland. It is therefore tested on
real data sets, and the results are obtained in a working system.

Keywords: Fuzzy system models, Fuzzy numbers, Fuzzy matrix, Fuzzy weather forecast, Air
pollution forecasting

1. Introduction

Weather forecasting has been steadily improved over the centuries. In recent years many prediction
approaches have emerged, such as statistical [1], fuzzy [2; 3], neural network [4; 5] and neuro-fuzzy [6]
predictors. Research into the forecasting of air pollution concentrations using numerical short-term
weather prediction has also begun [7; 8]. This task is very difficult because, apart from the meteorological
conditions, the immission of air pollution depends first of all on the emission. Fuzzy logic [9; 10; 11] is an
established methodology that is widely used to model systems in which variables are continuous, imprecise,
or ambiguous. It appears in many mathematical forecasting models and is usually used when the
information transferred to the model is imprecise or incomplete [12; 13]. Many everyday phenomena of an
ambiguous, continuous and imprecise nature may be effectively described using this theory.

The main problem is the lack of precise knowledge: we do not know precisely what the weather will be in
the future. We only have a numerical forecast, i.e. conditions which may correspond to many similar
meteorological situations.

The result of running the Air Pollution Forecasting Model (APFM) is a forecast of air pollution
concentrations, among others PM10, for the next day. PM10 was chosen specifically because it has a huge
influence on human life [14; 15].

In each stage we use meteorological data together with a mathematical apparatus [16; 17]. In particular, in
APFM we use the weather forecasts derived from the Consortium for Small Scale Modelling (COSMO)
model, which is based on the Local-Model (LM) of the Deutscher Wetterdienst (DWD).

The objects appearing in the paper are vectors and matrices. In the vector space $\mathbb{R}^d$ we use (1) as
the distance between vectors.

$$ d^{d}_{k}(x, y) = \left( \sum_{i=1}^{d} |x_i - y_i|^{k} \right)^{\frac{1}{k}}, \quad x, y \in \mathbb{R}^d,\ k > 0. \tag{1} $$

For $k \geq 1$, $k \in \mathbb{N}$, the function (1) is a metric.
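The distance between matrix objects is a composition of distances between vector objects. For the
distance between matrices we use (2).

$$ d^{n \times m}_{k_1 k_2}(A, B) = d^{n}_{k_1}\left(\left[d^{m}_{k_2}(a_{i\cdot}, b_{i\cdot})\right], 0_n\right), \quad \text{for } A = [a_{ij}],\ B = [b_{ij}],\ A, B \in \mathbb{R}^{n \times m}, \tag{2} $$

where $0_n$ is the zero vector and $a_{i\cdot}$, $b_{i\cdot}$ are the i-th rows of the matrices A and B.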
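
The two distances can be sketched in a few lines of Python. This is only an illustration of formulas (1) and (2) as written above; the function names `vector_distance` and `matrix_distance` are ours and do not come from the APFM system.

```python
def vector_distance(x, y, k):
    """Distance (1): (sum_i |x_i - y_i|^k)^(1/k) for k > 0."""
    if k <= 0:
        raise ValueError("k must be positive")
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return sum(abs(a - b) ** k for a, b in zip(x, y)) ** (1.0 / k)


def matrix_distance(A, B, k1, k2):
    """Distance (2): row-wise distances with exponent k2,
    combined into one number with exponent k1."""
    if len(A) != len(B):
        raise ValueError("matrices must have the same number of rows")
    row_distances = [vector_distance(a, b, k2) for a, b in zip(A, B)]
    zero = [0.0] * len(row_distances)  # the zero vector 0_n from (2)
    return vector_distance(row_distances, zero, k1)


if __name__ == "__main__":
    print(vector_distance([0.0, 0.0], [3.0, 4.0], 2))       # Euclidean case
    print(matrix_distance([[0.0, 0.0], [0.0, 0.0]],
                          [[3.0, 4.0], [0.0, 0.0]], 1, 2))
```

For k in (0, 1) the same code gives the fractional distance, which is no longer a metric because the triangle inequality can fail.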

First we introduce the time horizon set $T$ for which the forecast will be calculated:
$T = \{t = i \cdot \Delta t : i = 0, \dots, n_T\}$, $\Delta t > 0$, where $\Delta t$ is the time step (usually $\Delta t = 1$ hour).
We identify the first term of $T$ ($t = 0$) with hour 0 UTC. We assume that for each term from $T$ we have the
values of $d_f$ parameters of a numerical weather forecast (e.g. temperature, sea level pressure, wind direction
and speed, and high, medium and low cloud cover). By a weather forecast we understand a matrix
$F \in \mathbb{R}^{(n_T+1) \times d_f}$. Moreover, we assume that we possess data from many years for every term
$t \in T$. Every term $t$ describes the state of the atmosphere by means of $d_s$ parameters measured near
the surface (e.g. temperature, wind direction) and the values of the concentrations that we are forecasting.
The set of meteorological data for the subsequent terms $t \in T$ defines the meteorological situation, which
is represented by a matrix $S \in \mathbb{R}^{(n_T+1) \times d_s}$. The aerosanitary situation is the
sequence of concentrations at the terms $t \in T$, so it is a time series belonging to $\mathbb{R}^{n_T+1}$. For
the model to function properly it is essential to have all the historical data. Let us denote the set of
weather forecasts by $WF$, the set of meteorological situations by $MS$, and the set of pollution concentrations
by $AS$.

In the first stage, because of the large range of the data, we start with min-max normalisation of every
weather forecast, in every column separately. Let us define:
$f^* \in WF$: the chosen weather forecast for which we are calculating the forecast of pollution concentrations.
$k = k_1 = k_2$: the first parameter used to control the APFM system; it determines the dispersion between
elements of the set $WF$.
$\Theta$: the set of real numbers representing the distances between every normalised weather forecast
from $WF$ and $f^*$, so $\Theta = \{\omega_1, \dots, \omega_q\}$, $q = |WF| - 1$, where $\omega_i = d^{(n_T+1) \times d_f}_{k}(f_i, f^*)$ and $f_i \in WF$.
Here $|A|$ denotes the cardinality of the set $A$.
Fractional distance: the distance (1) for $k_1, k_2 \in (0, 1)$.

In the next step we define the second parameter $\varepsilon$, which determines the number of similar
elements taken from the set $WF$. The parameter $\varepsilon$ is determined by (3).

$$ \forall_{i=1,\dots,q} \quad \omega_i < \varepsilon, \tag{3} $$

where $\omega_i \in \Theta$ for $i \in \{1, \dots, q\}$ and $i$ is the number of an element. The result of the first stage is the set $\varepsilon$–$WF(f^*)$.
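
The first stage can be sketched as follows. This is a minimal illustration under stated assumptions: condition (3) is read as a selection rule (keep the forecasts whose distance to the chosen forecast is below the threshold), constant columns are mapped to 0 during normalisation, and `archive` holds the forecasts other than the chosen one. The function names are hypothetical.

```python
def min_max_normalise(F):
    """Scale every column of the forecast matrix F to [0, 1].
    Assumption: a constant column is mapped to 0."""
    columns = list(zip(*F))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in F
    ]


def vector_distance(x, y, k):
    """Distance (1)."""
    return sum(abs(a - b) ** k for a, b in zip(x, y)) ** (1.0 / k)


def matrix_distance(A, B, k):
    """Distance (2) with k1 = k2 = k."""
    rows = [vector_distance(a, b, k) for a, b in zip(A, B)]
    return vector_distance(rows, [0.0] * len(rows), k)


def epsilon_similar(target, archive, k, eps):
    """Return (index, omega) for archive forecasts with omega < eps."""
    f_star = min_max_normalise(target)
    selected = []
    for i, forecast in enumerate(archive):
        omega = matrix_distance(min_max_normalise(forecast), f_star, k)
        if omega < eps:
            selected.append((i, omega))
    return selected


if __name__ == "__main__":
    target = [[0.0, 10.0], [1.0, 20.0]]
    archive = [[[5.0, 100.0], [7.0, 300.0]], [[1.0, 0.0], [0.0, 1.0]]]
    print(epsilon_similar(target, archive, k=1, eps=0.5))
```

A smaller threshold keeps fewer forecasts, so it controls the cardinality of the selected set, as described above.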

In the second stage, using the results of the first stage, we create the subset $\varepsilon$–$MS_F \subset MS$.
Every weather forecast is related to a meteorological situation by date. Therefore for $\varepsilon$–$MS_F$ we consider
pairs $(f, s)$, $f \in WF$, $s \in MS$. Then we set the parameters describing the meteorological situations and the
time horizon. After reviewing the chosen meteorological situations, we get a sequence of values:

$$ \forall_{t \in T}\ \forall_{i=1,\dots,d_s} \quad \left(\xi^{(1)}_{t,i}, \dots, \xi^{(m)}_{t,i}\right), $$

where

$$ m = |\varepsilon\text{–}MS_F|. \tag{4} $$

We transform this sequence into a fuzzy number of the special form given by (5). For
each attribute i and each hour t ∈ T we have an individual fuzzy number, which approximates
the Gaussian function. The form (5) was chosen based on our own calculations and on
paper [18].

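$$ \mu_{i,t}(x) = \begin{cases} \exp\left(\dfrac{-(x - m_{1_{i,t}})^2}{2 \sigma^2_{1_{i,t}}}\right) & \text{if } x \leq m_{1_{i,t}}, \\ 1 & \text{if } x \in (m_{1_{i,t}}, m_{2_{i,t}}), \\ \exp\left(\dfrac{-(x - m_{2_{i,t}})^2}{2 \sigma^2_{2_{i,t}}}\right) & \text{if } x \geq m_{2_{i,t}}, \end{cases} \tag{5} $$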

where $m_{1_{i,t}} \leq m_{2_{i,t}}$, $\sigma_{1_{i,t}} > 0$, $\sigma_{2_{i,t}} > 0$ for $m_{1_{i,t}}, m_{2_{i,t}}, \sigma_{1_{i,t}}, \sigma_{2_{i,t}} \in \mathbb{R}$, for each attribute i in each hour
t ∈ T. An individual fuzzy weather forecast consists of a time series of fuzzy numbers (5). We obtain the fuzzy
weather forecast (6), which is the equivalent of the real weather.

$$ \varphi^* = [\mu_{i,t}] \quad \text{for } 1 \leq i \leq d_s,\ t \in T, \tag{6} $$

where i is the number of an attribute and t is an hour. $\varphi^*$ is a function for which we determine the membership
matrix composed of fuzzy numbers, $\varphi^*_{it} : \mathbb{R}^{d_s} \to [0, 1]$. We can substitute actual values into $\varphi^*$. We obtain
$\varphi^*(s) = [\mu_{it}(s_{it})]$, which is the membership matrix, $s \in \mathbb{R}^{(n_T+1) \times d_s}$. A time series is a sequence of regularly
sampled quantities of an observed system.
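
The membership function (5) and the membership matrix obtained from (6) can be sketched as below. How the parameters of each fuzzy number are estimated from the sequence of values is not specified here, so the sketch simply takes them as input; the names `membership` and `membership_matrix` and the `params[t][i]` layout are illustrative assumptions.

```python
import math


def membership(x, m1, m2, sigma1, sigma2):
    """Fuzzy number (5): Gaussian flanks around a plateau (m1, m2)."""
    if not (m1 <= m2 and sigma1 > 0 and sigma2 > 0):
        raise ValueError("need m1 <= m2, sigma1 > 0 and sigma2 > 0")
    if x <= m1:
        return math.exp(-((x - m1) ** 2) / (2.0 * sigma1 ** 2))
    if x >= m2:
        return math.exp(-((x - m2) ** 2) / (2.0 * sigma2 ** 2))
    return 1.0


def membership_matrix(params, s):
    """Evaluate the fuzzy forecast on a meteorological situation s.

    params[t][i] = (m1, m2, sigma1, sigma2) for hour t and attribute i
    (assumed layout), s[t][i] = measured value of attribute i at hour t.
    """
    return [
        [membership(x, *p) for p, x in zip(param_row, s_row)]
        for param_row, s_row in zip(params, s)
    ]


if __name__ == "__main__":
    params = [[(1.0, 2.0, 1.0, 1.0), (0.0, 0.0, 0.5, 0.5)]]
    situation = [[1.5, 0.5]]
    print(membership_matrix(params, situation))
```

A value inside the plateau gets grade 1, and the grade decays on each side at a rate set by the corresponding sigma.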
A fuzzy weather forecast must make it possible to define clearly and precisely the quality of a weather forecast
and to assign a meteorological situation to a fuzzy weather forecast [13]. Assigning a quality coefficient
to a weather forecast is the starting point for the exact determination of the individual influence
of an attribute on forecasting pollution concentrations in the future.

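In the third stage we review all meteorological situations $s \in MS \subset \mathbb{R}^{(n_T+1) \times d_s}$. Then, for every
meteorological situation s we calculate $\varphi^*(s)$ and the number $\varrho(s) = |\varphi^*(s)|$ using formula (7).

$$ |\varphi^*(s)| = d^{\,n_T+1}_{k}\left(\left[d^{\,d_s}_{k}\left(\varphi^*_{\cdot t}(s_{\cdot t}), 1_{d_s}\right)\right]_{t \in T}, 0_{n_T+1}\right), \quad s \in MS,\ k \in [0, 1], \tag{7} $$

where $1_{d_s}$ is a vector of ones and $0_{n_T+1}$ is a zero vector.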
Let us fix $\eta \in [0, 1]$ and determine the set $\eta$–$MS \subset MS$ with $|\eta$–$MS| = r$ such that $\varrho(s) < \eta$ for $s \in \eta$–$MS$.
For the subset $\eta$–$MS$ we consider pairs $(s, p)$, $p \in AS$. Then we fix the weights of the meteorological situations
using the formula $w(s) = 1 - \varrho(s)$, $s \in \eta$–$MS$.

In the fourth stage we choose r time series from the set
$AS$, where $r = |\eta$–$MS| > 1$. For every chosen time series we get a function $p^{(j)} : T \to \mathbb{R}^+_0$, $j = 1, \dots, r$,
with weight $w^{(j)} \in \mathbb{R}$, representing pollution concentrations. For each $t \in T$ we create the sequence
$(p^{(1)}(t), \dots, p^{(r)}(t))$.
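
A sketch of the third stage, assuming the membership matrix of each situation has already been computed and is stored with one row per hour. It follows formula (7) literally, so no extra normalisation is applied; whether the resulting number always falls in [0, 1] depends on k and on the data. The function names are hypothetical.

```python
def vector_distance(x, y, k):
    """Distance (1) for k > 0."""
    return sum(abs(a - b) ** k for a, b in zip(x, y)) ** (1.0 / k)


def rho(membership_rows, k):
    """Formula (7): distance of every hourly membership row to the
    vector of ones, then the distance of those values to zero."""
    per_hour = [
        vector_distance(row, [1.0] * len(row), k) for row in membership_rows
    ]
    return vector_distance(per_hour, [0.0] * len(per_hour), k)


def select_and_weight(rhos, eta):
    """Keep situations with rho < eta and give them weight 1 - rho.
    Returns a dict {situation index: weight}."""
    return {j: 1.0 - r for j, r in enumerate(rhos) if r < eta}


if __name__ == "__main__":
    perfect = [[1.0, 1.0], [1.0, 1.0]]   # full membership everywhere
    partial = [[1.0, 0.5], [0.75, 1.0]]
    rhos = [rho(perfect, 1), rho(partial, 1)]
    print(rhos)
    print(select_and_weight(rhos, eta=0.5))
```

A situation that matches the fuzzy forecast perfectly has distance 0 and therefore the largest weight, 1.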

We then carry out an aggregation process on these sequences to obtain one time series. We
have used, for example, the αβ-aggregation method (9). This and other methods are described in detail
in paper [17]. They are based on the well-known methods (8).

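$$ \forall_{t \in T} \quad u_{a,t} = \frac{\sum_{i=1}^{r} w^{(i)} p^{(i)}(t)}{\sum_{i=1}^{r} w^{(i)}}, \qquad \forall_{t \in T} \quad u_{m,t} = \max_{i=1,\dots,r} \{p^{(i)}(t)\}, \tag{8} $$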

where a means average aggregation, m means maximum aggregation.
For each $t \in T$ let $u_{a,t}$ and $u_{m,t}$ denote the time series obtained from the
methods (8), and $u_{r,t}$ the time series of the actually measured data. For $l \approx \frac{n_T}{4}$ we determine
two numbers α, β based on knowledge about the actual aerosanitary situation. We forecast pollution concentrations
with partial knowledge, namely the real pollution concentrations at the terms $t = 0, \dots, l$. The parameters
α, β are obtained by minimising the function (9); their optimal values are given by
formulas (10), (11).

$$ h(\alpha, \beta) = \sum_{t=0}^{l} (\alpha u_{a,t} + \beta u_{m,t} - u_{r,t})^2. \tag{9} $$

Minimising (9) with respect to α and β gives

$$ \alpha = \frac{\sum_{t=0}^{l} u_{a,t} u_{r,t} \sum_{t=0}^{l} u^2_{m,t} - \sum_{t=0}^{l} u_{m,t} u_{r,t} \sum_{t=0}^{l} u_{a,t} u_{m,t}}{\sum_{t=0}^{l} u^2_{m,t} \sum_{t=0}^{l} u^2_{a,t} - \left(\sum_{t=0}^{l} u_{m,t} u_{a,t}\right)^2}, \tag{10} $$

$$ \beta = \frac{\sum_{t=0}^{l} u_{m,t} u_{r,t} \sum_{t=0}^{l} u^2_{a,t} - \sum_{t=0}^{l} u_{m,t} u_{a,t} \sum_{t=0}^{l} u_{a,t} u_{r,t}}{\sum_{t=0}^{l} u^2_{m,t} \sum_{t=0}^{l} u^2_{a,t} - \left(\sum_{t=0}^{l} u_{m,t} u_{a,t}\right)^2}, \tag{11} $$

both under the condition that $\sum_{t=0}^{l} u^2_{m,t} \sum_{t=0}^{l} u^2_{a,t} - \left(\sum_{t=0}^{l} u_{m,t} u_{a,t}\right)^2 \neq 0$.

Fig. 1. Fuzzy weather forecast for wind speed attribute on a) 9 January 2006 b) 10 January 2006.

Fig. 2. Fuzzy weather forecast for temperature attribute on a) 9 January 2006 b) 10 January 2006.

Then, using method (9) and collateral information, i.e. the real pollution concentrations from the first
ten hours of the day being forecast, we can calculate the final time series $u_{f,t}$ for each $t \in T$.
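
The aggregation step can be sketched as follows. The two aggregations follow (8) and the least-squares fit follows (10) and (11). The last function assumes that the final series is the fitted combination of the average and maximum series, which the text does not state explicitly; all function names are illustrative.

```python
def average_aggregation(series, weights):
    """u_a from (8): weighted mean of the r series at every term t."""
    total = sum(weights)
    horizon = len(series[0])
    return [
        sum(w * p[t] for w, p in zip(weights, series)) / total
        for t in range(horizon)
    ]


def maximum_aggregation(series):
    """u_m from (8): maximum of the r series at every term t."""
    return [max(values) for values in zip(*series)]


def fit_alpha_beta(u_a, u_m, u_r, l):
    """Minimise (9) over t = 0..l using the closed forms (10), (11)."""
    ts = range(l + 1)
    s_aa = sum(u_a[t] ** 2 for t in ts)
    s_mm = sum(u_m[t] ** 2 for t in ts)
    s_am = sum(u_a[t] * u_m[t] for t in ts)
    s_ar = sum(u_a[t] * u_r[t] for t in ts)
    s_mr = sum(u_m[t] * u_r[t] for t in ts)
    det = s_mm * s_aa - s_am ** 2
    if det == 0:
        raise ValueError("condition of (10), (11) violated: zero determinant")
    alpha = (s_ar * s_mm - s_mr * s_am) / det
    beta = (s_mr * s_aa - s_am * s_ar) / det
    return alpha, beta


def final_series(u_a, u_m, alpha, beta):
    """Assumption: u_f,t = alpha * u_a,t + beta * u_m,t for every t."""
    return [alpha * a + beta * m for a, m in zip(u_a, u_m)]


if __name__ == "__main__":
    u_a = [1.0, 2.0, 3.0, 4.0]
    u_m = [2.0, 1.0, 4.0, 3.0]
    u_r = [4.5, 3.0, 9.5, 8.0]  # synthetic "measured" values
    alpha, beta = fit_alpha_beta(u_a, u_m, u_r, l=3)
    print(alpha, beta)
    print(final_series(u_a, u_m, alpha, beta))
```

In practice only the first measured terms of the forecast day would be passed as `u_r`, with the fitted parameters then applied over the whole horizon.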

2. Characteristics of fuzzy weather forecast

A fuzzy weather forecast φ∗ is determined for each attribute i individually and is evenly distributed over
the hours t ∈ T. It is evaluated on the basis of data similarity and the proper classification weights. We studied
the behaviour of the fuzzy weather forecasts using different sets of forecast data. This is necessary because
we have weather forecasts from only a short period of time (six years). Moreover, the COSMO LM model
[19], [20] is under continuous development, so its forecasts are not homogeneous, which makes it difficult to
find the period of weather forecasts that best estimates real meteorological situations. Figs 1, 2, 3 show
fuzzy weather forecasts along with real meteorological situations. Fuzziness is a good measure of
the quality of a weather forecast, both of its elements and of the whole forecast, because fuzziness
characterises the scattering of real data around the prognosis.

3. Features to estimate the quality of a fuzzy weather forecast

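The first feature is the volume (the area under the membership function), examined for every attribute i in
each hour t ∈ T. We obtain the first number $F^{i,t}$:

$$ F^{i,t} = \int_{-\infty}^{\infty} \mu_{i,t}(x)\, dx = \frac{\sqrt{2\pi}}{2} (\sigma_{1_{i,t}} + \sigma_{2_{i,t}}) + m_{2_{i,t}} - m_{1_{i,t}}, \tag{12} $$

where $i \in \{1, \dots, d_s\}$, $t \in \{0, \dots, n_T\}$, $\sigma_{1_{i,t}}, \sigma_{2_{i,t}}, m_{1_{i,t}}, m_{2_{i,t}} \in \mathbb{R}^{(n_T+1) \times d_s}$.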
Let us define $f_i(t) = F^{i,t}$ for each t ∈ T. Figs 4, 5, 6 show the functions $f_i(t)$ for all t ∈ T for the
chosen attributes of real meteorological situations: wind speed, temperature and humidity.
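
As a quick check of the closed form (12), the area can be compared with a numerical integral of the membership function (5). The sketch below uses a plain midpoint rule on a truncated interval; the truncation width, the step count and the function names are arbitrary choices made for illustration.

```python
import math


def membership(x, m1, m2, sigma1, sigma2):
    """Fuzzy number (5)."""
    if x <= m1:
        return math.exp(-((x - m1) ** 2) / (2.0 * sigma1 ** 2))
    if x >= m2:
        return math.exp(-((x - m2) ** 2) / (2.0 * sigma2 ** 2))
    return 1.0


def area_closed_form(m1, m2, sigma1, sigma2):
    """Formula (12): two half-Gaussian areas plus the plateau width."""
    return math.sqrt(2.0 * math.pi) / 2.0 * (sigma1 + sigma2) + m2 - m1


def area_numeric(m1, m2, sigma1, sigma2, steps=200_000):
    """Midpoint rule on [m1 - 12*sigma1, m2 + 12*sigma2]; the tails
    outside this interval are negligible."""
    lo = m1 - 12.0 * sigma1
    hi = m2 + 12.0 * sigma2
    h = (hi - lo) / steps
    return h * sum(
        membership(lo + (j + 0.5) * h, m1, m2, sigma1, sigma2)
        for j in range(steps)
    )


if __name__ == "__main__":
    args = (1.0, 3.0, 0.5, 2.0)
    print(area_closed_form(*args), area_numeric(*args))
```

The two values agree to several decimal places, and the closed form makes it clear that the area grows with both the spreads and the plateau width.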


Fig. 3. Fuzzy weather forecast for humidity attribute on a) 9 January 2006 b) 10 January 2006.

Fig. 4. fi for wind speed attribute on a) 9 January 2006 b) 10 January 2006.

Fig. 5. fi for temperature on a) 9 January 2006 b) 10 January 2006.

Fig. 7 shows fi for all attributes.
The second feature is the quality of a fuzzy weather forecast assessed by comparing it with a real
meteorological situation. In this way each attribute is characterised by a grade of membership for each
hour. Figs 8, 9 show the grades of membership for wind speed, temperature and humidity.

Figs 1, 2, 3 show the fuzzy weather forecasts explicitly, and it can be seen that
the fuzzy weather forecast has little fuzziness when the grade of membership is large. Analysis of
other examples shows a fairly significant inverse dependence between fuzziness and membership:
the greater the fuzziness, the lower the verifiability of the pollution
concentration forecast.

Fig. 6. fi for humidity attribute on a) 9 January 2006 b) 10 January 2006.

Fig. 7. fi for all attributes on a) 9 January 2006 b) 10 January 2006.

Fig. 8. The grade of membership µi,t for wind speed attribute and T = 72 on a) 9 January 2006 b) 10 January 2006.

Fig. 9. The grade of membership µi,t for a) temperature attribute b) humidity attribute and T = 72 on 9 January 2006.

4. Verifiability of the weather forecast

Since the key to pollution concentration forecasting is reliable information coming
from the weather forecast, we need to separate the forecasts that were fulfilled from
the forecasts that were not. To do this we use the fuzzy weather forecast, weather forecasts and
meteorological situations. Let us introduce the following defuzzification method for µi,t(x):

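$$ \varphi_{i,t} = \frac{\int_{-\infty}^{\infty} \mu_{i,t}(x) \cdot x \, dx}{\int_{-\infty}^{\infty} \mu_{i,t}(x) \, dx} = \frac{\frac{1}{2}\left(m^2_{2_{i,t}} - m^2_{1_{i,t}}\right) + \sigma^2_{2_{i,t}} - \sigma^2_{1_{i,t}} + \frac{\sqrt{2\pi}}{2}\left(m_{1_{i,t}} \sigma_{1_{i,t}} + m_{2_{i,t}} \sigma_{2_{i,t}}\right)}{F^{i,t}} \tag{13} $$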

Each fuzzy number µi,t of the fuzzy weather forecast is defuzzified according to equation (13), which gives
a real value φi,t for every hour t ∈ T and for every attribute i. The verification of the weather forecast is
based on the similarity of the following data:

(1) the values of the attributes describing the weather forecast to the values of the attributes of the
meteorological situations

(2) φi,t to the values of the attributes describing the weather forecast
(3) φi,t to the values of the attributes of the meteorological situations

Table 1. Results of the tests for the temperature attribute.

Date               Mfi,si    Mφi,t,fi    Mφi,t,si
1 January 2006     0.363     0.562       0.082
5 January 2006     0.525     0.424       0.125

Table 2. Results of the tests for the wind speed attribute.

Date               Mfi,si    Mφi,t,fi    Mφi,t,si
1 January 2006     0.069     0.076       0.018
5 January 2006     0.133     0.172       0.046
10 January 2006    0.053     0.050       0.066

We use the Mean Absolute Error [22] to estimate the verifiability of the weather forecast. For each attribute
i the Mean Absolute Error between the fuzzy weather forecast and the weather forecast is given by:

$$ M_{\varphi_{i,t}, f_i} = \frac{1}{n_T + 1} \sum_{t=0}^{n_T} |\varphi_{i,t} - f_i|, \tag{14} $$

where $f \in WF$ and $|x|$ is the absolute value of x.
The differences between meteorological situations and the fuzzy weather forecast are treated analogously. We
denote the Mean Absolute Error in this case by $M_{\varphi_{i,t}, s_i}$, $s \in MS$, and for the differences between
meteorological situations and weather forecasts we use the notation $M_{f_i, s_i}$, $s \in MS$, $f \in WF$, for the Mean Absolute
Error.
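
The three error measures are the same computation applied to different pairs of hourly series. A minimal sketch follows; the numbers in the usage part are invented purely to show the call pattern and are not taken from the tables.

```python
def mean_absolute_error(a, b):
    """Formula (14): mean of |a_t - b_t| over the terms t = 0..n_T."""
    if len(a) != len(b) or not a:
        raise ValueError("series must be non-empty and of equal length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    # Hypothetical hourly values of one attribute (illustration only).
    forecast = [1.0, 2.0, 3.0]      # weather forecast f_i
    situation = [1.5, 2.0, 2.0]     # measured situation s_i
    defuzzified = [1.4, 2.1, 2.2]   # defuzzified fuzzy forecast
    print(mean_absolute_error(forecast, situation))
    print(mean_absolute_error(defuzzified, forecast))
    print(mean_absolute_error(defuzzified, situation))
```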

Tabs 1, 2 contain the average differences obtained between the weather forecast attributes and real
situations, between the weather forecast and φi,t, and between real situations and φi,t.

First, we can see in Tab. 1 that for greater fuzziness the differences between weather forecasts and
meteorological situations, and between the fuzzy weather forecast and the weather forecast, are greater than
the differences between meteorological situations and the fuzzy weather forecast. Second, we see a possibility
of improving the weather forecast. The same holds for the wind speed attribute in Tab. 2; however, small
fuzziness gives us only a minor possibility of improving the weather forecast.

5. Conclusion

Computations were performed for weather forecasts from 2003-2007, meteorological situations from 1997-2007
and pollution concentrations from 1998-2007 with ∆t = 1. We have df = 28 attributes describing weather
forecasts, while the number of attributes describing meteorological situations was ds = 9. The attributes
describing meteorological situations were chosen based on the investigations in [21]. The suggested method
was presented for data from the COSMO LM model, but the same method
can be used for other weather forecasts based on numerical models. The condition which has to be met
is the availability of real meteorological data.

In this paper we have presented an analysis of the fuzzy weather forecast and specified a
method which allows us to improve the calculated weather forecast.

Figs 1, 2, 3 show the fuzzy weather forecast and Figs 8, 9 the particular grades of membership.
The experiments performed have shown that the smaller the fuzziness, the greater the membership of the
real situations.

In Tabs 1, 2 we see that the smallest difference between the obtained attributes occurs for φi,t and the
real situation, which allows us to improve the quality of the weather forecast.

References

[1] G. Rigatos, Q. Zhang, Fuzzy model validation using the local statistical approach, Fuzzy Sets and
Systems 160(7) (2009) 882-904.



[2] C. Lee, A. Liu and W. Chen, Pattern Discovery of Fuzzy Time Series for Financial Prediction, IEEE
Trans. Knowl. Data Eng 18(5) (2006) 613-625.
[3] K. Huarng, Heuristic Models of Fuzzy Time Series for Forecasting, Fuzzy Sets and Systems 123(3)
(2001) 369-386.
[4] W. Jiang and P. Wang, Research on Interval Prediction of Nonlinear Chaotic Time Series Based on
New Neural Networks, in: Proc. 6th World Congress Intell. Control and Automation, 2006, pp. 2835-2839.
[5] Ajit Kumar Gautam, A.B. Chelani, V.K. Jain, S. Devotta, A new scheme to predict chaotic time series
of air pollutant concentrations using artificial neural network and nearest neighbor searching, Atmospheric
Environ. 42(18) (2008) 4409-4417.
[6] M.J.L. Aznarte, J. Manuel Benítez Sánchez, D. Nieto Lugilde, C. de Linares Fernández, C. Díaz de la
Guardia, and F. Alba Sánchez, Forecasting Airborne Pollen Concentration Time Series with Neural and
Neuro-Fuzzy Models, Expert Systems with Applications 32(4) (2007) 1218-1225.
[7] L. Ośródka, E. Krajny, M. Wojtylak, The use of numerical weather forecast for air pollution forecast-
ing in an urban industrial agglomeration, in: Proc. 4th Annual Meeting EMS and 5th EC on Applied
Climatology, 2004, EMS Vol. 1.
[8] L. Ośródka, M. Wojtylak, E. Krajny, K. Rorbek, Improvement of the high pollution concentrations fore-
casting methods in urban-industrial agglomerations with the help of numerical models of meteorological
forecasts, Wiad. IMGW 27(1) (2004) 105-116. [in Polish]
[9] L.A.Zadeh, Fuzzy Sets, Inf. and Control 8 (1965) 338-353.
[10] G. Klir, T. Folger, Fuzzy Sets, Uncertainty and Information, Prentice Hall PTR, Englewood Cliffs,
NJ, 1988.
[11] H.J. Zimmermann, Fuzzy Set Theory and its Applications, second ed., Kluwer, Dordrecht, 1991.
[12] W. Silvert, Ecological impact classification with fuzzy sets, Ecol. Model. 96 (1997) 1-10.
[13] B.K.Hansen, D.Riordan, Weather Prediction Using Case-Based Reasoning and Fuzzy Set Theory,
in: Proc. Workshop on Soft Computing in Case-Based Reasoning, International Conf. on Case-Based
Reasoning, Canada, 2001, pp.175-178.
[14] M. Kowalska, L. Hubicki, J.E. Zejda, L. Ośródka, E. Krajny, M. Wojtylak, Effect of ambient air
pollution on daily mortality in Katowice Conurbation, Poland, Polish J. Environ. Stud. (2007) 227-232.
[15] M. Kowalska, J.E. Zejda, L. Ośródka, K. Klejnowski, E. Krajny, M. Wojtylak, Relationship between
ambient air pollution and daily mortality in the Urban Area of Katowice-comparison on two periods
1994-1995 and 2001-2002, in: Proc. Public Conf. ”Particles and Health-State of the Research and Policy
Implications”, (9), 2007.
[16] D. Domańska, M. Wojtylak, Development data serving to forecasting pollution concentrations, Deci-
sion Support Systems, Katowice, 2009, pp. 351-359. [in Polish]
[17] D. Domańska, M. Wojtylak, Selection criteria of forecast pollution concentrations using collateral
informations, Computer Methods and Systems, Kraków, 2009, pp.213-218.
[18] L. Ośródka, M. Wojtylak, E. Krajny, R. Dunal, K. Klejnowski, Application Data Mining for forecasting
of high-level air pollution in urban-industrial area in southern Poland, in: Proc. of the 10th Int. Conf. on
Harmonisation within Atmospheric Dispers. Modelling for Regulatory Purposes, 2005, pp. 664-668.
[19] G. Brunet, The first hundred years of numerical weather prediction, Proceedings of the 19th Int.
Symposium on High Performance Computing Systems and Applications, Canada (2005).
[20] P. C. Banacos and D. M. Schultz, The use of moisture flux convergence in forecasting convective initiation:
historical and operational perspectives, Weather and Forecasting 20 (2005) 351-366.
[21] W. Stawiany, P. Caban, B. Cimander, E. Cisowska, I. Hołda, E. Krajny, A. Kruczała, E. Ostrowska,
L. Ośródka, K. Rorbek, S. Socha, J. Święch-Skiba, M. Wojtylak, Monograph of automatic measurement
systems of the air quality in Katowice agglomeration (1993-1999), Biuletyn Regionalnego Monitoringu
Środowiska: Wojewoda Śląski (2000) 1-324. [in Polish]
[22] Rob J. Hyndman and Anne B. Koehler, Another look at measures of forecast accuracy, International
Journal of Forecasting 22(4) (2006) 679-688.