A genetic fuzzy expert system for automatic question classification in a competitive learning environment
Expert Systems with Applications 39 (2012) 7471–7478
Elena Verdú, María J. Verdú ⇑, Luisa M. Regueras, Juan P. de Castro, Ricardo García
School of Telecommunications Engineering, University of Valladolid, Paseo Belén, 15, 47011 Valladolid, Spain
Keywords:
Intelligent tutoring systems
Educational technology
Automatic question classification
Competitive learning
Genetic algorithms
Fuzzy systems
doi:10.1016/j.eswa.2012.01.115
⇑ Corresponding author. Address: ETSI Telecomunicación, Paseo Belén 15, 47011 Valladolid, Spain. Tel.: +34 983423707; fax: +34 983423667.
E-mail addresses: elever@tel.uva.es (E. Verdú), marver@tel.uva.es (M.J. Verdú), luireg@tel.uva.es (L.M. Regueras), jpdecastro@tel.uva.es (J.P. de Castro), ricgar@tel.uva.es (R. García).
Intelligent tutoring systems are efficient tools to automatically adapt the learning process to the student’s
progress and needs. One of the possible adaptations is to apply an adaptive question sequencing system,
which matches the difficulty of the questions to the student’s knowledge level. In this context, it is important to correctly classify the questions to be presented to students according to their difficulty level. Many systems have been developed for estimating the difficulty of questions. However, the variety of application environments makes it difficult to apply existing solutions directly to other applications. Therefore, a specific solution has been designed to determine the difficulty level of open questions in an automatic and objective way. This solution can be applied to activities with special temporal and running features, such as the contests run through QUESTOURnament, a tool integrated into the e-
learning platform Moodle. The proposed solution is a fuzzy expert system that uses a genetic algorithm in
order to characterize each difficulty level. From the output of the algorithm, it defines the fuzzy rules that
are used to classify the questions. Data registered from a competitive activity in a Telecommunications
Engineering course have been used in order to validate the system against a group of experts. Results
show that the system performs successfully. Therefore, it can be concluded that the system is able to perform the question classification task in a competitive learning environment.
© 2012 Elsevier Ltd. All rights reserved.
1. Introduction

In recent years, the learning process has been changing substantially in order to centre on students and adapt to their needs and characteristics. Different studies have shown the effectiveness of the new adaptive learning systems (Verdú, Regueras, Verdú, de Castro, & Pérez, 2008). Many of these systems attempt to be more adaptive by offering students questions whose difficulty level matches their skills and capabilities. The aim is to increase the efficiency and the level of interaction and motivation of students (Lilley, Barker, & Britton, 2004). Questions that are too difficult or too easy can frustrate students and decrease their motivation, while adaptive question sequencing provides more efficient and effective learning (Wauters, Desmet, & Van den Noortgate, 2010). Moreover, according to Lee and Heyworth (2000), students should be able to score higher if the items or problems are arranged according to their difficulty level, since after solving easier problems they feel more motivated to solve the harder ones.

On the other hand, competitive learning systems, such as the QUESTOURnament system, are an effective technique to capture students’ interest, motivation and engagement by arousing their competitive instincts (Anderson, 2006; Philpot, Hall, Hubing, & Flori, 2005). Moreover, competitive learning reduces procrastination, a common cause of students failing to complete assignments (Lawrence, 2004), and improves the learning process (Regueras et al., 2009).
QUESTOURnament is a telematic tool integrated into the e-
learning platform Moodle that allows teachers to organize dynamic
contests in any knowledge domain (Regueras et al., 2009). Students
compete to get the highest marks and reach the top of the ranking. They must solve exercises (known as challenges in QUESTOURnament) within a time limit and as soon as possible, since the scoring function varies with time.
The competitive nature of QUESTOURnament motivates students but can also provoke stress and discouragement in the worst classified students. Assigning adequate opponents and questions to each student may be an effective strategy to reduce these negative effects (Wu et al., 2007). Therefore, the system should group
students by knowledge level so that students with similar skills
compete together and answer questions with a difficulty level suit-
able for them.
In this context, it is very important to correctly classify ques-
tions by difficulty level. However, it is difficult for teachers to accu-
rately estimate the difficulty level according to the students’ level
of competence (Watering & Rijt, 2006). Experience helps teachers
to better estimate the difficulty level of the questions, but even
senior teachers sometimes fail and have to rectify when they
analyze the answers given by their students. An automatic estima-
tion system could be the basis for an effective adaptation process.
Many systems that automatically estimate the difficulty level of items can be found in the literature (Burghof, 2001; Cheng, Shen, & Basu, 2008; Jong, Chan, Wu, & Lin, 2006; Lee, 1996; Wauters et al., 2010). However, the variety in the nature of the application environments makes it difficult to apply existing solutions directly to other applications. Therefore, a specific solution has been
designed in order to turn the competitive e-learning system QUES-
TOURnament into an intelligent system. The objective is to make
learning more effective and to mitigate some of the practical draw-
backs of competitive learning.
This paper discusses the validity of an expert system that auto-
matically estimates the difficulty level of the questions posed in
the QUESTOURnament competitive learning system. Section 2
introduces the major issues about teachers’ perception of difficulty
and summarizes the search towards the solution. The expert sys-
tem is described in Section 3. Section 4 starts with a description
of the experiment developed in order to validate the system. Next,
a study that analyzes the accuracy of the estimations of difficulty
obtained by the intelligent system is presented. Finally, the main
conclusions are stated.
2. Background
2.1. Teachers’ perception of difficulty
The correct estimation of the difficulty level of learning material (questions, items, etc.) is very important in the design and definition of assessment processes, adaptive learning systems and standard setting methods. However, there are few studies on how teachers perceive and estimate difficulty.
Estimating the difficulty level of questions is not an easy
job. Several studies (Alexandrou-Leonidou & Philippou, 2005;
Hadjidemetriou & Williams 2002; Lee & Heyworth, 2000; Watering
& Rijt, 2006) question the ability of teachers to make accurate diffi-
culty level estimations of learning material since teachers usually
fail to identify the correct difficulty level according to the students’
ability. In general terms, students’ performance tends to be
overestimated by teachers (Goodwin, 1999; Impara & Plake, 1998;
Verhoeven, Verwijnen, Muijtjens, Scherpbier, & Van der Vleuten,
2002). Moreover, according to Watering and Rijt (2006), if the accu-
racy of teachers’ perception of difficulty is analysed by categories,
teachers tend to overestimate the difficulty of easy items and under-
estimate the difficulty of hard items. Impara and Plake (1998) also
suggest that estimating item difficulty accurately is quite difficult;
however, they do not think that teachers systematically underesti-
mate the difficulty of hard items and overestimate the difficulty of
easy items. In this respect, other contradictory results are found
too. For example, Mattar (2000) states that teachers are less success-
ful at rating very difficult or very easy items, while Zhou (2009)
indicates that teachers classify better the hardest items.
In short, although there are no conclusive studies about the tendency of teachers when they classify questions by difficulty level, all researchers agree that this classification is difficult. Therefore, an automatic system that adjusts the difficulty
level of questions according to the students’ behaviour would be
a very useful support tool and a key component for a truly adaptive
learning environment.
2.2. In search of an intelligent solution for a competitive tool
There are many domain-dependent intelligent tutoring systems
(ITSs) that provide students an adequate learning path through the
different topics of a subject, according to the previously learnt
topics. These systems are based on techniques such as Bayesian
Networks (Hibou & Labat, 2004; Nouh, Karthikeyani, & Nadarajan,
2006; Vomlel, 2004) and require the previous definition of knowl-
edge domains by using, for example, domain-specific ontologies
(Colace & De Santo, 2006). Modelling these networks of knowledge
components and their dependencies, generalizing them for every
student, is not an easy task (Noguez, Sucar, & Ramos, 2005), espe-
cially for domain-independent systems like QUESTOURnament,
which can be used for diverse subjects and levels of education.
Many domain-independent ITSs focus on presenting questions
and problems adapted to the students’ knowledge level. They often
apply the Item Response Theory (IRT) to estimate both the charac-
teristics of the questions, such as difficulty or guessing probability,
and the knowledge level of students (Chen, Lee, & Chen, 2005; Lilley
et al., 2004), independently of the knowledge domain. However, the
correct application of traditional theories for tests implies some
assumptions, which are not met by many examination contexts,
especially when telematic tools are used for distance learning.
Moreover, some of the characteristics of more specific tools, such
as the competitive nature of QUESTOURnament, make the applica-
tion of these theories difficult for the environment under study.
The typically used IRT models are one-dimensional, that is, they
assume that the response to a question depends on a single trait,
usually the knowledge level. Besides, it is also supposed that the
response a student gives to a specific question does not depend
on the responses given to other questions (Embretson & Reise,
2000). Therefore, using IRT entails carefully designing the tests so that both conditions are fulfilled. Moreover, conventional IRT models only response accuracy and ignores response time, since it was conceived for pure power tests (Roskam, 1997), which assume that students have unlimited time to solve a question. Even if limited time could be assumed, at least the
requirement should be that time is not a factor that affects the stu-
dents’ response. However, in a competitive environment as QUES-
TOURnament, time is very important, since only the first student
who answers a challenge correctly will be able to obtain the high-
est score for that challenge. Therefore, there are different factors
that could distort the results obtained by the IRT methods when
applied to the QUESTOURnament system.
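To make the unidimensionality assumption concrete, the sketch below evaluates the standard one-parameter (Rasch) IRT model, in which the probability of a correct response depends only on a single ability trait and the item difficulty. The parameter values are hypothetical illustrations, not data from this study.

```python
import math

def rasch_probability(theta: float, b: float) -> float:
    """Probability of a correct response under the Rasch (1PL) model:
    P(correct) = 1 / (1 + exp(-(theta - b))),
    where theta is the student's ability and b is the item difficulty."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# When ability equals item difficulty, the model predicts a 50% chance.
print(rasch_probability(0.5, 0.5))  # 0.5
```

Note that response time never enters the model, which is precisely the limitation discussed above for a time-sensitive competitive environment.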
Students can apply different strategies during competition and
even different personality factors can determine the students’ final
response to an item. Several challenges can be posed at the same
time and students have to select one of them to be solved first.
Many students tend to read all the different questions and select
the one that seems the easiest to be solved first. Difficult chal-
lenges are usually read several times and solved after the easiest
ones have been answered. On the other hand, two students with
exactly the same knowledge level could respond to a same ques-
tion differently, as one can be more persistent and devote more
time to solve the question while another one can be more anxious
with the competition and quickly respond to be the first one. Consequently, time and number of readings are important factors that should be taken into account in the model, but their modelling depends on the actual students’ behaviour.
Moreover, when teachers pose challenges to QUESTOURna-
ment, they do not have any restriction related to time, type of
questions or skills needed to solve them. They are free to use any
configuration of the system in any context. Then, there are some
important factors that can vary:
• Maximum time available to submit an answer to a challenge.
• Type of questions (open questions, multiple choice questions, true/false questions, short response questions, problems, etc.).
• Context surrounding students when they solve the questions: a contest may be run in the classroom or at a distance, during one or several days.
• Personality of the students (e.g. the stress of a student faced with a competitive situation can influence the response).
There are different models adapted from the classic IRT that
cover different partial aspects of the searched solution but there
is not a model that covers all aspects required by the specific fea-
tures of the QUESTOURnament system. Roskam (1997) presents a
model based on IRT for speed tests with a time limit, in which correctness and response time are integrated. Van der Linden (2007) proposes a flexible hierarchical solution that basically comprises an IRT model, a response-time distribution model and a higher-level structure that takes into account the dependencies between the item and student parameters in those models. For each of these components, the most suitable model can be used.
In any case, a model based on IRT that took into account all possible factors influencing the response a student gives to a challenge within QUESTOURnament, across so many different application contexts, would be vastly complex. There are other
solutions to determine the difficulty level of the learning material.
However, most of these proposals are too simplistic – like the solu-
tion used in Jong et al. (2006), where the difficulty is estimated as the
ratio between the number of times that a question is incorrectly
answered and the total number of answers – or are too focused on
the target subject – such as the solution described in Kunichika,
Urushima, Hirashima, and Takeuchi (2002), which estimates the
difficulty level of questions about English language sentences.
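The proportion-based estimator attributed to Jong et al. (2006) above can be sketched in a few lines; the function name and the example figures are illustrative assumptions, not taken from the cited work.

```python
def difficulty_ratio(num_incorrect: int, num_total: int) -> float:
    """Estimate difficulty as the fraction of incorrect answers,
    i.e. incorrect answers divided by total answers."""
    if num_total == 0:
        raise ValueError("no answers recorded for this question")
    return num_incorrect / num_total

# A question answered incorrectly 12 times out of 20 attempts:
print(difficulty_ratio(12, 20))  # 0.6
```

Its simplicity is also its weakness: it ignores response time, the number of readings and the competitive context, which is why a richer model is pursued here.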
After analysing classical and specific solutions, it was decided to
design an ad-hoc solution for the system, whose fundamentals
could be applied to other systems used in open contexts. This solu-
tion is based on the definition of a fuzzy genetic expert system,
which classifies the questions in several difficulty levels.
There are examples of successful application of this kind of systems
to e-learning environments such as the one described by Romero,
Gonzalez, Ventura, del Jesus, and Herrera (2009). They use an evolu-
tionary algorithm to learn fuzzy rules, which describe relationships be-
tween the students’ interactions with the e-learning system Moodle
and the final marks obtained in the course. Typically, genetic learning
of rules assumes a predefined set of fuzzy membership functions gen-
erated by human domain experts (Cordón, 2004). However, as afore-
mentioned, the different nature of the challenges that can be posed
through QUESTOURnament, as well as the varied students’ profiles,
makes it very difficult to define and generalize fuzzy sets and rules.
Teachers can use QUESTOURnament for multiple-choice questions or
for laborious exercises or problems. Since, for example, ten minutes
can be a very short time for a complex problem but a long time for a
true/false question, it is very difficult to predefine the fuzzy membership function for the time parameter. Moreover, contests with QUESTOURnament can take place in very different contexts, for example, during face-to-face classes or at a distance, even lasting several weeks. All these elements (nature of the questions, application contexts of the system, profiles and behaviours of the students, etc.) make it necessary to define fuzzy sets and fuzzy rules each time a group of questions is classified. Doing this by hand would be very laborious and impractical, so an automatic system is needed. Besides, according
to Nebot, Mugica, Castro, and Acosta (2010), learning the fuzzification
process parameters by genetic algorithms instead of using the expert’s
criteria provides better results.
Then, the proposed system starts from scratch. Taking some data
about the interaction of the students with QUESTOURnament and
the initial difficulty level estimated by the teacher, it learns both
the adequate membership functions with their linguistic values as
well as the fuzzy rules. In the next section, the complete system is
detailed. Throughout this description of the genetic fuzzy expert system, some real-case examples are included to facilitate comprehension.
The details of the real case and the corresponding experiment results
are set out in Section 4.
3. The expert system
A genetic fuzzy expert system has been designed that generates
fuzzy sets and rules appropriate for each specific case. The knowl-
edge base is provided by a Fuzzy Model Generator that includes a
genetic system capable of identifying the characteristics of the
questions for each difficulty level.
The estimation of the difficulty level then takes place in two
phases. During a first phase the Fuzzy Model Generator learns from
the Facts Base (formed by the students’ response patterns) and
dynamically creates the classification rules and the fuzzy sets of
the input variables for the specific data. During a second phase,
the fuzzy expert system infers the difficulty level of each question.
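As a rough illustration of the second phase, the sketch below fires a fuzzy rule over triangular membership functions. The membership breakpoints and the single rule shown are invented for illustration; in the actual system both are learned by the Fuzzy Model Generator rather than fixed by hand.

```python
def triangular(x: float, a: float, b: float, c: float) -> float:
    """Triangular membership function with feet at a and c and peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# Hypothetical learned set: "long response time" for one group of questions.
def long_time(minutes: float) -> float:
    return triangular(minutes, 10.0, 25.0, 40.0)

# Hypothetical rule: IF time is long AND grade is low THEN question is hard.
def rule_hard(time_membership: float, low_grade_membership: float) -> float:
    # Mamdani-style AND realized with the minimum t-norm.
    return min(time_membership, low_grade_membership)

print(rule_hard(long_time(25.0), 0.8))  # 0.8
```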
The components of the whole system are shown in Fig. 1. From
Moodle and QUESTOURnament logs, three parameters are consid-
ered in the response patterns: time in minutes from the last read-
ing of the question until the submission of the answer, grade
obtained for that answer and number of accesses or readings be-
fore submitting the answer. All these factors depend on the stu-
dents’ behaviour when answering a question and are related to
the difficulty level of each challenge (as aforementioned). All these
data make up a set of context-dependent and noisy usage patterns
that are stored in the Facts Base and feed the intelligent system.
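A response pattern as described above could be represented as follows; the field names are assumptions chosen for illustration, not identifiers from the system.

```python
from dataclasses import dataclass

@dataclass
class ResponsePattern:
    """One student's response to one challenge, extracted from
    Moodle/QUESTOURnament logs."""
    time_minutes: float   # minutes from last reading to answer submission
    grade: float          # grade obtained for the answer
    num_readings: int     # accesses/readings before submitting the answer

pattern = ResponsePattern(time_minutes=12.5, grade=85.0, num_readings=3)
print(pattern.num_readings)  # 3
```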
For each difficulty level, the genetic system uses the response
patterns of all the questions belonging to that level (according to
the initial classification made by the teacher) and obtains a charac-
terization of their responses as crisp sets. From these crisp sets the
Fuzzy Model Generator creates the fuzzy sets and rules of the
Knowledge Base. Once the fuzzy sets and the rules of a group of
questions have been generated, the Inference Engine can infer
the difficulty level of the patterns in the Facts Base. Finally, the dif-
ficulty level of each question is calculated as the median of the dif-
ficulty level of its response patterns and the challenges repository
is updated with the new difficulty level.
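The final aggregation step, in which each question’s difficulty is the median of the levels inferred for its response patterns, can be sketched as follows; the numeric level coding is an assumption for illustration.

```python
from statistics import median

# Difficulty levels coded numerically: 1 = easy, 2 = moderate, 3 = hard
# (an illustrative coding, not specified in the text).
def question_difficulty(pattern_levels: list) -> int:
    """Aggregate the per-pattern inferred levels into one level per question."""
    return round(median(pattern_levels))

# Five response patterns for one challenge, as inferred by the engine:
print(question_difficulty([1, 2, 2, 3, 2]))  # 2
```

Using the median rather than the mean keeps a few atypical responses (e.g. one very slow student) from skewing the question’s final level.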
Thus, the system combines the students’ behaviour and the
teachers’ perception in order to objectively estimate the real diffi-
culty level of each challenge.
3.1. The genetic system
The objective of the genetic algorithm for the proposed system
is to generate groups of crisp sets that characterize the students’
responses for three difficulty levels: easy, moderate and hard.
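A minimal sketch of how a genetic algorithm could evolve a crisp interval characterizing one difficulty level is given below. The chromosome encoding (interval bounds on a single time variable), the fitness function and all parameter values are assumptions made for illustration; the system’s actual operators are not reproduced here.

```python
import random

random.seed(42)

def fitness(bounds, target, others):
    """Reward covering the target level's values; penalize covering others."""
    low, high = min(bounds), max(bounds)
    inside = sum(low <= v <= high for v in target)
    leaked = sum(low <= v <= high for v in others)
    return inside - leaked

def evolve(target, others, pop_size=30, generations=50):
    """Evolve (low, high) interval bounds via truncation selection."""
    pop = [(random.uniform(0, 60), random.uniform(0, 60)) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda b: fitness(b, target, others), reverse=True)
        survivors = pop[: pop_size // 2]
        children = []
        for _ in range(pop_size - len(survivors)):
            a, b = random.sample(survivors, 2)
            child = (a[0], b[1])                       # one-point crossover
            if random.random() < 0.2:                  # Gaussian mutation
                child = (child[0] + random.gauss(0, 2), child[1])
            children.append(child)
        pop = survivors + children
    best = max(pop, key=lambda b: fitness(b, target, others))
    return min(best), max(best)

# Response times (minutes) for "hard" questions vs. all other levels:
hard_times = [30, 35, 40, 38, 33]
other_times = [5, 8, 10, 12, 15]
low, high = evolve(hard_times, other_times)
print(f"evolved interval: [{low:.1f}, {high:.1f}]")
```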
The system groups challenges by difficulty level according to the
initial classification made by the teacher. The genetic algorithm then
uses the responses for all the questions belonging to a specific diffi-
culty level in order to obtain its characterization. As above men-
tioned, the input of the genetic algorithm is a set of response
patterns with the structure