Improving electronic health records retrieval using contexts


Expert Systems with Applications 39 (2012) 8522–8536

Belen Prados-Suárez a, Carlos Molina b,⇑, Carmen Peña Yañez c, Miguel Prados de Reyes c
a Department of Languages and Computing Systems, University of Granada, Granada, Spain
b Department of Computer Sciences, University of Jaen, Jaen, Spain
c Computer Science Department, San Cecilio Hospital, Granada, Spain

Article info / Abstract
Keywords:
Electronic Health Records (EHR)
Context
Pertinence
Contextualized access
Fuzzy logic
Access improvement
0957-4174/$ - see front matter © 2012 Elsevier Ltd. All rights reserved.
doi:10.1016/j.eswa.2012.01.016

⇑ Corresponding author.
E-mail addresses: belenps@ugr.es (B. Prados-Suárez), carlosmo@ujaen.es (C. Molina), carmenpy@decsai.ugr.es (C. Peña Yañez), prados@decsai.ugr.es (M. Prados de Reyes).

This paper addresses a recently arisen problem related to access to the Electronic Health Records
(EHR) in hospitals. Due to the digitalization of the information contained in medical records, and
the growing availability of devices that directly generate digital documents to include in them, EHRs are
becoming unmanageable. Moreover, finding a specific item of information relevant to a given assistance
act is a difficult and time-consuming task. To solve this, we propose defining contexts of access to the
EHR, exploiting the logical division of the information inside each EHR document into data groups, and
computing the pertinence of each data group to each context. This allows us to prioritize the information
in the EHR, even at the level of a single data item, according to the situation from which it is accessed.
Thus, when medical personnel are involved in an assistance act, the most relevant information for it is
shown first, and the search can be widened but always in order of relevance. In this way we not only
improve the accessibility of the EHR and ease the work of doctors, but also enable other applications
such as ubiquitous computing or mobility, using devices such as tablet PCs and PDAs.

© 2012 Elsevier Ltd. All rights reserved.
1. Introduction

The use of EHRs has become a reality in the everyday practice of
most hospitals, and the literature offers a great variety of
proposals to implement the EHR in different specialities such as
pediatrics, nursing, family care, emergencies, radiology, elder care
or outpatient consultation (see Ginsburg (2007), Cho, Staggers, &
Park (2010), Gagnon et al. (2010), Karahoca, Bayraktar, Tatoglu, &
Karahoca (2010), Erdal et al. (2009), Pung et al. (2009) and
Vishwanath, Singh, & Winkelstein (2010), respectively), and also
under different regulations depending on the country, as in the
Korean or the Czech medical systems (Cho, Kim, Kim, Kim, & Kim
(2010) and Nagy et al. (2010), respectively).

However, in most of these proposals, as well as in studies of
user satisfaction with the deployment of the EHR such as
McAlearney, Robbins, Hirsch, Jorina, and Harrop (2010), Ross
(2009), Vishwanath et al. (2010), Svanaes, Das, and Alsos (2008)
and Vest and Jasperson (2010), or even in comparisons of
different EHR systems such as Bisbal and Berry (2009) and
Flores Zuniga, Win, and Susilo (2010), there is a notable
problem: the great amount of information accumulating in the
EHRs is creating problems of access. Since all the information is
always available, it is becoming really difficult to reach a specific
information item, even in relatively simple situations. This is
especially serious in situations such as emergencies, where
decisions must be taken within seconds and the information
relevant to the case should be immediately available to support
them. Moreover, the problem will get worse with the increasing use
of new medical machines and devices like PACs, which automatically
generate documents to be included in the EHRs (Prados & Peña,
2003; Prados et al., 2010).

To solve this situation, storage and access cannot be restricted
to information with high clinical value, since the information
needed may change depending on the assistance act. The problem is
so recent that it is still difficult to find proposals that address
it in its full extent. Most of them focus on the definition of data
structures and documents to organize the information provided by
the EHR, offering navigation systems on top of it, like Jerding and
Stasko (1998) and McAlearney et al. (2010). However, they do not
constitute a solution: they allow logical and structured access to
the information, but they do not avoid the cumbersome selection
steps and the successive screens needed to reach the desired
information (Prados de Reyes, Carmen Peña Yáñez, & Suárez, 2006).

In the medical research community it is clear that ‘‘having a
good access to the information needed benefits the quality of the
attention received by the patients’’ (Adams, Adams, Thorogood, &
Buckingham, 2007). In addition, the importance of taking into
account the situation or context from which the access is performed
has been pointed out (Weinstock, 2010) as a means to improve
access to the EHRs.

As an example, Collins and Speedie (2008) propose to use an
infobutton engine to manage the clinician and patient context in
order to provide concise answers to frequent questions posed by
clinicians. ‘‘Infobuttons are information retrieval tools that help
clinicians to fulfill their information needs by providing links to
on-line health information resources from within an electronic
medical record (EMR) system’’ (Del Fiol & Haug, 2009). These tools
are usually based on classification models that predict clinicians’
decisions. However, the models are restricted to very specific
topics like the ‘‘medication infobutton data, used to predict
medication-related content topics (e.g., dose, adverse effects, drug
interactions, patient education) that a clinician is most likely to
choose while entering medication orders in a particular clinical
context’’ (Del Fiol & Haug, 2009).

Other proposals related to the definition of contexts do not
address the problem posed either explicitly or directly. They are
mainly focused on knowledge mobilization (Ray & Wimalasiri, 2006)
and ubiquitous computing (Judd & Steenkiste, 2003; Kang, Lee, Ko,
Kang, & Lee, 2006), or on the standardization of Hospital
Information Systems and the exchange of information between them
(Cayir & Nuri Basoglu, 2008; Lahteenmaki & Kaijanranta, 2009;
Nagy, Preckova, Seidl, & Zvarova, 2010b). Proposals of the first
kind can rarely be applied to Hospital Information Systems, since
they mainly rely on sensors to identify the context (Kang et al.,
2006) or provide information according to the device used so that
other applications can perform pervasive computing (Judd &
Steenkiste, 2003). In addition, none of them is designed for, or
useful with, the immense EHR databases. Proposals of the second
kind, instead of identifying the information that really needs to
be exchanged, concentrate on adapting the system, its structures,
contents and interfaces to different regulations and standards like
HL7 (Data exchange standard, 2011), DICOM (Image storage standard,
2011; Open-EHR, 2011), SNOMED-CT (Nomenclature medical standard,
2011), or the most recent proposal of the European Committee for
Standardization: the ISO 13606 regulation. This leads them to
overlook the needs of the health professionals, who are the real
users of the system, and whose work improvements have a greater
impact on the quality of the medical assistance provided to the
patients.

Nevertheless, the problem of access to specific information
items of interest in huge databases is addressed explicitly in
other environments, like business, legacy and e-government (in
Chaker, Chevalier, Soule-Dupuy, & Tricot (2010), Mao & Benbasat
(2001) and Bohm, Wolf, & Krcmar (2010), respectively). A German
office of digital services to the citizens has detected the problem
through in-depth studies (Bohm et al., 2010), but has not yet
proposed a solution to it. In the business framework this situation
has also arisen, as indicated by Chaker et al. (2010) and Buchholz,
Hochstatter, and Linnhoff-Popien (2007), and the proposals to
solve it are based on improving information retrieval through the
definition of different contexts and business models (as in Jung,
2009), changing the current access mode and adapting it to the real
information needs of the accessing user. However, these proposals
are very recent and still in their first development phases, so it
is too soon to extend and adapt them to other types of systems.

This is why we propose to analyze the daily practice of the
medical staff and to follow the same philosophy as these solutions,
improving their access to the information in the EHR based on the
information they usually request in each assistance act. Following
the work of a doctor we find a great variety of situations with
different purposes: from the in-depth study of a complex diagnosis
process in the office, to a simple review of the last consultation
report in a follow-up of the patient's evolution, or the need for
very specific data in the response to an emergency. As can be seen,
we face a wide variety of activity contexts with quite different
information requirements. In other words, we have different sets of
relevant documents or information items of the EHR, depending on
the context we are involved in.

Our proposal is based on the study of the access patterns, so
that the information shown to the user can be context-sensitive.
This way the system would only show the doctor the information that
is relevant to his/her present context. However, it must be taken
into account that information needs are not static and may change
over time, so a piece of information that is important today may be
useless in the future. Moreover, the age of the data has an
influence too: some analyses, for instance, must be repeated if the
last result of the same type is older than a few months, since the
results may change. Hence, all of these aspects must be considered
when defining the pertinence of the information items to the
contexts.

Accordingly, two problems must be faced. On the one hand, it is
necessary to identify the contexts of access. On the other hand,
the information relevant for each of them must be identified. For
the first problem there are some proposals focused on context
modeling, like Bobillo, Delgado, and Gómez-Romero (2008), Chaker
et al. (2010), Ehsan, Amini, and Jalili (2009), Garcia-Morchon and
Wehrle (2010) and Chu, Johnson, and Kangarloo (2000); but most of
them are theoretical models too complicated to be integrated in an
existing system, and they require algorithms so complex that they
are not suitable for a hospital information system, even more so if
the system has to be updated continually to adapt to new needs.
Proposals for the second problem are mainly oriented to identifying
the relevant information inside documents, as Järvelin and
Kekäläinen (2000), Jones (2004), Cao et al. (2006) and Kanoulas,
Pavlu, Dai, and Aslam (2010) propose; but the great amount of data
involved in Hospital Information Systems (hundreds of millions of
records) makes it impossible to use them. We need a very efficient
way to contextualize the access and decide which information is
relevant in each situation.

In this paper we present our proposal to do so. Section 2
describes the system we have used as a base, as well as the general
structure of an EHR. Section 3 shows how to define the contexts,
both automatically and on demand, and how to identify them when a
doctor accesses the system. Section 4 proposes a method to compute
the pertinence of the information groups or items to the contexts,
considering its several aspects. Based on it, Section 5 establishes
a priority ordering in the use of pertinence values and shows how
it is used to improve the access to the EHR. Section 6 exemplifies
and summarizes our proposal, and Section 7 comments on the
implementation. Finally, Sections 8 and 9 present some results and
our conclusions, respectively.

2. Background

First of all, we must indicate that the proposal presented here
has been developed in collaboration with the University Hospital
San Cecilio in Granada, and that we have based it on their
Electronic Health Record System, which we use as reference.

This system stores around 800,000 EHRs, containing more than
50 million documents. Its size is expected to grow fast in the
future, due to the inclusion of new types of documents from two
sources: old documents that have not yet been digitalized (scanned
images, MRI, etc.) and new documents generated by recently acquired
and future devices and equipment such as PACs.

In this section we briefly show the characteristics of this EHR
system, as well as the structure of the Electronic Health Records
stored on it.
Fig. 1. Logical organization of the EHRs.
2.1. Electronic Health Records structure

The information stored in the EHR is structured according to the
Reference Model given by ISO-13606 (2008). Under this standard, the
elements of the hospital information system are organized according
to an ontology whose structure gives rise to the following classes:

• Folder: This class represents the divisions at the highest level
inside the clinical history. In our case these divisions are the
assistance acts and the pathologies, so all the documents in the
EHR are grouped into assistance acts or pathologies, and both
classifications coexist.
• Section: This class of the standard represents logical groupings
of information, each one representing a set of data with a uniform
clinical orientation (Fig. 1), and corresponds to each document
stored in the EHR. Examples of documents range from a blood
analysis to a preanaesthetic study, or from an admission document
to an X-ray test.
• Entry: According to the standard each entry represents a clinical
observation or a set of them. It corresponds to what we call data
groups (e.g. the hematology information in a blood analysis).
• Cluster and Element: These classes correspond to what we call
data items. The difference between them is that the first one
represents a single observation or action (a data item) that
requires a complex structure like a list, a table or a time series
(e.g. an electrocardiogram), whereas the second one represents a
single, simple value, an instance of one of the types defined by
the standard (e.g. the percentage of hematocrit in a blood
analysis).

As indicated in Prados and Peña (2003) and Prados-Suárez,
Revuelta, Peña Yáñez, and Fernández (2008), each document is
characterized by a set of properties such as:
• the type (exploration, anamnesis, epicrisis, checkup, nursing
control, intervention, external, ...);
• the speciality (a medical speciality such as surgery or
cardiology, nursing, administrative, etc.);
• the pathological or clinical process (documents about pregnancy,
cataract, diabetes, etc.);
• ...

Fig. 2. Screenshot example of the applications.

In our system, documents are organized by assistance episode
(admissions, outpatient consultation, emergency assistance, day
hospital, ...) in chronological or medical order, depending on the
assistance process. The documents are classified by type, with 1500
different document classes in the system: intervention sheet,
progress sheet, nursing sheet, pregnancy process, diabetes
protocol, radiological report, and so on. An example can be seen in
Fig. 2, which shows the interface that the medical personnel use to
access the EHR.

Fig. 3. Structure of the system.

1 http://www.citrix.com.
2 http://www.oracle.com.

When a doctor is looking for a specific data item from a
particular test made to the patient, he/she must select the type of
assistance act that he/she is performing (in the box marked 1 in
Fig. 2). Then a search must be done in the box marked 2 in Fig. 2,
by the date of the test or the medical speciality in which it was
done. Once the assistance act in which the test was made is found,
the doctor has to find, among the documents generated in that act,
the one with the results of the test, in the part of the interface
marked 3 in Fig. 2. The document is then retrieved and must finally
be scanned to find the item of interest, by clicking on the tab
marked 4 in Fig. 2. Although different ordering criteria can be
used (marked 5 in Fig. 2), this is clearly a long and tedious task
that consumes much of the doctor's time and must be repeated every
time the doctor needs to know something about each patient
attended.

2.2. Data groups

Inside the documents there are data items that can also be
grouped into small logical units, which we call data groups, when
they are related from a clinical point of view (Fig. 1).

Each data group inherits the general properties of the document
where it is contained. In addition, as shown in Fig. 1, each data
group has its own specific properties, like the relevance level (for
the concrete patient and episode), the confidentiality level, etc.

Examples of data items for a blood analysis or a preanaesthetic
study are: erythrocyte, hemoglobin, corpuscular volume, amylase,
GGT, HDL-cholesterol, LDL-cholesterol or VLDL-cholesterol, in the
first case; and hypertension, cardiopathy, electrocardiogram,
radiologic study or echography, in the second case. These data
items can be grouped into the data groups general biochemistry
and lipid information, for the first type of document; and risk
factors or additional tests, for the second type. Note that the EHR
and patient identification information is a ‘‘special’’ data group,
common to all the documents. For this reason, it is excluded from
the processes explained later.

This logical organization of documents and their content allows
the information to be processed and analyzed at document level, at
data group level and at individual data item level. Here we
consider the data group as the minimum unit of information, since a
single data item can be managed as a data group with just one
element.
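
The organization described in this section can be pictured with a small data-structure sketch. It is only an illustration under our own assumptions: the class and field names (`DataItem`, `DataGroup`, `Document`, `properties_of`, `as_data_group`) and the sample values are hypothetical and do not come from the hospital system or from the ISO 13606 schema. It shows a data group inheriting the general properties of its document while keeping its own, and a single data item being handled as a one-element data group.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DataItem:
    """A Cluster or Element: one observation, simple or structured."""
    name: str
    value: object


@dataclass
class DataGroup:
    """An Entry: clinically related data items inside one document."""
    name: str
    items: List[DataItem]
    # Properties specific to the group (e.g. relevance, confidentiality).
    own_properties: Dict[str, str] = field(default_factory=dict)


@dataclass
class Document:
    """A Section: one document stored in the EHR."""
    doc_type: str
    speciality: str
    groups: List[DataGroup] = field(default_factory=list)

    def properties_of(self, group: DataGroup) -> Dict[str, str]:
        """Document properties inherited by the group, plus its own."""
        inherited = {"type": self.doc_type, "speciality": self.speciality}
        return {**inherited, **group.own_properties}


def as_data_group(item: DataItem) -> DataGroup:
    """Treat a single data item as a data group with one element."""
    return DataGroup(name=item.name, items=[item])


# Hypothetical sample values, for illustration only.
lipids = DataGroup(
    name="lipid information",
    items=[DataItem("HDL-cholesterol", 55), DataItem("LDL-cholesterol", 120)],
    own_properties={"confidentiality": "normal"},
)
blood_test = Document("blood analysis", "hematology", groups=[lipids])
print(blood_test.properties_of(lipids))
```

Treating the data group as the smallest unit keeps later processing uniform: any routine written for data groups also covers individual data items through `as_data_group`.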

2.3. EHR information system

The structure of the system is shown in Fig. 3. The users access
the system from medical workstations. These are normal PCs, light
PCs (or net PCs), medical devices like X-ray systems or ultrasound
scanners, or the most recently incorporated terminals such as
Tablet PCs and PDAs. The user logs on to the system and accesses a
Citrix¹ server farm where the applications are executed. All the
data are stored in a database cluster using Oracle DBMS.² A
screenshot of the doctor's interface once logged in is shown in
Fig. 2.

As legally required, this system records each access to the EHR,
indicating the data accessed and, in case of modification, the
modified data; the staff member accessing; and the assistance
situation (called ‘‘controlled assistance situation’’) in which the
access occurs. From now on, we will call this access database the
Retrospective Access Data Base (RADB). The number of records stored
in the RADB is on the order of hundreds of millions.

We base the work proposed here on the records of this database
since, as we will see next, their analysis reveals which
information has been accessed and in which context.

3. Contexts

We call a Context a situation in the Doctor–Patient relationship
inside an assistance act that requires access to the information
previously stored in the EHR.

To contextualize the access to the EHR we first need to establish
the set of possible situations or contexts in which that access may
occur. Then, to exploit the contextualized access system, we need a
mechanism to identify the context in which the medical staff is
involved.

3.1. Context definition

The contexts can be defined under three criteria or a combination
of them:

• Pathological process: In this case the contexts are defined based
on a diagnosed pathology that requires monitoring and is included
in the EHR as such a process. Some of these processes are defined
by the Regional Health Administration and others by the hospital
services themselves. It must be taken into account that several
medical specialities can be involved in the same process. Some
examples are the pregnancy process, the cataract process, the
diabetes process, etc.
• Medical speciality: Here the contexts are defined according to
the specificity of each medical speciality (pediatrics, gynecology,
nursing, cardiology, etc.).
• Kind of assistance: The context definition here is based on the
environment where the assistance process takes place. The following
cases can be distinguished:
– Diagnostic study.
– Surgical intervention.
– Post-surgical revision.
– Evolutive revision.
– Room visit.
– Treatment revision of outpatient consultation.
– Analytical control.
– Urgent assistance situation.
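
As a minimal sketch of how a context built from these three criteria might be represented, the following uses an enumeration for the kinds of assistance listed above and a record in which each criterion is optional. The names `KindOfAssistance` and `Context` and the rule that at least one criterion must be set are our own assumptions for illustration, not part of the described system.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class KindOfAssistance(Enum):
    DIAGNOSTIC_STUDY = "diagnostic study"
    SURGICAL_INTERVENTION = "surgical intervention"
    POST_SURGICAL_REVISION = "post-surgical revision"
    EVOLUTIVE_REVISION = "evolutive revision"
    ROOM_VISIT = "room visit"
    TREATMENT_REVISION = "treatment revision of outpatient consultation"
    ANALYTICAL_CONTROL = "analytical control"
    URGENT_ASSISTANCE = "urgent assistance situation"


@dataclass(frozen=True)
class Context:
    """A context defined by one criterion or a combination of them."""
    pathological_process: Optional[str] = None
    speciality: Optional[str] = None
    kind_of_assistance: Optional[KindOfAssistance] = None

    def __post_init__(self) -> None:
        # Assumption of this sketch: an empty context is not meaningful.
        if (self.pathological_process is None and self.speciality is None
                and self.kind_of_assistance is None):
            raise ValueError("a context needs at least one criterion")


# A combination of two criteria, and a context with a single criterion.
cardiology_emergency = Context(
    speciality="cardiology",
    kind_of_assistance=KindOfAssistance.URGENT_ASSISTANCE,
)
pregnancy = Context(pathological_process="pregnancy")
print(cardiology_emergency)
```

Making the record immutable and hashable lets a context be used directly as a key when counting accesses or storing pertinence values per context.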

Following these three criteria, we asked different medical
doctors to identify the contexts in each speciality. The resulting
set of contexts was reviewed by different groups of medical doctors
to validate the results for their corresponding speciality.

3.2. Context identification

Once we have the contexts, we need an automatic method to
identify the context in which a medical doctor accesses an EHR.
Through interviews with the medical staff we have identified some
characteristics of their accesses that are important in this
process:

• Speciality of the medical staff, like cardiology, ophthalmology,
internal medicine, emergency, administration, nursing, and so on.
• Position of the medical staff. There are different positions for
each type of personnel. Some examples are: resident (from first to
fifth year), facultative, section manager or head of service for
the medical doctors; management technician, administrative
technician, section manager or head of service for the
administrative personnel; and nurse or nursing supervisor for the
nursing department.
• Type of the medical workstation. The type of terminal used to
perform the access gives a lot of information about the type of
assistance act in which the medical staff is involved. This
attribute has several parts:
– The type of the terminal. This value gives information about the
hardware used (PC, PDA, patient's room terminal, computer attached
to specific equipment such as X-ray machines or ultrasound
scanners, etc.).
– The associated medical unit. Each terminal is associated with a
unit (gynecology, pediatrics, etc.) for management reasons, but
this information also helps to identify the context. For example,
if a cardiologist accesses an EHR from a computer associated with
the emergency unit, the context could be a cardiology emergency.
– Physical location. It helps to narrow down even further the type
of context in which the medical staff is involved. In the previous
example, if the terminal is located in the observation room in
emergencies, the context is different from the case when the access
is performed from the surgery room.
• The kind of the patient's present appointment. For each
appointment with a doctor, the information about its type is
stored. There are around 50 usual types of appointments, like first
visit, checkup, scheduled visit, urgent visit, external emergency,
admission, several types for the different complementary tests and
explorations, inter-consultation, movement between services, and so
on. There are also other types that are less usual or even rare but
are also considered in the system, like radiologic surgery.
• Last visit of the patient. In some cases this information helps
to predict the cause of the next appointment. For example, a
surgical intervention is always followed by a post-surgical
checkup.

To identify the context from these attributes we use a
rule-based system built on the database of past accesses (RADB). To
get the data needed to build it, a question about the context was
added to the normal application (Fig. 2), so that the medical staff
could answer it each time an EHR was accessed. The system showed a
list of contexts, filtered by a very simple criterion such as the
medical speciality, and the doctor chose the one that best fitted
his/her access. This data collection process was active for 6
months and obtained an answer rate of 24.5% (about half a million
records).

With all the information collected, the RIPPER algorithm (Cohen,
1995) was used to build the rule-based system. This is a well-known
algorithm that produces good results, even compared to more recent
proposals. The method basically consists of building a list of
rules iteratively, based on the information gain. Afterwards, the
rule list is pruned, which improves the classification rates on
unseen cases.

To verify that the results of the algorithm can be used, we first
tested it using 10-fold cross-validation. In this process we got an
average classification rate of 89.32%, which can be considered a
good percentage, taking into account the high number of contexts
(classes). Once the quality of the rule set had been verified, we
built a new model, this time using all the records. It yields an
ordered list of nearly two hundred rules, and we search for the
present context by checking this list sequentially and stopping at
the first satisfied rule.
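
The sequential, first-match evaluation of an ordered rule list can be sketched as follows. This is only an illustration of the lookup step, not the learned model: the three rules, the attribute names and the function `identify_context` are hypothetical, and the rule-learning step (RIPPER) is not shown.

```python
from typing import Dict, List, Optional, Tuple

# A rule is (antecedent, context): the antecedent maps attribute names to
# required values; attributes it does not mention are unconstrained.
Rule = Tuple[Dict[str, str], str]

# Hypothetical rules, for illustration only (not taken from the real model).
RULES: List[Rule] = [
    ({"speciality": "cardiology", "unit": "emergency"}, "cardiology emergency"),
    ({"last_visit": "surgical intervention"}, "post-surgical revision"),
    ({"speciality": "cardiology"}, "cardiology diagnostic study"),
]


def identify_context(access: Dict[str, str],
                     rules: List[Rule]) -> Optional[str]:
    """Return the context of the first rule whose antecedent is satisfied."""
    for antecedent, context in rules:
        if all(access.get(attr) == value for attr, value in antecedent.items()):
            return context
    return None  # no rule fired


access = {"speciality": "cardiology", "unit": "emergency", "terminal": "PC"}
print(identify_context(access, RULES))
```

Because the list is ordered, the more specific rule about the emergency unit is checked before the general cardiology rule, so the order of the rules is part of the model and must be preserved.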

Although this may seem a considerable number of rules, we
optimize the search by saving the rules in a database that stores
all the antecedent variables, their corresponding consequents, and
the order of the rules. Hence, obtaining the context takes only a
simple query, and context identification adds no significant
overhead to the access process.
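
One way such a single query could look is sketched below with Python's built-in `sqlite3` module and an in-memory database. The table layout, the column names and the convention that a NULL antecedent means "any value" are assumptions of this sketch; the production system uses Oracle and its actual schema is not described here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE rules (
           rule_order INTEGER PRIMARY KEY,
           speciality TEXT,      -- NULL means "any value"
           unit       TEXT,
           last_visit TEXT,
           context    TEXT NOT NULL
       )"""
)
# Hypothetical rules, for illustration only.
conn.executemany(
    "INSERT INTO rules VALUES (?, ?, ?, ?, ?)",
    [
        (1, "cardiology", "emergency", None, "cardiology emergency"),
        (2, None, None, "surgical intervention", "post-surgical revision"),
        (3, "cardiology", None, None, "cardiology diagnostic study"),
    ],
)

# The first satisfied rule, in rule order, gives the context.
QUERY = """
    SELECT context FROM rules
    WHERE (speciality IS NULL OR speciality = :speciality)
      AND (unit       IS NULL OR unit       = :unit)
      AND (last_visit IS NULL OR last_visit = :last_visit)
    ORDER BY rule_order
    LIMIT 1
"""


def identify_context(access):
    """Return the context of the first matching rule, or None."""
    row = conn.execute(QUERY, access).fetchone()
    return row[0] if row else None


print(identify_context(
    {"speciality": "cardiology", "unit": "emergency", "last_visit": "checkup"}
))
```

The `ORDER BY rule_order LIMIT 1` clause reproduces the "stop at the first satisfied rule" behavior inside the database, so the application issues one query per access instead of iterating over the rules itself.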

4. Pertinence

Once the set of considered contexts is defined it is necessary to
identify the relevant information for each one. The relevance of a
concrete data group for a given context is what we call pertinence:
the more needed or interesting the data group is for the context,
the higher is its pertinence to the context.

There is a great variety of factors to consider when computing
this pertinence, such as:

• The regulations about each clinical process. The information
relevant for each act of some pathologies (not all) is usually
fixed by protocols set by governmental institutions, by the
hospitals or by the medical services.
• The opinion of the individual doctor. In addition to the
regulations, each doctor can consider that, from his/her point of
view, there are other items that must also be taken into account.
• The history of the individual patient. Some data groups without
significance for the majority of the patients may have a great and
special importance for a given patient.
• The aging of the information. With time some tests lose their
validity, because they are too old or because there are newer tests
of the same type.
• The access patterns. The medical staff may start to access a
specific data group frequently in a given situation without being
aware of it, so they do not include it through any of the previous
ways. Taking this aspect of the pertinence into account, new access
patterns can be discovered and the system can automatically adapt
to them.

Fig. 4. Behavior of the time pertinence PT(X), according to different values of B.

The system must be capable of representing and bringing together
all these aspects of the pertinence. The following sections present
a method to capture and calculate each of them.

4.1. Static pertinence: regulations, doctors and patients

By static pertinence we mean the pertinence set by medical
criteria or given by the medical staff. We consider three types of
static pertinence, corresponding to three of the aspects mentioned
above.

First, there are the medical criteria and regulations that
determine which information must always be taken into account for a
given process or pathology.

Second, there are the personal opinions or even research studies
of the doctors, which lead them to find a specific information item
especially relevant for all the patients they see in a given
context.

Third, a specific data group may be particularly important for a
given patient but not for the rest.

Hence, we include in the system three degrees of pertinence
defined by doctors or medical criteria: one associated with the
regulations ($PR_{Dc} \in [0, 1]$), another one related to the
personal opinion of the doctor ($PC_{Dc} \in [0, 1]$), and a third
one associated with the specific patient ($PP_{Dc} \in [0, 1]$).

4.2. Time pertinence

The pertinence of a data group (and, implicitly, of its document)
also depends on the date of creation. It is logical that the
results of an analysis are more important if it was completed a few
days ago than if it was performed a year ago.

However, the influence of age is not the same for all document
types: some types of analysis may be valid for several months while
others are valid for years.

Hence we propose to modify the pertinence of the document
depending on an established age threshold in months. If the
document is younger than the threshold we want the time pertinence
to be high (greater than 0.7). If it is older, we want the time
pertinence to decrease and give a low value.

To fulfill this restriction we propose the following definition.

Definition 1. Let D be a document and A its age (expressed in
months). We calculate the time pertinence of document D as

$$PT(D) = e^{-\frac{\log_B(A)}{e}} \qquad (1)$$

where $B \in\, ]1, +\infty[$ is a parameter defining the strength of
the decrease. Fig. 4 shows the behavior of the function for
different values of B. Note that B is the point where the function
first takes a value under 0.7 (the high-pertinence threshold): for
$A = B$ the exponent is $-1/e$, so $PT(D) \approx 0.69$.
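
A small numerical sketch may help to see how Eq. (1) behaves. It assumes the reading $PT(D) = e^{-\log_B(A)/e}$, which is consistent with the statement that the value drops just below 0.7 when the age equals B; the function name `time_pertinence` and the choice B = 6 in the loop are hypothetical. Under this reading the value is exactly 1 at one month and exceeds 1 for documents younger than one month, and it is undefined for an age of 0; the sketch leaves the first case uncapped and rejects the second.

```python
import math


def time_pertinence(age_months: float, b: float) -> float:
    """PT(D) = exp(-log_B(A) / e), with A the document age in months."""
    if b <= 1:
        raise ValueError("B must be greater than 1")
    if age_months <= 0:
        raise ValueError("the age must be positive")
    return math.exp(-math.log(age_months, b) / math.e)


# Pertinence of documents of increasing age for a 6-month threshold.
for age in (1, 3, 6, 12, 24):
    print(age, round(time_pertinence(age, b=6), 3))
```

A system using this function would choose B per document type, for example a few months for analyses that must be repeated often and several years for documents that stay valid longer.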
4.3. Dynamic pertinence

In most situations the doctors will not give a static pertinence
for a document or a patient, so the pertinences need to be learned.
We propose to do so from the accesses stored in the RADB database.

Due to the great number of records in the RADB database we need a
very efficient process. Since we want the system to be dynamic and
to update the pertinences on-line according to new accesses, the
efficiency requirement is even more important.

As we have mentioned, different methods to calculate the
relevance of preferences can be found in the literature, but their
great complexity makes them unsuitable for our system. This has led
us to propose the new method explained next.

To calculate to pertinence we propose to use an adaptation of
the Vector Space Model Salton, Wong, and Yang (1975). This tech-
nique comes from the Documentary Computing, concretely from
the automatic indexation methods and retrieval systems Gil-Leyva
and Rodríguez-Muñoz (1966). It is used to determine which
descriptors are more specific or discriminate better between
documents.

The discrimination value classifies terms in the text according
to their capability to distinguish some documents from others in
a given collection; i.e., the discrimination value of a term depends
on how the average distance between the documents changes
when a content identification is set for the term. Therefore, the
best words are those resulting in a higher distance.

The basic idea of this model lies in the construction of a matrix
or table of information items and documents, where the rows are
the terms and the columns correspond to the documents accessed.

The rows correspond to the terms, which are expressed
according to the occurrences (access frequency) of each
information item.

Applying it to our case, we consider as documents (columns)
the possible Contexts and as terms (rows) the data groups inside
the documents. Hence, the table with the access frequencies will
be like the one shown in Fig. 5, where $tf_{ij}$ represents the number
of accesses to the data group i in the context j, and

$$tf_j = \sum_{i=1}^{N} tf_{ij} \qquad (2)$$

gives the total number of accesses for context j.


Fig. 5. Frequency table for data groups and Contexts.

Fig. 6. Evolution of the influence according to the distance in days to the reference
date.

However, in our situation this is not enough, since we need the
recent accesses to have a higher influence than older ones when
calculating the pertinence. This is why we propose to measure the
relevance according to time with the weight function shown in
Fig. 6. Let DR be a reference date used to consider the information
relevant or not for the system, and let DA be the access date. In
that case, we propose the following function to calculate the
weight for a given date (DA):

$$W(D_A) = 2^{\frac{D_A - D_R}{365}} \qquad (3)$$

where the date difference is calculated in days. This way an access
made today will have more influence than the accesses of the last
year but, as time goes by, less influence than future accesses.
The equation establishes that, given two accesses one year apart,
the newer one will have double the influence of the older
one.
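A minimal Python sketch of Eq. (3) follows. The function name is chosen for this illustration; the reference date DR = 01/01/2008 in the usage note is the one stated in the caption of Table 2.

```python
from datetime import date


def access_weight(access_date: date, reference_date: date) -> float:
    """Eq. (3): W(DA) = 2 ** ((DA - DR) / 365), date difference in days."""
    days = (access_date - reference_date).days
    return 2.0 ** (days / 365)
```

With DR = 01/01/2008, an access on 22/01/2008 gets a weight of about 1.04, and an access exactly 365 days after another one gets twice its weight.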

This definition introduces two important and useful
capabilities in the system:

• The pertinences will be updated according to the aging of the
accesses and their decreasing relevance.
• The system will adapt automatically to future access
patterns and needs, which can even allow us to define new
contexts.

The system will thus update the pertinence of the data groups,
giving more influence to the newer accesses, and will be able
to adapt to future needs.

As shown in Fig. 6, the influence is an increasing function. This
may introduce some problems of representation or loss of preci-
sion when the values are very high. The definition of the influence
that we have proposed allows us to avoid this problem in an easy
way. We only have to move the reference date (DR) and adapt the
stored values to get an easy and quick adaptation: if we add one
year to the reference date and divide all the values by 2, we get
the same frequency and we reduce the magnitude of the stored
values.

We can repeat this process as many times as needed, as long as
the final reference date is previous to the next access to be stored.
The proposed system will do this each new year, keeping the reference
date one year before the current date. The update
only needs to change the reference date (one update statement)
and the stored values (a set of very simple update statements),
which takes just a short time.
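The reason why this shift is safe can be written out from Eq. (3). Writing $W'$, $tf'_{XC}$ and $tf'_{C}$ (notation introduced only for this note) for the values obtained after moving the reference date forward by 365 days, every stored weight is halved and the ratio used by the pertinence is unchanged:

```latex
W'(D_A) = 2^{\frac{D_A - (D_R + 365)}{365}}
        = 2^{\frac{D_A - D_R}{365} - 1}
        = \frac{W(D_A)}{2},
\qquad
\frac{tf'_{XC}}{tf'_{C}} = \frac{tf_{XC}/2}{tf_{C}/2} = \frac{tf_{XC}}{tf_{C}}
```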

Now we have the frequency, we propose an adaptation of the
inverse document frequency of the Vector Space Model (Salton et
al., 1975) to measure the pertinence of a data group to a context,
based on the information stored in the RADB database.

Definition 2. Let C be a context and X a data group. The
retrospective pertinence is

$$P^{C}_{R}(X) = \left( \frac{tf_{XC}}{tf_{C}} \right)^{1/4} \qquad (4)$$

The idea behind this pertinence is to consider relevant a data
group if the number of accesses to it is high in comparison to the
total number of accesses.
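As a quick check of Eq. (4), the following Python sketch recomputes some dynamic pertinences for context C1 from the accumulated weights reported in Table 3. The function name and the treatment of an empty context (returning 0) are assumptions of this sketch.

```python
def retrospective_pertinence(tf_xc: float, tf_c: float) -> float:
    """Eq. (4): (tf_XC / tf_C) ** (1/4)."""
    if tf_c <= 0:
        return 0.0  # assumption: no recorded accesses, no evidence
    return (tf_xc / tf_c) ** 0.25


# Accumulated weights for context C1, copied from Table 3.
acum_c1 = {"g1": 5.60, "g2": 3.42, "g5": 54.33, "g6": 13.24, "g7": 55.53}
total_c1 = 176.80

pertinence_c1 = {
    group: round(retrospective_pertinence(weight, total_c1), 2)
    for group, weight in acum_c1.items()
}
```

The fourth root compresses the range: g1 accounts for only about 3% of the accumulated weight in C1 but still obtains a pertinence of 0.42.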
4.4. Global pertinence

Once we have considered the different aspects of the pertinence
and defined a way to compute them, we need to aggregate
the information given by them into a single value. To do it, we obtain
a global pertinence of a data group to a given context as in the
next definition.

Definition 3. Let X be a group of data in a document D, and C a
context. We define the global pertinence of X to C as

$$P^{C}_{G}(X) = \left( P^{R}_{Dc}(X) \oplus P^{C}_{Dc}(X) \oplus P^{P}_{Dc}(X) \oplus P^{C}_{R}(X) \right) \otimes P_T(D) \qquad (5)$$

where

• $P^{R}_{Dc}(X)$ is the pertinence set by medical doctors according to the
regulations,
• $P^{C}_{Dc}(X)$ is the pertinence set by medical doctors for the data
group to the context under their personal point of view,
• $P^{P}_{Dc}(X)$ is the pertinence set by medical doctors for the data
group to a given patient,
• $P^{C}_{R}(X)$ is the retrospective pertinence according to prior accesses,
• $P_T(D)$ is the pertinence considering the age of the document,
• $\oplus$ and $\otimes$ are a t-conorm and a t-norm, respectively.


Fig. 7. Scheme of the contextualized query process.

Fig. 8. Scheme for pertinences update process.

For the system we have chosen the maximum and the minimum
as t-conorm and t-norm, respectively, because of their simplicity
and therefore efficient and fast calculation, and because they are
widely used.
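With maximum and minimum as the chosen operators, Eq. (5) reduces to a very small computation. The sketch below uses hypothetical parameter names; the values in the usage note are taken from the worked example in Section 6.

```python
def global_pertinence(p_regulation: float, p_doctor: float, p_patient: float,
                      p_retrospective: float, p_time: float) -> float:
    """Eq. (5) with max as the t-conorm and min as the t-norm."""
    combined = max(p_regulation, p_doctor, p_patient, p_retrospective)
    return min(combined, p_time)
```

For instance, a data group with a patient-specific pertinence of 0.70, a retrospective pertinence of 0.69 and a time pertinence of 0.79 obtains a global pertinence of 0.70, whereas one with a retrospective pertinence of 0.75 in a document whose time pertinence is 0.73 is limited to 0.73.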


5. Contextualized access system

Now that we have all the elements needed to implement the
contextualized access to the EHR, we show how we propose to provide
this access by presenting the use of the proposed method, as well as
the update process that allows the system to adapt automatically to
new needs.

5.1. Access to the system

A scheme of the access process is shown in Fig. 7.

• The doctor starts the process by logging in to the system.
• Using the information about the terminal and the schedule of
the doctor, the system gets the context for this access using
the simple rule system.
• The doctor identifies the patient in the system to access his/her
EHR.
• The system gets the EHR and queries the static and dynamic
pertinences for all the data groups that appear in it.
The result of aggregating these pertinences and the time
pertinence, as shown in Eq. (5), is used to order the data.
• Finally, the system selects the first data groups and returns
them to the doctor ordered by priority, as well as a way to access
the other data groups if the doctor needs them.
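The last two steps above (ordering by global pertinence and returning the first data groups) can be sketched as follows. The function name and the `Doc.group` labels are illustrative assumptions; the pertinence values are the global pertinences of Table 5 in Section 6, and the block size of 4 is the one used in that example.

```python
def rank_in_blocks(pertinences: dict, block_size: int) -> list:
    """Sort data groups by descending global pertinence and split the
    ranking into consecutive blocks (preference levels)."""
    ranked = sorted(pertinences, key=pertinences.get, reverse=True)
    return [ranked[i:i + block_size]
            for i in range(0, len(ranked), block_size)]


# Global pertinences of patient 1's data groups for context C1 (Table 5).
patient1_c1 = {
    "Doc1.g1": 0.42, "Doc3.g7": 0.75, "Doc4.g5": 0.74, "Doc5.g2": 0.37,
    "Doc5.g3": 0.69, "Doc5.g4": 0.42, "Doc7.g6": 0.52,
}
blocks = rank_in_blocks(patient1_c1, block_size=4)
```

The first block is what the doctor sees immediately; the remaining blocks are shown only on request. Ties (here the two groups with 0.42) keep their input order in this sketch, which is a simplification.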

5.2. Update process

The system is updated on each access, so the pertinences are
adapted continually to reflect doctors' needs. In this process no
manual intervention is needed and only a few records are changed, so
it takes a short time to execute. The update process is as
follows:

• When the doctor logs in to the system, the context of the access is
calculated as mentioned above.
• The doctor asks for a data group of a specific patient.
• The system then gets the required data and returns them to the
doctor. At the same time the system logs the access in the RADB
table and updates the frequency table used to obtain the
retrospective pertinence with this new access. Only two records are
changed: the accesses to the data group in this particular
context ($tf_{ij}$) and the total number of accesses to the context ($tf_j$).

A scheme summarizing the process is shown in Fig. 8.
After every access to the system, the dynamic pertinence is
automatically updated, so the next time a context is accessed the
pertinence of the data groups to the context is computed considering
the most up-to-date information. As can be seen, this is done at a
very low computational cost. In addition, this updating process
allows us not only to give the most up-to-date information, but also to
discover new patterns of access, which gives us the chance to define
new contexts.
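The update step can be illustrated with a small in-memory stand-in for the frequency table. The class and method names are hypothetical, and a real deployment would keep these counters in the database; the sketch only shows that a single access touches two counters, $tf_{ij}$ and $tf_j$.

```python
from collections import defaultdict
from datetime import date


class FrequencyTable:
    """In-memory stand-in for the frequency table (hypothetical helper)."""

    def __init__(self, reference_date: date):
        self.reference_date = reference_date  # DR in Eq. (3)
        self.tf = defaultdict(float)          # (data group, context) -> tf_ij
        self.total = defaultdict(float)       # context -> tf_j

    def record_access(self, data_group: str, context: str, when: date) -> float:
        """Log one access: only tf_ij and tf_j change."""
        weight = 2.0 ** ((when - self.reference_date).days / 365)  # Eq. (3)
        self.tf[(data_group, context)] += weight
        self.total[context] += weight
        return weight

    def retrospective_pertinence(self, data_group: str, context: str) -> float:
        """Eq. (4), read from the current counters."""
        total = self.total.get(context, 0.0)
        if total == 0.0:
            return 0.0  # assumption: no accesses recorded yet
        return (self.tf.get((data_group, context), 0.0) / total) ** 0.25
```

Because the pertinence is always derived from the counters when it is queried, no separate recomputation pass is needed after an access.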


Table 1
Documents and data groups contained on each of them. B is the parameter to define
the decreasing strength.

Code Document types Data groups B

DT1 Electrocardiogram g1 3
DT2 Blood Analysis Coagulation (g2) 3
DT2 ’’ Immunology (g3) 3
DT2 ’’ Biochemistry (g4) 3
DT3 Discharge report g5 24
DT4 Thorax radiography g6 6
DT5 Surgery report g7 24

6. Example

In this section we show some examples to clarify the proposed
method. Here we present two contexts and a set of five different
document types with several data groups, and indicate how we compute
their pertinence to each context. In a real case the number
of contexts, as well as the number of documents and data groups,
will be considerably higher, as shown in Sections 3.1 and
2. With these very simplified examples our aim is just to clarify
and show the correspondence between the formal notions presented
here and the medical terminology. The set of documents
and data groups is shown in Table 1, and only two contexts are
selected:

• C1: Emergency after catheterization surgery.
• C2: Traumatology pre-surgery appointment.

In Table 2 the previous accesses are shown (a simplification of
the RADB), and the associated frequency table is shown in
Table 3.

Table 2
Recorded accesses to data groups in EHRs (W(D) is the weight of the access according to
the date, as shown in Fig. 6, considering DR = 01/01/2008).

Date Context Data group W(D)

22/01/08 c1 g7 1.04
16/01/08 c1 g6 1.03
16/01/08 c2 g6 1.03
24/02/08 c1 g7 1.11
03/02/08 c1 g7 1.06
22/02/08 c1 g6 1.10
20/03/08 c2 g1 1.16
03/03/08 c1 g7 1.12
26/03/08 c2 g1 1.18
19/04/08 c1 g7 1.23
04/04/08 c1 g7 1.20
21/04/08 c1 g6 1.23
14/05/08 c1 g3 1.29
01/05/08 c2 g6 1.26
22/05/08 c2 g6 1.31
01/06/08 c1 g7 1.33
27/06/08 c2 g2 1.40
03/06/08 c2 g3 1.34
25/07/08 c1 g6 1.48
25/07/08 c1 g7 1.48
11/07/08 c1 g7 1.44
16/08/08 c1 g7 1.54
17/08/08 c2 g2 1.54
26/08/08 c2 g6 1.57
05/09/08 c2 g1 1.60
12/09/08 c1 g3 1.62
08/09/08 c2 g1 1.61
12/10/08 c1 g6 1.72
11/10/08 c2 g6 1.71
23/10/08 c2 g2 1.75
06/11/08 c1 g3 1.80
04/11/08 c2 g7 1.79
03/11/08 c1 g3 1.79
18/12/08 c1 g7 1.95
16/12/08 c1 g3 1.94
28/12/08 c1 g7 1.99
24/01/09 c1 g7 2.09
08/01/09 c1 g7 2.03
24/01/09 c1 g7 2.09
19/02/09 c1 g3 2.20
19/02/09 c1 g1 2.20
25/02/09 c1 g3 2.22
01/03/09 c1 g5 2.24
21/03/09 c1 g3 2.33
08/03/09 c2 g1 2.27
01/04/09 c2 g7 2.38
07/04/09 c2 g3 2.40
25/04/09 c1 g7 2.49
24/05/09 c1 g4 2.63
21/05/09 c1 g7 2.61
26/05/09 c2 g3 2.64
26/06/09 c1 g7 2.80
18/06/09 c2 g4 2.76
01/06/09 c2 g1 2.67
17/07/09 c1 g3 2.91
23/07/09 c2 g6 2.95
06/07/09 c1 g5 2.85
08/08/09 c2 g1 3.04
28/08/09 c1 g5 3.15
22/08/09 c1 g4 3.12
18/09/09 c1 g3 3.28
13/09/09 c1 g7 3.25
07/09/09 c1 g3 3.22
02/10/09 c2 g6 3.37
10/10/09 c1 g2 3.42
07/10/09 c1 g1 3.40
06/11/09 c1 g7 3.60
08/11/09 c1 g5 3.62
05/11/09 c1 g7 3.60
06/12/09 c1 g5 3.81
17/12/09 c1 g7 3.90
19/12/09 c1 g5 3.91
18/01/10 c1 g5 4.14
10/01/10 c1 g5 4.08
07/01/10 c1 g5 4.05
25/02/10 c1 g3 4.45
14/02/10 c2 g6 4.36
04/02/10 c2 g2 4.27
15/03/10 c2 g4 4.60
16/03/10 c1 g7 4.61
28/03/10 c1 g5 4.72
20/04/10 c2 g3 4.93
01/04/10 c1 g3 4.75
28/04/10 c2 g2 5.00
09/05/10 c1 g3 5.11
09/05/10 c2 g4 5.11
26/05/10 c2 g2 5.28
07/06/10 c2 g2 5.40
24/06/10 c1 g5 5.58
17/06/10 c2 g2 5.50
11/07/10 c2 g5 5.76
06/07/10 c2 g2 5.71
28/07/10 c1 g7 5.95
16/08/10 c1 g5 6.17
03/08/10 c2 g2 6.02
02/08/10 c1 g5 6.01

Table 3
Frequency table.

Document type  Data group  Acum (C1)  Acum (C2)  Frequency (C1)  Frequency (C2)  $P^{C}_{C}$ (C1)  $P^{C}_{C}$ (C2)

DT1 g1 5.60 13.53 0.03 0.13 0.42 0.60
DT2 g2 3.42 41.88 0.02 0.39 0.37 0.79
’’ g3 38.93 11.31 0.22 0.11 0.69 0.57
’’ g4 5.75 12.47 0.03 0.12 0.42 0.58
DT3 g5 54.33 5.76 0.31 0.05 0.74 0.48
DT4 g6 13.24 17.56 0.07 0.16 0.52 0.64
DT5 g7 55.53 4.17 0.31 0.04 0.75 0.44

Total 176.80 106.68

In the next section we present in detail two examples for two
different patients.

6.1. Examples of retrieval

6.1.1. Patient 1
In the first example we consider a patient who recently had
catheterization surgery. The last time he came to the hospital was for
the post-surgery review, but now he is feeling really bad (severe
chest pain). He comes to the Emergency service and a fifth-year
resident doctor (R5) is going to examine him. Now
they are in the emergency room and the doctor wants to access
the patient's EHR.

Table 4
Electronic health record of patient 1.

Document Creation date Document type

1 01/08/2010 DT1
2 07/03/2008 DT1
3 31/07/2010 DT5
4 02/08/2010 DT3
5 01/08/2010 DT2
6 15/06/2010 DT2
7 15/06/2010 DT4

The first stage of the approach is to identify the access context.
According to the information about the attributes used to identify
the context, the first rule that is satisfied is:

IF Speciality = Cardiology AND
   Last_visit = "catheterization post-surgery review" AND
   Unit = "Emergency"
THEN Context = "Emergency after catheterization surgery" (C1)

The electronic health record for this patient is in Table 4. Note
that the EHR has several documents, and some of them are of
the same type but of different ages. We now consider an access
from each of the two contexts mentioned above, and show the
pertinent data groups in each case.

The first step is always to keep just the newest document of each
type. Hence, the documents selected to work with are 1, 3, 4, 5, and
7, whereas documents 2 and 6 are discarded.

With this subset of documents we calculate the different
pertinences of each data group to the context C1. Table 5 shows
the values for each type and the global pertinence in the column
$P^{C1}_{G}$, applying Definition 3.

Table 5
Pertinences for patient 1's EHR for context 1.

Doc.  Date  Document type  Data group  $P^{C1}_{Dc}$  $P^{P}_{Dc}$  $P_T$  $P^{C1}_{C}$  $P^{C1}_{G}$  Rank

1 01/08/10 DT1 g1 0.00 0.00 0.79 0.42 0.42 6
3 31/07/10 DT5 g7 0.00 0.00 0.92 0.75 0.75 1
4 02/08/10 DT3 g5 0.00 0.00 0.92 0.74 0.74 2
5 01/08/10 DT2 g2 0.00 0.00 0.79 0.37 0.37 7
’’ ’’ ’’ g3 0.00 0.00 0.79 0.69 0.69 3
’’ ’’ ’’ g4 0.00 0.00 0.79 0.42 0.42 5
7 15/06/10 DT4 g6 0.00 0.00 0.75 0.52 0.52 4

The next step is to order the data groups according to the global
pertinence values. The last column of the table collects the ranking.
If we make blocks of 4 data groups to create levels of preference,
the result for the query will include the data groups Doc3.g7,
Doc4.g5, Doc5.g3, and Doc7.g6, whereas Doc5.g4 and Doc1.g1 would
be in the next block of data groups. The number of considered data
groups (the size of the blocks) may change according to, for example,
the capacities of the medical workstations (on a PDA four data
groups may be enough because of the limited screen, but on
a PC the number could be greater).

Before returning this set to the user, we refine the answer to
determine whether in some concrete case it is preferable to show the
entire document instead of its pertinent data groups. We do so if more
than a given percentage of the data groups inside a document are
selected in the answer. In such a case the document takes the rank of
its most pertinent data group. As an example, if the percentage is
set to 60%, the documents Doc3, Doc4 and Doc7, each of which contains
only one data group that has been found pertinent, will replace
their pertinent data groups in the solution. However, Doc5 has three
data groups, of which only one has been found pertinent. Therefore,
in the final solution this data group will be kept. This way, the
initial solution {Doc3.g7, Doc4.g5, Doc5.g3, Doc7.g6} will be replaced
by {Doc3, Doc4, Doc5.g3, Doc7}.

Finally, we would like to remark that in the real system we
establish several preference levels by grouping the pertinent data
groups into sets of a fixed size, 10. If the information needed does
not appear within the first block of 10 data groups, the next set is
shown, and this process is repeated until the information is found.
In addition, at any moment the user can access the complete EHR
through the traditional navigation system.

Table 6
Pertinences for patient 1's EHR for context 2.

Doc.  Date  Document type  Data group  $P^{C2}_{Dc}$  $P^{P}_{Dc}$  $P_T$  $P^{C2}_{C}$  $P^{C2}_{G}$  Rank

1 01/08/10 DT1 g1 0.00 0.00 0.42 0.60 0.60 3
3 31/07/10 DT5 g7 0.00 0.00 0.75 0.44 0.44 7
4 02/08/10 DT3 g5 0.00 0.00 0.74 0.48 0.48 6
5 01/08/10 DT2 g2 0.00 0.00 0.37 0.79 0.79 1
’’ ’’ ’’ g3 0.00 0.00 0.69 0.57 0.57 5
’’ ’’ ’’ g4 0.00 0.00 0.42 0.58 0.58 4
7 15/06/10 DT4 g6 0.00 0.00 0.52 0.64 0.64 2

Table 7
Electronic health record for patient 2.

Document Creation date Document type

1 01/09/2010 DT4
2 01/09/2010 DT1
3 25/08/2010 DT2
4 01/07/2010 DT3
5 15/06/2009 DT2
6 20/06/2009 DT5

Table 10
Updated frequency table for access 1.

Document type  Data group  Acum  $P^{C1}_{C}$

DT1 g1 5.6 0.42
DT2 g2 3.42 0.37
’’ g3 38.93 0.68
’’ g4 5.75 0.42
DT3 g5 54.33 0.74
DT4 g6 13.24 0.52
DT5 g7 62.41 0.76

Total 183.68

To show the differences, let us suppose that the appointment takes
place in a different situation and the inferred context is C2. In that
case the pertinences are different, as shown in Table 6. The data
groups pertinent for this answer would be the set {Doc5.g2, Doc7.g6,
Doc1.g1, Doc5.g4}. In that case two of the three data groups in
document 5 are pertinent. Therefore, the refined set would be
{Doc5, Doc7, Doc1.g1}.
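The refinement rule used in the Patient 1 example for context C1 (replace the data groups of a document by the whole document when more than a given share of its groups was selected) can be sketched as follows. The function name, the argument layout and the default threshold of 0.6 are choices made for this illustration only.

```python
def refine_answer(selected, groups_per_document, threshold=0.6):
    """selected: ranked list of (document, data group) pairs.
    groups_per_document: number of data groups each document contains.
    Returns the ranked answer, with whole documents where applicable."""
    chosen = {}
    for doc, _ in selected:
        chosen[doc] = chosen.get(doc, 0) + 1

    refined = []
    for doc, group in selected:
        share = chosen[doc] / groups_per_document[doc]
        if share > threshold:
            # The whole document is shown once, at the rank of its
            # most pertinent data group.
            if doc not in refined:
                refined.append(doc)
        else:
            refined.append(f"{doc}.{group}")
    return refined


answer = refine_answer(
    [("Doc3", "g7"), ("Doc4", "g5"), ("Doc5", "g3"), ("Doc7", "g6")],
    {"Doc3": 1, "Doc4": 1, "Doc5": 3, "Doc7": 1},
)
```

Here Doc5 keeps its single selected group because one of three groups is below the 60% threshold, while the single-group documents are returned whole.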
Table 11
Updated frequency table for access 2.

Document type  Data group  Acum  $P^{C2}_{C}$

DT1 g1 20.41 0.65
DT2 g2 41.88 0.78
’’ g3 11.31 0.56
’’ g4 12.47 0.58
DT3 g5 5.76 0.47
DT4 g6 17.56 0.63
DT5 g7 4.17 0.44

Total 113.57
6.1.2. Patient 2
This patient has a fractured rib that needs surgery. Most of
the preparations have been done (pre-surgery analyses such as a blood
test and a cardiogram), and now he is at the doctor's office to check
that everything is right for the surgery. The doctor is a trauma
specialist in his/her office, accessing the EHR for the analysis results.
The EHR for this patient is shown in Table 7. The patient has a
chronic disease that keeps his defenses very low all the time.
In this case we consider that, by medical recommendation, one
data group (the defenses analysis inside the blood test) is
especially important for that patient: the data group g3 in document
type DT2. Hence, for this data group we consider the static pertinence
given by the doctor for this patient, with the value
$P^{P}_{Dc} = 0.7$, which means that it is especially
important for this patient independently of the access context.

As in the previous case, we identify the context using the rule
base stored in the system. The first rule that is satisfied in this case
is the next:

IF Speciality = Trauma AND
   Present_visit = "pre-surgery"
THEN Context = "trauma pre-surgery review" (C2)

Under this assumption, an access from context C2 will result in
the set of pertinences shown in Table 8. As in the previous exam-
ple, we have first selected only one document of each type, consid-
ering the newer one if there are several for a type.
Table 8
Pertinences of patient 2’s EHR for context 2.

Doc.  Date  Document type  Data group  $P^{C2}_{Dc}$  $P^{P}_{Dc}$  $P_T$  $P^{C2}_{C}$  $P^{C2}_{G}$  Rank

1 01/09/10 DT4 g6 0.00 0.00 1.00 0.64 0.64 3
2 01/09/10 DT1 g1 0.00 0.00 1.00 0.60 0.60 4
3 25/08/10 DT2 g2 0.00 0.00 0.79 0.79 0.79 1
’’ ’’ ’’ g3 0.00 0.70 0.79 0.57 0.70 2
’’ ’’ ’’ g4 0.00 0.00 0.79 0.58 0.58 5
4 01/07/10 DT3 g5 0.00 0.00 0.88 0.48 0.48 6
6 20/06/09 DT5 g7 0.00 0.00 0.73 0.44 0.44 7

Table 9
Pertinences for patient 2’s EHR for context 1.

Doc.  Date  Document type  Data group  $P^{C1}_{Dc}$  $P^{P}_{Dc}$  $P_T$  $P^{C1}_{C}$  $P^{C1}_{G}$  Rank

1 01/09/10 DT4 g6 0.00 0.00 1.00 0.52 0.52 4
2 01/09/10 DT1 g1 0.00 0.00 1.00 0.42 0.42 5
3 25/08/10 DT2 g2 0.00 0.00 0.79 0.37 0.37 7
’’ ’’ ’’ g3 0.00 0.70 0.79 0.69 0.70 3
’’ ’’ ’’ g4 0.00 0.00 0.79 0.42 0.42 6
4 01/07/10 DT3 g5 0.00 0.00 0.88 0.74 0.74 1
6 20/06/09 DT5 g7 0.00 0.00 0.73 0.75 0.73 2
In this access, as Table 8 shows, the pertinent data groups found
are {Doc3.g2, Doc3.g3, Doc1.g6, Doc2.g1}. If we refine the
answer considering the complete documents, the returned set
would be {Doc3, Doc1, Doc2}. In this answer the static pertinence
has been very important. Without this information the final
pertinence for Doc3.g3 would be 0.57 and, from document 3, only the
data group g2 would have been returned.

If the access to patient 2's EHR were made from context C1,
the result would be different. Table 9 collects the pertinence
values. Following the same process as in the previous example, the
answer of the system would be {Doc4, Doc6, Doc3.g3, Doc1}.

6.2. Example of update

In the previous sections we have shown the answer of the
system when a doctor accesses an EHR. Once the doctor selects a
document the system updates automatically the dynamic perti-
nences. In this section we show two examples of these updates.

First, suppose that the system returns a set of documents and
the user selects one of these documents. Considering patient
1 and context C1, let us assume that Doc3.g7 is chosen on 27
September 2010. The frequency table is updated and all the
pertinences for this context are updated too. According to the date, the
weight W(D) for Doc3.g7 is 6.68, and the new frequency table (with
the corresponding dynamic pertinences) is shown in Table 10. The
values that have changed are in italic. In this case two pertinences
have changed, and they will be taken into account in future
accesses.

If the user selects a document that is not in the data set, for
example Doc2.g1 in C2, the process is similar and the result is
shown in Table 11. In this case the pertinences of five of the data
groups change. The pertinence of the selected data group
(Doc2.g1) has increased to reflect the access to this document,
and the other four have decreased.
7. Implementation

As we have mentioned before, in this kind of system
efficiency is very important. In this section we briefly describe the
implementation of our proposal and analyze the efficiency of
the final system.

Next we explain the implementation of the two most relevant
elements in the system, the rule base used to identify the context
and the frequency table used to select its pertinent documents, as well
as the processes needed to access them.

7.1. Rule base

Each time a hospital’s staff member accesses the EHR sys-
tem, the first step is to identify the context of the access. As
mentioned in Section 3.2, the rule base and the process to se-
lect the context have been implemented inside the data base
so with just a single, simple and fast query the answer to this
question is obtained.

The rule base is implemented using one table inside the database.
Each record represents a rule, and we store the order learnt
by RIPPER, the values for each attribute presented in Section 3.2,
and the context. If a rule has no value for an attribute, then a
NULL value is stored. Therefore the table has 9 columns (one for
the rule order, seven for the attributes and one for the context)
and nearly two hundred records.

When an access occurs, the context is identified by scanning the
table looking for the first rule satisfied. The SQL statement that gets
the context in this way uses the 7 attributes of the access to build a
select statement for the Oracle DBMS as follows:

SELECT context FROM (
  SELECT context FROM RuleBase WHERE
    (Speciality='value1' or Speciality IS NULL) and
    (position='value2' or position IS NULL) and. . .
    and (last_visit='value7' or last_visit IS NULL)
  ORDER BY ord ASC)
WHERE ROWNUM<=1

The efficiency of the query is improved defining indexes on the
table over the seven attributes. Therefore the time needed to an-
swer the query is very small.
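The first-match lookup can be tried out with a small self-contained sketch. It uses Python with an in-memory SQLite database instead of Oracle (so `LIMIT 1` replaces the `ROWNUM` filter), only three of the seven attributes, and three made-up rules including a hypothetical catch-all; all of these are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE RuleBase (
    ord INTEGER PRIMARY KEY, speciality TEXT, unit TEXT,
    last_visit TEXT, context TEXT)""")
conn.executemany("INSERT INTO RuleBase VALUES (?, ?, ?, ?, ?)", [
    (1, "Cardiology", "Emergency",
     "catheterization post-surgery review", "C1"),
    (2, "Trauma", None, None, "C2"),     # NULL means "any value"
    (3, None, None, None, "default"),    # hypothetical catch-all rule
])


def find_context(speciality, unit, last_visit):
    """Return the context of the first rule (lowest ord) that matches."""
    row = conn.execute(
        """SELECT context FROM RuleBase
           WHERE (speciality = ? OR speciality IS NULL)
             AND (unit = ? OR unit IS NULL)
             AND (last_visit = ? OR last_visit IS NULL)
           ORDER BY ord ASC LIMIT 1""",
        (speciality, unit, last_visit)).fetchone()
    return row[0] if row else None
```

Storing a missing condition as NULL lets a single query evaluate every rule, and ordering by `ord` preserves the priority in which the rules were learnt.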

7.2. Frequency table

The other table needed is the frequency table, used to know the
pertinent documents for each context. This table is stored in the
database too, with the following attributes:
• DG: internal code for each data group type.
• Context: the context.
• Weight: total weight of all the accesses to this data group type
(DG) in this context (element $tf_{XC}$ in Eq. (4)).
• RP, DcP: the static pertinences (established by regulations and
by doctors, respectively) of the data group to the particular
context.

The primary key is (DG, Context). In the DG domain we include a
new code (−1) that does not represent any data group type. This
value is used to store the aggregation of the weights of all
the data group types in a particular context, that is, the
element $tf_{C}$ in Eq. (4). The number of records stored in the table is
|contexts| × (|DG| + 1).

To improve the queries efficiency we define indexes over both
attributes individually and together (3 indexes). To improve the
query over the table we define clusters on the table (physical
blocks to store related records) considering the context value (all
the records storing the frequency of each data group type in a par-
ticular context are stored physically together).

7.2.1. Query for the pertinent data groups
To know the frequency of the five most pertinent data group
types for a concrete context we only need to execute the following
select statement:

SELECT DG, Weight FROM (
  SELECT DG, Weight FROM FrequencyTable WHERE
    context = C
  ORDER BY Weight DESC)
WHERE ROWNUM<=6
The first row will give the value of $tf_{C}$ (the aggregate record
with code −1 holds the sum of all the other weights, so it is always
the largest) and the next five rows will give the values $tf_{XC}$ for
the five most pertinent data group types. Note that the query is very
simple and needs a very low execution time.

The five most relevant data groups for a concrete patient in a
particular context are obtained using the value of the function
$P^{C}_{G}(X)$ and ordering the data groups according to its value. To
speed up the process we have implemented the function inside the
database using PL/SQL (the Oracle language for stored functions and
procedures) as PGC. With these values the query has to join the
frequency table and the EHR table, execute the function and order the
data groups. Let pid be the ID of the patient, C the related context,
TOTAL the value of $tf_{C}$, and RP, DcP and PP the static pertinences
(regulations, doctor and patient, respectively) of this context for this
particular patient. Then the select query has the following
structure:

SELECT * FROM (
  SELECT EHR.*,
    PGC (FrequencyTable.DcP,FrequencyTable.RP,DPP,
         FrequencyTable.Weight,TOTAL,EHR.Date)
    as Pertinence FROM FrequencyTable,EHR WHERE
    EHR.PID = pid and
    FrequencyTable.Context = C and
    /* JOIN */
    EHR.DG = FrequencyTable.DG
  /* Order */
  ORDER BY Pertinence DESC)
WHERE ROWNUM<=5

With one query to know the values of tfC (very simple) and this
second query we get the pertinent data groups. All the computa-
tion is made inside the database server so this process does not
introduce any computation overload on the terminal.

7.2.2. Update of the frequency table
There are two processes that change the frequency table:

• Each time a data group is accessed (very often).
• Each time the reference date changes (once a year), the stored
frequencies are updated.

The first update is performed by executing just one statement.
Let id be the code for the data group, C the context and W(A) the
weight for this new access. Then the update statement is:

UPDATE FrequencyTable SET Weight = Weight + W(A)

WHERE DG in (id,-1) AND Context = C;

As can be seen, this statement is simple and very fast to execute,
since only two records are modified: the one related to the data
group and the value of $tf_{C}$.

The second update will be done once a year (at most) and it
implies modifying all the records in the frequency table. As we have
mentioned in Section 4.3, when the reference date is moved by one year
the update statement would be:

UPDATE FrequencyTable SET Weight = Weight/2;

Though this update is more time consuming than the previous
one, it is acceptable, since it will only be executed once a year
and it locks just one table for a very simple operation.
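Both updates can be exercised end to end with a short sketch. As before, it uses Python with an in-memory SQLite database rather than Oracle, integer codes for the data groups, and helper names invented for this illustration; the table layout follows the description above, with the code −1 holding the per-context total.

```python
import sqlite3

TOTAL = -1  # special DG code that stores the per-context total (tf_C)

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE FrequencyTable (
    DG INTEGER, Context TEXT, Weight REAL DEFAULT 0,
    PRIMARY KEY (DG, Context))""")
conn.executemany("INSERT INTO FrequencyTable VALUES (?, ?, ?)",
                 [(TOTAL, "C1", 0.0), (3, "C1", 0.0), (7, "C1", 0.0)])


def record_access(dg, context, weight):
    """First update: add the access weight to the data group and the total."""
    conn.execute(
        """UPDATE FrequencyTable SET Weight = Weight + ?
           WHERE DG IN (?, ?) AND Context = ?""",
        (weight, dg, TOTAL, context))


def shift_reference_date_one_year():
    """Second update: moving DR forward one year halves every weight."""
    conn.execute("UPDATE FrequencyTable SET Weight = Weight / 2")


def weights(context):
    """Return {DG: Weight} for one context."""
    return dict(conn.execute(
        "SELECT DG, Weight FROM FrequencyTable WHERE Context = ?",
        (context,)))
```

Since every weight in a context is halved together, the ratios used by Eq. (4) are the same before and after the yearly shift.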
8. Results

To test our proposal we designed the following test. For each
context we selected a set with the most
frequent assistance acts. Then we monitored these accesses
for a month to get the data groups accessed in each case. Next
we performed the accesses using the new system and
counted the number of times that all the information
needed was present within the first selection of pertinent data
groups, the number of occasions in which it was necessary
to access the second set of pertinent data groups, the same with
the third set, and so on. In 81.56% of the accesses all the
information requested was in the first set; in 4.12% of the
cases the information needs were satisfied with the second
set of data groups; 1.91% of the accesses needed to reach the
third set; and in the remaining
12.41% of the cases it was necessary to reach the
fourth block or beyond.

To compare the performance of the proposed system with
the previous one, we have compared the number of "clicks"
required to obtain all the information needed in both systems
(we compare the number of clicks and not the time, because in
navigation-based systems the time depends on the skills of
the user). To do so we have considered that each access to a set
of data groups requires one click in the new system. In the old
system, we have considered that each document accessed needs
one click (to open the document) and that the search for each
document needs at least three clicks (to select the type of assistance
act, the concrete assistance act and the set of documents generated
in it, as shown in Fig. 2). Accordingly, taking as an
example an assistance act that needs data groups
contained in five different documents, with the traditional
navigation system 20 clicks would be needed (four clicks for each of
the five documents), whereas with the
new one, according to the previous percentages, in 81.56%
of the cases only one click is needed. These data alone show
the considerable time saving that the proposed system
offers.
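The comparison can be made slightly more explicit with a rough calculation. It assumes, for illustration only, that the reported percentages apply to this five-document act and that the accesses in the last group need exactly four clicks; since those accesses reached the fourth block or beyond, the second figure is only a lower bound on the mean number of clicks in the new system.

```latex
\text{clicks}_{\text{old}} = 5 \times (3 + 1) = 20,
\qquad
\mathbb{E}[\text{clicks}_{\text{new}}] \;\ge\;
0.8156 \cdot 1 + 0.0412 \cdot 2 + 0.0191 \cdot 3 + 0.1241 \cdot 4
\approx 1.45
```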

Some possible improvements to the system are the following:
• Adapt the size of the sets of data groups shown at each priority
level to the context. In this way, contexts such as a pre-surgery
study, which usually need to access more than 10 data groups,
would show sets of 20 or 30 data groups at each level, whereas
other contexts, such as a simple outpatient consultation, could
keep smaller sets. This would only require adding a column to
the table of contexts.
• As mentioned in Section 3.2, context identification had a
success rate of 89.32%, so it is possible that some of the
accesses beyond the third set were due to a wrong identification
of the context. Since the training was carried out with a
participation of only 24%, repeating it with a higher
participation could improve the identification of the context.
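
As a rough illustration of the first improvement, the sketch below adds a per-context block size to a table of contexts and uses it to page a ranked list of data groups. The table layout, the column name `block_size`, the default of 10 and the context names are assumptions made for this example, not the schema of the implemented system; SQLite is used only to keep the example self-contained.

```python
import sqlite3

# Hypothetical, minimal table of contexts (the real schema is not shown here).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contexts (id INTEGER PRIMARY KEY, name TEXT UNIQUE)")
conn.executemany(
    "INSERT INTO contexts (name) VALUES (?)",
    [("pre-surgery study",), ("outpatient consultation",)],
)

# The proposed change: one extra column holding the block size per context.
conn.execute("ALTER TABLE contexts ADD COLUMN block_size INTEGER NOT NULL DEFAULT 10")
conn.execute("UPDATE contexts SET block_size = 20 WHERE name = 'pre-surgery study'")


def block_size_for(connection, context_name):
    """Return the number of data groups shown per priority level for a context."""
    row = connection.execute(
        "SELECT block_size FROM contexts WHERE name = ?", (context_name,)
    ).fetchone()
    if row is None:
        raise KeyError(context_name)
    return row[0]


def split_into_blocks(ranked_groups, block_size):
    """Split data groups, already ordered by pertinence, into consecutive blocks."""
    return [
        ranked_groups[i:i + block_size]
        for i in range(0, len(ranked_groups), block_size)
    ]


ranked = [f"data_group_{n}" for n in range(1, 46)]  # 45 placeholder data groups
blocks = split_into_blocks(ranked, block_size_for(conn, "pre-surgery study"))
print([len(b) for b in blocks])  # [20, 20, 5]
```

Contexts without an explicit value keep the default block size, so existing behaviour would be unchanged for them under this assumption.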

Finally, we must note that the system is still in its development
and improvement stage, so it is not yet fully deployed.
Nevertheless, the doctors who have tested it have described it as
‘‘very useful’’, ‘‘quite comfortable’’ and, above all, ‘‘really
very time saving’’.

9. Conclusions

In this paper we have proposed a new paradigm for accessing
EHR systems, based on a contextualized access to the information,
in which the information is ordered according to its
preference or relevance for the assistance act or pathologic process
in which the medical staff is involved. It is therefore specifically
designed to improve the availability of information for medical
practice, satisfying the specific needs of the medical staff and
thereby offering faster and more efficient access to the information
that is really relevant for a concrete assistance act, while avoiding
the handling of a huge quantity of information that is superfluous
or unnecessary for it.

We have also presented a method to define the contexts and
to identify them each time a doctor accesses the system, based on a
rule system stored in a database. In addition, we have proposed a
technique to efficiently define the pertinence of data groups to the
context and update it on each new access, so that the system can
automatically adapt to changes in the access patterns and to the
doctors’ new information needs.

In addition, we have shown some details of the implementation
as well as options to improve the accessibility. We have also
shown some examples of how the system works. The results
obtained are quite encouraging, as indicated by the statistics
gathered in the test of the system and by the positive opinion of
the medical staff who have used it.

This proposal opens several new research lines, since it makes
it easier to provide the system with new capabilities. This is the
case of knowledge mobilization, where the contextualized access
overcomes the limitation of mobile devices for performing complex
accesses to large volumes of information: with this access, the
few data really needed for a concrete assistance act are readily
available and can be visualized on this type of device, making
navigation through the EHR unnecessary.

Another example of this advantage is the possibility of providing
the system with new functionalities for research purposes (El
Fadly et al., 2010), just by defining the corresponding contexts. In
addition, it has opened the possibility of providing citizens with a
personalized access to their own medical data which, as Charters
(2009) and Ruland, Brynhi, Andersen, and Bryhni (2008) indicate,
is a growing demand. Moreover, this new access paradigm can
be applied to enable the interoperability between health record
systems, by using the contexts to define the archetypes that
ISO 13606 requires.


Finally, we would like to stress the viability of this proposal,
which is demonstrated by the current development line at the San
Cecilio University Hospital in Granada.

Acknowledgment

The research reported in this paper was partially supported by
the Andalusian Government (Junta de Andalucía) under project
P07-TIC03175 ‘‘Representación y Manipulación de Objetos
Imperfectos en Problemas de Integración de Datos: Una Aplicación a los
Almacenes de Objetos de Aprendizaje’’ and also by the Spanish
Government (Science and Innovation Department) under project
TIN2009-08296. We would also like to thank the medical personnel
participating in the development of the system for their
collaboration.

References

Adams, A., Adams, R., Thorogood, M., & Buckingham, C. (2007). Barriers to the use of
e-health technology in nurse practitioner-patient consultations. Informatics in
Primary Care, 15(2), 103–109.

Bisbal, J., & Berry, D. (2009). An analysis framework for electronic health record
systems. Methods of Information in Medicine, 48(1).

Bobillo, F., Delgado, M., & Gómez-Romero, J. (2008). Representation of context-
dependant knowledge in ontologies: A model and an application. Expert
Systems with Applications, 35, 1899–1908. <http://portal.acm.org/citation.cfm?id=1401266.1401463>.

Bohm, K., Wolf, P., & Krcmar, H. (2010). Context oriented structuring of
egovernment services – An empirical analysis of the information demand of
expatriates in germany. In 43rd Hawaii International Conference on System
Sciences (HICSS), 2010 (pp. 1–10).

Buchholz, T., Hochstatter, I., & Linnhoff-Popien, C. (2007). Distribution strategies for
the contextualized mobile internet. Electronic Commerce Research and
Applications, 6(1), 40–52. <http://www.sciencedirect.com/science/article/B6X4K-4K5HV8M-1/2/90c68f322aa155b9f62aaafa2917c25b>.

Cao, Y., Xu, J., Liu, T.-Y., Li, H., Huang, Y., & Hon, H.-W. (2006). Adapting ranking svm
to document retrieval. In SIGIR’06: Proceedings of the 29th annual international
ACM SIGIR conference on research and development in information retrieval
(pp. 186–193). New York, NY, USA: ACM.

Cayir, S., & Nuri Basoglu, A. (2008). Information technology interoperability
awareness: A taxonomy model based on information requirements and
business needs. In Portland International Conference on Management of
Engineering Technology, 2008, PICMET 2008 (pp. 846–855).

Chaker, H., Chevalier, M., Soule-Dupuy, C., & Tricot, A. (2010). Improving
information retrieval by modelling business context. In Third international
conference on advances in human-oriented and personalized mechanisms,
technologies and services (CENTRIC), 2010 (pp. 117–122).

Charters, K. (2009). Challenges of electronic medical record extracts for a personal
health record. Studies in Health Technology and Informatics, 146, 197–201.

Cho, I., Kim, J., Kim, J., Kim, H., & Kim, Y. (2010). Design and implementation of a
standards-based interoperable clinical decision support architecture in the
context of the korean ehr. International Journal of Medical Informatics, 79(9),
611–622.

Cho, I., Staggers, N., & Park, I. (2010). Nurses’ responses to differing amounts and
information content in a diagnostic computer-based decision support
application. Computers, Informatics, Nursing: CIN, 28(2), 95–102.

Chu, W., Johnson, D., & Kangarloo, H. (2000). A medical digital library to support
scenario and user-tailored information retrieval. IEEE Transactions on
Information Technology in Biomedicine, 4(2), 97–107.

Cohen, W. W. (1995). Fast effective rule induction. In Proceedings of the twelfth
international conference on machine learning (pp. 115–123). Morgan Kaufmann.

Collins, B., & Speedie, S. (2008). Attaching context sensitive infobuttons to an ehr:
Options and issues. In AMIA annual symposium proceedings (p. 914).

Data exchange standard (2011). <http://www.hl7.org>.

Del Fiol, G., & Haug, P. (2009). Classification models for the prediction of clinicians’
information needs. Journal of Biomedical Informatics, 42(1), 82–89.

Ehsan, M., Amini, M., & Jalili, R. (2009). Handling context in a semantic-based access
control framework. In International conference on advanced information
networking and applications workshops, 2009, WAINA’09 (pp. 103–108).

El Fadly, A., Lucas, N., Rance, B., Verplancke, P., Lastic, P.-Y., & Daniel, C. (2010). The
reuse project: Ehr as single datasource for biomedical research. Studies in Health
Technology and Informatics, 160(Pt. 2), 1324–1328. <http://www.biomedsearch.com/nih/REUSE-project-EHR-as-single/20841899.html>.

Erdal, S., Catalyurek, U., Payne, P., Saltz, J., Kamal, J., & Gurcan, M. (2009). A
knowledge-anchored integrative image search and retrieval system. Journal of
Digital Imaging: the Official Journal of the Society for Computer Applications in
Radiology, 22(2), 166–182.

Flores Zuniga, A., Win, K., & Susilo, W. (2010). Functionalities of free and open
electronic health record systems. International Journal of Technology Assessment
in Health Care, 26(4), 382–389.

Gagnon, M., Desmartis, M., Labrecque, M., Légaré, F., Lamothe, L., Fortin, J., et al.
(2010). Implementation of an electronic medical record in family practice: A
case study. Informatics in Primary Care, 18(1), 31–40.

Garcia-Morchon, O., & Wehrle, K. (2010). Efficient and context-aware access control
for pervasive medical sensor networks. In 8th IEEE international conference on
pervasive computing and communications workshops 2010 (PERCOM Workshops)
(pp. 322–327).

Gil-Leyva, I., & Rodríguez-Muñoz, J. (1966). Tendencias en los sistemas de indización
automática. Estudio evolutivo. Revista Española de Documentación Científica, 19(3),
273–291.

Ginsburg, M. (2007). Pediatric electronic health record interface design: The pedone
system. In 40th annual hawaii international conference on system sciences, 2007,
HICSS 2007 (pp. 139).

Image storage standard (2011). <http://www.medical.nema.org>.

ISO-13606 (2008). ISO 13606: Electronic health record communication.

Järvelin, K., & Kekäläinen, J. (2000). Ir evaluation methods for retrieving highly
relevant documents. In SIGIR’00: Proceedings of the 23rd annual international
ACM SIGIR conference on Research and development in information retrieval
(pp. 41–48). New York, NY, USA: ACM.

Jerding, D., & Stasko, J. (1998). The information mural: a technique for displaying
and navigating large information spaces. IEEE Transactions on Visualization and
Computer Graphics, 4(3), 257–271.

Jones, K. S. (2004). A statistical interpretation of term specificity and its application
in retrieval. Journal of Documentation, 60, 493–502.

Judd, G., & Steenkiste, P. (2003). Providing contextual information to pervasive
computing applications. In Proceedings of the First IEEE international conference
on pervasive computing and communications, 2003. (PerCom 2003) (pp. 133–142).
doi:10.1109/PERCOM.2003.1192735.

Jung, J. J. (2009). Contextualized query sampling to discover semantic resource
descriptions on the web. Information Processing & Management, 45(2),
280–287. <http://www.sciencedirect.com/science/article/B6VC8-4V8FFJY-/2/6f9094e73f868971acee4f7cfb78f225>.

Kang, D., Lee, H., Ko, E., Kang, K., & Lee, J. (2006). A wearable context aware
system for ubiquitous healthcare. In Conference proceedings: Annual
international conference of the IEEE Engineering in Medicine and Biology
Society. IEEE Engineering in medicine and biology society conference (Vol. 1,
pp. 5192–5195).

Kanoulas, E., Pavlu, V., Dai, K., & Aslam, J. (2010). Modeling the score distributions of
relevant and non-relevant documents. Advances in Information Retrieval Theory,
152–163.

Karahoca, A., Bayraktar, E., Tatoglu, E., & Karahoca, D. (2010). Information system
design for a hospital emergency department: A usability analysis of software
prototypes. Journal of Biomedical Informatics, 43(2), 224–232.

Lahteenmaki, H., Leppanen, J., & Kaijanranta, J. (2009). Interoperability of personal
health records. In Conference proceedings: Annual international conference of the
IEEE Engineering in Medicine and Biology Society. IEEE engineering in medicine and
biology society conference (pp. 1726–1729).

Mao, J.-Y., & Benbasat, I. (2001). The effects of contextualized access to knowledge
on judgement. International Journal of Human–Computer Studies, 55(5),
787–814. <http://www.sciencedirect.com/science/article/B6WGR-4582BWS-7/2/cb7b598f793b5017ca950d7e785f369c>.

McAlearney, A., Robbins, J., Hirsch, A., Jorina, M., & Harrop, J. (2010). Perceived
efficiency impacts following electronic health record implementation: An
exploratory study of an urban community health center network.
International Journal of Medical Informatics, 79(12), 807–816.

Nagy, M., Hanzlícek, P., Precková, P., Ríha, A., Dioszegi, M., Seidl, L., & Zvárová, J. (2010).
Semantic interoperability in czech healthcare environment supported by hl7
version 3. Methods of Information in Medicine, 49(2), 186–195. <http://www.ncbi.nlm.nih.gov/pubmed/19936441>.

Nagy, M., Preckova, P., Seidl, L., & Zvarova, J. (2010b). Challenges of interoperability
using hl7 v3 in czech healthcare. Studies in Health Technology and Informatics,
155, 122–128.

Nomenclature medical standard, S. (2011). <http://www.nlm.nih.gov/research/snomad/snomad_main.html>.

Open-EHR open electronical health records (2011). <http://www.openehr.org>.

Prados, M., & Peña, M. (2003). Sistemas de Informacion hospitalarios. Organizacion
y gestion de Proyectos. EASP.

Prados de Reyes, M., Carmen Peña Yáñez, M. A. V. M., & Suárez, M. B. P. (2006).
Generation and use of one ontology for intelligent information retrieval from
electronic record histories.

Prados, M., Peña, M., Prados, B., Martinez, B., Ortigosa, J., & Delgado, A. (2010).
Electronical health record (ehr) representation through ontology, mobility,
accesibility and interoperability usefulness.

Prados-Suárez, B., Revuelta, E., Peña Yañez, C., & Molina Fernández, C. (2008).
Ontology based semantic representation of the reports and results in a
hospital information system. In Proceedings of the ICEIS 2008 (pp. 300–306).

Pung, H. K., Gu, T., Xue, W., Palmes, P. P., Zhu, J., Ng, W. L., Tang, C. W., & Chung, N. H.
(2009). Context-aware middleware for pervasive elderly homecare. IEEE Journal
on Selected Areas in Communications, 27, 510–524.

Ray, J., & Wimalasiri, P. (2006). The need for technical solutions for maintaining
the privacy of ehr. In Conference proceedings: Annual international conference of
the IEEE Engineering in Medicine and Biology Society. IEEE engineering in medicine
and biology society conference (Vol. 1, pp. 4686–4689).

Ross, S. (2009). Results of a survey of an online physician community regarding use
of electronic medical records in office practices. The Journal of Medical Practice
Management: MPM, 24(4), 254–256.

Ruland, C., Brynhi, H., Andersen, R., & Bryhni, T. (2008). Developing a shared
electronic health record for patients and clinicians. Studies in Health Technology
and Informatics, 136, 57–62.

Salton, G., Wong, A., & Yang, C. S. (1975). A vector space model for automatic
indexing. Communications of the ACM, 18(11), 613–620.

Svanaes, D., Das, A., & Alsos, O. (2008). The contextual nature of usability and its
relevance to medical informatics. Studies in Health Technology and Informatics,
136, 541–546.

Vest, J., & Jasperson, J. (2010). What should we measure? Conceptualizing usage in
health information exchange. Journal of the American Medical Informatics
Association: JAMIA, 17(3), 302–307.

Vishwanath, A., Singh, S., & Winkelstein, P. (2010). The impact of electronic medical
record systems on outpatient workflows: A longitudinal evaluation of its
workflow effects. International Journal of Medical Informatics, 79(11), 778–791.

Weinstock, M. (2010). For hospitals and meaningful use, context is everything.
Hospitals & Health Networks AHA, 84(8), 20–21.

