

UvA-DARE is a service provided by the library of the University of Amsterdam (https://dare.uva.nl)

UvA-DARE (Digital Academic Repository)

A Grid-Based HIV Expert System

Sloot, P.M.A.; Boukhanovsky, A.V.; Keulen, W.; Tirado Ramos, A.; Boucher, C.A.B.
DOI
10.1007/s10877-005-0673-2
Publication date
2005

Published in
Journal of Clinical Monitoring and Computing

Link to publication

Citation for published version (APA):
Sloot, P. M. A., Boukhanovsky, A. V., Keulen, W., Tirado Ramos, A., & Boucher, C. A. B.
(2005). A Grid-Based HIV Expert System. Journal of Clinical Monitoring and Computing,
19(4-5), 263-278. https://doi.org/10.1007/s10877-005-0673-2

General rights
It is not permitted to download or to forward/distribute the text or part of it without the consent of the author(s)
and/or copyright holder(s), other than for strictly personal, individual use, unless the work is under an open
content license (like Creative Commons).

Disclaimer/Complaints regulations
If you believe that digital publication of certain material infringes any of your rights or (privacy) interests, please
let the Library know, stating your reasons. In case of a legitimate complaint, the Library will make the material
inaccessible and/or remove it from the website. Please Ask the Library: https://uba.uva.nl/en/contact, or a letter
to: Library of the University of Amsterdam, Secretariat, Singel 425, 1012 WP Amsterdam, The Netherlands. You
will be contacted as soon as possible.

Download date:06 Apr 2021

https://dare.uva.nl/personal/pure/en/publications/a-gridbased-hiv-expert-system(f4b256de-6fec-4b27-9806-31fe6822b5eb).html



Journal of Clinical Monitoring and Computing (2005) xxx: 1–16 © Springer 2005

A GRID-BASED HIV EXPERT SYSTEM
Peter M.A. Sloot,1 Alexander V. Boukhanovsky,2 Wilco Keulen,3 Alfredo Tirado-Ramos,1 and Charles A. Boucher4

From the 1Section Computational Science, University of Amsterdam, Kruislaan 403, 1098 SJ Amsterdam, The Netherlands, 2Institute for High Performance Computing and Information Systems, Bering St, 38, St. Petersburg, Russia, 3Virology Education, 69042 Utrecht, The Netherlands, 4University Medical Center, University of Utrecht, 3508 GA Utrecht, The Netherlands.

Received—, and in revised form—. Accepted for publication—.

Based on “A Grid-based HIV Expert System”, by P.M.A. Sloot, A.V. Boukhanovsky, W. Keulen, and C.A. Boucher, which appeared in the IEEE/ACM International Symposium on Cluster Computing and the Grid, Cardiff, UK, May 9-12, 2005. ©2005 IEEE.
Address correspondence to Peter M.A. Sloot, Section Computational Science, University of Amsterdam, Kruislaan 403, 1098 SJ Amsterdam, The Netherlands.
E-mail: sloot@science.uva.nl

Sloot PMA, Boukhanovsky AV, Keulen W, Tirado-Ramos A, Boucher CA. A grid-based HIV expert system.

J Clin Monit 2005; xxx: 1–16

ABSTRACT. Objectives. This paper addresses Grid-based integration and access of distributed data from infectious disease patient databases, literature on in-vitro and in-vivo pharmaceutical data, mutation databases, clinical trials, simulations and medical expert knowledge. Methods. Multivariate analyses combined with rule-based fuzzy logic are applied to the integrated data to provide ranking of patient-specific drugs. In addition, cellular automata-based simulations are used to predict the drug behaviour over time. Access to and integration of data is done through existing Internet servers and emerging Grid-based frameworks like Globus. Data presentation is done by standalone PC-based software, Web access and PDA roaming WAP access. The experiments were carried out on the DAS, a Dutch Grid testbed. Results. The output of the problem-solving environment (PSE) consists of a prediction of the drug sensitivity of the virus, generated by comparing the viral genotype to a relational database which contains a large number of phenotype-genotype pairs. Conclusions. Artificial Intelligence and Grid technology are effectively used to abstract knowledge from the data and provide the physicians with adaptive interactive advice on treatment applied to drug-resistant HIV. An important aspect of our research is to use a variety of statistical and numerical methods to identify relationships between HIV genetic sequences and antiviral resistance to investigate consistency of results.

KEY WORDS. grid, HIV, PSE, expert system, artificial intelligence, bio-statistics.

1. INTRODUCTION

1.1. Motivation

Forty-two million people worldwide have been infected with HIV and 12 million have died over the last 20 years. Figure 1 shows the pan-epidemic extent of HIV infections.

Effective antiretroviral therapy has led to sustained HIV viral suppression and immunological recovery in patients who have been infected with the virus. The incidence of AIDS has declined in the Western world with the introduction of effective antiretroviral therapy, though questions such as “When to start treatment? What to start with? How to monitor patients?” remain heavily debated. Adherence to antiretroviral treatment remains the cornerstone of effective treatment, and failure to adhere is the strongest predictor of virological failure. Long-term therapy can lead to metabolic complications. Other treatment options are now available, with the recent introduction to clinical practice of fusion inhibitors, second-generation non-nucleoside reverse transcriptase inhibitors, and nucleotide reverse transcriptase inhibitors. The sheer complexity of the disease,


Fig. 1. Worldwide spread of HIV infections, history and near future per-
spective.

the distribution of the data, the required automatic updates to the knowledgebase and the efficient use and integration of advanced statistical and numerical techniques necessary to assist the physician motivated us to explore the novel possibilities supported by Grid technology.

In this position paper we describe ongoing research in our three laboratories (Utrecht, St. Petersburg and Amsterdam) addressing the development of a Grid-based medical decision support system. The goal of the research is to investigate novel computational methods and techniques that support the development of a user-friendly integrated support system for physicians. We use emerging Grid technology to combine data discovery, data mining, statistical analyses, numerical simulation and data presentation [1].

The paper is organized as follows. Chapter 2 describes the background of HIV research and a prototypical rule-based approach to data analyses. In chapter 3 we give an overview of the two computational techniques we study: stochastic modeling to understand the temporal variability of HIV populations, and Cellular Automata (CA) modeling to understand the evolution of HIV infection and the onset of AIDS. Chapter 4 describes a first approach to advanced data presentation through roaming devices such as Personal Digital Assistants (PDAs).

1.2. Background

1.2.1. Clinical aspects of HIV

The clinical management of patients infected with Human Immunodeficiency Virus (HIV) is based on studies on the pathogenesis of the disease and the results of trials evaluating the effects of anti-HIV drugs. Retrospective analysis of large cohorts has identified laboratory markers for disease progression, such as the amount of virus (HIV-RNA) and the number of T helper cells (CD4+ cells) in blood. In addition, the results of prospective drug trials have generated data on the effectiveness of individual drugs and drug combinations and the effect of drug-resistant viruses on therapy outcome. Currently clinicians are limited in the practical use of this information because in most cases they are only provided with statistical relationships between individual parameters and disease or therapy outcome. Large data sets have not been analyzed and made available in such a way that a clinician can use the available data in more clinical settings. The availability of large databases and the development of innovative data mining approaches create the opportunity to develop systems which allow the practicing clinician to determine the risk profile for disease development, or the chance of success for a given regimen, for individual patients. Such a system will determine the rate of success for different drug regimens by taking into account the effect and interaction of all relevant laboratory and clinical parameters and by comparing the results for similar patients available in the database.

Currently there are fifteen drugs licensed for treatment of individuals infected with HIV. These drugs belong to two classes, one inhibiting the viral enzyme reverse transcriptase and another inhibiting the viral protease. These drugs are used in combination therapy to maximally inhibit viral replication and decrease HIV-RNA to below detection levels (currently defined as below 50 copies per ml) in blood. Treatment with drug combinations is successful in inhibiting viral replication to undetectable levels in only 50% of the cases. In the remaining 50% of cases viruses can be detected with a reduced sensitivity to one or more drugs from the patient's regimen. The molecular basis for resistance has been, and still is, the focus of extensive research. Over 80 amino acid positions in the viral enzyme reverse transcriptase (RT) and 40 positions in the protease enzyme can undergo changes when exposed to selective drug pressure in vitro or in vivo. For some drugs, at certain positions, a change towards a specific new amino acid is seen. At other positions several alternative amino acids may appear and cause (variable) levels of resistance to one or more drugs. In theory, therefore, an infinite number of combinations of amino acid changes could appear and cause resistance in vivo. Preliminary clinical observations, however, show that specific amino acid changes at a limited number of positions and in a limited number of combinations prevail. In addition to changing drug sensitivity, some amino acid changes may also influence the replication potential of HIV. Amino acids selected initially during a failing regimen cause resistance to the drugs the patient is taking, but at the same time may decrease the capacity of the virus to replicate. Changes appearing later do not function to further increase resistance but merely function to restore


the capacity of the virus to replicate (“viral fitness”). Several clinical studies have been performed recently to evaluate the clinical benefit of resistance-guided therapy. These studies show that a better virological response is obtained in patients who are failing their therapy when their new regimen is chosen on the basis of their resistance profile. In three out of the four studies from last year the results showed that if new regimens were selected on the basis of the mutations (viral resistance genotype) the results were better than with standard care approaches. Currently, the clinical interpretation of the viral genotype is based on data sets relating mutations to changes in drug sensitivity, and/or data sets directly relating mutations present in the virus to clinical responses to specific regimens. Initially, experts compared the observed mutations to lists of published sequences taken from the literature, and based on this comparison would select a regimen.

1.2.2. Prototype support system

Recently, first-generation bioinformatics software programs have been developed to support clinicians. Examples of such systems are the Virtual Phenotype developed by Virco NV, and a first-generation decision support system (Retrogram TM) developed by Virology Networks BV in collaboration with parts of our research team. The output of these programs consists of a prediction of the drug sensitivity of the virus generated by comparing the viral genotype to a relational database containing a large number of phenotype-genotype pairs. The Retrogram decision software interprets the genotype of a patient by using rules developed by experts on the basis of the literature, taking into account the relationship between genotype and phenotype. In addition, it is based on (limited) available data from clinical studies and on the direct relationship between the presence of a genotype and clinical outcome. It is important to realise, however, that these systems focus on biological relationships and are limited to the role of resistance. The next step will be to use clinical databases and investigate the relationship between the viral resistance profile (mutational profile and/or phenotypic data) and therapy outcome measures such as amount of virus (HIV-RNA) and CD4+ cells. A summary of the flow of data is shown in Figure 2.

1.2.3. Data collection

Large high-quality clinical and patient databases are used to explore the relationships described above and to develop a first prototype matching system. The Athena cohort is a large Dutch observational clinical cohort study

Fig. 2. From molecule to man: Hierarchical data flow model for infectious
diseases.

aiming at the surveillance of antiretroviral treatment supported by the government. The cohort consists of 3000 patients from whom data are centrally collected through a decentralized data entry system. Within the cohort 600 patients are studied intensively, whose phenotypic and genotypic data, drug levels and CD4+ and HIV-RNA patterns are collected. Phenotype, genotype, viral fitness and drug levels as well as CD4+ and HIV-RNA patterns will be collected from two large international trials (sponsored by Roche Pharmaceuticals), evaluating the effect of a new fusion inhibitor drug (T20), and representing 1000 patients. The third database will be from the international multi-center Great study, sponsored by Virology Networks BV. Within this study the value of the Retrogram decision support program is evaluated and similar parameters as described above will be collected. Within this study 360 patients will be enrolled.

The Viradapt study showed that the virological response was better in the patient group in which genotype and rule-based interpretation were used than in the standard of care arm [2]. On the basis of these results, a more elaborate decision support software system (Retrogram version 1.0) was built in collaboration with Virology Networks B.V. This system ranks the efficacy of the antiretroviral drugs within each class. The ranking is based on expert interpretation of two types of data. The software system estimates the drug sensitivity for the fifteen drugs by interpreting the genotype of a patient using mutational algorithms. These mutational algorithms are developed by a group of experts on the basis of the scientific literature, taking into account the published data relating genotype to phenotype. In addition, the ranking is based on data from clinical studies on the relationship between the presence of particular mutations and clinical or virological outcome.
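
To make the idea of a rule-based ranking concrete, the following minimal Python sketch scores drugs within each class from a list of observed mutations. It is only an illustration: the rule table, the drug names (drug_A to drug_D), the class labels and all weights are hypothetical placeholders invented for this example. They are not the Retrogram mutational algorithms and carry no clinical meaning.

```python
from collections import defaultdict

# Hypothetical rule table: mutation -> {drug: resistance penalty}.
# All names and numbers are placeholders, not clinical data.
RULES = {
    "184V": {"drug_A": 1.0, "drug_B": 0.2},
    "77I": {"drug_C": 0.5},
    "103N": {"drug_D": 1.0},
}

# Hypothetical assignment of each drug to a class (RT or protease inhibitor).
DRUG_CLASS = {"drug_A": "RT", "drug_B": "RT", "drug_D": "RT", "drug_C": "PR"}


def rank_drugs(genotype):
    """Return {class: [(drug, penalty), ...]}, least penalised drug first."""
    penalty = {drug: 0.0 for drug in DRUG_CLASS}
    for mutation in genotype:
        # Mutations without a rule contribute nothing.
        for drug, weight in RULES.get(mutation, {}).items():
            penalty[drug] += weight
    ranking = defaultdict(list)
    # Sort by penalty, then by name so that ties are ordered deterministically.
    for drug, score in sorted(penalty.items(), key=lambda kv: (kv[1], kv[0])):
        ranking[DRUG_CLASS[drug]].append((drug, score))
    return dict(ranking)
```

Calling `rank_drugs(["184V"])` under these placeholder rules ranks drug_D ahead of drug_B and drug_A within the hypothetical RT class. A real system would replace the additive penalty with the expert-defined rules and clinical outcome data described above.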

The Athena cohort is a large Dutch observational clinical cohort study aiming at the surveillance of antiretroviral


treatment supported by the Dutch government. The cohort consists of 3000 patients from whom clinical, virological and immunological data and data on drug side effects are centrally collected through a decentralised data entry system. Within this cohort 600 patients are studied intensively; phenotypic and genotypic data, drug levels and CD4+ and HIV-RNA patterns are collected. Two large international trials (sponsored by Roche Pharmaceuticals) evaluating the effect of a new fusion inhibitor drug (T20) represent 1000 patients, from whom phenotype, genotype, viral fitness and drug levels as well as CD4+ and HIV-RNA patterns will also be collected. The third database will be from the international multi-center Great study sponsored by Virology Networks BV. Within this study the value of the Retrogram decision support program is evaluated and similar parameters as described above will be collected; 360 patients will be enrolled. Another dataset will come from the Italian Musa study, in which data will be collected from 450 patients followed over a year. The entry point to the trial is failing a first or second regimen; subsequently patients will be genotyped and a new regimen will be selected on the basis of Retrogram 1.4 or the Virtual Phenotype from Virco (Belgium).

Throughout the duration of the project we will collect additional datasets. These datasets may serve to further refine our models and first-version software and may also be used to perform validation studies.

1.2.4. Data analysis

The primary goal of the data analysis is to identify patterns of mutations (or naturally occurring polymorphisms) associated with resistance to antiviral drugs and to predict the degree of in-vitro or in-vivo sensitivity to available drugs from an HIV genetic sequence. The statistical challenges in doing such analyses arise from the high dimensionality of these data. A variety of approaches have been developed to handle this type of data, including clustering, recursive partitioning, and neural informatics. Neural informatics is used to synthesise, in uniform systems, heuristic models obtained by knowledge engineering methods with the results of formal multivariate statistical analysis. Clustering methods have been used to group sequences that are “near” each other according to some measure of genetic distance [3]. Once clusters have been identified, recursive partitioning can be used to determine the important predictors of drug resistance, as measured by in-vitro assays or by patient response to antiviral drugs. Principal component analysis can help to identify the most important sources of variability in the HIV genome. An important aspect of our research is to use a variety of methods to identify relationships between HIV genetic sequences and antiviral resistance to validate the consistency of results.
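
As a small illustration of grouping sequences that are “near” each other, the sketch below clusters aligned sequences by Hamming distance using single linkage. The choice of Hamming distance, the single-linkage criterion and the threshold are assumptions made for this example; the text does not specify which genetic distance or clustering method is used.

```python
def hamming(a, b):
    """Number of positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b))


def cluster(sequences, threshold):
    """Single-linkage grouping: sequences within `threshold` share a cluster."""
    parent = list(range(len(sequences)))  # union-find forest over indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(sequences)):
        for j in range(i + 1, len(sequences)):
            if hamming(sequences[i], sequences[j]) <= threshold:
                parent[find(i)] = find(j)  # merge the two clusters

    groups = {}
    for i in range(len(sequences)):
        groups.setdefault(find(i), []).append(sequences[i])
    return sorted(groups.values())
```

For example, `cluster(["MKVL", "MKVI", "MRVI", "AAAA"], threshold=1)` places the first three toy sequences in one cluster, because each is one substitution away from its neighbour in the chain, and leaves "AAAA" on its own.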

The molecular sequences of the viral enzymes reverse transcriptase and protease are the micro parameters in the model. In theory an infinite number of combinations of mutations could appear and cause (variable) changes in viral drug sensitivity and viral replication capacity (see also Table 1). Clinical datasets, however, show that specific amino acid changes at a limited number of positions and in a limited number of combinations prevail. HIV-RNA and CD4 are the primary parameters determining disease outcome. HIV-RNA, the number of HIV-RNA genomic copies per ml plasma, has been validated as being highly predictive of clinical outcome. HIV-RNA and CD4+ cell numbers are now the standard endpoints in clinical trials for approval of new antiretroviral drugs. A patient's HIV-RNA may range from a few hundred to millions of RNA copies per ml plasma. The CD4+ cell numbers in peripheral blood typically range between zero and a thousand. Whereas the predictive clinical value of both parameters was determined initially in untreated individuals, they have also been shown to be of predictive value for patients under antiretroviral therapy. Recently observations have been published indicating that in some patients under highly active antiretroviral therapy (HAART) a disconnect may occur between the response in HIV-RNA and in CD4 counts. Typically, in these patients a rise in HIV-RNA as a consequence of incomplete inhibition of viral replication under therapy is not paralleled by a continuous decrease in CD4 counts. This disconnect has been explained by a decrease

Table 1. Parameters for the data analyses. Here the hierarchical approach shown in Figure 2 is extended to detail the content of the parameters

Parameter level          Parameters
Micro Parameter          Protease Mutations; Reverse Transcriptase Mutations
Primary Parameter        HIV-RNA; CD4; Drug Resistance
Macro Parameter          Meta Parameter, Virological: Viral Fitness
                         Meta Parameter, Clinical: Weight; Opportunistic Infections and Tumors; Survival
Intervention Parameter   Drug Dosage; Bio-availability of Drug/Drug Level


in the viral replicative capacity (“viral fitness”), which leads to a decrease in capacity to lower CD4 counts.

The patient's weight and secondary opportunistic infections and/or malignancies are parameters that determine disease outcome and survival time. Currently there are fifteen drugs licensed for treatment of individuals infected with HIV. More than ten inhibitors have been developed which inhibit the reverse transcriptase process. These inhibitors can be classified in two sub-categories that differ in the way they inhibit the RT enzyme: nucleoside (analogue) RT inhibitors (NRTI) and non-nucleoside RT inhibitors (NNRTI). The remaining compounds inhibit the protease enzyme, which acts much later in the HIV replication cycle than reverse transcriptase.

The protease is responsible for cleaving a long polyprotein into smaller functional proteins. The overall exposure to antiretroviral drugs has been shown to be an important factor in the degree of success of a given therapy. The overall exposure can be captured by parameters such as dosage and bio-availability, which together determine the drug level within an individual patient. Given the relationships between exposure and antiviral efficacy, variability in drug levels (which may be due to differences in patient adherence to their regimens) will contribute to virological and immunological outcome. Individuals with relatively low exposure are more likely to experience virological failure than those with a high exposure.

2. METHODS AND MATERIALS

2.1. Modeling the dynamics and temporal variability of HIV-1 populations

In addition to rule-based and parameter-based decision support we developed statistical models and cellular automata based models to study the dynamics of the HIV populations. These two numerical models run on Grid resources. The output is integrated with the medical support system and accessible to the end-user. In this section we briefly outline the two computational methods. Details are beyond the scope of this paper; we refer to the references provided.

2.1.1. A cellular automata model to study the evolution of HIV infection and the onset of AIDS

We developed a cellular automata model to study the evolution of HIV infection and the onset of AIDS. The model takes into account the global features of the immune response to any pathogen, the fast mutation rate of HIV, and a fair amount of spatial localization, which may occur in the lymph nodes. The dynamics of the cellular automata require high-throughput computing, which is provided by the resource management of the Grid. In this section, we employ non-uniform Cellular Automata (CAs) to simulate drug treatment of HIV infection, in which each computational domain may contain different CA rules, in contrast to normal uniform CA models. Ordinary (or partial) differential equation models are insufficient to describe the two extreme time scales involved in HIV infection (days and decades), as well as the implicit spatial heterogeneity. Zorzenon dos Santos et al. [7] reported a cellular automata approach to simulate three-phase patterns of human immunodeficiency virus (HIV) infection consisting of primary response, clinical latency and onset of acquired immunodeficiency syndrome. We developed a non-uniform CA model to study the dynamics of drug therapy of HIV infection, which simulates four phases (acute, chronic, drug treatment response and onset of AIDS). Our results indicate that both simulations (with and without treatment) evolve to the same steady state. Three different drug therapies (mono-therapy, combined drug therapy and HAART) can also be simulated in our model. Our model for prediction of the temporal behaviour of the immune system under drug therapy corresponds qualitatively to clinical data.

Pseudo Code 1a: HI Model (adapted from Zorzenon dos Santos R. M., Phys. Rev. Lett. 2001). H = healthy cell; A1 and A2 are infected cells at different time steps; D = dead cell.

Assume: states {H, A1(t), A2(t + τ), D}; 1 time step = 1 week; simulation of a lymph node; Moore neighbourhood and square lattices used.

Rule 1: For a healthy cell:
  (a) If it has at least one infected-A1 neighbor, it becomes infected-A1.
  (b) If it has no infected-A1 neighbor but does have at least R (2 < R < 8) infected-A2 neighbors, it becomes infected-A1.
  (c) Otherwise it stays healthy.
Rule 2: An infected-A1 cell becomes infected-A2 after τ time steps.
Rule 3: Infected-A2 cells become dead cells.
Rule 4: (a) Dead cells can be replaced by healthy cells with probability p_repl in the next step.
  (b) Each new healthy cell introduced may be replaced by an infected-A1 cell with probability p_infec.
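
The four rules of Pseudo Code 1a can be sketched as one synchronous update step in Python. This is a minimal illustration rather than the authors' implementation: the periodic boundary conditions, the per-cell age counter used to implement the delay τ, and the function and argument names are all assumptions introduced here.

```python
import random

H, A1, A2, D = "H", "A1", "A2", "D"


def step(grid, age, R, tau, p_repl, p_infec, rng):
    """Apply Rules 1-4 synchronously to a square lattice; return new (grid, age)."""
    n = len(grid)
    new_grid = [row[:] for row in grid]
    new_age = [row[:] for row in age]
    for i in range(n):
        for j in range(n):
            state = grid[i][j]
            if state == H:
                # Moore neighbourhood; periodic boundaries are an assumption.
                nbrs = [grid[(i + di) % n][(j + dj) % n]
                        for di in (-1, 0, 1) for dj in (-1, 0, 1)
                        if (di, dj) != (0, 0)]
                if A1 in nbrs or nbrs.count(A2) >= R:   # Rule 1(a), 1(b)
                    new_grid[i][j], new_age[i][j] = A1, 0
            elif state == A1:
                new_age[i][j] = age[i][j] + 1
                if new_age[i][j] >= tau:                # Rule 2
                    new_grid[i][j] = A2
            elif state == A2:                           # Rule 3
                new_grid[i][j] = D
            elif state == D and rng.random() < p_repl:  # Rule 4(a)
                infected = rng.random() < p_infec       # Rule 4(b)
                new_grid[i][j] = A1 if infected else H
                new_age[i][j] = 0
    return new_grid, new_age
```

Repeated calls to `step` with a seeded `random.Random` instance give a reproducible trajectory. Because every new state is computed from the old grid, the update is synchronous, which is the usual convention for cellular automata.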


This CA (Pseudo Code 1a) mimics in a simple way the dynamical properties of an HIV infection; next we introduce drug therapy into the model by modelling a response function p_resp and changing only Rule 1.

Pseudo Code 1b: Advanced HI Model, taking into account drug therapy effects.

Rule 1:
  (a) If there is one A1 neighbor after the start of drug therapy, N (0 ≤ N ≤ 7) neighboring healthy cells become infected-A1 in the next time step with probability p_resp. Otherwise, all eight neighbors become infected-A1.

N represents the effectiveness of the drugs:
  N = 0: no replication;
  N = 7: less effective drug.
p_resp(t − t_s) represents a response function of the drug effects over the time steps t; t_s is the start of treatment.
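
One possible reading of the modified Rule 1 is sketched below from the point of view of a single infected-A1 cell: once therapy has started, the drug limits the infection to at most N of the healthy neighbours with probability p_resp(t − t_s), and otherwise all healthy neighbours are infected. The text leaves the form of the response function open, so it is passed in as a callable; the function name, its arguments and this per-cell interpretation are assumptions made for illustration.

```python
import random


def neighbours_to_infect(healthy_nbrs, N, t, t_start, p_resp, rng):
    """Pick which healthy Moore neighbours of an A1 cell become infected-A1.

    healthy_nbrs: list of coordinates of healthy neighbours
    N:            drug effectiveness parameter (0 <= N <= 7)
    p_resp:       hypothetical response function of the time since t_start
    """
    if t >= t_start and rng.random() < p_resp(t - t_start):
        # Drug acts at this step: at most N healthy neighbours are infected.
        k = min(N, len(healthy_nbrs))
        return rng.sample(healthy_nbrs, k)
    # Before therapy, or no drug response at this step: all are infected.
    return list(healthy_nbrs)
```

With `N = 0` and a response function that always returns 1.0, no neighbour is infected after the start of treatment, which matches the "no replication" case listed in Pseudo Code 1b.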

The main success of the presented CA model is the adequate modeling of the four phases of HIV infection, with their different time scales, in one model. Moreover, we could also integrate all three different therapy procedures. The simulations show a qualitative correspondence to clinical data. During the phase of drug therapy response, temporal fluctuations for N > 3 were observed; this is due to the relatively simple form of the response distribution function (P_dis) applied to the drug effectiveness parameter N at each time step. The simulation results indicate that, in contrast to ODE/PDE models, our model supports a more flexible approach to mimicking different therapies by mapping the parameter space of P_dis to clinical data. Therefore there is ample room to incorporate biologically more relevant response functions into the model. The data integration required for the CA, the parametric computation and the data presentation are supported by the Grid.

2.1.2. Multivariate stochastic modeling

The modeling of Human Immunodeficiency Virus (HIV-1) genotype datasets aims to identify patterns of mutations (or naturally occurring polymorphisms) associated with resistance to antiviral drugs and to predict the degree of in-vitro or in-vivo sensitivity to available drugs from an HIV-1 genetic sequence. The statistical challenges in doing such analyses arise from the high dimensionality of these data. Direct application of the well-known genetic approaches [5] to the analysis of HIV-1 genotypes results in many problems. The principal difference is that, in HIV DNA analysis, the main interest is in the so-called relevant mutations: a set of mutations associated with drug resistance. These mutations might exist at different positions along the amino-acid chains. Moreover, the sheer complexity of the disease and data requires the development of a reliable statistical technique for its analysis and modeling. A multivariate stochastic model for describing the dynamics of complex non-numerical ensembles, such as observed in the (HIV) genome, has been developed in [6]. This model was based on principal component analysis for numerated variables. Generally speaking, the interpretation of numerated variables in terms of relevant mutations is not clear. Below we develop this model directly for the ensemble of relevant mutations in the RT and protease parts of the HIV-1 genome. Each element of the ensemble is presented as the tuple $\Xi_k = \{\xi_j\}_{j=1}^{n_k}$, $k = 1, \dots, M$, with the variable dimension $n_k$, the total number of mutations in the gene. Each value $\xi_j$ is a literal index and corresponds to the position and new value of the amino acid (e.g., 184V, 77I, etc.). This allows each mutation to be associated with the categorical random variable $i \in \{1, \dots, K\}$, where $K$ is the total number of possible mutations. Each subsample of genomes with a fixed number of mutations $n = \text{const}$ may be considered as the realizations of a categorical random vector.

The representation above is based on the proximity to the “wild-type” virus and takes into account only the relevant mutations in a genome. It allows for significant compression of the DNA representation and simplifies the interpretation of the results.
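
The compact representation described above can be illustrated with a short Python sketch in which each genotype is stored as a tuple of mutation labels relative to the wild type. The helper names and the toy genotypes are hypothetical; only the labels 184V and 77I are taken from the text.

```python
from collections import defaultdict


def index_mutations(genotypes):
    """Map each distinct mutation label to a categorical index 1..K."""
    labels = sorted({m for g in genotypes for m in g})
    return {label: i for i, label in enumerate(labels, start=1)}


def group_by_count(genotypes):
    """Group genotypes by their number of relevant mutations n."""
    groups = defaultdict(list)
    for g in genotypes:
        groups[len(g)].append(g)
    return dict(groups)


# Toy ensemble: the empty tuple stands for the wild-type virus.
ensemble = [(), ("184V",), ("184V", "77I")]
```

Here `index_mutations(ensemble)` assigns one categorical index to each of the K = 2 labels, and `group_by_count(ensemble)` yields the subsamples with n = 0, 1 and 2 mutations, each of which can be treated as realizations of a categorical random vector.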

Principle of the modeling approach. The joint variability of different mutations in the HIV-1 genomes is a complicated phenomenon. The dimension of the probabilistic characteristics is high, and their analytical investigation and interpretation are hard. Hence, for the study of HIV-1 populations we use a computational statistical approach that allows us to numerically generate an ensemble with the same probabilistic properties by means of a Monte-Carlo procedure. This is a well-known, powerful method to study complex system variability.

The idea of the stochastic modeling is shown in Figure 5. It is based on the evolutionary hypothesis, considering the group with $n + 1$ mutations as a subgroup of the group with $n$ mutations in a previous step. For each gene the transition from the $n$ to the $n + 1$ mutation group is driven by a stochastic operator $D^{(n+1)}$, which defines the mutations at step $n + 1$ when the mutations at the previous $n$ steps are known. The initial step of the stochastic procedure begins from the whole ensemble of wild-type viruses. The number of genomes mutated at each step of the stochastic procedure is $M_n = \rho_n M$, where $\rho_n$ are the probabilities of the occurrence of genotypes with $n$ mutations in a total population of $M$ genes.


Fig. 3. Temporal behaviour of the CD4 count, with modeled Brownian
movement for lymphocytes [8].

Fig. 4. As in Figure 3, with additionally modeled mono therapy in week
300 [8].

Fig. 5. Principle of the modeling.

The stochastic operator $D$ may be considered as a “black box”. It is formalized in terms of the conditional probabilities of occurrence of mutation $\xi_i$ given that mutation $\xi_j$ arose at the previous step of the generation. For genotypes with only 2 mutations, the values $D_{ij}$ are the conditional probabilities of the pairs. In this case the matrix $\{D_{ij}\}$ is the Markov transition probability matrix of a simple Markov chain whose number of states equals the number of relevant mutations. In more complicated cases, where $n > 2$, the matrix $\{D_{ij}\}$ consists of the conditional probabilities of finding mutation $\xi_j$ in a gene when mutation $\xi_i$ is present.

This approach allows us to reduce the complicated statistical description of the dataset to a rather simple model that uses only three probability distributions as its initial parameters:

• distribution $\rho_n$ of the number $n$ of mutations;

• distribution $P^{(1)}_\xi$ of the relevant mutations in the group $n = 1$;

• transient probability matrix $D$.

All these parameters can be identified from sample datasets of the HIV-1 population.
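The generation procedure built on these three inputs can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the next mutation is conditioned on the most recent one only, the number of mutations per genotype is sampled from $\rho_n$ rather than fixed at $M_n = \rho_n M$, and all numerical values are invented toy parameters.

```python
import random


def simulate_genotype(n, p1, D, rng):
    """Draw one genotype with at most n distinct mutations."""
    if n == 0:
        return []  # wild type
    names = list(p1)
    # The first mutation follows the n = 1 distribution P^(1).
    genotype = [rng.choices(names, weights=[p1[m] for m in names])[0]]
    while len(genotype) < n:
        candidates = [m for m in names if m not in genotype]
        # Assumption: condition on the last mutation only.
        weights = [D[genotype[-1]].get(m, 0.0) for m in candidates]
        if sum(weights) == 0:  # no admissible transition left
            break
        genotype.append(rng.choices(candidates, weights=weights)[0])
    return genotype


def simulate_ensemble(M, rho, p1, D, seed=0):
    """Generate M genotypes; the number of mutations is drawn from rho."""
    rng = random.Random(seed)
    counts = rng.choices(list(rho), weights=list(rho.values()), k=M)
    return [simulate_genotype(n, p1, D, rng) for n in counts]


# Toy parameters, for illustration only.
rho = {0: 0.2, 1: 0.4, 2: 0.3, 3: 0.1}
p1 = {"184V": 0.5, "103N": 0.3, "41L": 0.2}
D = {
    "184V": {"103N": 0.6, "41L": 0.4},
    "103N": {"184V": 0.7, "41L": 0.3},
    "41L": {"184V": 0.5, "103N": 0.5},
}
ensemble = simulate_ensemble(1000, rho, p1, D)
```

The simulated ensemble can then be compared with the observed one using the same sample statistics that were used to identify the parameters.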

Identification of the model. To identify the parameters of the model, we used a large database of HIV-infected patients collected over several years in the USA [4]. The database contains genotypes of 43620 patients examined from August 9, 1998 to May 5, 2001. We observed 59 different mutations in the RT part of the genome, including 17 mixed mutations, and 77 different mutations in the protease part, including 34 mixed mutations.

Distribution $\rho_n$ of the number of mutations. The practice of HIV treatment has shown that the variability of the number of mutations $n$ is high, owing to the complexity of the drug combinations that have been applied. The sample estimate of the distribution $\rho_n$ of the number of mutations in protease is shown in Figure 6. The distribution has a clear first peak ($n = 1$) and a shelf (or second peak) at $n = 3$ to $5$. We therefore expect that there are two groups of genomes in the database, corresponding to low and high numbers of mutations. A possible interpretation of this bimodal distribution is that there are two groups of patients. One group consists of “new” patients who have had one or two treatments, so that their genotypes contain relatively small numbers of mutations. The second group consists of “old” patients, who have a long treatment history, or of new patients infected through treated HIV-1 patients [15].

Fig. 6. Statistical description for distribution of mutations in Protease.

Distributions of the relevant mutations $P_\xi$. The distribution $\rho_n$ describes only the variability of the groups of “new” and “old” patients. For a more detailed study of the virus mutations driven by particular drug combinations, the probabilities of occurrence of the relevant mutations $\xi$ should be considered. They are estimated by the sample frequencies:

$$P_\xi = \frac{\text{number of genes with mutation } \xi}{M}. \tag{1}$$

Here $M$ is the total number of genomes in the dataset. Equation (1) describes the marginal contribution of each mutation in the total population, without any information about the number and occurrence of other mutations. The probabilities of the most significant relevant mutations $\xi_k$ (in decreasing order of probability) are shown in Figure 6. The marginal estimates of $P_\xi$ over the total dataset show only the general contributions of the mutations. For a more detailed analysis we also consider the occurrences $P^{(n)}_\xi$ of mutations in the groups of genotypes with exactly $n$ mutations. These values were also computed by means of Equation (1), with $M \overset{\text{def}}{=} M_n = \rho_n M$, the number of genes with $n$ mutations in the database. The sample estimates of these occurrences are also shown in Figure 1. The contributions of some mutations clearly differ for different $n$, both for the protease and the RT parts of the genome. For example, for RT with $n = 1$, the mutations 184V and 103N make the main contribution. The distribution $P^{(1)}_\xi$ is the limit distribution of the procedure shown in Figure 5. From Figure 1 we also observe that the total sum $\sum_k P_{\xi_k} > 100\%$, except in the case $n = 1$. This demonstrates that the analysis of marginal mutations is not enough for a general statistical description of the variability of the whole DNA ensemble, because some positions of the DNA may be statistically dependent [15], especially in relation to viral fitness. Hence, the joint characteristics of this variability must be taken into account.
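The estimate of Equation (1), and its restriction to genotypes with exactly $n$ mutations, can be sketched as follows. The five toy genotypes and the function names are assumptions made for illustration. The example also shows why the frequencies add up to more than 100% as soon as genotypes carry more than one mutation.

```python
from collections import Counter


def mutation_frequencies(genotypes):
    """Equation (1): share of genotypes that carry each mutation."""
    M = len(genotypes)
    counts = Counter(m for g in genotypes for m in set(g))
    return {m: c / M for m, c in counts.items()}


def frequencies_by_count(genotypes, n):
    """Same estimate restricted to genotypes with exactly n mutations."""
    group = [g for g in genotypes if len(g) == n]
    return mutation_frequencies(group) if group else {}


# Toy data, for illustration only.
toy = [("184V",), ("103N",), ("184V", "41L"), ("184V", "103N"), ()]
P = mutation_frequencies(toy)
P1 = frequencies_by_count(toy, 1)
P2 = frequencies_by_count(toy, 2)
print(sum(P1.values()), sum(P2.values()))  # 1.0 2.0
```

For the group with $n = 1$ each genotype contributes to exactly one mutation, so the frequencies sum to 1; for $n = 2$ each genotype is counted twice.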

Transient probability matrix $D$. The conditional probability of the occurrence of mutation $\xi_i$, if the mutation $\xi_j$ arises from the previous steps of the generation, is estimated by:

$$D_{ij} = \frac{\text{number of genotypes with mutations } \xi_i \text{ and } \xi_j \text{ simultaneously}}{\text{number of genotypes with mutation } \xi_i}. \tag{2}$$
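A minimal Python sketch of the estimate in Equation (2) is given below. It counts co-occurrences over a toy list of genotypes; the data and the function name `conditional_matrix` are assumptions, and the matrix is stored sparsely as a dictionary keyed by mutation pairs.

```python
from itertools import permutations


def conditional_matrix(genotypes):
    """Equation (2): D[(i, j)] = #(genotypes with i and j) / #(genotypes with i)."""
    single, pair = {}, {}
    for g in genotypes:
        muts = set(g)
        for m in muts:
            single[m] = single.get(m, 0) + 1
        for a, b in permutations(muts, 2):  # ordered pairs (a, b) with a != b
            pair[(a, b)] = pair.get((a, b), 0) + 1
    return {(i, j): c / single[i] for (i, j), c in pair.items()}


# Toy data, for illustration only.
toy_genotypes = [("184V",), ("103N",), ("184V", "41L"), ("184V", "103N")]
D_toy = conditional_matrix(toy_genotypes)
print(D_toy[("41L", "184V")])  # 1.0
```

Note that the estimate is not symmetric: in the toy data every genotype with 41L also carries 184V, while only one in three genotypes with 184V carries 41L.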
The dimensionality of the matrix obtained from Equation (2) may be rather high. To reduce it we use the algebraic technique of orthogonal expansion, applied to transient probability matrices [16]:

$$D = \Phi \Lambda^{1/2} \Psi^{T}, \tag{3}$$

where the columns of $\Phi$ are the eigenvectors of the matrix $D D^{T}$, the columns of $\Psi$ are the eigenvectors of the matrix $D^{T} D$, and $\Lambda$ is the diagonal matrix of the corresponding eigenvalues $\lambda_k$. This allows the coefficients $a_k = \sqrt{\lambda_k}$ to be considered as the principal components (PC) [13] and the probability (2) to be represented as a series:

$$D_{ij} = \sum_k \sqrt{\lambda_k}\, \varphi_{ik} \psi_{jk}. \tag{4}$$

The values $\lambda_k$ show the part of the probability explained by the $k$-th PC. The sum of the first $k$ coefficients $\lambda_k$ may be interpreted as a measure of the convergence of the series (4). Table 2 shows the first 7 values of $\lambda_k$ for the RT and protease parts of the HIV-1 genome, obtained for the total database. The series (4) converges rather fast in both cases: e.g., for the RT part the first term alone explains more than 60% of the conditional probability, and the first five terms explain 80%.
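The convergence statement can be checked directly from the values in Table 2. The short Python sketch below accumulates the normalized coefficients; only the variable names are assumptions.

```python
from itertools import accumulate

# Normalized (%) values of lambda_k, copied from Table 2.
lambda_pct = {
    "RT": [61.3, 8.2, 5.4, 2.8, 2.1, 1.7, 1.6],
    "Protease": [55.0, 6.3, 4.5, 4.2, 3.4, 2.7, 2.4],
}

# Cumulative share explained by the first k principal components.
cumulative = {
    part: [round(c, 1) for c in accumulate(values)]
    for part, values in lambda_pct.items()
}
print(cumulative["RT"][4])        # 79.8
print(cumulative["Protease"][4])  # 73.4
```

For RT the first five terms account for 79.8%, which is the figure rounded to 80% in the text; for protease the same five terms account for 73.4%.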

Table 2. Normalized (%) values of the expansion coefficients $\lambda_k$ in Equation (4)

Part of the genome    PC 1    PC 2    PC 3    PC 4    PC 5    PC 6    PC 7
RT                    61.3    8.2     5.4     2.8     2.1     1.7     1.6
Protease              55.0    6.3     4.5     4.2     3.4     2.7     2.4

Let us consider the normalized bases $\tilde\varphi_{ik} = \lambda_k^{0.25} \varphi_{ik}$ and $\tilde\psi_{jk} = \lambda_k^{0.25} \psi_{jk}$. The terms in Equation (4) can then be written as $p_{ijk} = \tilde\varphi_{ik} \tilde\psi_{jk}$ and interpreted as independent factor loadings that drive the changes of the conditional probability $D_{ij}$ over all the mutations $\xi_i$, $\xi_j$ in the database. For example, Figure 7 shows the estimates of the first basis functions for the RT and protease parts of the genotype (the contributions of the products of the functions are given in Table 2). The first term $p_{ij1} = \tilde\varphi_{i1} \tilde\psi_{j1}$ clearly reflects the total occurrence of the mutations in a genotype (see Figure 6): for the mutations with the highest occurrences, the contribution to the conditional probabilities of their pairs is also high.

Fig. 7. Orthogonal basis functions of expansion (4) for the transient probability matrix.

Model validation. The simulation model is based only on the distributions $\rho_n$, $P^{(1)}_\xi$ and $D$ of the mutations. No information about more complicated mechanisms (distributions of pairs, triples, etc.) has been used for this identification.

The main goal of the verification is to check whether these features of the ensemble can be reproduced through the dependencies formalized by the matrix $D$. We compared the total occurrences of all mutations in genotypes, estimated on the initial and simulated samples; see also Figure 6 (solid line). The results of the simulation and the sample are rather close.

The error of the simulation increases in proportion to the absolute value of the occurrences. Nevertheless, in some cases the error of the simulation is larger than the boundary of the confidence interval. This systematic error may be explained by possible variations in the matrix $D$ between the groups of “old” and “new” patients.

Application to forecast of HIV-1 evolution in time. The evolution of the total world population of HIV-1 and the associated changes in drug resistance levels should be taken into account. The stochastic model shown in Figure 5, which describes the HIV-1 genotype ensemble in terms of parameters, can be used to analyze its temporal variability during the observation period (August 1998 to May 2001). The temporal variability of the data may be considered in terms of seasonal samples (3-month periods). The seasonal samples contain from 1500 to 4500 genotypes, which is enough to obtain stable estimates. Only the hypothesis of linear trends is considered: $\xi(t) = a t + b + \delta(t)$, where $a$ is the parameter of main interest (the value of the trend), $b$ is the shift parameter, and $\delta$ is white noise. Table 3 shows the integral parameters of the trends for various parameters of the HIV-1 population (mean value of the parameter, value of the trend, determination coefficient $R^2$ and the sample value of the F-criterion).
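The quantities reported for each trend (slope $a$, shift $b$, $R^2$ and the F-statistic) follow from an ordinary least-squares fit of a straight line. The sketch below is a generic illustration of that calculation; the seasonal values are invented and are not taken from the database, and the function name is an assumption.

```python
def linear_trend(t, y):
    """Least-squares fit y = a*t + b; returns (a, b, R^2, F)."""
    n = len(t)
    t_mean, y_mean = sum(t) / n, sum(y) / n
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    a = sxy / sxx
    b = y_mean - a * t_mean
    ss_res = sum((yi - (a * ti + b)) ** 2 for ti, yi in zip(t, y))
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    r2 = 1.0 - ss_res / ss_tot
    # F-statistic of a one-predictor regression with (1, n - 2) degrees of freedom.
    f = r2 / (1.0 - r2) * (n - 2) if r2 < 1.0 else float("inf")
    return a, b, r2, f


# Invented seasonal values (months since the start, percent), for illustration only.
months = [0, 3, 6, 9, 12, 15, 18, 21, 24, 27, 30]
values = [39, 42, 43, 47, 48, 50, 54, 55, 58, 60, 62]
a, b, r2, f = linear_trend(months, values)
```

A trend is regarded as significant when the sample value of F exceeds the tabulated critical value for the corresponding degrees of freedom.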

Trends of single mutations occurrence $P_\xi$. The database allowed us to investigate trends in codon frequency in the period from 1998 to 2001. Results for protease and RT are shown in Table 3. The majority of the mutations in the genotype have a negative trend; only 77I in protease has a significant positive trend.

Table 3. Trend analysis of the parameters of the HIV-1 genotype population (F is compared with Fisher’s test F(1,31,95%) = 4.14). The first four columns give the occurrence of mutations in %, the fifth column gives $p_g$ in % from Equation (5), and the last three columns give the coefficients $\sqrt{\lambda_k}$ from Equation (4).

Protease part
Parameter      77I      90M      10I      71V      p_g, %   k = 1   k = 2   k = 3
Mean           37.78    32.69    27.97    23.64    48       5.78    1.67    0.83
a (1/month)    0.20     −0.43    −0.72    0.32     0.74     0.13    0.06    0.06
R2             0.68     0.91     0.61     0.82     0.67     0.80    0.73    0.54
F              16.7     77.6     9.6      47.1     64.0     23.6    26.8    11.8

RT part
Parameter      41L      215Y     103N     67N      p_g, %   k = 1   k = 2   k = 3
Mean           32.86    31.37    30.66    27.21    47       6.65    2.20    2.08
a (1/month)    −0.51    −0.50    −0.32    −0.39    0.49     0.11    0.17    0.07
R2             0.88     0.93     0.88     0.84     0.75     0.68    0.78    0.71
F              57.4     98.7     59.8     41.8     94.3     21.4    36.1    25.3

Trends of bi-modal distribution for number of mutations in genotypes $\rho_n$. To reduce the data dimensionality and to discriminate statistically between the two groups in the dataset, we consider a mixture of two Bernoulli distributions:

$$\rho_n = p_g C_{m_1}^{k} q_1^{k} (1 - q_1)^{m_1 - k} + (1 - p_g) C_{m_2}^{k} q_2^{k} (1 - q_2)^{m_2 - k}, \tag{5}$$

where $k = n$ is the number of mutations, $p_g$ is the contribution of the first group of mutations (and $1 - p_g$ that of the second group), $m_1$, $m_2$ are the maximal numbers of mutations in the groups, and $q_1$, $q_2$ are the probabilities of finding any one (arbitrary) mutation in the groups. The logic of the Bernoulli distribution (based on the repetition of independent events) is closer to the mutation process than the Poisson distribution, which is generally applied to rare events. The temporal variability of the parameters $(p_g, q_1, q_2, m_1, m_2)$ of the approximation of $\rho_n$ by Equation (5) is shown in Table 3. In both cases only the parameter $p_g$ (the weight of the left part, i.e., of the group with $m_1$ mutations) has a clear significant positive trend. For protease, $p_g$ increased from 39% in summer 1998 to 62% in summer 2001 (with an average increment $a = 0.74\%$ per month). Taking into account the trends for separate mutations, we observe a “degradation” of genotypes: the number of patients with simple genotypes (a small number of mutations) is growing, while the number of patients with a large number of mutations is decreasing.
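The two-component mixture of Equation (5) is easy to evaluate numerically. In the sketch below the value $p_g = 0.48$ is the protease mean from Table 3, whereas $m_1$, $m_2$, $q_1$ and $q_2$ are invented values chosen only to produce a bimodal shape; the function names are assumptions.

```python
from math import comb


def binom_pmf(n, m, q):
    """Probability of n successes in m independent trials with success probability q."""
    return comb(m, n) * q**n * (1 - q) ** (m - n) if 0 <= n <= m else 0.0


def rho(n, p_g, m1, q1, m2, q2):
    """Equation (5): mixture of two binomial components with weights p_g and 1 - p_g."""
    return p_g * binom_pmf(n, m1, q1) + (1 - p_g) * binom_pmf(n, m2, q2)


# p_g from Table 3 (protease mean); the other parameters are illustrative assumptions.
params = dict(p_g=0.48, m1=3, q1=0.4, m2=12, q2=0.4)
distribution = [rho(n, **params) for n in range(13)]
```

Because each component is a proper probability distribution, the mixture sums to 1 over $n = 0, \ldots, \max(m_1, m_2)$ for any $p_g$ between 0 and 1.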

Trends of transient probabilities $D$. The analysis of the trends of the parameters of distribution (5) shows that the contribution of the first group of mutations, with low $n$, is increasing. This may be a consequence of temporal variations in the interdependencies between different mutations, governed by the development of drug therapy. To test this hypothesis, let us consider the trends for the matrix $D$, Equation (2). Taking into account the expansions (3) and (4), we may reduce the complicated problem of joint trend analysis for the components $D_{ij}$ to trend analysis for independent time series, the components of expansion (4). Table 3 shows that all the components have clear positive trends. Taking into account the shape of the first basis functions (see Figure 7), it is clear that the joint probabilities $D_{ij}$ of the mutations are generally increasing as well; moreover, the rate of increase corresponds to the total occurrence of the mutation in the ensemble.

The discrimination of the groups of “old” and “new” patients in terms of the bimodal distribution (5) allows us to forecast the growth of the total number of HIV-infected people in time:

$$N(t) = N_{\text{new patients}}(t) + N_{\text{old patients}}(\varepsilon t), \quad \varepsilon \ll 1. \tag{6}$$

Here $\varepsilon$ is the slow time parameter, which reflects the rapid growth of the group of new patients in comparison with the old patients. The share of “new” patients in the sample is $p_g$ (and that of old patients is $1 - p_g$) from (5). Hence, the growth curve is:

$$N(t) = N_{\text{old patients}}(0) \left[ 1 + \frac{p_g(t)}{1 - p_g(t)} \right], \tag{7}$$

where $p_g(t) = p_0 + a_g t$ is the linear trend with the parameters from Table 3, and $N_{\text{old patients}}(0)$ is the initial number of “old” (treated) patients at the beginning of the forecast.

Fig. 8. Qualitative forecast of HIV-1 population growth. 1: mean value (7); 2: 90% confidence interval.

In Figure 8 the “crucial” forecast of the HIV-1 population growth is shown. It is based on the fact that altogether 42 million people worldwide had been infected with HIV at the beginning of the 21st century, and 12 million have died over the last 20 years. The forecast does not take into account the emergence of new drugs or the various prophylactic and social preventive activities aimed at restricting HIV-1 infection. This result is therefore qualitative only; quantitative conclusions require more sophisticated research.
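The growth curve of Equation (7) can be evaluated with a few lines of Python. The sketch uses the protease values quoted in the text ($p_g = 39\%$ in summer 1998, increment $0.74\%$ per month); taking summer 1998 as $t = 0$ and expressing the result relative to $N_{\text{old patients}}(0)$ are assumptions made for the example.

```python
def p_g(t, p0=0.39, a=0.0074):
    """Linear trend of the share of 'new' patients; t in months."""
    return p0 + a * t


def growth_factor(t, p0=0.39, a=0.0074):
    """Equation (7) divided by N_old(0): 1 + p_g / (1 - p_g)."""
    p = p_g(t, p0, a)
    if not 0.0 <= p < 1.0:
        raise ValueError("linear trend left the interval [0, 1)")
    return 1.0 + p / (1.0 - p)


for month in (0, 12, 24, 36):
    print(month, round(growth_factor(month), 2))
```

The expression grows without bound as $p_g(t)$ approaches 1, so a linear extrapolation of $p_g$ is only meaningful over a limited horizon, which is consistent with the qualitative character of the forecast.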

3. RESULTS

3.1. Data presentation: Roaming PDA access

3.1.1. User Scenario

RetroGramTM (www.retrogram.com) is an expert-based software program for HIV genotype interpretation, which weighs the effect of specified genotype changes on clinical drug activity. It accepts a list of substitutions in the protease and reverse transcriptase genes with respect to the NL4-3 reference strain. The interpretation is accomplished by running a “simulation”, which applies some hundred rules relating substitutions in the HIV genome to knowledge of their effects on drug response. This knowledge comes from hundreds of references in the clinical literature. The rules are checked against the reported substitutions, and each drug is evaluated for its suitability. At a later stage we added Web access, in which a Web interface is used to submit the input and retrieve the output. We now want to make the simulations accessible wirelessly. Developing a wireless Internet version from scratch would not be cost-efficient and would cause maintainability problems. For example, the rules mentioned above change often, and these changes would have to be reflected in both versions. Furthermore, for privacy and security reasons the developer is not granted access to the source code of the “simulation”. It is therefore much more convenient to provide wireless access to the Web-based interface. In this case the “simulation” takes place on a single server, and privacy and security are guaranteed. A typical user scenario is described below, and the associated graphical representation of the Retrogram Web access is given in Figure 9.

After the user has successfully logged in, the Patient Detail page is displayed (Figure 10). The form on this page is used to enter the personal data of the patient. Two fields in the form are required: Patient ID and Data of Sample.

According to the information received from the laboratory, the user enters the laboratory test results (i.e., protease or RT substitutions) for the patient in the Laboratory Information page. Next, a script invoked on the server does the following:

Fig. 9. Web-based Retrogram use case sequence.

Script 1: Server validation script

Validate inputs:
    Validate that the protease or RT substitutions conform to certain rules.
    A single substitution should be represented by an integer (for the position in the gene) and a letter (for the amino acid). The position in the gene is in the range from 1 to 99 for protease and from 1 to 599 for RT. The amino acid code is one of the following codes: A C D E F G H I K L M N P Q R S T U V W Y.

Submit the inputs to the “simulation” program and retrieve the drug ranking result.

Show the drug ranking result in the ‘HIV Therapy decision support’ screen:
    After applying certain rules to the laboratory test result, return the final drug ranking, i.e., each drug’s level of suitability, as follows:

    A (green): This drug can be used
    B (yellow): Consider use if no class A drug available
    C (amber): Consider use if no class A or B drug available
    D (red): Consider use if no class A, B or C drug available
    U (grey): Unranked, insufficient data available
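The validation rules stated in Script 1 can be expressed compactly with a regular expression. The following Python sketch is an illustration of those rules only; it is not the server script itself, and the function name and the gene keys `"protease"` and `"rt"` are assumptions.

```python
import re

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTUVWY"            # codes listed in Script 1
MAX_POSITION = {"protease": 99, "rt": 599}       # position ranges from Script 1
_PATTERN = re.compile(rf"^(\d+)([{AMINO_ACIDS}])$")


def validate_substitution(text, gene):
    """Return True if text looks like '184V' and the position fits the gene."""
    match = _PATTERN.match(text.strip().upper())
    if not match:
        return False
    position = int(match.group(1))
    return 1 <= position <= MAX_POSITION[gene]


print(validate_substitution("184V", "rt"))        # True
print(validate_substitution("184V", "protease"))  # False: position above 99
print(validate_substitution("M90", "protease"))   # False: letter before position
```

Rejecting malformed input before it is submitted to the “simulation” keeps error handling on the interface side, where a message can be shown next to the offending field.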

In the ‘HIV Therapy decision support’ screen, clicking on any drug name in the ranking lists displays a list of available references from the scientific literature supporting the particular ranking for that drug. Clicking on the ‘Interpret substitution’ button shows the classification of the patient’s substitutions into relevant, natural or additional.


Fig. 10. Web Retrogram: user enters patient substitutions (left), drug ranking results (right).

3.1.2. Roaming, wireless access

In the design phase of the wireless versions of the application, the constraints of mobile devices have to be considered. At the same time we have tried to maintain the same level of usability and readability as in the original Web version. This is accomplished by keeping the same structure as in the Web version, with some modifications. For example, the Patient Detail form has many fields, and putting them all on one screen would harm the usability of the program (the mobile device is assumed to have a resolution comparable to that of a typical PDA, i.e., around 160 × 160 pixels). We therefore use three screens for the Patient Detail data. The Patient Detail Web page has 2 required fields, which we put in the first screen after the ‘login’ screen. In this way, a user who is not interested in entering optional data can go directly to the Laboratory Information screen.

Proxy method Implementation. A Proxy method is implemented for accessing the Web-based software from mobile devices. The Proxy server sits between the remote server (the Retrogram server) and the mobile device. A mininavigator script developed in the Proxy is responsible for the following:

• Take the patient data from the mobile user (i.e., patient detail, laboratory information).

• Create an HTTP connection with the remote server.

• Submit the data to the remote server. These data are basically the input for the Retrogram ‘simulation’.

• Take the result from the remote server (HTML code generated by the retrogram.asp script).

• Parse the HTML code and retrieve only the relevant information (i.e., drug ranking, error messages, drug references, etc.), and use it to build wireless pages (i.e., a WML page in the case of WAP, or a Web-clipping page).

• Send the wireless pages to the mobile device.

The Proxy is implemented using PHP: Hypertext Preprocessor as a server-side scripting language [9–11] running on the Apache Web server [12].
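The parse-and-rebuild step performed by the Proxy can be illustrated with a short sketch. The Proxy itself is written in PHP; the example below uses Python instead, purely for illustration. The HTML layout (table cells with a `rank-X` class), the second drug name and the rankings are invented, since the actual output of the Retrogram script is not described here.

```python
from html import escape
from html.parser import HTMLParser


class RankingExtractor(HTMLParser):
    """Collect drug names from cells marked with a hypothetical 'rank-X' class."""

    def __init__(self):
        super().__init__()
        self.rankings = {}
        self._rank = None

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class") or ""
        if tag == "td" and cls.startswith("rank-"):
            self._rank = cls[len("rank-"):]

    def handle_data(self, data):
        if self._rank and data.strip():
            self.rankings[data.strip()] = self._rank
            self._rank = None

    def handle_endtag(self, tag):
        if tag == "td":
            self._rank = None


def extract_rankings(html):
    parser = RankingExtractor()
    parser.feed(html)
    parser.close()
    return parser.rankings


def to_wml(rankings):
    """Build a minimal WML deck that lists each drug with its class."""
    lines = [f"{escape(drug)}: {escape(rank)}<br/>" for drug, rank in sorted(rankings.items())]
    return '<wml><card id="ranking"><p>' + "".join(lines) + "</p></card></wml>"


# Invented sample page, for illustration only.
page = (
    "<html><body><table><tr>"
    '<td class="rank-A">indinavir</td><td class="rank-C">drug-2</td>'
    "</tr></table></body></html>"
)
wml = to_wml(extract_rankings(page))
```

Keeping the extraction separate from the page generation means that the same extracted data can feed either a WML page or a Web-clipping page.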

Two versions are developed using the Proxy method: a WAP version and a Web-clipping version. If a user wants to fill in the ‘patient details’ fields, he has to move from one screen to another and back again. The fields already filled in on previous screens should not be lost, so the client’s state has to be maintained. In the WAP case we simply use cookies, but in Web clipping cookies are supported only in Palm OS 4.0 or higher. For this reason the “hidden field” method, another common way of maintaining state on the Internet, is used. Figures 11(a) and 11(b) show captured user interfaces that track the user’s path through the application, where the user enters the patient’s details and accesses the ranking results.
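The “hidden field” method amounts to echoing the values collected so far back into every generated form, so that they are resubmitted with the next request. A minimal Python sketch follows; the field name `patient_id` and its value are hypothetical.

```python
from html import escape


def hidden_fields(state):
    """Render previously entered values as hidden form inputs."""
    return "".join(
        f'<input type="hidden" name="{escape(name, quote=True)}" '
        f'value="{escape(value, quote=True)}"/>'
        for name, value in state.items()
    )


# Hypothetical state carried over from the first Patient Detail screen.
print(hidden_fields({"patient_id": "P-001"}))
# <input type="hidden" name="patient_id" value="P-001"/>
```

Escaping the values matters because they are user input that is written back into markup.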

J2ME Implementation. The same user interface is applied in the J2ME implementation. There are two main differences between the J2ME implementation and the Proxy one:

1. J2ME enables the device to communicate directly with the Retrogram server, without an intermediate Proxy.

2. In J2ME the client’s interface is contained within the device. In the Proxy method, every time the interface has to change, the Proxy is responsible for generating a new page.

Fig. 11. (a) User corrects the input and submits again (left), drug ranking results (right). (b) User clicks on the drug ‘indinavir’ (left), references supporting this ranking (right).

The following illustrates the steps needed to fetch an HTML page generated by a script on the remote host. Specifically, this example shows how the user can log in to a script on the Retrogram server and extract the cookie from the response header:

1. Open an HTTP connection
2. Open an input stream
3. Make an HTTP POST request
4. Extract the cookie from the response header
5. Close the connection
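The request body and the cookie handling in these steps can be sketched without a live server. The implementation described here is written in J2ME; the Python below is an illustration only, and the form field names and the cookie name `ASPSESSIONID` are assumptions. In a real client the body would be sent with an HTTP POST (for example through `urllib.request.urlopen`), and the `Set-Cookie` value would be read from the response headers.

```python
from http.cookies import SimpleCookie
from urllib.parse import urlencode


def build_login_body(username, password):
    """Encode the login form as an HTTP POST body (field names are hypothetical)."""
    return urlencode({"username": username, "password": password}).encode("ascii")


def extract_session_cookie(set_cookie_header):
    """Parse a Set-Cookie header value into a {name: value} dictionary."""
    cookie = SimpleCookie()
    cookie.load(set_cookie_header)
    return {name: morsel.value for name, morsel in cookie.items()}


body = build_login_body("doctor", "secret")
session = extract_session_cookie("ASPSESSIONID=abc123; path=/")
print(session)  # {'ASPSESSIONID': 'abc123'}
```

The extracted cookie is then attached to every later request, which is how the server recognizes the logged-in user.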

Fig. 12. J2ME method; user enters patient’s substitutions (left), drug ranking results (right).

In the J2ME implementation of Retrogram the entire client interface resides on the device. A connection to the server is established in the following cases. At user login, a connection is necessary in order to validate the username and password: the user submits them, and the application judges their correctness by scanning the HTML response from the Retrogram server. When the user submits the patient’s laboratory information, the application connects to the server in order to submit the data, receive the result (in HTML format) and extract the drug ranking. When the user looks up the references that support a certain drug ranking, a connection is necessary because the database with all the references resides on the Retrogram server: the application submits the cookie and the name of the drug to a Retrogram script, the server returns the drug references in HTML format, and the application strips the HTML tags and shows the references as plain text. Finally, when the user looks up the classification of the patient’s substitutions, a connection is again necessary because this classification is part of the Retrogram ‘simulation’. Figure 12 illustrates the process of obtaining the drug ranking using the J2ME method.
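Stripping the HTML tags from a server response, as done for the drug references, can be sketched with a small parser. The example is written in Python rather than J2ME, and the sample markup is invented for illustration.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Keep the text content of a page and drop all tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        if data.strip():
            self.chunks.append(data.strip())


def html_to_text(html):
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


# Invented sample markup, for illustration only.
sample_html = "<ul><li>First reference</li><li>Second reference</li></ul>"
print(html_to_text(sample_html))
# First reference
# Second reference
```

Each text fragment ends up on its own line, which suits a small screen where the references are shown as a plain list.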

Currently the J2ME version is in use by different users in order to study its usability and extendibility. More details on the implementation can be found in reference [13].

3.2. Virtual laboratory infrastructure

3.2.1. A virtual organization for retrogram-centered workflow

Grid technology is a major cornerstone of today’s computational science and engineering. Its basic organizational unit is the Virtual Organization (VO). A VO is a set of Grid entities, such as individuals, applications, services or resources, which are related to each other by some level of trust. In the most basic example, service providers would only allow access to members of the same VO. We are currently building a distributed Grid-based decision support infrastructure to support the Retrogram-centered workflow shown in Figure 13.

Fig. 13. A Retrogram-centered workflow.

This VO will offer a Grid virtual laboratory that will assist users in interpreting the genotype of a patient by using rules developed by experts on the basis of the literature, taking into account the relationship between genotype and phenotype. The workflow is based on highly distributed data available from clinical studies and on the relationship between genotype and clinical outcome. In order to cover the fast temporal and spatial scales required to infer information from a molecular (genomic) level up to patient medical data, multi-scale methods are applied, in which simulation, statistical analysis and data mining are combined and used to enhance the rule-based decision. In this scenario, information sources are widely distributed, and the data processing requirements are highly variable, both in the type of resources required and in the processing demands. Experiment design, integration of information from various sources, and transparent scheduling and execution of experiments will be supported by this support system based on distributed Grid middleware. The DAS2 testbed (Netherlands) will initially provide the additional computational power for our compute-intensive jobs. We will reuse Grid middleware from successful European projects such as CrossGrid (www.crossGrid.org) and VL-e (www.vl-e.nl) to provide the basic Grid services of data management, resource management and information services on top of Globus. For transparent use of this infrastructure we will build a presentation layer that provides a user-friendly interface to both medical doctors and scientists.

4. DISCUSSION

4.1. Conclusions and future work

In this paper we discussed an integrative approach to biomedicine at large and to infectious diseases in particular. We showed how Grid technology can play a crucial role in the understanding of processes ‘from molecule to man’. In order to cover the fast time and spatial scales required to infer information from a molecular (genomic) level up to patient medical data, we need to apply multi-scale methods in which simulation, statistical analysis and data mining are combined in an efficient way. Moreover, the required integrative approach calls for distributed data collection (e.g., HIV mutation databases, patient data, literature reports, etc.) and a virtual organization (physicians, hospital administration, computational resources, etc.). Access to and use of large-scale computation (both high-performance and distributed) is also essential, since many of the computations involved require near real-time response and are too complex to run on a personal computer or PDA. Finally, data presentation is crucial in order to lower the barrier to actual usage by physicians; here Grid technology (server-client approach) can play an important role.

Although many of the aspects discussed in this paper have been proven to work in concept, the complete integration of the systems and the evaluation of day-to-day use are still under development [17]. In addition, each of the underlying methods (rule-based, statistical and CA-based models) remains a topic of further study. We will set up a use-base with the system described running on various European Grid testbeds. The first testbed we will use is the so-called DAS2, followed eventually by the CrossGrid testbed, which supports specific features for interactive computation, an essential ingredient of a medical decision support system.

The authors gratefully acknowledge Fan Chen and Ferdinand Alimadhi for assistance in implementing the CA models and the roaming PDA access. The Dutch Virtual Laboratory on e-science project supported parts of the research presented here: http://www.VL-e.nl.

GLOSSARY

Grid: Distributed architecture for solving computational problems by making use of the resources from the members of a virtual organization, treating them as a virtual cluster.

CA: Cellular Automata, a discrete model studied in computational theory and mathematics, which consists of a regular grid of cells, each in one of a finite number of states.

Decision Support System: Computer-based system that helps in the process of decision-making.

Web Interface: User interfaces for information available via the web.

Proxy: Computer service which allows clients to make indirect network connections to other services.

HTTP: Hyper Text Transfer Protocol, a request/response protocol for transferring information on the Web.

HTML: Hyper Text Markup Language, a markup language designed for the creation of web pages.

WML: Wireless Markup Language, a markup language used in mobile phones.

J2ME: Java 2 Platform Micro Edition, a collection of Java interfaces for embedded consumer appliances such as cellular phones.

DAS2: Distributed ASCI Super Computer 2, a wide-area distributed computer connecting 5 Dutch Universities.

REFERENCES 941

1. Zhao Z, Belleman RG, van Albada GD, Sloot PMA. AG-IVE: 942
An Agent-Based Solution to Constructing Interactive Simula- 943
tion Systems, in Series Lecture Notes in Computer Science, 944
April 2002; 2329: 693–703. 945

2. Durant J, Clevenbergh P, Halfon P, Delguidice P, Porsin S, 946
Simonet P, Montagne N, Dohin E, Schapiro JM, Boucher 947
C, Dellamonica P. Improving HIV therapy with drug resis- 948
tance genotyping: The Viradapt Study. Lancet 1999; 353: 2195– 949
2199. 950

3. Sevin AD, DeGruttola, Nijhuis M, Schapiro JM, Foulkes AS, 951
Para MF, Boucher CAB. Methods for Investigation of the Re- 952
lationship between Drug-Susceptibility Phenotype and Human 953
Immunodeficiency Virus Type 1 Genotype with Applications 954
to AIDS Clinical Trials Groupw 333. The Journal of Infectious 955
Diseases 2000; 182: 59–67. 956

4. The Genotype database is obtained from a large service testing 957
laboratory from the US. It contains the resistance profiles of the 958
Protease and Reverse Transcriptase genes of the HIV-1 virus 959
obtained from plasma samples of HIV-1 infected patients. No 960
clinical background information on medication or drug history 961
is available. 962

5. Mathematical Methods for DNA Sequences. In. Waterman MS, 963
eds. CRC Press Inc., Boca Raton, Florida, 1999. 964

6. Kiryukhin I, Saskov K, Boukhanovsky AV, Keulen W, Boucher, 965
CA, Sloot PMA. Stochastic modeling of temporal variability of 966
HIV-1 population. In: Sloot PMA, Abrahamson D, Bogdanov 967
AV, Dongarra JJ, Zomaya AY, Gorbachev YE, eds. Compu- 968
tational Science – ICCS 2003, Melbourne, Australia and St. 969
Petersburg, Russia, Proceedings Part I, in series Lecture Notes 970
in Computer Science, vol. 2657, pp. 125–135. Springer Verlag, 971
June 2003. ISBN 3-540-40194-6. 972

7. Zorzenon dos Santos RM, Coutinho S. Dynamics of HIV infec- 973
tion: A cellular automata approach. Phys Rev Lett 2001; 87(16): 974
168102–1–4. 975

8. Sloot PMA, Chen F, Boucher CA. Cellular automata model 976
of drug therapy for HIV infection. In: Bandini S, Chopard 977
B, Tomassini M, eds. 5th International Conference on Cellu- 978
lar Automata for Research and Industry, ACRI 2002, Geneva, 979
Switzerland, October 9–11, 2002. Proceedings, in series Lecture 980
Notes in Computer Science, vol. 2493, pp. 282–293. October 981
2002. 982

9. PHP: Hypertext Preprocessor: http://www.php.net. 983

AUTHOR'S PROOFS


UN
CO

RR
EC

TE
D

PR
OO

F

16 Journal of Clinical Monitoring and Computing Vol xxx No xxx 2005

10. The resource for PHP developers: http://www.phpbuilder.com984
11. Zend Technologies – PHP tools for the development, pro-985

tection and scalability of PHP applications – PHP for Linux,986
Unix and Apache, Encoder, Accelerator Studio, Debugger:987
http://www.zend.com.988

12. The Apache Software Foundation: http://www.apache.org.989
13. Alimadhi F. Mobile Internet: Wireless access to Web-990

based interfaces of legacy simulations, MSc thesis, Uni-991
versity of Amsterdam, The Netherlands, September 2002:992
http://www.science.uva.nl/research/pscs/papers/master.html.993

14. Cross-Grid: Grid technology of Interactive Distributed Com-994
putation: http://www.eu-crossGrid.org/.995

15. Little SJ, Holte S, Routy JP, Daar ES, Markowitz M, Collier AC,

Koup RA, Mellors JW, Connick E, Conway B, Kilby M, Wang 996
L, Whitcomb JM, Hellmann NS, Richman DD. Antiretroviral- 997
drug resistance among patients recently infected with HIV. N 998
Engl J Med 2002; 8;347(6): 385–394. 999

16. Karlin S. A First Course in Stochastic Processes. Academic Press. 1000
NY-London, 1968. 1001

17. Sloot PMA, Boucher CA, Kiryukhin I, Saskov K, 1002
Boukhanovsky AV. A grid-based problem-solving envi- 1003
ronment for biomedicine. In: Nørager S, ed. Proceedings of 1004
the First European HealthGrid Conference, January, 16th-17th, 1005
2003, pp. 300–323. Commission of the European Commu- 1006
nities, Information Society Directorate-General, Brussels, 1007
Belgium, 2003. 1008
